CN112309423A - Respiratory tract symptom detection method based on smart phone audio perception in driving environment - Google Patents
- Publication number
- CN112309423A (application CN202011216514.2A)
- Authority
- CN
- China
- Prior art keywords
- sub
- sound
- frame
- sound signals
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/08—Detecting, measuring or recording devices for evaluating the respiratory organs
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/68—Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient
- A61B5/6887—Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient mounted on external non-worn devices, e.g. non-medical devices
- A61B5/6898—Portable consumer electronic devices, e.g. music players, telephones, tablet computers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B2503/00—Evaluating a particular growth phase or type of persons or animals
- A61B2503/20—Workers
- A61B2503/22—Motor vehicles operators, e.g. drivers, pilots, captains
Abstract
The invention discloses a respiratory tract symptom detection method based on smart phone audio perception in a driving environment. The method comprises the steps of collecting sounds in the vehicle with the microphone of a smart phone, filtering out the driving noise of the vehicle with an adaptive subband spectral entropy method, extracting acoustic features from the denoised sounds and feeding the features to a trained neural network, judging whether respiratory symptoms such as coughing, sneezing and nose sucking are present in the collected sounds, and recording the number of occurrences of the relevant respiratory symptoms. The invention does not depend on pre-installed professional medical equipment, has low cost, strong anti-interference performance and no privacy leakage problem, and is suitable for detection environments with stable driving noise and a short distance between driver and passengers. The invention adopts a denoising method based on adaptive subband spectral entropy to eliminate the influence of various driving noises, making the system robust to environmental noise and enabling accurate and efficient detection and classification of the three typical respiratory tract symptoms.
Description
Technical Field
The invention relates to a respiratory tract symptom detection method, in particular to a method based on the audio sensing capability of a smart phone's audio sensors, namely its loudspeaker and microphone, in a driving environment. It is mainly used for monitoring whether drivers and passengers exhibit three typical respiratory tract symptoms: coughing, sneezing and nose sucking. The invention belongs to the technical field of mobile computing applications.
Background
Among the respiratory symptoms closely related to human health, coughing, sneezing and nose sucking are the most common in daily life. Although these respiratory symptoms may appear negligible, they are correlated with more than 100 diseases, such as the common cold, influenza and allergies, as well as more severe respiratory diseases such as pneumonia, asthma and chronic lung disease. Most of these respiratory diseases are curable, but they still need to be discovered as early as possible, especially infectious respiratory diseases. Thus, detecting respiratory symptoms can not only help individuals discover health problems, but also help prevent infectious diseases and promote public health.
Currently, methods for detecting respiratory symptoms rely primarily on specialized medical equipment deployed in hospitals and medical facilities, connected to medical systems. For example, the respiration monitoring device is used for detecting the air flow in and out of the mouth of the patient to judge whether the patient coughs; the patient is tested for abnormal breathing conditions by mounting a device with an accelerometer to the chest of the patient.
However, these methods generally suffer from high cost, difficulty of deployment, and applicability only to hospitals and medical institutions. In the field of mobile computing applications, there are several methods that detect respiratory symptoms using audio sensors. For example, a microphone device worn by a user collects the sounds around the user to determine whether the user coughs; or the microphone of the user's mobile phone collects surrounding sounds to judge whether the user coughs, sneezes or sucks the nose. However, these methods have poor interference resistance and are applicable only to relatively quiet indoor environments. In a driving environment, particularly in commercial vehicles such as taxis, the small space and the close distance between passengers and drivers make infectious respiratory diseases easy to spread. Because of the noise in the driving environment and the difficulty of deploying dedicated equipment, existing methods are not suitable for detecting respiratory symptoms such as coughing, sneezing and nose sucking in a driving environment.
In view of the foregoing, there is a need for a method that uses the audio sensor in the driver's smart phone to detect whether drivers and passengers in a driving environment have respiratory symptoms.
Disclosure of Invention
The invention aims to solve the problems of high cost and low anti-interference performance of detecting respiratory symptoms of a driver and passengers in a driving environment, and provides a method for detecting respiratory symptoms of cough, sneeze, nose inhalation and the like of the driver or the passengers by using a smart phone audio sensor.
The core idea of the invention is as follows: collect the sounds in the vehicle with the microphone of a smart phone, filter out the driving noise of the vehicle with an adaptive subband spectral entropy method, extract acoustic features from the denoised sounds and feed them to a trained neural network, judge whether respiratory symptoms such as coughing, sneezing and nose sucking are present in the collected sounds, and record the number of occurrences of the relevant respiratory symptoms. The method is particularly suitable for the driving environment of a small automobile, where the driving noise is stable and the distance between driver and passengers is short.
The purpose of the invention is realized by the following technical scheme:
a respiratory tract symptom detection method based on smart phone raised audio perception in a driving environment comprises the following steps:
step 1: the method comprises the steps of collecting sound signals of coughing, sneezing and nose sucking of different drivers and passengers in a driving environment by using a microphone of a smart phone, and filtering automobile driving noise in the collected sound signals based on an adaptive subband spectral entropy denoising method, namely an ABSE denoising method.
Specifically, the implementation method of step 1 is as follows:
step 1.1: the smart phone is placed in a vehicle to collect sound signals of three behaviors of coughing, sneezing and nose sucking of different drivers and passengers.
Step 1.2: dividing each sound signal collected in the step 1.1 into sub-segments with the same length, selecting n sub-segment sound signals (such as 2 to 10 segments) of the beginning part to perform Fast Fourier Transform (FFT), then calculating the average energy spectrum of the sub-segment sounds, and initializing the threshold value of ABSE.
The ABSE threshold is T_s = μ_θ + α·σ_θ, where μ_θ and σ_θ are the mean and standard deviation of the ABSE values of the initial sub-segments, H_b(l) is the ABSE value of the l-th sub-segment, and α is a weight selected according to experimental results.
Step 1.3: the ABSE value of the sound signal of the next sub-segment is calculated and compared with the threshold value obtained in step 1.2. And if the ABSE value of the sub-segment sound exceeds a threshold value, performing FFT on the sub-segment sound and calculating an energy spectrum, then subtracting the average energy spectrum obtained in the step 1.2 from the energy spectrum of the sub-segment sound, and performing Inverse Fast Fourier Transform (IFFT) to obtain a denoised sound signal of the sub-segment sound. And if the ABSE value of the sub-segment sound does not exceed the threshold value, updating the average energy spectrum according to the energy spectrum of the sub-segment sound.
Step 1.4: and (4) repeating the step 1.3 until all the sound signals are denoised. And filtering the denoised sound signals by a high-pass filter to remove signals in a low frequency band, taking out sound segments containing cough, sneeze and nose inhalation sound segments in the filtered sound signals, cutting the sound segments into different signal frames, wherein each signal frame contains a respiratory tract symptom, and marking the signal frames by corresponding behaviors.
Step 2: and (3) extracting mixed acoustic features based on Mel cepstrum coefficient (MFCC) and gamma cepstrum coefficient (GFCC) of each frame from the denoised and marked signal frames obtained in the step 1, and training a classifier based on a long-time memory (LSTM) neural network by using the features.
Specifically, the implementation method of step 2 is as follows:
step 2.1: dividing each signal frame containing the respiratory tract symptom obtained in the step 1 into sub-frames with the same length, calculating 12-dimensional MFCC features of each sub-frame, and splicing the first 10-dimensional MFCC features of each sub-frame into an MFCC feature vector of the frame.
Step 2.2: dividing each signal frame containing respiratory tract symptoms obtained in the step 1 into subframes with the same length, calculating the GFCC characteristics of 31 dimensions of each subframe, and splicing the GFCC characteristics of the first 20 dimensions of each subframe into the GFCC characteristic vector of the frame.
Step 2.3: splicing the MFCC vector obtained in the step 2.1 and the GFCC vector obtained in the step 2.2 into a mixed feature vector, and then sending the mixed feature vector into a 3-layer LSTM network for training to obtain the classifier of the three respiratory symptom sounds in the driving environment.
And step 3: in practical application, a microphone of the smart phone in the vehicle is used for continuously collecting sound signals in the vehicle. And (3) removing the automobile driving noise from the collected sound signals by using the method in the step (1.2), and segmenting and complementing the denoised sound signals to enable each section of sound signals to be equal-length signal frames. And then, extracting the acoustic characteristics of each signal frame by using the method in the step 2.2, and sending the characteristics to a trained classifier for judgment. Once the classifier determines a cough, sneeze or nose-inhale behavior, the corresponding respiratory symptoms are recorded and the cumulative number of occurrences is recorded.
Specifically, the implementation method of step 3 is as follows:
step 3.1: the speaker sampling rate of the user's handset is set to 48kHz, and the handset microphone continues to receive the sound signal in the car.
Step 3.2: for the sound signals collected in step 3.1, the driving noise in the collected sound signals is removed by using the methods of steps 1.2 and 1.3, and the sound sub-segments with the ABSE value exceeding the threshold value are selected. If the total duration of a plurality of sound sub-segments exceeding the threshold exceeds the time threshold T _1, the sub-segments are divided into overlapped sub-frames with fixed length. If the total duration of a number of consecutive sound sub-segments exceeding the threshold is less than a further time threshold T _2, the sub-segment sum is discarded. If the total time length of a plurality of sound sub-sections exceeding the threshold value is more than T _2 and less than T _1, the sub-sections are expanded and the length is a fixed frame length. Each frame is filtered through a high pass filter.
Step 3.3: for each fixed-length filtered frame obtained in step 3.2, the MFCC feature vector of the frame is calculated in step 2.1, then the GFCC feature vector of the frame is calculated in step 2.2, the two vectors are spliced into a mixed feature vector of the frame, and then the mixed feature vector is sent to a trained LSTM network for classification, so as to determine whether the frame contains cough, sneeze or nose sucking behavior.
Advantageous effects
1. Compared with the prior art, the method can detect the respiratory symptoms of drivers and passengers simply by continuously receiving sound signals in the driving environment through the microphone of a smart phone. The invention therefore does not depend on pre-installed professional medical equipment, has low cost, strong anti-interference performance and no privacy leakage problem, and is suitable for detection environments with stable driving noise and a short distance between driver and passengers.
2. Aiming at the difference of the characteristics of sound signals of typical respiratory symptoms and driving noises, the invention adopts a denoising method based on the self-adaptive subband spectral entropy to eliminate the influence of various driving noises, so that the system has stronger robustness to environmental noises.
3. The method extracts the mixed acoustic features aiming at different sound signal features of the three typical respiratory symptoms, and accurately and efficiently realizes the detection and classification of the three typical respiratory symptoms by combining the neural network and the deep learning technology.
Drawings
FIG. 1 is a schematic diagram of the method of the present invention.
FIG. 2 shows the accuracy of different methods for detecting respiratory symptoms according to embodiments of the present invention.
FIG. 3 is a confusion matrix for different airway symptom detections according to an embodiment of the present invention.
FIG. 4 shows recall rates of different respiratory symptoms in different scenarios according to embodiments of the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the following examples and the accompanying drawings.
As shown in fig. 1, a respiratory tract symptom detection method based on smartphone audio perception in a driving environment includes the following steps:
step 1: a microphone of the smart phone is used for collecting sound signals of coughing, sneezing and nose sucking of different drivers and passengers in a driving environment, and a denoising method based on adaptive subband spectral entropy (ABSE) is designed for filtering automobile driving noise in the collected sound signals.
Step 1.1: 16 volunteers were recruited as drivers or passengers to drive or ride the test vehicles, the volunteers placed the smart phone in the vehicle and collected the sound signals of the three behaviors of coughing, sneezing and nose sucking during the driving of the vehicle.
Step 1.2: dividing each sound signal collected in the step 1.1 into non-overlapping sub-segments with the length of 0.2 second, taking the sound signals of the first 10 sub-segments, calculating the average energy spectrum E of the sound of the sub-segments after Fast Fourier Transform (FFT), and initializing the threshold T of ABSEs=μθ+α·σθWherein Hb(l) Is the ABSE value of the/sub-segment. The weight α is 0.1.
Step 1.3: the ABSE value of the sound signal of the next sub-segment is calculated and compared with the threshold value obtained in step 1.2. If it isAnd if the ABSE value of the sub-segment sound exceeds a threshold value, performing FFT on the sub-segment sound and calculating an energy spectrum, then subtracting the average energy spectrum obtained in the step 1.2 from the energy spectrum of the sub-segment sound, and performing Inverse Fast Fourier Transform (IFFT) on the subtracted signal to obtain the sound signal of the sub-segment sound after denoising. If the ABSE value of the sub-segment sound does not exceed the threshold, updating the average energy spectrum, i.e. E, according to the energy spectrum of the sub-segment soundnew=0.7E+0.3EcurrentIn which EcurrentIs the energy spectrum of the current sub-segment.
Step 2: collecting audio signals generated when the gasoline automobile runs, and training a classifier based on a long-time and short-time memory neural network (LSTM).
Step 2.1: and (3) dividing each frame containing one respiratory symptom obtained in the step 1 into subframes with the length of 0.07 second, wherein an overlapping area with the length of 0.03 second exists between two adjacent subframes. And calculating 12-dimensional MFCC features of each sub-frame, and splicing the first 10-dimensional MFCC features of each sub-frame into a 120-dimensional MFCC feature vector of the frame.
Step 2.2: and (3) dividing each frame containing one respiratory symptom obtained in the step 1 into subframes with the length of 0.07 second, wherein an overlapping area with the length of 0.03 second exists between two adjacent subframes. And calculating the GFCC characteristics of 31 dimensions of each subframe, and splicing the GFCC characteristics of the first 20 dimensions of each subframe into a GFCC characteristic vector of 240 dimensions of the frame.
Step 2.3: and splicing the MFCC vector obtained in the step 2.1 and the GFCC vector obtained in the step 2.2 into a 360-dimensional mixed feature vector, and then sending the mixed feature vector into a 3-layer LSTM network for training to obtain the classifiers of the three respiratory symptom sounds in the driving environment. The LSTM network comprises 2 LSTM layers and 1 full-connection layer, Tanh is used as an activation function, a batch normalization layer is added behind each LSTM layer, and a cross entropy cost function is used as a loss function. The timestamp value of the LSTM network is set to 6, i.e. each time the input is the eigenvector of the current subframe and the eigenvector of the 5 subframes before the current subframe. For the tth timeout, the LSTM layer uses the formula ht=δ(W0[ht-1,xt+b0])·tanh(St) Will input xtMapping to a compressed vector htWherein W is0And b0Respectively representing a weight matrix and an offset vector, StRepresents the state of the tth timer, ht-1Represents the compressed vector corresponding to the previous timestamp, and δ () represents the activation function. After training, three classifiers of typical respiratory symptoms are obtained.
And step 3: in practical application, a microphone of the smart phone in the vehicle continuously collects sound signals in the vehicle. And (3) removing the automobile driving noise from the collected sound signals by using the method in the step (1.2), and segmenting and complementing the noise-removed sound signals to enable each section of sound signals to be frames with equal length. And then, extracting the acoustic features of each frame by using the method in the step 2.2, and sending the features into a trained classifier for judgment. Once the classifier determines a cough, sneeze or nose-inhale behavior, the corresponding respiratory symptoms are recorded and the cumulative number of occurrences is recorded.
Step 3.1: in practical applications, the speaker sampling rate of the user's smartphone is set to 44.1kHz, and the smartphone microphone continuously receives sound signals from the vehicle interior.
Step 3.2: for the sound signals collected in step 3.1, the driving noise in the collected sound signals is removed by using the methods of steps 1.2 and 1.3, and the sound sub-segments with the ABSE value exceeding the threshold value are selected. Recording the total time length of a plurality of continuous sound subsegments exceeding a threshold as d, and if d is more than 0.4 second, dividing the subsegment into subframes with the length of 0.4 second and the length of an overlapping area of 0.2 second; if d <0.2 seconds, discarding the sub-field sum; if 0.2< d <0.4, 1/2(0.4-d) seconds long sound signal is added to the sub-segment sum forward and backward, respectively, to be a frame of length 0.4 seconds. Each frame is passed through a high pass filter to filter out sounds below 800 Hz.
Step 3.3: for each fixed-length filtered frame obtained in step 3.2, the 120-dimensional MFCC feature vector of the frame is calculated in step 2.1, then the 240-dimensional GFCC feature vector of the frame is calculated in step 2.2, the two vectors are spliced into a 360-dimensional hybrid feature vector of the frame, and then the 360-dimensional hybrid feature vector is sent to a trained LSTM network for classification, so that whether the frame contains cough, sneeze or nose sucking behavior is judged.
Examples
In order to test the performance of the method, it was implemented as an Android application and deployed on Android mobile phones of different models. 16 volunteers were recruited as drivers and passengers, driving and riding in the test vehicle in different real scenarios.
First, the overall accuracy of the method in a driving environment was tested. FIG. 2 shows the overall accuracy of this method and of two other respiratory symptom detection methods (SymDetector and CoughSense). As can be seen from the figure, the overall accuracy of the method for the three typical respiratory symptoms is 93.91%, while the overall accuracy of the other two methods is only 70.55% and 67.64%, respectively, which fully demonstrates the higher accuracy of the method in a driving environment.
Next, the accuracy of the LSTM-based classifier for the three typical respiratory symptoms was tested. FIG. 3 shows the confusion matrix of the classifier. As can be seen from the figure, the recognition accuracy of each respiratory symptom exceeds 93.64%, and the average recognition accuracy is 95.52%. Only a small amount of data is assigned to wrong categories, mainly because respiratory symptoms with faint sounds are easily classified into other categories when the smart phone is far from the user, which reflects the high accuracy of the method.
Finally, the detection accuracy of the method in different driving scenarios was tested. FIG. 4 shows the recall of each type of respiratory symptom on city streets, highways and country roads and in parking lots. As can be seen from the figure, parking lots are the quietest, so the detection recall of the three respiratory symptoms is highest there; driving noise on the highway is loud, and the unevenness of country roads easily makes the vehicle bump, so the detection recall in these two areas is slightly lower. Nevertheless, the detection recall of the three respiratory symptoms is no lower than 88.37% in any scenario, showing the strong universality of the invention.
The above-described embodiments are further illustrative of the present invention and are not intended to limit the scope of the invention, which is to be accorded the widest scope consistent with the principles and spirit of the present invention.
Claims (3)
1. A respiratory tract symptom detection method based on smart phone audio perception in a driving environment, characterized by comprising the following steps:
step 1: collecting sound signals of coughing, sneezing and nose sucking of different drivers and passengers in a driving environment by using a microphone of a smart phone, and filtering automobile driving noise in the collected sound signals based on an adaptive subband spectral entropy (ABSE) denoising method;
step 1.1: placing the smartphone in the vehicle, and collecting sound signals of the three behaviors of coughing, sneezing and sniffling from different drivers and passengers;
step 1.2: dividing each sound signal collected in step 1.1 into sub-segments of equal length, selecting the first n sub-segment sound signals, performing a fast Fourier transform on them, then calculating the average energy spectrum of these sub-segments, and initializing the ABSE threshold Ts = μθ + α·σθ;
wherein Hb(l) denotes the ABSE value of the l-th sub-segment, μθ and σθ respectively denote the mean and the standard deviation of the ABSE values of the first n sub-segments, and α denotes a weight value;
step 1.3: calculating the ABSE value of the next sound sub-segment and comparing it with the threshold obtained in step 1.2; if the ABSE value of the sub-segment exceeds the threshold, performing an FFT on the sub-segment and calculating its energy spectrum, then subtracting the average energy spectrum obtained in step 1.2 from the energy spectrum of the sub-segment and performing an inverse fast Fourier transform to obtain the denoised sound signal of the sub-segment; if the ABSE value of the sub-segment does not exceed the threshold, updating the average energy spectrum with the energy spectrum of the sub-segment;
step 1.4: repeating step 1.3 until all sound signals are denoised; filtering the denoised sound signals through a high-pass filter to remove low-frequency components, extracting the sound segments containing coughing, sneezing and sniffling from the filtered sound signals, cutting these segments into separate signal frames such that each signal frame contains one respiratory tract symptom, and labeling each signal frame with the corresponding behavior;
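As an illustrative sketch (not part of the claims), steps 1.2 to 1.4 can be rendered in numpy as follows. The sub-band count, segment length, noise-update rule and the exact ABSE formula are assumptions, since the claim does not fix them; the comparison direction follows the claim as written (segments exceeding the threshold are spectral-subtracted), though with plain spectral entropy an implementation may need to invert it, as voiced sounds usually have lower entropy than noise.

```python
import numpy as np

def subband_spectral_entropy(seg, n_bands=8):
    # ABSE value of one sub-segment: entropy of the energy distribution
    # across frequency sub-bands (tonal sounds concentrate energy in few
    # bands and get low entropy; broadband noise gets high entropy).
    spec = np.abs(np.fft.rfft(seg)) ** 2
    energy = np.array([b.sum() for b in np.array_split(spec, n_bands)]) + 1e-12
    p = energy / energy.sum()
    return float(-(p * np.log2(p)).sum())

def abse_denoise(signal, seg_len=256, n_init=4, alpha=2.0, n_bands=8):
    # Estimate the average noise energy spectrum and the threshold
    # Ts = mu_theta + alpha * sigma_theta from the first n_init sub-segments,
    # then spectral-subtract segments whose ABSE exceeds Ts and refresh the
    # noise estimate from the others (the 0.9/0.1 smoothing is an assumption).
    segs = [signal[i:i + seg_len]
            for i in range(0, len(signal) - seg_len + 1, seg_len)]
    noise_spec = np.mean([np.abs(np.fft.rfft(s)) ** 2 for s in segs[:n_init]], axis=0)
    h = np.array([subband_spectral_entropy(s, n_bands) for s in segs[:n_init]])
    t_s = h.mean() + alpha * h.std()
    out = []
    for seg in segs:
        if subband_spectral_entropy(seg, n_bands) > t_s:   # symptom-like segment
            spec = np.fft.rfft(seg)
            clean_power = np.maximum(np.abs(spec) ** 2 - noise_spec, 0.0)
            clean = np.sqrt(clean_power) * np.exp(1j * np.angle(spec))
            out.append(np.fft.irfft(clean, n=seg_len))
        else:                                              # noise-only: update estimate
            noise_spec = 0.9 * noise_spec + 0.1 * np.abs(np.fft.rfft(seg)) ** 2
            out.append(seg.astype(float))
    return np.concatenate(out)
```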
step 2: for the denoised and labeled signal frames obtained in step 1, extracting a mixed acoustic feature for each frame based on Mel-frequency cepstral coefficients (MFCC) and gammatone-frequency cepstral coefficients (GFCC), and using these features to train a classifier based on a long short-term memory (LSTM) neural network;
step 3: in practical use, continuously collecting in-vehicle sound signals with the microphone of a smartphone placed in the vehicle; removing the vehicle driving noise from the collected sound signals with the method of step 1, and segmenting and padding the denoised sound signals so that each sound segment becomes a signal frame of equal length; then extracting the acoustic features of each signal frame with the method of step 2 and feeding the features into the trained classifier for judgment; once the classifier detects a coughing, sneezing or sniffling behavior, recording the corresponding respiratory symptom and its cumulative number of occurrences.
2. The respiratory symptom detection method based on smartphone audio perception in a driving environment according to claim 1, wherein step 2 comprises the following steps:
step 2.1: dividing each signal frame containing a respiratory tract symptom signal obtained in step 1 into sub-frames of equal length, calculating the 12-dimensional MFCC features of each sub-frame, and splicing the first 10 MFCC dimensions of every sub-frame into the MFCC feature vector of the frame;
step 2.2: dividing each signal frame containing a respiratory tract symptom obtained in step 1 into sub-frames of equal length, calculating the 31-dimensional GFCC features of each sub-frame, and splicing the first 20 GFCC dimensions of every sub-frame into the GFCC feature vector of the frame;
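The truncation and splicing in steps 2.1 and 2.2, together with the concatenation of step 2.3, can be sketched as follows; the per-sub-frame MFCC/GFCC extraction itself is assumed to come from an audio-feature library and is not reproduced here.

```python
import numpy as np

def frame_feature_vector(mfcc_subframes, gfcc_subframes):
    # mfcc_subframes: array of shape (n_sub, 12), the per-sub-frame MFCCs;
    # gfcc_subframes: array of shape (n_sub, 31), the per-sub-frame GFCCs.
    # Keep the first 10 MFCC and first 20 GFCC dimensions of each sub-frame,
    # splice each into a per-frame vector, then concatenate the two into
    # the mixed feature vector of the frame.
    mfcc_vec = np.asarray(mfcc_subframes)[:, :10].ravel()
    gfcc_vec = np.asarray(gfcc_subframes)[:, :20].ravel()
    return np.concatenate([mfcc_vec, gfcc_vec])
```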
step 2.3: splicing the MFCC vector obtained in step 2.1 and the GFCC vector obtained in step 2.2 into a mixed feature vector, and then feeding the mixed feature vector into a 3-layer LSTM network for training to obtain a classifier for the three respiratory symptom sounds in a driving environment;
the LSTM network comprises 2 LSTM layers and 1 fully-connected layer, uses Tanh as the activation function, adds a batch normalization layer after each LSTM layer, and uses the cross-entropy cost function as the loss function; the time-step length of the LSTM network is set to 6, i.e., the input at each step consists of the feature vector of the current sub-frame and the feature vectors of the 5 preceding sub-frames; at the t-th time step, the LSTM layer maps the input xt to a compressed vector ht by ht = δ(W0[ht-1, xt] + b0) · tanh(St), wherein W0 and b0 respectively denote a weight matrix and a bias vector, St denotes the state at the t-th time step, δ() denotes the activation function, and ht-1 denotes the compressed vector of the previous time step.
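The per-time-step compression quoted above can be checked numerically. This is a minimal numpy rendering of ht = δ(W0[ht-1, xt] + b0) · tanh(St), taking δ as the logistic sigmoid (an assumption; the claim does not name δ explicitly).

```python
import numpy as np

def lstm_compress(h_prev, x_t, s_t, w0, b0):
    # h_t = sigmoid(W0 @ [h_{t-1}; x_t] + b0) * tanh(S_t)
    z = w0 @ np.concatenate([h_prev, x_t]) + b0
    return 1.0 / (1.0 + np.exp(-z)) * np.tanh(s_t)
```

With zero weights and bias the gate opens halfway, so ht = 0.5 · tanh(St), which gives a quick sanity check on shapes and the gating behavior.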
3. The respiratory symptom detection method based on smartphone audio perception in a driving environment according to claim 1, wherein step 3 comprises the following steps:
step 3.1: continuously capturing in-vehicle sound signals with the microphone of the user's mobile phone;
step 3.2: for the sound signals collected in step 3.1, first removing the driving noise from the collected sound signals, and selecting the sound sub-segments whose ABSE value exceeds the threshold; if the total duration of consecutive sound sub-segments exceeding the threshold is greater than a time threshold T_1, dividing these sub-segments into overlapping sub-frames of fixed length; if the total duration is less than another time threshold T_2, discarding these sub-segments; if the total duration is greater than T_2 and less than T_1, padding these sub-segments to the fixed frame length; each frame is then passed through a high-pass filter.
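The three-way segmentation rule of step 3.2 can be sketched as below. Durations are measured in samples, and the hop size, zero-padding value and the assumption frame_len ≥ T_1 are illustrative choices not fixed by the claim.

```python
def segment_to_frames(sub_segments, t1, t2, frame_len, hop):
    # sub_segments: list of lists of samples whose ABSE exceeded the threshold.
    # Runs longer than t1 become overlapping fixed-length sub-frames, runs
    # shorter than t2 are discarded, and runs in between are zero-padded to a
    # single fixed-length frame (assumes frame_len >= t1 so padding fits).
    samples = [x for seg in sub_segments for x in seg]
    total = len(samples)
    if total < t2:
        return []                                          # too short: discard
    if total < t1:
        return [samples + [0.0] * (frame_len - total)]     # pad to one frame
    return [samples[i:i + frame_len]                       # overlapping sub-frames
            for i in range(0, total - frame_len + 1, hop)]
```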
step 3.3: calculating the MFCC feature vector of each fixed-length signal frame obtained in step 3.2, then calculating the GFCC feature vector of the frame, splicing the two vectors into the mixed feature vector of the frame, feeding the mixed feature vector into the trained LSTM network for classification, and judging whether the frame contains a coughing, sneezing or sniffling behavior.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011216514.2A CN112309423A (en) | 2020-11-04 | 2020-11-04 | Respiratory tract symptom detection method based on smart phone audio perception in driving environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011216514.2A CN112309423A (en) | 2020-11-04 | 2020-11-04 | Respiratory tract symptom detection method based on smart phone audio perception in driving environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112309423A true CN112309423A (en) | 2021-02-02 |
Family
ID=74325622
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011216514.2A Pending CN112309423A (en) | 2020-11-04 | 2020-11-04 | Respiratory tract symptom detection method based on smart phone audio perception in driving environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112309423A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103413113A (en) * | 2013-01-15 | 2013-11-27 | 上海大学 | Intelligent emotional interaction method for service robot |
US20160210988A1 (en) * | 2015-01-19 | 2016-07-21 | Korea Institute Of Science And Technology | Device and method for sound classification in real time |
CN110383375A (en) * | 2017-02-01 | 2019-10-25 | 瑞爱普健康有限公司 | Method and apparatus for the cough in detection noise background environment |
CN110390952A (en) * | 2019-06-21 | 2019-10-29 | 江南大学 | City sound event classification method based on bicharacteristic 2-DenseNet parallel connection |
CN110719553A (en) * | 2018-07-13 | 2020-01-21 | 国际商业机器公司 | Smart speaker system with cognitive sound analysis and response |
CN110853620A (en) * | 2018-07-25 | 2020-02-28 | 音频分析有限公司 | Sound detection |
2020-11-04 CN CN202011216514.2A patent/CN112309423A/en active Pending
Non-Patent Citations (1)
Title |
---|
ZHANG Ke et al.: "Research on environmental sound classification *** based on fused features and convolutional neural networks", Journal of Northwestern Polytechnical University (《西北工业大学学报》) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112951267A (en) * | 2021-02-23 | 2021-06-11 | 恒大新能源汽车投资控股集团有限公司 | Passenger health monitoring method and vehicle-mounted terminal |
JP2023018658A (en) * | 2021-07-27 | 2023-02-08 | 上海交通大学医学院付属第九人民医院 | Difficult airway evaluation method and device based on machine learning voice technology |
JP7291319B2 (en) | 2021-07-27 | 2023-06-15 | 上海交通大学医学院付属第九人民医院 | Evaluation method and apparatus for difficult airway based on speech technique by machine learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112309423A (en) | Respiratory tract symptom detection method based on smart phone audio perception in driving environment | |
CN104916289A (en) | Quick acoustic event detection method under vehicle-driving noise environment | |
CN102394062B (en) | Method and system for automatically identifying voice recording equipment source | |
Vij et al. | Smartphone based traffic state detection using acoustic analysis and crowdsourcing | |
CN109816987B (en) | Electronic police law enforcement snapshot system for automobile whistling and snapshot method thereof | |
CN110600054B (en) | Sound scene classification method based on network model fusion | |
CN109949823A (en) | A kind of interior abnormal sound recognition methods based on DWPT-MFCC and GMM | |
CN109965889B (en) | Fatigue driving detection method by using smart phone loudspeaker and microphone | |
CN107179119A (en) | The method and apparatus of sound detection information and the vehicle including the device are provided | |
CN115052761B (en) | Method and device for detecting tire abnormality | |
CN102499699A (en) | Vehicle-mounted embedded-type road rage driving state detection device based on brain electrical signal and method | |
CN110880328B (en) | Arrival reminding method, device, terminal and storage medium | |
CN109741609B (en) | Motor vehicle whistling monitoring method based on microphone array | |
CN104361887A (en) | Quick acoustic event detection system under traveling noise environment | |
Lee et al. | Acoustic hazard detection for pedestrians with obscured hearing | |
CN113793624B (en) | Acoustic scene classification method | |
Kubo et al. | Design of ultra low power vehicle detector utilizing discrete wavelet transform | |
Qi et al. | A low-cost driver and passenger activity detection system based on deep learning and multiple sensor fusion | |
CN206671813U (en) | Pure electric or hybrid pedestrian caution sound control system | |
Sobreira-Seoane et al. | Automatic classification of traffic noise | |
CN112230208B (en) | Automobile running speed detection method based on smart phone audio perception | |
Joshi et al. | Information fusion based learning for frugal traffic state sensing | |
CN109389994A (en) | Identification of sound source method and device for intelligent transportation system | |
CN110956977A (en) | Real-time positioning system and method for automobile whistling | |
CN202355417U (en) | Vehicular embedded road-rage driving state detection device based on electroencephalogram signal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210202 |
|