CN112309423A - Respiratory tract symptom detection method based on smart phone audio perception in driving environment - Google Patents
- Publication number
- CN112309423A (application CN202011216514.2A)
- Authority
- CN
- China
- Prior art keywords
- sub
- sound
- frame
- sound signals
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/08—Detecting, measuring or recording devices for evaluating the respiratory organs
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/68—Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient
- A61B5/6887—Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient mounted on external non-worn devices, e.g. non-medical devices
- A61B5/6898—Portable consumer electronic devices, e.g. music players, telephones, tablet computers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B2503/00—Evaluating a particular growth phase or type of persons or animals
- A61B2503/20—Workers
- A61B2503/22—Motor vehicles operators, e.g. drivers, pilots, captains
Abstract
The invention discloses a respiratory tract symptom detection method based on smart phone audio perception in a driving environment. The method comprises the steps of collecting sounds in the vehicle with the microphone of a smart phone, filtering out the driving noise of the vehicle with an adaptive subband spectral entropy method, extracting acoustic features from the denoised sounds and feeding the features to a trained neural network, judging whether respiratory symptoms such as coughing, sneezing and nose sucking are present in the collected sounds, and recording the number of occurrences of the relevant respiratory symptoms. The invention does not depend on pre-installed professional medical equipment, has low cost, strong anti-interference performance and no privacy leakage problem, and is suitable for detection environments with stable driving noise and a short distance between driver and passengers. The invention adopts a denoising method based on adaptive subband spectral entropy to eliminate the influence of various driving noises, making the system robust to environmental noise and enabling accurate and efficient detection and classification of the three typical respiratory tract symptoms.
Description
Technical Field
The invention relates to a respiratory tract symptom detection method, in particular to a method based on the audio sensing capability of a smart phone's audio sensors, namely its loudspeaker and microphone, in a driving environment. It is mainly used for monitoring whether drivers and passengers exhibit three typical respiratory tract symptoms: coughing, sneezing and nose sucking. The invention belongs to the technical field of mobile computing applications.
Background
Among the respiratory symptoms closely related to human health, coughing, sneezing and nose sucking are the most common in daily life. Although these respiratory symptoms may appear negligible, they are correlated with more than 100 diseases, such as the common cold, influenza and allergies, as well as more severe respiratory diseases such as pneumonia, asthma and chronic lung disease. Most of these respiratory diseases are curable, but they still need to be discovered as early as possible, especially infectious respiratory diseases. Thus, detecting respiratory symptoms can not only help individuals discover health problems, but also help prevent infectious diseases and promote public health.
Currently, methods for detecting respiratory symptoms rely primarily on specialized medical equipment deployed in hospitals and medical facilities, connected to medical systems. For example, the respiration monitoring device is used for detecting the air flow in and out of the mouth of the patient to judge whether the patient coughs; the patient is tested for abnormal breathing conditions by mounting a device with an accelerometer to the chest of the patient.
However, these methods generally suffer from high cost, difficulty of deployment, and applicability only to hospitals and medical institutions. In the field of mobile computing applications, there are several methods that detect respiratory symptoms using audio sensors. For example, a microphone device worn by a user collects the sounds around the user to determine whether the user coughs; or the microphone of the user's mobile phone collects surrounding sounds to judge whether the user coughs, sneezes or sucks the nose. However, these methods have poor interference resistance and are applicable only to relatively quiet indoor environments. In a driving environment, particularly in commercial vehicles such as taxis, the small space and the close distance between passengers and drivers make infectious respiratory diseases easy to spread. Because of the noise in the driving environment and the difficulty of deploying dedicated equipment, existing methods are not suitable for detecting respiratory symptoms such as coughing, sneezing and nose sucking in a driving environment.
In view of the foregoing, there is a need for a method that uses the audio sensor in the driver's smart phone to detect whether drivers and passengers in a driving environment have respiratory symptoms.
Disclosure of Invention
The invention aims to solve the problems of high cost and low anti-interference performance of detecting respiratory symptoms of a driver and passengers in a driving environment, and provides a method for detecting respiratory symptoms of cough, sneeze, nose inhalation and the like of the driver or the passengers by using a smart phone audio sensor.
The core idea of the invention is as follows: collect the sounds in the vehicle with the microphone of a smart phone, filter out the driving noise of the vehicle with an adaptive subband spectral entropy method, extract acoustic features from the denoised sounds and feed them to a trained neural network, judge whether respiratory symptoms such as coughing, sneezing and nose sucking are present in the collected sounds, and record the number of occurrences of the relevant respiratory symptoms. The method is particularly suitable for the driving environment of a small automobile, where the driving noise is stable and the distance between driver and passengers is short.
The purpose of the invention is realized by the following technical scheme:
a respiratory tract symptom detection method based on smart phone raised audio perception in a driving environment comprises the following steps:
step 1: the method comprises the steps of collecting sound signals of coughing, sneezing and nose sucking of different drivers and passengers in a driving environment by using a microphone of a smart phone, and filtering automobile driving noise in the collected sound signals based on an adaptive subband spectral entropy denoising method, namely an ABSE denoising method.
Specifically, the implementation method of step 1 is as follows:
step 1.1: the smart phone is placed in a vehicle to collect sound signals of three behaviors of coughing, sneezing and nose sucking of different drivers and passengers.
Step 1.2: dividing each sound signal collected in the step 1.1 into sub-segments with the same length, selecting n sub-segment sound signals (such as 2 to 10 segments) of the beginning part to perform Fast Fourier Transform (FFT), then calculating the average energy spectrum of the sub-segment sounds, and initializing the threshold value of ABSE.
The ABSE threshold is T_s = μ_θ + α·σ_θ, where μ_θ and σ_θ are the mean and standard deviation of the ABSE values of the initial sub-segments, H_b(l) is the ABSE value of the l-th sub-segment, and α is a weight selected according to experimental results.
Step 1.3: the ABSE value of the sound signal of the next sub-segment is calculated and compared with the threshold value obtained in step 1.2. And if the ABSE value of the sub-segment sound exceeds a threshold value, performing FFT on the sub-segment sound and calculating an energy spectrum, then subtracting the average energy spectrum obtained in the step 1.2 from the energy spectrum of the sub-segment sound, and performing Inverse Fast Fourier Transform (IFFT) to obtain a denoised sound signal of the sub-segment sound. And if the ABSE value of the sub-segment sound does not exceed the threshold value, updating the average energy spectrum according to the energy spectrum of the sub-segment sound.
Step 1.4: and (4) repeating the step 1.3 until all the sound signals are denoised. And filtering the denoised sound signals by a high-pass filter to remove signals in a low frequency band, taking out sound segments containing cough, sneeze and nose inhalation sound segments in the filtered sound signals, cutting the sound segments into different signal frames, wherein each signal frame contains a respiratory tract symptom, and marking the signal frames by corresponding behaviors.
Step 2: and (3) extracting mixed acoustic features based on Mel cepstrum coefficient (MFCC) and gamma cepstrum coefficient (GFCC) of each frame from the denoised and marked signal frames obtained in the step 1, and training a classifier based on a long-time memory (LSTM) neural network by using the features.
Specifically, the implementation method of step 2 is as follows:
step 2.1: dividing each signal frame containing the respiratory tract symptom obtained in the step 1 into sub-frames with the same length, calculating 12-dimensional MFCC features of each sub-frame, and splicing the first 10-dimensional MFCC features of each sub-frame into an MFCC feature vector of the frame.
Step 2.2: dividing each signal frame containing respiratory tract symptoms obtained in the step 1 into subframes with the same length, calculating the GFCC characteristics of 31 dimensions of each subframe, and splicing the GFCC characteristics of the first 20 dimensions of each subframe into the GFCC characteristic vector of the frame.
Step 2.3: splicing the MFCC vector obtained in the step 2.1 and the GFCC vector obtained in the step 2.2 into a mixed feature vector, and then sending the mixed feature vector into a 3-layer LSTM network for training to obtain the classifier of the three respiratory symptom sounds in the driving environment.
And step 3: in practical application, a microphone of the smart phone in the vehicle is used for continuously collecting sound signals in the vehicle. And (3) removing the automobile driving noise from the collected sound signals by using the method in the step (1.2), and segmenting and complementing the denoised sound signals to enable each section of sound signals to be equal-length signal frames. And then, extracting the acoustic characteristics of each signal frame by using the method in the step 2.2, and sending the characteristics to a trained classifier for judgment. Once the classifier determines a cough, sneeze or nose-inhale behavior, the corresponding respiratory symptoms are recorded and the cumulative number of occurrences is recorded.
Specifically, the implementation method of step 3 is as follows:
step 3.1: the speaker sampling rate of the user's handset is set to 48kHz, and the handset microphone continues to receive the sound signal in the car.
Step 3.2: for the sound signals collected in step 3.1, the driving noise in the collected sound signals is removed by using the methods of steps 1.2 and 1.3, and the sound sub-segments with the ABSE value exceeding the threshold value are selected. If the total duration of a plurality of sound sub-segments exceeding the threshold exceeds the time threshold T _1, the sub-segments are divided into overlapped sub-frames with fixed length. If the total duration of a number of consecutive sound sub-segments exceeding the threshold is less than a further time threshold T _2, the sub-segment sum is discarded. If the total time length of a plurality of sound sub-sections exceeding the threshold value is more than T _2 and less than T _1, the sub-sections are expanded and the length is a fixed frame length. Each frame is filtered through a high pass filter.
Step 3.3: for each fixed-length filtered frame obtained in step 3.2, the MFCC feature vector of the frame is calculated in step 2.1, then the GFCC feature vector of the frame is calculated in step 2.2, the two vectors are spliced into a mixed feature vector of the frame, and then the mixed feature vector is sent to a trained LSTM network for classification, so as to determine whether the frame contains cough, sneeze or nose sucking behavior.
Advantageous effects
1. Compared with the prior art, the method can detect the respiratory symptoms of drivers and passengers simply by continuously receiving sound signals in the driving environment through the microphone of a smart phone. The invention therefore does not depend on pre-installed professional medical equipment, has low cost, strong anti-interference performance and no privacy leakage problem, and is suitable for detection environments with stable driving noise and a short distance between driver and passengers.
2. Aiming at the difference of the characteristics of sound signals of typical respiratory symptoms and driving noises, the invention adopts a denoising method based on the self-adaptive subband spectral entropy to eliminate the influence of various driving noises, so that the system has stronger robustness to environmental noises.
3. The method extracts the mixed acoustic features aiming at different sound signal features of the three typical respiratory symptoms, and accurately and efficiently realizes the detection and classification of the three typical respiratory symptoms by combining the neural network and the deep learning technology.
Drawings
FIG. 1 is a schematic diagram of the method of the present invention.
FIG. 2 shows the accuracy of different methods for detecting respiratory symptoms according to embodiments of the present invention.
FIG. 3 is a confusion matrix for different airway symptom detections according to an embodiment of the present invention.
FIG. 4 shows recall rates of different respiratory symptoms in different scenarios according to embodiments of the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the following examples and the accompanying drawings.
As shown in fig. 1, a respiratory tract symptom detection method based on smartphone audio perception in a driving environment includes the following steps:
step 1: a microphone of the smart phone is used for collecting sound signals of coughing, sneezing and nose sucking of different drivers and passengers in a driving environment, and a denoising method based on adaptive subband spectral entropy (ABSE) is designed for filtering automobile driving noise in the collected sound signals.
Step 1.1: 16 volunteers were recruited as drivers or passengers to drive or ride the test vehicles, the volunteers placed the smart phone in the vehicle and collected the sound signals of the three behaviors of coughing, sneezing and nose sucking during the driving of the vehicle.
Step 1.2: dividing each sound signal collected in the step 1.1 into non-overlapping sub-segments with the length of 0.2 second, taking the sound signals of the first 10 sub-segments, calculating the average energy spectrum E of the sound of the sub-segments after Fast Fourier Transform (FFT), and initializing the threshold T of ABSEs=μθ+α·σθWherein Hb(l) Is the ABSE value of the/sub-segment. The weight α is 0.1.
Step 1.3: the ABSE value of the sound signal of the next sub-segment is calculated and compared with the threshold value obtained in step 1.2. If it isAnd if the ABSE value of the sub-segment sound exceeds a threshold value, performing FFT on the sub-segment sound and calculating an energy spectrum, then subtracting the average energy spectrum obtained in the step 1.2 from the energy spectrum of the sub-segment sound, and performing Inverse Fast Fourier Transform (IFFT) on the subtracted signal to obtain the sound signal of the sub-segment sound after denoising. If the ABSE value of the sub-segment sound does not exceed the threshold, updating the average energy spectrum, i.e. E, according to the energy spectrum of the sub-segment soundnew=0.7E+0.3EcurrentIn which EcurrentIs the energy spectrum of the current sub-segment.
Step 2: collecting audio signals generated when the gasoline automobile runs, and training a classifier based on a long-time and short-time memory neural network (LSTM).
Step 2.1: and (3) dividing each frame containing one respiratory symptom obtained in the step 1 into subframes with the length of 0.07 second, wherein an overlapping area with the length of 0.03 second exists between two adjacent subframes. And calculating 12-dimensional MFCC features of each sub-frame, and splicing the first 10-dimensional MFCC features of each sub-frame into a 120-dimensional MFCC feature vector of the frame.
Step 2.2: and (3) dividing each frame containing one respiratory symptom obtained in the step 1 into subframes with the length of 0.07 second, wherein an overlapping area with the length of 0.03 second exists between two adjacent subframes. And calculating the GFCC characteristics of 31 dimensions of each subframe, and splicing the GFCC characteristics of the first 20 dimensions of each subframe into a GFCC characteristic vector of 240 dimensions of the frame.
Step 2.3: and splicing the MFCC vector obtained in the step 2.1 and the GFCC vector obtained in the step 2.2 into a 360-dimensional mixed feature vector, and then sending the mixed feature vector into a 3-layer LSTM network for training to obtain the classifiers of the three respiratory symptom sounds in the driving environment. The LSTM network comprises 2 LSTM layers and 1 full-connection layer, Tanh is used as an activation function, a batch normalization layer is added behind each LSTM layer, and a cross entropy cost function is used as a loss function. The timestamp value of the LSTM network is set to 6, i.e. each time the input is the eigenvector of the current subframe and the eigenvector of the 5 subframes before the current subframe. For the tth timeout, the LSTM layer uses the formula ht=δ(W0[ht-1,xt+b0])·tanh(St) Will input xtMapping to a compressed vector htWherein W is0And b0Respectively representing a weight matrix and an offset vector, StRepresents the state of the tth timer, ht-1Represents the compressed vector corresponding to the previous timestamp, and δ () represents the activation function. After training, three classifiers of typical respiratory symptoms are obtained.
And step 3: in practical application, a microphone of the smart phone in the vehicle continuously collects sound signals in the vehicle. And (3) removing the automobile driving noise from the collected sound signals by using the method in the step (1.2), and segmenting and complementing the noise-removed sound signals to enable each section of sound signals to be frames with equal length. And then, extracting the acoustic features of each frame by using the method in the step 2.2, and sending the features into a trained classifier for judgment. Once the classifier determines a cough, sneeze or nose-inhale behavior, the corresponding respiratory symptoms are recorded and the cumulative number of occurrences is recorded.
Step 3.1: in practical applications, the speaker sampling rate of the user's smartphone is set to 44.1kHz, and the smartphone microphone continuously receives sound signals from the vehicle interior.
Step 3.2: for the sound signals collected in step 3.1, the driving noise in the collected sound signals is removed by using the methods of steps 1.2 and 1.3, and the sound sub-segments with the ABSE value exceeding the threshold value are selected. Recording the total time length of a plurality of continuous sound subsegments exceeding a threshold as d, and if d is more than 0.4 second, dividing the subsegment into subframes with the length of 0.4 second and the length of an overlapping area of 0.2 second; if d <0.2 seconds, discarding the sub-field sum; if 0.2< d <0.4, 1/2(0.4-d) seconds long sound signal is added to the sub-segment sum forward and backward, respectively, to be a frame of length 0.4 seconds. Each frame is passed through a high pass filter to filter out sounds below 800 Hz.
Step 3.3: for each fixed-length filtered frame obtained in step 3.2, the 120-dimensional MFCC feature vector of the frame is calculated in step 2.1, then the 240-dimensional GFCC feature vector of the frame is calculated in step 2.2, the two vectors are spliced into a 360-dimensional hybrid feature vector of the frame, and then the 360-dimensional hybrid feature vector is sent to a trained LSTM network for classification, so that whether the frame contains cough, sneeze or nose sucking behavior is judged.
Examples
In order to test the performance of the method, it was implemented as an Android application and deployed on Android mobile phones of different models. 16 volunteers were recruited as drivers and passengers, driving and riding in the test vehicle in different real scenarios.
First, the overall accuracy of the method in a driving environment was tested. FIG. 2 shows the overall accuracy of this method and of two other respiratory symptom detection methods (SymDetector and CoughSense). As can be seen from the figure, the overall accuracy of the method for the three typical respiratory symptoms is 93.91%, while the overall accuracy of the other two methods is only 70.55% and 67.64%, respectively, which fully demonstrates the higher accuracy of the method in a driving environment.
Next, the accuracy of the LSTM-based classifier for the three typical respiratory symptoms was tested. FIG. 3 shows the confusion matrix of the classifier. As can be seen from the figure, the recognition accuracy of each respiratory symptom exceeds 93.64%, and the average recognition accuracy is 95.52%. Only a small amount of data is assigned to wrong categories, mainly because respiratory symptoms with faint sounds are easily classified into other categories when the smart phone is far from the user, which reflects the high accuracy of the method.
Finally, the detection accuracy of the method in different driving scenarios was tested. FIG. 4 shows the recall of each type of respiratory symptom on city streets, highways and country roads and in parking lots. As can be seen from the figure, parking lots are the quietest, so the detection recall of the three respiratory symptoms is highest there; driving noise on the highway is loud, and the unevenness of country roads easily makes the vehicle bump, so the detection recall in these two areas is slightly lower. Nevertheless, the detection recall of the three respiratory symptoms is no lower than 88.37% in any scenario, showing the strong universality of the invention.
The above-described embodiments are further illustrative of the present invention and are not intended to limit the scope of the invention, which is to be accorded the widest scope consistent with the principles and spirit of the present invention.
Claims (3)
1. A respiratory tract symptom detection method based on smart phone audio perception in a driving environment, characterized by comprising the following steps:
step 1: collecting sound signals of coughing, sneezing and nose sucking of different drivers and passengers in a driving environment by using a microphone of a smart phone, and filtering automobile driving noise in the collected sound signals based on an adaptive subband spectral entropy (ABSE) denoising method;
step 1.1: placing the smartphone in the vehicle, and collecting sound signals of the three behaviors of coughing, sneezing and sniffling from different drivers and passengers;
step 1.2: dividing each sound signal collected in step 1.1 into sub-segments of equal length, selecting the first n sub-segment sound signals, performing a fast Fourier transform on them, then calculating the average energy spectrum of these sub-segments, and initializing the ABSE threshold Ts = μθ + α·σθ;
wherein Hb(l) denotes the ABSE value of the l-th sub-segment, μθ and σθ respectively denote the mean and the standard deviation of the ABSE values of the first n sub-segments, and α denotes a weight value;
step 1.3: calculating the ABSE value of the next sound sub-segment and comparing it with the threshold obtained in step 1.2; if the ABSE value of the sub-segment exceeds the threshold, performing an FFT on the sub-segment and calculating its energy spectrum, then subtracting the average energy spectrum obtained in step 1.2 from the energy spectrum of the sub-segment and performing an inverse fast Fourier transform to obtain the denoised sound signal of the sub-segment; if the ABSE value of the sub-segment does not exceed the threshold, updating the average energy spectrum with the energy spectrum of the sub-segment;
step 1.4: repeating step 1.3 until all sound signals are denoised; filtering the denoised sound signals through a high-pass filter to remove low-frequency components, extracting the sound segments containing coughing, sneezing and sniffling from the filtered sound signals, cutting these segments into separate signal frames such that each signal frame contains one respiratory tract symptom, and labeling each signal frame with the corresponding behavior;
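As an illustrative sketch (not part of the claims), steps 1.2 to 1.4 can be rendered in numpy as follows. The sub-band count, segment length, noise-update rule and the exact ABSE formula are assumptions, since the claim does not fix them; the comparison direction follows the claim as written (segments exceeding the threshold are spectral-subtracted), though with plain spectral entropy an implementation may need to invert it, as voiced sounds usually have lower entropy than noise.

```python
import numpy as np

def subband_spectral_entropy(seg, n_bands=8):
    # ABSE value of one sub-segment: entropy of the energy distribution
    # across frequency sub-bands (tonal sounds concentrate energy in few
    # bands and get low entropy; broadband noise gets high entropy).
    spec = np.abs(np.fft.rfft(seg)) ** 2
    energy = np.array([b.sum() for b in np.array_split(spec, n_bands)]) + 1e-12
    p = energy / energy.sum()
    return float(-(p * np.log2(p)).sum())

def abse_denoise(signal, seg_len=256, n_init=4, alpha=2.0, n_bands=8):
    # Estimate the average noise energy spectrum and the threshold
    # Ts = mu_theta + alpha * sigma_theta from the first n_init sub-segments,
    # then spectral-subtract segments whose ABSE exceeds Ts and refresh the
    # noise estimate from the others (the 0.9/0.1 smoothing is an assumption).
    segs = [signal[i:i + seg_len]
            for i in range(0, len(signal) - seg_len + 1, seg_len)]
    noise_spec = np.mean([np.abs(np.fft.rfft(s)) ** 2 for s in segs[:n_init]], axis=0)
    h = np.array([subband_spectral_entropy(s, n_bands) for s in segs[:n_init]])
    t_s = h.mean() + alpha * h.std()
    out = []
    for seg in segs:
        if subband_spectral_entropy(seg, n_bands) > t_s:   # symptom-like segment
            spec = np.fft.rfft(seg)
            clean_power = np.maximum(np.abs(spec) ** 2 - noise_spec, 0.0)
            clean = np.sqrt(clean_power) * np.exp(1j * np.angle(spec))
            out.append(np.fft.irfft(clean, n=seg_len))
        else:                                              # noise-only: update estimate
            noise_spec = 0.9 * noise_spec + 0.1 * np.abs(np.fft.rfft(seg)) ** 2
            out.append(seg.astype(float))
    return np.concatenate(out)
```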
step 2: for the denoised and labeled signal frames obtained in step 1, extracting a mixed acoustic feature for each frame based on Mel-frequency cepstral coefficients (MFCC) and gammatone-frequency cepstral coefficients (GFCC), and using these features to train a classifier based on a long short-term memory (LSTM) neural network;
step 3: in practical use, continuously collecting in-vehicle sound signals with the microphone of a smartphone placed in the vehicle; removing the vehicle driving noise from the collected sound signals with the method of step 1, and segmenting and padding the denoised sound signals so that each sound segment becomes a signal frame of equal length; then extracting the acoustic features of each signal frame with the method of step 2 and feeding the features into the trained classifier for judgment; once the classifier detects a coughing, sneezing or sniffling behavior, recording the corresponding respiratory symptom and its cumulative number of occurrences.
2. The respiratory symptom detection method based on smartphone audio perception in a driving environment according to claim 1, wherein step 2 comprises the following steps:
step 2.1: dividing each signal frame containing a respiratory tract symptom signal obtained in step 1 into sub-frames of equal length, calculating the 12-dimensional MFCC features of each sub-frame, and splicing the first 10 MFCC dimensions of every sub-frame into the MFCC feature vector of the frame;
step 2.2: dividing each signal frame containing a respiratory tract symptom obtained in step 1 into sub-frames of equal length, calculating the 31-dimensional GFCC features of each sub-frame, and splicing the first 20 GFCC dimensions of every sub-frame into the GFCC feature vector of the frame;
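The truncation and splicing in steps 2.1 and 2.2, together with the concatenation of step 2.3, can be sketched as follows; the per-sub-frame MFCC/GFCC extraction itself is assumed to come from an audio-feature library and is not reproduced here.

```python
import numpy as np

def frame_feature_vector(mfcc_subframes, gfcc_subframes):
    # mfcc_subframes: array of shape (n_sub, 12), the per-sub-frame MFCCs;
    # gfcc_subframes: array of shape (n_sub, 31), the per-sub-frame GFCCs.
    # Keep the first 10 MFCC and first 20 GFCC dimensions of each sub-frame,
    # splice each into a per-frame vector, then concatenate the two into
    # the mixed feature vector of the frame.
    mfcc_vec = np.asarray(mfcc_subframes)[:, :10].ravel()
    gfcc_vec = np.asarray(gfcc_subframes)[:, :20].ravel()
    return np.concatenate([mfcc_vec, gfcc_vec])
```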
step 2.3: splicing the MFCC vector obtained in step 2.1 and the GFCC vector obtained in step 2.2 into a mixed feature vector, and then feeding the mixed feature vector into a 3-layer LSTM network for training to obtain a classifier for the three respiratory symptom sounds in a driving environment;
the LSTM network comprises 2 LSTM layers and 1 fully-connected layer, uses Tanh as the activation function, adds a batch normalization layer after each LSTM layer, and uses the cross-entropy cost function as the loss function; the time-step length of the LSTM network is set to 6, i.e., the input at each step consists of the feature vector of the current sub-frame and the feature vectors of the 5 preceding sub-frames; at the t-th time step, the LSTM layer maps the input xt to a compressed vector ht by ht = δ(W0[ht-1, xt] + b0) · tanh(St), wherein W0 and b0 respectively denote a weight matrix and a bias vector, St denotes the state at the t-th time step, δ() denotes the activation function, and ht-1 denotes the compressed vector of the previous time step.
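The per-time-step compression quoted above can be checked numerically. This is a minimal numpy rendering of ht = δ(W0[ht-1, xt] + b0) · tanh(St), taking δ as the logistic sigmoid (an assumption; the claim does not name δ explicitly).

```python
import numpy as np

def lstm_compress(h_prev, x_t, s_t, w0, b0):
    # h_t = sigmoid(W0 @ [h_{t-1}; x_t] + b0) * tanh(S_t)
    z = w0 @ np.concatenate([h_prev, x_t]) + b0
    return 1.0 / (1.0 + np.exp(-z)) * np.tanh(s_t)
```

With zero weights and bias the gate opens halfway, so ht = 0.5 · tanh(St), which gives a quick sanity check on shapes and the gating behavior.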
3. The respiratory symptom detection method based on smartphone audio perception in a driving environment according to claim 1, wherein step 3 comprises the following steps:
step 3.1: continuously capturing in-vehicle sound signals with the microphone of the user's mobile phone;
step 3.2: for the sound signals collected in step 3.1, first removing the driving noise from the collected sound signals, and selecting the sound sub-segments whose ABSE value exceeds the threshold; if the total duration of consecutive sound sub-segments exceeding the threshold is greater than a time threshold T_1, dividing these sub-segments into overlapping sub-frames of fixed length; if the total duration is less than another time threshold T_2, discarding these sub-segments; if the total duration is greater than T_2 and less than T_1, padding these sub-segments to the fixed frame length; each frame is then passed through a high-pass filter.
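The three-way segmentation rule of step 3.2 can be sketched as below. Durations are measured in samples, and the hop size, zero-padding value and the assumption frame_len ≥ T_1 are illustrative choices not fixed by the claim.

```python
def segment_to_frames(sub_segments, t1, t2, frame_len, hop):
    # sub_segments: list of lists of samples whose ABSE exceeded the threshold.
    # Runs longer than t1 become overlapping fixed-length sub-frames, runs
    # shorter than t2 are discarded, and runs in between are zero-padded to a
    # single fixed-length frame (assumes frame_len >= t1 so padding fits).
    samples = [x for seg in sub_segments for x in seg]
    total = len(samples)
    if total < t2:
        return []                                          # too short: discard
    if total < t1:
        return [samples + [0.0] * (frame_len - total)]     # pad to one frame
    return [samples[i:i + frame_len]                       # overlapping sub-frames
            for i in range(0, total - frame_len + 1, hop)]
```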
step 3.3: calculating the MFCC feature vector of each fixed-length signal frame obtained in step 3.2, then calculating the GFCC feature vector of the frame, splicing the two vectors into the mixed feature vector of the frame, feeding the mixed feature vector into the trained LSTM network for classification, and judging whether the frame contains a coughing, sneezing or sniffling behavior.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011216514.2A CN112309423A (en) | 2020-11-04 | 2020-11-04 | Respiratory tract symptom detection method based on smart phone audio perception in driving environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011216514.2A CN112309423A (en) | 2020-11-04 | 2020-11-04 | Respiratory tract symptom detection method based on smart phone audio perception in driving environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112309423A true CN112309423A (en) | 2021-02-02 |
Family
ID=74325622
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011216514.2A Pending CN112309423A (en) | 2020-11-04 | 2020-11-04 | Respiratory tract symptom detection method based on smart phone audio perception in driving environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112309423A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103413113A (en) * | 2013-01-15 | 2013-11-27 | 上海大学 | Intelligent emotional interaction method for service robot |
US20160210988A1 (en) * | 2015-01-19 | 2016-07-21 | Korea Institute Of Science And Technology | Device and method for sound classification in real time |
CN110383375A (en) * | 2017-02-01 | 2019-10-25 | 瑞爱普健康有限公司 | Method and apparatus for the cough in detection noise background environment |
CN110390952A (en) * | 2019-06-21 | 2019-10-29 | 江南大学 | City sound event classification method based on bicharacteristic 2-DenseNet parallel connection |
CN110719553A (en) * | 2018-07-13 | 2020-01-21 | 国际商业机器公司 | Smart speaker system with cognitive sound analysis and response |
CN110853620A (en) * | 2018-07-25 | 2020-02-28 | 音频分析有限公司 | Sound detection |
2020-11-04 CN CN202011216514.2A patent/CN112309423A/en active Pending
Non-Patent Citations (1)
Title |
---|
ZHANG Ke et al.: "Research on environmental sound classification *** based on fused features and convolutional neural networks", Journal of Northwestern Polytechnical University (《西北工业大学学报》) * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112951267A (en) * | 2021-02-23 | 2021-06-11 | 恒大新能源汽车投资控股集团有限公司 | Passenger health monitoring method and vehicle-mounted terminal |
JP2023018658A (en) * | 2021-07-27 | 2023-02-08 | 上海交通大学医学院付属第九人民医院 | Difficult airway evaluation method and device based on machine learning voice technology |
JP7291319B2 (en) | 2021-07-27 | 2023-06-15 | 上海交通大学医学院付属第九人民医院 | Evaluation method and apparatus for difficult airway based on speech technique by machine learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112309423A (en) | Respiratory tract symptom detection method based on smart phone audio perception in driving environment | |
CN104916289A (en) | Quick acoustic event detection method under vehicle-driving noise environment | |
CN102394062B (en) | Method and system for automatically identifying voice recording equipment source | |
Vij et al. | Smartphone based traffic state detection using acoustic analysis and crowdsourcing | |
CN109816987B (en) | Electronic police law enforcement snapshot system for automobile whistling and snapshot method thereof | |
CN110600054B (en) | Sound scene classification method based on network model fusion | |
CN109949823A (en) | A kind of interior abnormal sound recognition methods based on DWPT-MFCC and GMM | |
CN109965889B (en) | Fatigue driving detection method by using smart phone loudspeaker and microphone | |
CN107179119A (en) | The method and apparatus of sound detection information and the vehicle including the device are provided | |
CN115052761B (en) | Method and device for detecting tire abnormality | |
CN102499699A (en) | Vehicle-mounted embedded-type road rage driving state detection device based on brain electrical signal and method | |
CN110880328B (en) | Arrival reminding method, device, terminal and storage medium | |
CN109741609B (en) | Motor vehicle whistling monitoring method based on microphone array | |
CN104361887A (en) | Quick acoustic event detection system under traveling noise environment | |
Lee et al. | Acoustic hazard detection for pedestrians with obscured hearing | |
CN113793624B (en) | Acoustic scene classification method | |
Kubo et al. | Design of ultra low power vehicle detector utilizing discrete wavelet transform | |
Qi et al. | A low-cost driver and passenger activity detection system based on deep learning and multiple sensor fusion | |
CN206671813U (en) | Pure electric or hybrid pedestrian caution sound control system | |
Sobreira-Seoane et al. | Automatic classification of traffic noise | |
CN112230208B (en) | Automobile running speed detection method based on smart phone audio perception | |
Joshi et al. | Information fusion based learning for frugal traffic state sensing | |
CN109389994A (en) | Identification of sound source method and device for intelligent transportation system | |
CN110956977A (en) | Real-time positioning system and method for automobile whistling | |
CN202355417U (en) | Vehicular embedded road-rage driving state detection device based on electroencephalogram signal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210202 |
|