CN113436726B

CN113436726B - Automatic lung pathological sound analysis method based on multi-task classification

Info

Publication number: CN113436726B
Application number: CN202110728236.7A
Authority: CN
Inventors: 许静; 张建雯; 吴彦峰
Original assignee: Nankai University
Current assignee: Nankai University
Priority date: 2021-06-29
Filing date: 2021-06-29
Publication date: 2022-03-04
Anticipated expiration: 2041-06-29
Also published as: CN113436726A

Abstract

The invention discloses a lung pathological sound automatic analysis method based on multi-task classification, which relates to the technical field of lung pathological analysis and comprises the following steps: and inputting the extracted audio features into a multitask classification model of a convolutional neural network MobileNet V2, wherein the multitask classification model of the convolutional neural network MobileNet V2 comprises a task of outputting the audio features for lung pathological sound identification and a task of outputting the audio features for lung disease prediction. The invention adopts a multitask learning method to implicitly increase the training data volume and improves the generalization performance of the model through the domain knowledge of a plurality of label information of the same data, thereby improving the prediction accuracy of the multitask classification model of the convolutional neural network MobileNet V2, and in addition, the lightweight multitask classification model of the convolutional neural network MobileNet V2 has fewer parameters and smaller requirements on the computing capacity and the memory size of training equipment, so that the prediction classification task can be completed on mobile or embedded equipment.

Description

Automatic lung pathological sound analysis method based on multi-task classification

Technical Field

The invention relates to the technical field of lung pathology analysis, in particular to a lung pathology sound automatic analysis method based on multi-task classification.

Background

Studies have shown that exacerbations in the state of subjects with pulmonary conditions (e.g., asthma, Chronic Obstructive Pulmonary Disease (COPD), emphysema, cystic fibrosis, etc.) are characterized by a combination of aspects. Breathing defects cause dyspnea (shortness of breath) and coughing. In fact, increased dyspnea and increased sputum purulence and/or volume (which leads to increased coughing) are generally considered to be the most distinct or major symptoms of exacerbation of pulmonary disease.

The lung sound signal is a physiological sound signal generated by the respiratory system of a human body and the outside in the ventilation process, the generation mechanism is complex, and the physiological and pathological information is rich, and the method for screening and diagnosing the lung respiratory diseases by listening to the respiratory sound by using a stethoscope is a main method. However, the stethoscope-based diagnosis has some disadvantages, such as strong subjectivity, incapability of continuous monitoring, limitation of human hearing and memory, etc., when professional medical staff are needed to judge auscultation signals, the problems are particularly significant in poor areas and respiratory disease epidemic periods, and the automated analysis of lung sounds can provide auxiliary diagnosis so as to reduce the workload of professional medical staff, thereby having great significance for intelligent medical treatment.

At present, the automated analysis of lung sounds mainly comprises two major tasks: lung pathological sound identification and lung disease prediction. Lung sounds (breath sounds) are divided into normal sounds and pathological sounds, the types of lung pathological sounds are many, and the most common lung pathological sounds are divided into two types: crackle (crackles) and wheeze (wheezes). The main task of lung pathological sound identification is to judge whether a lung pathological sound exists in a section of lung sound signal, which is beneficial to screening early lung diseases; the lung disease prediction is to predict whether the patient has lung disease and the type of lung disease through the analysis of the lung sound signal. At present, the data volume of the existing lung sound data set is small, noise interference in lung sounds is large, so that relevant and irrelevant characteristics are difficult to distinguish for a single lung sound recognition task, the model generalization capability is weak, the classification performance is poor, the used network model is complex, parameters are large, the requirements on the computing capability and the memory size of training equipment are large, and the lung sound data set needs to be operated on a large-scale server.

The invention patent CN103417241B discloses a diagnosis instrument main machine, three lung sound probes with acoustoelectric sensors and a wireless electronic stethoscope; the method is characterized in that: the diagnostic instrument host comprises a computer for diagnosis and a signal amplifier; the signal amplifier is connected with a corresponding interface of the computer host through a lead; the three lung sound probes are connected with a signal output terminal of the signal amplifier through a lead; the wireless electronic stethoscope is connected with the corresponding interface of the computer host through wireless transmission. The lung sounds are collected for all respiratory diseases, and the simultaneous collection and automatic analysis of multiple regions of the lung sounds have important significance for detecting pathological additional lung sounds, so that the diagnosis and treatment of patients are greatly facilitated. The invention can really establish a clinically available lung sound characteristic analysis means, is applied to clinical diagnosis and treatment of diseases such as infantile pneumonia and the like, adds an objective diagnosis means for the diseases, and has important application prospect for children health and medical treatment. But the method has the defects of low calculation precision and ineffective lung pathological sound identification result and lung disease prediction result of a corresponding patient.

An effective solution to the problems in the related art has not been proposed yet.

Disclosure of Invention

Aiming at the problems in the related art, the invention provides a lung pathological sound automatic analysis method based on multi-task classification, which aims to overcome the technical problems that the data volume of the existing lung sound data set is small, the noise interference in lung sounds is large, so that the related and unrelated characteristics are difficult to distinguish for a single lung sound recognition task, the generalization capability of a model is weak, the classification performance is poor, a network model used in the prior art is complex, parameters are large, the requirements on the computing capability and the memory size of training equipment are large, and the operation on a large server is required.

The technical scheme of the invention is realized as follows:

a lung pathological sound automatic analysis method based on multitask classification comprises the following steps:

inputting the extracted audio features into a multitask classification model of a convolutional neural network MobileNet V2, wherein the multitask classification model of the convolutional neural network MobileNet V2 comprises a task of outputting and using the extracted audio features for lung pathological sound identification and a task of outputting and using the extracted audio features for lung disease prediction, and the method comprises the following steps:

the output is used for lung pathological sound identification task, and the method comprises the following steps:

input to two fully-connected layers of sizes 512 and 128, the ReLU6 activation function, used to increase the nonlinearity of the neural network model, and using the dropout parameter normalization method, used to prevent overfitting, the calculation formula for the fully-connected layers is as follows:

y_i＝W^Tx_i+b；

wherein, y_iIs the output vector of the full connection layer, x_iFor the input vector of the fully connected layer, W and b represent the parameters that the neural network needs to learn. The calculation formula of the ReLU activation function is as follows:

wherein x is the input of the linear correction unit ReLU activation function and y is the output of the linear correction unit ReLU activation function;

adding a softmax activation function layer to obtain a prediction result of the model for lung pathological sound category identification, and calculating the cross entropy loss of the lung sound identification task by using the prediction result and the lung sound label, wherein the expression is as follows:

where x is the input vector of the softmax layer, class_lWeight class, a label representing the pathological sounds of the lungs of the breathing cycle audio_l]Is the equilibrium weight of the respiratory cycle label class, xj]Representing an input vector corresponding to each category in the softmax layer;

the output is used for a lung disease prediction task, comprising the steps of:

adding a full connection layer, a ReLU activation function, a dropout parameter normalization method and a softmax activation function layer in advance to obtain a prediction result of the model on the patient suffering from the disease, and calculating the cross entropy loss of a patient suffering from the disease classification task, wherein the expression is as follows:

where x is the input vector of the softmax layer, class_dEight classes of labels, class, that indicate patient lung disease_d]Is the balance weight of each class, x [ j ]]Representing input vectors corresponding to each category in the softmax layer。

Further, the loss function of the multi-task classification model of the convolutional neural network MobileNetV2 is the sum of cross-entropy losses of each task, and the expression is as follows:

loss＝loss_l+loss_d；

further, the method also comprises the following steps:

the method comprises the steps of collecting lung sound audio data information in advance, preprocessing the lung sound audio data information, unifying breathing period audio segments with different lengths, and using the unified breathing period audio segments as input data of a multitask classification model of a convolutional neural network MobileNet V2;

performing labeling training data, including labeling the type of lung pathological sound and labeling the type of lung diseases;

extracting acoustic features, extracting the Mel frequency spectrogram features of each section of lung sound breathing cycle audio signal, obtaining a spectrogram from the audio signal through short-time Fourier transform, changing the spectrogram into a Mel frequency spectrogram through a Mel scale filter bank, and cutting off a full black empty part to obtain a spectrum feature part;

and obtaining a lung pathological sound identification result of the input respiratory cycle characteristic data and a prediction result of the lung disease of the corresponding patient based on a multi-task classification model of the convolutional neural network MobileNet V2.

Further, the lung sound audio data information preprocessing comprises the following steps:

cutting the lung sound audio data by taking a breathing cycle as a unit;

removing audio noise of the cut lung sound audio data on the basis of a fifth-order Butterworth band-pass filter;

the size of the denoised lung sound audio data is uniformly mapped to a range from-1 to 1 by using standard normalization, and the data is represented as:

and then, segmenting and repeating segment filling are carried out, so that the breathing cycle audio segments with different lengths are unified into a fixed length value and are used as input data of a multitask classification model of the convolutional neural network MobileNet V2.

Further, the acquisition of the spectrogram comprises the following steps:

framing and windowing the lung sound breathing period audio signal;

then Fourier transform is carried out on each frame;

the results of each frame are stacked along another dimension to obtain a spectrogram.

The invention has the beneficial effects that:

the invention relates to a lung pathological sound automatic analysis method based on multitask classification, which comprises the steps of extracting acoustic features by collecting lung sound audio data information in advance, extracting the Mel frequency spectrogram features of each section of lung sound breathing cycle audio signal, obtaining the frequency spectrum feature part, inputting a multitask classification model of a convolutional neural network MobileNetV2 to obtain the lung pathological sound recognition result of the input breathing cycle feature data and the prediction result of lung diseases of a corresponding patient, implicitly increasing the training data quantity by adopting a multitask learning method, improving the generalization performance of the model by the field knowledge of a plurality of label information of the same data, improving the prediction accuracy of the multitask classification model of the convolutional neural network MobileNetV2, using the lightweight multitask classification model of the convolutional neural network MobileNetV2, having fewer parameters and smaller requirements on the calculation capability and the memory size of training equipment, so that the predictive classification task can be done on a mobile or embedded device.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.

FIG. 1 is a flow chart of a method for automated analysis of lung pathology sounds based on multi-task classification according to an embodiment of the present invention;

fig. 2 is a schematic diagram of a neural network architecture of a method for automated analysis of lung pathology sounds based on multitask classification according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.

According to an embodiment of the invention, a method for automated analysis of lung pathology sounds based on multi-task classification is provided.

As shown in fig. 1, the method for automatically analyzing lung pathological sounds based on multi-task classification according to an embodiment of the present invention includes the following steps:

collecting lung sound audio data information in advance, preprocessing the lung sound audio data information, unifying breathing period audio segments with different lengths, and using the unified breathing period audio segments as input data of a neural network;

and obtaining a lung pathological sound identification result of the input respiratory cycle characteristic data and a prediction result of lung diseases of a corresponding patient based on a light-weighted multi-task classification model of the convolutional neural network MobileNet V2.

Specifically, the method comprises the following steps:

the method comprises the following steps: and (4) preprocessing data.

1) And cutting the lung sound audio data by taking the breathing cycle as a unit. Since the sampling rate of the acquired lung sound data set is between 4 khz and 44.1 khz, the present solution uses down-sampling to normalize the audio frequency to 4 khz.

2) Because the acquired lung sound data contains more noise, a fifth-order Butterworth band-pass filter (Butterworth band-pass filter) is used for removing audio noise such as heartbeat sound and background conversation sound, and the Butterworth band-pass filter (Butterworth band-pass filter) enables a frequency response curve in a pass frequency band to be flat to the maximum extent without fluctuation, and gradually drops to zero in a stop frequency band.

3) The size of the data is uniformly mapped onto the-1 to 1 interval using standard normalization, expressed as:

and each dimension of the data is standardized to a specific interval, so that the convergence speed of the gradient descent method based model can be increased.

4) Setting a fixed input length value, here set to 8s, unifying the respiratory cycle audio segments of different lengths to the fixed length value by segmenting and repeating segment filling so as to extract features and then using the features as input data of the neural network.

Step two: and marking training data.

1) The training data are labeled with lung pathological sound related labels, and the labels are classified into four types, namely normal sound, abnormal sound with crackle sound (crackles), abnormal sound with wheeze sound (wheezes) and abnormal sound with crackle sound (crackles) and wheeze sound (wheezes).

2) The patients corresponding to the respiratory cycle of the lung sounds are found out and relevant labels of lung diseases are marked, and eight groups are respectively healthy (not diseased), suffering from bronchiolitis, suffering from lower respiratory tract infection, asthma, chronic obstructive pulmonary disease, bronchiectasis, suffering from upper respiratory tract infection and pneumonia.

Step three: audio features of the data are extracted.

Extracting Mel-spectral features (Mel-spectra) of each section of lung sound breathing cycle audio signal, and obtaining a spectrogram from the audio signal through Short Time Fourier Transform (STFT). The principle is that the audio signal is divided into frames and windowed, then Fourier transform is carried out on each frame, and then the results of each frame are stacked along the other dimension, so that a two-dimensional signal form similar to one graph, namely the spectrogram, can be obtained.

The obtained Mel frequency spectrum characteristic diagram shows that a plurality of high-frequency regions of the audio are obvious and completely black, and the learning of the neural network to the characteristics is interfered, so that the completely black hollow part is cut off to ensure that the neural network learns the effective frequency spectrum characteristic part.

Step four: the neural network is trained using lung sound data.

As shown in fig. 2, the neural network architecture is a multitask classification model based on a lightweight convolutional neural network MobileNetV2, and a lung pathological sound identification result of input respiratory cycle feature data and a prediction result of a lung disease of a corresponding patient are finally obtained.

Specifically, the neural network architecture comprises:

inputting the extracted audio features, namely Mel-spectral graph features (mel-spectral) graphs into a lightweight network MobileNet V2 module with pre-training weights on a large image data set ImageNet, wherein the step length of a Bottleneck module of a MobileNet V2 module is 1, the size of a convolution kernel is marked in a convolution layer, and then batch normalization and an activation function ReLU6 layer are set, and a ReLU6 activation function represents a common ReLU activation function but limits the maximum output value to 6, so that the low-precision figures of float16/int8 can be used in mobile terminal equipment, and the good numerical resolution can be achieved, and the precision loss can be avoided; followed by a depth separable convolution (depthwise separable convolution), for which the convolution kernel is applied to all input channels, unlike the standard convolution, which first uses a different convolution kernel for each input channel, that is, one convolution kernel for each input channel, and then combines the outputs again using the standard convolution, has an overall effect similar to that of a standard convolution but with a greatly reduced amount of computation and model parameters, followed by batch normalization, the ReLU6 activation function layer, the convolution layer, batch normalization, and the linear activation function, where the linear transformation is used instead of the ReLU6 activation function, and the loss of information by the non-linear activation layer can be avoided. The convolution operation of the bottleeck module increases the number of channels of the picture first and decreases last, in contrast to the usual residual block, in order to extract more channel information. Finally, the output and the original input are subjected to element addition. The Bottleneck module step size is 2, and since the output is not the same as the original output dimension, no element addition is performed.

In addition, the MobileNetV2 module used removed the last classifier layer for the MobileNet network, and the overall framework of the network is shown in table 1:

TABLE 1 network Whole frame table

Number of input channels	Operation of	t	c	n	s
						3	Conv2d	-	32	1	2
32	Bottleneck	1	16	1	1
						16	Bottleneck	6	24	2	2
24	Bottleneck	6	32	3	2
						32	Bottleneck	6	64	4	2
64	Bottleneck	6	96	3	1
						96	Bottleneck	6	160	3	2
160	Bottleneck	6	320	1	1
						320	Conv2d 1x1	-	1280	1	1

With the help of table 1 above, each row represents a series of operations and repeats n times, the bottleeck operation is shown in fig. 2, t represents the multiplication coefficient of the input channel of the bottleeck operation, i.e. the number of channels in the middle part is a multiple of the number of input channels, n represents the number of times the operation is repeated, c represents the number of output channels, s represents the step size when the module is repeated for the first time (the following repetition steps are all 1), and the convolution operation without convolution kernel is using 3 × 3 convolution kernels.

Because the abnormal condition of the lung sound is related to the lung disease of the patient, the two tasks are subjected to parameter sharing in the MobileNet V2 network module, and joint learning and parallel learning are carried out, so that the difference between the tasks and the connection between the tasks are considered.

The model is then split into two outputs, the first for the lung pathology recognition task, continuing into two fully connected layers of 512 and 128, the ReLU6 activation function, for increasing the neural network model's nonlinearity, and using the dropout parameter normalization method for preventing overfitting, the fully connected layers' calculation formula is as follows:

y_i＝W^Tx_i+b；

where x is the input of the linear modifying unit ReLU activation function and y is the output of the linear modifying unit ReLU activation function.

Finally, adding a softmax activation function layer to obtain a prediction result of the model for recognizing the lung pathological sound category, and calculating by using the prediction result and a lung sound label to obtain the cross entropy loss of the lung sound recognition task, wherein the expression is as follows:

where x is the input vector of the softmax layer, class_lFour types of labels representing lung pathological sounds of respiratory cycle audio are normal sounds (no abnormal sound), crackle sound (crackles) abnormal sound only, wheeze sound (wheezes) abnormal sound only, crackle sound (crackles) and wheeze sound (wheezes) abnormal sound simultaneously, and weight [ class ] respectively_l]The balance weight of the breathing cycle label category is obtained by negating the proportion of the samples of the current category to the total samples, and is used for relieving the problem of data imbalance caused by excessive normal lung sound samples and fewer abnormal lung sound samples, namely x [ j ]]Representing the input vector for each category in the softmax layer, j is taken from 1 to the number of categories 4.

And the second output is used for lung disease prediction tasks, the same structure is used, namely a full connection layer, a ReLU activation function, a dropout parameter normalization method and a softmax activation function layer are added, the prediction result of the model on the patient disease information is obtained, and the cross entropy loss of the patient disease information classification task is calculated by using the following expression:

where x is the input vector of the softmax layer, class_dEight types of labels representing lung diseases of patients, including healthy (not diseased), bronchiolitis, lower respiratory tract infection, asthma, chronic obstructive pulmonary disease, bronchiectasis, upper respiratory tract infection, pneumonia, and weight [ class ]_d]Is the balance weight of each category, which is obtained by negating the proportion of the samples in the current category to the total samples, and is used for relieving the data imbalance problems of excessive normal (non-diseased) samples, less samples with different diseases and larger proportion difference, namely xj]Representing the input vector for each category in the softmax layer, j is taken from 1 to the number of categories 8.

The parameters of the neural network in the two part architectures are not shared any more, so that the neural network learns the parameters of the two tasks which are different. The loss function of the neural network model is the sum of cross entropy losses of each task, and the expression is as follows:

loss＝loss_l+loss_d；

step five: and (5) performing predictive diagnosis on the examinee.

When the training is completed until the neural network converges, the updated neural network parameters can be used for prediction. Recording lung audio signals when an examiner (a person seeking a doctor) breathes by taking a breathing cycle as a unit, processing the audio signals according to the first step to obtain a mel-spectrum feature map, inputting the mel-spectrum feature map into a neural network, and outputting a prediction result of lung pathological sounds of the patient by the neural network.

In summary, with the aid of the above technical solution of the present invention, the light-weighted network-based multi-task classification model of MobileNetV2 is used to identify pathological lung sounds and lung diseases, and the main innovation point of the framework is to utilize the characteristic that abnormal conditions of respiratory sounds of patients are correlated with lung disease information to perform multi-task learning, and reduce the complexity of the model by using the light-weighted model, that is, the advantages are:

1. the multi-task classification model can effectively improve the lung sound identification accuracy rate for the following reasons:

1) the data is implicitly added. The multi-task learning effectively increases the number of training data, and because all tasks have certain noise, two tasks can be more generally represented by learning at the same time. If only lung pathology sound recognition is learned, the risk of overfitting is assumed, however, learning both lung pathology sound classification and lung disease classification can average the noise pattern, so that the model obtains a better representation of features at the parameter sharing layer.

2) Attention-focusing mechanisms. Since the acquired lung sound data is noisy, the data volume is small, the data dimensionality is high, it is difficult for the model to distinguish relevant from irrelevant features, and multitasking helps to focus the model on the features that really have an effect, because the task of identifying the patient lung disease can provide additional evidence for the relevance and irrelevance of the features.

3) An eavesdropping mechanism. If it is easy to learn some features x for the task of identifying a lung disease of a patient, which are difficult to learn for the task of identifying a lung pathology sound, which may be caused by the more complicated interaction of the task of identifying a lung pathology sound with the features x, or by other features hindering the learning of the features x, model eavesdropping, i.e. the task of identifying a lung pathology sound learns the features x using the task of predicting a lung disease, may be allowed by multi-task learning.

4) Indicating a paradoxical mechanism. Since a hypothetical space that performs well for a sufficient number of training tasks will also perform well for new tasks from the same environment, this helps the model to exhibit the ability to generalize to new tasks.

5) A regularization mechanism. The multi-task learning plays the same role as regularization by introducing induction bias, so that the risk of overfitting the model is reduced, and the capability of fitting random noise is reduced.

2. The model is based on the lightweight network MobileNet V2, the complexity of the model is low, the parameter quantity is small, only 13.88M is needed, the requirements on computing power and memory are low, tasks which need to be trained and predicted on a large server originally can be completed on mobile or embedded equipment, and the training and predicting speed is increased.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A lung pathological sound automatic analysis method based on multitask classification is characterized by comprising the following steps:

y_i＝W^Tx_i+b；

wherein, y_iIs the output vector of the full connection layer, x_iFor the input vector of the fully-connected layer, W and b represent parameters that the neural network needs to learn, and the ReLU activation function is expressed as:

the output is used for a lung disease prediction task, comprising the steps of:

where x is the input vector of the softmax layer, class_dWeight class, a label indicating the lung disease of a patient_d]Is the balance weight of each class, x [ j ]]Representing the input vector corresponding to each category in the softmax layer.

2. The method for the automated analysis of pulmonary pathology sounds based on multitask classification according to claim 1, characterized in that the loss function of the multitask classification model of the convolutional neural network MobileNetV2 is the sum of cross-entropy losses of each task, expressed as follows:

loss＝loss_l+loss_d；

3. the method for the automated analysis of pulmonary pathology sounds based on multitasking classification according to claim 2, characterized by the further steps of:

4. The method for the automated analysis of lung pathology sound based on multitask classification according to claim 3, characterized in that said lung sound audio data information preprocessing includes the following steps:

cutting the lung sound audio data by taking a breathing cycle as a unit;

5. The method for the automated analysis of pulmonary pathological sounds based on multitasking classification according to claim 4, characterized in that said acquisition of spectrogram includes the following steps:

framing and windowing the lung sound breathing period audio signal;

then Fourier transform is carried out on each frame;