CN115690072A - Chest radiography feature extraction and disease classification method based on multi-modal deep learning - Google Patents

Chest radiography feature extraction and disease classification method based on multi-modal deep learning

Info

Publication number
CN115690072A
Authority
CN
China
Prior art keywords
image
text
data
chest
fusion
Prior art date
Legal status
Pending
Application number
CN202211414106.7A
Other languages
Chinese (zh)
Inventor
寸天睿
徐爱迪
韩健
杨段生
沙政
赵治红
Current Assignee
Chuxiong Normal University
Original Assignee
Chuxiong Normal University
Priority date
Filing date
Publication date
Application filed by Chuxiong Normal University filed Critical Chuxiong Normal University
Priority to CN202211414106.7A priority Critical patent/CN115690072A/en
Publication of CN115690072A publication Critical patent/CN115690072A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a chest radiography feature extraction and disease classification method based on multi-modal deep learning, which mainly comprises the following steps: S1, data source acquisition; S2, data preprocessing; S3, image-text feature fusion and matching; S4, model construction; S5, model training and optimization. By adopting a self-supervised training method that combines images and text, the network model can be trained and used for inference stably and quickly even when training data are limited or samples are small. The Transformer network structure is optimized, improved and redesigned so that the global features of chest X-ray films can be captured, enabling the method to be applied to chest X-ray analysis scenarios characterized by small lesions, irregularly shaped lesions, and the like.

Description

Chest radiography feature extraction and disease classification method based on multi-modal deep learning
Technical Field
The invention belongs to the technical field of intelligent medical care, and particularly relates to a chest radiography feature extraction and disease classification method based on multi-modal deep learning.
Background
The interpretation of medical images requires extensive medical expertise and is prone to human error. In populous countries such as China, specialists must interpret large numbers of medical images in a short time, a tedious and time-consuming process. If the disease type in an image could be judged automatically and accurately within a short time, large volumes of medical images could be screened rapidly and the workload of clinical staff greatly reduced. In recent years, with the rapid development of deep learning in computer vision, natural language processing, and related fields, computer-aided diagnosis based on artificial intelligence has attracted growing attention in the industry, raising hopes of more efficient and economical healthcare for patients. Among imaging examinations, X-ray is far more widely available in China than CT or MRI; X-ray examination can be performed even in township-level health centers. Automatically and accurately judging disease types from X-ray films therefore has broad application prospects, and this research can strongly promote the development of intelligent medical care in China.
At present, automatic chest X-ray diagnosis based on deep learning mainly relies on supervised models built on convolutional neural networks (CNNs). These are either general-purpose CNN architectures such as AlexNet, ResNet, VGG, DenseNet, Faster R-CNN, Inception V3, GoogLeNet, MobileNet V2, SR, U-Net and their variants, or CNN architectures designed specifically for X-ray films, such as CheXNet and TieNet, trained and classified in a supervised fashion on public datasets including Open-I, ChestX-ray8, CheXpert, PadChest, and MIMIC-CXR. However, the accuracy of such supervised models is difficult to improve further, and their generalization ability is limited, for the following reasons: (1) annotating medical images requires professional expertise, so annotation is difficult and costly, fully annotated data are hard to obtain, and the field lacks a reference dataset comparable in size to ImageNet in the natural-image domain; consequently, supervised models that excel on natural images easily overfit when trained and used on medical images; (2) existing datasets are extremely imbalanced across disease categories, and the lack of confidence intervals is another important factor limiting accuracy; (3) the locality of CNN convolution operations limits the modeling of long-range dependencies; although the receptive field can be enlarged by stacking more convolutional layers or using improved convolutional structures, this greatly increases computational complexity and is ill-suited to the diagnostic speed required in real medical settings.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a chest radiography feature extraction and disease classification method based on multi-modal deep learning. The invention adopts a self-supervised model training method that combines images and text, so that the network model can be trained and used for inference stably and rapidly even when training data are limited or samples are small. In addition, the Transformer network structure is optimized, improved and redesigned so that the global features of chest X-ray films can be captured, enabling application to chest X-ray analysis scenarios characterized by small lesions, irregularly shaped lesions, and the like.
In order to achieve the technical purpose, the invention is realized by the following technical scheme:
the chest radiography feature extraction and disease classification method based on multi-modal deep learning comprises the following steps:
s1: data source acquisition: collecting an open-source chest X-ray film data set and an open-source medical image question and answer data set;
s2: data preprocessing: carrying out data cleaning and format unification on the acquired data, and dividing the data set into image-text pairs and an image-only data set; constructing a training set and a testing set of the project;
s3: fusing and matching image-text characteristics: performing image-text feature matching and fusion by adopting contrast learning in an AutoEncoder mode; performing image-text feature fusion in a cross attention mode by adopting a Transformer-based mode;
s4: constructing a model: constructing by using the image-text characteristics extracted in the S3 and adopting a Pythroch deep learning frame;
s5: model training and optimization: and repeatedly training the deep learning model on the constructed data training set, iteratively optimizing the structure and parameters of the model, and creating a project model which can be used for clinic.
Preferably, the data preprocessing specifically comprises the following steps:
1) Performing data cleaning and format unification on the acquired data: the original chest films come from multiple datasets, appear in various formats such as DICOM, JPG and PNG, and differ greatly in resolution, so all data are uniformly converted into 255x255 grayscale JPEG images, and images with ambiguous pathological diagnoses are removed (a preprocessing sketch is given after this list);
2) Dividing the data into an image-text pair dataset (40% of the total data) and an image-only dataset (40% of the total data);
3) Splitting the training set and test set of the project in an 80%:20% ratio.
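The patent does not publish source code, so the following is only a minimal sketch of step 1), assuming pydicom and Pillow for input; the function name and the min-max normalization scheme are illustrative choices, not taken from the patent.

```python
import numpy as np
import pydicom                  # assumed dependency for reading DICOM input
from PIL import Image

def to_gray_jpg(src_path: str, dst_path: str, size: int = 255) -> None:
    """Convert a chest film in DICOM/JPG/PNG format to a size x size
    grayscale JPEG (min-max normalized for DICOM pixel data)."""
    if src_path.lower().endswith(".dcm"):
        arr = pydicom.dcmread(src_path).pixel_array.astype(np.float32)
        arr = (arr - arr.min()) / max(float(arr.max() - arr.min()), 1e-8) * 255.0
        img = Image.fromarray(arr.astype(np.uint8))
    else:
        img = Image.open(src_path)
    img.convert("L").resize((size, size), Image.BILINEAR).save(dst_path, "JPEG")
```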
preferably, the specific method for image-text feature matching and fusion in the AutoEncoder mode is as follows: contrastive learning is adopted for image-text feature matching and fusion; the chest film is fed into an image encoder based on a ResNet deep convolutional neural network or a Vision Transformer for feature extraction, yielding h_v, which an MLP then maps to the feature v; the text part uses a pre-trained ClinicalBERT to vectorize the medical report and extract character features, yielding h_u, which is likewise non-linearly mapped by an MLP to u; finally, by maximizing the agreement between true image-text representation pairs under a bidirectional loss, fusion-aligned image-text features are obtained; these vectors carry rich clinical semantic information and serve downstream classification tasks;
preferably, for the image encoder, the convolutional neural network uses the ResNet50 architecture and the Transformer uses the original ViT model; for the text encoder, a BERT encoder is used, max-pooling all output vectors of its last layer into a single representation; the text encoder adopts ClinicalBERT weights trained on the MIMIC dataset;
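A minimal sketch of this contrastive alignment, in the spirit of ConVIRT/CLIP-style bidirectional losses, is given below. The projection width, temperature, and the HuggingFace checkpoint name emilyalsentzer/Bio_ClinicalBERT are assumptions; the patent itself specifies only ResNet50/ViT, a max-pooled ClinicalBERT, MLP mappings, and a bidirectional loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50
from transformers import AutoModel

class ImageTextAligner(nn.Module):
    """Contrastive image-text alignment: ResNet50 -> h_v -> MLP -> v,
    ClinicalBERT -> max-pooled h_u -> MLP -> u, bidirectional InfoNCE loss."""

    def __init__(self, dim: int = 512, tau: float = 0.07):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V2")
        backbone.fc = nn.Identity()                    # h_v: 2048-d pooled feature
        self.image_encoder = backbone
        self.text_encoder = AutoModel.from_pretrained(
            "emilyalsentzer/Bio_ClinicalBERT")         # ClinicalBERT weights (MIMIC)
        self.img_proj = nn.Sequential(nn.Linear(2048, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.txt_proj = nn.Sequential(nn.Linear(768, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.tau = tau

    def forward(self, images, input_ids, attention_mask):
        # images: (B, 3, H, W); grayscale films replicated to three channels
        h_v = self.image_encoder(images)               # (B, 2048)
        hidden = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        # max-pool all last-layer output vectors, ignoring padding positions
        hidden = hidden.masked_fill(attention_mask.unsqueeze(-1) == 0, float("-inf"))
        h_u = hidden.max(dim=1).values                 # (B, 768)
        v = F.normalize(self.img_proj(h_v), dim=-1)
        u = F.normalize(self.txt_proj(h_u), dim=-1)
        logits = v @ u.t() / self.tau                  # pairwise similarities (B, B)
        targets = torch.arange(v.size(0), device=v.device)
        # bidirectional loss: maximize agreement of true image-text pairs both ways
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.t(), targets)) / 2
```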
preferably, the specific method for image-text feature fusion in the Transformer mode is as follows: a Transformer-based approach performs image-text feature fusion through cross attention, realizing feature fusion with the Transformer self-attention and cross-attention mechanisms; following the Vision Transformer processing scheme, the chest film image is partitioned into 16x16 patches, the patches are linearly mapped into image embeddings, the embeddings are fed into a standard Transformer, and features are extracted through self-attention; the text part obtains high-dimensional word-vector embeddings from a pre-trained ClinicalBERT and text features through self-attention; the text features are then fused and matched with the image features through cross attention, yielding features usable for downstream tasks;
preferably, the Transformer adopts a standard 6-layer self-attention encoder to extract the respective features of the image and the text, and then performs feature fusion and alignment through an improved cross-attention layer, in which the Query is the image feature and the Key and Value are the text features;
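The following sketch illustrates this fusion path. The 6-layer self-attention encoders and the Query-from-image, Key/Value-from-text cross attention follow the description above; the embedding width, head count, and residual-plus-LayerNorm wiring are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Six self-attention layers per modality, then one cross-attention layer
    with Query = image tokens and Key/Value = text tokens."""

    def __init__(self, dim: int = 768, heads: int = 8, depth: int = 6):
        super().__init__()

        def encoder():
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=depth)

        self.img_encoder = encoder()   # standard 6-layer self-attention encoder
        self.txt_encoder = encoder()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tokens: torch.Tensor, txt_tokens: torch.Tensor):
        # img_tokens: (B, N_patches, dim) linear embeddings of 16x16 patches
        # txt_tokens: (B, N_words, dim) ClinicalBERT word-vector embeddings
        img = self.img_encoder(img_tokens)    # per-modality self-attention
        txt = self.txt_encoder(txt_tokens)
        fused, _ = self.cross_attn(query=img, key=txt, value=txt)
        return self.norm(img + fused)         # fused features for downstream tasks
```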
preferably, in S4 the input images are augmented with the image augmentation methods built into torchvision: random cropping, horizontal flipping, affine transformation, color jitter, and Gaussian smoothing; given the particularity of chest films, color jitter is restricted to brightness and contrast adjustment; for text data, sentences are sampled uniformly from the pathology report rather than sampling individual words, since sentence-level sampling preserves semantic information.
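A sketch of such an augmentation pipeline is shown below; the crop size and the jitter, affine and blur parameters are assumptions, since the patent names only the transform families.

```python
import random
import torchvision.transforms as T

# Image branch: the five torchvision transforms named above; ColorJitter is
# limited to brightness/contrast in keeping with the grayscale chest films.
image_aug = T.Compose([
    T.RandomResizedCrop(224, scale=(0.6, 1.0)),          # random cropping
    T.RandomHorizontalFlip(p=0.5),                       # horizontal flipping
    T.RandomAffine(degrees=10, translate=(0.05, 0.05)),  # affine transformation
    T.ColorJitter(brightness=0.2, contrast=0.2),         # brightness/contrast only
    T.GaussianBlur(kernel_size=3, sigma=(0.1, 2.0)),     # Gaussian smoothing
    T.ToTensor(),
])

def sample_sentence(report: str) -> str:
    """Text branch: draw one sentence uniformly at random from the pathology
    report, since sentence-level sampling preserves semantic information."""
    sentences = [s.strip() for s in report.split(".") if s.strip()]
    return random.choice(sentences) if sentences else report
```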
The beneficial effects of the invention are:
By adopting the self-supervised model training method that combines images and text, the network model can be trained and used for inference stably and quickly even when training data are limited or samples are small; the Transformer network structure is optimized, improved and redesigned so that the global features of chest X-ray films can be captured, enabling application to chest X-ray analysis scenarios characterized by small lesions, irregularly shaped lesions, and the like.
Drawings
FIG. 1 is a schematic diagram of the technical scheme of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below clearly and completely. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the given embodiments without creative effort shall fall within the protection scope of the present invention.
Example 1
The chest radiography feature extraction and disease classification method based on multi-modal deep learning comprises the following steps:
s1: data source acquisition: an open source chest X-ray data set was collected as in table 1; and an open-source medical image question-answer dataset, as in table 2;
s2: data preprocessing: carrying out data cleaning and format unification on the acquired data, and dividing the data set into image-text pairs and an image-only data set; constructing a training set and a testing set of the project;
s3: image-text feature fusion and matching: performing image-text feature matching and fusion by adopting comparison learning in an AutoEncoder mode; performing image-text feature fusion in a cross attention mode by adopting a Transformer-based mode;
s4: constructing a model: constructing by using the graphics and text characteristics extracted in the step S3 and adopting a Pythrch deep learning framework;
s5: model training and optimization: and repeatedly training the deep learning model on the constructed data training set, iteratively optimizing the structure and parameters of the model, and creating a project model which can be used for clinic.
Preferably, the data preprocessing specifically comprises the following steps:
1) Performing data cleaning and format unification on the acquired data: the original chest films come from multiple datasets, appear in various formats such as DICOM, JPG and PNG, and differ greatly in resolution, so all data are uniformly converted into 255x255 grayscale JPEG images, and images with ambiguous pathological diagnoses are removed;
2) Dividing the data into an image-text pair dataset (40% of the total data) and an image-only dataset (40% of the total data);
3) Splitting the training set and test set of the project in an 80%:20% ratio.
preferably, the specific method for image-text feature matching and fusion in the AutoEncoder mode is as follows: contrastive learning is adopted for image-text feature matching and fusion; the chest film is fed into an image encoder based on a ResNet deep convolutional neural network or a Vision Transformer for feature extraction, yielding h_v, which an MLP then maps to the feature v; the text part uses a pre-trained ClinicalBERT to vectorize the medical report and extract character features, yielding h_u, which is likewise non-linearly mapped by an MLP to u; finally, by maximizing the agreement between true image-text representation pairs under a bidirectional loss, fusion-aligned image-text features are obtained; these vectors carry rich clinical semantic information and serve downstream classification tasks;
preferably, for the image encoder, the convolutional neural network uses the ResNet50 architecture and the Transformer uses the original ViT model; for the text encoder, a BERT encoder is used, max-pooling all output vectors of its last layer into a single representation; the text encoder adopts ClinicalBERT weights trained on the MIMIC dataset;
preferably, the specific method for image-text feature fusion in the Transformer mode is as follows: a Transformer-based approach performs image-text feature fusion through cross attention, realizing feature fusion with the Transformer self-attention and cross-attention mechanisms; following the Vision Transformer processing scheme, the chest film image is partitioned into 16x16 patches, the patches are linearly mapped into image embeddings, the embeddings are fed into a standard Transformer, and features are extracted through self-attention; the text part obtains high-dimensional word-vector embeddings from a pre-trained ClinicalBERT and text features through self-attention; the text features are then fused and matched with the image features through cross attention, yielding features usable for downstream tasks;
preferably, the Transformer adopts a standard 6-layer self-attention encoder to extract the respective features of the image and the text, and then performs feature fusion and alignment through an improved cross-attention layer, in which the Query is the image feature and the Key and Value are the text features;
preferably, in S4 the input images are augmented with the image augmentation methods built into torchvision: random cropping, horizontal flipping, affine transformation, color jitter, and Gaussian smoothing; given the particularity of chest films, color jitter is restricted to brightness and contrast adjustment; for text data, sentences are sampled uniformly from the pathology report rather than sampling individual words, since sentence-level sampling preserves semantic information.
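Beyond "repeated training and iterative optimization" (S5), the patent gives no concrete training procedure, so the loop below is only a minimal sketch of contrastive pre-training, assuming the ImageTextAligner module sketched earlier; the optimizer, schedule, and hyper-parameters are illustrative.

```python
import torch

def pretrain(model, dataloader, epochs: int = 50, lr: float = 1e-4,
             device: str = "cuda"):
    """Repeatedly train the alignment model on the constructed training set;
    model structure/parameters are then iterated on by re-running with
    revised settings."""
    model = model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    for epoch in range(epochs):
        model.train()
        total = 0.0
        for images, input_ids, attention_mask in dataloader:
            loss = model(images.to(device), input_ids.to(device),
                         attention_mask.to(device))  # bidirectional contrastive loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        scheduler.step()
        print(f"epoch {epoch}: mean loss {total / len(dataloader):.4f}")
```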
TABLE 1 Open-source chest X-ray datasets collected

Data set          X-ray films    Reports    Patients
Open-I            8,121          3,996      3,996
NIH ChestX-ray8   108,948        0          32,717
CheXpert          224,316        0          65,240
PadChest          160,868        109,931    67,625
MIMIC-CXR         473,057        206,563    63,478
TABLE 2 Open-source medical VQA datasets collected

Data set     Images    QA pairs
VQA-RAD      315       3,515
RadVisDial   91,060    455,300
SLAKE        642       14K
TABLE 3 Disease classification accuracy
(Table 3 is reproduced only as an embedded image in the original publication; its values are not recoverable from the text.)

Claims (7)

1. The chest radiography feature extraction and disease classification method based on multi-modal deep learning, characterized by comprising the following steps:
s1: data source acquisition: collecting an open-source chest X-ray film data set and an open-source medical image question and answer data set;
s2: data preprocessing: carrying out data cleaning and format unification on the acquired data, and dividing the data set into image-text pairs and an image-only data set; constructing a training set and a testing set of the project;
s3: image-text feature fusion and matching: performing image-text feature matching and fusion by adopting contrast learning in an AutoEncoder mode; performing image-text feature fusion in a cross attention mode by adopting a Transformer-based mode;
s4: model construction: constructing by using the image-text characteristics extracted in the S3 and adopting a Pythroch deep learning frame;
s5: model training and optimization: and repeatedly training the deep learning model on the constructed data training set, iteratively optimizing the structure and parameters of the model, and creating a project model which can be used for clinic.
2. The chest radiography feature extraction and disease classification method based on multi-modal deep learning according to claim 1, wherein the data preprocessing comprises the following specific steps:
1) Performing data cleaning and format unification on the acquired data: the original chest films come from multiple datasets, appear in various formats such as DICOM, JPG and PNG, and differ greatly in resolution, so all data are uniformly converted into 255x255 grayscale JPEG images, and images with ambiguous pathological diagnoses are removed;
2) Dividing the data into an image-text pair dataset and an image-only dataset, wherein the image-text pair dataset accounts for 40% of the total data and the image-only dataset accounts for 40% of the total data;
3) The training set and test set are split in an 80%:20% ratio.
3. The chest radiography feature extraction and disease classification method based on multi-modal deep learning according to claim 1, wherein the specific method for image-text feature matching and fusion in the AutoEncoder mode is as follows: contrastive learning is adopted for image-text feature matching and fusion; the chest film is fed into an image encoder based on a ResNet deep convolutional neural network or a Vision Transformer for feature extraction, yielding h_v, which an MLP then maps to the feature v; the text part uses a pre-trained ClinicalBERT to vectorize the medical report and extract character features, yielding h_u, which is likewise non-linearly mapped by an MLP to u; finally, by maximizing the agreement between true image-text representation pairs under a bidirectional loss, fusion-aligned image-text features are obtained; these vectors carry rich clinical semantic information and serve downstream classification tasks.
4. The method of claim 3, wherein for the image encoder the convolutional neural network uses the ResNet50 architecture and the Transformer uses the original ViT model; for the text encoder, a BERT encoder is used, max-pooling all output vectors of its last layer into a single representation; the text encoder uses ClinicalBERT weights trained on the MIMIC dataset.
5. The chest radiography feature extraction and disease classification method based on multi-modal deep learning according to claim 1, wherein the specific method for image-text feature fusion in the Transformer mode is as follows: a Transformer-based approach performs image-text feature fusion through cross attention, realizing feature fusion with the Transformer self-attention and cross-attention mechanisms; following the Vision Transformer processing scheme, the chest film image is partitioned into 16x16 patches, the patches are linearly mapped into image embeddings, the embeddings are fed into a standard Transformer, and features are extracted through self-attention; the text part obtains high-dimensional word-vector embeddings from a pre-trained ClinicalBERT and text features through self-attention; the text features are then fused and matched with the image features through cross attention, yielding features usable for downstream tasks.
6. The chest radiography feature extraction and disease classification method based on multi-modal deep learning according to claim 5, wherein the Transformer adopts a standard 6-layer self-attention encoder to extract the respective features of the image and the text, and then performs feature fusion and alignment through an improved cross-attention layer, in which the Query is the image feature and the Key and Value are the text features.
7. The chest radiography feature extraction and disease classification method based on multi-modal deep learning according to claim 1, wherein in S4 the input images are augmented with the image augmentation methods built into torchvision: random cropping, horizontal flipping, affine transformation, color jitter, and Gaussian smoothing; given the particularity of chest films, color jitter is restricted to brightness and contrast adjustment; for text data, sentences are sampled uniformly from the pathology report rather than sampling individual words, since sentence-level sampling preserves semantic information.
CN202211414106.7A 2022-11-11 2022-11-11 Chest radiography feature extraction and disease classification method based on multi-mode deep learning Pending CN115690072A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211414106.7A CN115690072A (en) 2022-11-11 2022-11-11 Chest radiography feature extraction and disease classification method based on multi-mode deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211414106.7A CN115690072A (en) 2022-11-11 2022-11-11 Chest radiography feature extraction and disease classification method based on multi-mode deep learning

Publications (1)

Publication Number Publication Date
CN115690072A 2023-02-03

Family

ID=85052277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211414106.7A Pending CN115690072A (en) 2022-11-11 2022-11-11 Chest radiography feature extraction and disease classification method based on multi-mode deep learning

Country Status (1)

Country Link
CN (1) CN115690072A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116052847A (en) * 2023-02-08 2023-05-02 中国人民解放军陆军军医大学第二附属医院 Chest radiography multi-abnormality recognition system, device and method based on deep learning
CN116052847B (en) * 2023-02-08 2024-01-23 中国人民解放军陆军军医大学第二附属医院 Chest radiography multi-abnormality recognition system, device and method based on deep learning
CN116403180A (en) * 2023-06-02 2023-07-07 上海几何伙伴智能驾驶有限公司 4D millimeter wave radar target detection, tracking and speed measurement method based on deep learning
CN116403180B (en) * 2023-06-02 2023-08-15 上海几何伙伴智能驾驶有限公司 4D millimeter wave radar target detection, tracking and speed measurement method based on deep learning
CN116452600A (en) * 2023-06-15 2023-07-18 上海蜜度信息技术有限公司 Instance segmentation method, system, model training method, medium and electronic equipment
CN116452600B (en) * 2023-06-15 2023-10-03 上海蜜度信息技术有限公司 Instance segmentation method, system, model training method, medium and electronic equipment
CN116502092A (en) * 2023-06-26 2023-07-28 国网智能电网研究院有限公司 Semantic alignment method, device, equipment and storage medium for multi-source heterogeneous data
CN117522877A (en) * 2024-01-08 2024-02-06 吉林大学 Method for constructing chest multi-disease diagnosis model based on visual self-attention
CN117522877B (en) * 2024-01-08 2024-04-05 吉林大学 Method for constructing chest multi-disease diagnosis model based on visual self-attention

Similar Documents

Publication Publication Date Title
CN115690072A (en) Chest radiography feature extraction and disease classification method based on multi-mode deep learning
CN110503654A Medical image segmentation method, system and electronic device based on generative adversarial network
WO2016192612A1 (en) Method for analysing medical treatment data based on deep learning, and intelligent analyser thereof
CN109583440A Medical image aided diagnosis method and system combining image recognition and report editing
CN111863237A (en) Intelligent auxiliary diagnosis system for mobile terminal diseases based on deep learning
CN110490242B (en) Training method of image classification network, fundus image classification method and related equipment
CN106372390A (en) Deep convolutional neural network-based lung cancer preventing self-service health cloud service system
CN107767935A (en) Medical image specification processing system and method based on artificial intelligence
CN109935336A Intelligent auxiliary diagnosis method and diagnostic system for pediatric respiratory diseases
CN110503635B (en) Hand bone X-ray film bone age assessment method based on heterogeneous data fusion network
CN109920538A Zero-shot learning method based on data augmentation
CN111430025B (en) Disease diagnosis model training method based on medical image data augmentation
Gao et al. Joint disc and cup segmentation based on recurrent fully convolutional network
Ye et al. Medical image diagnosis of prostate tumor based on PSP-Net+ VGG16 deep learning network
CN110443105A Immunofluorescence image pattern recognition method for autoimmune antibodies
Feng et al. Deep learning for chest radiology: a review
CN116364227A (en) Automatic medical image report generation method based on memory learning
Wang et al. Cataract detection based on ocular B-ultrasound images by collaborative monitoring deep learning
CN116797609A (en) Global-local feature association fusion lung CT image segmentation method
CN116883768A (en) Lung nodule intelligent grading method and system based on multi-modal feature fusion
CN115147640A (en) Brain tumor image classification method based on improved capsule network
CN114093507A (en) Skin disease intelligent classification method based on contrast learning in edge computing network
Dong et al. Supervised learning-based retinal vascular segmentation by m-unet full convolutional neural network
CN110136113A (en) A kind of vagina pathology image classification method based on convolutional neural networks
CN115862837A (en) Medical visual question-answering method based on type reasoning and semantic constraint

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination