CN115132275A - Method for predicting EGFR gene mutation state based on end-to-end three-dimensional convolutional neural network - Google Patents

Info

Publication number
CN115132275A
Authority
CN
China
Prior art keywords
lung
dimensional
neural network
model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210583718.2A
Other languages
Chinese (zh)
Other versions
CN115132275B (en)
Inventor
赵世杰
刘卓岩
韩军伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202210583718.2A
Publication of CN115132275A
Application granted
Publication of CN115132275B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30061 Lung

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Biophysics (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for predicting EGFR gene mutation status based on an end-to-end three-dimensional convolutional neural network. The method performs end-to-end global feature extraction on the lungs using a densely connected three-dimensional convolutional structure, and simultaneously captures local lung nodule features with the proposed multi-scale dilated asymmetric convolution module. At the local scale, the multi-scale dilated asymmetric module handles lung nodule features of different sizes, directions, and angles, searches every detail of the input three-dimensional image, and densely connects these details from small to large. At the global scale, a dense network combines the fine features of the target from different stages and scales to extract comprehensive features of the lungs; deeper lung information is then extracted through feature transformation and channel fusion, and the model's prediction is finally obtained through a fully connected layer and an activation function layer.

Description

Method for predicting EGFR gene mutation state based on end-to-end three-dimensional convolutional neural network
Technical Field
The invention belongs to the field of computer vision and relates to a method for predicting EGFR gene mutation status based on an end-to-end three-dimensional convolutional neural network, and in particular to a method that extracts features from CT images and predicts gene mutation status using a neural network architecture.
Background
Currently, common methods for determining EGFR mutation status include tissue biopsy and liquid biopsy, but both have limitations: because tumors are heterogeneous, a biopsy may suffer from sampling error; the patient must meet the requirements for an invasive biopsy; and biopsy may increase the risk of cancer metastasis. In addition, tissue biopsy may fail because of low tissue quality and is relatively costly; liquid biopsy tests peripheral blood rather than tumor tissue, but the concentration of ctDNA may be low or undetectable.
With the development of deep learning, some neural network-based models have appeared, but most of them rely on accurate annotation of tumor boundaries by experienced physicians or radiologists, which is time-consuming and labor-intensive and may introduce subjective errors into the annotations. Although some existing methods relax the annotation requirements, a radiologist is still needed to roughly localize the lung nodules. Most importantly, the extracted features come only from the interior of the nodule and the annotated tumor margin, while other important information, including the relative location of the tumor, the size of the tumor, and the interactions between different lung regions, is ignored.
In short, existing methods consider only information from the interior and margin of the nodule, depend on expert annotations, and cannot exploit all of the useful information in the lungs.
Disclosure of Invention
Technical problem to be solved
To overcome the shortcomings of the prior art, the invention provides a method for predicting EGFR gene mutation status based on an end-to-end three-dimensional convolutional neural network. The method combines a multi-scale dilated convolution module with an asymmetric three-dimensional convolution module to extract features from the complete lung CT of a lung adenocarcinoma patient, transforms and recombines these features in a high-dimensional space, and predicts the patient's EGFR gene mutation status from complete and effective lung information.
Technical scheme
A method for predicting EGFR gene mutation status based on an end-to-end three-dimensional convolutional neural network, characterized by comprising the following steps:
step 1: preprocessing the original three-dimensional lung CT image by clearing everything outside the lung region and rescaling the CT to a uniform size of 112 × 112 × 90 by spline interpolation;
step 1a: segmenting the lungs with a U-Net model and setting the region outside the lungs to zero;
step 1b: obtaining the preprocessing result by spline interpolation, namely an image of size 112 × 112 × 90 centered in the field of view, such that every slice along the X, Y, and Z axes intersects the lungs and the detailed features of the lungs are preserved;
step 2: constructing the neural network structure: a densely connected multilayer convolutional neural network is adopted to extract the overall features of the target and fuse multi-scale features;
step 2a: building a four-layer densely connected convolutional neural network based on the Dense Block, in which the outputs of the layers are concatenated along the channel dimension and used as the input to the next convolutional layer;
step 2b: introducing, between layers, a bottleneck module for inter-channel feature fusion and dimension reduction; specifically, batch normalization standardizes the feature distribution to zero mean and unit variance, and dimension reduction is then achieved by a rectified linear unit (ReLU) followed by a 1 × 1 convolution;
step 2c: embedding a multi-scale, multi-dilation asymmetric dilated convolution module in the baseline model to capture fine local features and to attend to lung nodules that appear in the lungs of lung cancer patients in different directions, sizes, and angles;
step 3: training the neural network structure in an end-to-end manner: the network is trained with a cross-entropy loss function, using stochastic gradient descent (SGD) with a momentum of 0.9 as the optimizer;
during training, the batch size is set to 6, the number of iterations of the model is set to 300, and all learning rates are set to 0.01; the EGFR mutant is defined as 1 and the EGFR wild type as 0, i.e., the closer the model output is to 1, the more likely the sample is judged to be EGFR mutant. The L2 weight decay coefficient is set to 0.0004 to prevent overfitting;
step 4: inputting three-dimensional lung CT image data into the trained end-to-end three-dimensional convolutional neural network; the input is a CT image of the whole lung and the output is the prediction result, i.e., the predicted EGFR gene mutation status.
Advantageous effects
The invention provides a method for predicting EGFR gene mutation status based on an end-to-end three-dimensional convolutional neural network. The method performs end-to-end global feature extraction on the lungs using a densely connected three-dimensional convolutional structure, and simultaneously captures local lung nodule features with the proposed multi-scale dilated asymmetric convolution module. At the local scale, the multi-scale dilated asymmetric module handles lung nodule features of different sizes, directions, and angles, searches every detail of the input three-dimensional image, and densely connects these details from small to large. At the global scale, a dense network combines the fine features of the target from different stages and scales to extract comprehensive features of the lungs; deeper lung information is then extracted through feature transformation and channel fusion, and the model's prediction is finally obtained through a fully connected layer and an activation function layer.
Specifically, the proposed method makes the following contributions. First, it is the first method in the field of EGFR mutation status prediction that requires no pre-annotation step at all, which greatly reduces the burden on physicians and avoids introducing manual errors. The model learns features directly from the complete three-dimensional CT image and predicts the EGFR mutation status of lung adenocarcinoma patients end to end. Second, the invention is the first to use both intact lungs, input in three-dimensional space, to predict EGFR gene mutation status, and experiments demonstrate that EGFR mutation status is reflected not only in the lung nodules but also in the patient's entire lungs. Third, the proposed model is built from densely connected modules that combine three-dimensional asymmetric convolution with three-dimensional multi-dilation dense convolution. The asymmetric convolutions enable the network to capture lung nodule information in different directions, and the three-dimensional multi-dilation blocks enlarge the model's receptive field without losing resolution, so that multi-scale context information in the CT is captured and the model's prediction performance is further improved.
The invention uses a neural network to predict the EGFR gene mutation status of lung adenocarcinoma patients end to end from three-dimensional CT images, and offers the following advantages in the field of EGFR mutation status prediction:
The invention provides a deep learning model, namely a three-dimensional densely connected asymmetric convolution and multi-dilation dense network, for noninvasive prediction of the EGFR mutation status of lung adenocarcinoma patients. It also proposes a new perspective on EGFR mutation status: information about EGFR status is present in both intact lungs, not only in the lung nodules. Because deep learning models require input data of consistent dimensions, the invention also provides a method for handling CT images of inconsistent dimensions.
Compared with traditional EGFR mutation status prediction models, the proposed method does not require a professional radiologist to annotate nodule margins, nor any rough localization, and therefore applies to a wider range of scenarios. In addition, the proposed method combines a global feature processor for the target with a local feature extractor for lung nodules and embeds them in a three-dimensional prediction network, so that a large-range search problem is decomposed into small-range feature extraction problems, which addresses the difficulty of accurately capturing local features when predicting on a three-dimensional target. The proposed three-dimensional model can be trained end to end and, unlike traditional two-dimensional slice prediction methods, operates on the three-dimensional image and therefore does not lose information in the depth direction.
Drawings
FIG. 1 is a flow chart of the method of the invention. The method aims at a high-quality three-dimensional model that predicts noninvasively, quickly, and accurately: a complete three-dimensional CT image of the target is input, and the proposed end-to-end convolutional neural network directly extracts features and predicts the EGFR gene mutation status.
FIG. 2 compares the method of the invention with conventional methods. The 3DDADD Net framework proposed by the invention is an end-to-end prediction network without any branches that predicts the EGFR mutation status of lung adenocarcinoma patients directly from complete lung features. The method takes a three-dimensional CT image as input and abandons the two-dimensional lung nodule slice sequences or two-dimensional CT sequences used as input by conventional methods, which avoids time-consuming and laborious manual annotation and the loss of context information about lung features. On top of the densely connected baseline network, a local feature extractor is provided that captures nodules in different directions and in multi-scale context, so that detailed features are handled well.
FIG. 3 is a diagram of the model framework of the invention. The input to the model is the complete CT image of a lung adenocarcinoma patient, and the input to each convolutional layer is the concatenation of the outputs of the preceding layers. Dense connectivity lets each layer reuse all features learned by the preceding layers without relearning them. Each layer consists of a multi-dilation asymmetric convolution module, which improves on the expressive power of standard convolution; the multi-dilation convolution uses different dilation factors within a single layer to model different resolutions, which avoids the aliasing problem of dense connections. The three-dimensional asymmetric convolution widens the model's feature extraction path and improves robustness to certain transformations of lung nodules, such as flipping and rotation.
Detailed Description
The invention will now be further described with reference to the following examples and drawings:
The method for obtaining the EGFR gene mutation status of a lung adenocarcinoma patient from a three-dimensional lung CT image comprises the following steps:
Step 1: Preprocess the original CT image. To make the model focus on the patient's lungs, the region outside the lungs is cleared and the CT is rescaled to a uniform size of 112 × 112 × 90 by spline interpolation.
Step 1a: The CT is input into a U-Net model to segment the lungs automatically, and the region outside the lungs is set to zero. This makes the model attend to the lungs during computation and reduces interference from outside the lung region.
Step 1b: Spline interpolation is used to obtain the final preprocessing result, an image of size 112 × 112 × 90 centered in the field of view. Every slice along the X, Y, and Z axes intersects the lungs, so the detailed features of the lungs are preserved as far as possible and the available laboratory hardware is used as efficiently as possible.
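The bookkeeping behind step 1 can be illustrated with a minimal pure-Python sketch. It assumes a binary lung mask is already available (for example from a U-Net, which is not reproduced here), and the helper names `zero_outside_lung` and `zoom_factors` are hypothetical. The sketch only zeroes voxels outside the mask and computes the per-axis scale factors that a spline-interpolation routine would need to reach 112 × 112 × 90; the interpolation itself and the axis order are left as assumptions.

```python
TARGET_SHAPE = (112, 112, 90)  # uniform size stated in the text; axis order is an assumption


def zero_outside_lung(volume, mask):
    """Return a copy of `volume` (nested lists) with voxels outside `mask` set to 0."""
    return [
        [
            [v if m else 0 for v, m in zip(row_v, row_m)]
            for row_v, row_m in zip(plane_v, plane_m)
        ]
        for plane_v, plane_m in zip(volume, mask)
    ]


def zoom_factors(shape, target=TARGET_SHAPE):
    """Per-axis scale factors that map a volume of `shape` onto `target`."""
    return tuple(t / s for t, s in zip(target, shape))


# Tiny 1 x 2 x 2 example volume and mask (illustrative values only).
volume = [[[5, 7], [1, 2]]]
mask = [[[1, 0], [0, 1]]]
masked = zero_outside_lung(volume, mask)
factors = zoom_factors((224, 448, 45))
```

The factors returned by `zoom_factors` are what a resampling routine would consume; which spline order the inventors used is not stated in the text.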
Step 2: Construct the neural network structure.
Step 2a: In general, the overall three-dimensional features of a target reflect its global spatial structure. The method builds a four-layer densely connected convolutional neural network based on the Dense Block; the outputs of the layers are concatenated along the channel dimension and used as the input to the next convolutional layer. This strengthens feature propagation, makes better use of the features, and effectively mitigates the vanishing gradient problem.
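The effect of channel-wise concatenation on layer widths can be illustrated as follows. This is a sketch under assumptions: the initial channel count (64) and the growth rate (32) are hypothetical values chosen for illustration, since the text does not state them, and the function name is likewise hypothetical.

```python
def dense_input_channels(initial_channels, growth_rate, num_layers=4):
    """Channels seen at the input of each densely connected layer.

    Layer i receives the block input plus the outputs of all i earlier
    layers, each of which contributes `growth_rate` channels.
    """
    return [initial_channels + i * growth_rate for i in range(num_layers)]


# Hypothetical widths: 64 input channels, 32 new channels per layer.
channels = dense_input_channels(64, 32)
```

Because the input width grows linearly with depth, a dimension-reducing step between layers keeps the computation manageable.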
Step 2b: Between layers, a bottleneck module is introduced for inter-channel feature fusion and dimension reduction, which effectively reduces the model's computational cost. Specifically, batch normalization standardizes the feature distribution to zero mean and unit variance, and dimension reduction is then achieved by a rectified linear unit (ReLU) followed by a 1 × 1 convolution.
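The normalization and rectification parts of the bottleneck can be sketched on a flat list of activations. This is a simplified illustration, not the inventors' implementation: it omits the learnable scale and shift that batch normalization layers usually carry, it omits the channel-reducing convolution, and the small constant `eps` is an assumed value.

```python
import math


def batch_normalize(values, eps=1e-5):
    """Standardize `values` to approximately zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def relu(values):
    """Rectified linear unit: negative activations are clipped to zero."""
    return [max(0.0, v) for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_normalize(activations)
rectified = relu(normalized)
```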
Step 2c: A multi-scale, multi-dilation asymmetric dilated convolution module is constructed and embedded in the baseline model to capture fine local features and to attend to lung nodules that appear in the lungs of lung cancer patients in different directions, sizes, and angles. Densely connecting three-dimensional dilated convolutions with dilation factors of 1, 2, and 4 and combining them with the three-dimensional asymmetric convolution module improves the extraction of three-dimensional features at different scales, so that nodules in the lungs are searched exhaustively and located accurately. The module appears in every layer of the baseline network, which improves the network's ability to process small features and the model's prediction performance.
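How dilation factors of 1, 2, and 4 enlarge the receptive field without downsampling can be checked with a short calculation. The sketch assumes kernels that are 3 voxels wide along each axis and a stride of 1, and it treats the three convolutions as a simple sequential stack; the dense connections described in the text add shorter paths on top of this longest one. The function names are hypothetical.

```python
def effective_kernel(kernel_size, dilation):
    """Span (in voxels, per axis) covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def stacked_receptive_field(kernel_size, dilations):
    """Per-axis receptive field of stride-1 dilated convolutions applied in sequence."""
    field = 1
    for dilation in dilations:
        field += dilation * (kernel_size - 1)
    return field


spans = [effective_kernel(3, d) for d in (1, 2, 4)]
field = stacked_receptive_field(3, (1, 2, 4))
```

Under these assumptions the three layers together see 15 voxels per axis, compared with 7 for three undilated 3-wide kernels.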
Step 3: The data set comes from the Affiliated Hospital of Zunyi Medical University and contains 173 lung adenocarcinoma patient samples, of which 119 are EGFR mutant and 54 are EGFR wild type. Experiments use five-fold cross-validation in order to obtain as much useful information as possible from limited data, reduce overfitting, and better evaluate the model's prediction performance and generalization ability. During training, the batch size is set to 6 and the number of iterations of the model is set to 300. All learning rates are set to 0.01. The proposed method defines EGFR mutant as 1 and EGFR wild type as 0, i.e., the closer the model output is to 1, the more likely the sample is judged to be EGFR mutant. The neural network is trained with a cross-entropy loss function, using stochastic gradient descent (SGD) with a momentum of 0.9 as the optimizer. The L2 weight decay coefficient is set to 0.0004 to prevent overfitting.
Step 4: The model is trained according to the data set and the experimental parameters. Note that the invention is trained end to end throughout: the input is a CT image of the whole lung and the output is the prediction result.
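The stated optimizer settings (learning rate 0.01, momentum 0.9, weight decay 0.0004) can be illustrated with a single-parameter update. This sketch follows one common convention, in which the L2 term is added to the gradient before the momentum buffer is updated; deep learning frameworks differ in such details, so the exact update rule used by the inventors is an assumption, as is the function name `sgd_step`.

```python
LEARNING_RATE = 0.01   # from the text
MOMENTUM = 0.9         # from the text
WEIGHT_DECAY = 0.0004  # from the text


def sgd_step(weight, grad, velocity,
             lr=LEARNING_RATE, momentum=MOMENTUM, weight_decay=WEIGHT_DECAY):
    """One SGD-with-momentum update for a scalar parameter (assumed convention)."""
    grad = grad + weight_decay * weight       # L2 regularization term
    velocity = momentum * velocity + grad     # accumulate momentum
    weight = weight - lr * velocity           # descend along the velocity
    return weight, velocity


w1, v1 = sgd_step(weight=1.0, grad=0.5, velocity=0.0)
w2, v2 = sgd_step(weight=w1, grad=0.5, velocity=v1)
```

With a constant gradient the velocity grows from step to step, which is the intended accelerating effect of momentum.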
The specific embodiment is as follows:
The invention is further described below with reference to the drawings and examples; the invention includes, but is not limited to, the following examples.
The computer hardware environment for implementation is: an Intel Xeon E5-2600 [email protected] 8-core CPU, 64 GB of memory, and a GeForce GTX TITAN X GPU. The software environment is a Linux 16.04 64-bit operating system. The proposed method is implemented with Python 3.6.7 and TensorFlow 2.1.0.
Step 1: a data set is constructed.
Step 1a: Construct the image data set for prediction. With the approval of the ethics committee of Guizhou Medical University, the proposed method retrospectively analyzed data from patients with lung adenocarcinoma diagnosed at the Affiliated Hospital of Zunyi Medical University (Guizhou, China) from October to November 2018.
The inclusion criteria for collecting patient data were as follows: (1) primary lung adenocarcinoma confirmed by CT and histological examination; (2) an EGFR gene mutation test result is available; (3) the tumor pathological specimen was obtained within 1 month; (4) the CT image data are complete. The exclusion criteria were as follows: (1) the patient received anti-tumor treatment (radiotherapy, chemotherapy, or both) before surgery; (2) the time interval to postoperative CT imaging is more than 1 month; (3) the tumor boundary is difficult to identify on the CT images (e.g., tumors located in the hilum of the lung or lesions combined with atelectasis); (4) the lung tumor diameter is less than 1 cm or the CT images contain artifacts; (5) poor CT image quality affects segmentation and feature extraction.
After screening, 173 patients met the requirements: 75 males and 98 females, aged 31 to 79 years (mean 58 ± 10 years). Of these patients, 57 were smokers and 116 were non-smokers. Clinical stage distribution: 80 cases in stage I, 9 in stage II, 21 in stage III, and 63 in stage IV. To allow comparison with previous deep learning methods that predict from ROI images, nodule annotations were also collected on each patient's CT images by two radiologists with 5 and 10 years of diagnostic experience, respectively. Disagreements during annotation were resolved through discussion between the two radiologists, and 50 cases were randomly selected for a second round of annotation to evaluate inter-observer consistency.
Step 1b: In the CT preprocessing stage, in order to use the limited device memory efficiently and make the model focus on the lungs, the proposed method removes the empty regions along all three axes, keeping only the part of the volume that contains the lungs, which is then resized to 112 × 112 × 90 by spline interpolation.
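Removing the empty regions along the three axes amounts to cropping the volume to the bounding box of the lung mask. The following pure-Python sketch shows one way to do this on nested lists; the helper names and the (z, y, x) index order are assumptions, and the subsequent spline resize is not shown.

```python
def lung_bounding_box(mask):
    """Half-open (start, stop) bounds per axis of the nonzero voxels in `mask`."""
    coords = [
        (z, y, x)
        for z, plane in enumerate(mask)
        for y, row in enumerate(plane)
        for x, m in enumerate(row)
        if m
    ]
    if not coords:
        raise ValueError("mask contains no lung voxels")
    return tuple(
        (min(c[axis] for c in coords), max(c[axis] for c in coords) + 1)
        for axis in range(3)
    )


def crop(volume, box):
    """Keep only the part of `volume` that lies inside `box`."""
    (z0, z1), (y0, y1), (x0, x1) = box
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in volume[z0:z1]]


# 3 x 3 x 3 toy mask with two "lung" voxels at (1, 1, 1) and (1, 2, 1).
mask = [[[0] * 3 for _ in range(3)] for _ in range(3)]
mask[1][1][1] = 1
mask[1][2][1] = 1
volume = [[[z * 9 + y * 3 + x for x in range(3)] for y in range(3)] for z in range(3)]
box = lung_bounding_box(mask)
cropped = crop(volume, box)
```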
Step 2: Construct the network structure.
Step 2a: Construct the three-dimensional densely connected baseline network. The network uses a DenseNet to map the input three-dimensional CT image to a vector representing the EGFR mutation status prediction. The first convolutional layer uses a kernel of size 7 × 7 × 7 with a dilation rate of 7 and a stride of 1, followed by an ELU activation function, and converts the 90 × 112 × 112 input into a feature map of size 80 × 112 × 112; from the second convolutional layer onward, densely connected multi-scale multi-dilation asymmetric convolution modules are used, with the output of each module serving as the input of the next layer; finally, the output passes through an average pooling layer and then a fully connected layer to produce the gene prediction result.
Step 2b: Construct the multi-dilation asymmetric convolution module for capturing local features. The feature map first passes through a three-dimensional asymmetric convolution layer with four kernel shapes: 3 × 3 × 3, 1 × 1 × 3, 3 × 1 × 1, and 1 × 3 × 1. The four kernels are applied to the feature map in parallel and the sum of the four results is output. The output then passes through a BatchNorm3d layer and an ELU layer, which normalizes its distribution and suppresses overfitting. Next comes a multi-dilation convolutional layer, i.e., three densely connected convolutional layers with dilation rates of 1, 2, and 4, each with a kernel size of 3 × 3 × 3, again followed by a BatchNorm3d layer and an ELU layer for the same purpose. Finally, the result enters a transition layer, which halves the number of channels of the output feature map, and a max pooling layer reduces the size of the feature map.
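Summing the outputs of the four parallel kernels has a simple interpretation: because convolution is linear, the sum equals a single convolution whose 3 × 3 × 3 kernel is the sum of the four kernels after zero-padding the thin ones to the full size. The sketch below illustrates this for one input/output channel pair. It assumes that all branches share the same stride and use centered padding so their outputs align, and that no normalization or nonlinearity sits between the branches and the sum; the text does not confirm these details, and the helper names are hypothetical.

```python
def embed_in_3x3x3(kernel):
    """Zero-pad a kernel whose sides are 1 or 3 to a centered 3 x 3 x 3 kernel."""
    d, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    oz, oy, ox = (3 - d) // 2, (3 - h) // 2, (3 - w) // 2
    out = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    for z in range(d):
        for y in range(h):
            for x in range(w):
                out[z + oz][y + oy][x + ox] = kernel[z][y][x]
    return out


def merge_kernels(kernels):
    """Element-wise sum of the zero-padded kernels."""
    merged = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    for kernel in kernels:
        padded = embed_in_3x3x3(kernel)
        for z in range(3):
            for y in range(3):
                for x in range(3):
                    merged[z][y][x] += padded[z][y][x]
    return merged


def ones(d, h, w):
    """All-ones kernel of shape d x h x w (illustrative weights only)."""
    return [[[1.0] * w for _ in range(h)] for _ in range(d)]


# The four kernel shapes named in the text.
branches = [ones(3, 3, 3), ones(1, 1, 3), ones(3, 1, 1), ones(1, 3, 1)]
merged = merge_kernels(branches)
```

In the merged kernel the central cross carries more weight than the corners, which is one way to read the claim that the asymmetric branches emphasize axis-aligned structure.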
Step 3: Train the three-dimensional end-to-end EGFR gene mutation status prediction model. The network uses the lung adenocarcinoma CT data set provided by Zunyi Medical University, as described in step 1b. The network parameters are optimized with the SGD optimizer; the initial learning rate is set to 0.01 and is decayed to 10% of its previous value every 150 training epochs.
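The step decay described here can be written as a one-line schedule. The sketch assumes zero-indexed epochs and reads the decay as multiplying the current rate by 0.1 at each 150-epoch boundary; the function name is hypothetical.

```python
def learning_rate(epoch, initial_lr=0.01, decay_every=150, factor=0.1):
    """Step-decay schedule: multiply the rate by `factor` every `decay_every` epochs."""
    return initial_lr * factor ** (epoch // decay_every)


schedule = [learning_rate(e) for e in (0, 149, 150, 299)]
```

Over a 300-epoch run this gives a rate of 0.01 for the first half of training and 0.001 for the second half.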
Step 3a: Given a three-dimensional input CT image, the first feature extraction layer encodes it into a feature map of size 80 × 112 × 112.
Step 3b: The feature map obtained in step 3a is passed through the rest of the model to obtain a feature vector of size 1 × 2, which is the model's prediction.
Step 3c: Cross-entropy is used as the loss function; the tissue biopsy result recorded in the database for the lung adenocarcinoma patient and the EGFR gene mutation status predicted by the model are input into it. The neural network is trained with the backpropagation algorithm.
Step 3d: Decide whether to stop training. If training continues, return to step 3a; if training stops, proceed to step 4.
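The loss in step 3c can be illustrated for a single sample with a two-element output. This is a minimal sketch: it assumes the network emits unnormalized scores (logits) that are turned into probabilities by a softmax, and that index 1 corresponds to EGFR mutant and index 0 to wild type, in line with the label definition given earlier. Neither assumption is confirmed by the text.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class `label`."""
    return -math.log(softmax(logits)[label])


# Biopsy label 1 (mutant) versus a confident and an uninformative prediction.
loss_confident = cross_entropy([0.0, 5.0], label=1)
loss_uniform = cross_entropy([0.0, 0.0], label=1)
```

The loss shrinks as the probability assigned to the biopsy-confirmed class grows, which is the signal that backpropagation uses to update the weights.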
Step 4: The trained three-dimensional prediction network is used to predict the EGFR gene mutation status of the targets in the test set.
Step 4a: A three-dimensional CT image is input into the EGFR gene mutation status prediction network trained in step 3, which outputs a feature vector of size 1 × 2; this is the model's prediction, whose two entries represent the probabilities of the mutant type and the wild type, respectively.
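Turning the 1 × 2 output into a decision can be sketched as follows. The index order used here (0 for wild type, 1 for mutant) follows the label definition in the training settings and is an assumption about how the output vector is laid out; the softmax step is likewise assumed, and the names are hypothetical.

```python
import math

CLASS_NAMES = ("EGFR wild type", "EGFR mutant")  # assumed index order: 0, 1


def predict(logits):
    """Return (class index, class probabilities) for a two-element output."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


label, probs = predict([0.2, 1.5])
decision = CLASS_NAMES[label]
```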
Step 5: Training and testing were carried out five times using the 5-fold cross-validation method described in "Summary of cross-validation methods in model selection", published in 2013 by Van Yongdong et al. On the CT data set of 173 lung adenocarcinoma patients from Zunyi Medical University, the proposed method achieved an AUC of 0.79 and an ACC of 0.77.
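For readers unfamiliar with the evaluation protocol, the following sketch shows how five folds and the two reported metrics could be computed in plain Python. It is illustrative only: the fold assignment is a simple interleaving without shuffling or stratification (the text does not say how folds were drawn), the 0.5 decision threshold is an assumption, and the labels and scores are made-up toy values, not results from the patent.

```python
def k_fold_indices(n_samples, k=5):
    """Split sample indices 0..n_samples-1 into k interleaved test folds."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


folds = k_fold_indices(173)          # 173 patients, as in the data set
toy_labels = [1, 1, 0, 0, 1]         # made-up example values
toy_scores = [0.9, 0.4, 0.3, 0.6, 0.8]
toy_acc = accuracy(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
```

Each fold serves once as the test set while the remaining four are used for training, and the metrics are then aggregated across the five runs.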

Claims (1)

1. A method for predicting EGFR gene mutation state based on an end-to-end three-dimensional convolutional neural network is characterized by comprising the following steps:
step 1: preprocessing the original three-dimensional lung CT image: clearing all content outside the lung region in the CT image, and rescaling the CT image by spline interpolation to a uniform size of 112 × 90;
step 1a: segmenting the lungs with a U-Net model and setting the region outside the lungs to zero;
step 1b: obtaining the preprocessing result by spline interpolation, namely an image of size 112 × 90 located at the center of the field of view, ensuring that every slice along the X-axis, Y-axis and Z-axis directions intersects the lung and that the detailed features of the lung are preserved;
step 2: constructing the neural network structure: adopting a densely connected multilayer convolutional neural network to extract the overall features of the target and fuse multi-scale features;
step 2a: based on the Dense Block, building a four-layer densely connected convolutional neural network, in which the outputs of the layers are concatenated along the channel dimension and used as the input of the next convolutional layer;
step 2b: introducing, in the connections between layers, a bottleneck module for inter-channel feature fusion and dimension reduction; specifically, batch normalization brings the feature distribution to a normal distribution with a mean of 0 and a variance of 1, and a rectified linear unit (ReLU) followed by a 1 × 1 convolution achieves the dimension reduction;
step 2c: embedding a multi-scale, multi-dilation asymmetric dilated convolution module in the baseline model to capture fine local features and to attend to lung nodules that appear in the lungs of lung cancer patients in different directions, sizes and angles;
step 3: training the neural network structure end to end on the three-dimensional lung CT image data set, using a cross-entropy loss function and stochastic gradient descent (SGD) as the optimizer of the model, with the momentum set to 0.9;
during training, the batch size is set to 6, the number of training iterations of the model is set to 300, and all learning rates are set to 0.01; the EGFR mutant type is defined as 1 and the EGFR wild type as 0, so that the closer the model output is to 1, the more likely the sample is to be judged EGFR mutant; the L2 regularization weight decay coefficient is set to 0.0004 to prevent overfitting;
step 4: inputting three-dimensional lung CT image data into the trained end-to-end three-dimensional convolutional neural network, the input being a CT image of the whole lung and the output being the prediction result, namely the predicted EGFR gene mutation state.
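The optimizer settings recited in step 3 of the claim (learning rate 0.01, momentum 0.9, weight decay 0.0004, batch size 6, 300 iterations) can be illustrated with a minimal sketch of one SGD-with-momentum update. The update rule below follows a common convention in which the L2 term is added to the gradient before the momentum buffer is updated; the claim does not state which formulation is used, so this choice, the toy objective and the function name are assumptions.

```python
LEARNING_RATE = 0.01    # from the claim
MOMENTUM = 0.9          # from the claim
WEIGHT_DECAY = 0.0004   # L2 regularization coefficient from the claim
BATCH_SIZE = 6          # from the claim (unused in this toy example)
NUM_ITERATIONS = 300    # from the claim

def sgd_momentum_step(weights, grads, velocity,
                      lr=LEARNING_RATE, momentum=MOMENTUM,
                      weight_decay=WEIGHT_DECAY):
    """One SGD step with momentum and L2 weight decay (assumed formulation)."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w   # L2 penalty folded into the gradient
        v = momentum * v + g       # momentum buffer
        new_velocity.append(v)
        new_weights.append(w - lr * v)
    return new_weights, new_velocity

# Toy objective f(w) = 0.5 * w^2 per coordinate, whose gradient is w itself.
weights = [1.0, -2.0]
velocity = [0.0, 0.0]
for _ in range(NUM_ITERATIONS):
    grads = list(weights)
    weights, velocity = sgd_momentum_step(weights, grads, velocity)
```

On this toy objective the weights shrink toward zero over the 300 iterations. In a real training run the gradients would come from back-propagating the cross-entropy loss over each batch of 6 CT volumes.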
CN202210583718.2A 2022-05-25 2022-05-25 Method for predicting EGFR gene mutation state based on end-to-end three-dimensional convolutional neural network Active CN115132275B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210583718.2A CN115132275B (en) 2022-05-25 2022-05-25 Method for predicting EGFR gene mutation state based on end-to-end three-dimensional convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210583718.2A CN115132275B (en) 2022-05-25 2022-05-25 Method for predicting EGFR gene mutation state based on end-to-end three-dimensional convolutional neural network

Publications (2)

Publication Number Publication Date
CN115132275A true CN115132275A (en) 2022-09-30
CN115132275B CN115132275B (en) 2024-02-27

Family

ID=83376354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210583718.2A Active CN115132275B (en) 2022-05-25 2022-05-25 Method for predicting EGFR gene mutation state based on end-to-end three-dimensional convolutional neural network

Country Status (1)

Country Link
CN (1) CN115132275B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115810016A (en) * 2023-02-13 2023-03-17 四川大学 Lung infection CXR image automatic identification method, system, storage medium and terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766051A (en) * 2019-09-20 2020-02-07 四川大学华西医院 Lung nodule morphological classification method based on neural network
CN110807764A (en) * 2019-09-20 2020-02-18 成都智能迭迦科技合伙企业(有限合伙) Lung cancer screening method based on neural network
CN111814611A (en) * 2020-06-24 2020-10-23 重庆邮电大学 Multi-scale face age estimation method and system embedded with high-order information
CN113345576A (en) * 2021-06-04 2021-09-03 江南大学 Rectal cancer lymph node metastasis diagnosis method based on deep learning multi-modal CT
WO2022063200A1 (en) * 2020-09-24 2022-03-31 上海健康医学院 Non-small cell lung cancer prognosis survival prediction method, medium and electronic device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
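石陆魁; 杜伟?; 马红祺; 张军: "Lung cancer recognition method based on multi-scale and feature fusion" (基于多尺度和特征融合的肺癌识别方法), Computer Engineering and Design (计算机工程与设计), no. 05, 16 May 2020 (2020-05-16) *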

Also Published As

Publication number Publication date
CN115132275B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
CN110458249B (en) Focus classification system based on deep learning and probabilistic imaging omics
CN108537773B (en) Method for intelligently assisting in identifying pancreatic cancer and pancreatic inflammatory diseases
Byra et al. Early prediction of response to neoadjuvant chemotherapy in breast cancer sonography using Siamese convolutional neural networks
CN111553892B (en) Lung nodule segmentation calculation method, device and system based on deep learning
CN111353998A (en) Tumor diagnosis and treatment prediction model and device based on artificial intelligence
Silva et al. EGFR assessment in lung cancer CT images: analysis of local and holistic regions of interest using deep unsupervised transfer learning
Xu et al. Using transfer learning on whole slide images to predict tumor mutational burden in bladder cancer patients
Wang et al. Study on automatic detection and classification of breast nodule using deep convolutional neural network system
CN111798424B (en) Medical image-based nodule detection method and device and electronic equipment
CN113538435B (en) Pancreatic cancer pathological image classification method and system based on deep learning
CN113208640A (en) Method for predicting axillary lymph node metastasis based on PET (positron emission tomography) imaging omics special for mammary gland
Li et al. A novel radiogenomics framework for genomic and image feature correlation using deep learning
CN112102343A (en) Ultrasound image-based PTC diagnostic system
WO2013151749A1 (en) System, method, and computer accessible medium for volumetric texture analysis for computer aided detection and diagnosis of polyps
CN115132275B (en) Method for predicting EGFR gene mutation state based on end-to-end three-dimensional convolutional neural network
CN113764101B (en) Novel auxiliary chemotherapy multi-mode ultrasonic diagnosis system for breast cancer based on CNN
CN114565601A (en) Improved liver CT image segmentation algorithm based on DeepLabV3+
Danku et al. Cancer diagnosis with the aid of artificial intelligence modeling tools
JP2012504003A (en) Fault detection method and apparatus executed using computer
CN116740386A (en) Image processing method, apparatus, device and computer readable storage medium
WO2021197176A1 (en) Systems and methods for tumor characterization
CN114822842A (en) Magnetic resonance colorectal cancer T stage prediction method and system
CN114445374A (en) Image feature processing method and system based on diffusion kurtosis imaging MK image
Xie et al. [Retracted] Analysis of the Diagnosis Model of Peripheral Non‐Small‐Cell Lung Cancer under Computed Tomography Images
Bamigbade Gleason Score Prediction for the Severity of Prostate Metastasis Using Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant