CN112614084A - Method for 3D depth convolution neural network aiming at CT image - Google Patents

Method for 3D depth convolution neural network aiming at CT image

Info

Publication number
CN112614084A
CN112614084A
Authority
CN
China
Prior art keywords
image
neural network
lung
liver
positive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910881070.5A
Other languages
Chinese (zh)
Inventor
赵奎
吕晴
曹吉龙
魏景峰
王其乐
金继鑫
周晓磊
张镝
祁柏林
宋春梅
李阳
董开明
沙文龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Institute of Computing Technology of CAS
Original Assignee
Shenyang Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Institute of Computing Technology of CAS filed Critical Shenyang Institute of Computing Technology of CAS
Priority to CN201910881070.5A priority Critical patent/CN112614084A/en
Publication of CN112614084A publication Critical patent/CN112614084A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10072 Tomographic images
    • G06T2207/10081 Computed x-ray tomography [CT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30061 Lung
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30096 Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Apparatus For Radiation Diagnosis (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for a 3D deep convolutional neural network with forward and reverse bidirectional input for CT images. For the medical image format, the method provides a bidirectional-input 3D deep convolutional neural network model: two input layers are added to the model, corresponding respectively to the forward and reverse orderings of the CT image along the slice (time) dimension, so that information along the lung slice dimension is preserved and more image features can be extracted. The model is first trained on lung cancer image data, then fine-tuned with a small amount of liver cancer image data, and finally applied to liver cancer CT images; it converges quickly and achieves high accuracy.

Description

Method for 3D depth convolution neural network aiming at CT image
Technical Field
The invention relates to the fields of image processing and computer-aided medical diagnosis, and in particular to a method for a 3D deep convolutional neural network with forward and reverse bidirectional input for CT images.
Background
Liver cancer is a serious malignant tumor whose incidence is second only to that of lung cancer. In its early stages, patients generally feel no discomfort, and because the disease is insidious, it is often already at a middle or late stage by the time it is diagnosed. Treatment is therefore difficult and outcomes are poor: survival after diagnosis is generally only about six months, and the 5-year survival rate is only 2%-16%. Research on the early diagnosis of liver cancer is thus of great significance for improving human health.
In conventional image-based diagnosis, images are acquired with the corresponding equipment, and even an experienced physician needs about 10 minutes to examine a patient thoroughly. The physician can assess the malignancy of a tumor from its morphology, but the accuracy depends largely on the physician's experience, and different physicians may give different predictions.
Recent advances in deep learning have enabled computer vision models to assist physicians in diagnosing a variety of conditions, and in some cases such models have proven competitive with physicians. As of March 2019, however, no imaging study applying convolutional neural networks to liver cancer had been reported, and existing shallow (two-layer) neural networks fall far short of the level reached by diagnostic-model research on other diseases.
Disclosure of Invention
To better classify small-sample liver cancer data, the invention provides a method for a 3D deep convolutional neural network with forward and reverse bidirectional input of CT images. The network captures complex features more effectively, and a deep-learning-based transfer learning approach solves the problem of classifying liver cancer CT images from small samples.
In order to solve the technical problem, the technical scheme adopted by the invention comprises the following steps:
a method of a 3D deep convolutional neural network for positive and negative bidirectional input of CT images comprises the following steps:
establishing a positive and negative bidirectional input 3D deep convolutional neural network model by utilizing CT image data of lung cancer and liver cancer, and dividing the CT image data of the lung cancer into a training set and a test set to adjust model parameters;
and transferring the established model to the liver cancer disease of the small similarity sample for prediction, and further adjusting model parameters to obtain a final neural network model for identifying the probability of the image diseases.
The method also comprises the following preprocessing steps:
a. collecting CT image data of lung cancer and liver cancer and converting the images into a computer-recognizable format;
b. denoising the images with a two-dimensional Gaussian filter;
c. normalizing the images with a linear function;
d. screening slices according to the ratio of the cancerous region to the background region in the image to obtain the candidate slices containing the lung or liver region; further adjusting the threshold on this ratio to determine the slices of the lung or liver region; and manually annotating the resulting images to obtain positive sample images of lung or liver disease;
e. applying data enhancement to balance the lung and liver disease image data sets.
The computer-recognizable format is UINT8.
The data enhancement method includes image scaling, translation, and rotation to expand the positive or negative samples of the lung or liver data, equalizing the ratio of positive to negative samples in both the lung data and the liver data.
The bidirectional-input 3D deep convolutional neural network model comprises a plurality of convolutional and pooling layers, a plurality of fully connected layers, and an output layer, and receives the forward and reverse input images to train the model.
The forward-sequence and reverse-sequence images are each passed through a convolutional layer, a pooling layer, a convolutional layer, and a pooling layer, then through further convolutional and pooling layers, and finally through 3 fully connected layers and 1 output layer that outputs the recognition probability.
Transferring the established model to the similar but small-sample liver cancer task for prediction means inputting liver cancer image data into the neural network model to further adjust the model parameters.
Further comprising: selecting a lung cancer or liver cancer CT image, inputting it into the final neural network model, and identifying the probability of disease in the image.
The invention has the following advantages and beneficial effects:
1. Bidirectional input: a forward 3D image and a reverse 3D image of the same scan are used, where forward and reverse refer to the third dimension of the image, i.e. the time (slice) dimension; the reverse 3D image is the forward 3D image reversed along that dimension. This captures complex medical image features more effectively.
2. The classification problem of small-sample liver cancer is solved. A deep learning model learns a mapping function containing a large number of parameters, and the fitting capability that can be learned from a small sample is very limited. Considering the similarity between lung cancer and liver cancer images, the model parameters are first learned from the large lung cancer sample and then fine-tuned with the small liver cancer sample, which overcomes the under-fitting problem of small samples.
Drawings
FIG. 1 is a schematic structural diagram of a 3D deep convolutional neural network model with bidirectional input;
FIG. 2 is a diagram of a forward-backward bidirectional input 3D deep convolutional neural network model.
Detailed description of the invention
The invention is described in further detail below with reference to the accompanying drawings.
as shown in fig. 1, the modeling steps are:
step 1: and cleaning the image. The raw data sets (lung and liver data) are both in Digital Imaging and Communications in Medicine (DICOM) format, so the raw data is first converted to Hounsfield Units (HU), which is a standard quantization scale describing the radiation density, using processing the data into an image format. Each tissue has its own specific HU range, which is the same for different people. To prepare the data for the deep network, we convert the image from HU to UINT8 (data format) and then normalize the data to between 0 and 255, the original size of the image being 512 x 512 pixels.
Step 2: noise reduction. Each CT slice is two-dimensional, so a two-dimensional Gaussian filter is used to denoise the images.
Step 3: normalization. Image normalization is performed using a linear function.
The normalization formula is y = (x - MinValue)/(MaxValue - MinValue), where x and y are the pixel values before and after conversion, and MaxValue and MinValue are the maximum and minimum pixel values in the image.
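The linear normalization can be sketched directly from this formula (NumPy is used here purely for illustration):

```python
import numpy as np

def min_max_normalize(x):
    # y = (x - MinValue) / (MaxValue - MinValue), mapping pixel values to [0, 1]
    min_value, max_value = x.min(), x.max()
    return (x - min_value) / (max_value - min_value)

img = np.array([[0.0, 50.0], [100.0, 200.0]])
norm = min_max_normalize(img)
```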
Step 4: CT image serialization. The CT image data are used for disease prediction. To make the training data more targeted, the approximate slices containing the lung or liver region in each CT scan are counted, the slices meeting the requirement are screened, and finally a physician manually annotates them to obtain positive samples.
Chest CT images are sequential, and to extract more of the correlation between adjacent slices, the whole CT sequence is treated as one training sample and assigned a label (diseased or non-diseased, according to the medical annotation) during network training. A CT scanner produces a large number of images, but some of them are noise, i.e. some slices contain only a small portion of the lung or liver. Therefore, to make the training data more targeted, the approximate slices containing the lung or liver region are counted first and the qualifying slices are screened; during screening, a threshold on the ratio of lung area to background area in the image can be used. For example, a CT scan may contain 190-320 slices, some of which contain only a small proportion of lung area, so a threshold is set manually to select 128 slices containing lung or liver from each scan (128 was found to work best given the laboratory equipment and experimental results).
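A sketch of this slice-screening step, assuming a precomputed per-slice organ mask is available (the patent does not specify how the lung or liver area is measured; a mask from simple HU thresholding is one hypothetical source):

```python
import numpy as np

def select_slices(volume, organ_mask, n_keep=128):
    """Keep the n_keep slices with the highest organ-to-background area ratio.

    volume:     (S, H, W) CT volume
    organ_mask: (S, H, W) boolean mask of the lung/liver region (assumed input)
    """
    ratios = organ_mask.reshape(organ_mask.shape[0], -1).mean(axis=1)
    order = np.argsort(ratios)[::-1]   # slices sorted by organ area, descending
    keep = np.sort(order[:n_keep])     # restore anatomical (sequence) order
    return volume[keep]

vol = np.zeros((256, 64, 64), dtype=np.uint8)
mask = np.zeros((256, 64, 64), dtype=bool)
mask[64:192, 16:48, 16:48] = True      # organ present only in the middle slices
selected = select_slices(vol, mask)
```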
Step 5: data enhancement and sample balancing. Data enhancement is used to balance the data set. For the positive sample set, three methods expand the positive or negative samples of the lung or liver: image scaling (resizing pictures of different sizes), translation (shifting pixels via an affine matrix to change their spatial position, so that a highly informative lesion region is carried over to the next frame), and rotation (rotating 30-90 degrees about the image center). The final ratio of positive to negative samples in both the lung and the liver data is 1:1, yielding a sample image set suitable for model processing.
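The three augmentation operations can be sketched with `scipy.ndimage` (the specific parameter values below are illustrative choices, except the rotation angle, which falls in the patent's stated 30-90 degree range):

```python
import numpy as np
from scipy import ndimage

def augment(img, angle=45.0, shift_px=(10, -5), zoom_factor=1.2):
    rotated = ndimage.rotate(img, angle, reshape=False, order=1)  # rotate about the image center
    shifted = ndimage.shift(img, shift_px, order=1)               # translate pixel positions
    zoomed = ndimage.zoom(img, zoom_factor, order=1)              # rescale the image
    return rotated, shifted, zoomed

img = np.zeros((64, 64), dtype=np.float32)
img[24:40, 24:40] = 1.0   # a toy "lesion" region
rotated, shifted, zoomed = augment(img)
```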
FIG. 2 shows the forward-reverse bidirectional-input 3D deep convolutional neural network model. The model comprises a plurality of convolutional and pooling layers, a plurality of fully connected layers, and an output layer, and receives the forward and reverse input images for training.
(Steps 6-13 below are applied to each image in the set obtained in step 5.)
After the image processing of steps 1 to 5, a 3D image matrix of size 128 x 512 x 512 is obtained (128 is the slice count, 512 x 512 is the image pixel size).
The forward 3D image and the reverse 3D image each pass through steps 6 to 10.
Step 6: with the slice sequence as the 3rd dimension, the forward 3D image and the reverse 3D image of the same scan are taken as the model inputs. The input size is 1 x 128 x 512 x 512 (1 channel, 128 slices, 512 x 512 image size).
Step 7: each 3D image passes through the 1st convolutional layer with 3 x 3 kernels, padding 1, and stride 2. The output size is 16 x 128 x 256 (by the formula O = (I - K + 2P)/S + 1, where O is the output size, I the input size, K the kernel size, S the stride, and P the padding; e.g. (512 - 3 + 2 x 1)/2 + 1 = 256; 16 is the number of convolution kernels).
Step 8: each 3D image passes through the 1st pooling layer with stride 2 and pooling size 2 x 2. The output size is 16 x 64 x 256 (by the formula O = (I - Ps)/S + 1, where O is the output size, I the input size, S the stride, and Ps the pooling size; e.g. (128 - 2)/2 + 1 = 64 along the slice dimension).
Step 9: each 3D image passes through the 2nd convolutional layer with 3 x 3 kernels, padding 1, and stride 2. The output size is 32 x 64 x 128 ((256 - 3 + 2 x 1)/2 + 1 = 128; 32 is the number of convolution kernels).
Step 10: each 3D image passes through the 2nd pooling layer with stride 2 and pooling size 2 x 2. The output size is 32 x 32 x 128 ((64 - 2)/2 + 1 = 32 along the slice dimension; the leading 32 is the number of channels).
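The size formulas used in steps 7-10 can be checked with two small helpers (integer division models the floor that deep-learning frameworks apply):

```python
def conv_out(i, k=3, p=1, s=2):
    # O = (I - K + 2P) / S + 1
    return (i - k + 2 * p) // s + 1

def pool_out(i, ps=2, s=2):
    # O = (I - Ps) / S + 1
    return (i - ps) // s + 1

# In-plane sizes through the two convolutions (steps 7 and 9): 512 -> 256 -> 128
sizes = [512, conv_out(512), conv_out(conv_out(512))]
# Slice counts through the two pooling layers (steps 8 and 10): 128 -> 64 -> 32
depths = [128, pool_out(128), pool_out(pool_out(128))]
```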
After the forward and reverse branches have each passed through steps 6 to 10, their outputs are combined and passed through steps 11 to 12:
step 11: the output of step 10 is input into the convolution layer with convolution kernel 3 and the pooling layer with step size 2. The image size is: 128 x 16 x 64(64 x 64 image size, (128-3 +2 x 1)/2+1 x 64,16 slices thick, (32-2)/2+1 x 16,128 for 128 convolution kernels).
Step 12: the output of step 11 is fed into a 2nd convolutional layer with 3 x 3 kernels and a pooling layer with stride 2. The output size is 256 x 8 x 32 (32 x 32 image size, (64 - 3 + 2 x 1)/2 + 1 = 32; 8 slices, (16 - 2)/2 + 1 = 8; 256 is the number of convolution kernels).
Step 13: the feature maps are flattened and passed through the first fully connected layer. The vector size is 1 x 2097152 (256 channels x 8 slices x 32 x 32 pixels = 2,097,152).
Step 14: through the second fully connected layer. The output size is 1 x 512 (512 is the number of neurons in this hidden layer).
Step 15: through the third fully connected layer. The output size is 1 x 64 (64 is the number of neurons in this hidden layer).
All the image data obtained in step 15 are processed as follows:
step 16: and (3) outputting a result y through the last output layer (only one node), wherein the value range of y is [0,1], and the probability of lung cancer diseases is represented.
In this embodiment, preprocessing and modeling are applied with lung disease or liver disease as the target, and the probability of disease is obtained. The lung cancer and liver cancer models are structurally identical but differ in their parameters: a large amount of lung cancer data is first used to train the model and obtain good parameters, and then a small amount of liver cancer data is fed into the model to fine-tune those parameters, yielding the liver cancer classification model.

Claims (8)

1. A method for a 3D deep convolutional neural network with forward and reverse bidirectional input of CT images, characterized by comprising the following steps:
establishing a bidirectional-input 3D deep convolutional neural network model using CT image data of lung cancer and liver cancer, and dividing the lung cancer CT image data into a training set and a test set to tune the model parameters;
and transferring the established model to the similar but small-sample liver cancer task for prediction, and further adjusting the model parameters to obtain a final neural network model that identifies the probability of disease in an image.
2. The method of the 3D deep convolutional neural network with forward and reverse bidirectional input of CT images as claimed in claim 1, further comprising the following preprocessing steps:
a. collecting CT image data of lung cancer and liver cancer and converting the images into a computer-recognizable format;
b. denoising the images with a two-dimensional Gaussian filter;
c. normalizing the images with a linear function;
d. screening slices according to the ratio of the cancerous region to the background region in the image to obtain the candidate slices containing the lung or liver region; further adjusting the threshold on this ratio to determine the slices of the lung or liver region; and manually annotating the resulting images to obtain positive sample images of lung or liver disease;
e. applying data enhancement to balance the lung and liver disease image data sets.
3. The method as claimed in claim 2, wherein the computer-recognizable format is UINT8.
4. The method as claimed in claim 2, wherein the data enhancement method comprises image scaling, translation, and rotation to expand the positive or negative samples of the lung or liver data, equalizing the ratio of positive to negative samples in the lung data and the liver data.
5. The method as claimed in claim 1, wherein the bidirectional-input 3D deep convolutional neural network model comprises a plurality of convolutional and pooling layers, a plurality of fully connected layers, and an output layer, and receives the forward and reverse input images to train the model.
6. The method as claimed in claim 5, wherein the forward-sequence and reverse-sequence images are each passed through a convolutional layer, a pooling layer, a convolutional layer, and a pooling layer, then through further convolutional and pooling layers, and finally through 3 fully connected layers and 1 output layer that outputs the recognition probability.
7. The method as claimed in claim 1, wherein transferring the established model to the similar but small-sample liver cancer task for prediction comprises inputting liver cancer image data into the bidirectional-input 3D deep convolutional neural network model to further adjust the model parameters.
8. The method as claimed in claim 1, further comprising: selecting a lung cancer or liver cancer CT image, inputting it into the final neural network model, and identifying the probability of disease in the image.
CN201910881070.5A 2019-09-18 2019-09-18 Method for 3D depth convolution neural network aiming at CT image Pending CN112614084A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910881070.5A CN112614084A (en) 2019-09-18 2019-09-18 Method for 3D depth convolution neural network aiming at CT image

Publications (1)

Publication Number Publication Date
CN112614084A true CN112614084A (en) 2021-04-06

Family

ID=75224233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910881070.5A Pending CN112614084A (en) 2019-09-18 2019-09-18 Method for 3D depth convolution neural network aiming at CT image

Country Status (1)

Country Link
CN (1) CN112614084A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115299428A * 2022-08-04 2022-11-08 国网江苏省电力有限公司南通供电分公司 Internet-of-Things intelligent bird-repelling system based on deep learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010051349A (en) * 2008-08-26 2010-03-11 Hitachi Medical Computer Systems Inc Medical image diagnosis supporting apparatus and medical image diagnosis supporting program
CN105574859A (en) * 2015-12-14 2016-05-11 中国科学院深圳先进技术研究院 Liver tumor segmentation method and device based on CT (Computed Tomography) image
CN107909566A * 2017-10-28 2018-04-13 杭州电子科技大学 Image recognition method for cutaneous melanoma based on deep learning
CN108596882A * 2018-04-10 2018-09-28 中山大学肿瘤防治中心 Pathological image recognition method and device

Similar Documents

Publication Publication Date Title
Ausawalaithong et al. Automatic lung cancer prediction from chest X-ray images using the deep learning approach
Rahman et al. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images
CN108052977B (en) Mammary gland molybdenum target image deep learning classification method based on lightweight neural network
WO2018232388A1 (en) Systems and methods for integrating tomographic image reconstruction and radiomics using neural networks
CN112258415B (en) Chest X-ray film super-resolution and denoising method based on generation countermeasure network
CN110852350A (en) Pulmonary nodule benign and malignant classification method and system based on multi-scale migration learning
Sadad et al. Internet of medical things embedding deep learning with data augmentation for mammogram density classification
Jarosik et al. Breast lesion classification based on ultrasonic radio-frequency signals using convolutional neural networks
Mienye et al. Improved predictive sparse decomposition method with densenet for prediction of lung cancer
Luo et al. Classification of tumor in one single ultrasound image via a novel multi-view learning strategy
Huang et al. Considering breast density for the classification of benign and malignant mammograms
Cai et al. Identifying architectural distortion in mammogram images via a se-densenet model and twice transfer learning
CN115601268A (en) LDCT image denoising method based on multi-scale self-attention generation countermeasure network
Rasheed et al. Use of transfer learning and wavelet transform for breast cancer detection
Yadav et al. Deep learning-based CAD system design for thyroid tumor characterization using ultrasound images
Honghan et al. Rms-se-unet: A segmentation method for tumors in breast ultrasound images
Xu et al. Artificial intelligence assisted identification of therapy history from periapical films for dental root canal
CN112614084A (en) Method for 3D depth convolution neural network aiming at CT image
Borges et al. Kidney stone detection using ultrasound images
CN109559283A (en) Medicine PET image denoising method based on the domain DNST bivariate shrinkage and bilateral non-local mean filtering
Aaqib et al. A novel deep learning based approach for breast cancer detection
Guo et al. Thyroid nodule ultrasonic imaging segmentation based on a deep learning model and data augmentation
Vijayakumari et al. Deep learning-based gender classification with dental X-ray images
CN112967295A (en) Image processing method and system based on residual error network and attention mechanism
CN111932486A (en) Brain glioma segmentation method based on 3D convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination