WO2024108425A1 - Method for classifying pancreatic images based on hybrid attention network

Info

Publication number
WO2024108425A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
network
attention
module
image sequence
Application number
PCT/CN2022/133719
Other languages
French (fr)
Chinese (zh)
Inventor
黄建龙
贾富仓
陈藏
Original Assignee
深圳先进技术研究院
Application filed by 深圳先进技术研究院
Priority to PCT/CN2022/133719
Publication of WO2024108425A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis

Definitions

  • the present invention relates to the technical field of medical image processing, and more specifically, to a method for classifying pancreatic images based on a hybrid attention network.
  • Acute pancreatitis is a common clinical emergency, with typical characteristics of acute chemical inflammation such as edema and inflammatory exudate of the pancreas and surrounding tissues.
  • CT: computed tomography
  • MRI: magnetic resonance imaging
  • The CT severity index (CTSI) is often used to classify the severity of acute pancreatitis (AP), but this requires physicians to perform time-consuming, inefficient manual analysis with low accuracy.
  • machine learning has become a powerful tool for analyzing imaging.
  • CNNs: convolutional neural networks
  • One existing scheme uses nnU-Net to generate masks from 3D CT scans.
  • A mask-to-mesh function then outputs the pancreatic anatomy as a mesh model.
  • The generated mesh model is input into a graph residual network to quantitatively classify pancreatic ductal adenocarcinoma.
  • This scheme requires segmentation of the pancreas to ensure high performance, but the fully supervised learning of the segmentation algorithm requires data annotation by physicians, which is very time-consuming and depends on the physician's professional ability.
  • the region of interest (ROI) generation algorithm can mark the boundaries of pancreatic problems on CT images. This method makes it easier to focus on pancreatic features. ROI was originally used to locate the pancreas for manual measurement, but it can also be introduced into the automatic diagnosis of pancreatic diseases. For example, researchers have proposed a region-based CNN model that can create ROIs using feature maps. However, ROI-based diagnosis has the following problems. First, the shape of the ROI is confined to a rectangle, which is different from the true region, which causes irrelevant background data to be included in the predicted region, thereby adversely affecting the classification performance. Second, the accuracy of the model is highly dependent on the accuracy of the ROI generation algorithm, which is unreliable and often generates misplaced regions. Third, due to the size and variability of the pancreas, it is challenging to detect the pancreas.
  • Existing pancreas identification schemes suffer from problems such as long processing time, poor accuracy, and poor adaptability to individual differences.
  • the purpose of the present invention is to overcome the defects of the above-mentioned prior art and provide a method for classifying pancreatic images based on a hybrid attention network.
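  • The method comprises the following steps: acquiring an image sequence of a target area; and inputting the image sequence into a deep learning model to obtain a pancreas classification result.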
  • the deep learning model includes a feature extraction module, a feature aggregation module and a classification layer.
  • the feature extraction module is used to extract feature maps of different depths for the image sequence to output feature vectors.
  • the feature maps of different depths are feature maps adaptively adjusted based on the spatial attention mechanism and the channel attention mechanism using a dual attention module;
  • the feature aggregation module is used to capture the temporal information of the feature vector and use a self-attention mechanism for weighting to obtain an aggregated feature vector, and the classification layer is used to classify the aggregated feature vector.
  • the advantage of the present invention is that, in view of the anatomical variability of acute pancreatitis on computed tomography, a hybrid attention network (Att-CNN-RNN) for classifying pancreatic images is proposed.
  • FIG. 1 is a flow chart of a method for classifying a pancreatic image based on a hybrid attention network according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of the overall structure of a hybrid attention network according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a process of pre-training a backbone network based on a twin network according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of the structure of a dual attention module according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of the structure of a feature aggregation module according to an embodiment of the present invention.
  • FIG. 6 is a comparison diagram of a normal sample and an acute pancreatitis sample according to an embodiment of the present invention.
  • FIG. 7 is an attention map of different slices according to an embodiment of the present invention.
  • the present invention takes into account the problem of pancreas localization based on ROI, and introduces an attention mechanism to automatically focus on information without any shape restrictions.
  • the goal of the attention mechanism in image processing is to enable the network to learn to ignore irrelevant information and focus on key information by marking key features in a new weight layer.
  • In contrast to hard attention, which focuses more on points and is implemented through reinforcement learning, soft attention focuses more on regions or channels and can be trained within the network.
  • the weights of soft attention can be trained by feedforward and backpropagation. By embedding the attention mechanism in the original network, higher prediction accuracy can be achieved.
  • The following description takes pancreatic CT images as an example.
  • the original CT data is a three-dimensional scanning result.
  • With a 3D network, data of all dimensions can be used, but such a network contains a very large number of weights and is computationally expensive.
  • Another feasible solution is to use a continuous CT image sequence as input, which consists of vertically oriented axial slices.
  • the depth information in each slice will be lost.
  • The pancreas will occupy most of the slices in the middle of the image sequence, but will occupy very little space (or may not appear at all) in the slices at the beginning or end of the sequence. Therefore, the feature maps of different slices will have different degrees of influence on the final classification. Taking these limitations into account, the present invention proposes a hybrid attention network to classify pancreatic images.
  • the provided method for classifying pancreatic images based on a hybrid attention network includes the following steps.
  • Step S110 constructing a deep learning model, which includes a feature extraction module, a feature aggregation module and a classification layer, wherein the feature extraction module adopts a dual attention mechanism, and the feature aggregation module adopts a self-attention mechanism.
  • the feature extraction module is used to extract feature vectors from the input image sequence (or image slice sequence) and output the feature vectors of each slice.
  • the backbone network of the feature extraction module can be built based on various types of networks, such as residual network ResNet, VGG network, etc.
  • a dual attention mechanism is embedded in the feature extraction module, that is, the feature map extracted is adaptively adjusted based on spatial attention and channel attention. Spatial attention can help discover the spatial location of effective features, and channel attention can amplify the effective feature information in a specific channel.
  • the feature aggregation module is used to process the feature vector and input the processed features into the classification layer (such as the Softmax layer) to obtain the classification prediction results.
  • the feature aggregation module can be constructed using a recurrent neural network (RNN), such as a long short-term memory network (LSTM) or a gated recurrent unit (GRU). Taking LSTM as an example, the feature vector passes through the LSTM and a fully connected layer (FC) to obtain a vector representing the prediction result.
  • RNN: recurrent neural network
  • LSTM: long short-term memory network
  • GRU: gated recurrent unit
  • The feature aggregation module is a bidirectional LSTM with a self-attention mechanism, or a Bi-LSTM network.
  • the residual network ResNet is used as the backbone network of the feature extraction module, and the bidirectional LSTM (Bi-LSTM) is used to construct the feature aggregation module.
  • Bi-LSTM: bidirectional LSTM
  • ResNet uses shortcut connections to effectively solve the gradient vanishing and explosion problems of deep networks, significantly improves network performance, and makes the network easier to train.
  • The image is input into the first residual block to obtain a feature map of size 96 × 96 × 128.
  • A shallow feature map of size 48 × 48 × 256 is obtained.
  • In the second residual block, the convolution kernel size is 3 × 3 and the stride is 2.
  • A middle-level feature map of size 24 × 24 × 512 is obtained.
  • The middle-level feature map is input into the last residual block to obtain a deep feature map of size 8 × 8 × 256.
  • With each residual block, the spatial size of the feature map is halved and the number of channels is doubled.
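As a hedged illustration of the halving rule above, the sketch below applies the standard convolution output-size formula to the 3 × 3, stride-2 setting named in the text. The padding of 1 and the helper names `conv_output_size` and `downsample` are assumptions introduced here, not details taken from the specification; only the first two transitions (96 to 48 and 48 to 24) are reproduced.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample(shape):
    """Halve height and width with a 3x3, stride-2 convolution and double the channels."""
    height, width, channels = shape
    # Kernel 3 and stride 2 come from the text; padding 1 is an assumption.
    return (
        conv_output_size(height, 3, 2, 1),
        conv_output_size(width, 3, 2, 1),
        channels * 2,
    )


first = (96, 96, 128)         # output of the first residual block
shallow = downsample(first)   # shallow feature map
middle = downsample(shallow)  # middle-level feature map
print(shallow, middle)
```

Under these assumptions the computed sizes agree with the shallow and middle-level feature maps stated above.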
  • abdominal CT images were used for unsupervised training using a twin network structure to improve the backbone network’s ability to represent organ features.
  • the aim is to improve the ability of the backbone network f to classify images with different features by introducing two multi-layer perceptrons: projection layer g and prediction layer h.
  • Two augmented images, I1 and I2, are obtained from the base image I using a random augmentation function aug.
  • I1 and I2 pass through the network in the order of f, g, and h.
  • the outputs of the projection and prediction are recorded as p and q respectively.
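  • The loss function is defined symmetrically and is expressed as: $L = \frac{1}{2} D(p_1, \mathrm{stopgrad}(q_2)) + \frac{1}{2} D(p_2, \mathrm{stopgrad}(q_1))$
  • D represents the negative cosine similarity, which is expressed by the following formula: $D(p_1, q_2) = -\frac{p_1}{\lVert p_1 \rVert_2} \cdot \frac{q_2}{\lVert q_2 \rVert_2}$
  • The stop-gradient function stopgrad makes q1 and q2 be treated as constants in back-propagation, so the encoder is updated only according to the gradients of p1 and p2.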
  • the backbone f of the twin network can be converted into a pre-trained network.
  • the path projection layer is separated from the weight update to avoid unstable training.
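The symmetric pre-training loss described above can be sketched in plain Python. This is a minimal illustration, assuming the loss takes the common form 0.5·D(p1, stopgrad(q2)) + 0.5·D(p2, stopgrad(q1)) with D the negative cosine similarity. The function names are hypothetical, and because no automatic differentiation is involved here, stopgrad appears only as a comment.

```python
import math


def neg_cosine(a, b):
    """D(a, b): negative cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return -dot / (norm_a * norm_b)


def symmetric_loss(p1, q1, p2, q2):
    """Average of the two cross-view terms.

    q1 and q2 play the role of constants (stopgrad in the text); in a real
    framework they would be detached from the computation graph.
    """
    return 0.5 * neg_cosine(p1, q2) + 0.5 * neg_cosine(p2, q1)


# Two views whose outputs agree perfectly reach the minimum value of -1.
aligned = symmetric_loss([1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [1.0, 0.0])
# Orthogonal outputs give a loss of 0.
orthogonal = symmetric_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0])
```

Because D depends only on direction, scaling any of the vectors leaves the loss unchanged.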
  • the dual attention module consists of two sequential submodules: channel attention and spatial attention.
  • the input feature map is adaptively adjusted at each block of the feature extraction network.
  • Two pieces of spatial feature information are obtained through the average pooling layer and the maximum pooling layer, denoted $F^c_{avg}$ and $F^c_{max}$.
  • The two pieces of spatial feature information are then passed to a shared network with one hidden layer (or shared layer) to generate a channel attention feature map $M_c \in \mathbb{R}^{C \times 1 \times 1}$, where C is the number of channels, the activation size of the hidden layer is set to $\mathbb{R}^{C/r \times 1 \times 1}$, and r is the reduction rate.
  • $M_c$ output by the channel attention submodule is expressed as: $M_c(F) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))) = \sigma(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max})))$
  • F represents the input feature map
  • MLP represents the multi-layer perceptron
  • AvgPool represents average pooling
  • MaxPool represents maximum pooling
  • $W_0$ and $W_1$ represent the MLP weights
  • $\sigma$ represents the activation function
  • The channel-refined features are input into the average pooling layer and the maximum pooling layer to obtain the corresponding interpretations of the features, $F^s_{avg}$ and $F^s_{max}$. Then, the two interpretations are concatenated into a 2D map, on which a normal convolution is performed.
  • The output is $M_s(F) \in \mathbb{R}^{H \times W}$, which can be expressed by the following equation: $M_s(F) = \sigma(\mathrm{Conv}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)]))$
  • AvgPool represents average pooling, MaxPool represents maximum pooling, Conv represents the convolution, and $\sigma$ represents the activation function.
  • The input feature map of the dual attention module is $F \in \mathbb{R}^{C \times H \times W}$, where H is the height of the feature map, W is the width of the feature map, and C is the number of channels.
  • The input passes through the channel attention submodule and then the spatial attention submodule, during which $M_c \in \mathbb{R}^{C \times 1 \times 1}$ and $M_s \in \mathbb{R}^{1 \times H \times W}$ are expanded to the size of F before the Hadamard product is performed: $F' = M_c(F) \otimes F$, $F'' = M_s(F') \otimes F'$
  • F′′ represents the final attention-based feature map
  • F′ represents the output of the channel attention submodule
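The two sequential submodules described above can be illustrated with a small, dependency-free sketch. It is a simplification under stated assumptions rather than the claimed implementation: the hidden layer uses a ReLU, the activation σ is taken to be a sigmoid, and the "normal convolution" of the spatial branch is reduced to a 1 × 1 convolution with two hypothetical weights `w_avg` and `w_max`. The feature map is a nested list indexed as `F[c][h][w]`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def shared_mlp(v, W0, W1):
    # One hidden layer of size C/r; the ReLU is an assumption.
    hidden = [max(0.0, h) for h in matvec(W0, v)]
    return matvec(W1, hidden)


def channel_attention(F, W0, W1):
    """Mc: one weight in (0, 1) per channel."""
    flat = [[x for row in ch for x in row] for ch in F]
    avg = [sum(ch) / len(ch) for ch in flat]  # AvgPool over H x W
    mx = [max(ch) for ch in flat]             # MaxPool over H x W
    summed = [a + m for a, m in zip(shared_mlp(avg, W0, W1), shared_mlp(mx, W0, W1))]
    return [sigmoid(s) for s in summed]


def spatial_attention(F, w_avg, w_max):
    """Ms: one weight in (0, 1) per pixel, pooling across channels."""
    C, H, W = len(F), len(F[0]), len(F[0][0])
    Ms = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            column = [F[c][i][j] for c in range(C)]
            # 1x1 convolution over the two pooled maps (simplification).
            Ms[i][j] = sigmoid(w_avg * sum(column) / C + w_max * max(column))
    return Ms


def dual_attention(F, W0, W1, w_avg, w_max):
    Mc = channel_attention(F, W0, W1)
    # F' = Mc * F, broadcast over H x W
    F1 = [[[Mc[c] * x for x in row] for row in ch] for c, ch in enumerate(F)]
    Ms = spatial_attention(F1, w_avg, w_max)
    # F'' = Ms * F', broadcast over channels
    return [[[Ms[i][j] * x for j, x in enumerate(row)]
             for i, row in enumerate(ch)] for ch in F1]


F = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]  # C = 2, H = W = 2
W0 = [[0.5, 0.5]]     # (C/r) x C with reduction rate r = 2
W1 = [[1.0], [-1.0]]  # C x (C/r)
Mc = channel_attention(F, W0, W1)
refined = dual_attention(F, W0, W1, w_avg=1.0, w_max=1.0)
```

With these toy weights the first channel receives a larger channel weight than the second, and every refined value is a down-scaled copy of the input, which is the adaptive adjustment the text describes.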
  • RNN is designed to learn long-term dependencies and can process sequential data by re-inputting the output of a neuron at a certain moment into the same neuron or another neuron.
  • This serial network structure is suitable for data sequences such as image slices because it can preserve dependency information in the data sequence.
  • RNN consists of a repetitive structure and shared parameters, which can significantly reduce the number of neural network parameters required for training.
  • The shared parameter structure also allows the model to process input sequences of arbitrary length. Therefore, RNN is particularly suitable for extracting temporal information from image sequences.
  • Standard RNNs often have difficulty retaining memory over long sequences.
  • Standard RNNs may also suffer from exploding and vanishing gradients.
  • The present invention adopts a long short-term memory network (LSTM) to construct the feature aggregation module so as to store and update short-term memory in a more efficient and reliable manner, and solves the vanishing gradient problem through a carefully designed structure, making it possible to process long inputs.
  • LSTM contains three gates, namely input gate, forget gate and output gate.
  • the input gate controls how much input is allowed to pass through the memory.
  • the forget gate determines whether to retain the data in the memory.
  • the output gate determines how much memory is allowed to be output.
  • Each gate is controlled by an external signal, which means that the input of LSTM is four times that of traditional RNN.
  • Each control signal $z_i$, $z_f$, $z_o$ first passes through the activation function f(·) and is then multiplied by the main signal. After the input signal $z_t$ enters the neuron, it first passes through the activation function g(·) and is then multiplied by the input gate $f(z_i)$. Then, the signal passes through the memory unit, and $c_{t-1}$ in the memory unit is processed by the forget gate $f(z_f)$ and mixed with the new input for storage.
  • The new memory is represented as follows: $c_t = g(z_t) f(z_i) + c_{t-1} f(z_f)$
  • The mixed data passes through the activation function h(·) and is then multiplied by the output gate $f(z_o)$.
  • The activation function of the control signal is usually a Sigmoid function, which ranges from 0 to 1 and is designed to simulate a switch.
  • g(·) and h(·) are usually Tanh functions, which are designed to represent data from -1 to 1.
  • a network contains multiple LSTM neurons arranged side by side to form a memory array.
  • The input $x_t$ represents the t-th vector in the sequence, which is multiplied by the corresponding transformation matrices $W_f$, $W_i$, $W_I$, $W_O$ to generate the input signal and the control signals.
  • The number of neurons is equal to the number of vectors in the sequence, which in the present invention is equal to the number of slices of the 3D CT scan. Peephole connections are also introduced.
  • The memory $c_{t-1}$ and output $h_{t-1}$ of the previous neuron participate in signal generation using the transformation matrices $p_f$, $p_i$, $p_I$, $p_O$ and $R_f$, $R_i$, $R_I$, $R_O$.
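The gate arithmetic described above can be followed step by step in a scalar sketch. It assumes that f is a sigmoid and that g and h are tanh, as the text says is usual, and it omits the peephole and recurrent matrices for brevity, so the four pre-activation signals are passed in directly. The function name `lstm_step` is introduced here for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(z, z_i, z_f, z_o, c_prev):
    """One scalar LSTM update.

    z      : input signal
    z_i    : input-gate control signal
    z_f    : forget-gate control signal
    z_o    : output-gate control signal
    c_prev : memory from the previous step
    """
    # New memory: gated input plus gated previous memory.
    c_t = math.tanh(z) * sigmoid(z_i) + c_prev * sigmoid(z_f)
    # Output: squashed memory scaled by the output gate.
    h_t = math.tanh(c_t) * sigmoid(z_o)
    return c_t, h_t


# Input gate closed, forget and output gates open: the memory is carried over.
c_keep, h_keep = lstm_step(z=1.0, z_i=-50.0, z_f=50.0, z_o=50.0, c_prev=0.3)
# Forget gate closed, input gate open: the memory is overwritten by the input.
c_new, h_new = lstm_step(z=1.0, z_i=50.0, z_f=-50.0, z_o=50.0, c_prev=0.3)
```

The two calls show the switch-like role of the gates: in the first the old memory survives almost unchanged, in the second it is replaced by tanh of the new input.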
  • Bi-LSTM: bidirectional long short-term memory network
  • the feature aggregation module is shown in FIG5 .
  • the Bi-LSTM network is composed of two sub-networks.
  • The t-th output vector is represented as the concatenation: $h_t \oplus h'_t$
  • $h_t$ and $h'_t$ are the t-th outputs of the two LSTM sub-networks, and $\oplus$ represents a concatenation operation.
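The bidirectional arrangement can be sketched without committing to a particular cell. In the toy example below, each direction is any function that maps an input and a previous state to a new state; the helper names and the simple leaky-sum cell are assumptions made for illustration and stand in for the two LSTM sub-networks.

```python
def run_direction(xs, step, reverse=False):
    """Run a recurrent step over xs and return one output per position."""
    order = reversed(xs) if reverse else xs
    state, outputs = 0.0, []
    for x in order:
        state = step(x, state)
        outputs.append([state])
    # Re-align the backward outputs with the original positions.
    return outputs[::-1] if reverse else outputs


def bidirectional(xs, step_forward, step_backward):
    """Concatenate forward and backward outputs position by position."""
    forward = run_direction(xs, step_forward)
    backward = run_direction(xs, step_backward, reverse=True)
    return [f + b for f, b in zip(forward, backward)]  # list + list = concatenation


def leaky(x, state):
    """Toy recurrent cell standing in for an LSTM (assumption)."""
    return 0.5 * state + x


outputs = bidirectional([1.0, 2.0, 3.0], leaky, leaky)
```

Each position ends up with a vector that reflects both the slices before it and the slices after it, which is the motivation for reading the slice sequence in both directions.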
  • the prediction results are highly correlated not only with the fused features, but also with the position of the slices. This is because slices have different effects on the diagnosis.
  • The middle slice contains the largest image of the pancreas. To exploit this characteristic of the slice sequence, a self-attention mechanism is introduced to automatically find important feature vectors that should be given more weight.
  • $u_w$ is the context vector that needs to be learned during training, and T represents the transpose.
  • a Softmax layer is used as the classification layer.
  • the array S is considered to contain pancreatic features that can be transformed and fused together.
  • The prediction is a binary classification problem, so the final result is a two-dimensional vector, which can be expressed as: $\mathrm{Softmax}(W_s S + b_s)$
  • Ws is the transformation matrix and bs is the bias.
  • the vector of the positive sample is specified as (1,0) and the vector of the negative sample is specified as (0,1).
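The weighting and classification steps can be illustrated together. The sketch below is a simplified reading of the text, not the exact formulation: each slice vector is scored by its dot product with the learned context vector u_w, the scores are normalized with a softmax, the weighted sum forms S, and a linear layer followed by a softmax yields the two-dimensional prediction. Any additional projection applied before scoring is omitted, and all names and toy values are assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(H, u_w):
    """Weight slice vectors by similarity to the context vector u_w."""
    scores = [sum(h_k * u_k for h_k, u_k in zip(h, u_w)) for h in H]
    alphas = softmax(scores)
    dim = len(H[0])
    S = [sum(a * h[k] for a, h in zip(alphas, H)) for k in range(dim)]
    return S, alphas


def classify(S, W_s, b_s):
    """Softmax(W_s S + b_s): probabilities for the two classes."""
    logits = [sum(w * s for w, s in zip(row, S)) + b for row, b in zip(W_s, b_s)]
    return softmax(logits)


# Three slice vectors; the middle one carries the strongest features.
H = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
u_w = [1.0, 1.0]                 # context vector (learned in practice)
W_s = [[1.0, 0.0], [0.0, 1.0]]   # toy transformation matrix
b_s = [0.0, 0.0]                 # toy bias

S, alphas = attention_pool(H, u_w)
probs = classify(S, W_s, b_s)
```

In this toy input the middle slice receives the largest attention weight, mirroring the observation that central slices show the most pancreatic tissue.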
  • Step S120 constructing a data set and training a deep learning model, wherein the data set reflects the correspondence between the sample image sequence and the pancreas classification label.
  • the dataset can be obtained by scanning multiple subjects, and the sample images of each subject are annotated with classification labels by professional physicians, that is, the dataset reflects the correspondence between the image slice sequence and the pancreatic classification label.
  • The optimized parameters of the deep learning model, such as weights and biases, can be obtained.
  • Step S130 classify the target pancreatic image using the trained deep learning model.
  • After training, the model can be applied to actual pancreatic image classification, which includes: obtaining an image sequence of the target area; and inputting the image sequence into the deep learning model to obtain a pancreatic classification result.
  • an ablation experiment is performed on the acquired data set to evaluate the effectiveness of the proposed deep learning model through some given evaluation formulas.
  • other methods are compared with the present invention in terms of accuracy and complexity. Then, the performance of the proposed hybrid attention network is evaluated using different data amounts and sequence lengths.
  • the network is implemented in PyTorch.
  • the experiments are run on NVIDIA A40 hardware with 64GB RAM.
  • the parameters remain unchanged during training.
  • the optimizer is Adam, the initial learning rate is 0.0001, halved every 10 epochs, and the number of epochs is set to 100.
  • batch normalization is performed before each activation function.
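The learning-rate schedule stated above (initial rate 0.0001, halved every 10 epochs, 100 epochs in total) can be written as a one-line rule. The sketch assumes epochs are counted from 0; the function name `learning_rate` is hypothetical.

```python
def learning_rate(epoch: int, initial: float = 1e-4, step: int = 10, factor: float = 0.5) -> float:
    """Step decay: multiply the rate by `factor` once every `step` epochs."""
    return initial * factor ** (epoch // step)


# Rate at the start of each decade of a 100-epoch run.
schedule = [learning_rate(e) for e in range(0, 100, 10)]
```

In PyTorch, a comparable effect would typically be obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)` attached to the Adam optimizer, although the text does not say which scheduler was used.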
  • A large abdominal CT dataset (AbdomenCT-1K) is introduced in the pre-training step.
  • AbdomenCT-1K provides different annotations for various organ segmentation tasks and contains more than 1,000 scan sequences from 12 medical centers.
  • The dataset in the experiment comes from Xiangya Hospital and contains 153 subjects. It was acquired using a Siemens Prisma, 1.5 Tesla, model-syngo MRE11 scanner. The echo time was set to 1.33 and the repetition time was 321.79. The size of the original image was 1024 × 1024. The diagnosis was made by a professional physician. Written consent was obtained from all subjects participating in the experiment.
  • Figure 6 shows some examples of the two types of samples, where the left column is normal samples and the right column corresponds to acute pancreatitis samples.
  • The Pillow library is used to compress the images to a size of 224 × 244 before training.
  • an augmentation method is applied to the training data by performing random augmentation function transformations such as scaling, rotation, and gamma changes.
  • the importance of the dual attention module is experimentally evaluated.
  • the purpose of applying the attention mechanism is to focus on relevant areas at the channel level and pixel level and suppress irrelevant features.
  • the work of the module is visualized using heat maps from the Matplotlib library. As shown in Figure 4, the tokens with larger weights in the network are marked with colors.
  • the experimental results show that the attention mechanism proposed in the present invention can enable the network to focus on the area where the pancreas is located.
  • the size of the focused area varies, corresponding to the size of the pancreatic tissue.
  • the dual attention module can highlight the relevant area by learning the high-level features of the organ. Therefore, even if the pancreatic tissue becomes very small in the marginal layer, the module can locate it.
  • the attention module still tries to highlight certain areas near the location of the organ. This is because the characteristics of CT scans make organs and tissues appear in the same color, which affects the function of the module.
  • ADGNET: This network consists of a ResNet with an attention module, a classifier, and a decoder.
  • The classifier and decoder work simultaneously to perform classification and reconstruction. This model is intended for use in diagnosing Alzheimer's disease.
  • 3DResAttNet: This network is a 3D residual self-attention CNN with high interpretability.
  • The adopted self-attention module works as a mapping function, which contains keys, values, and queries.
  • The features extracted by each convolutional block are converted into vectors through 1 × 1 × 1 convolutions. These values are focused by the query function during training.
  • AG-CNN: This network consists of three subnetworks. First, the attention prediction subnetwork generates attention for glaucoma diagnosis. Second, the pathological region localization subnetwork combines the generated attention map to form a masked feature map. Finally, the classification subnetwork receives the masked feature map and outputs the prediction result.
  • the classification performance of each classification method based on multiple measurements is shown in Table 3. It can be seen from Table 3 that the network proposed in the present invention has the highest precision and sensitivity. Compared with AG-CNN, the precision of the present invention is improved by 12.19% and the sensitivity is improved by 8.94%. Since the feature extraction subnets of other methods have similar structures, the reason why the network of the present invention can achieve high performance is that LSTM, which can extract deep features from slice sequences, is applied. In contrast, other 2D networks rely only on a single image, and their classification accuracy is highly dependent on this input. 3D networks can also process deep features, but they contain more weights, which increases the amount of calculation.
  • Att-CNN-RNN proposed in the present invention has better classification performance, high sensitivity and accuracy, while still maintaining a competitive advantage in terms of parameter size.
  • the present invention proposes a hybrid attention network for classifying pancreatic CT scans, which as a whole includes a feature extraction module and a feature aggregation module, wherein the feature extraction module is based on a residual network as the backbone, taking into account both channel attention and spatial attention.
  • This backbone network can solve the problems of gradient vanishing and explosion, which helps to deepen the network.
  • an improved recurrent neural network is used in the feature aggregation module to aggregate the feature vectors from the feature extraction module. Further, the network is verified on the acquired data set.
  • the present invention may be a system, a method and/or a computer program product.
  • the computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.
  • Computer readable storage medium can be a tangible device that can hold and store instructions used by an instruction execution device.
  • Computer readable storage medium can be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof.
  • A non-exhaustive list of more specific examples of computer-readable storage media includes: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disk read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanical encoding device (for example, a punch card or a raised structure in a groove on which instructions are stored), and any suitable combination thereof.
  • the computer readable storage medium used here is not interpreted as a transient signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagated by a waveguide or other transmission medium (for example, a light pulse by an optical fiber cable), or an electrical signal transmitted by a wire.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network can include copper transmission cables, optical fiber transmissions, wireless transmissions, routers, firewalls, switches, gateway computers, and/or edge servers.
  • the network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device.
  • the computer program instructions for performing the operation of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages, such as Smalltalk, C++, Python, etc., and conventional procedural programming languages, such as "C" language or similar programming languages.
  • Computer-readable program instructions may be executed entirely on a user's computer, partially on a user's computer, as an independent software package, partially on a user's computer and partially on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer via any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., using an Internet service provider to connect via the Internet).
  • an electronic circuit such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), may be personalized by utilizing the state information of the computer-readable program instructions, and the electronic circuit may execute the computer-readable program instructions, thereby realizing various aspects of the present invention.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device so that a series of operating steps are performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, thereby causing the instructions executed on the computer, other programmable data processing apparatus, or other device to implement the functions/actions specified in one or more boxes in the flowchart and/or block diagram.
  • each box in the flowchart or block diagram can represent a module, a program segment or a part of an instruction, and the module, a program segment or a part of an instruction contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the box can also occur in a different order from the order marked in the accompanying drawings. For example, two consecutive boxes can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
  • each box in the block diagram and/or the flowchart, and the combination of the boxes in the block diagram and/or the flowchart can be implemented by a dedicated hardware-based system that performs the specified function or action, or can be implemented by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that it is equivalent to implement it by hardware, implement it by software, and implement it by combining software and hardware.

Landscapes

  • Engineering & Computer Science
  • Computer Vision & Pattern Recognition
  • Physics & Mathematics
  • General Physics & Mathematics
  • Theoretical Computer Science
  • Image Analysis

Abstract

Disclosed in the present invention is a method for classifying pancreatic images based on a hybrid attention network. The method comprises: acquiring an image sequence of a target area; and inputting the image sequence into a deep learning model to obtain a pancreas classification result, wherein the deep learning model comprises a feature extraction module, a feature aggregation module, and a classification layer, wherein the feature extraction module is used for extracting feature maps of different depths for the image sequence to output feature vectors, the feature maps of different depths being feature maps adaptively adjusted on the basis of a spatial attention mechanism and a channel attention mechanism by using a dual-attention module, the feature aggregation module is used for capturing time information of the feature vectors and performing weighting by using a self-attention mechanism to obtain aggregated feature vectors, and the classification layer is used for classifying the aggregated feature vectors. According to the present invention, pancreatic images can be accurately classified, thereby guiding automatic diagnosis of pancreatitis.

Description

A method for classifying pancreatic images based on a hybrid attention network
Technical Field
The present invention relates to the technical field of medical image processing, and more specifically, to a method for classifying pancreatic images based on a hybrid attention network.
Background
Acute pancreatitis (AP) is a common clinical emergency, typically characterized by acute chemical inflammation such as edema and inflammatory exudation of the pancreas and surrounding tissues. At present, the clinical diagnosis of acute pancreatitis draws on clinical manifestations, laboratory tests, and imaging examinations, the latter including computed tomography (CT) and magnetic resonance imaging (MRI). Because CT offers fast scanning and powerful post-processing, the CT severity index (CTSI) is often used to grade the severity of AP, but this requires physicians to perform manual analysis that is time-consuming, inefficient, and of low accuracy. In recent years, machine learning has become a powerful tool for analyzing medical images. In particular, using deep learning for end-to-end detection on CT images to assist disease diagnosis has become a research hotspot. However, accurate diagnosis of acute pancreatitis remains difficult because of the high anatomical variability of the pancreas, the diversity of acute pancreatitis lesions, and the complexity of their imaging appearance. In addition, the manifestations of acute pancreatitis vary from person to person, which further complicates the CT images. How to effectively extract target features from pancreatic images has therefore become the main problem in CT-based diagnosis.
Convolutional neural networks (CNNs) are a common tool for image classification, offer excellent classification accuracy and robustness, and have been applied to the diagnosis of pancreatic diseases. For example, researchers have proposed a mask-to-mesh segmentation method that first uses nnU-Net to generate a mask from 3D CT scans. A mask-to-mesh function then outputs the pancreatic anatomy as a mesh model. Finally, the generated mesh model is fed into a graph residual network to quantitatively classify pancreatic ductal adenocarcinoma. This scheme requires segmentation of the pancreas to achieve high performance, but fully supervised training of the segmentation algorithm requires data annotation by physicians, which is very time-consuming and depends on the physician's expertise.
Region-of-interest (ROI) generation algorithms can mark the boundaries of pancreatic problem areas on CT images, which makes it easier to focus on pancreatic features. ROIs were originally used to locate the pancreas for manual measurement, but they can also be used in the automatic diagnosis of pancreatic diseases. For example, researchers have proposed a region-based CNN model that creates ROIs from feature maps. However, ROI-based diagnosis has the following problems. First, the ROI is constrained to a rectangle, which differs from the true region; the predicted region therefore contains irrelevant background, which degrades classification performance. Second, the accuracy of the model depends heavily on the accuracy of the ROI generation algorithm, which is unreliable and often produces misplaced regions. Third, detecting the pancreas is itself challenging because of its size and variability.
In summary, existing pancreas identification schemes are time-consuming, insufficiently accurate, and adapt poorly to individual variation.
Summary of the Invention
The purpose of the present invention is to overcome the above defects of the prior art and to provide a method for classifying pancreatic images based on a hybrid attention network. The method comprises the following steps:
acquiring an image sequence of a target area;
inputting the image sequence into a deep learning model to obtain a pancreas classification result;
wherein the deep learning model includes a feature extraction module, a feature aggregation module, and a classification layer. The feature extraction module extracts feature maps of different depths from the image sequence and outputs feature vectors; the feature maps of different depths are adaptively adjusted by a dual attention module based on a spatial attention mechanism and a channel attention mechanism. The feature aggregation module captures the temporal information of the feature vectors and weights them with a self-attention mechanism to obtain aggregated feature vectors, and the classification layer classifies the aggregated feature vectors.
Compared with the prior art, the advantage of the present invention is that, in view of the anatomical variability of acute pancreatitis on computed tomography, it proposes a hybrid attention network (Att-CNN-RNN) for classifying pancreatic images. Applying the hybrid attention network to pancreatic images yields a robust and accurate method for the automatic diagnosis of pancreatitis, overcoming the drawbacks of manual classification, which is time-consuming and highly dependent on the physician's ability.
Further features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow chart of a method for classifying pancreatic images based on a hybrid attention network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the overall structure of a hybrid attention network according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the process of pre-training a backbone network based on a Siamese (twin) network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the structure of a dual attention module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the structure of a feature aggregation module according to an embodiment of the present invention;
FIG. 6 is a comparison of a normal sample and an acute pancreatitis sample according to an embodiment of the present invention;
FIG. 7 is an attention map of different slices according to an embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specifically stated, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present invention.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the invention, its application, or its uses.
Techniques, methods, and equipment known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be considered part of the specification.
In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary and not as limiting. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that like reference numerals and letters denote similar items in the following figures; once an item is defined in one figure, it need not be discussed further in subsequent figures.
In view of the problems of ROI-based pancreas localization, the present invention introduces an attention mechanism that automatically focuses on informative content without any shape restriction. In image processing, the goal of an attention mechanism is to make the network learn to ignore irrelevant information and concentrate on key information by marking key features in a new weight layer. Attention methods fall into two types: soft attention and hard attention. Hard attention focuses on points and is implemented through reinforcement learning; soft attention, by contrast, focuses on regions or channels and can be trained within the network, since its weights are learned through forward propagation and backpropagation. Embedding an attention mechanism in the original network can yield higher prediction accuracy.
In the following, pancreatic CT images are used as an example to describe the classification process. The original CT data is a three-dimensional scan. To obtain more key information, data in all dimensions could be used, but a 3D network contains a very large number of weights and is computationally expensive. Another feasible solution is to use a sequence of consecutive CT images as input, consisting of vertically oriented axial slices. However, if each feature map is finally flattened and concatenated with the others, the depth information in the slices is lost. In addition, the pancreas occupies a large part of the slices in the middle of the image sequence but only a very small area, if it has not disappeared altogether, in the slices at the beginning or end of the sequence. The feature maps of different slices therefore influence the final classification to different degrees. Taking these limitations into account, the present invention proposes a hybrid attention network for classifying pancreatic images.
Specifically, as shown in FIG. 1, the provided method for classifying pancreatic images based on a hybrid attention network includes the following steps.
Step S110: construct a deep learning model that includes a feature extraction module, a feature aggregation module, and a classification layer, where the feature extraction module adopts a dual attention mechanism and the feature aggregation module adopts a self-attention mechanism.
As shown in FIG. 2, the deep learning model includes a feature extraction module, a feature aggregation module, and a classification layer.
The feature extraction module extracts feature vectors from the input image sequence (also called the image slice sequence) and outputs a feature vector for each slice. Its backbone can be built on various types of networks, such as the residual network ResNet or the VGG network. In addition, a dual attention mechanism is embedded in the feature extraction module, that is, the extracted feature maps are adaptively adjusted based on spatial attention and channel attention. Spatial attention helps locate the spatial position of effective features, and channel attention amplifies the effective feature information in specific channels.
The feature aggregation module processes the feature vectors and feeds the processed features into the classification layer (such as a Softmax layer) to obtain the classification prediction. It can be built with a recurrent neural network (RNN), such as a long short-term memory network (LSTM) or a gated recurrent unit (GRU). Taking LSTM as an example, the feature vectors pass through the LSTM and a fully connected (FC) layer to produce a vector representing the prediction. Preferably, the feature aggregation module is a bidirectional LSTM with a self-attention mechanism, also called an abi-LSTM network.
In the following, the description uses the residual network ResNet as the backbone of the feature extraction module and a bidirectional LSTM (Bi-LSTM) as the feature aggregation module.
1) ResNet backbone network
In a convolutional network with multiple convolutional blocks, the dual attention mechanism is introduced in each block. The present invention adopts a pre-trained ResNet backbone. ResNet uses shortcut connections to effectively address the vanishing and exploding gradient problems of deep networks, which significantly improves network performance and makes the network easier to train.
Specifically, the image is first fed into the first residual block to obtain a 96×96×128 feature map. Three convolutional layers (for example, with 3×3 kernels and a stride of 2) then produce a shallow feature map of size 48×48×256. The second residual block (3×3 kernels, stride 2) yields a middle-level feature map of size 24×24×512. The middle-level feature map is then fed into the last residual block to obtain a deep feature map of size 8×8×256. Each residual block halves the spatial size of the feature map and doubles the number of channels.
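The halving of the spatial size under a 3×3, stride-2 convolution can be checked with the standard output-size formula. The sketch below is illustrative only: the padding of 1 is an assumption (the text states only the kernel size and stride), and `conv_out_size` is a hypothetical helper name.

```python
def conv_out_size(n: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial output size of a convolution (standard floor formula).

    padding=1 is an assumption; the text gives only kernel size and stride.
    """
    return (n + 2 * padding - kernel) // stride + 1


# Sizes quoted in the text: 96 -> 48 -> 24 under a 3x3, stride-2 convolution.
sizes = [96]
for _ in range(2):
    sizes.append(conv_out_size(sizes[-1]))
print(sizes)
```

Under these assumptions each stride-2 stage halves the side length, matching the 96, 48, and 24 sizes given for the first three feature maps.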
In the pre-training step, 215,325 abdominal CT slices are used for unsupervised training with a Siamese (twin) network structure, to improve the backbone's ability to represent organ features.
As shown in FIG. 3, pre-training aims to improve the ability of the backbone network f to classify images with different features by introducing two multi-layer perceptrons: a projection layer g and a prediction layer h. First, two augmented images are obtained from the base image I using a random augmentation function aug:
$I_1 = \mathrm{aug}(I)$  (1)
$I_2 = \mathrm{aug}(I)$  (2)
Then $I_1$ and $I_2$ pass through the network in the order f, g, h. The outputs of the projection and the prediction are denoted p and q, respectively:
$p_1 = g(f(I_1))$  (3)
$p_2 = g(f(I_2))$  (4)
$q_1 = h(p_1)$  (5)
$q_2 = h(p_2)$  (6)
During pre-training, the loss function is defined in symmetric form as:
Figure PCTCN2022133719-appb-000001
where D denotes the negative cosine similarity, given by:
Figure PCTCN2022133719-appb-000002
The stop-gradient function stopgrad causes $q_1$ and $q_2$ to be treated as constants during back-propagation, so the encoder is updated only according to the gradients of $p_1$ and $p_2$.
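As an illustration of the symmetric loss described above, the following minimal sketch computes its forward value. It assumes that D has the usual form D(a, b) = -(a · b) / (‖a‖ ‖b‖) and that the loss pairs each projection with the stop-gradient of the other view's prediction, L = ½ D(p1, stopgrad(q2)) + ½ D(p2, stopgrad(q1)); both forms are inferred from the surrounding text rather than taken from the equations themselves. The names `neg_cosine` and `symmetric_loss` are hypothetical.

```python
import numpy as np


def neg_cosine(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> float:
    """Negative cosine similarity between two vectors."""
    a = a / (np.linalg.norm(a) + eps)
    b = b / (np.linalg.norm(b) + eps)
    return float(-np.dot(a, b))


def symmetric_loss(p1, q1, p2, q2) -> float:
    """Forward value of the assumed symmetric loss.

    In an autograd framework q1 and q2 would be wrapped in a stop-gradient
    (for example, detached); NumPy has no gradients, so only the value is shown.
    """
    return 0.5 * neg_cosine(p1, q2) + 0.5 * neg_cosine(p2, q1)


rng = np.random.default_rng(0)
p = rng.normal(size=8)
# When all four vectors coincide the loss reaches its minimum, close to -1.
print(symmetric_loss(p, p, p, p))
```

The loss is bounded between -1 and 1 and decreases as the two augmented views are mapped to more similar representations.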
After training is complete, the backbone f of the Siamese network can be reused as the pre-trained network. The projection path is detached from the weight update to avoid unstable training.
2) Dual attention module
The dual attention module consists of two sequential submodules: channel attention and spatial attention. The input feature map is adaptively adjusted at each block of the feature extraction network.
As shown in FIG. 4, for the input feature map, two spatial feature descriptors are first obtained through an average pooling layer and a max pooling layer, denoted
Figure PCTCN2022133719-appb-000003
and
Figure PCTCN2022133719-appb-000004
These two descriptors are then passed to a shared network with one hidden layer (also called a shared layer) to generate the channel attention map $M_c \in \mathbb{R}^{C \times 1 \times 1}$, where C is the number of channels, the activation size of the hidden layer is set to $\mathbb{R}^{C/r \times 1 \times 1}$, and r is the reduction ratio. The output $M_c$ of the channel attention submodule is expressed as:
Figure PCTCN2022133719-appb-000005
where F denotes the input feature map, MLP denotes the multi-layer perceptron, AvgPool denotes average pooling, MaxPool denotes max pooling, $W_0$ and $W_1$ denote weights, and σ denotes the activation function.
Next, the channel-refined features are fed into an average pooling layer and a max pooling layer to obtain the corresponding descriptors
Figure PCTCN2022133719-appb-000006
and
Figure PCTCN2022133719-appb-000007
The two descriptors are then concatenated into a 2D map, on which a standard convolution is applied. The output is $M_s(F) \in \mathbb{R}^{H \times W}$, expressed as:
Figure PCTCN2022133719-appb-000008
where AvgPool denotes average pooling and MaxPool denotes max pooling.
As shown in FIG. 4, the input feature map of the dual attention module is $F \in \mathbb{R}^{C \times H \times W}$, where H is the height of the feature map, W is its width, and C is the number of channels. The input passes through the channel attention submodule and then the spatial attention submodule; in the process, $M_c \in \mathbb{R}^{C \times 1 \times 1}$ and $M_s \in \mathbb{R}^{1 \times H \times W}$ are expanded to the size of F and applied through a Hadamard product. The process is expressed as:
Figure PCTCN2022133719-appb-000009
where F″ denotes the final attention-based feature map and F′ denotes the output of the channel attention submodule.
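The channel-then-spatial flow described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the ReLU in the shared MLP, the sigmoid as σ, the 7×7 kernel, zero padding, and the forms F′ = M_c(F) ⊗ F and F″ = M_s(F′) ⊗ F′ follow the common convolutional block attention design and are not confirmed by the text. All function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(F, W0, W1):
    """M_c: shared two-layer MLP applied to average- and max-pooled descriptors."""
    avg = F.mean(axis=(1, 2))                    # (C,)
    mx = F.max(axis=(1, 2))                      # (C,)

    def mlp(v):
        return W1 @ np.maximum(W0 @ v, 0.0)      # ReLU hidden layer (assumed)

    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]   # (C, 1, 1)


def spatial_attention(F, kernel):
    """M_s: one convolution over the channel-wise average and max maps."""
    pooled = np.stack([F.mean(axis=0), F.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]                         # odd kernel size assumed
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    H, W = F.shape[1:]
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            # Cross-correlation, as in deep learning "convolution" layers.
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)[None]                    # (1, H, W)


def dual_attention(F, W0, W1, kernel):
    F1 = channel_attention(F, W0, W1) * F        # F'  = M_c(F) * F, broadcast over H, W
    return spatial_attention(F1, kernel) * F1    # F'' = M_s(F') * F', broadcast over C


rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 4
F = rng.normal(size=(C, H, W))
out = dual_attention(F,
                     rng.normal(size=(C // r, C)),   # W0
                     rng.normal(size=(C, C // r)),   # W1
                     rng.normal(size=(2, 7, 7)))     # spatial kernel
print(out.shape)
```

Because both attention maps take values in (0, 1), the module can only attenuate activations; NumPy broadcasting performs the expansion of M_c and M_s to the size of F.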
3) Recurrent neural network
An RNN is designed to learn long-term dependencies and processes sequential data by feeding the output of a neuron at one moment back into the same neuron or another neuron. This serial structure suits data sequences such as image slices because it preserves the dependency information in the sequence. An RNN consists of a repeated structure with shared parameters, which significantly reduces the number of parameters that must be trained. The shared-parameter structure also allows the model to process input sequences of arbitrary length. RNNs are therefore particularly suitable for extracting temporal information from image sequences. However, practice has shown that a standard RNN often struggles to retain memory over long spans, and it may also suffer from exploding and vanishing gradients.
Preferably, the present invention adopts a long short-term memory network (LSTM) to build the feature aggregation module, so that short-term memory is stored and updated more efficiently and reliably. Its carefully designed structure addresses the vanishing gradient problem and makes it possible to process long inputs.
An LSTM contains three gates: the input gate, the forget gate, and the output gate. The input gate controls how much of the input is allowed into the memory. The forget gate determines whether the data in the memory is retained. The output gate determines how much of the memory is allowed to be output. Each gate is controlled by an external signal, which means that an LSTM takes four times as many inputs as a traditional RNN.
Each control signal $z_i$, $z_f$, $z_o$ first passes through the activation function $f(\cdot)$ and is then multiplied by the main signal. After the input signal $z_t$ enters the neuron, it first passes through the activation function $g(\cdot)$ and is then multiplied by the input gate $f(z_i)$. The signal then reaches the memory cell, where the previous memory $c_{t-1}$, after being processed by the forget gate $f(z_f)$, is combined with the new input and stored. The new memory is expressed as:
$c_t = c_{t-1} \times f(z_f) + g(z_t)\, f(z_i)$  (12)
Finally, the combined data passes through the activation function $h(\cdot)$ and is multiplied by the output gate $f(z_o)$:
$y = h(c_t) \times f(z_o)$  (13)
The activation function of the control signals is usually the sigmoid function, whose values lie between 0 and 1, so that it mimics a switch. $g(\cdot)$ and $h(\cdot)$ are usually tanh functions, whose values lie between -1 and 1.
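A single memory update following Eq. (12) and Eq. (13) can be written directly in NumPy. This is a minimal sketch that takes the sigmoid for f and tanh for g and h, the usual choices named in the text; `lstm_step` is a hypothetical name, and the signals z are assumed to have been computed already.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(z_t, z_i, z_f, z_o, c_prev):
    """One LSTM update following Eq. (12) and Eq. (13).

    f is the sigmoid (gates); g and h are tanh (input and output activations).
    """
    c_t = c_prev * sigmoid(z_f) + np.tanh(z_t) * sigmoid(z_i)   # Eq. (12)
    y = np.tanh(c_t) * sigmoid(z_o)                             # Eq. (13)
    return c_t, y


# Forget gate wide open, input gate shut: the memory is carried over unchanged.
c_prev = np.array([0.5, -0.3])
c_t, y = lstm_step(z_t=np.array([1.0, 1.0]),
                   z_i=np.array([-50.0, -50.0]),
                   z_f=np.array([50.0, 50.0]),
                   z_o=np.array([0.0, 0.0]),
                   c_prev=c_prev)
print(np.allclose(c_t, c_prev))
```

The example shows the switch-like role of the gates: a strongly positive forget signal keeps the old memory, while a strongly negative input signal blocks the new input.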
In practice, a network contains multiple LSTM neurons arranged side by side, forming a memory array. The input $x_t$ is the t-th vector in the sequence; it is multiplied by the corresponding transformation matrices $W_f, W_i, W_I, W_o$ to produce the input signal and the control signals
Figure PCTCN2022133719-appb-000010
The number of neurons equals the number of vectors in the sequence, which in the present invention is the number of slices in the 3D CT scan. Peephole connections are also introduced: the memory $c_{t-1}$ and output $h_{t-1}$ of the previous neuron take part in signal generation through the transformation matrices $p_f, p_i, p_I, p_o$ and $R_f, R_i, R_I, R_o$.
A 3D network sees the data on both sides along the depth direction, so at position t it can capture information from slice t+1. A unidirectional LSTM, however, remembers only the information before the current time step and ignores what follows. The bidirectional long short-term memory network (Bi-LSTM) extends the unidirectional LSTM with a second layer whose memory flows in the opposite order. A Bi-LSTM can therefore use the information both before and after a given position, emulating the conditions in a 3D network by taking bidirectional information into account.
In one embodiment, the feature aggregation module is as shown in FIG. 5; the Bi-LSTM network consists of two sub-networks. The t-th output vector is expressed as:
Figure PCTCN2022133719-appb-000011
where $h_t$ and $h'_t$ are the t-th outputs of the LSTMs, and
Figure PCTCN2022133719-appb-000012
denotes the concatenation operation.
The prediction is highly correlated not only with the fused features but also with the position of each slice, because different slices contribute differently to the diagnosis; the middle slices contain the largest view of the pancreas. To exploit this property of the slice sequence, a self-attention mechanism is introduced to automatically find the important feature vectors that should be given more weight.
The self-attention mechanism is described as follows. The output of the Bi-LSTM is an array of concatenated vectors $V = \{v_1, \ldots, v_l\}$, which is fed into a multilayer perceptron to obtain the representation $U = \{u_1, \ldots, u_l\}$:
$u_t = \tanh(W_w h_t + b_w)$  (15)
Then the quantity
Figure PCTCN2022133719-appb-000013
which indicates the importance of the t-th slice within the sequence, can be expressed as:
Figure PCTCN2022133719-appb-000014
where $u_w$ is a context vector learned during training and T denotes the transpose. Finally, the new weighted array $S = \{s_1, \ldots, s_l\}$ is given by:
Figure PCTCN2022133719-appb-000015
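The slice-weighting step can be sketched as below. The sketch assumes the common formulation in which the importance of slice t is a softmax over the scores u_t^T u_w and the weighted vector is s_t = α_t v_t; these forms are inferred from the symbol definitions above and are not confirmed by the text. It also applies Eq. (15) to the concatenated Bi-LSTM outputs v_t. The name `slice_attention` and the dimensions are hypothetical.

```python
import numpy as np


def slice_attention(V, W_w, b_w, u_w):
    """Weight each slice vector by a softmax score against a context vector.

    V: (l, d) Bi-LSTM outputs; W_w: (a, d); b_w: (a,); u_w: (a,).
    Returns the weights alpha (l,) and the weighted vectors S (l, d).
    """
    U = np.tanh(V @ W_w.T + b_w)                 # Eq. (15), applied to every slice
    scores = U @ u_w                             # u_t^T u_w for each t
    scores = scores - scores.max()               # stabilise the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    S = alpha[:, None] * V                       # s_t = alpha_t * v_t (assumed form)
    return alpha, S


rng = np.random.default_rng(0)
l, d, a = 5, 4, 3                                # slices, feature size, attention size
alpha, S = slice_attention(rng.normal(size=(l, d)), rng.normal(size=(a, d)),
                           rng.normal(size=a), rng.normal(size=a))
print(alpha.sum(), S.shape)
```

The weights are positive and sum to one across the sequence, so slices whose representation aligns with the learned context vector contribute more to the aggregated features.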
4) Classification layer
At the end of the deep learning model, a Softmax layer is used as the classification layer. The array S is regarded as containing the pancreatic features, which can be transformed and fused together. The prediction is a binary classification problem, so the final result
Figure PCTCN2022133719-appb-000016
is a two-dimensional vector, given by:
Figure PCTCN2022133719-appb-000017
where $W_s$ is the transformation matrix and $b_s$ is the bias. The vector of a positive sample is specified as (1, 0) and that of a negative sample as (0, 1).
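A minimal sketch of the classification step follows. It assumes that the weighted slice vectors are fused by summation before the linear map and softmax; the text says only that the features are "transformed and fused", so the fusion rule is an assumption. `classify` is a hypothetical name.

```python
import numpy as np


def classify(S, W_s, b_s):
    """Fuse weighted slice vectors by summation (assumed) and apply softmax."""
    fused = S.sum(axis=0)                        # (d,)
    logits = W_s @ fused + b_s                   # (2,)
    logits = logits - logits.max()               # stabilise the softmax
    return np.exp(logits) / np.exp(logits).sum()


rng = np.random.default_rng(0)
y_hat = classify(rng.normal(size=(5, 4)), rng.normal(size=(2, 4)), np.zeros(2))
# Index 0 corresponds to the positive target (1, 0), index 1 to the negative (0, 1).
label = "positive" if y_hat.argmax() == 0 else "negative"
print(y_hat, label)
```

The output is a two-component probability vector, which can be compared directly with the one-hot targets (1, 0) and (0, 1).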
Step S120: construct a dataset and train the deep learning model; the dataset reflects the correspondence between sample image sequences and pancreas classification labels.
The dataset can be obtained by scanning multiple subjects. The sample images of each subject are labeled with a classification label by a professional physician, so that the dataset reflects the correspondence between image slice sequences and pancreas classification labels. Training yields the optimized parameters of the deep learning model, such as weights and biases.
Step S130: classify the target pancreatic images with the trained deep learning model.
Once the optimized parameters are obtained, the model can be applied to actual pancreatic image classification: acquire an image sequence of the target area, then input the image sequence into the deep learning model to obtain the pancreas classification result.
To further verify the effect of the present invention, ablation experiments are performed on the acquired dataset, and the effectiveness of the proposed deep learning model is assessed with given evaluation metrics. Other methods are also compared with the present invention in terms of accuracy and complexity. The performance of the proposed hybrid attention network is then evaluated with different amounts of data and different sequence lengths.
The network is implemented in PyTorch. The experiments run on an NVIDIA A40 with 64 GB of RAM. The training settings are kept fixed throughout: the optimizer is Adam, the initial learning rate is 0.0001 and is halved every 10 epochs, and the number of epochs is 100. To normalize the inputs, batch normalization is performed before each activation function.
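The learning-rate schedule described above amounts to a step decay. The sketch below assumes that the first halving takes effect at epoch 10 (with epochs counted from 0); `learning_rate` is a hypothetical helper name.

```python
def learning_rate(epoch: int, base_lr: float = 1e-4, step: int = 10,
                  factor: float = 0.5) -> float:
    """Step decay: multiply the rate by `factor` every `step` epochs."""
    return base_lr * factor ** (epoch // step)


for epoch in (0, 9, 10, 25, 99):
    print(epoch, learning_rate(epoch))
```

In PyTorch, a schedule of this shape is typically obtained with `torch.optim.lr_scheduler.StepLR` using `step_size=10` and `gamma=0.5`; whether the original experiments used that scheduler is not stated.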
1) Dataset
A large abdominal CT dataset (AbdomenCT-1K) is introduced in the pre-training step. AbdomenCT-1K provides different annotations for various organ segmentation tasks and contains more than 1,000 scan sequences from 12 medical centers.
The experimental dataset comes from Xiangya Hospital and contains 153 subjects. It was acquired using a Siemens Prisma, 1.5 Tesla, model-syngo MRE11 scanner. The echo time was set to 1.33 and the repetition time to 321.79. The size of the original images was 1024×1024. The diagnoses were made by professional physicians. Written consent was obtained from all subjects participating in the experiment. Figure 6 shows some examples of the two types of samples, where the left column shows normal samples and the right column shows acute pancreatitis samples.
2) Preprocessing
To reduce the amount of computation, the images are resized to 224 × 244 with the Pillow library before training. To improve the robustness and generality of the fine-tuning module, the training data are augmented with random transformations such as scaling, rotation, and gamma changes.
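As an illustration of one of the augmentations named above, the gamma change can be sketched on a flat list of 8-bit intensities. This is a minimal sketch in plain Python rather than Pillow; the function names and the sampling range for gamma are hypothetical and are not taken from the original text.

```python
import random


def apply_gamma(pixels, gamma):
    """Apply gamma correction to 8-bit intensities in the range 0..255."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Normalize to [0, 1], raise to the power gamma, and rescale to 0..255.
    return [round(255 * (p / 255) ** gamma) for p in pixels]


def random_gamma(pixels, low=0.7, high=1.5, rng=random):
    """Draw gamma uniformly from [low, high] (assumed range) and apply it."""
    return apply_gamma(pixels, rng.uniform(low, high))
```

A gamma below 1 brightens mid-range intensities and a gamma above 1 darkens them, while pure black and pure white are left unchanged.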
4) Dual Attention Module
The importance of the dual attention module is evaluated experimentally. The purpose of the attention mechanism is to focus on relevant regions at the channel level and the pixel level and to suppress irrelevant features. First, the behavior of the module is visualized using heat maps from the Matplotlib library. As shown in Figure 4, the tokens with larger weights in the network are marked in color. The experimental results show that the attention mechanism proposed in the present invention enables the network to focus on the region where the pancreas is located. In addition, the size of the focused region varies across image slices and corresponds to the size of the pancreatic tissue. This shows that the dual attention module can highlight the relevant region by learning high-level features of the organ. Therefore, the module can locate the pancreatic tissue even when it becomes very small in the marginal slices. However, when the pancreas disappears, the attention module still tries to highlight certain regions near the location of the organ. This is because the characteristics of CT scans make organs and tissues appear in the same color, which affects the function of the module.
Having demonstrated that the dual attention module attends to the correct region, further experiments are conducted on classification accuracy. Two methods, namely the dual attention method proposed in the present invention and the same method without CBAM (convolutional block attention module), are compared on the normal-sized dataset with a ResNet 50 backbone. The comparison of the two networks is shown in Table 2. With the other parameters unchanged, the dual attention module improves the precision by 6.63% and the sensitivity by 8.78%, with only a small increase in the number of parameters.
Table 2: Quantitative representation of the CBAM contribution
Figure PCTCN2022133719-appb-000018
In Table 2, "true" means that CBAM is embedded, and "false" means that CBAM is not used.
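For reference, precision and sensitivity as reported in Table 2 can be computed from confusion-matrix counts. The sketch below uses the standard definitions, which are assumed to match the evaluation formulas given elsewhere in the description; the function names are illustrative, and no counts from the experiments are reproduced here.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are true positives."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


def sensitivity(tp, fn):
    """Fraction of actual positives that are detected (also called recall)."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0
```

Both functions return 0.0 when the denominator is zero, which is a convention chosen for this sketch rather than something stated in the source.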
5) Recurrent Neural Networks
The following introduces attention-based methods that achieved excellent performance in validation; see the attention maps of different slices shown in Figure 7, where the left column shows the original images and the right column shows the attention feature maps. The structure of the recurrent neural network is briefly described as follows.
ADGNET: This network consists of a ResNet with an attention module, a classifier, and a decoder. The classifier and decoder work simultaneously to perform classification and reconstruction. The model is intended for diagnosing Alzheimer's disease.
3DResAttNet: This network is a highly interpretable 3D residual self-attention CNN. The self-attention module works as a mapping function that contains keys, values, and queries. The features extracted by each convolutional block are converted into vectors through 1×1×1 convolutions. The values are focused by the query function during training.
AG-CNN: This network consists of three subnetworks. First, the attention prediction subnetwork generates attention for glaucoma diagnosis. Second, the pathological region localization subnetwork combines the generated attention maps to form a masked feature map. Finally, the classification subnetwork receives the masked feature map and outputs the prediction result.
According to the evaluation criteria introduced above, the classification performance of each method, based on multiple measurements, is shown in Table 3. As Table 3 shows, the network proposed in the present invention has the highest precision and sensitivity. Compared with AG-CNN, the present invention improves the precision by 12.19% and the sensitivity by 8.94%. Since the feature extraction subnetworks of the other methods have similar structures, the high performance of the network of the present invention is attributed to the LSTM, which can extract depth features from slice sequences. In contrast, the other 2D networks rely on a single image only, and their classification accuracy depends heavily on that input. 3D networks can also process depth features, but they contain more weights, which increases the amount of computation.
Table 3: Performance comparison of the four models
Figure PCTCN2022133719-appb-000019
In summary, the Att-CNN-RNN proposed in the present invention achieves better classification performance, with high sensitivity and precision, while remaining competitive in terms of parameter size.
In summary, the present invention proposes a hybrid attention network for classifying pancreatic CT scans. The network as a whole includes a feature extraction module and a feature aggregation module, where the feature extraction module uses a residual network as its backbone and takes both channel attention and spatial attention into account. This backbone network addresses the problems of vanishing and exploding gradients, which helps to deepen the network. In addition, to extract the depth information of consecutive slices, an improved recurrent neural network (RNN) is used in the feature aggregation module to aggregate the feature vectors from the feature extraction module. The network is further validated on the acquired dataset. The experiments show that the hybrid attention mechanism improves the model's ability to capture salient features, and that the RNN makes the model superior both to 2D networks, which lose depth information, and to 3D networks, which require excessive computation. The results demonstrate that the proposed network structure is reasonable and advantageous.
The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination thereof. The computer-readable storage medium as used herein is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse passing through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for performing the operations of the present invention may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, C++, and Python, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present invention.
Various aspects of the present invention are described herein with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present invention. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium, where they cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation in hardware, implementation in software, and implementation in a combination of software and hardware are all equivalent.
The embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the present invention is defined by the appended claims.

Claims (10)

  1. A method for classifying pancreatic images based on a hybrid attention network, comprising the following steps:
    acquiring an image sequence of a target area;
    inputting the image sequence into a deep learning model to obtain a pancreatic classification result;
    wherein the deep learning model comprises a feature extraction module, a feature aggregation module, and a classification layer; the feature extraction module is used to extract feature maps of different depths from the image sequence and to output feature vectors, the feature maps of different depths being feature maps adaptively adjusted by a dual attention module based on a spatial attention mechanism and a channel attention mechanism; the feature aggregation module is used to capture the temporal information of the feature vectors and to weight them using a self-attention mechanism to obtain an aggregated feature vector; and the classification layer is used to classify the aggregated feature vector.
  2. The method according to claim 1, characterized in that the backbone network of the feature extraction module is constructed based on a pre-trained residual network comprising a plurality of residual blocks, and each time the input feature map passes through a residual block, its size is halved and its number of channels is doubled.
  3. The method according to claim 2, characterized in that the backbone network is pre-trained by unsupervised training based on a Siamese network structure.
  4. The method according to claim 1, characterized in that the dual attention module comprises, in sequence, a channel attention submodule and a spatial attention submodule, the channel attention submodule comprising a first average pooling layer, a first max pooling layer, and a shared layer, and the spatial attention submodule comprising a second average pooling layer and a second max pooling layer.
  5. The method according to claim 1, characterized in that the feature aggregation module is constructed using a bidirectional long short-term memory network, and the t-th output vector is expressed as:
    Figure PCTCN2022133719-appb-100001
    where h_t and h′_t are the t-th outputs of the long short-term memory network, and
    Figure PCTCN2022133719-appb-100002
    denotes the concatenation operation.
  6. The method according to claim 5, characterized in that the self-attention mechanism is described as follows:
    the output of the bidirectional long short-term memory network is an array of concatenated vectors V = {v_1, …, v_l}, which is input into a multilayer perceptron to obtain the representation U = {u_1, …, u_l}:
    u_t = tanh(W_w h_t + b_w)
    the importance of the t-th slice within the image sequence,
    Figure PCTCN2022133719-appb-100003
    is expressed as:
    Figure PCTCN2022133719-appb-100004
    where u_w is a context vector that needs to be learned during training;
    finally, the new weighted output S = {s_1, …, s_l} is expressed as:
    Figure PCTCN2022133719-appb-100005
    where W_w denotes the weight and b_w denotes the bias.
  7. The method according to claim 1, characterized in that the classification layer is a Softmax layer, and the predicted final result
    Figure PCTCN2022133719-appb-100006
    is expressed as:
    Figure PCTCN2022133719-appb-100007
    where W_s is the transformation matrix, b_s is the bias, and s is the output of the feature aggregation module.
  8. The method according to claim 1, characterized in that the image sequence of the target area is an image slice sequence obtained by CT scanning.
  9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
  10. A computer device comprising a memory and a processor, the memory storing a computer program that can run on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 8.
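The self-attention weighting and Softmax classification recited in claims 6 and 7 can be illustrated with a small plain-Python sketch. Because the corresponding formulas are given only as images, several points here are assumptions rather than the claimed method: the slice importance is taken to be a softmax over the dot products of u_t with the context vector u_w, the weighted vectors are summed into a single vector s, and the prediction is taken to be softmax(W_s s + b_s). All function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def affine(matrix, vector, bias):
    """Compute matrix @ vector + bias for nested-list inputs."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(matrix, bias)]


def attention_pool(vectors, W_w, b_w, u_w):
    """Weight the slice vectors by learned importance and sum them (assumed form)."""
    scores = []
    for v in vectors:
        # u_t = tanh(W_w v_t + b_w), then score each slice against the context vector.
        u = [math.tanh(z) for z in affine(W_w, v, b_w)]
        scores.append(sum(ui * ci for ui, ci in zip(u, u_w)))
    alphas = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(a * v[i] for a, v in zip(alphas, vectors)) for i in range(dim)]
    return alphas, pooled


def classify(s, W_s, b_s):
    """Return the predicted class index and the class probabilities."""
    probs = softmax(affine(W_s, s, b_s))
    return probs.index(max(probs)), probs
```

In a trained model, W_w, b_w, u_w, W_s, and b_s would be learned parameters; in this sketch they are simply passed in as nested lists.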
PCT/CN2022/133719 2022-11-23 2022-11-23 Method for classifying pancreatic images based on hybrid attention network WO2024108425A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/133719 WO2024108425A1 (en) 2022-11-23 2022-11-23 Method for classifying pancreatic images based on hybrid attention network

Publications (1)

Publication Number Publication Date
WO2024108425A1 true WO2024108425A1 (en) 2024-05-30

Family

ID=91194843

Country Status (1)

Country Link
WO (1) WO2024108425A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259982A (en) * 2020-02-13 2020-06-09 苏州大学 Premature infant retina image classification method and device based on attention mechanism
CN113538435A (en) * 2021-09-17 2021-10-22 北京航空航天大学 Pancreatic cancer pathological image classification method and system based on deep learning
CN114359164A (en) * 2021-12-10 2022-04-15 中国科学院深圳先进技术研究院 Method and system for automatically predicting Alzheimer disease based on deep learning
CN114565557A (en) * 2022-01-14 2022-05-31 山东师范大学 Contrast enhancement energy spectrum photography classification method and device based on coordinate attention
US20220180506A1 (en) * 2020-12-03 2022-06-09 Ping An Technology (Shenzhen) Co., Ltd. Method, device, and storage medium for pancreatic mass segmentation, diagnosis, and quantitative patient management
CN115204463A (en) * 2022-06-07 2022-10-18 南京理工大学 Residual service life uncertainty prediction method based on multi-attention machine mechanism

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22966123

Country of ref document: EP

Kind code of ref document: A1