CN115760797A - Method for classifying pancreatic images based on mixed attention network - Google Patents

Method for classifying pancreatic images based on mixed attention network

Info

Publication number
CN115760797A
Authority
CN
China
Prior art keywords
feature
network
attention
module
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211470745.5A
Other languages
Chinese (zh)
Inventor
黄建龙
贾富仓
陈藏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202211470745.5A priority Critical patent/CN115760797A/en
Publication of CN115760797A publication Critical patent/CN115760797A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a method for classifying pancreatic images based on a hybrid attention network. The method comprises the following steps: acquiring an image sequence of a target area; and inputting the image sequence into a deep learning model to obtain a pancreas classification result. The deep learning model comprises a feature extraction module, a feature aggregation module and a classification layer. The feature extraction module extracts feature maps of different depths from the image sequence to output feature vectors, the feature maps being adaptively adjusted by a dual attention module based on a spatial attention mechanism and a channel attention mechanism; the feature aggregation module captures the temporal information of the feature vectors and weights it with a self-attention mechanism to obtain aggregated feature vectors; and the classification layer classifies the aggregated feature vectors. The invention can accurately classify pancreatic images and thereby guide the automatic diagnosis of pancreatitis.

Description

Method for classifying pancreatic images based on mixed attention network
Technical Field
The invention relates to the technical field of medical image processing, in particular to a method for classifying pancreatic images based on a mixed attention network.
Background
Acute pancreatitis (AP) is a common clinical emergency, typically characterized by acute chemical inflammation such as edema and inflammatory exudation of the pancreas and surrounding tissues. Clinical diagnosis of acute pancreatitis currently relies on clinical manifestations, laboratory examinations, and imaging examinations, including computed tomography (CT) and magnetic resonance imaging (MRI). Owing to the fast scanning speed and powerful post-processing techniques of CT, the severity of AP is often graded using the CT severity index (CTSI), but this requires manual analysis by a physician, which is time-consuming, inefficient, and of limited accuracy. In recent years, machine learning has become a powerful tool for analyzing medical images; in particular, end-to-end detection on CT images by deep learning has been a research hotspot. However, accurate diagnosis of acute pancreatitis remains difficult, owing to the high anatomical variability of the pancreas, the diversity of acute pancreatitis lesions, and the complexity of imaging manifestations. Moreover, the presentation of acute pancreatitis varies from person to person, which further increases the complexity of CT images. How to effectively extract target features from pancreatic images has therefore become the central problem in CT diagnosis.
Convolutional neural networks (CNNs) are common tools for image classification, with excellent accuracy and robustness, and have been applied to the diagnosis of pancreatic diseases. For example, researchers have proposed a mask-to-mesh segmentation method that first segments a 3D CT scan using nnU-Net; a mesh model representing the pancreatic anatomy is then produced from the mask by a mask-to-mesh function; finally, the generated mesh model is fed into a graph residual network to classify pancreatic ductal adenocarcinoma. This approach requires segmentation of the pancreas to ensure high performance, but supervised training of the segmentation algorithm requires data annotation by physicians and is therefore very time-consuming and dependent on physician expertise.
Region of interest (ROI) generation algorithms can mark the boundary of a pancreatic lesion on a CT image, making it convenient to focus on pancreatic features. The ROI was originally used to locate the pancreas for manual measurement, but it can also be introduced into the automated diagnosis of pancreatic disease; for example, researchers have proposed a region-based CNN model that creates ROIs from feature maps. However, ROI-based diagnosis has the following problems. First, the shape of the ROI is confined to a rectangle, unlike the real region, so the predicted region contains irrelevant background data, which adversely affects classification performance. Second, the accuracy of the model depends heavily on the accuracy of the ROI generation algorithm, which is unreliable and often generates misplaced regions. Third, detecting the pancreas is itself a challenge because of its small size and high variability.
In conclusion, existing pancreas identification schemes suffer from long processing times, poor accuracy, and poor adaptability to individual variation.
Disclosure of Invention
It is an object of the present invention to overcome the above-mentioned drawbacks of the prior art and to provide a method for classifying pancreatic images based on a hybrid attention network. The method comprises the following steps:
acquiring an image sequence of a target area;
inputting the image sequence into a deep learning model to obtain a pancreas classification result;
the deep learning model comprises a feature extraction module, a feature aggregation module and a classification layer, wherein the feature extraction module extracts feature maps of different depths from the image sequence to output feature vectors, the feature maps of different depths being adaptively adjusted by a dual attention module based on a spatial attention mechanism and a channel attention mechanism; the feature aggregation module captures the temporal information of the feature vectors and weights it with a self-attention mechanism to obtain aggregated feature vectors; and the classification layer classifies the aggregated feature vectors.
Compared with the prior art, the invention has the advantage that, addressing the anatomical variability of acute pancreatitis in computed tomography, a hybrid attention network (Att-CNN-RNN) for classifying pancreatic images is provided. Applying this network to pancreatic images yields a robust and accurate automatic pancreatitis diagnosis method and overcomes the drawbacks of manual classification, which is time-consuming and highly dependent on physician expertise.
Other features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram of a method of classifying pancreatic images based on a hybrid attention network, according to one embodiment of the present invention;
FIG. 2 is a schematic diagram of the overall structure of a hybrid attention network according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of a process for pre-training a backbone network based on a twin network according to one embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a dual attention module in accordance with one embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a feature aggregation module according to one embodiment of the invention;
FIG. 6 is a graph comparing normal and acute pancreatitis samples, according to one embodiment of the present invention;
fig. 7 is an attention diagram of different slices according to one embodiment of the invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as exemplary only and not as limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be discussed further in subsequent figures.
The present invention addresses the problem of ROI-based pancreas localization by introducing an attention mechanism that automatically focuses on information without any shape constraint. The goal of an attention mechanism in image processing is to let the network ignore irrelevant information and focus on key information by marking key features in a new weight layer. Attention methods fall into two categories: soft attention and hard attention. Hard attention focuses more on points and is realized through reinforcement learning, whereas soft attention focuses more on regions or channels and can be trained within the network; in particular, soft-attention weights can be trained by feed-forward and back-propagation. Embedding an attention mechanism in the original network can thus yield higher prediction accuracy.
Hereinafter, the classification process is described taking pancreatic CT images as an example. The raw CT data are three-dimensional scan results. To obtain more of the critical information, all dimensions of the data could be used, but a 3D network contains a huge number of weights and is computationally expensive. Another possible solution is to use as input a sequence of consecutive CT images consisting of axial slices stacked along the vertical axis. However, if each slice's features are simply flattened and concatenated, the depth information across slices is lost. In addition, the pancreas occupies much of the slices in the middle of the image sequence but takes up little space (or even disappears) in the slices at the beginning or end of the sequence, so the feature maps of different slices may influence the final classification to different degrees. In view of these limitations, the present invention proposes a hybrid attention network to classify pancreatic images.
Specifically, referring to fig. 1, a method for classifying pancreatic images based on a hybrid attention network is provided, comprising the following steps.
Step S110, a deep learning model is constructed, comprising a feature extraction module, a feature aggregation module and a classification layer, wherein the feature extraction module adopts a dual attention mechanism and the feature aggregation module adopts a self-attention mechanism.
As shown in fig. 2, the deep learning model includes a feature extraction module, a feature aggregation module, and a classification layer.
The feature extraction module extracts feature vectors from the input image sequence (also called an image-slice sequence) and outputs the feature vector of each slice. The backbone of the feature extraction module can be built on various network types, such as the residual network ResNet or VGG networks. In addition, a dual attention mechanism is embedded in the feature extraction module: the extracted feature map is adaptively adjusted based on spatial attention and channel attention. Spatial attention helps find the spatial locations of informative features, while channel attention amplifies the informative feature content of particular channels.
The feature aggregation module processes the feature vectors and feeds the processed features into a classification layer (e.g. a Softmax layer) to obtain the classification prediction. The feature aggregation module may be built with a recurrent neural network (RNN), such as a long short-term memory network (LSTM) or a gated recurrent unit (GRU). Taking LSTM as an example, the feature vector passes through the LSTM and a fully connected (FC) layer to obtain a vector representing the prediction result. Preferably, the feature aggregation module is a bidirectional LSTM with a self-attention mechanism (referred to as ABi-LSTM).
Hereinafter, the description takes the residual network ResNet as the backbone of the feature extraction module and a bidirectional LSTM (Bi-LSTM) as the basis of the feature aggregation module.
1) ResNet backbone network
In a convolutional network with multiple convolution blocks, a dual attention mechanism is introduced for each block. The invention adopts a pre-trained ResNet backbone. ResNet uses shortcut connections to effectively alleviate the gradient vanishing and explosion problems of deep networks, which markedly improves network performance and makes the network easier to train.
Specifically, an image is first input into the first residual block to obtain a 96 × 96 × 128 feature map. It then passes through 3 convolution layers (e.g. convolution kernel 3 × 3, stride 2) to obtain a shallow feature map of size 48 × 48 × 256, and then through a second residual block (kernel size 3 × 3, stride 2) to obtain a middle-layer feature map of size 24 × 24 × 512. The middle-layer feature map is finally input into the last residual block to obtain a deep feature map of size 8 × 8 × 256. With each residual block, the spatial size of the feature map is halved and the number of channels is doubled.
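For illustration, the following is a minimal PyTorch sketch of a residual block that halves the spatial size and doubles the channel count as described above; the module name and layer hyperparameters are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Illustrative residual block: halves the spatial size (stride 2)
    and doubles the channel count, as described in the text."""
    def __init__(self, in_ch: int):
        super().__init__()
        out_ch = in_ch * 2
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 shortcut so the skip connection matches the new shape
        self.shortcut = nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=2, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.body(x) + self.shortcut(x))

# e.g. a 96x96x128 feature map becomes 48x48x256
x = torch.randn(1, 128, 96, 96)
print(ResidualBlock(128)(x).shape)  # torch.Size([1, 256, 48, 48])
```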
In the pre-training step, 215,325 abdominal CT images are used for unsupervised training through a twin network structure to improve the ability of the backbone network to represent organ features.
As shown in fig. 3, pre-training introduces two multilayer perceptrons, a projection layer g and a prediction layer h, to improve the ability of the backbone network f to discriminate images with different characteristics. First, two augmented images are obtained from a base image I using a random augmentation function aug:

I_1 = aug(I)    (1)
I_2 = aug(I)    (2)
Then I_1 and I_2 pass through the network in the order f, g, h. The projection and prediction outputs are denoted p and q, respectively:

p_1 = g(f(I_1))    (3)
p_2 = g(f(I_2))    (4)
q_1 = h(p_1)    (5)
q_2 = h(p_2)    (6)
During pre-training, a symmetric loss function is defined as:

L = (1/2) D(p_1, stopgrad(q_2)) + (1/2) D(p_2, stopgrad(q_1))    (7)
where D denotes the negative cosine similarity:

D(p, q) = − (p / ||p||_2) · (q / ||q||_2)    (8)

The stop-gradient function stopgrad causes q_1 and q_2 to be treated as constants during back-propagation, so the encoder is updated only on the basis of p_1 and p_2. After pre-training, the backbone f of the twin network serves as the pre-trained network, while the projection layer is detached from the weight update to avoid unstable training.
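As a hedged sketch of this twin-network pre-training step, the loss of Eqs. (7)-(8) can be written in PyTorch as below, where f, g, and h stand for the backbone, projection layer, and prediction layer. Following the patent's text, the stop-gradient (detach) is applied to the prediction outputs q; this is an assumption drawn from the wording above, not a definitive implementation.

```python
import torch
import torch.nn.functional as F

def neg_cos(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Negative cosine similarity, Eq. (8)."""
    return -(F.normalize(p, dim=1) * F.normalize(q, dim=1)).sum(dim=1).mean()

def twin_loss(f, g, h, x1, x2):
    """Symmetric pre-training loss, Eq. (7). Per the patent's text,
    detach() makes q_1, q_2 constants in back-propagation."""
    p1, p2 = g(f(x1)), g(f(x2))   # projection branch
    q1, q2 = h(p1), h(p2)         # prediction branch
    return 0.5 * neg_cos(p1, q2.detach()) + 0.5 * neg_cos(p2, q1.detach())
```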
2) Dual attention module
The dual attention module consists of two sequential sub-modules, channel attention and spatial attention, which adaptively adjust the input feature map on each block of the feature extraction network.
Referring to FIG. 4, for the input feature map, two pieces of spatial feature information, F^c_avg and F^c_max, are first obtained through an average pooling layer and a maximum pooling layer, respectively. The two are then passed to a shared network (shared layer) with one hidden layer to generate the channel attention map M_c ∈ R^(C×1×1), where C is the number of channels; the activation size of the hidden layer is set to R^(C/r×1×1), with r the reduction ratio. The resulting feature vector output by the channel attention sub-module is:

M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F^c_avg)) + W_1(W_0(F^c_max)))    (9)
where F denotes the input feature map, MLP the multilayer perceptron, AvgPool average pooling, MaxPool maximum pooling, W_0 and W_1 the MLP weights, and σ the activation function.
Next, the channel-refined features are fed into an average pooling layer and a maximum pooling layer to obtain the corresponding descriptors F^s_avg and F^s_max. The two descriptors are then concatenated into a 2D map, on which a standard convolution is performed. The output M_s(F) ∈ R^(H×W) is expressed as:

M_s(F) = σ(f_conv([AvgPool(F); MaxPool(F)])) = σ(f_conv([F^s_avg; F^s_max]))    (10)
where AvgPool indicates mean pooling and MaxPool indicates maximum pooling.
As shown in FIG. 4, the input feature map of the dual attention module is F ∈ R^(C×H×W), where H is the height of the feature map, W its width, and C the number of channels. The input passes successively through the channel attention sub-module and the spatial attention sub-module, during which M_c ∈ R^(C×1×1) and M_s ∈ R^(1×H×W) are expanded to the size of F and combined via the Hadamard product:

F′ = M_c(F) ⊗ F,  F″ = M_s(F′) ⊗ F′    (11)

where F″ denotes the final attention-refined feature map and F′ the output of the channel attention sub-module.
3) Recurrent neural network
An RNN learns long-term dependencies and can process sequential data by feeding the output of a neuron at one time step back into the same or another neuron. This serial network structure suits data sequences such as image slices because it preserves dependency information across the sequence. RNNs consist of a repeated structure with shared parameters, which significantly reduces the number of neural network parameters needed for training; parameter sharing also allows the model to process input sequences of arbitrary length. RNNs are therefore particularly suitable for extracting temporal information from image sequences. However, practice also shows that standard RNNs struggle to preserve stored memory over long horizons, and they can suffer from exploding and vanishing gradients.
Preferably, the present invention constructs the feature aggregation module using a long short-term memory network (LSTM), which stores and updates short-term memory more efficiently and reliably and addresses the vanishing-gradient problem through a carefully designed structure, making long inputs tractable.
The LSTM contains three gates: an input gate, a forget gate, and an output gate. The input gate controls how much of the input is written to memory. The forget gate determines whether data in memory is retained. The output gate determines how much of the memory is exposed as output. Each gate is controlled by an external signal, which means the input of the LSTM is four times that of a conventional RNN.
Each control signal z_i, z_f, z_o first passes through the activation function f(·) and is then multiplied with the main signal. The input signal z_t, after entering the neuron, first passes through the activation function g(·) and is then multiplied by the input gate f(z_i). The signal then passes through the memory cell: the stored value c_{t-1} is processed by the forget gate f(z_f) and blended with the new input before being stored. The new memory is:

c_t = c_{t-1} × f(z_f) + g(z_t) × f(z_i)    (12)
finally, the blended data is passed through an activation function h (-) and then multiplied by an output gate f (z) o )。
y=h(c t )×f(z o ) (13)
The activation function of the control signals is typically a Sigmoid function, with values ranging from 0 to 1, intended to emulate a switch. g(·) and h(·) are typically Tanh functions, mapping the data into the range −1 to 1.
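A single memory update following Eqs. (12)-(13) can be sketched directly; the helper below is illustrative and assumes the pre-activation gate signals are given as inputs.

```python
import torch

def lstm_step(c_prev, z, z_i, z_f, z_o):
    """One memory update per Eqs. (12)-(13): gates use the sigmoid f(.),
    the main signal uses tanh for g(.) and h(.)."""
    c_t = c_prev * torch.sigmoid(z_f) + torch.tanh(z) * torch.sigmoid(z_i)  # Eq. (12)
    y_t = torch.tanh(c_t) * torch.sigmoid(z_o)                              # Eq. (13)
    return c_t, y_t
```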
In practice, a network comprises multiple LSTM neurons arranged side by side, forming a memory array. The input x_t, the t-th vector in the sequence, is multiplied by the corresponding transformation matrices W_f, W_i, W_z, W_o to generate the input signal and the control signals z_f, z_i, z_o and z. The number of neurons equals the number of vectors in the sequence — in the present invention, the number of slices of the 3D CT scan. Peephole connections are also introduced: the previous neuron's memory c_{t-1} and output h_{t-1} participate in generating the signals through the transformation matrices p_f, p_i, p_z, p_o and R_f, R_i, R_z, R_o.
For a 3D network, the data on both sides along the depth direction are visible, meaning a 3D network can use information from slice t+1 when processing slice t. A one-way LSTM, however, can only remember information from before the current time and ignores what comes after. A bidirectional long short-term memory network (Bi-LSTM) extends the unidirectional LSTM by introducing a second network layer whose memory flows in reverse order. Bi-LSTM can therefore approximate the condition of a 3D network by taking bidirectional information — before and after a time point — into account.
In one embodiment, the feature aggregation module is shown in FIG. 5; the Bi-LSTM network is composed of two sub-networks. The t-th output vector is:

o_t = h_t ⊕ h′_t    (14)

where h_t and h′_t are the t-th outputs of the forward and backward LSTMs, and ⊕ denotes the concatenation operation.
The prediction result is highly correlated not only with the fused features but also with the position of a slice, since different slices contribute differently to the diagnosis: the middle slices contain the largest cross-sections of the pancreas. To exploit this property of the slice sequence, a self-attention mechanism is introduced to automatically find the important feature vectors that should receive more weight.
The self-attention mechanism is described as follows. The output of the Bi-LSTM is an array of concatenated vectors V = {v_1, …, v_l}; this array is fed into a multilayer perceptron to obtain the representation U = {u_1, …, u_l}:

u_t = tanh(W_w h_t + b_w)    (15)

The importance a_t of the t-th slice within the sequence can then be expressed as:

a_t = exp(u_t^T u_w) / Σ_t exp(u_t^T u_w)    (16)

where u_w is a context vector learned during training and T denotes the transpose. Finally, the new weighted array S = {s_1, …, s_l} is given by:

s_t = a_t v_t    (17)
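A sketch of this slice-level self-attention (Eqs. (15)-(17)) is given below; applying the projection to the concatenated vectors v_t and the per-slice weighting s_t = a_t v_t follow the reading above, and the module name and dimension are hypothetical.

```python
import torch
import torch.nn as nn

class SliceAttention(nn.Module):
    """Sketch of the slice-level self-attention, Eqs. (15)-(17)."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)            # W_w, b_w
        self.u_w = nn.Parameter(torch.randn(dim))  # learned context vector

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # v: (batch, slices, dim) concatenated Bi-LSTM outputs
        u = torch.tanh(self.proj(v))               # Eq. (15)
        a = torch.softmax(u @ self.u_w, dim=1)     # Eq. (16), one weight per slice
        return a.unsqueeze(-1) * v                 # Eq. (17), weighted array S

attn = SliceAttention(256)
S = attn(torch.randn(4, 16, 256))                  # same shape, re-weighted
```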
4) A classification layer
At the network end of the deep learning model, a Softmax layer is used as the classification layer. The array S is assumed to contain the pancreatic features, which can be transformed and fused together. The prediction is a binary problem, so the final result ŷ is a two-dimensional vector:

ŷ = softmax(W_s s + b_s)    (18)

where W_s is the transformation matrix and b_s the bias. The vector of positive samples is designated (1, 0) and that of negative samples (0, 1).
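A minimal sketch of the classification layer of Eq. (18) follows; fusing the weighted array S by summation before the linear map is an assumption, as the text does not specify the fusion operator, and the feature size is illustrative.

```python
import torch
import torch.nn as nn

# Softmax classification layer, Eq. (18): the aggregated feature s is
# mapped to a two-dimensional vector (positive, negative).
linear = nn.Linear(256, 2)               # W_s, b_s; 256 is an assumed size
S = torch.randn(4, 16, 256)              # weighted slice features from above
s = S.sum(dim=1)                         # assumed fusion of the array into one vector
y_hat = torch.softmax(linear(s), dim=1)  # predicted class probabilities
```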
And step S120, constructing a data set and training a deep learning model, wherein the data set reflects the corresponding relation between the sample image sequence and the pancreas classification label.
The data set may be obtained by scanning a plurality of subjects, with the sample images of each subject labeled by a specialist physician; that is, the data set reflects the correspondence between image-slice sequences and pancreas classification labels. After training, the optimized parameters of the deep learning model, such as weights and biases, are obtained.
And step S130, classifying the target pancreas images by using the trained deep learning model.
After the optimized model parameters are obtained, the model can be applied to actual pancreatic image classification: acquire an image sequence of the target area, and input the image sequence into the deep learning model to obtain the pancreas classification result.
To further verify the effectiveness of the invention, ablation experiments were performed on the acquired data set to evaluate the proposed deep learning model under the given evaluation metrics. In addition, other methods were compared with the invention in terms of accuracy and complexity. The performance of the proposed hybrid attention network was then evaluated with different data volumes and sequence lengths.
The network is implemented in PyTorch. The experiments run on an NVIDIA A40 GPU with 64 GB of RAM. The hyperparameters remain unchanged during training: the optimizer is Adam, the initial learning rate is 0.0001, halved every 10 epochs, and the number of epochs is set to 100. To standardize the input, batch normalization is performed before each activation function.
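The stated training setup maps directly onto a PyTorch optimizer and scheduler; the placeholder model below is illustrative only.

```python
import torch

# Training setup as described: Adam, initial lr 1e-4, halved every 10 epochs.
model = torch.nn.Linear(8, 2)  # placeholder for the hybrid attention network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

for epoch in range(100):
    # ... one training pass over the data would go here ...
    scheduler.step()
```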
1) Data set
A large abdominal CT dataset (AbdomenCT-1K) was introduced during the pre-training step. AbdomenCT-1K provides annotations for various organ segmentation tasks, comprising more than 1000 scan sequences from 12 medical centers.
The experimental data set is from Xiangya Hospital and contains 153 subjects. Acquisition was performed using a 1.5 Tesla Siemens scanner (syngo MR E11). The echo time was set to 1.33 and the repetition time was 321.79. The size of the original image is 1024 × 1024. Diagnoses were made by specialized physicians, and written consent was obtained from all subjects participating in the experiment. Fig. 6 shows examples of the two sample types, where the left column is a normal sample and the right column corresponds to an acute pancreatitis sample.
3) Preprocessing
To reduce the computational load, the images were resized to 224 × 224 using the Pillow library prior to training. To improve the robustness and generality of the proposed model, augmentation is applied to the training data via random transformations such as scaling, rotation and gamma changes.
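A hedged sketch of this preprocessing with Pillow and torchvision follows; the affine ranges and the brightness jitter (standing in for the gamma changes) are illustrative assumptions.

```python
from PIL import Image
import torchvision.transforms as T

# Resize to 224x224 plus random scaling, rotation and an intensity jitter.
augment = T.Compose([
    T.Resize((224, 224)),
    T.RandomAffine(degrees=15, scale=(0.9, 1.1)),  # random rotation and scaling
    T.ColorJitter(brightness=0.2),                 # stand-in for gamma changes
    T.ToTensor(),
])

img = Image.new("L", (1024, 1024))   # dummy 1024x1024 slice for illustration
tensor = augment(img)                # shape: (1, 224, 224)
print(tensor.shape)
```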
4) Dual attention module
The importance of the dual attention module was evaluated experimentally. The purpose of the attention mechanism is to focus on relevant areas at the channel and pixel levels and to suppress irrelevant features. First, the operation of the module is visualized using heatmaps from the Matplotlib library; as shown in fig. 4, regions receiving larger weights are highlighted in color. The experimental results show that the proposed attention mechanism focuses the network on the area where the pancreas is located. Furthermore, the focal region varies in size from slice to slice, matching the size of the pancreatic tissue, which indicates that the dual attention module can highlight the relevant regions by learning high-level characteristics of the organ. The module can thus locate pancreatic tissue even when it becomes very small in the edge slices. However, when the pancreas disappears entirely, the attention module still tries to highlight areas around where the organ would be located; this is because the nature of CT causes adjacent organs and tissue to appear in the same gray level, which affects the module.
Having established the behavior of the dual attention module, further experiments examined classification accuracy. A comparison was made between the proposed dual attention method and a variant without CBAM (Convolutional Block Attention Module) on the normal-volume data set with a ResNet-50 backbone. The comparison of the two networks is shown in Table 2. With the other parameters unchanged, the dual attention module increases precision by 6.63% and sensitivity by 8.78%, with almost no increase in the number of parameters.
Table 2: quantitative representation of CBAM contribution
(Table 2 is reproduced as an image in the original publication.)
In Table 2, "true" indicates that CBAM is embedded, and "false" indicates that CBAM is not employed.
5) Recurrent neural networks
The attention-based comparison methods, which achieve excellent performance in their respective validations, are described below; for reference, fig. 7 shows the attention maps of different slices, where the left column is the original image and the right column the attention distribution. The structures of the compared networks are briefly described as follows.
ADGNET: the network includes a ResNet with an attention module, a classifier, and a decoder. The classifier and decoder work simultaneously to perform classification and reconstruction. This model is intended for the diagnosis of alzheimer's disease.
3DResAttNet: the network is a 3D residual self-attention CNN with high interpretability. The employed self-attention module can function as a mapping function, including keys, values, and queries. The features extracted for each convolution block are converted to vectors by a 1 × 1 × 1 convolution. These values are focused by the query function in the training process.
AG-CNN: the network comprises three sub-networks. First, the attention-predictive subnetwork generates attention for glaucoma diagnosis. Second, the pathological area localization subnetworks are combined with the generated attention maps to form a masking signature. And finally, the classification subnet receives the masking feature map and outputs a prediction result.
The classification performance of each method under the introduced evaluation criteria is shown in Table 3. As can be seen, the network proposed by the present invention achieves the highest precision and sensitivity: compared with AG-CNN, precision improves by 12.19% and sensitivity by 8.94%. Since the feature extraction sub-networks of the other methods have similar structures, the high performance of the proposed network is attributed to the LSTM, which extracts depth features from the slice sequence. In contrast, the other 2D networks rely on a single image, so their classification accuracy depends strongly on the input; a 3D network can also handle depth features, but it contains more weights, which increases the amount of computation.
Table 3: performance comparison of four models
(Table 3 is reproduced as an image in the original publication.)
In conclusion, the proposed Att-CNN-RNN classifier performs better, with high sensitivity and accuracy, while remaining competitive in parameter size.
In summary, the present invention provides a hybrid attention network for classifying pancreatic CT scans, comprising a feature extraction module and a feature aggregation module. The feature extraction module takes a residual network as its backbone and considers both channel attention and spatial attention; such a backbone counters gradient vanishing and explosion, which helps deepen the network. Furthermore, to extract depth information from successive slices, a modified recurrent neural network (RNN) in the feature aggregation module aggregates the feature vectors from the feature extraction module. The network is validated on the acquired data set. Experiments show that the hybrid attention mechanism improves the model's ability to capture salient features, and the RNN makes the model superior both to 2D networks, which lose depth information, and to 3D networks, which require excessive computation. These results demonstrate the rationality and advantages of the proposed network structure.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be interpreted as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or an electrical signal transmitted through an electrical wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++, Python, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, which can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, implementation by software, and implementation by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (10)

1. A method of classifying pancreatic images based on a hybrid attention network, comprising the steps of:
acquiring an image sequence of a target area;
inputting the image sequence into a deep learning model to obtain a pancreas classification result;
the deep learning model comprises a feature extraction module, a feature aggregation module and a classification layer, wherein the feature extraction module extracts feature maps of different depths from the image sequence to output feature vectors, the feature maps of different depths being adaptively adjusted by a dual attention module based on a spatial attention mechanism and a channel attention mechanism; the feature aggregation module captures the temporal information of the feature vectors and weights it with a self-attention mechanism to obtain aggregated feature vectors; and the classification layer classifies the aggregated feature vectors.
2. The method of claim 1, wherein the backbone network of the feature extraction module is constructed from a pre-trained residual network comprising a plurality of residual blocks, the spatial size of the input feature map being halved and the number of channels doubled by each residual block.
3. The method of claim 2, wherein the pre-training process of the backbone network employs unsupervised training based on a twin network structure.
4. The method of claim 1, wherein the dual attention module comprises, in order, a channel attention sub-module comprising a first average pooling layer, a first maximum pooling layer, and a shared layer, and a spatial attention sub-module comprising a second average pooling layer and a second maximum pooling layer.
5. The method of claim 1, wherein the feature aggregation module is constructed using a bidirectional long short-term memory network, and the t-th output vector is expressed as:

o_t = h_t ⊕ h′_t

where h_t and h′_t are the t-th outputs of the forward and backward long short-term memory networks, and ⊕ denotes the concatenation operation.
6. The method of claim 5, wherein the self-attention mechanism is described as follows:
the bidirectional long short-term memory network outputs an array of concatenated vectors V = {v_1, …, v_l}, which is input into a multilayer perceptron to obtain the representation U = {u_1, …, u_l}:

u_t = tanh(W_w h_t + b_w)

the importance a_t of the t-th slice in the image sequence is expressed as:

a_t = exp(u_t^T u_w) / Σ_t exp(u_t^T u_w)

where u_w is a context vector learned during training;

finally, the new weighted array S = {s_1, …, s_l} is given by:

s_t = a_t v_t

where W_w represents a weight and b_w the bias.
7. The method of claim 1, wherein the classification layer is a Softmax layer, and the final prediction ŷ is expressed as:

ŷ = softmax(W_s s + b_s)

where W_s is the transformation matrix, b_s the bias, and s the output of the feature aggregation module.
8. The method of claim 1, wherein the sequence of images of the target region is a sequence of image slices obtained using a CT scan.
9. A computer-readable storage medium, on which a computer program is stored, wherein the computer program realizes the steps of the method according to any one of claims 1 to 8 when executed by a processor.
10. A computer device comprising a memory and a processor, on which memory a computer program is stored which is executable on the processor, characterized in that the processor realizes the steps of the method of any one of claims 1 to 8 when executing the computer program.
CN202211470745.5A 2022-11-23 2022-11-23 Method for classifying pancreatic images based on mixed attention network Pending CN115760797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211470745.5A CN115760797A (en) 2022-11-23 2022-11-23 Method for classifying pancreatic images based on mixed attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211470745.5A CN115760797A (en) 2022-11-23 2022-11-23 Method for classifying pancreatic images based on mixed attention network

Publications (1)

Publication Number Publication Date
CN115760797A true CN115760797A (en) 2023-03-07

Family

ID=85335503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211470745.5A Pending CN115760797A (en) 2022-11-23 2022-11-23 Method for classifying pancreatic images based on mixed attention network

Country Status (1)

Country Link
CN (1) CN115760797A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116996008A (en) * 2023-08-10 2023-11-03 杭州市能源集团工程科技有限公司 Installation method and system of photovoltaic power generation system
CN116996008B (en) * 2023-08-10 2024-02-09 杭州市能源集团工程科技有限公司 Installation method and system of photovoltaic power generation system

Similar Documents

Publication Publication Date Title
Man et al. Deep Q learning driven CT pancreas segmentation with geometry-aware U-Net
Mahapatra et al. Interpretability-driven sample selection using self supervised learning for disease classification and segmentation
US11810301B2 (en) System and method for image segmentation using a joint deep learning model
Chaki et al. A deep learning based four-fold approach to classify brain MRI: BTSCNet
CN106408610A (en) Method and system for machine learning based assessment of fractional flow reserve
An et al. Medical image segmentation algorithm based on multilayer boundary perception-self attention deep learning model
Kang et al. Acu-net: A 3d attention context u-net for multiple sclerosis lesion segmentation
Taş et al. Super resolution convolutional neural network based pre-processing for automatic polyp detection in colonoscopy images
Tran et al. Fully convolutional neural network with attention gate and fuzzy active contour model for skin lesion segmentation
Barbano et al. Uncertainty quantification in medical image synthesis
He et al. Segmentation ability map: Interpret deep features for medical image segmentation
Li et al. Hierarchical deep network with uncertainty-aware semi-supervised learning for vessel segmentation
Mkindu et al. Lung nodule detection in chest CT images based on vision transformer network with Bayesian optimization
Liu et al. MESTrans: Multi-scale embedding spatial transformer for medical image segmentation
Yang et al. Deep hybrid convolutional neural network for segmentation of melanoma skin lesion
Zhang et al. Vm-unet-v2 rethinking vision mamba unet for medical image segmentation
CN115760797A (en) Method for classifying pancreatic images based on mixed attention network
Lahoti et al. Whole Tumor Segmentation from Brain MR images using Multi-view 2D Convolutional Neural Network
Rasool et al. Unveiling the complexity of medical imaging through deep learning approaches
Sangeetha Francelin Vinnarasi et al. Deep learning supported disease detection with multi-modality image fusion
Yang et al. SuperMini-seg: An ultra lightweight network for COVID-19 lung infection segmentation from CT images
Wang et al. Sac-net: enhancing spatiotemporal aggregation in cervical histological image classification via label-efficient weakly supervised learning
Feng et al. Trusted multi-scale classification framework for whole slide image
Adegun et al. Deep convolutional network-based framework for melanoma lesion detection and segmentation
Fan et al. EGFNet: Efficient guided feature fusion network for skin cancer lesion segmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination