CN114998647A - Breast cancer full-size pathological image classification method based on attention multi-instance learning - Google Patents
- Publication number
- CN114998647A (Application CN202210526657.6A)
- Authority
- CN
- China
- Prior art keywords
- network
- stage
- full
- attention
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The breast cancer full-size pathological image classification method based on attention multi-instance learning comprises the following steps: step 1: acquiring a data set and labels; step 2: preprocessing the data set; step 3: constructing a two-stage full-size pathological image (WSI) classification network; step 4: saving the optimal weights of the two-stage network; step 5: calculating the accuracy of the network on the test set. The SAMIL of the present invention introduces a lightweight and efficient SA module that fuses spatial attention and channel attention, which capture pixel-level pairwise relationships and channel dependencies, respectively. SAMIL stacks MHA with LSTM to adaptively highlight the most distinctive instance features and better compute the correlation between selected instances, improving classification accuracy.
Description
Technical Field
The invention relates to the technical field of image classification methods, in particular to a breast cancer full-size pathological image classification method based on attention multi-instance learning.
Background
According to recent global cancer statistics, with 2.3 million newly diagnosed cases in women in 2020, breast cancer has surpassed lung cancer as the most common cancer in the world. Meanwhile, the digitization of full-size images (WSI), i.e., hematoxylin and eosin (H&E)-stained biopsy tissue specimens, provides an exact reference for breast cancer diagnosis.
In recent years, with the breakthrough success of deep learning in various computer vision tasks, computer-assisted WSI classification for cancer diagnosis has received increasing attention. In particular, some researchers have cast WSI classification as a weakly supervised task and introduced multi-instance learning (MIL) to address the problems caused by the enormous scale of WSIs and the difficulty of pixel-level labeling required for fully supervised learning. MIL solutions mainly focus on two key links: constructing an instance-level selection module, which calculates the positive probability of each slice-level image from extracted depth features and takes the K slices with the highest probability as candidate instances; and designing an aggregation operator, which generates bag embeddings used to calculate a score for each bag.
Although multi-instance learning has made great progress in full-slide pathological image classification, it has the following disadvantages: the feature correlations of the sub-features are rarely described in the spatial or channel dimensions, which hinders finding the cancer cells of minimal breast cancer lymph node metastases, and there are limitations in capturing the dependencies between different instances that help classify a WSI.
Disclosure of Invention
The invention aims to provide a breast cancer full-size pathological image classification method based on attention multi-instance learning, which obtains more discriminative patch-level representations and improves the accuracy of classifying pathological images of breast cancer lymph node metastases.
A breast cancer full-size pathological image classification method based on attention multi-instance learning comprises the following steps:
step 1: acquiring a data set and a label: acquiring a data set and a label of a breast cancer histopathology image, and randomly dividing the breast cancer histopathology image into a training set, a verification set and a test set according to a proportion;
step 2: preprocessing the data set: the divided data sets are preprocessed based on an inverse binarization thresholding operation; a background/tissue mask is generated for each WSI picture, the tissue region is cut into slices of size a × a, and the coordinate set of the slices is stored. To further reduce the amount of computation, a probability p is added, and the coordinates of a slice are saved only when the fraction of tissue region in the slice is greater than p. The processed WSI image X'_i can be represented as X'_i = {x_{i,1}, x_{i,2}, …, x_{i,m}}, where m is the number of slices in each full-size breast cancer pathological image;
step 3: constructing a two-stage full-size pathological image (WSI) classification network: the first stage is used for selecting instances: the SA-ResNet50 network extracts features from the slices, and the first K instances with the highest probability in each WSI are selected based on a multi-instance learning method; the second stage performs prediction at the full-size level: an aggregator constructed by stacking a multi-head attention (MHA) network with a long short-term memory (LSTM) network makes a reliable prediction for the whole WSI image;
step 31: in stage one, the SA-ResNet50 network extracts features from the slices: a slice X' ∈ R^{C×H×W} serves as input to the pre-trained SA-ResNet50 network; after the residual structure of ResNet50, a feature matrix X ∈ R^{c×h×w} is obtained. Shuffle attention first divides X into G groups along the channel dimension, i.e., X = [X_1, …, X_G], X_k ∈ R^{c/G×h×w}. Each X_k is further divided into two branches X_{k1}, X_{k2} ∈ R^{c/2G×h×w}: one branch exploits the interrelationship between channels to output a channel attention map, while the other exploits the spatial relationships among features to generate a spatial attention map. The results of the two branches are concatenated so that X'_k has the same number of channels as X_k, and all feature matrices X'_k then undergo an aggregation operation, the final output of the SA module being X_out ∈ R^{c×h×w}. X_out generates the feature vector X_gap of the slice through global average pooling.
Step 32: training the SA-ResNet50 network with selected patches: after the feature vector of each slice is obtained, the probability of each slice is obtained through a Softmax function, the probabilities of the slices in each full-size image are sorted, and the T patches with the highest probability ranking in each full-size image are used to train the SA-ResNet50 network.
Step 33: obtaining the input V for full-size-level prediction: the slices in each WSI are predicted using the optimal weight file pre-trained in stage one, the predicted probabilities are sorted, and the first K instances with the highest probability in each full-size image are taken as the input V = [v_1, …, v_K] ∈ R^{K×C} for full-size-level prediction.
Step 34: aggregating the first K instances with the highest probability: using MHA and LSTM, for the i-th head attention unit H_i in the MHA, the calculation formula is as follows:
wherein V = [v_1, …, v_K] ∈ R^{K×C}, V denotes the features of the first K selected instances, K denotes the number of instances, v_1, …, v_K represent single-instance features, v_j, v_k ∈ V, C is the instance-feature embedding dimension, the convolution kernels are W ∈ R^{D×1} and Z ∈ R^{D×C}, and D is the feature embedding dimension. The hyperbolic tangent tanh is the activation function. After the element-wise multiplication, for the MHA, all outputs of the head units are concatenated and another convolution is performed to project back to the original dimension:
wherein the projected result denotes the first K instances after feature enhancement, V = [v_1, …, v_K] ∈ R^{K×C} denotes the features of the first K selected instances, K denotes the number of instances, v_1, …, v_K represent single-instance features, W_pro ∈ R^{(H×D)×C} represents a convolution kernel, T denotes the matrix transpose, H_1, …, H_h denote the head attention units, h denotes the number of heads, and C and D are feature embedding dimensions.
Step 35: further modeling the dependencies between the selected Top-K instances: the LSTM is further used to construct interactions and fuse the interacting instances to obtain a discriminative image-level representation. The LSTM can capture short-term and long-term dependencies; given an input feature sequence (v_1, …, v_K), the hidden layer of the LSTM is recursively calculated from t = 1 to t = K using the following formula:
wherein f_t, i_t, o_t denote the forget gate, input gate and output gate, respectively; W_{f,i,o,c} and U_{f,i,o,c} denote the weight matrices to be learned, b_{f,i,o,c} denotes the bias vector, h_{t-1} is the hidden vector, c_t denotes the memory cell, and Sigmoid and tanh denote activation functions. The output of the last LSTM step serves as the final bag-level representation vector for prediction.
Step 4: saving the optimal weights of the two-stage network: the data set is input into the two-stage classification network; the stage-one network is trained with the training set, the network parameters are updated in each iteration, the verification set is evaluated once every three iterations, and the optimal stage-one weights are saved according to the best verification-set accuracy; the data set is then processed with the optimal stage-one weights, the K instances with the highest probability ranking in each WSI are selected as the stage-two input, the stage-two network is initialized with the optimal stage-one weights, one verification is performed after each training iteration, and the optimal stage-two weights are saved according to the best verification-set accuracy;
and 5: and calculating the accuracy of the network on the test set: and initializing the network by using the two-stage optimal weight, inputting the test set into the network to obtain a prediction result of each WSI, comparing the prediction result with the real label data, counting the number of the WSIs with correct prediction and wrong prediction, and calculating the accuracy of the network on the test set.
Compared with the prior art, the invention has the following beneficial effects:
(1) SAMIL introduces a lightweight and efficient SA module that fuses spatial attention and channel attention, which capture pixel-level pairwise relationships and channel dependencies, respectively.
(2) SAMIL stacks MHA with LSTM to adaptively highlight the most distinctive instance features to better compute the correlation between selected instances, improving classification accuracy.
Drawings
FIG. 1 is an overall block diagram of the SAMIL model.
Detailed Description
The experimental data used in the present invention come from the lymph node metastasis data set of the 2016 Camelyon Grand Challenge (Camelyon16). The data set contains 399 full-size images, including both normal and metastatic cases, for the detection of metastases in HE-stained histological sections of sentinel axillary lymph nodes of breast cancer patients.
As illustrated in the schematic diagram of the invention, the two-stage breast cancer full-size pathological image classification method based on attention multi-instance learning comprises the following steps:
Step 1: acquiring the data set and labels: the lymph node metastasis data set is randomly divided into a training set, a verification set and a test set at a ratio of roughly 2:1:1, giving 204 training, 95 verification and 100 test images.
Step 2: preprocessing the data set: the divided data sets are preprocessed based on an inverse binarization thresholding operation; a background/tissue mask is generated for each WSI picture, the tissue region is cut into slices of size 512 × 512, and the coordinate set of the slices is stored. To further reduce the amount of computation, a probability value of 0.4 is added, and the coordinates of a slice are saved only when the fraction of tissue region in the slice is greater than 0.4. The processed WSI image X'_i can be represented as X'_i = {x_{i,1}, x_{i,2}, …, x_{i,m}}, where m is the number of slices in each full-size breast cancer pathological image;
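The preprocessing step above can be sketched in a few lines (a minimal NumPy illustration; the function name, the fixed grayscale threshold of 200, and the non-overlapping tiling are assumptions for illustration, not details from the patent):

```python
import numpy as np

def extract_patch_coords(gray, patch=512, p=0.4, thresh=200):
    """Sketch of the preprocessing step: build a tissue mask by inverse
    binary thresholding (pixels darker than `thresh` count as tissue),
    tile the image into patch x patch slices, and keep the coordinates
    of slices whose tissue fraction exceeds p."""
    # Inverse binarization: dark (stained) tissue -> 1, bright background -> 0.
    mask = (gray < thresh).astype(np.uint8)
    h, w = mask.shape
    coords = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            tile = mask[y:y + patch, x:x + patch]
            # Save the slice only if its tissue fraction is above p.
            if tile.mean() > p:
                coords.append((y, x))
    return coords
```

On a real WSI the mask would be computed at a low magnification level and the coordinates mapped back to full resolution.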
Step 3: constructing a two-stage full-size pathological image (WSI) classification network: the first stage is used for selecting instances: the SA-ResNet50 network extracts features from the slices, and the 10 instances with the highest probability in each WSI are selected based on a multi-instance learning method; the second stage performs prediction at the full-size level: an aggregator constructed by stacking a multi-head attention (MHA) network with a long short-term memory (LSTM) network makes a reliable prediction for the whole WSI image;
Step 31: in stage one, the SA-ResNet50 network extracts features from the slices: each slice x_{i,j} ∈ R^{3×512×512} is scaled to 224 × 224 × 3 pixels as input to the pre-trained SA-ResNet50 network. An SA module is inserted into each residual stage (e.g., Conv2_x) of ResNet-50. The input to the SA module is a feature matrix X ∈ R^{256×56×56}. The SA module first divides X into 64 groups along the channel dimension, i.e., X = [X_1, …, X_k, …, X_64], X_k ∈ R^{4×56×56}. Each X_k is further divided into two branches X_{k1}, X_{k2} ∈ R^{2×56×56}: one branch exploits the interrelationship between channels to output a channel attention map X'_{k1} ∈ R^{2×56×56}, while the other exploits the spatial relationships among features to generate a spatial attention map X'_{k2} ∈ R^{2×56×56}. The two branches are concatenated to obtain X'_k ∈ R^{4×56×56}, and all feature matrices X'_k then undergo an aggregation operation, the final output of the SA module being X_out ∈ R^{256×56×56}. The SA blocks in the Conv3_x, Conv4_x and Conv5_x residual stages are similar. X_out generates the feature vector X_gap ∈ R^{2048×1×1} through global average pooling.
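The group-wise split of the SA module can be illustrated as follows (a simplified NumPy sketch: the channel and spatial gates here are parameter-free sigmoid stand-ins for the learned attention branches, and the channel-shuffle aggregation is omitted, so this conveys only the data flow, not the patent's exact module):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sa_module(x, groups=64):
    """Illustrative SA block: the (c, h, w) feature map is split into
    `groups` groups along the channel axis; each group is halved into a
    channel-attention branch and a spatial-attention branch, and the two
    gated results are concatenated back, preserving the input shape."""
    c, h, w = x.shape
    out = np.empty_like(x)
    gc = c // groups                      # channels per group
    for g in range(groups):
        grp = x[g * gc:(g + 1) * gc]
        x1, x2 = grp[:gc // 2], grp[gc // 2:]
        # Channel branch: gate each channel by its global average response.
        s = x1.mean(axis=(1, 2), keepdims=True)
        x1 = x1 * sigmoid(s)
        # Spatial branch: gate each pixel by a normalized response map.
        m = (x2 - x2.mean()) / (x2.std() + 1e-5)
        x2 = x2 * sigmoid(m)
        out[g * gc:(g + 1) * gc] = np.concatenate([x1, x2], axis=0)
    return out
```

Because both gates lie in (0, 1), the block re-weights features without changing the tensor shape, which is why it can be dropped into each residual stage of ResNet-50.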
Step 32: training the SA-ResNet50 network with the selected patches: after the feature vector of each slice is obtained, the probability of each slice is obtained through a Softmax function, the probabilities of the slices in each full-size image are sorted, and the 2 patches with the highest probability ranking in each full-size image are taken to train the SA-ResNet50 network.
Step 33: obtaining the input V for full-size-level prediction: the slices in each WSI are predicted using the optimal weight file pre-trained in stage one, the predicted probabilities are sorted, and the first 10 instances with the highest probability in each full-size image are taken as the input V = [v_1, …, v_{10}] ∈ R^{10×2048} for the stage-two full-size-level prediction.
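The instance-selection step reduces to ranking the stage-one slice probabilities and keeping the top ten; a tiny illustrative helper (the function name is ours, not the patent's):

```python
import numpy as np

def select_top_k(probs, k=10):
    """Given the stage-one positive probability of every slice in a WSI,
    return the indices of the k highest-probability slices, i.e. the
    instances forwarded to the stage-two aggregator."""
    order = np.argsort(probs)[::-1]      # sort indices by descending probability
    return order[:k].tolist()
```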
Step 34: aggregating the first 10 instances with the highest probability: with MHA and LSTM, for the i-th head attention unit in the multi-head attention, the calculation formula is as follows:
wherein V = [v_1, …, v_{10}] ∈ R^{10×2048}, V denotes the first 10 selected instance features, v_1, …, v_{10} represent single-instance features, v_j, v_k ∈ V, and the convolution kernels are W ∈ R^{512×1} and Z ∈ R^{512×2048}. The hyperbolic tangent tanh is the activation function. After the element-wise multiplications, key instances are highlighted according to the relationships between them. For the MHA, the invention concatenates all outputs of the head units and performs another convolution to project back to the original dimensions:
wherein the projected result denotes the first 10 instances after feature enhancement, V = [v_1, …, v_{10}] ∈ R^{10×2048} denotes the first 10 selected instance features, v_1, …, v_{10} represent single-instance features, W_pro ∈ R^{(3×512)×2048} represents a convolution kernel, T denotes the matrix transpose, H_1, …, H_h denote the head attention units, and h denotes the number of heads; in this study, h = 3. Multi-head attention recalibrates all instance features from different representation subspaces, enriching the originally selected instances V.
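Because the formula images are not reproduced in this text, the multi-head aggregation can only be sketched under assumptions. The NumPy version below follows the classic attention-based MIL form suggested by the stated symbol shapes (each head i scores the K instances with softmax(tanh(V Z_i^T) w_i), weights the transformed features element-wise, and the concatenated heads are projected back by W_pro); it should be read as an illustrative reconstruction, not the patent's exact formula:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mha_aggregate(V, Zs, ws, W_pro):
    """Illustrative multi-head instance aggregation.
    V: (K, C) selected instance features; Zs: list of h matrices (D, C);
    ws: list of h vectors (D,); W_pro: (h*D, C) output projection."""
    heads = []
    for Z, w in zip(Zs, ws):
        T = np.tanh(V @ Z.T)             # (K, D) transformed instances
        a = softmax(T @ w)               # (K,) attention over instances
        heads.append(a[:, None] * T)     # (K, D) element-wise weighted head
    H = np.concatenate(heads, axis=1)    # (K, h*D) concatenated heads
    return H @ W_pro                     # (K, C) enhanced instances
```

The output keeps the (K, C) shape of V, so the enhanced instances can feed the LSTM of step 35 directly.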
Step 35: further modeling the dependencies between the first 10 selected instances: the LSTM is further used to construct interactions and fuse the interacting instances to obtain a discriminative image-level representation. The LSTM can capture short-term and long-term dependencies; given the input feature sequence (v_1, …, v_{10}), the hidden layer of the LSTM is recursively calculated from t = 1 to t = 10 using the following formula:
wherein f_t, i_t, o_t denote the forget gate, input gate and output gate, respectively; W_{f,i,o,c} and U_{f,i,o,c} denote the weight matrices to be learned, b_{f,i,o,c} denotes the bias vector, h_t is the hidden vector, c_t is the memory cell, and Sigmoid and tanh denote activation functions. In the feature fusion module, the invention stacks two layers of LSTM so that the enhanced instances can interact more fully. The output of the last LSTM serves as the final bag-level representation vector for prediction.
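The standard LSTM gate recurrence referenced above can be written out explicitly (a single-layer NumPy sketch; the patent stacks two layers, and the parameter layout here is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_aggregate(V, params):
    """Run the K enhanced instance features through the standard LSTM
    gate equations (forget f, input i, output o, candidate cell g) and
    return the final hidden state as the bag-level representation.
    `params` holds the (W, U, b) triples keyed by gate name."""
    K, C = V.shape
    D = params["bf"].shape[0]
    h = np.zeros(D)                      # hidden vector h_t
    c = np.zeros(D)                      # memory cell c_t
    for t in range(K):
        v = V[t]
        f = sigmoid(params["Wf"] @ v + params["Uf"] @ h + params["bf"])
        i = sigmoid(params["Wi"] @ v + params["Ui"] @ h + params["bi"])
        o = sigmoid(params["Wo"] @ v + params["Uo"] @ h + params["bo"])
        g = np.tanh(params["Wc"] @ v + params["Uc"] @ h + params["bc"])
        c = f * c + i * g                # gated memory update
        h = o * np.tanh(c)               # new hidden state
    return h                             # final bag-level vector
```

A second stacked layer would simply consume the sequence of hidden states produced by the first.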
Step 4: saving the optimal weights of the two-stage network: the data set is input into the two-stage classification network, and the stage-one network is trained with the training set, updating the network parameters in each iteration. The verification set is evaluated once every three iterations, and the optimal stage-one weights are saved according to the best verification-set accuracy; during training, an Adam optimizer is used to relieve the gradient oscillation problem, with the learning rate set to 1e-4 and the weight decay set to 1e-5. The data set is then processed with the optimal stage-one weights, the 10 instances with the highest probability ranking in each WSI are selected as the stage-two input, and the stage-two network is initialized with the optimal stage-one weights. In the stage-two training process, an Adam optimizer is used with the learning rate set to 1e-4 and the weight decay set to 1e-4; one verification is performed after each training iteration, and the optimal stage-two weights are saved according to the best verification-set accuracy;
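The checkpointing scheme of step 4 (validate every few iterations, keep the weights with the best validation accuracy) can be sketched generically; `train_step` and `validate` are placeholder callbacks standing in for the actual optimizer and evaluation code, not APIs from the patent:

```python
def train_stage_one(num_iters, train_step, validate, val_every=3):
    """Run `num_iters` parameter updates; every `val_every` iterations,
    evaluate on the verification set and keep a copy of the weights
    whose validation accuracy is the best seen so far."""
    best_acc, best_weights = -1.0, None
    weights = {}
    for it in range(1, num_iters + 1):
        weights = train_step(weights)            # one parameter update
        if it % val_every == 0:
            acc = validate(weights)
            if acc > best_acc:                   # keep the best checkpoint
                best_acc, best_weights = acc, dict(weights)
    return best_acc, best_weights
```

Stage two follows the same pattern with `val_every=1`, i.e. one verification after every iteration.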
and 5: calculating the accuracy of the network on a test set: and initializing the network by using two-stage optimal weight, inputting the test set into the network to obtain a prediction result of each WSI, comparing the prediction result with the real label data of 100 test sets, and counting the number of the WSIs with correct prediction and wrong prediction so as to calculate the accuracy of SAMIL on the test sets.
Following the above steps, the invention provides a novel SAMIL model for the breast cancer WSI classification task. SAMIL uses a shuffle attention (SA) module to select discriminative instances and performs bag-level prediction using multi-head attention (MHA) stacked with an LSTM, thereby exploiting the advantages of the attention mechanism to solve the MIL problem. In addition, experimental results show that, compared with state-of-the-art MIL methods, the method achieves superior performance on the Camelyon16 data set, with an accuracy of up to 96.56%.
Claims (2)
1. The breast cancer full-size pathological image classification method based on attention multi-instance learning is characterized by comprising the following steps: step 1: acquiring a data set and labels: acquiring a data set and labels of breast cancer histopathology images, and randomly dividing them into a training set, a verification set and a test set in proportion; step 2: preprocessing the data set: preprocessing the divided data set based on an inverse binarization thresholding operation, generating a background/tissue mask for each WSI picture, cutting the tissue region into slices of size a × a, and storing the coordinate set of the slices; to further reduce the amount of computation, a probability p is added, and the coordinates of a slice are saved only when the fraction of tissue region in the slice is greater than p; the processed WSI image X'_i can be represented as X'_i = {x_{i,1}, x_{i,2}, …, x_{i,m}}, where m is the number of slices in each full-size breast cancer pathological image; step 3: constructing a two-stage full-size pathological image (WSI) classification network: the first stage is used for selecting instances, the SA-ResNet50 network extracts features from the slices, and the first K instances with the highest probability in each WSI are selected by a multi-instance-learning-based method; the second stage performs prediction at the full-size level, and an aggregator constructed by stacking multi-head attention (MHA) with a long short-term memory (LSTM) network makes a reliable prediction for the whole WSI image; step 4: saving the optimal weights of the two-stage network: inputting the data set into the two-stage classification network, training the stage-one network with the training set, updating the network parameters in each iteration, performing one verification on the verification set every three iterations, and saving the optimal stage-one weights according to the best verification-set accuracy; processing the data set with the optimal stage-one weights, selecting the K instances with the highest probability ranking in each WSI as the stage-two input, initializing the stage-two network with the optimal stage-one weights, performing one verification after each training iteration, and saving the optimal stage-two weights according to the best verification-set accuracy; step 5: calculating the accuracy of the network on the test set: initializing the network with the stage-two optimal weights, inputting the test set into the network to obtain a prediction result for each WSI, comparing the predictions with the real label data, counting the numbers of correctly and incorrectly predicted WSIs, and calculating the accuracy of the network on the test set.
2. The breast cancer full-size pathological image classification method based on attention multi-instance learning according to claim 1, characterized in that step 3 comprises: step 31: in stage one, the SA-ResNet50 network extracts features from the slices: a slice X' ∈ R^{C×H×W} serves as input to the pre-trained SA-ResNet50 network; after the residual structure of ResNet50, a feature matrix X ∈ R^{c×h×w} is obtained; shuffle attention first divides X into G groups along the channel dimension, i.e., X = [X_1, …, X_G], X_k ∈ R^{c/G×h×w}; each X_k is further divided into two branches X_{k1}, X_{k2} ∈ R^{c/2G×h×w}; one branch exploits the interrelationship between channels to output a channel attention map, the other exploits the spatial relationships among features to generate a spatial attention map, and the results of the two branches are concatenated so that X'_k has the same number of channels as X_k; all feature matrices X'_k then undergo an aggregation operation, the final output of the SA module being X_out ∈ R^{c×h×w}; X_out generates the feature vector X_gap of the slice through global average pooling; step 32: training the SA-ResNet50 network with selected patches: after obtaining the feature vector of each slice, the probability of each slice is obtained through a Softmax function, the probabilities of the slices in each full-size image are sorted, and the T patches with the highest probability ranking in each full-size image are used to train the SA-ResNet50 network; step 33: obtaining the input V for full-size-level prediction: the slices in each WSI are predicted using the optimal weight file pre-trained in stage one, the predicted probabilities are sorted, and the first K instances with the highest probability in each full-size image are taken as the input V = [v_1, …, v_K] ∈ R^{K×C} for full-size-level prediction; step 34: aggregating the first K instances with the highest probability: using MHA and LSTM, for the i-th head attention unit H_i in the MHA, the calculation formula is as follows:
wherein V = [v_1, …, v_K] ∈ R^{K×C}, V denotes the features of the first K selected instances, K denotes the number of instances, v_1, …, v_K represent single-instance features, v_j, v_k ∈ V, C is the instance-feature embedding dimension, the convolution kernels are W ∈ R^{D×1} and Z ∈ R^{D×C}, D is the feature embedding dimension, and tanh is the activation function; after the element-wise multiplication, for the MHA, all outputs of the head units are concatenated and another convolution is performed to project back to the original dimensions:
wherein the projected result denotes the first K instances after feature enhancement, V = [v_1, …, v_K] ∈ R^{K×C} denotes the features of the first K selected instances, K denotes the number of instances, v_1, …, v_K represent single-instance features, W_pro ∈ R^{(H×D)×C} represents a convolution kernel, T denotes the matrix transpose, H_1, …, H_h denote the head attention units, h denotes the number of heads, and C and D are feature embedding dimensions; step 35: further modeling the dependencies between the selected Top-K instances: the LSTM is further used to construct interactions and fuse the interacting instances to obtain a discriminative image-level representation; the LSTM can capture short-term and long-term dependencies; given an input feature sequence (v_1, …, v_K), the hidden layer of the LSTM is recursively calculated from t = 1 to t = K using the following formula:
wherein f_t, i_t, o_t denote the forget gate, input gate and output gate, respectively; W_{f,i,o,c} and U_{f,i,o,c} denote the weight matrices to be learned, b_{f,i,o,c} denotes the bias vector, h_{t-1} is the hidden vector, c_t denotes the memory cell, Sigmoid and tanh denote activation functions, and the output of the last LSTM serves as the final bag-level representation vector for prediction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210526657.6A CN114998647B (en) | 2022-05-16 | 2022-05-16 | Breast cancer full-size pathological image classification method based on attention multi-instance learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114998647A true CN114998647A (en) | 2022-09-02 |
CN114998647B CN114998647B (en) | 2024-05-07 |
Family
ID=83027208
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210526657.6A Active CN114998647B (en) | 2022-05-16 | 2022-05-16 | Breast cancer full-size pathological image classification method based on attention multi-instance learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114998647B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110415212A (en) * | 2019-06-18 | 2019-11-05 | 平安科技(深圳)有限公司 | Abnormal cell detection method, device and computer readable storage medium |
US20200356724A1 (en) * | 2019-05-06 | 2020-11-12 | University Of Electronic Science And Technology Of China | Multi-hop attention and depth model, method, storage medium and terminal for classification of target sentiments |
CN114238577A (en) * | 2021-12-17 | 2022-03-25 | 中国计量大学上虞高等研究院有限公司 | Multi-task learning emotion classification method integrated with multi-head attention mechanism |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117237781A (en) * | 2023-11-16 | 2023-12-15 | 哈尔滨工业大学(威海) | Attention mechanism-based double-element fusion space-time prediction method |
CN117237781B (en) * | 2023-11-16 | 2024-03-19 | 哈尔滨工业大学(威海) | Attention mechanism-based double-element fusion space-time prediction method |
CN117830227A (en) * | 2023-12-11 | 2024-04-05 | 皖南医学院 | Oral cancer tumor staging method based on pathology and CT multi-modal model |
Similar Documents
Publication | Title | |
---|---|---|
CN107292256B (en) | Auxiliary task-based deep convolution wavelet neural network expression recognition method | |
CN105512289B (en) | Image search method based on deep learning and Hash | |
CN111126488B (en) | Dual-attention-based image recognition method | |
CN108875076B (en) | Rapid trademark image retrieval method based on Attention mechanism and convolutional neural network | |
CN108765383B (en) | Video description method based on deep migration learning | |
CN114998647A (en) | Breast cancer full-size pathological image classification method based on attention multi-instance learning | |
CN111612008A (en) | Image segmentation method based on convolution network | |
CN109033978B (en) | Error correction strategy-based CNN-SVM hybrid model gesture recognition method | |
CN111898703B (en) | Multi-label video classification method, model training method, device and medium | |
CN110110602A (en) | A kind of dynamic sign Language Recognition Method based on three-dimensional residual error neural network and video sequence | |
CN104077742B (en) | Human face sketch synthetic method and system based on Gabor characteristic | |
CN111276240A (en) | Multi-label multi-mode holographic pulse condition identification method based on graph convolution network | |
CN113688894A (en) | Fine-grained image classification method fusing multi-grained features | |
CN115966010A (en) | Expression recognition method based on attention and multi-scale feature fusion | |
CN114783034A (en) | Facial expression recognition method based on fusion of local sensitive features and global features | |
CN111582506A (en) | Multi-label learning method based on global and local label relation | |
CN111325237A (en) | Image identification method based on attention interaction mechanism | |
CN114530222A (en) | Cancer patient classification system based on multiomics and image data fusion | |
CN115797929A (en) | Small farmland image segmentation method and device based on double-attention machine system | |
CN116229179A (en) | Dual-relaxation image classification method based on width learning system | |
CN115310589A (en) | Group identification method and system based on depth map self-supervision learning | |
CN111259264A (en) | Time sequence scoring prediction method based on generation countermeasure network | |
Afzal et al. | Discriminative feature abstraction by deep L2 hypersphere embedding for 3D mesh CNNs | |
CN111242102B (en) | Fine-grained image recognition algorithm of Gaussian mixture model based on discriminant feature guide | |
CN109583406B (en) | Facial expression recognition method based on feature attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |