CN117152554A - ViT model-based pathological section data identification method and system - Google Patents
- Publication number
- CN117152554A (application CN202310963241.5A)
- Authority
- CN
- China
- Prior art keywords
- pathological section
- section data
- vit
- model
- vit model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/60—ICT specially adapted for the handling or processing of medical references relating to pathologies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Abstract
The embodiment of the invention discloses a pathological section data identification method, system, electronic device and storage medium based on a ViT model. The ViT model-based pathological section data identification method comprises the following steps: acquiring full-field pathological section data, preprocessing it to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model includes a multi-head attention mechanism that uses multiple attention heads to calculate multiple sets of key vectors in parallel, obtaining calculated values and predicting based on those values; inputting the training set into the ViT model for training to obtain a trained ViT model; and identifying pathological section data with the trained ViT model to obtain an identification result. The ViT model-based pathological section data identification method addresses the large computational cost and weak feature-extraction capability of prior-art models when processing two-dimensional images.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a ViT model-based pathological section data identification method, a ViT model-based pathological section data identification system, electronic equipment and a storage medium.
Background
Traditional pathological diagnosis requires a specialized pathologist to search for target regions and cells one by one under a microscope. A pathological section usually contains tens of thousands of cells, yet the disease-related target regions and cells occupy only a very small part, and the large amount of redundant information causes severe "reading fatigue" for the pathologist.
AI can help pathologists judge pathological section data more efficiently and accurately, reducing the misdiagnosis and missed-diagnosis rates. However, at the present stage, transferring the Transformer architecture to the vision domain raises two problems: first, compared with one-dimensional text, processing two-dimensional images greatly increases the computational load; second, image processing involves more scales and more noise, which requires the model to have stronger feature-extraction capability.
What is needed is a model that can assist a pathologist in judging pathological section data more efficiently and accurately, with low computational cost and strong feature-extraction capability.
Disclosure of Invention
The embodiment of the invention aims to provide a ViT model-based pathological section data identification method, a ViT model-based pathological section data identification system, electronic equipment and a storage medium, which are used for solving the problems of large calculated amount and poor feature extraction capability when a model processes a two-dimensional image in the prior art.
In order to achieve the above objective, an embodiment of the present invention provides a pathological section data identification method based on ViT model, which specifically includes:
acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data;
constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value;
inputting the training set into the ViT model for training to obtain a trained ViT model;
and identifying pathological section data based on the trained ViT model to obtain an identification result.
Based on the technical scheme, the invention can also be improved as follows:
further, the ViT model-based pathological section data identification method further comprises the following steps:
the preprocessing of the full-view pathological section data to obtain preprocessed pathological section data comprises the following steps:
acquiring full-view pathological section data, wherein the full-view pathological section data comprises full-view pathological sections and pathological labeling data;
reading the downsampling magnification of the full-field pathological section, ensuring the integrity of the color features, texture features, shape features and spatial features of the full-field pathological section;
reading the pathology annotation data and processing the annotation information to obtain a point coordinate set;
establishing a whole-image blank mask, performing bitwise operations between the blank mask and the point coordinate set, cutting away the blank area, and obtaining one ROI region of the single full-field pathological section;
converting the HSV color-space information of the ROI region and ensuring that the color space of the full-field pathological section is RGB.
Further, the constructing of a training set based on the preprocessed pathological section data includes:
performing offline enhancement and online enhancement on the training set through an enhancement algorithm.
Further, the constructing of the ViT model, wherein the ViT model includes a multi-head attention mechanism that uses multiple attention heads to calculate multiple sets of key vectors in parallel to obtain calculated values and predicts based on the calculated values, includes:
performing iterative solution step by step with the gradient descent method to obtain the minimum loss function value and the optimal ViT model parameter values.
Further, the inputting the training set into the ViT model for training, to obtain a trained ViT model, includes:
dividing the preprocessed pathological section data into a training set, a verification set and a test set;
training the ViT model based on the training set;
performing performance evaluation on the ViT model after training based on the verification set to obtain a ViT model meeting performance conditions;
and evaluating the segmentation result of the ViT model meeting the performance condition based on the test set to obtain an evaluation index corresponding to the ViT model.
Further, the identifying the pathological section data based on the trained ViT model to obtain an identifying result includes:
cutting a two-dimensional image input into the ViT model into fixed-size patches, wherein each patch is a tensor; stretching each tensor into a vector, and obtaining a patch embedding through one layer of linear transformation;
prepending a class token to the head of each patch embedding, and adding the result to the position vector to obtain a final embedding vector;
inputting the embedding vector into a Transformer, and obtaining prediction information through encoding and decoding by the Transformer self-attention mechanism.
A ViT model-based pathological section data identification system, comprising:
the acquisition module is used for acquiring full-view pathological section data;
the preprocessing module is used for preprocessing the full-view pathological section data to obtain preprocessed pathological section data;
the first construction module is used for constructing a training set based on the preprocessed pathological section data;
a second construction module, configured to construct a ViT model, where the ViT model includes a multi-head attention mechanism, and calculate multiple sets of key vectors in parallel using multiple attention heads to obtain a calculated value, and predict based on the calculated value;
the training module is used for inputting the training set into the ViT model for training to obtain a trained ViT model;
and identifying pathological section data based on the trained ViT model to obtain an identification result.
Further, the ViT model is also used to:
cutting a two-dimensional image input into the ViT model into fixed-size patches, wherein each patch is a tensor; stretching each tensor into a vector, and obtaining a patch embedding through one layer of linear transformation;
prepending a class token to the head of each patch embedding, and adding the result to the position vector to obtain a final embedding vector;
inputting the embedding vector into a Transformer, and obtaining prediction information through encoding and decoding by the Transformer self-attention mechanism.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when the computer program is executed.
A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method.
The embodiment of the invention has the following advantages:
according to the ViT model-based pathological section data identification method, full-view pathological section data are acquired, the full-view pathological section data are preprocessed to obtain preprocessed pathological section data, and a training set is constructed based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value; inputting the training set into the ViT model for training to obtain a trained ViT model; identifying pathological section data based on the trained ViT model to obtain an identification result; the method solves the problems of large calculated amount and poor feature extraction capability when the model processes the two-dimensional image in the prior art.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those skilled in the art from this disclosure that the drawings described below are merely exemplary and that other embodiments may be derived from the drawings provided without undue effort.
The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the scope of the invention.
FIG. 1 is a flow chart of a pathological section data identification method based on a ViT model;
FIG. 2 is a block diagram of a system for identifying pathological section data based on ViT model according to the present invention;
fig. 3 is a schematic diagram of an entity structure of an electronic device according to the present invention.
Wherein the reference numerals are as follows:
the system comprises an acquisition module 10, a preprocessing module 20, a first construction module 30, a second construction module 40, a training module 50, an electronic device 60, a processor 601, a memory 602 and a bus 603.
Detailed Description
Other advantages and benefits of the present invention will become apparent to those skilled in the art from the following detailed description, which describes, by way of illustration, certain specific embodiments, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Examples
Fig. 1 is a flowchart of an embodiment of a pathological section data identification method based on a ViT model, and as shown in fig. 1, the pathological section data identification method based on a ViT model provided by the embodiment of the invention comprises the following steps:
s101, acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data;
Specifically, full-field pathological section data are obtained, comprising full-field pathological sections and pathology annotation data; depending on the scanning instrument, the kfb pathological sections are scanned at 40x or 20x magnification.
The downsampling magnification of the full-field pathological section is read with a kfb reading tool, or the file is converted to SVS format, ensuring the integrity of the color, texture, shape and spatial features of the full-field pathological section;
the pathology annotation data are read and the annotation information is processed to obtain a point coordinate set; the point coordinates are scaled by the same downsampling magnification, and negative coordinates are set to 0;
a whole-image blank mask is established and combined with the point coordinate set by bitwise operations; the maximum (x, y) and minimum (x, y) are taken as the boundary of the ROI region, the blank area is cut away, and one ROI region of the single full-field pathological section is obtained;
the HSV color-space information of the ROI region is converted, ensuring that the color space of the full-field pathological section is RGB, and the result is stored into the data set;
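The mask-and-crop step above can be sketched as follows. This is an illustrative NumPy sketch, not the patent's code; `extract_roi`, the toy slide and the point set are assumed names and data, and the annotation is assumed to already be an (x, y) point set scaled to the chosen magnification.

```python
import numpy as np

def extract_roi(slide: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Build a whole-image blank mask, mark the annotated points, then crop
    the slide to the min/max (x, y) bounding box of the marked region."""
    mask = np.zeros(slide.shape[:2], dtype=np.uint8)   # whole-image blank mask
    pts = np.clip(points, 0, None).astype(int)         # negative coordinates -> 0
    mask[pts[:, 1], pts[:, 0]] = 1                     # bitwise-style marking
    ys, xs = np.nonzero(mask)
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
    return slide[y0:y1 + 1, x0:x1 + 1]                 # cut away the blank area

# Usage: a toy 100x100 RGB "slide" with a square annotation.
slide = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
points = np.array([[10, 20], [40, 20], [40, 60], [10, 60]])
roi = extract_roi(slide, points)
```

A real pipeline would rasterize the annotation polygon instead of individual points, but the bounding-box logic is the same.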
and performing off-line enhancement and on-line enhancement on the training set through an enhancement algorithm.
Given the characteristics of pathological images, such as rotational symmetry, preparation (staining) differences and scarcity of data, the data volume of each stage is first counted, and different enhancement modes are designed according to the data volume.
The enhancement modes are: generating a rotation matrix from a set angle and rotating by 90°, 180° and 270° through affine transformation; horizontal and vertical flipping; random hue, saturation, brightness and contrast; 30%-overlap crops in the horizontal and vertical directions; and so on.
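The offline enhancement modes just listed (fixed-angle rotation, flips, overlapping crops) can be sketched roughly as follows; this is an illustrative NumPy sketch with assumed function names, not the patent's implementation.

```python
import numpy as np

def offline_augment(img: np.ndarray) -> list:
    """Return the rotated (90/180/270 deg) and flipped copies of one patch."""
    return [
        np.rot90(img, 1), np.rot90(img, 2), np.rot90(img, 3),  # 90/180/270 deg
        np.fliplr(img),                                         # horizontal flip
        np.flipud(img),                                         # vertical flip
    ]

def overlap_tiles(img: np.ndarray, tile: int, overlap: float = 0.3) -> list:
    """Cut tiles whose stride leaves the stated overlap in both directions."""
    stride = int(tile * (1 - overlap))
    h, w = img.shape[:2]
    return [img[y:y + tile, x:x + tile]
            for y in range(0, h - tile + 1, stride)
            for x in range(0, w - tile + 1, stride)]

img = np.arange(64 * 64 * 3, dtype=np.uint8).reshape(64, 64, 3)
aug = offline_augment(img)
tiles = overlap_tiles(img, tile=32)   # stride 22 -> tile origins at 0 and 22
```

Random hue/saturation/brightness/contrast jitter would normally come from an augmentation library; it is omitted here to keep the sketch self-contained.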
Online Mixup uses the following formula: x̃ = λ·x_i + (1−λ)·x_j, ỹ = λ·y_i + (1−λ)·y_j, where (x_i, y_i) and (x_j, y_j) are two samples randomly drawn from the training data and λ ∈ [0,1] is a random mixing ratio; a new image is generated by mixing the two samples in this linear-interpolation fashion. Note that the labels are loaded as one-hot encodings, so the coefficient weighting does not affect the final classification result.
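The Mixup step can be illustrated with a minimal sketch (assumed names; labels are one-hot, as noted above, so the mixed label is a valid soft label):

```python
import numpy as np

def mixup(xi, yi, xj, yj, lam):
    """x~ = lam*xi + (1-lam)*xj ; y~ = lam*yi + (1-lam)*yj, with lam in [0,1]."""
    return lam * xi + (1 - lam) * xj, lam * yi + (1 - lam) * yj

xi, xj = np.full((4, 4), 1.0), np.full((4, 4), 3.0)   # two toy "images"
yi, yj = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # one-hot labels
x_mix, y_mix = mixup(xi, yi, xj, yj, lam=0.5)
```

In training, λ is usually drawn per batch (e.g. from a Beta distribution) rather than fixed.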
S102, constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain calculated values, and predicting based on the calculated values;
Specifically, the minimum of the loss function and the optimal ViT model parameter values are obtained by step-by-step iterative solution with the gradient descent method.
The cross-entropy loss formula is: L = −Σ_i y_i·log(y′_i), where y_i is the label value and y′_i is the predicted value.
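A minimal sketch of this loss for one-hot labels, averaged over a batch (illustrative; the epsilon clip is an added numerical-safety detail, not from the patent):

```python
import numpy as np

def cross_entropy(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-12) -> float:
    """L = -sum_i y_i * log(y'_i), averaged over the batch dimension."""
    y_pred = np.clip(y_pred, eps, 1.0)         # avoid log(0)
    return float(-np.sum(y_true * np.log(y_pred)) / y_true.shape[0])

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])    # one-hot labels
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])    # softmax outputs
loss = cross_entropy(y_true, y_pred)           # (-ln 0.9 - ln 0.8)/2 ~ 0.164
```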
A small initial learning rate and momentum parameter are set, with the weight-decay bias set to 5e-5; after waiting for the model to stabilize, the learning rate is increased appropriately to prevent model oscillation.
Combining a stable learning rate with a cosine function better handles the multi-peak optimization landscape: the learning rate is made to vary periodically with a cosine function, with a period of 32, so that local optima are skipped in search of the global optimum. The formula is: η_t = η_min + (1/2)(η_max − η_min)(1 + cos(π·T_cur/T_i)), where η_t is the current learning rate, η_min the minimum learning rate, η_max the maximum learning rate, T_cur the current epoch, and T_i the maximum epoch.
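The cosine schedule can be written as a small function (`cosine_lr` is an assumed name; the 32-epoch period and the learning-rate bounds below are illustrative):

```python
import math

def cosine_lr(t_cur: int, t_max: int, eta_min: float, eta_max: float) -> float:
    """eta_t = eta_min + (eta_max - eta_min)/2 * (1 + cos(pi * t_cur / t_max))."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))

# One 32-epoch period, decaying from eta_max down to eta_min.
lrs = [cosine_lr(t, 32, eta_min=1e-6, eta_max=1e-3) for t in range(33)]
```

Frameworks typically ship this as a ready-made scheduler (e.g. cosine annealing with restarts); the function above is just the formula itself.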
Macro-F1 is used as the evaluation index. Precision and recall are calculated for each category as Precision = TP/(TP+FP) and Recall = TP/(TP+FN), the corresponding F1 score is F1 = 2·Precision·Recall/(Precision+Recall), and Macro-F1 is the average of the per-category F1 scores. Here TP denotes true positives, FP false positives, and FN false negatives.
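A sketch of the per-class precision/recall/F1 computation and the macro average (assumed function name and toy counts):

```python
import numpy as np

def macro_f1(tp, fp, fn) -> float:
    """Per-class precision = TP/(TP+FP), recall = TP/(TP+FN),
    F1 = 2PR/(P+R); Macro-F1 averages F1 over the classes."""
    tp, fp, fn = map(np.asarray, (tp, fp, fn))
    p = tp / np.maximum(tp + fp, 1)                       # avoid division by zero
    r = tp / np.maximum(tp + fn, 1)
    f1 = np.where(p + r > 0, 2 * p * r / np.maximum(p + r, 1e-12), 0.0)
    return float(f1.mean())

# Two classes: (P=0.8, R=1.0) -> F1=0.889 and (P=1.0, R=0.5) -> F1=0.667.
score = macro_f1(tp=[8, 5], fp=[2, 0], fn=[0, 5])
```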
S103, inputting the training set into a ViT model for training to obtain a trained ViT model;
Specifically, the preprocessed pathological section data are divided into a training set, a verification set and a test set;
training the ViT model based on the training set;
performing performance evaluation on the ViT model after training based on the verification set to obtain a ViT model meeting performance conditions;
and evaluating the segmentation result of the ViT model meeting the performance condition based on the test set to obtain an evaluation index corresponding to the ViT model.
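The split-then-evaluate pipeline above might look roughly like this; the 8/1/1 split ratio is an assumption for illustration, not stated in the patent:

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle deterministically, then slice into train/validation/test."""
    items = items[:]                         # copy so the caller's list is untouched
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(list(range(100)))
```

The validation set then drives model selection, and the held-out test set yields the final evaluation index.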
S104, identifying pathological section data based on a trained ViT model to obtain an identification result;
Specifically, the two-dimensional image input into the ViT model is cut into fixed-size patches; each patch is a tensor, each tensor is stretched into a vector, and the patch embedding is obtained through one layer of linear transformation;
a class token is prepended to the head of each patch embedding, and the result is added to the position vector to obtain the final embedding vector;
the embedding vector is input into the Transformer, and prediction information is obtained through encoding and decoding by the Transformer self-attention mechanism.
When the attention mechanism is applied in the Transformer, the input is first mapped by matrix operations to three vectors Q (Query), K (Key) and V (Value); from these, the context vector z = softmax(QKᵀ/√d_k)·V is computed, where d_k is the dimension of the K vector. Using this context vector (the attention mechanism) for prediction has a drawback: averaging the position information in the input sequence with the attention weights reduces the effective resolution. To solve this problem, a Multi-head Self-Attention (MSA) mechanism is usually employed, i.e., multiple heads compute the values of multiple sets of Q, K, V vectors in parallel, and the information from these vectors is compressed and then used for prediction.
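A NumPy sketch of scaled dot-product attention and its multi-head form follows (toy shapes and random weights; an illustration of the mechanism, not the patent's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """z = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    return softmax(q @ k.swapaxes(-1, -2) / np.sqrt(d_k)) @ v

def multi_head(x, wq, wk, wv, wo, heads):
    """Split the model dimension into `heads` groups, attend in parallel,
    concatenate the head outputs, and project with wo."""
    n, d = x.shape
    q, k, v = (x @ w for w in (wq, wk, wv))
    split = lambda t: t.reshape(n, heads, d // heads).transpose(1, 0, 2)
    z = attention(split(q), split(k), split(v))       # (heads, n, d/heads)
    return z.transpose(1, 0, 2).reshape(n, d) @ wo    # concat + output projection

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))                       # 5 tokens, model dim 8
wq, wk, wv, wo = (rng.standard_normal((8, 8)) for _ in range(4))
out = multi_head(x, wq, wk, wv, wo, heads=2)
```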
ViT cuts the two-dimensional image into fixed-size patches; each patch is a small color picture with three RGB channels, i.e., each patch is a tensor. The next step is vectorization, stretching each tensor into a vector. A patch embedding is obtained through one layer of linear transformation; a class token is prepended to the head of each patch embedding, and the result is added to a position vector to obtain the final embedding vector, which serves as the Transformer input for training and prediction. Both the class token and the position vector are learnable vectors: the class token is used for prediction and classification, and the position vector represents the position information of each patch in the image;
ViT step breakdown:
Step 1, image blocking and flattening: patch16 and patch32 blocking methods are provided; an input image x ∈ R^(H×W×C) is cut into N patches, and each patch is flattened into a vector.
Step 2, patch embedding: the class token x_class and the spatial position information E_pos are added, giving z_0 = [x_class; x_p¹·E; x_p²·E; …; x_pᴺ·E] + E_pos, where x_class is the learnable class embedding, E is the feature-mapping matrix of the fully connected layer, and the sequence length becomes N+1 after the class token is prepended.
Step 3, Transformer calculation, where MSA is the multi-head self-attention block and MLP the multilayer perceptron block (each with a residual connection), LN is the LayerNorm normalization layer, and z_{l−1} is the output of the previous sub-encoder:
MSA layer: z′_l = MSA(LN(z_{l−1})) + z_{l−1}, l = 1, …, L
MLP layer: z_l = MLP(LN(z′_l)) + z′_l, l = 1, …, L
Step 4, category calculation: y = LN(z_L⁰), i.e., classification is performed from the class-token output of the final encoder layer.
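Steps 1-4 can be traced end to end with random weights in a toy sketch. All sizes are illustrative, and the encoder block is simplified to a single residual MLP standing in for the MSA/MLP pair:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 32; C = 3; P = 16; D = 24; K = 4          # toy sizes, not the patent's
N = (H // P) * (W // P)                            # N = HW / P^2 = 4 patches

img = rng.standard_normal((H, W, C))
# Step 1: cut into P x P patches and flatten each into a vector of length P*P*C.
patches = img.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(N, P * P * C)

# Step 2: linear patch embedding E, class token x_class, position info E_pos.
E = rng.standard_normal((P * P * C, D))
x_class = rng.standard_normal((1, D))
E_pos = rng.standard_normal((N + 1, D))
z0 = np.vstack([x_class, patches @ E]) + E_pos     # z_0, shape (N+1, D)

# Step 3 (simplified): one residual block with LayerNorm and a ReLU MLP,
# standing in for the full MSA + MLP encoder layer.
ln = lambda z: (z - z.mean(-1, keepdims=True)) / (z.std(-1, keepdims=True) + 1e-6)
W_mlp = rng.standard_normal((D, D))
zL = z0 + np.maximum(ln(z0) @ W_mlp, 0)

# Step 4: classify from the class-token output, y = LN(z_L^0).
W_head = rng.standard_normal((D, K))
logits = ln(zL)[0] @ W_head                        # (K,) class scores
```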
when pre-trained on a common ImageNet-21k dataset, viT reached or exceeded the latest level on multiple image recognition benchmarks.
An accuracy of 88.55% was achieved on ImageNet
An accuracy of 90.72% was achieved on ImageNet-ReaL
A 94.55% accuracy was achieved on CIFAR-100.
Vision Transformer (ViT) reshapes the image x ∈ R^(H×W×C) into a sequence of flattened two-dimensional patches x_p ∈ R^(N×(P²·C)) in order to process the two-dimensional image. (H, W) is the resolution of the original image and (P, P) is the resolution of each patch; N = HW/P² is then the effective sequence length of the Transformer. Since the Transformer uses a constant width through all its layers, one trainable linear projection maps each vectorized patch onto the model dimension D; its output is referred to as the patch embedding.
The Vision Transformer prepends a learnable embedding to the sequence of embedded patches; its state at the output of the Transformer encoder serves as the image representation. The classification head has the same size during pre-training and fine-tuning. Furthermore, a 1D position embedding is added to the patch embeddings to preserve position information; 2D-aware variants of position embedding were explored without obtaining significant benefits over standard 1D position embeddings. The resulting sequence of embedding vectors serves as the input to the encoder. Notably, the Vision Transformer uses only the encoder of the standard Transformer, with an MLP head following the encoder output.
Typically, the Vision Transformer is first pre-trained on a large dataset and then fine-tuned for smaller downstream tasks. To this end, the pre-trained prediction head is removed and a zero-initialized D×K feed-forward layer is attached, where K is the number of downstream classes. It is often beneficial to fine-tune at a higher resolution than that used for pre-training. When a higher-resolution image is input, the patch size remains unchanged, which yields a longer effective sequence. The Vision Transformer can handle arbitrary sequence lengths; however, the pre-trained position embeddings may no longer be meaningful, so they are interpolated in two dimensions according to their positions in the original image. Notably, this resolution adjustment and the patch extraction are the only points at which an inductive bias about the two-dimensional structure of the image is manually injected into the Vision Transformer.
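The 2-D interpolation of pre-trained position embeddings might be sketched as follows; nearest-neighbour sampling is used here as a simple stand-in for the bilinear interpolation typically used, and the names are assumptions:

```python
import numpy as np

def interpolate_pos_embed(pos: np.ndarray, old_grid: int, new_grid: int) -> np.ndarray:
    """Reshape (old_grid^2, D) position embeddings into a 2-D grid, resample
    the grid to the new patch count, and flatten back to a sequence."""
    d = pos.shape[1]
    grid = pos.reshape(old_grid, old_grid, d)
    idx = (np.arange(new_grid) * old_grid / new_grid).astype(int)  # nearest rows/cols
    return grid[np.ix_(idx, idx)].reshape(new_grid * new_grid, d)

# Pre-trained on a 4x4 patch grid; fine-tuning at higher resolution gives 8x8.
pos = np.random.default_rng(0).standard_normal((4 * 4, 8))
pos_big = interpolate_pos_embed(pos, old_grid=4, new_grid=8)
```

The class-token embedding (if any) would be kept aside and re-prepended after interpolation.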
In the hybrid variant, a traditional CNN is used to learn a 2D feature representation; the tiled CNN output, carrying position-coding information for the features, is used as the Transformer input, and prediction information is obtained through encoding and decoding by the Transformer self-attention mechanism.
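The tiling step of this hybrid variant can be illustrated as below; the feature-map sizes and projection are assumed example values (a stand-in array replaces a real CNN output):

```python
import numpy as np

# Hybrid variant: a CNN feature map replaces raw patches. The (h, w, c)
# output is flattened into h*w tokens and projected to Transformer width D.
h, w, c, D = 7, 7, 256, 64
rng = np.random.default_rng(4)
feature_map = rng.normal(size=(h, w, c))   # stand-in for a CNN output

tokens = feature_map.reshape(h * w, c)     # one token per spatial position
W_proj = rng.normal(size=(c, D))
transformer_input = tokens @ W_proj        # (h*w, D) sequence for the encoder
```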
According to the ViT model-based pathological section data identification method, full-view pathological section data are acquired, the full-view pathological section data are preprocessed to obtain preprocessed pathological section data, and a training set is constructed based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value; inputting the training set into the ViT model for training to obtain a trained ViT model; and identifying pathological section data based on the trained ViT model to obtain an identification result. The method solves the problems of large calculated amount and poor feature extraction capability when the model processes the two-dimensional image in the prior art.
FIG. 2 is a flowchart of an embodiment of a pathological section data identification system based on a ViT model according to the present invention; as shown in fig. 2, the pathological section data identification system based on the ViT model provided by the embodiment of the invention comprises the following modules:
an acquisition module 10 for acquiring full-field pathological section data;
a preprocessing module 20, configured to preprocess the full-field pathological section data to obtain preprocessed pathological section data;
a first construction module 30 for constructing a training set based on the preprocessed pathological section data;
a second construction module 40, configured to construct a ViT model, where the ViT model includes a multi-head attention mechanism, and calculate multiple sets of key vectors in parallel using multiple attention heads to obtain a calculated value, and predict based on the calculated value;
the training module 50 is configured to input the training set into the ViT model for training, to obtain a trained ViT model;
and an identification module, configured to identify pathological section data based on the trained ViT model to obtain an identification result.
The ViT model is also used to:
cutting a two-dimensional image input into the ViT model into patches of fixed size, wherein each patch is a tensor; stretching each tensor into a vector, and obtaining the patch embedding through a single linear transformation;
attaching a class token at the head of the patch embedding sequence, and adding position vectors to obtain the final embedding vectors;
inputting the embedding vectors into the Transformer, and obtaining prediction information through encoding and decoding by the Transformer self-attention mechanism.
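The parallel multi-head computation invoked throughout (multiple attention heads computing their sets of query, key, and value vectors at once) can be sketched as follows; the weight matrices and token count are illustrative assumptions, not the patent's parameters:

```python
import numpy as np

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Minimal multi-head self-attention sketch: the query/key/value
    vectors for all heads are produced in one batched projection, and
    the heads attend in parallel."""
    N, D = X.shape
    d = D // num_heads
    # Project once, then split the width into num_heads slices of size d.
    Q = (X @ Wq).reshape(N, num_heads, d).transpose(1, 0, 2)  # (h, N, d)
    K = (X @ Wk).reshape(N, num_heads, d).transpose(1, 0, 2)
    V = (X @ Wv).reshape(N, num_heads, d).transpose(1, 0, 2)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)            # (h, N, N)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)                 # row softmax
    heads = weights @ V                                       # (h, N, d)
    concat = heads.transpose(1, 0, 2).reshape(N, D)           # merge heads
    return concat @ Wo

rng = np.random.default_rng(3)
N, D, h = 17, 64, 8                 # 16 patch tokens + 1 class token
X = rng.normal(size=(N, D))
Wq, Wk, Wv, Wo = (rng.normal(size=(D, D)) for _ in range(4))
out = multi_head_attention(X, Wq, Wk, Wv, Wo, h)
```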
According to the ViT model-based pathological section data identification system, full-view pathological section data are acquired through the acquisition module 10; the preprocessing module 20 preprocesses the full-view pathological section data to obtain preprocessed pathological section data; the first construction module 30 constructs a training set based on the preprocessed pathological section data; the second construction module 40 constructs a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculates a plurality of groups of key vectors in parallel using a plurality of attention heads to obtain calculated values, and predicts based on the calculated values; the training module 50 inputs the training set into the ViT model for training to obtain a trained ViT model; and pathological section data are identified based on the trained ViT model to obtain an identification result. The ViT model-based pathological section data identification system thus solves the problems of large calculation amount and poor feature extraction capability when a model processes a two-dimensional image in the prior art.
Fig. 3 is a schematic diagram of an entity structure of an electronic device according to an embodiment of the present invention, as shown in fig. 3, an electronic device 60 includes: a processor 601 (processor), a memory 602 (memory), and a bus 603;
wherein the processor 601 and the memory 602 communicate with each other via the bus 603;
the processor 601 is configured to invoke program instructions in the memory 602 to perform the methods provided by the method embodiments described above, including, for example: acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value; inputting the training set into the ViT model for training to obtain a trained ViT model; and identifying pathological section data based on the trained ViT model to obtain an identification result.
The present embodiment provides a non-transitory computer readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above-described method embodiments, for example, including: acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value; inputting the training set into the ViT model for training to obtain a trained ViT model; and identifying pathological section data based on the trained ViT model to obtain an identification result.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
The apparatus embodiments described above are merely illustrative; units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, and may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on such understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the embodiments or the methods of some parts of the embodiments.
While the invention has been described in detail in the foregoing general description and specific embodiments, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications and improvements made without departing from the spirit of the invention are intended to fall within the scope of the invention as claimed.
Claims (10)
1. A ViT model-based pathological section data identification method, which is characterized by comprising the following steps:
acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data;
constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value;
inputting the training set into the ViT model for training to obtain a trained ViT model;
and identifying pathological section data based on the trained ViT model to obtain an identification result.
2. The method for identifying pathological section data based on ViT model according to claim 1, wherein the preprocessing of the full-field pathological section data to obtain preprocessed pathological section data comprises:
acquiring full-view pathological section data, wherein the full-view pathological section data comprises full-view pathological sections and pathological labeling data;
reading the downsampling magnification of the full-view pathological section, and ensuring the integrity of the color features, texture features, shape features and spatial features of the full-view pathological section;
the pathological labeling data are read, and the labeling information is processed to obtain a point coordinate set;
establishing a whole-image blank mask, performing a bitwise operation on the blank mask and the point coordinate set, and cutting away blank areas to obtain one of the ROI regions of the single full-view pathological section;
converting the HSV color-space information of the ROI region, and ensuring that the color space of the full-view pathological section is RGB.
3. The ViT model-based pathological section data identification method as claimed in claim 1, wherein the constructing a training set based on the preprocessed pathological section data includes:
and performing off-line enhancement and on-line enhancement on the training set through an enhancement algorithm.
4. The method of claim 1, wherein the constructing a ViT model, wherein the ViT model includes a multi-head attention mechanism, wherein the computing multiple sets of key vectors in parallel using multiple attention heads results in a computed value, and wherein the predicting based on the computed value comprises:
performing stepwise iterative solution by a gradient descent method to minimize the loss function and obtain optimal ViT model parameter values.
5. The method for identifying pathological section data based on the ViT model according to claim 1, wherein the step of inputting the training set into the ViT model for training to obtain a trained ViT model comprises:
dividing the preprocessed pathological section data into a training set, a verification set and a test set;
training the ViT model based on the training set;
performing performance evaluation on the ViT model after training based on the verification set to obtain a ViT model meeting performance conditions;
and evaluating the segmentation result of the ViT model meeting the performance condition based on the test set to obtain an evaluation index corresponding to the ViT model.
6. The method for identifying pathological section data based on the ViT model according to claim 1, wherein the identifying pathological section data based on the trained ViT model to obtain an identification result comprises:
cutting a two-dimensional image input into the ViT model into patches of fixed size, wherein each patch is a tensor; stretching each tensor into a vector, and obtaining the patch embedding through a single linear transformation;
attaching a class token at the head of the patch embedding sequence, and adding position vectors to obtain the final embedding vectors;
inputting the embedding vectors into the Transformer, and obtaining prediction information through encoding and decoding by the Transformer self-attention mechanism.
7. A ViT model-based pathological section data identification system, comprising:
the acquisition module is used for acquiring full-view pathological section data;
the preprocessing module is used for preprocessing the full-view pathological section data to obtain preprocessed pathological section data;
the first construction module is used for constructing a training set based on the preprocessed pathological section data;
a second construction module, configured to construct a ViT model, where the ViT model includes a multi-head attention mechanism, and calculate multiple sets of key vectors in parallel using multiple attention heads to obtain a calculated value, and predict based on the calculated value;
the training module is used for inputting the training set into the ViT model for training to obtain a trained ViT model;
and an identification module, configured to identify pathological section data based on the trained ViT model to obtain an identification result.
8. The ViT model-based pathological section data recognition system of claim 7, wherein the ViT model is further configured to:
cutting a two-dimensional image input into the ViT model into patches of fixed size, wherein each patch is a tensor; stretching each tensor into a vector, and obtaining the patch embedding through a single linear transformation;
attaching a class token at the head of the patch embedding sequence, and adding position vectors to obtain the final embedding vectors;
inputting the embedding vectors into the Transformer, and obtaining prediction information through encoding and decoding by the Transformer self-attention mechanism.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when the computer program is executed.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310963241.5A CN117152554A (en) | 2023-08-02 | 2023-08-02 | ViT model-based pathological section data identification method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117152554A true CN117152554A (en) | 2023-12-01 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117392468A (en) * | 2023-12-11 | 2024-01-12 | 山东大学 | Cancer pathology image classification system, medium and equipment based on multi-example learning |
CN117457235A (en) * | 2023-12-22 | 2024-01-26 | 首都医科大学附属北京友谊医院 | Pathological damage mode prediction method and device, storage medium and electronic equipment |
CN117689044A (en) * | 2024-02-01 | 2024-03-12 | 厦门大学 | Quantification method suitable for vision self-attention model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||