CN117152554A - ViT model-based pathological section data identification method and system - Google Patents

ViT model-based pathological section data identification method and system

Info

Publication number
CN117152554A
Authority
CN
China
Prior art keywords
pathological section
section data
vit
model
vit model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310963241.5A
Other languages
Chinese (zh)
Inventor
杜登斌
陈昊
***
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuzheng Intelligent Technology Beijing Co ltd
Original Assignee
Wuzheng Intelligent Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuzheng Intelligent Technology Beijing Co ltd filed Critical Wuzheng Intelligent Technology Beijing Co ltd
Priority to CN202310963241.5A priority Critical patent/CN117152554A/en
Publication of CN117152554A publication Critical patent/CN117152554A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00ICT specially adapted for the handling or processing of medical references
    • G16H70/60ICT specially adapted for the handling or processing of medical references relating to pathologies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses a ViT model-based pathological section data identification method and system, together with an electronic device and a storage medium. The ViT model-based pathological section data identification method comprises the following steps: acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism that calculates a plurality of groups of key vectors in parallel using a plurality of attention heads to obtain calculated values and predicts based on the calculated values; inputting the training set into the ViT model for training to obtain a trained ViT model; and identifying pathological section data based on the trained ViT model to obtain an identification result. The ViT model-based pathological section data identification method solves the prior-art problems of a large amount of calculation and poor feature extraction capability when a model processes a two-dimensional image.

Description

ViT model-based pathological section data identification method and system
Technical Field
The invention relates to the technical field of computers, in particular to a ViT model-based pathological section data identification method, a ViT model-based pathological section data identification system, electronic equipment and a storage medium.
Background
Traditional pathological analysis and diagnosis require a specialized pathologist to search for target areas and cells one by one under a microscope. A pathological section usually contains tens of thousands of cells, yet the target areas and cells related to disease occupy only a very small part, and the large amount of redundant information causes serious "reading fatigue" in pathologists.
AI can assist pathologists in judging pathological section data more efficiently and accurately, reducing the rates of misdiagnosis and missed diagnosis. However, at the present stage, transferring the Transformer from the text field to the vision field raises two problems: first, compared with one-dimensional text, processing two-dimensional images greatly increases the amount of calculation; second, image processing involves more scales and noise, which requires the model to have a stronger feature extraction capability.
What is needed is a model that can assist a pathologist in determining pathological section data more efficiently and accurately, with a small amount of calculation and a strong feature extraction capability.
Disclosure of Invention
The embodiment of the invention aims to provide a ViT model-based pathological section data identification method, a ViT model-based pathological section data identification system, electronic equipment and a storage medium, which are used for solving the problems of large calculated amount and poor feature extraction capability when a model processes a two-dimensional image in the prior art.
In order to achieve the above objective, an embodiment of the present invention provides a pathological section data identification method based on ViT model, which specifically includes:
acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data;
constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value;
inputting the training set into the ViT model for training to obtain a trained ViT model;
and identifying pathological section data based on the trained ViT model to obtain an identification result.
Based on the technical scheme, the invention can also be improved as follows:
further, the ViT model-based pathological section data identification method further comprises the following steps:
the preprocessing of the full-view pathological section data to obtain preprocessed pathological section data comprises the following steps:
acquiring full-view pathological section data, wherein the full-view pathological section data comprises full-view pathological sections and pathological labeling data;
reading the downsampling multiplying power of the full-view pathological section, and ensuring the integrity of the color features, texture features, shape features and spatial features of the full-view pathological section;
the pathological labeling data are read, and the labeling information is processed to obtain a point coordinate set;
establishing a whole-image blank mask, performing a bitwise operation between the blank mask and the point coordinate set, and cutting out and removing blank areas to obtain a single full-view pathological section ROI area;
converting the HSV space information of the ROI area, and ensuring that the color space of the full-view pathological section is RGB.
Further, the constructing a training set based on the preprocessed pathological section data includes:
and performing off-line enhancement and on-line enhancement on the training set through an enhancement algorithm.
Further, the building ViT model, wherein the ViT model includes a multi-head attention mechanism, uses multiple attention heads to calculate multiple sets of key vectors in parallel to obtain a calculated value, predicts based on the calculated value, and includes:
and (5) carrying out one-step iterative solution by a gradient descent method to obtain a minimum loss function and an optimal ViT model parameter value.
Further, the inputting the training set into the ViT model for training, to obtain a trained ViT model, includes:
dividing the preprocessed pathological section data into a training set, a verification set and a test set;
training the ViT model based on the training set;
performing performance evaluation on the ViT model after training based on the verification set to obtain a ViT model meeting performance conditions;
and evaluating the segmentation result of the ViT model meeting the performance condition based on the test set to obtain an evaluation index corresponding to the ViT model.
Further, the identifying the pathological section data based on the trained ViT model to obtain an identifying result includes:
cutting the two-dimensional image input into the ViT model into patches of fixed size, wherein each patch is a tensor; stretching each tensor into a vector, and obtaining the patch embedding through a one-layer linear transformation;
attaching a category token to the head of each patch embedding, and adding the result to a position vector to obtain the final embedding vector;
and inputting the embedding vector into a Transformer, and obtaining prediction information through encoding and decoding by the Transformer self-attention mechanism.
A ViT model-based pathological section data identification system, comprising:
the acquisition module is used for acquiring full-view pathological section data;
the preprocessing module is used for preprocessing the full-view pathological section data to obtain preprocessed pathological section data;
the first construction module is used for constructing a training set based on the preprocessed pathological section data;
a second construction module, configured to construct a ViT model, where the ViT model includes a multi-head attention mechanism, and calculate multiple sets of key vectors in parallel using multiple attention heads to obtain a calculated value, and predict based on the calculated value;
the training module is used for inputting the training set into the ViT model for training to obtain a trained ViT model;
and identifying pathological section data based on the trained ViT model to obtain an identification result.
Further, the ViT model is also used to:
cutting the two-dimensional image input into the ViT model into patches of fixed size, wherein each patch is a tensor; stretching each tensor into a vector, and obtaining the patch embedding through a one-layer linear transformation;
attaching a category token to the head of each patch embedding, and adding the result to a position vector to obtain the final embedding vector;
and inputting the embedding vector into a Transformer, and obtaining prediction information through encoding and decoding by the Transformer self-attention mechanism.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when the computer program is executed.
A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method.
The embodiment of the invention has the following advantages:
according to the ViT model-based pathological section data identification method, full-view pathological section data are acquired, the full-view pathological section data are preprocessed to obtain preprocessed pathological section data, and a training set is constructed based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value; inputting the training set into the ViT model for training to obtain a trained ViT model; identifying pathological section data based on the trained ViT model to obtain an identification result; the method solves the problems of large calculated amount and poor feature extraction capability when the model processes the two-dimensional image in the prior art.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those skilled in the art from this disclosure that the drawings described below are merely exemplary and that other embodiments may be derived from the drawings provided without undue effort.
The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the scope of the invention.
FIG. 1 is a flow chart of a pathological section data identification method based on a ViT model;
FIG. 2 is a block diagram of a system for identifying pathological section data based on ViT model according to the present invention;
fig. 3 is a schematic diagram of an entity structure of an electronic device according to the present invention.
Wherein the reference numerals are as follows:
the system comprises an acquisition module 10, a preprocessing module 20, a first construction module 30, a second construction module 40, a training module 50, an electronic device 60, a processor 601, a memory 602 and a bus 603.
Detailed Description
Other aspects and advantages of the present invention will become apparent to those skilled in the art from the following detailed description, which illustrates, by way of example, certain but not all embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the invention without making any inventive effort fall within the scope of the invention.
Examples
Fig. 1 is a flowchart of an embodiment of a pathological section data identification method based on a ViT model, and as shown in fig. 1, the pathological section data identification method based on a ViT model provided by the embodiment of the invention comprises the following steps:
s101, acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data;
specifically, full-view pathological section data are obtained, wherein the full-view pathological section data comprise full-view pathological sections and pathological labeling data; depending on the scanning instrument, the sections are divided into 40x-magnification and 20x-magnification KFB pathological sections.
The downsampling magnification of the full-view pathological section is read using a KFB reading tool, or after converting the KFB file into SVS format, ensuring the integrity of the color features, texture features, shape features and spatial features of the full-view pathological section;
the pathological labeling data are read and the labeling information is processed to obtain a point coordinate set; the point coordinate set is scaled by the same downsampling magnification, and negative coordinates are set to 0;
a whole-image blank mask is established and a bitwise operation is performed between the blank mask and the point coordinate set; the maximum (x, y) and the minimum (x, y) are respectively obtained as the boundary of the ROI region, and the blank region is cut away to obtain a single full-view pathological section ROI region;
the HSV space information of the ROI region is converted, ensuring that the color space of the full-view pathological section is RGB, and the result is stored to complete the construction of the data set;
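The masking and cropping steps above can be sketched in Python. This is an illustrative toy, not the patent's implementation: the grid-based image and the names `roi_bounds`/`crop_roi` are hypothetical. Negative annotation coordinates are clamped to 0, the minimum and maximum (x, y) of the point set give the ROI boundary, and the blank margin is cut away.

```python
def roi_bounds(points):
    # Boundary of the ROI: the minimum and maximum (x, y) over the
    # annotation point set, with negative coordinates clamped to 0.
    xs = [max(0, x) for x, _ in points]
    ys = [max(0, y) for _, y in points]
    return (min(xs), min(ys)), (max(xs), max(ys))

def crop_roi(image, points):
    # image: 2-D grid of pixel rows (row-major). Returns the sub-image
    # bounded by the annotation points, removing the surrounding blank area.
    (x0, y0), (x1, y1) = roi_bounds(points)
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]
```

In practice the mask and bitwise operation would be done on whole-slide arrays with an image library; the bounding-box logic is the same.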
and performing off-line enhancement and on-line enhancement on the training set through an enhancement algorithm.
In view of characteristics of pathological images such as rotational symmetry, differences in chemical preparation and scarcity of data, the data volume of each stage is first counted, and different enhancement modes are designed according to the data volume.
The enhancement modes are as follows: setting an angle to generate a rotation matrix and rotating by 90°, 180° and 270° through affine transformation; horizontal and vertical flipping; random hue, saturation, brightness and contrast; cutting with 30% overlap in the horizontal and vertical directions; and so on.
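The geometric enhancement modes can be illustrated with a minimal Python sketch (function names are hypothetical, not from the patent): the 90°/180°/270° rotations are obtained by repeating a clockwise quarter-turn, and the flips mirror either each row or the row order.

```python
def rotate90(img):
    # Rotate a 2-D pixel grid 90 degrees clockwise; apply repeatedly
    # to obtain the 180- and 270-degree variants.
    return [list(row) for row in zip(*img[::-1])]

def hflip(img):
    # Horizontal flip: mirror each row.
    return [row[::-1] for row in img]

def vflip(img):
    # Vertical flip: reverse the row order.
    return img[::-1]
```

A production pipeline would instead use an affine-transformation routine from an image library, as the text describes, but the resulting pixel layouts match these toy operations.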
On-line Mixup uses the following formula: x̃ = λ·x_i + (1−λ)·x_j, ỹ = λ·y_i + (1−λ)·y_j, where (x_i, y_i) and (x_j, y_j) are two samples randomly extracted from the training data and λ ∈ [0, 1]; a new image is generated by mixing at this random (0-1) proportion in a linear interpolation mode. It should be noted that the labels are loaded as one-hot encodings, so the coefficient weighting does not affect the final classification result.
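A minimal sketch of the Mixup interpolation, assuming flattened images and one-hot (or soft) label vectors; the name `mixup` and the explicit `lam` argument are illustrative (in training, λ would typically be drawn at random per pair):

```python
def mixup(x_i, y_i, x_j, y_j, lam):
    # Linear interpolation of two samples and their labels:
    # x = lam*x_i + (1-lam)*x_j,  y = lam*y_i + (1-lam)*y_j,  lam in [0, 1].
    x = [lam * a + (1 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y
```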
S102, constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain calculated values, and predicting based on the calculated values;
specifically, the minimum loss function and the optimal ViT model parameter values are obtained through step-by-step iterative solution by a gradient descent method.
The cross-entropy loss formula is as follows: L = −Σ_i y_i·log(y′_i), where y_i is the label value and y′_i is the predicted value.
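The loss can be computed directly from the formula (a sketch; the small epsilon guarding log(0) is an implementation assumption, not part of the patent):

```python
import math

def cross_entropy(y, y_pred, eps=1e-12):
    # L = -sum_i y_i * log(y'_i); y may be one-hot or a soft label
    # produced by Mixup, y_pred a predicted probability distribution.
    return -sum(t * math.log(p + eps) for t, p in zip(y, y_pred))
```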
A small initial learning rate and momentum parameter are set and a weight decay of 5e-5 is added; after the model stabilizes, the learning rate is appropriately increased, preventing model oscillation.
Combining a stable learning rate with a cosine function better solves the multi-peak optimization problem: the learning rate is controlled to change periodically with the cosine function (with a period of 32), skipping local optimal solutions and searching for the global optimal solution. The formula is: η_t = η_min + (1/2)(η_max − η_min)(1 + cos(π·T_cur/T_i)), where η_t represents the current learning rate, η_min the minimum learning rate, η_max the maximum learning rate, T_cur the current epoch and T_i the maximum epoch.
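The schedule can be checked numerically (a sketch; parameter names mirror the symbols above):

```python
import math

def cosine_lr(eta_min, eta_max, t_cur, t_max):
    # eta_t = eta_min + (1/2)(eta_max - eta_min)(1 + cos(pi * T_cur / T_i))
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))
```

At epoch 0 the rate starts at η_max, decays to η_min at T_i, and the scheduler is restarted each period (32 epochs here) so the rate jumps back up and can escape local optima.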
Macro-F1 is used as the evaluation index. The precision and recall of each category are calculated respectively as Precision = TP/(TP + FP) and Recall = TP/(TP + FN), the corresponding F1 score is calculated as F1 = 2·Precision·Recall/(Precision + Recall), and Macro-F1 is obtained by averaging the F1 scores over the categories. Here TP denotes true positives, FP false positives and FN false negatives.
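A minimal per-class computation of the metric, matching the formulas above (the name `macro_f1` and the label-list interface are illustrative):

```python
def macro_f1(y_true, y_pred, classes):
    # Per class: precision = TP/(TP+FP), recall = TP/(TP+FN),
    # F1 = 2PR/(P+R); Macro-F1 averages the per-class F1 scores.
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```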
S103, inputting the training set into a ViT model for training to obtain a trained ViT model;
specifically, the preprocessed pathological section data are divided into a training set, a verification set and a test set;
training the ViT model based on the training set;
performing performance evaluation on the ViT model after training based on the verification set to obtain a ViT model meeting performance conditions;
and evaluating the segmentation result of the ViT model meeting the performance condition based on the test set to obtain an evaluation index corresponding to the ViT model.
S104, identifying pathological section data based on a trained ViT model to obtain an identification result;
specifically, the two-dimensional image input into the ViT model is cut into patches of fixed size, each patch being a tensor; each tensor is stretched into a vector, and the patch embedding is obtained through a one-layer linear transformation;
a category token is attached to the head of each patch embedding, and the result is added to a position vector to obtain the final embedding vector;
the embedding vector is input into a Transformer, and prediction information is obtained through encoding and decoding by the Transformer self-attention mechanism.
When the attention mechanism is applied in the Transformer, the input is first mapped through matrix operations to three vectors Q (Query), K (Key) and V (Value); the context vector is then obtained from these three vectors as Z = softmax(QK^T/√d_k)·V, where d_k is the dimension of the K vectors. A drawback of applying a single such context vector for prediction is that averaging the position information in the input sequence with the attention weights reduces the effective resolution. To solve this problem, a Multi-head Self-Attention (MSA) mechanism is usually employed, i.e., multiple attention heads are used to calculate multiple sets of Q, K, V vectors in parallel, and the information from these vectors is compressed and then used for prediction.
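Single-head scaled dot-product attention as in the formula above can be sketched without any framework (lists of row vectors stand in for matrices; a real multi-head implementation would run several such heads in parallel and then project their concatenated outputs):

```python
import math

def softmax(xs):
    # Numerically stable softmax over one list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    # Z = softmax(Q K^T / sqrt(d_k)) V for one head;
    # Q, K, V are lists of row vectors, d_k the key dimension.
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d_k)
               for kr in K] for qr in Q]
    weights = [softmax(row) for row in scores]
    return [[sum(w * v[j] for w, v in zip(wr, V)) for j in range(len(V[0]))]
            for wr in weights]
```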
ViT cuts the two-dimensional image into fixed-size patches, each patch being a small color picture with three RGB channels, i.e., each patch is a tensor. The next step is vectorization, i.e., stretching each tensor into a vector. The patch embedding is obtained through a one-layer linear transformation; a category token is attached to the head of each patch embedding and added to a position vector to obtain the final embedding vector, which serves as the input of the Transformer for training and prediction. The category token and the position vector are both learnable vectors: the category token is used for prediction and classification, and the position vector represents the position information of each patch in the image;
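The patch-cutting and embedding pipeline can be sketched for a single-channel toy image with one output dimension (`patchify`, `embed` and the scalar projection are illustrative simplifications of the linear layer, class token and position vector described above):

```python
def patchify(image, p):
    # Split an H x W single-channel grid into flattened p x p patches.
    H, W = len(image), len(image[0])
    patches = []
    for i in range(0, H, p):
        for j in range(0, W, p):
            patches.append([image[i + a][j + b]
                            for a in range(p) for b in range(p)])
    return patches

def embed(patches, proj, cls_token, pos):
    # Linear projection of each patch, class token prepended,
    # then position embeddings added (one output dimension shown).
    tokens = [cls_token] + [sum(v * w for v, w in zip(patch, proj))
                            for patch in patches]
    return [t + e for t, e in zip(tokens, pos)]
```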
ViT step-by-step breakdown:
step1 image blocking expansion, providing a patch16 and a patch32 blocking method, for an input image xE R H×W×C The method is characterized by comprising the following steps of:
step2 Patch embedding (Patch embedding), adding class mark x class And spatial position information E pos The expression at this time is: wherein x is class Representing embedded learnable categories, E representing feature mapping of full connection layer, N+1 representing classification features of newly added patch
Step 3: Transformer calculation processing, where MSA denotes the multi-head self-attention layer and MLP denotes the multi-layer perceptron block, each with a residual connection; LN is the LayerNorm normalization layer and z_{l−1} represents the output of the previous sub-encoder.
MSA layer: z′_l = MSA(LN(z_{l−1})) + z_{l−1}, l = 1, …, L
MLP layer: z_l = MLP(LN(z′_l)) + z′_l, l = 1, …, L
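The two residual sub-layers can be sketched generically; here `msa` and `mlp` are placeholder callables so that only the residual wiring of the two equations is shown (identity functions suffice to test the data flow):

```python
def layer_norm(x, eps=1e-5):
    # LN: normalize one token vector to zero mean, unit variance.
    m = sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x) / len(x)
    return [(v - m) / (var + eps) ** 0.5 for v in x]

def encoder_block(z, msa, mlp, ln=layer_norm):
    # z'_l = MSA(LN(z_{l-1})) + z_{l-1};  z_l = MLP(LN(z'_l)) + z'_l
    # z is a list of token vectors; msa/mlp map a sequence to a sequence.
    z_prime = [[a + b for a, b in zip(m, t)]
               for m, t in zip(msa([ln(t) for t in z]), z)]
    return [[a + b for a, b in zip(f, t)]
            for f, t in zip(mlp([ln(t) for t in z_prime]), z_prime)]
```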
Step 4: category calculation processing: the classification output is taken from the class-token state of the final encoder layer after layer normalization, y = LN(z_L^0).
when pre-trained on a common ImageNet-21k dataset, viT reached or exceeded the latest level on multiple image recognition benchmarks.
An accuracy of 88.55% was achieved on ImageNet
An accuracy of 90.72% was achieved on ImageNet-ReaL
A 94.55% accuracy was achieved on CIFAR-100.
Vision Transformer (ViT) reshapes the image x ∈ R^(H×W×C) into a sequence of flattened two-dimensional patches x_p ∈ R^(N×(P²·C)) in order to process the two-dimensional image, where (H, W) is the resolution of the original image and (P, P) is the resolution of each image patch. N = HW/P² is then the effective sequence length of the Transformer. Since the Transformer uses a constant width through all of its layers, a trainable linear projection maps each vectorized patch onto the model dimension D; the authors refer to its output as the patch embedding.
The Vision Transformer prepends a learnable embedding to the sequence of embedded patches, and its state at the output of the Transformer encoder is taken as the image representation. The size of the classification head is the same during pre-training and fine-tuning. Furthermore, a 1D position embedding is added to the patch embeddings to preserve position information. The authors explored 2D-aware variants of position embedding and obtained no significant benefit compared with standard 1D position embedding. The combined embedding vectors serve as the input to the encoder. Notably, the Vision Transformer uses only the encoder of the standard Transformer, and an MLP head follows the output of the Transformer encoder.
Typically, the Vision Transformer is first pre-trained on a large dataset and fine-tuned for smaller downstream tasks. To this end, the pre-trained prediction head is removed and a zero-initialized D×K feed-forward layer is added, where K is the number of downstream classes. It is often beneficial to fine-tune at a higher resolution than used for pre-training. When a higher-resolution image is input, the patch size remains unchanged, yielding a larger effective sequence length. The Vision Transformer can handle arbitrary sequence lengths; however, the pre-trained position embeddings may no longer be meaningful. The pre-trained position embeddings are therefore interpolated in two dimensions according to their positions in the original image. Notably, this resolution adjustment and the patch extraction are the only points at which an inductive bias about the two-dimensional structure of the image is manually injected into the Vision Transformer.
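The position-embedding interpolation can be illustrated in 1-D (the text describes 2-D interpolation over the patch grid; this scalar sketch with the hypothetical name `interpolate_pos` only shows the resampling idea used when the fine-tuning sequence length differs from pre-training):

```python
def interpolate_pos(pos, new_len):
    # Linearly resample a 1-D list of position embeddings (scalars here)
    # from len(pos) entries to new_len entries.
    old_len = len(pos)
    out = []
    for i in range(new_len):
        t = i * (old_len - 1) / (new_len - 1) if new_len > 1 else 0
        lo = int(t)
        hi = min(lo + 1, old_len - 1)
        frac = t - lo
        out.append(pos[lo] * (1 - frac) + pos[hi] * frac)
    return out
```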
Alternatively, a traditional CNN can be used to learn the 2D feature representation; the flattened CNN output, which includes the position-encoded information of the features, is used as the Transformer input, and prediction information is obtained through encoding and decoding by the Transformer self-attention mechanism.
According to the ViT model-based pathological section data identification method, full-view pathological section data are acquired, the full-view pathological section data are preprocessed to obtain preprocessed pathological section data, and a training set is constructed based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value; inputting the training set into the ViT model for training to obtain a trained ViT model; and identifying pathological section data based on the trained ViT model to obtain an identification result. The method solves the problems of large calculated amount and poor feature extraction capability when the model processes the two-dimensional image in the prior art.
FIG. 2 is a block diagram of an embodiment of a pathological section data identification system based on a ViT model according to the present invention; as shown in fig. 2, the pathological section data identification system based on the ViT model provided by the embodiment of the invention comprises:
an acquisition module 10 for acquiring full-field pathological section data;
a preprocessing module 20, configured to preprocess the full-field pathological section data to obtain preprocessed pathological section data;
a first construction module 30 for constructing a training set based on the preprocessed pathological section data;
a second construction module 40, configured to construct a ViT model, where the ViT model includes a multi-head attention mechanism, and calculate multiple sets of key vectors in parallel using multiple attention heads to obtain a calculated value, and predict based on the calculated value;
the training module 50 is configured to input the training set into the ViT model for training, to obtain a trained ViT model;
and identifying pathological section data based on the trained ViT model to obtain an identification result.
The ViT model is also used to:
cutting the two-dimensional image input to the ViT model into fixed-size patches, where each patch is a tensor; stretching each tensor into a vector and passing it through one linear transformation layer to obtain the patch embeddings;
prepending a class token to the head of the patch embedding sequence and adding position vectors to obtain the final embedding vectors;
inputting the embedding vectors into a Transformer, and obtaining prediction information through encoding and decoding by the Transformer's self-attention mechanism.
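The three steps above (cut into fixed-size patches, flatten and linearly project, prepend a class token and add position vectors) can be sketched as follows. The helper name, random projection weights, and dimensions are illustrative assumptions; a trained model would use learned parameters:

```python
import numpy as np

def image_to_patch_embeddings(img, patch, d_model, rng):
    # img: (H, W, C) with H and W divisible by `patch`.
    H, W, C = img.shape
    # Cut into non-overlapping patches and flatten each to a vector:
    # result shape (num_patches, patch*patch*C).
    patches = (img.reshape(H // patch, patch, W // patch, patch, C)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(-1, patch * patch * C))
    # One linear transformation layer -> patch embeddings.
    W_proj = rng.standard_normal((patches.shape[1], d_model)) * 0.02
    tokens = patches @ W_proj
    # Prepend a class token, then add position vectors
    # (randomly initialised here; learned in practice).
    cls = rng.standard_normal((1, d_model)) * 0.02
    seq = np.concatenate([cls, tokens], axis=0)
    pos = rng.standard_normal(seq.shape) * 0.02
    return seq + pos
```

The resulting sequence (class token plus one token per patch) is what the Transformer encoder consumes; the class token's final state carries the prediction.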
According to the ViT model-based pathological section data identification system, the acquisition module 10 acquires full-view pathological section data; the preprocessing module 20 preprocesses the full-view pathological section data to obtain preprocessed pathological section data; the first construction module 30 constructs a training set based on the preprocessed pathological section data; the second construction module 40 constructs a ViT model, in which a multi-head attention mechanism uses multiple attention heads to compute multiple sets of key vectors in parallel, obtaining calculated values on which predictions are based; the training module 50 inputs the training set into the ViT model for training to obtain a trained ViT model; and pathological section data are identified based on the trained ViT model to obtain an identification result. The ViT model-based pathological section data identification system solves the problems of large computational cost and poor feature-extraction capability of prior-art models when processing two-dimensional images.
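Training module 50 minimises a loss function by gradient descent. As a hedged stand-in (a linear softmax classifier in place of the full ViT, with illustrative names and hyperparameters), the stepwise iteration toward the minimum loss looks like:

```python
import numpy as np

def train_linear_classifier(X, y, num_classes, lr=0.1, steps=200):
    # X: (n, d) features, y: (n,) integer labels.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((X.shape[1], num_classes)) * 0.01
    losses = []
    for _ in range(steps):
        # Forward pass: softmax probabilities.
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        # Cross-entropy loss over the batch.
        loss = -np.log(p[np.arange(len(y)), y] + 1e-12).mean()
        losses.append(loss)
        # Gradient of softmax cross-entropy, then one descent step.
        grad = p.copy()
        grad[np.arange(len(y)), y] -= 1.0
        W -= lr * (X.T @ grad) / len(y)
    return W, losses
```

In the full system the same loop runs over the ViT's parameters (typically with a stochastic optimiser), driving the loss toward its minimum to obtain the optimal model parameter values.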
Fig. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in fig. 3, an electronic device 60 includes: a processor 601, a memory 602, and a bus 603;
the processor 601 and the memory 602 communicate with each other via the bus 603;
the processor 601 is configured to invoke program instructions in the memory 602 to perform the methods provided by the method embodiments described above, including, for example: acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value; inputting the training set into the ViT model for training to obtain a trained ViT model; and identifying pathological section data based on the trained ViT model to obtain an identification result.
The present embodiment provides a non-transitory computer readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above-described method embodiments, for example, including: acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value; inputting the training set into the ViT model for training to obtain a trained ViT model; and identifying pathological section data based on the trained ViT model to obtain an identification result.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be completed by hardware driven by program instructions; the foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; and the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the foregoing technical solution, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a computer-readable storage medium (such as ROM/RAM, a magnetic disk, or an optical disk) and including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods of the embodiments or of parts thereof.
While the invention has been described in detail in the foregoing general description and specific examples, those skilled in the art will appreciate that modifications and improvements can still be made. Such modifications or improvements, made without departing from the spirit of the invention, are intended to fall within the scope of the invention as claimed.

Claims (10)

1. A ViT model-based pathological section data identification method, which is characterized by comprising the following steps:
acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data;
constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value;
inputting the training set into the ViT model for training to obtain a trained ViT model;
and identifying pathological section data based on the trained ViT model to obtain an identification result.
2. The method for identifying pathological section data based on the ViT model according to claim 1, wherein preprocessing the full-view pathological section data to obtain preprocessed pathological section data comprises:
acquiring full-view pathological section data, wherein the full-view pathological section data comprises full-view pathological sections and pathological labeling data;
reading the downsampling magnification of the full-view pathological section while preserving the integrity of its color features, texture features, shape features, and spatial features;
reading the pathological labeling data and processing the labeling information to obtain a point coordinate set;
establishing a whole-image blank mask, performing a bitwise operation between the blank mask and the point coordinate set, and cutting away blank areas to obtain an ROI area of the single full-view pathological section;
converting the HSV space information of the ROI area to ensure that the color space of the full-view pathological section is RGB.
3. The ViT model-based pathological section data identification method as claimed in claim 1, wherein the constructing a training set based on the preprocessed pathological section data includes:
and performing off-line enhancement and on-line enhancement on the training set through an enhancement algorithm.
4. The method of claim 1, wherein constructing the ViT model, in which the ViT model includes a multi-head attention mechanism that calculates multiple sets of key vectors in parallel using multiple attention heads to obtain a calculated value and predicts based on the calculated value, comprises:
carrying out stepwise iterative solution by a gradient descent method to obtain the minimum loss function and the optimal ViT model parameter values.
5. The method for identifying pathological section data based on the ViT model according to claim 1, wherein the step of inputting the training set into the ViT model for training to obtain a trained ViT model comprises:
dividing the preprocessed pathological section data into a training set, a verification set, and a test set;
training the ViT model based on the training set;
performing performance evaluation on the ViT model after training based on the verification set to obtain a ViT model meeting performance conditions;
and evaluating the segmentation result of the ViT model meeting the performance condition based on the test set to obtain an evaluation index corresponding to the ViT model.
6. The method for identifying pathological section data based on the ViT model according to claim 1, wherein identifying pathological section data based on the trained ViT model to obtain an identification result comprises:
cutting the two-dimensional image input to the ViT model into fixed-size patches, where each patch is a tensor; stretching each tensor into a vector and passing it through one linear transformation layer to obtain the patch embeddings;
prepending a class token to the head of the patch embedding sequence and adding position vectors to obtain the final embedding vectors;
inputting the embedding vectors into a Transformer, and obtaining prediction information through encoding and decoding by the Transformer's self-attention mechanism.
7. A ViT model-based pathological section data identification system, comprising:
the acquisition module is used for acquiring full-view pathological section data;
the preprocessing module is used for preprocessing the full-view pathological section data to obtain preprocessed pathological section data;
the first construction module is used for constructing a training set based on the preprocessed pathological section data;
a second construction module, configured to construct a ViT model, where the ViT model includes a multi-head attention mechanism, and calculate multiple sets of key vectors in parallel using multiple attention heads to obtain a calculated value, and predict based on the calculated value;
the training module is used for inputting the training set into the ViT model for training to obtain a trained ViT model;
and identifying pathological section data based on the trained ViT model to obtain an identification result.
8. The ViT model-based pathological section data recognition system of claim 7, wherein the ViT model is further configured to:
cutting the two-dimensional image input to the ViT model into fixed-size patches, where each patch is a tensor; stretching each tensor into a vector and passing it through one linear transformation layer to obtain the patch embeddings;
prepending a class token to the head of the patch embedding sequence and adding position vectors to obtain the final embedding vectors;
inputting the embedding vectors into a Transformer, and obtaining prediction information through encoding and decoding by the Transformer's self-attention mechanism.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when the computer program is executed.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1 to 6.
CN202310963241.5A 2023-08-02 2023-08-02 ViT model-based pathological section data identification method and system Pending CN117152554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310963241.5A CN117152554A (en) 2023-08-02 2023-08-02 ViT model-based pathological section data identification method and system

Publications (1)

Publication Number Publication Date
CN117152554A true CN117152554A (en) 2023-12-01

Family

ID=88899649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310963241.5A Pending CN117152554A (en) 2023-08-02 2023-08-02 ViT model-based pathological section data identification method and system

Country Status (1)

Country Link
CN (1) CN117152554A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117392468A * 2023-12-11 2024-01-12 Shandong University Cancer pathology image classification system, medium and equipment based on multi-example learning
CN117392468B * 2023-12-11 2024-02-13 Shandong University Cancer pathology image classification system, medium and equipment based on multi-example learning
CN117457235A * 2023-12-22 2024-01-26 Beijing Friendship Hospital, Capital Medical University Pathological damage mode prediction method and device, storage medium and electronic equipment
CN117457235B * 2023-12-22 2024-03-19 Beijing Friendship Hospital, Capital Medical University Pathological damage mode prediction method and device, storage medium and electronic equipment
CN117689044A * 2024-02-01 2024-03-12 Xiamen University Quantification method suitable for vision self-attention model


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination