CN117152554A - ViT model-based pathological section data identification method and system - Google Patents
- Publication number
- CN117152554A (application CN202310963241.5A)
- Authority
- CN
- China
- Prior art keywords
- pathological section
- section data
- vit
- model
- vit model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/776—Validation; Performance evaluation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/60—ICT specially adapted for the handling or processing of medical references relating to pathologies
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
Abstract
The embodiment of the invention discloses a pathological section data identification method, system, electronic device and storage medium based on a ViT model. The ViT model-based pathological section data identification method comprises the following steps: acquiring full-field pathological section data, preprocessing it to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model includes a multi-head attention mechanism that uses multiple attention heads to calculate multiple sets of key vectors in parallel, obtaining calculated values and predicting based on those values; inputting the training set into the ViT model for training to obtain a trained ViT model; and identifying pathological section data with the trained ViT model to obtain an identification result. The ViT model-based pathological section data identification method addresses the large computational cost and weak feature-extraction capability of prior-art models when processing two-dimensional images.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a ViT model-based pathological section data identification method, a ViT model-based pathological section data identification system, electronic equipment and a storage medium.
Background
Traditional pathological diagnosis requires a specialized pathologist to search for target regions and cells one by one under a microscope. A pathological section usually contains tens of thousands of cells, yet the disease-related target regions and cells occupy only a very small part, and the large amount of redundant information causes severe "reading fatigue" for the pathologist.
AI can help pathologists judge pathological section data more efficiently and accurately, reducing the misdiagnosis and missed-diagnosis rates. However, at the present stage, transferring the Transformer architecture to the vision domain raises two problems: first, compared with one-dimensional text, processing two-dimensional images greatly increases the computational load; second, image processing involves more scales and more noise, which requires the model to have stronger feature-extraction capability.
What is needed is a model that can assist a pathologist in judging pathological section data more efficiently and accurately, with low computational cost and strong feature-extraction capability.
Disclosure of Invention
The embodiment of the invention aims to provide a ViT model-based pathological section data identification method, a ViT model-based pathological section data identification system, electronic equipment and a storage medium, which are used for solving the problems of large calculated amount and poor feature extraction capability when a model processes a two-dimensional image in the prior art.
In order to achieve the above objective, an embodiment of the present invention provides a pathological section data identification method based on ViT model, which specifically includes:
acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data;
constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value;
inputting the training set into the ViT model for training to obtain a trained ViT model;
and identifying pathological section data based on the trained ViT model to obtain an identification result.
Based on the technical scheme, the invention can also be improved as follows:
further, the ViT model-based pathological section data identification method further comprises the following steps:
the preprocessing of the full-view pathological section data to obtain preprocessed pathological section data comprises the following steps:
acquiring full-view pathological section data, wherein the full-view pathological section data comprises full-view pathological sections and pathological labeling data;
reading the downsampling magnification of the full-field pathological section, ensuring the integrity of the color features, texture features, shape features and spatial features of the full-field pathological section;
reading the pathology annotation data and processing the annotation information to obtain a point coordinate set;
establishing a whole-image blank mask, performing bitwise operations between the blank mask and the point coordinate set, cutting away the blank area, and obtaining one ROI region of the single full-field pathological section;
converting the HSV color-space information of the ROI region and ensuring that the color space of the full-field pathological section is RGB.
Further, the constructing of a training set based on the preprocessed pathological section data includes:
performing offline enhancement and online enhancement on the training set through an enhancement algorithm.
Further, the constructing of the ViT model, wherein the ViT model includes a multi-head attention mechanism that uses multiple attention heads to calculate multiple sets of key vectors in parallel to obtain calculated values and predicts based on the calculated values, includes:
performing iterative solution step by step with the gradient descent method to obtain the minimum loss function value and the optimal ViT model parameter values.
Further, the inputting the training set into the ViT model for training, to obtain a trained ViT model, includes:
dividing the preprocessed pathological section data into a training set, a verification set and a test set;
training the ViT model based on the training set;
performing performance evaluation on the ViT model after training based on the verification set to obtain a ViT model meeting performance conditions;
and evaluating the segmentation result of the ViT model meeting the performance condition based on the test set to obtain an evaluation index corresponding to the ViT model.
Further, the identifying the pathological section data based on the trained ViT model to obtain an identifying result includes:
cutting a two-dimensional image input into the ViT model into fixed-size patches, wherein each patch is a tensor; stretching each tensor into a vector, and obtaining a patch embedding through one layer of linear transformation;
prepending a class token to the head of each patch embedding, and adding the result to the position vector to obtain a final embedding vector;
inputting the embedding vector into a Transformer, and obtaining prediction information through encoding and decoding by the Transformer self-attention mechanism.
A ViT model-based pathological section data identification system, comprising:
the acquisition module is used for acquiring full-view pathological section data;
the preprocessing module is used for preprocessing the full-view pathological section data to obtain preprocessed pathological section data;
the first construction module is used for constructing a training set based on the preprocessed pathological section data;
a second construction module, configured to construct a ViT model, where the ViT model includes a multi-head attention mechanism, and calculate multiple sets of key vectors in parallel using multiple attention heads to obtain a calculated value, and predict based on the calculated value;
the training module is used for inputting the training set into the ViT model for training to obtain a trained ViT model;
and identifying pathological section data based on the trained ViT model to obtain an identification result.
Further, the ViT model is also used to:
cutting a two-dimensional image input into the ViT model into fixed-size patches, wherein each patch is a tensor; stretching each tensor into a vector, and obtaining a patch embedding through one layer of linear transformation;
prepending a class token to the head of each patch embedding, and adding the result to the position vector to obtain a final embedding vector;
inputting the embedding vector into a Transformer, and obtaining prediction information through encoding and decoding by the Transformer self-attention mechanism.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when the computer program is executed.
A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method.
The embodiment of the invention has the following advantages:
according to the ViT model-based pathological section data identification method, full-view pathological section data are acquired, the full-view pathological section data are preprocessed to obtain preprocessed pathological section data, and a training set is constructed based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value; inputting the training set into the ViT model for training to obtain a trained ViT model; identifying pathological section data based on the trained ViT model to obtain an identification result; the method solves the problems of large calculated amount and poor feature extraction capability when the model processes the two-dimensional image in the prior art.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those skilled in the art from this disclosure that the drawings described below are merely exemplary and that other embodiments may be derived from the drawings provided without undue effort.
The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the scope of the invention.
FIG. 1 is a flow chart of a pathological section data identification method based on a ViT model;
FIG. 2 is a block diagram of a system for identifying pathological section data based on ViT model according to the present invention;
fig. 3 is a schematic diagram of an entity structure of an electronic device according to the present invention.
Wherein the reference numerals are as follows:
the system comprises an acquisition module 10, a preprocessing module 20, a first construction module 30, a second construction module 40, a training module 50, an electronic device 60, a processor 601, a memory 602 and a bus 603.
Detailed Description
Other advantages and benefits of the present invention will become apparent to those skilled in the art from the following detailed description, which describes, by way of illustration, certain specific embodiments, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Examples
Fig. 1 is a flowchart of an embodiment of a pathological section data identification method based on a ViT model, and as shown in fig. 1, the pathological section data identification method based on a ViT model provided by the embodiment of the invention comprises the following steps:
s101, acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data;
Specifically, full-field pathological section data are obtained, comprising full-field pathological sections and pathology annotation data; depending on the scanning instrument, the kfb pathological sections are scanned at 40x or 20x magnification.
The downsampling magnification of the full-field pathological section is read with a kfb reading tool, or the file is converted to SVS format, ensuring the integrity of the color, texture, shape and spatial features of the full-field pathological section;
the pathology annotation data are read and the annotation information is processed to obtain a point coordinate set; the point coordinates are scaled by the same downsampling magnification, and negative coordinates are set to 0;
a whole-image blank mask is established and combined with the point coordinate set by bitwise operations; the maximum (x, y) and minimum (x, y) are taken as the boundary of the ROI region, the blank area is cut away, and one ROI region of the single full-field pathological section is obtained;
the HSV color-space information of the ROI region is converted, ensuring that the color space of the full-field pathological section is RGB, and the result is stored into the data set;
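The mask-and-crop step above can be sketched as follows. This is an illustrative NumPy sketch, not the patent's code; `extract_roi`, the toy slide and the point set are assumed names and data, and the annotation is assumed to already be an (x, y) point set scaled to the chosen magnification.

```python
import numpy as np

def extract_roi(slide: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Build a whole-image blank mask, mark the annotated points, then crop
    the slide to the min/max (x, y) bounding box of the marked region."""
    mask = np.zeros(slide.shape[:2], dtype=np.uint8)   # whole-image blank mask
    pts = np.clip(points, 0, None).astype(int)         # negative coordinates -> 0
    mask[pts[:, 1], pts[:, 0]] = 1                     # bitwise-style marking
    ys, xs = np.nonzero(mask)
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
    return slide[y0:y1 + 1, x0:x1 + 1]                 # cut away the blank area

# Usage: a toy 100x100 RGB "slide" with a square annotation.
slide = np.random.randint(0, 255, (100, 100, 3), dtype=np.uint8)
points = np.array([[10, 20], [40, 20], [40, 60], [10, 60]])
roi = extract_roi(slide, points)
```

A real pipeline would rasterize the annotation polygon instead of individual points, but the bounding-box logic is the same.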
and performing off-line enhancement and on-line enhancement on the training set through an enhancement algorithm.
Given the characteristics of pathological images, such as rotational symmetry, preparation (staining) differences and scarcity of data, the data volume of each stage is first counted, and different enhancement modes are designed according to the data volume.
The enhancement modes are: generating a rotation matrix from a set angle and rotating by 90°, 180° and 270° through affine transformation; horizontal and vertical flipping; random hue, saturation, brightness and contrast; 30%-overlap crops in the horizontal and vertical directions; and so on.
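The offline enhancement modes just listed (fixed-angle rotation, flips, overlapping crops) can be sketched roughly as follows; this is an illustrative NumPy sketch with assumed function names, not the patent's implementation.

```python
import numpy as np

def offline_augment(img: np.ndarray) -> list:
    """Return the rotated (90/180/270 deg) and flipped copies of one patch."""
    return [
        np.rot90(img, 1), np.rot90(img, 2), np.rot90(img, 3),  # 90/180/270 deg
        np.fliplr(img),                                         # horizontal flip
        np.flipud(img),                                         # vertical flip
    ]

def overlap_tiles(img: np.ndarray, tile: int, overlap: float = 0.3) -> list:
    """Cut tiles whose stride leaves the stated overlap in both directions."""
    stride = int(tile * (1 - overlap))
    h, w = img.shape[:2]
    return [img[y:y + tile, x:x + tile]
            for y in range(0, h - tile + 1, stride)
            for x in range(0, w - tile + 1, stride)]

img = np.arange(64 * 64 * 3, dtype=np.uint8).reshape(64, 64, 3)
aug = offline_augment(img)
tiles = overlap_tiles(img, tile=32)   # stride 22 -> tile origins at 0 and 22
```

Random hue/saturation/brightness/contrast jitter would normally come from an augmentation library; it is omitted here to keep the sketch self-contained.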
Online Mixup uses the following formula: x̃ = λ·x_i + (1−λ)·x_j, ỹ = λ·y_i + (1−λ)·y_j, where (x_i, y_i) and (x_j, y_j) are two samples randomly drawn from the training data and λ ∈ [0,1] is a random mixing ratio; a new image is generated by mixing the two samples in this linear-interpolation fashion. Note that the labels are loaded as one-hot encodings, so the coefficient weighting does not affect the final classification result.
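The Mixup step can be illustrated with a minimal sketch (assumed names; labels are one-hot, as noted above, so the mixed label is a valid soft label):

```python
import numpy as np

def mixup(xi, yi, xj, yj, lam):
    """x~ = lam*xi + (1-lam)*xj ; y~ = lam*yi + (1-lam)*yj, with lam in [0,1]."""
    return lam * xi + (1 - lam) * xj, lam * yi + (1 - lam) * yj

xi, xj = np.full((4, 4), 1.0), np.full((4, 4), 3.0)   # two toy "images"
yi, yj = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # one-hot labels
x_mix, y_mix = mixup(xi, yi, xj, yj, lam=0.5)
```

In training, λ is usually drawn per batch (e.g. from a Beta distribution) rather than fixed.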
S102, constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain calculated values, and predicting based on the calculated values;
Specifically, the minimum of the loss function and the optimal ViT model parameter values are obtained by step-by-step iterative solution with the gradient descent method.
The cross-entropy loss formula is: L = −Σ_i y_i·log(y′_i), where y_i is the label value and y′_i is the predicted value.
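A minimal sketch of this loss for one-hot labels, averaged over a batch (illustrative; the epsilon clip is an added numerical-safety detail, not from the patent):

```python
import numpy as np

def cross_entropy(y_true: np.ndarray, y_pred: np.ndarray, eps: float = 1e-12) -> float:
    """L = -sum_i y_i * log(y'_i), averaged over the batch dimension."""
    y_pred = np.clip(y_pred, eps, 1.0)         # avoid log(0)
    return float(-np.sum(y_true * np.log(y_pred)) / y_true.shape[0])

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])    # one-hot labels
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])    # softmax outputs
loss = cross_entropy(y_true, y_pred)           # (-ln 0.9 - ln 0.8)/2 ~ 0.164
```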
A small initial learning rate and momentum parameter are set, with the weight-decay bias set to 5e-5; after waiting for the model to stabilize, the learning rate is increased appropriately to prevent model oscillation.
Combining a stable learning rate with a cosine function better handles the multi-peak optimization landscape: the learning rate is made to vary periodically with a cosine function, with a period of 32, so that local optima are skipped in search of the global optimum. The formula is: η_t = η_min + (1/2)(η_max − η_min)(1 + cos(π·T_cur/T_i)), where η_t is the current learning rate, η_min the minimum learning rate, η_max the maximum learning rate, T_cur the current epoch, and T_i the maximum epoch.
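The cosine schedule can be written as a small function (`cosine_lr` is an assumed name; the 32-epoch period and the learning-rate bounds below are illustrative):

```python
import math

def cosine_lr(t_cur: int, t_max: int, eta_min: float, eta_max: float) -> float:
    """eta_t = eta_min + (eta_max - eta_min)/2 * (1 + cos(pi * t_cur / t_max))."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))

# One 32-epoch period, decaying from eta_max down to eta_min.
lrs = [cosine_lr(t, 32, eta_min=1e-6, eta_max=1e-3) for t in range(33)]
```

Frameworks typically ship this as a ready-made scheduler (e.g. cosine annealing with restarts); the function above is just the formula itself.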
Macro-F1 is used as the evaluation index. Precision and recall are calculated for each category as Precision = TP/(TP+FP) and Recall = TP/(TP+FN), the corresponding F1 score is F1 = 2·Precision·Recall/(Precision+Recall), and Macro-F1 is the average of the per-category F1 scores. Here TP denotes true positives, FP false positives, and FN false negatives.
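A sketch of the per-class precision/recall/F1 computation and the macro average (assumed function name and toy counts):

```python
import numpy as np

def macro_f1(tp, fp, fn) -> float:
    """Per-class precision = TP/(TP+FP), recall = TP/(TP+FN),
    F1 = 2PR/(P+R); Macro-F1 averages F1 over the classes."""
    tp, fp, fn = map(np.asarray, (tp, fp, fn))
    p = tp / np.maximum(tp + fp, 1)                       # avoid division by zero
    r = tp / np.maximum(tp + fn, 1)
    f1 = np.where(p + r > 0, 2 * p * r / np.maximum(p + r, 1e-12), 0.0)
    return float(f1.mean())

# Two classes: (P=0.8, R=1.0) -> F1=0.889 and (P=1.0, R=0.5) -> F1=0.667.
score = macro_f1(tp=[8, 5], fp=[2, 0], fn=[0, 5])
```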
S103, inputting the training set into a ViT model for training to obtain a trained ViT model;
Specifically, the preprocessed pathological section data are divided into a training set, a verification set and a test set;
training the ViT model based on the training set;
performing performance evaluation on the ViT model after training based on the verification set to obtain a ViT model meeting performance conditions;
and evaluating the segmentation result of the ViT model meeting the performance condition based on the test set to obtain an evaluation index corresponding to the ViT model.
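The split-then-evaluate pipeline above might look roughly like this; the 8/1/1 split ratio is an assumption for illustration, not stated in the patent:

```python
import random

def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle deterministically, then slice into train/validation/test."""
    items = items[:]                         # copy so the caller's list is untouched
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(list(range(100)))
```

The validation set then drives model selection, and the held-out test set yields the final evaluation index.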
S104, identifying pathological section data based on a trained ViT model to obtain an identification result;
Specifically, the two-dimensional image input into the ViT model is cut into fixed-size patches; each patch is a tensor, each tensor is stretched into a vector, and the patch embedding is obtained through one layer of linear transformation;
a class token is prepended to the head of each patch embedding, and the result is added to the position vector to obtain the final embedding vector;
the embedding vector is input into the Transformer, and prediction information is obtained through encoding and decoding by the Transformer self-attention mechanism.
When the attention mechanism is applied in the Transformer, the input is first mapped by matrix operations to three vectors Q (Query), K (Key) and V (Value); from these, the context vector z = softmax(QKᵀ/√d_k)·V is computed, where d_k is the dimension of the K vector. Using this context vector (the attention mechanism) for prediction has a drawback: averaging the position information in the input sequence with the attention weights reduces the effective resolution. To solve this problem, a Multi-head Self-Attention (MSA) mechanism is usually employed, i.e., multiple heads compute the values of multiple sets of Q, K, V vectors in parallel, and the information from these vectors is compressed and then used for prediction.
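A NumPy sketch of scaled dot-product attention and its multi-head form follows (toy shapes and random weights; an illustration of the mechanism, not the patent's implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """z = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    return softmax(q @ k.swapaxes(-1, -2) / np.sqrt(d_k)) @ v

def multi_head(x, wq, wk, wv, wo, heads):
    """Split the model dimension into `heads` groups, attend in parallel,
    concatenate the head outputs, and project with wo."""
    n, d = x.shape
    q, k, v = (x @ w for w in (wq, wk, wv))
    split = lambda t: t.reshape(n, heads, d // heads).transpose(1, 0, 2)
    z = attention(split(q), split(k), split(v))       # (heads, n, d/heads)
    return z.transpose(1, 0, 2).reshape(n, d) @ wo    # concat + output projection

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))                       # 5 tokens, model dim 8
wq, wk, wv, wo = (rng.standard_normal((8, 8)) for _ in range(4))
out = multi_head(x, wq, wk, wv, wo, heads=2)
```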
ViT cuts the two-dimensional image into fixed-size patches; each patch is a small color picture with three RGB channels, i.e., each patch is a tensor. The next step is vectorization, stretching each tensor into a vector. A patch embedding is obtained through one layer of linear transformation; a class token is prepended to the head of each patch embedding, and the result is added to a position vector to obtain the final embedding vector, which serves as the Transformer input for training and prediction. Both the class token and the position vector are learnable vectors: the class token is used for prediction and classification, and the position vector represents the position information of each patch in the image;
ViT step breakdown:
Step 1, image blocking and flattening: patch16 and patch32 blocking methods are provided; an input image x ∈ R^(H×W×C) is cut into N patches, and each patch is flattened into a vector.
Step 2, patch embedding: the class token x_class and the spatial position information E_pos are added, giving z_0 = [x_class; x_p¹·E; x_p²·E; …; x_pᴺ·E] + E_pos, where x_class is the learnable class embedding, E is the feature-mapping matrix of the fully connected layer, and the sequence length becomes N+1 after the class token is prepended.
Step 3, Transformer calculation, where MSA is the multi-head self-attention block and MLP the multilayer perceptron block (each with a residual connection), LN is the LayerNorm normalization layer, and z_{l−1} is the output of the previous sub-encoder:
MSA layer: z′_l = MSA(LN(z_{l−1})) + z_{l−1}, l = 1, …, L
MLP layer: z_l = MLP(LN(z′_l)) + z′_l, l = 1, …, L
Step 4, category calculation: y = LN(z_L⁰), i.e., classification is performed from the class-token output of the final encoder layer.
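Steps 1-4 can be traced end to end with random weights in a toy sketch. All sizes are illustrative, and the encoder block is simplified to a single residual MLP standing in for the MSA/MLP pair:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 32; C = 3; P = 16; D = 24; K = 4          # toy sizes, not the patent's
N = (H // P) * (W // P)                            # N = HW / P^2 = 4 patches

img = rng.standard_normal((H, W, C))
# Step 1: cut into P x P patches and flatten each into a vector of length P*P*C.
patches = img.reshape(H // P, P, W // P, P, C).transpose(0, 2, 1, 3, 4)
patches = patches.reshape(N, P * P * C)

# Step 2: linear patch embedding E, class token x_class, position info E_pos.
E = rng.standard_normal((P * P * C, D))
x_class = rng.standard_normal((1, D))
E_pos = rng.standard_normal((N + 1, D))
z0 = np.vstack([x_class, patches @ E]) + E_pos     # z_0, shape (N+1, D)

# Step 3 (simplified): one residual block with LayerNorm and a ReLU MLP,
# standing in for the full MSA + MLP encoder layer.
ln = lambda z: (z - z.mean(-1, keepdims=True)) / (z.std(-1, keepdims=True) + 1e-6)
W_mlp = rng.standard_normal((D, D))
zL = z0 + np.maximum(ln(z0) @ W_mlp, 0)

# Step 4: classify from the class-token output, y = LN(z_L^0).
W_head = rng.standard_normal((D, K))
logits = ln(zL)[0] @ W_head                        # (K,) class scores
```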
when pre-trained on a common ImageNet-21k dataset, viT reached or exceeded the latest level on multiple image recognition benchmarks.
An accuracy of 88.55% was achieved on ImageNet
An accuracy of 90.72% was achieved on ImageNet-ReaL
A 94.55% accuracy was achieved on CIFAR-100.
Vision Transformer (ViT) reshapes the image x ∈ R^(H×W×C) into a sequence of flattened two-dimensional patches x_p ∈ R^(N×(P²·C)) in order to process the two-dimensional image. (H, W) is the resolution of the original image and (P, P) is the resolution of each patch; N = HW/P² is then the effective sequence length of the Transformer. Since the Transformer uses a constant width through all its layers, one trainable linear projection maps each vectorized patch onto the model dimension D; its output is referred to as the patch embedding.
The Vision Transformer prepends a learnable embedding to the sequence of embedded patches; its state at the output of the Transformer encoder serves as the image representation. The classification head has the same size during pre-training and fine-tuning. Furthermore, a 1D position embedding is added to the patch embeddings to preserve position information; 2D-aware variants of position embedding were explored without obtaining significant benefits over standard 1D position embeddings. The resulting sequence of embedding vectors serves as the input to the encoder. Notably, the Vision Transformer uses only the encoder of the standard Transformer, with an MLP head following the encoder output.
Typically, the Vision Transformer is first pre-trained on a large dataset and then fine-tuned for smaller downstream tasks. To this end, the pre-trained prediction head is removed and a zero-initialized D×K feed-forward layer is attached, where K is the number of downstream classes. It is often beneficial to fine-tune at a higher resolution than that used for pre-training. When a higher-resolution image is input, the patch size remains unchanged, which yields a longer effective sequence. The Vision Transformer can handle arbitrary sequence lengths; however, the pre-trained position embeddings may no longer be meaningful, so they are interpolated in two dimensions according to their positions in the original image. Notably, this resolution adjustment and the patch extraction are the only points at which an inductive bias about the two-dimensional structure of the image is manually injected into the Vision Transformer.
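The 2-D interpolation of pre-trained position embeddings might be sketched as follows; nearest-neighbour sampling is used here as a simple stand-in for the bilinear interpolation typically used, and the names are assumptions:

```python
import numpy as np

def interpolate_pos_embed(pos: np.ndarray, old_grid: int, new_grid: int) -> np.ndarray:
    """Reshape (old_grid^2, D) position embeddings into a 2-D grid, resample
    the grid to the new patch count, and flatten back to a sequence."""
    d = pos.shape[1]
    grid = pos.reshape(old_grid, old_grid, d)
    idx = (np.arange(new_grid) * old_grid / new_grid).astype(int)  # nearest rows/cols
    return grid[np.ix_(idx, idx)].reshape(new_grid * new_grid, d)

# Pre-trained on a 4x4 patch grid; fine-tuning at higher resolution gives 8x8.
pos = np.random.default_rng(0).standard_normal((4 * 4, 8))
pos_big = interpolate_pos_embed(pos, old_grid=4, new_grid=8)
```

The class-token embedding (if any) would be kept aside and re-prepended after interpolation.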
In the hybrid variant, a traditional CNN is used to learn a 2D feature representation; the tiled CNN output, carrying position-coding information for the features, is used as the Transformer input, and prediction information is obtained through encoding and decoding by the Transformer self-attention mechanism.
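The tiling step of this hybrid variant can be illustrated as below; the feature-map sizes and projection are assumed example values (a stand-in array replaces a real CNN output):

```python
import numpy as np

# Hybrid variant: a CNN feature map replaces raw patches. The (h, w, c)
# output is flattened into h*w tokens and projected to Transformer width D.
h, w, c, D = 7, 7, 256, 64
rng = np.random.default_rng(4)
feature_map = rng.normal(size=(h, w, c))   # stand-in for a CNN output

tokens = feature_map.reshape(h * w, c)     # one token per spatial position
W_proj = rng.normal(size=(c, D))
transformer_input = tokens @ W_proj        # (h*w, D) sequence for the encoder
```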
According to the ViT model-based pathological section data identification method, full-view pathological section data are acquired, the full-view pathological section data are preprocessed to obtain preprocessed pathological section data, and a training set is constructed based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value; inputting the training set into the ViT model for training to obtain a trained ViT model; and identifying pathological section data based on the trained ViT model to obtain an identification result. The method solves the problems of large calculated amount and poor feature extraction capability when the model processes the two-dimensional image in the prior art.
FIG. 2 is a flowchart of an embodiment of a pathological section data identification system based on a ViT model according to the present invention; as shown in fig. 2, the pathological section data identification system based on the ViT model provided by the embodiment of the invention comprises the following modules:
an acquisition module 10 for acquiring full-field pathological section data;
a preprocessing module 20, configured to preprocess the full-field pathological section data to obtain preprocessed pathological section data;
a first construction module 30 for constructing a training set based on the preprocessed pathological section data;
a second construction module 40, configured to construct a ViT model, where the ViT model includes a multi-head attention mechanism, and calculate multiple sets of key vectors in parallel using multiple attention heads to obtain a calculated value, and predict based on the calculated value;
the training module 50 is configured to input the training set into the ViT model for training, to obtain a trained ViT model;
and an identification module, configured to identify pathological section data based on the trained ViT model to obtain an identification result.
The ViT model is also used to:
cutting a two-dimensional image input into the ViT model into patches of fixed size, wherein each patch is a tensor; stretching each tensor into a vector, and obtaining the patch embedding through a single linear transformation;
attaching a class token at the head of the patch embedding sequence, and adding position vectors to obtain the final embedding vectors;
inputting the embedding vectors into the Transformer, and obtaining prediction information through encoding and decoding by the Transformer self-attention mechanism.
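The parallel multi-head computation invoked throughout (multiple attention heads computing their sets of query, key, and value vectors at once) can be sketched as follows; the weight matrices and token count are illustrative assumptions, not the patent's parameters:

```python
import numpy as np

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Minimal multi-head self-attention sketch: the query/key/value
    vectors for all heads are produced in one batched projection, and
    the heads attend in parallel."""
    N, D = X.shape
    d = D // num_heads
    # Project once, then split the width into num_heads slices of size d.
    Q = (X @ Wq).reshape(N, num_heads, d).transpose(1, 0, 2)  # (h, N, d)
    K = (X @ Wk).reshape(N, num_heads, d).transpose(1, 0, 2)
    V = (X @ Wv).reshape(N, num_heads, d).transpose(1, 0, 2)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)            # (h, N, N)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)                 # row softmax
    heads = weights @ V                                       # (h, N, d)
    concat = heads.transpose(1, 0, 2).reshape(N, D)           # merge heads
    return concat @ Wo

rng = np.random.default_rng(3)
N, D, h = 17, 64, 8                 # 16 patch tokens + 1 class token
X = rng.normal(size=(N, D))
Wq, Wk, Wv, Wo = (rng.normal(size=(D, D)) for _ in range(4))
out = multi_head_attention(X, Wq, Wk, Wv, Wo, h)
```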
According to the ViT model-based pathological section data identification system, full-view pathological section data are acquired through the acquisition module 10; the preprocessing module 20 preprocesses the full-view pathological section data to obtain preprocessed pathological section data; the first construction module 30 constructs a training set based on the preprocessed pathological section data; the second construction module 40 constructs a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculates a plurality of groups of key vectors in parallel using a plurality of attention heads to obtain calculated values, and predicts based on the calculated values; the training module 50 inputs the training set into the ViT model for training to obtain a trained ViT model; and pathological section data are identified based on the trained ViT model to obtain an identification result. The ViT model-based pathological section data identification system thus solves the problems of large calculation amount and poor feature extraction capability when a model processes a two-dimensional image in the prior art.
Fig. 3 is a schematic diagram of an entity structure of an electronic device according to an embodiment of the present invention, as shown in fig. 3, an electronic device 60 includes: a processor 601 (processor), a memory 602 (memory), and a bus 603;
wherein the processor 601 and the memory 602 communicate with each other via the bus 603;
the processor 601 is configured to invoke program instructions in the memory 602 to perform the methods provided by the method embodiments described above, including, for example: acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value; inputting the training set into the ViT model for training to obtain a trained ViT model; and identifying pathological section data based on the trained ViT model to obtain an identification result.
The present embodiment provides a non-transitory computer readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above-described method embodiments, for example, including: acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data; constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value; inputting the training set into the ViT model for training to obtain a trained ViT model; and identifying pathological section data based on the trained ViT model to obtain an identification result.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
The apparatus embodiments described above are merely illustrative; units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, and may be located in one place or distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on such understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the embodiments or the methods of some parts of the embodiments.
While the invention has been described in detail in the foregoing general description and specific embodiments, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications and improvements made without departing from the spirit of the invention are intended to fall within the scope of the invention as claimed.
Claims (10)
1. A ViT model-based pathological section data identification method, which is characterized by comprising the following steps:
acquiring full-view pathological section data, preprocessing the full-view pathological section data to obtain preprocessed pathological section data, and constructing a training set based on the preprocessed pathological section data;
constructing a ViT model, wherein the ViT model comprises a multi-head attention mechanism, calculating a plurality of groups of key vectors in parallel by using a plurality of attention heads to obtain a calculated value, and predicting based on the calculated value;
inputting the training set into the ViT model for training to obtain a trained ViT model;
and identifying pathological section data based on the trained ViT model to obtain an identification result.
2. The method for identifying pathological section data based on ViT model according to claim 1, wherein the preprocessing of the full-field pathological section data to obtain preprocessed pathological section data comprises:
acquiring full-view pathological section data, wherein the full-view pathological section data comprises full-view pathological sections and pathological labeling data;
reading the downsampling magnification of the full-view pathological section, and ensuring the integrity of the color features, texture features, shape features and spatial features of the full-view pathological section;
the pathological labeling data are read, and the labeling information is processed to obtain a point coordinate set;
establishing a whole-image blank mask, performing a bitwise operation on the blank mask and the point coordinate set, and cutting away blank areas to obtain one of the ROI regions of the single full-view pathological section;
converting the HSV color-space information of the ROI region, and ensuring that the color space of the full-view pathological section is RGB.
3. The ViT model-based pathological section data identification method as claimed in claim 1, wherein the constructing a training set based on the preprocessed pathological section data includes:
and performing off-line enhancement and on-line enhancement on the training set through an enhancement algorithm.
4. The method of claim 1, wherein the constructing a ViT model, wherein the ViT model includes a multi-head attention mechanism, wherein the computing multiple sets of key vectors in parallel using multiple attention heads results in a computed value, and wherein the predicting based on the computed value comprises:
performing stepwise iterative solution by a gradient descent method to minimize the loss function and obtain optimal ViT model parameter values.
5. The method for identifying pathological section data based on the ViT model according to claim 1, wherein the step of inputting the training set into the ViT model for training to obtain a trained ViT model comprises:
dividing the preprocessed pathological section data into a training set, a verification set and a test set;
training the ViT model based on the training set;
performing performance evaluation on the ViT model after training based on the verification set to obtain a ViT model meeting performance conditions;
and evaluating the segmentation result of the ViT model meeting the performance condition based on the test set to obtain an evaluation index corresponding to the ViT model.
6. The method for identifying pathological section data based on the ViT model according to claim 1, wherein the identifying pathological section data based on the trained ViT model to obtain an identification result comprises:
cutting a two-dimensional image input into the ViT model into patches of fixed size, wherein each patch is a tensor; stretching each tensor into a vector, and obtaining the patch embedding through a single linear transformation;
attaching a class token at the head of the patch embedding sequence, and adding position vectors to obtain the final embedding vectors;
inputting the embedding vectors into the Transformer, and obtaining prediction information through encoding and decoding by the Transformer self-attention mechanism.
7. A ViT model-based pathological section data identification system, comprising:
the acquisition module is used for acquiring full-view pathological section data;
the preprocessing module is used for preprocessing the full-view pathological section data to obtain preprocessed pathological section data;
the first construction module is used for constructing a training set based on the preprocessed pathological section data;
a second construction module, configured to construct a ViT model, where the ViT model includes a multi-head attention mechanism, and calculate multiple sets of key vectors in parallel using multiple attention heads to obtain a calculated value, and predict based on the calculated value;
the training module is used for inputting the training set into the ViT model for training to obtain a trained ViT model;
and an identification module, configured to identify pathological section data based on the trained ViT model to obtain an identification result.
8. The ViT model-based pathological section data recognition system of claim 7, wherein the ViT model is further configured to:
cutting a two-dimensional image input into the ViT model into patches of fixed size, wherein each patch is a tensor; stretching each tensor into a vector, and obtaining the patch embedding through a single linear transformation;
attaching a class token at the head of the patch embedding sequence, and adding position vectors to obtain the final embedding vectors;
inputting the embedding vectors into the Transformer, and obtaining prediction information through encoding and decoding by the Transformer self-attention mechanism.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when the computer program is executed.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310963241.5A CN117152554A (en) | 2023-08-02 | 2023-08-02 | ViT model-based pathological section data identification method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117152554A true CN117152554A (en) | 2023-12-01 |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117392468A (en) * | 2023-12-11 | 2024-01-12 | 山东大学 | Cancer pathology image classification system, medium and equipment based on multi-example learning |
CN117457235A (en) * | 2023-12-22 | 2024-01-26 | 首都医科大学附属北京友谊医院 | Pathological damage mode prediction method and device, storage medium and electronic equipment |
CN117689044A (en) * | 2024-02-01 | 2024-03-12 | 厦门大学 | Quantification method suitable for vision self-attention model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||