CN114820481A

CN114820481A - Lung cancer histopathology full-section EGFR state prediction method based on converter

Info

Publication number: CN114820481A
Application number: CN202210385274.1A
Authority: CN
Inventors: 祝新宇; 史骏; 束童; 唐昆铭; 孙宇; 杨志鹏; 张元�; 王垚; 郑利平
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2022-04-13
Filing date: 2022-04-13
Publication date: 2022-07-29

Abstract

The invention discloses a Transformer-based method for predicting the EGFR status of whole-slide lung cancer histopathology images, comprising the following steps: 1. acquire a whole-slide lung cancer histopathology dataset and preprocess it; 2. in the first stage, build and train a vision Transformer network model that predicts whether an image block is positive or negative; 3. use the trained first-stage model to predict the positive/negative class of the image blocks in the dataset, discard the negative image blocks, and generate an EGFR mutation-type dataset from the positive image blocks; 4. in the second stage, build and train, on the generated EGFR mutation-type dataset, a vision Transformer network model that predicts the EGFR mutation type of an image block; 5. use the models trained in the first and second stages to predict the EGFR status of a whole slide. The invention uses two vision Transformer network models as the backbone network, which effectively reduces the error rate of the pseudo labels and improves prediction accuracy.

Description

Lung cancer histopathology full-section EGFR state prediction method based on converter

Technical Field

The invention relates to the technical field of computer vision, and in particular to a Transformer-based method for predicting the EGFR status of whole-slide lung cancer histopathology images.

Background

EGFR (epidermal growth factor receptor) is a transmembrane protein with cytoplasmic kinase activity that transduces important growth-factor signals from the extracellular environment into the cell. Lung adenocarcinoma is a common histological type of lung cancer, and the discovery of EGFR mutations revolutionized its treatment. EGFR-positive lung adenocarcinoma can be broadly divided into mutant and wild types. To keep the status classification accurate, this patent assigns EGFR-positive statuses other than mutant and wild type to an "other" class. EGFR mutation testing is crucial in first-line therapy, because medication and treatment differ significantly between EGFR statuses. Accurately determining the EGFR status therefore plays an important role in patient treatment and in physicians' prescribing decisions.

Mutation sequencing of biopsy specimens has become the gold standard for detecting EGFR mutations. In routine diagnosis, a pathologist must visually examine tens of thousands of cells under a microscope. Each pathologist handles a large number of patient specimens every day, which often causes reading fatigue and occasionally leads to misdiagnosis. An efficient, quantitative method for predicting the EGFR status of whole-slide lung cancer histopathology images is therefore needed. Such a method would reduce the pathologist's slide-reading burden and improve the accuracy of EGFR status prediction. Current algorithms for this task are mainly supervised classification algorithms based on deep learning.

In recent years, deep learning models have achieved remarkable results across computer vision. Some researchers have applied convolutional neural networks, such as residual networks (ResNet) and densely connected convolutional networks (DenseNet), to EGFR status prediction on whole-slide lung cancer histopathology images. These networks, however, rely on the inductive bias of convolution. They cannot model the input dynamically and adaptively, and they cannot capture spatial relationships among EGFR receptor features. As a result, they struggle to predict lung cancer EGFR status accurately.

Disclosure of Invention

The invention aims to remedy the shortcomings of the prior art by providing a Transformer-based method for predicting the EGFR status of whole-slide lung cancer histopathology images. Pathology images have complex structures, diverse types and rich feature information, which makes EGFR status prediction on whole slides difficult. The invention constructs a two-stage network based on vision Transformers to capture long-range dependencies within whole-slide lung cancer histopathology images. It thereby obtains representations corresponding to different EGFR receptor types and achieves accurate, efficient prediction of the EGFR status of whole slides.

The invention is realized by the following technical scheme:

A Transformer-based method for predicting the EGFR status of whole-slide lung cancer histopathology images specifically comprises the following steps:

(1) acquire a whole-slide lung cancer histopathology dataset from whole-slide lung cancer histopathology images and preprocess it;

(2) build a vision Transformer network model that predicts whether an image block is positive or negative, and train it on the dataset from step (1);

(3) use the model from step (2) to predict the positive/negative class of the image blocks in the dataset, discard the negative image blocks, and generate an EGFR mutation-type dataset from the retained positive image blocks;

(4) build a vision Transformer network model that predicts the EGFR mutation type of an image block, and train it on the dataset generated in step (3);

(5) use the positive/negative model from step (2) and the EGFR mutation-type model from step (4) to predict the EGFR status of a whole slide.

In step (1), the whole-slide lung cancer histopathology dataset is acquired and preprocessed as follows:

the whole-slide lung cancer histopathology images are sorted by their negative/positive labels. Blank background regions are removed from each whole-slide image, the image is divided into blocks, and a number of image blocks are obtained by random sampling. These are denoted $\{(X_i, y_i)\}_{i=1}^{N}$,

where $X_i \in \mathbb{R}^{C \times P \times P}$ is the $i$-th image block, $C$ is the number of channels of the image block, and $P \times P$ is the width and height of each image block. $y_i$ is the class of image block $X_i$: the negative/positive label of the whole slide is assigned to each of its image blocks as a pseudo label, which yields every image block together with its negative/positive class. Here $i = 1, 2, \dots, N$, and $N$ is the number of image blocks.
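The block-level pseudo labelling in step (1) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions. Each whole slide is represented by a hypothetical list of tile coordinates that survived background removal, and the name `build_block_dataset` is illustrative rather than taken from the patent.

```python
import random
from typing import Dict, List, Tuple

Tile = Tuple[int, int]  # hypothetical tile coordinate (row, column) on the slide grid

def build_block_dataset(
    slides: Dict[str, List[Tile]],
    slide_labels: Dict[str, int],  # 0 = negative, 1 = positive
    blocks_per_slide: int,
    seed: int = 0,
) -> List[Tuple[str, Tile, int]]:
    """Randomly sample blocks from each slide and give every block the slide label."""
    rng = random.Random(seed)
    dataset = []
    for slide_id, tiles in slides.items():
        k = min(blocks_per_slide, len(tiles))
        for tile in rng.sample(tiles, k):
            # Pseudo label: the block simply inherits the whole-slide label.
            dataset.append((slide_id, tile, slide_labels[slide_id]))
    return dataset
```

Because every block inherits the slide-level label, some blocks sampled from positive slides may be mislabelled. The two-stage design described below is intended to reduce this pseudo-label error.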

In step (2), a vision Transformer network model that predicts whether an image block is positive or negative is built and trained on the dataset from step (1), specifically as follows:

a vision Transformer (ViT) composed of $L$ encoders is constructed as the first-stage network. Each encoder comprises two layer-normalization layers, a multi-head self-attention layer and a multilayer perceptron (MLP);

step 2.1, divide image block $X_i$ into patches to obtain a sequence of $m$ patches $(x_1, x_2, \dots, x_j, \dots, x_m)$,

where $x_j \in \mathbb{R}^{C \times p \times p}$ is the $j$-th patch of image block $X_i$, $p \times p$ is the width and height of each patch, and $m = P^2 / p^2$;
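Step 2.1 can be sketched with NumPy as a reshape. A $C \times P \times P$ array becomes $m = P^2/p^2$ flattened patches of length $p^2 C$. This is a minimal illustration. The name `split_into_patches` and the flattening order (patches row by row, pixel values channels-last within a patch) are assumptions.

```python
import numpy as np

def split_into_patches(block: np.ndarray, p: int) -> np.ndarray:
    """Split a C x P x P image block into m = P*P/(p*p) flattened p x p patches.

    Returns an array of shape (m, p*p*C); patches are ordered row by row.
    """
    C, P, P2 = block.shape
    assert P == P2 and P % p == 0, "block must be square and divisible by p"
    g = P // p                          # patches per side
    x = block.reshape(C, g, p, g, p)    # (C, patch_row, y, patch_col, x)
    x = x.transpose(1, 3, 2, 4, 0)      # (patch_row, patch_col, y, x, C)
    return x.reshape(g * g, p * p * C)  # (m, p^2 C)
```

For example, a 3 × 224 × 224 block with $p = 16$ yields 196 patches of length 768, and a 3 × 256 × 256 block yields 256 patches.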

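Step 2.2, set a learnable classification token $x_{class}$. Use formula (1) to obtain the $D$-dimensional embedded representation $z_0$ of the $m$ patches and the classification token $x_{class}$, which serves as the input to the 1st encoder;

$$z_0 = [x_{class};\ x_1 E;\ x_2 E;\ \dots;\ x_m E] + E_{pos} \qquad (1)$$

in formula (1), $E_{pos} \in \mathbb{R}^{(m+1) \times D}$ encodes the spatial positions of the $m$ patches and the classification token $x_{class}$ within image block $X_i$. $E \in \mathbb{R}^{(p^2 C) \times D}$ is the embedding matrix applied to each flattened patch;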
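Formula (1) follows the patch-embedding scheme of the original ViT. The NumPy sketch below assumes flattened patches of length $p^2 C$ and a row-vector convention, so $E$ has shape $(p^2 C) \times D$. The name `embed_patches` is hypothetical.

```python
import numpy as np

def embed_patches(patches, E, x_class, E_pos):
    """Formula (1): z_0 = [x_class; x_1 E; ...; x_m E] + E_pos.

    patches: (m, p^2 C) flattened patches
    E:       (p^2 C, D) patch embedding matrix
    x_class: (D,) learnable classification token
    E_pos:   (m + 1, D) position embeddings
    Returns z_0 with shape (m + 1, D); row 0 is the classification token.
    """
    tokens = patches @ E                        # (m, D)
    z0 = np.vstack([x_class[None, :], tokens])  # prepend class token -> (m + 1, D)
    return z0 + E_pos
```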

step 2.3, use formula (2) to obtain the output $z'_l$ of the multi-head self-attention layer of the $l$-th encoder for the $m$ patches and the classification token $x_{class}$;

$$z'_l = \mathrm{MSA}(\mathrm{LN}(z_{l-1})) + z_{l-1}, \quad l = 1, \dots, L \qquad (2)$$

in formula (2), $\mathrm{MSA}(\cdot)$ denotes the multi-head self-attention layer and $\mathrm{LN}(\cdot)$ denotes layer normalization. $z_{l-1}$ is the output of the $(l-1)$-th encoder;

step 2.4, use formula (3) to obtain the output $z_l$ of the MLP of the $l$-th encoder;

$$z_l = \mathrm{MLP}(\mathrm{LN}(z'_l)) + z'_l, \quad l = 1, \dots, L \qquad (3)$$

in formula (3), $\mathrm{MLP}(\cdot)$ denotes the multilayer perceptron and $\mathrm{LN}(\cdot)$ denotes layer normalization;

step 2.5, feed the MLP output $z_l$ of the $l$-th encoder into the multi-head self-attention layer of the $(l+1)$-th encoder to obtain $z'_{l+1}$. Then feed $z'_{l+1}$ into the MLP of the $(l+1)$-th encoder to obtain $z_{l+1}$. Repeat step 2.5 up to the $L$-th encoder to obtain its output $z_L$;
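Steps 2.3 to 2.5 describe a stack of pre-norm Transformer encoder layers. The NumPy sketch below implements the forward pass of formulas (2) and (3) and chains $L$ layers as in step 2.5. It assumes a two-layer MLP with GELU, as stated in the embodiment. It omits the learnable scale and shift of layer normalization, attention biases and dropout. All function and parameter names are illustrative.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # LN(.) without learnable scale/shift (omitted for brevity)
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def msa(x, Wq, Wk, Wv, Wo, heads):
    # Multi-head self-attention over the n = m + 1 tokens in x (n, D)
    n, D = x.shape
    dh = D // heads
    def split(t):  # (n, D) -> (heads, n, dh)
        return t.reshape(n, heads, dh).transpose(1, 0, 2)
    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, n, n)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, D)       # concatenate heads
    return out @ Wo

def encoder_layer(z_prev, params, heads):
    Wq, Wk, Wv, Wo, W1, b1, W2, b2 = params
    z_mid = msa(layer_norm(z_prev), Wq, Wk, Wv, Wo, heads) + z_prev  # formula (2)
    mlp = gelu(layer_norm(z_mid) @ W1 + b1) @ W2 + b2               # two-layer MLP
    return mlp + z_mid                                               # formula (3)

def encoder(z0, layers, heads):
    z = z0
    for params in layers:  # step 2.5: output of encoder l feeds encoder l + 1
        z = encoder_layer(z, params, heads)
    return z               # z_L
```

Because both sublayers are residual, an encoder whose weights are all zero returns its input unchanged. This is a quick sanity check for an implementation.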

Step 2.6, use formula (4) to obtain the normalized output $z'_L$, and extract the $D$-dimensional feature $z'^{0}_{L}$ corresponding to the classification token $x_{class}$;

$$z'_L = \mathrm{LN}(z_L) \qquad (4)$$

in formula (4), $\mathrm{LN}(\cdot)$ denotes layer normalization;

step 2.7, apply a linear transformation to this feature with formula (5) to obtain the output $pos_{pred}$ of the linear classifier;

$$pos_{pred} = \mathrm{Linear}(z'^{0}_{L}), \quad pos_{pred} \in \mathbb{R}^{c} \qquad (5)$$

in formula (5), $\mathrm{Linear}(\cdot)$ denotes a linear classification function;

$c$ is the number of classes (negative and positive);
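Formulas (4) and (5) normalize the last encoder output, take the row that belongs to the classification token, and apply a linear layer. The minimal NumPy sketch below assumes the classification token is stored in row 0, as in formula (1). `W` and `b` are the weights of a hypothetical linear classifier.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def classify(z_L, W, b):
    """Formulas (4)-(5): LN, take the class-token feature, apply a linear layer.

    z_L: (m + 1, D) output of the last encoder, row 0 = classification token
    W:   (D, c), b: (c,) linear classifier; c = 2 (negative/positive) in stage 1
    Returns the c class scores (logits), i.e. pos_pred.
    """
    z_prime = layer_norm(z_L)   # formula (4)
    cls_feature = z_prime[0]    # D-dimensional feature of x_class
    return cls_feature @ W + b  # formula (5)
```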

step 2.8, construct the cross-entropy loss function $L$ with formula (6). Train the first-stage network, formed by the vision Transformer and the linear classifier, with a gradient descent algorithm until $L$ converges. This yields a trained vision Transformer network model that predicts whether an image block is positive or negative;

in formula (6), $y_{label}$ is the negative/positive pseudo label of the image block, and $N$ is the total number of image blocks.
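Formula (6) itself is not reproduced in this text. Assuming it is the standard softmax cross-entropy averaged over the $N$ image blocks, it can be sketched in NumPy as follows. `cross_entropy` is an illustrative name. In practice, a deep learning framework would compute this loss and run the gradient-descent training.

```python
import numpy as np

def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean softmax cross-entropy over N image blocks.

    logits: (N, c) classifier outputs; labels: (N,) integer pseudo labels.
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())
```

Under the same assumption, this loss with $c = 3$ classes would also serve for formula (12) in the second stage.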

In step (3), the model from step (2) predicts the positive/negative class of the image blocks in the dataset. The negative image blocks are discarded, and an EGFR mutation-type dataset is generated from the retained positive image blocks, specifically as follows:

the whole-slide lung cancer histopathology images are sorted by their EGFR mutation-status class labels. Blank background regions are removed from each whole-slide image, the image is divided into blocks, and a number of image blocks are obtained by random sampling. These image blocks are fed into the trained model that predicts whether an image block is positive or negative, and the positive/negative class of each image block is predicted. The negative image blocks are discarded, leaving $n$ positive image blocks that form the EGFR mutation-type dataset, denoted $\{(X'_i, y'_i)\}_{i=1}^{n}$,

where $X'_i \in \mathbb{R}^{C \times P \times P}$ is the $i$-th image block, $C$ is the number of channels of the image block, and $P \times P$ is the width and height of each image block. $y'_i$ is the class of image block $X'_i$, i.e. its EGFR mutation class label in the dataset. Here $i = 1, 2, \dots, n$, and $n$ is the number of image blocks.
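Step (3) is a filtering pass: the first-stage model acts as a gate, and only blocks it predicts as positive keep their slide-level EGFR label. A plain-Python sketch follows. `predict_positive` stands for a hypothetical wrapper around the trained first-stage model, and the function name is illustrative.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Block = TypeVar("Block")

def build_egfr_dataset(
    blocks: Sequence[Block],
    egfr_labels: Sequence[int],
    predict_positive: Callable[[Block], bool],
) -> List[Tuple[Block, int]]:
    """Keep only blocks the stage-1 model predicts as positive; attach the EGFR label."""
    return [
        (block, label)
        for block, label in zip(blocks, egfr_labels)
        if predict_positive(block)
    ]
```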

In step (4), a vision Transformer network model that predicts the EGFR mutation type of an image block is built and trained on the dataset generated in step (3), specifically as follows:

a vision Transformer (ViT) composed of $S$ encoders is constructed as the second-stage network. Each encoder comprises two layer-normalization layers, a multi-head self-attention layer and a multilayer perceptron;

step 4.1, divide image block $X'_i$ into patches to obtain a sequence of $m'$ patches $(x'_1, x'_2, \dots, x'_j, \dots, x'_{m'})$,

where $x'_j \in \mathbb{R}^{C \times p \times p}$ is the $j$-th patch of image block $X'_i$, $p \times p$ is the width and height of each patch, and $m' = P^2 / p^2$;

Step 4.2, set a learnable classification token $x'_{class}$. Use formula (7) to obtain the $D$-dimensional embedded representation $z_0$ of the $m'$ patches and the classification token $x'_{class}$, which serves as the input to the 1st encoder;

$$z_0 = [x'_{class};\ x'_1 E';\ x'_2 E';\ \dots;\ x'_{m'} E'] + E'_{pos} \qquad (7)$$

in formula (7), $E'_{pos}$ encodes the spatial positions of the $m'$ patches and the classification token $x'_{class}$ within image block $X'_i$. $E'$ is the embedding matrix applied to each flattened patch;

step 4.3, use formula (8) to obtain the output $z'_s$ of the multi-head self-attention layer of the $s$-th encoder for the $m'$ patches and the classification token $x'_{class}$:

$$z'_s = \mathrm{MSA}(\mathrm{LN}(z_{s-1})) + z_{s-1}, \quad s = 1, \dots, S \qquad (8)$$

in formula (8), $\mathrm{MSA}(\cdot)$ denotes the multi-head self-attention layer and $\mathrm{LN}(\cdot)$ denotes layer normalization. $z_{s-1}$ is the output of the $(s-1)$-th encoder;

step 4.4, use formula (9) to obtain the output $z_s$ of the MLP of the $s$-th encoder;

$$z_s = \mathrm{MLP}(\mathrm{LN}(z'_s)) + z'_s, \quad s = 1, \dots, S \qquad (9)$$

in formula (9), $\mathrm{MLP}(\cdot)$ denotes the multilayer perceptron and $\mathrm{LN}(\cdot)$ denotes layer normalization;

step 4.5, feed the MLP output $z_s$ of the $s$-th encoder into the multi-head self-attention layer of the $(s+1)$-th encoder to obtain $z'_{s+1}$. Then feed $z'_{s+1}$ into the MLP of the $(s+1)$-th encoder to obtain $z_{s+1}$. Repeat step 4.5 up to the $S$-th encoder to obtain its output $z_S$;

Step 4.6, use formula (10) to obtain the normalized output $z'_S$, and extract the $D$-dimensional feature $z'^{0}_{S}$ corresponding to the classification token $x'_{class}$;

$$z'_S = \mathrm{LN}(z_S) \qquad (10)$$

in formula (10), $\mathrm{LN}(\cdot)$ denotes layer normalization;

step 4.7, apply a linear transformation to this feature with formula (11) to obtain the output $egfr_{pred}$ of the linear classifier:

$$egfr_{pred} = \mathrm{Linear}(z'^{0}_{S}), \quad egfr_{pred} \in \mathbb{R}^{c} \qquad (11)$$

in formula (11), $\mathrm{Linear}(\cdot)$ denotes a linear classification function;

$c$ is the number of EGFR status classes;

step 4.8, construct the cross-entropy loss function $L$ with formula (12). Train the second-stage network, formed by the vision Transformer and the linear classifier, with a gradient descent algorithm until $L$ converges. This yields a trained vision Transformer network model that predicts the EGFR mutation type of an image block;

in formula (12), $y_{label}$ is the EGFR status pseudo label of the image block, and $N$ is the total number of image blocks.

In step (5), the EGFR status of a whole slide is predicted with the positive/negative model from step (2) and the EGFR mutation-type model from step (4), specifically as follows:

step 5.1, remove blank background regions from the whole-slide lung cancer histopathology image and divide it into blocks. Denote the resulting image blocks as the sequence $(x_1, x_2, \dots, x_j, \dots, x_m)$;
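The text does not specify how blank background is detected. One common heuristic, shown here purely as an assumption, is to discard blocks dominated by near-white pixels. The sketch assumes the slide has already been read into memory as an `(H, W, 3)` uint8 array (channels last, unlike the $C \times P \times P$ notation above). The thresholds `white_level` and `max_white_fraction` are illustrative values, not taken from the patent.

```python
import numpy as np

def is_background(block: np.ndarray, white_level: int = 220,
                  max_white_fraction: float = 0.8) -> bool:
    """Heuristic test for a (P, P, 3) uint8 block.

    A pixel counts as white when all three channels exceed white_level;
    the block is treated as blank background when white pixels dominate.
    """
    white = np.all(block > white_level, axis=-1)
    return bool(white.mean() > max_white_fraction)

def tile_slide(slide: np.ndarray, P: int) -> list:
    """Cut an (H, W, 3) slide into non-overlapping P x P blocks, dropping background."""
    H, W, _ = slide.shape
    blocks = []
    for r in range(0, H - P + 1, P):
        for c in range(0, W - P + 1, P):
            block = slide[r:r + P, c:c + P]
            if not is_background(block):
                blocks.append(block)
    return blocks
```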

Step 5.2, feed the image blocks $(x_1, x_2, \dots, x_j, \dots, x_m)$ into the model that predicts whether an image block is positive or negative, and predict the positive/negative class of each image block. Discard the negative image blocks to obtain the positive image-block sequence $(x_1, x_2, \dots, x_j, \dots, x_n)$. Set a positive/negative classification threshold $t$, and compute the proportion $t_{pos}$ of positive image blocks with formula (13). Determine the positive/negative class of the whole slide by comparing $t_{pos}$ with the threshold $t$;

$$t_{pos} = \frac{n}{m} \qquad (13)$$
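Formula (13) and the threshold comparison can be sketched in plain Python. The text only says that $t_{pos}$ is compared with $t$, so treating $t_{pos} \ge t$ as positive is an assumption. The function name is illustrative.

```python
from typing import Sequence, Tuple

def slide_is_positive(block_predictions: Sequence[int], t: float) -> Tuple[float, bool]:
    """Formula (13): t_pos = n / m, then compare with the threshold t.

    block_predictions holds the stage-1 result for each of the m blocks of one
    slide (1 = positive, 0 = negative).
    """
    m = len(block_predictions)
    if m == 0:
        raise ValueError("slide has no tissue blocks")
    n = sum(block_predictions)  # number of positive blocks
    t_pos = n / m
    return t_pos, t_pos >= t    # ">=" is an assumption; the text only says "compare"
```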

step 5.3, perform the next prediction only for whole slides predicted positive in step 5.2. Input the positive image blocks $(x_1, x_2, \dots, x_j, \dots, x_n)$ of the whole slide into the model that predicts the EGFR mutation type of an image block, and predict the EGFR mutation type of each image block. Compute with formula (14) the proportion $egfr_i$ of each EGFR status among the $n$ image blocks. Take the status with the highest proportion as the EGFR status of the whole-slide lung cancer histopathology image. Here $n_i$ is the number of image blocks predicted as the $i$-th EGFR mutation status, and $K$ is the number of EGFR mutation-status classes.

$$egfr_i = \frac{n_i}{n}, \quad i = 1, \dots, K \qquad (14)$$
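Formula (14) and the highest-proportion rule amount to a majority vote over the positive blocks. A plain-Python sketch follows. The function name and the tie-breaking rule (lowest class index wins) are assumptions, since the text does not address ties.

```python
from collections import Counter
from typing import List, Sequence, Tuple

def slide_egfr_status(block_types: Sequence[int], K: int) -> Tuple[List[float], int]:
    """Formula (14): egfr_i = n_i / n, then pick the class with the highest ratio.

    block_types holds the predicted EGFR class index (0..K-1) of each of the
    n positive blocks of one slide.
    """
    n = len(block_types)
    if n == 0:
        raise ValueError("slide has no positive blocks")
    counts = Counter(block_types)                     # n_i for each class i
    ratios = [counts.get(i, 0) / n for i in range(K)]
    status = max(range(K), key=lambda i: ratios[i])  # ties go to the lowest index
    return ratios, status
```

In the embodiment, $K = 3$, so the indices could stand for Mutant, Wild and Other (this ordering is hypothetical).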

The advantages of the invention are:

1. the invention uses vision Transformers to learn features of whole-slide lung cancer histopathology images. A vision Transformer models its input dynamically and adaptively and captures both local and global image features through the attention mechanism. This improves the feature representation of whole-slide lung cancer histopathology images;

2. the invention uses vision Transformers to learn long-range dependencies within the image. This establishes dependencies among all parts of the whole-slide lung cancer histopathology image and further improves the accuracy of EGFR status prediction;

3. the invention uses two vision Transformers (ViT) as the backbone network. The first classifies whole-slide lung cancer histopathology images as positive or negative and extracts the positive image blocks. The second performs EGFR type classification only on those positive image blocks. This effectively reduces the error rate of the pseudo labels and improves prediction accuracy.

Drawings

FIG. 1 is a block diagram of the network of the present invention;

FIG. 2 is a general flowchart of the present invention.

Detailed Description

In this embodiment, the Transformer-based method for predicting the EGFR status of whole-slide lung cancer histopathology images takes into account the difficulty of the EGFR status classification task. The image is first input into the first vision Transformer network to predict the positive/negative class of the whole slide. The positive image blocks are then input into the second vision Transformer network to predict their EGFR status. Together, these steps complete the EGFR status classification of the whole-slide lung cancer histopathology image. As shown in FIG. 1 and FIG. 2, the method specifically includes the following steps:

Step (1), acquire a whole-slide lung cancer histopathology dataset from whole-slide lung cancer histopathology images and preprocess it, specifically as follows:

the randomly sampled image blocks are denoted $\{(X_i, y_i)\}_{i=1}^{N}$,

where $X_i \in \mathbb{R}^{C \times P \times P}$ is the $i$-th image block, $C$ is the number of channels of the image block, and $P \times P$ is the width and height of each image block. $y_i$ is the class of image block $X_i$: the negative/positive label of the whole slide is assigned to each of its image blocks as a pseudo label, which yields every image block together with its negative/positive class. Here $i = 1, 2, \dots, N$, and $N$ is the number of image blocks. The EGFR status data used in this embodiment contain two classes, negative and positive. The dataset comprises 100 whole slides, and 500 image blocks are randomly sampled from each whole slide, so N is 500. Each image block is 256 × 256 pixels with 3 channels, so C is 3 and P is 256. 80% of each class in the dataset is used for training and the remaining 20% for testing.
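The 80%/20% split within each class can be sketched as follows. The text does not say whether the split is made over whole slides or over image blocks. Splitting over slide identifiers (passed here through the hypothetical `items` argument) keeps blocks from one slide out of both sets. The function name is illustrative.

```python
import random
from collections import defaultdict

def stratified_split(items, labels, train_fraction=0.8, seed=0):
    """Per class, send train_fraction of the items to training and the rest to testing."""
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)
    rng = random.Random(seed)
    train, test = [], []
    for label, group in by_class.items():
        rng.shuffle(group)
        k = round(train_fraction * len(group))
        train += [(x, label) for x in group[:k]]
        test += [(x, label) for x in group[k:]]
    return train, test
```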

Step (2), build a vision Transformer network model that predicts whether an image block is positive or negative, and train it on the dataset from step (1), specifically as follows:

a deep learning network model based on vision Transformers is built as shown in FIG. 1; it contains two vision Transformer (ViT) networks. A vision Transformer composed of $L$ encoders is constructed as the first-stage network. Each encoder comprises two layer-normalization layers, a multi-head self-attention layer and a multilayer perceptron;

step 2.1, divide image block $X_i$ into patches to obtain a sequence of $m$ patches $(x_1, x_2, \dots, x_j, \dots, x_m)$,

where $x_j$ is the $j$-th patch of image block $X_i$, $p \times p$ is the width and height of each patch, and $m = P^2 / p^2$. In this embodiment each patch is 16 × 16, so p is 16 and m is 196.

Step 2.2, set a learnable classification token $x_{class}$. Use formula (1) to obtain the $D$-dimensional embedded representation $z_0$ of the $m$ patches and the classification token $x_{class}$, which serves as the input to the 1st encoder;

in formula (1), $E_{pos}$ encodes the spatial positions of the $m$ patches and the classification token $x_{class}$ within image block $X_i$, and $E$ is the embedding matrix. In this embodiment, D is 768, and $x_{class}$ is a 768-dimensional vector of random numbers. $E$ is a 768 × 768 matrix of random numbers (768 rows and 768 columns), and $E_{pos}$ is a 197 × 768 matrix of random numbers (197 rows and 768 columns).
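The embodiment's tensor sizes can be cross-checked with a little arithmetic. The embedding matrix $E$ has $p^2 C = 16 \cdot 16 \cdot 3 = 768$ rows, and $E_{pos}$ has $m + 1$ rows. Note that $m = 196$ and a 197 × 768 $E_{pos}$ correspond to a 224 × 224 input, as in the standard ViT-B/16 configuration. By contrast, $P = 256$ with $p = 16$ gives $m = 256$. The text does not say how the two are reconciled; resizing blocks to 224 × 224 before patching would be one possibility. The helper below is hypothetical.

```python
def vit_shapes(P: int, p: int, C: int, D: int) -> dict:
    """Shapes implied by the patch size p, block size P, channels C and width D."""
    m = (P * P) // (p * p)        # number of patches, m = P^2 / p^2
    return {
        "m": m,
        "E": (p * p * C, D),      # embedding matrix for flattened patches
        "E_pos": (m + 1, D),      # +1 row for the classification token
    }

print(vit_shapes(224, 16, 3, 768))  # m = 196, E = (768, 768), E_pos = (197, 768)
print(vit_shapes(256, 16, 3, 768))  # m = 256, E_pos = (257, 768)
```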

Step 2.3, use formula (2) to obtain the output $z'_l$ of the multi-head self-attention layer of the $l$-th encoder for the $m$ patches and the classification token $x_{class}$:

$$z'_l = \mathrm{MSA}(\mathrm{LN}(z_{l-1})) + z_{l-1}, \quad l = 1, \dots, L \qquad (2)$$

$$z_l = \mathrm{MLP}(\mathrm{LN}(z'_l)) + z'_l, \quad l = 1, \dots, L \qquad (3)$$

in formula (3), $\mathrm{MLP}(\cdot)$ denotes the multilayer perceptron, which in this embodiment consists of two network layers and a GELU nonlinear activation layer. $\mathrm{LN}(\cdot)$ denotes layer normalization;

step 2.5, feed the MLP output $z_l$ of the $l$-th encoder into the multi-head self-attention layer of the $(l+1)$-th encoder to obtain $z'_{l+1}$. Then feed $z'_{l+1}$ into the MLP of the $(l+1)$-th encoder to obtain $z_{l+1}$. Repeat step 2.5 up to the $L$-th encoder to obtain its output $z_L$.

$$z'_L = \mathrm{LN}(z_L) \qquad (4)$$

in formula (4), $\mathrm{LN}(\cdot)$ denotes layer normalization;

step 2.7, apply a linear transformation to the feature with formula (5) to obtain the output $pos_{pred}$ of the linear classifier:

in formula (5), $\mathrm{Linear}(\cdot)$ denotes a linear classification function;

$c$ is the number of classes (negative and positive);

step 2.8, construct the cross-entropy loss function $L$ with formula (6). Train the first-stage network, formed by the vision Transformer and the linear classifier, with a gradient descent algorithm until $L$ converges. This yields a trained vision Transformer network model that predicts whether an image block is positive or negative:

in formula (6), $y_{label}$ is the negative/positive pseudo label of the image block, and $N$ is the total number of image blocks.

Step (3), use the model from step (2) to predict the positive/negative class of the image blocks in the dataset. Discard the negative image blocks, and generate an EGFR mutation-type dataset from the retained positive image blocks, specifically as follows:

the whole-slide lung cancer histopathology images are sorted by their EGFR mutation-status class labels. Blank background regions are removed from each whole-slide image, the image is divided into blocks, and a number of image blocks are obtained by random sampling. These image blocks are fed into the model trained in step (2), and the positive/negative class of each image block is predicted. The negative image blocks are discarded, leaving $n$ positive image blocks that form the EGFR mutation-type dataset, denoted $\{(X'_i, y'_i)\}_{i=1}^{n}$,

where $X'_i \in \mathbb{R}^{C \times P \times P}$ is the $i$-th image block, $C$ is the number of channels of the image block, and $P \times P$ is the width and height of each image block. $y'_i$ is the class of image block $X'_i$, i.e. its EGFR mutation class label in the dataset. Here $i = 1, 2, \dots, n$, and $n$ is the number of image blocks. The EGFR mutation status data used in this embodiment contain three classes: Mutant, Wild and Other. The dataset comprises 100 whole slides, and 500 image blocks are randomly sampled from each whole slide. Each image block is 256 × 256 pixels with 3 channels, so C is 3 and P is 256. 80% of each class in the dataset is used for training and the remaining 20% for testing.

Step (4), build a vision Transformer network model that predicts the EGFR mutation type of an image block, and train it on the dataset generated in step (3), specifically as follows:

step 4.1, divide image block $X'_i$ into patches to obtain a sequence of $m'$ patches $(x'_1, x'_2, \dots, x'_j, \dots, x'_{m'})$,

where $x'_j$ is the $j$-th patch of image block $X'_i$, $p \times p$ is the width and height of each patch, and $m' = P^2 / p^2$. In this embodiment each patch is 16 × 16, so p is 16 and m' is 196.

in formula (7), $E'_{pos}$ encodes the spatial positions of the $m'$ patches and the classification token $x'_{class}$ within image block $X'_i$, and $E'$ is the embedding matrix. In this embodiment, D is 768, and $x'_{class}$ is a 768-dimensional vector of random numbers. $E'$ is a 768 × 768 matrix of random numbers (768 rows and 768 columns), and $E'_{pos}$ is a 197 × 768 matrix of random numbers (197 rows and 768 columns).

$$z'_s = \mathrm{MSA}(\mathrm{LN}(z_{s-1})) + z_{s-1}, \quad s = 1, \dots, S \qquad (8)$$

step 4.4, use formula (9) to obtain the output $z_s$ of the MLP of the $s$-th encoder:

$$z_s = \mathrm{MLP}(\mathrm{LN}(z'_s)) + z'_s, \quad s = 1, \dots, S \qquad (9)$$

in formula (9), $\mathrm{MLP}(\cdot)$ denotes the multilayer perceptron, which in this embodiment consists of two network layers and a GELU nonlinear activation layer. $\mathrm{LN}(\cdot)$ denotes layer normalization;

step 4.5, feed the MLP output $z_s$ of the $s$-th encoder into the multi-head self-attention layer of the $(s+1)$-th encoder to obtain $z'_{s+1}$. Then feed $z'_{s+1}$ into the MLP of the $(s+1)$-th encoder to obtain $z_{s+1}$. Repeat step 4.5 up to the $S$-th encoder to obtain its output $z_S$.

$$z'_S = \mathrm{LN}(z_S) \qquad (10)$$

in formula (10), $\mathrm{LN}(\cdot)$ denotes layer normalization;

in formula (11), $\mathrm{Linear}(\cdot)$ denotes a linear classification function;

$c$ is the number of EGFR status classes;

step 4.8, construct the cross-entropy loss function $L$ with formula (12). Train the second-stage network, formed by the vision Transformer and the linear classifier, with a gradient descent algorithm until $L$ converges. This yields a trained vision Transformer network model that predicts the EGFR mutation type of an image block:

Step 5.2, feed the image blocks $(x_1, x_2, \dots, x_j, \dots, x_m)$ into the model that predicts whether an image block is positive or negative, and predict the positive/negative class of each image block. Discard the negative image blocks to obtain the positive image-block sequence $(x_1, x_2, \dots, x_j, \dots, x_n)$. Set a positive/negative classification threshold $t$, and compute the proportion $t_{pos}$ of positive image blocks with formula (13). Determine the positive/negative class of the whole slide by comparing $t_{pos}$ with the threshold $t$;

step 5.3, perform the next prediction only for whole slides predicted positive in step 5.2. Input the positive image blocks $(x_1, x_2, \dots, x_j, \dots, x_n)$ of the whole slide into the model that predicts the EGFR mutation type of an image block, and predict the EGFR mutation type of each image block. Compute with formula (14) the proportion $egfr_i$ of each EGFR status among the $n$ image blocks. Take the status with the highest proportion as the EGFR status of the whole-slide lung cancer histopathology image. Here $n_i$ is the number of image blocks predicted as the $i$-th EGFR mutation status, and $K$ is the number of EGFR mutation-status classes. In this embodiment, K is 3.

Claims

1. A Transformer-based method for predicting the EGFR status of whole-slide lung cancer histopathology images, characterized in that the method specifically comprises the following steps:

2. The Transformer-based method for predicting the EGFR status of whole-slide lung cancer histopathology images according to claim 1, characterized in that in step (1) the whole-slide lung cancer histopathology dataset is acquired and preprocessed, specifically as follows:

wherein

3. The method of claim 2, wherein the EGFR state prediction method for the lung cancer histopathology full-section based on the converter is as follows: establishing and training a visual converter network model capable of predicting the positivity and negativity of an image block by using the data set in the step (1), wherein the method specifically comprises the following steps:

Wherein the content of the first and second substances,

representing image blocks X _i The jth image block of (1);

z' _l ＝MSA(LN(z _l-1 ))+z _l-1 ,l＝1,…,L (2)

z _l ＝MLP(LN(z′ _l ))+z′ _l ,l＝1,…,L (3)

z′ _L ＝LN(z _L ) (4)

In formula (4), LN(·) denotes the processing of the normalization layer;

In formula (5), Linear(·) denotes a linear classification function;

c denotes the class (negative or positive);

In formula (6), y_label denotes the class label of an image block, and N is the total number of image blocks.
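Formulas (2) to (4), and a linear head with a cross-entropy loss for formulas (5) and (6), can be sketched in NumPy as follows. This is a minimal, untrained sketch of a pre-norm Vision Transformer encoder (the "visual converter"), not the patent's implementation. Assumptions: bias terms and learnable LayerNorm parameters are omitted, the MLP uses a tanh-approximated GELU, the class is read from the class token in row 0, and formula (6) is taken to be the standard mean cross-entropy over N image blocks with label 1 meaning positive. Formulas (5) and (6) themselves are not shown in the source. All function and parameter names are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # LN(.): normalize each token vector (learnable scale/shift omitted)
    mu = x.mean(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)            # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def msa(x, w_qkv, w_out, heads):
    # MSA(.): multi-head self-attention over a token sequence x of shape (T, D)
    t, d = x.shape
    q, k, v = (a.reshape(t, heads, d // heads).transpose(1, 0, 2)
               for a in np.split(x @ w_qkv, 3, axis=-1))
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d // heads))   # (heads, T, T)
    return (attn @ v).transpose(1, 0, 2).reshape(t, d) @ w_out       # concat heads

def mlp(x, w1, w2):
    # MLP(.): two linear layers with a tanh-approximated GELU in between
    h = x @ w1
    h = 0.5 * h * (1 + np.tanh(np.sqrt(2 / np.pi) * (h + 0.044715 * h ** 3)))
    return h @ w2

def encode(z0, layers, heads):
    z = z0
    for p in layers:                                                     # l = 1..L
        z_prime = msa(layer_norm(z), p["w_qkv"], p["w_out"], heads) + z  # formula (2)
        z = mlp(layer_norm(z_prime), p["w1"], p["w2"]) + z_prime         # formula (3)
    return layer_norm(z)                                                 # formula (4)

def head_and_loss(class_tokens, labels, w, b):
    # Linear(.) on class-token outputs (N, D), then mean cross-entropy over N blocks
    probs = softmax(class_tokens @ w + b)
    loss = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
    return probs, loss

# Example with random weights: T = 5 tokens (class token + 4 sub-blocks), D = 16.
rng = np.random.default_rng(0)
D, T, L, H = 16, 5, 2, 4
layers = [{"w_qkv": rng.normal(0, 0.1, (D, 3 * D)), "w_out": rng.normal(0, 0.1, (D, D)),
           "w1": rng.normal(0, 0.1, (D, 4 * D)), "w2": rng.normal(0, 0.1, (4 * D, D))}
          for _ in range(L)]
z_L = encode(rng.normal(size=(T, D)), layers, H)
probs, loss = head_and_loss(z_L[:1], np.array([1]), rng.normal(0, 0.1, (D, 2)), np.zeros(2))
```

The residual additions in `encode` mirror the "+ z_{l-1}" and "+ z'_l" terms of formulas (2) and (3); in training, `loss` would be minimized over all N labelled image blocks.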

4. The converter-based method for predicting the EGFR state of a lung cancer histopathology full section according to claim 3, characterized in that step (3), using the visual converter network model established in step (2) to predict the negative/positive category of the image blocks in the data set, screening out the negative image blocks, and generating an EGFR mutation type data set from the retained positive image blocks, specifically comprises the following steps:

the lung cancer histopathology full-section images are grouped according to their EGFR mutation-state class labels; the blank background regions of each full-section image are removed and the image is divided into blocks; a number of image blocks are obtained by random sampling and fed into the trained visual converter network model capable of predicting whether an image block is negative or positive; the negative image blocks are screened out according to the predictions, leaving n positive image blocks, which form the EGFR mutation type data set, recorded as the set of pairs (X'_i, y'_i), i = 1, 2, …, n;

wherein X'_i ∈ ℝ^(C×P×P) represents the i-th image block, C represents the number of channels of the image block, and P × P represents the width and height of each image block; y'_i denotes the class of image block X'_i, i.e., its EGFR mutation class label in the data set; i = 1, 2, …, n; n denotes the number of image blocks.
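The data-set generation of step (3) can be sketched as a filter over the sampled blocks of each full section. `build_egfr_dataset` and `predict_positive` are hypothetical names, `predict_positive` stands in for the trained stage-1 model, and the rule that a retained block inherits its section's EGFR label is an assumption consistent with y'_i being the EGFR mutation class label.

```python
def build_egfr_dataset(slides, predict_positive):
    """Collect (X'_i, y'_i) pairs for the EGFR mutation type data set.

    slides: iterable of (sampled_blocks, egfr_label) pairs, one per full section,
        where egfr_label is that section's EGFR mutation-state class label.
    predict_positive: hypothetical wrapper around the trained stage-1 model,
        returning True when a block is predicted positive.
    """
    dataset = []
    for sampled_blocks, egfr_label in slides:
        for block in sampled_blocks:
            if predict_positive(block):          # negative blocks are screened out
                # assumption: a retained block inherits its section's EGFR label
                dataset.append((block, egfr_label))
    return dataset

# Example with stand-in "blocks" (plain numbers) and a toy stage-1 rule.
toy = build_egfr_dataset([([1, 5, 7], 0), ([2, 9], 2)], lambda b: b > 4)
```

Because the stage-1 filter drops negative (e.g. non-tumour) blocks before labelling, fewer blocks receive a section-level EGFR label they do not actually reflect.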

5. The converter-based method for predicting the EGFR state of a lung cancer histopathology full section according to claim 4, characterized in that step (4), establishing and training, with the data set generated in step (3), a visual converter network model capable of predicting the EGFR mutation type of an image block, specifically comprises the following steps:

wherein the j-th term represents the j-th sub-block of image block X'_i;

p × p denotes the width and height of each sub-block obtained by dividing image block X'_i, and m = P²/p²;

Step 4.2, set a learnable classification token x'_class, and use formula (7) to obtain the D-dimensional embedded representation z_0 of the m sub-blocks and the classification token x'_class, which serves as the input to the 1st encoder;

$$z'_s = \mathrm{MSA}(\mathrm{LN}(z_{s-1})) + z_{s-1}, \quad s = 1, \dots, S \tag{8}$$

In formula (8), MSA(·) denotes the processing of the multi-head self-attention layer; LN(·) denotes the processing of the normalization layer; z_{s-1} denotes the output of the (s-1)-th encoder;

$$z_s = \mathrm{MLP}(\mathrm{LN}(z'_s)) + z'_s, \quad s = 1, \dots, S \tag{9}$$

step 4.5, output z of the multi-layer perceptron of the s encoder _s The multi-head attention mechanism layer sent to the s +1 th encoder obtains output z' _s+1 Z 'is further prepared' _s+1 The multi-layer sensor which is sent to the (s + 1) th encoder obtains output z _s+1 Repeating the step 4.5 times until the S encoder, and obtaining the output z of the S encoder _S ；

$$z'_S = \mathrm{LN}(z_S) \tag{10}$$

In formula (10), LN(·) denotes the processing of the normalization layer;

In formula (11), Linear(·) denotes a linear classification function;

K represents the number of EGFR state classes;
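The sub-block division and embedding of step 4.2 can be sketched as follows, assuming the standard Vision Transformer embedding: each C × P × P image block is cut into m = P²/p² non-overlapping p × p sub-blocks, each sub-block is flattened and linearly projected to D dimensions, the learnable classification token is prepended, and learnable position embeddings are added. Formula (7) itself is not shown in the source, so the projection matrix `w_e`, the position embeddings `e_pos`, and both function names are assumptions.

```python
import numpy as np

def split_into_patches(block, p):
    # block: (C, P, P) image block X'_i -> (m, C*p*p) with m = P**2 // p**2
    c, h, w = block.shape
    if h != w or h % p != 0:
        raise ValueError("block must be square with side a multiple of p")
    g = h // p                                                       # sub-blocks per side
    patches = block.reshape(c, g, p, g, p).transpose(1, 3, 0, 2, 4)  # (g, g, C, p, p)
    return patches.reshape(g * g, c * p * p)                         # row-major sub-block order

def embed_block(block, p, w_e, x_class, e_pos):
    # w_e: (C*p*p, D) projection (assumed); x_class: (D,) learnable classification token;
    # e_pos: (m + 1, D) position embeddings (assumed). Returns the (m + 1, D) encoder input.
    tokens = split_into_patches(block, p) @ w_e
    return np.vstack([x_class[None, :], tokens]) + e_pos

# Example: a 3 x 8 x 8 block with p = 4 gives m = 4 sub-blocks of length 48.
block = np.arange(3 * 8 * 8, dtype=float).reshape(3, 8, 8)
patches = split_into_patches(block, 4)
z0 = embed_block(block, 4, np.zeros((48, 6)), np.ones(6), np.zeros((5, 6)))
```

The encoder stack of formulas (8) to (10) then processes these m + 1 tokens, and the classification in formula (11) is read from the output at the classification-token position.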

6. The converter-based method for predicting the EGFR state of a lung cancer histopathology full section according to claim 5, characterized in that step (5), completing the prediction of the full-section EGFR state with the visual converter network model capable of predicting whether an image block is negative or positive and the visual converter network model capable of predicting the EGFR mutation type of an image block, established in steps (2) and (4) respectively, specifically comprises the following steps:

Step 5.1, remove the blank background regions from the lung cancer histopathology full-section image and divide it into blocks to obtain a number of image blocks, recorded as the sequence (x₁, x₂, …, x_j, …, x_m);

Step 5.2, feed the image blocks (x₁, x₂, …, x_j, …, x_m) into the visual converter network model capable of predicting whether an image block is negative or positive, predict the negative/positive category of each image block, and screen out the negative image blocks to obtain the positive image block sequence (x₁, x₂, …, x_j, …, x_n); set a negative/positive classification threshold t, calculate the positive image block ratio t_pos = n / m according to formula (13), and determine the negative/positive category of the full section by comparing t_pos with t;

Step 5.3, perform the next prediction on the full sections predicted to be positive in step 5.2: input the positive image blocks (x₁, x₂, …, x_j, …, x_n) of the full section into the visual converter network model capable of predicting the EGFR mutation type of an image block, predict the EGFR mutation type of each image block, calculate the proportion EGFR_i = n_i / n of each EGFR state among the n image blocks according to formula (14), and take the type with the highest proportion as the EGFR state of the lung cancer histopathology full section, where n_i is the number of image blocks predicted as the i-th EGFR mutation state, i = 1, 2, …, K, and K is the number of EGFR mutation-state classes;
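Steps 5.1 to 5.3 chain the two trained models; the flow can be sketched end to end as below. Several details are assumptions not given in the source: the full section is an (H, W, 3) uint8 RGB array, a block counts as blank background when more than half of its pixels have a mean RGB value above 220, a section is positive when t_pos ≥ t, and `is_positive` and `egfr_type` are hypothetical callables wrapping the stage-1 and stage-2 models.

```python
from collections import Counter
import numpy as np

def tile_section(image, P, white_thresh=220.0, max_bg_fraction=0.5):
    # Step 5.1 (assumed rule): cut the image into P x P blocks, dropping near-white ones
    h, w, _ = image.shape
    blocks = []
    for r in range(0, h - P + 1, P):
        for c in range(0, w - P + 1, P):
            block = image[r:r + P, c:c + P]
            if (block.mean(axis=2) > white_thresh).mean() <= max_bg_fraction:
                blocks.append(block)
    return blocks

def predict_section(image, P, is_positive, egfr_type, t, num_classes):
    blocks = tile_section(image, P)                              # (x_1, ..., x_m)
    positives = [b for b in blocks if is_positive(b)]            # step 5.2 screening
    t_pos = len(positives) / len(blocks) if blocks else 0.0      # formula (13): n / m
    if not positives or t_pos < t:                               # assumed: positive iff t_pos >= t
        return t_pos, None                                       # section predicted negative
    counts = Counter(egfr_type(b) for b in positives)            # step 5.3
    ratios = [counts.get(i, 0) / len(positives) for i in range(num_classes)]  # formula (14)
    return t_pos, max(range(num_classes), key=lambda i: ratios[i])

# Toy section: 2 dark blocks, 4 mid-grey blocks, 2 white (background) blocks.
image = np.full((8, 16, 3), 255, np.uint8)
image[:, :4] = 50
image[:, 4:12] = 100
t_pos, state = predict_section(image, 4, lambda b: b.mean() > 75, lambda b: 1, 0.5, 3)
```

In the toy example, 6 blocks survive background removal, 4 of them are "positive", so t_pos is 2/3 and the section is assigned EGFR class 1; with t = 0.9 the same section would be returned as negative (`None`).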