CN114565593B - Full-field digital image classification and detection method based on semi-supervision and attention


Info

Publication number
CN114565593B
Authority
CN
China
Prior art keywords
full-field digital image
attention
classification
Prior art date
Legal status
Active
Application number
CN202210208369.6A
Other languages
Chinese (zh)
Other versions
CN114565593A (en)
Inventor
薛梦凡
陈怡达
贾士绅
江浩东
杨岗
陈明皓
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202210208369.6A
Publication of CN114565593A
Application granted
Publication of CN114565593B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30061 Lung
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30096 Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Medical Informatics (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a full-field digital image classification and detection method based on semi-supervision and attention. A full-field digital image classification and detection framework is constructed that can directly output classification results and visually display the region of interest, assisting the user in accurately judging the image type while rapidly locating the region of interest. Compared with weakly supervised learning methods that use no region-of-interest annotation, the method greatly improves the classification accuracy of full-field digital images and accurately detects the region of interest while requiring region-of-interest annotation for only a small number of full-field digital images, and therefore has high practicability.

Description

Full-field digital image classification and detection method based on semi-supervision and attention
Technical Field
The invention relates to the technical field of full-field digital image processing, and in particular to a full-field digital image classification and region-of-interest detection method based on semi-supervised learning and attention.
Background
A full-field digital image is an ultra-high-resolution image, typically exceeding ten gigapixels, produced by a fully automatic microscope scanner and assembled automatically by computer processing. A single full-field digital image contains a large amount of information, and a professional must spend a great deal of time searching it for regions of interest to annotate. Judging the image type and retrieving the region of interest depend on subjective human opinion and are limited by subjectivity, fatigue, and differences in training; even for the most clearly presented samples it is difficult to obtain consistent results, and critical problems such as false detections and missed detections easily occur.
In recent years, artificial intelligence has gradually been introduced into the field of full-field digital image classification, achieved excellent results, and received unprecedented attention. Convolutional neural networks do not depend on manually defined, selected, and designed feature descriptors; they can automatically mine the deep information of the full-field digital image to extract image features and complete classification, and offer high efficiency, high stability, and strong generalization.
The inventors found that current deep-learning-based full-field digital image classification methods require a professional to annotate the region of interest in each full-field digital image; the region of interest is then extracted and fed into a network for training to complete the classification task. Such methods achieve high accuracy but require a huge dataset of full-field digital images with region-of-interest annotations. Because annotating the region of interest of a full-field digital image costs substantial time and human resources, these methods are largely limited by the inability to construct large-scale full-field digital image datasets. Other researchers classify using datasets without region-of-interest annotation, but because spatial, textural, and other features of the region of interest cannot be effectively extracted, the accuracy of such classification models is low. In addition, both kinds of methods only complete the classification task; the region of interest is not detected, so when judging the image type the user cannot quickly locate the region of interest and still needs a great deal of time.
Therefore, there is a need for a full-field digital image classification and region-of-interest detection method that does not require a large-scale region-of-interest-annotated dataset yet achieves high classification accuracy.
Disclosure of Invention
The invention aims to solve the technical problem that existing deep-learning-based full-field digital image classification methods are limited by the lack of large-scale region-of-interest-annotated datasets, and provides a full-field digital image classification and region-of-interest detection method based on semi-supervised learning and attention that greatly improves classification accuracy while requiring only a small region-of-interest-annotated dataset.
The method comprises the following steps:
Step S1: full-field digital images are collected and preprocessed.
Step S2: the feature extraction network Resnet18 is pre-trained to extract the features of the full-field digital image, specifically comprising the following steps:
step S21: selecting a subset of the full-field digital images and standard control samples, framing the region of interest with an annotation box, and framing the content portion of the standard control samples with an annotation box;
step S22: generating, on the preprocessed full-field digital image, a mask with the same size and position as the region of interest from the region-of-interest annotation box;
step S23: dividing the preprocessed full-field digital image into a plurality of n × n small image blocks with a sliding window, where n is the pixel width and height of a small image block;
step S24: overlapping the mask with the preprocessed full-field digital image, removing the small image blocks at non-overlapping positions, and keeping the small image blocks at overlapping positions;
step S25: feeding the small image blocks kept in step S24 into the Resnet18 network for training, and saving and outputting the trained network structure and its parameters. A sketch of this masked tiling and pre-training stage is given below.
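The following is a minimal sketch of steps S21-S25, assuming OpenSlide for reading the full-field digital image and a binary ROI mask rasterized from the annotation boxes at level-0 resolution; the names wsi_path and roi_mask, the 0.5 overlap threshold, and num_classes are illustrative, not taken from the patent.

```python
import numpy as np
import openslide
import torch.nn as nn
from torchvision import models

def masked_tiles(wsi_path, roi_mask, n=256, min_overlap=0.5):
    """Yield n x n tiles whose footprint overlaps the ROI mask (steps S23-S24)."""
    slide = openslide.OpenSlide(wsi_path)
    W, H = slide.dimensions
    for y in range(0, H - n + 1, n):              # sliding window with stride n
        for x in range(0, W - n + 1, n):
            if roi_mask[y:y + n, x:x + n].mean() >= min_overlap:
                tile = slide.read_region((x, y), 0, (n, n)).convert("RGB")
                yield np.asarray(tile)            # kept tile at an overlapping position

# Step S25: fine-tune Resnet18 on the kept tiles; the slide's class serves as the tile label.
num_classes = 2                                   # illustrative
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_classes)
```

In practice the ROI mask is often built at a lower magnification, in which case tile coordinates are scaled to the mask level before indexing.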
Step S3: all full-field digital images are sent to the Resnet18 network pre-trained in the previous step to extract features, and the specific steps are as follows:
step S31: and (3) automatically dividing all the full-view digital images by using opencv, filtering blank background and artificially formed holes, dividing the blank background and the artificially formed holes into n multiplied by n small image blocks, and storing the coordinates of each image block.
Step S32: the small image blocks are fed into a pre-trained Resnet18 network and converted into 512-dimensional feature vectors h at a fourth residual block k I.e. eachFeatures extracted from the small image blocks.
Step S4: and (3) sending the features extracted in the step (S3) to a depth gating channel attention module, comprehensively generating Slide-level features, and classifying the full-field digital image through a classification layer. The method comprises the following specific steps:
step S41: the feature vector h_k is fed into the deep gated channel attention module to obtain the attention score a_{k,n} of each small image block:

$$a_{k,n} = \frac{\exp\left\{ P_{a,n}\left( \tanh\left(W(V(h_k))\right) \odot \sigma\left(L(J(G(h_k)))\right) \right) \right\}}{\sum_{j=1}^{N} \exp\left\{ P_{a,n}\left( \tanh\left(W(V(h_j))\right) \odot \sigma\left(L(J(G(h_j)))\right) \right) \right\}}$$

where a_{k,n} represents the attention score of the k-th small image block for the n-th class, P_{a,n} represents the linear layer belonging to the n-th class, σ(·) represents the sigmoid activation function, tanh(·) represents the tanh activation function, V(·), W(·), G(·), J(·), L(·) respectively represent different linear layers, ⊙ denotes element-wise multiplication, and N is the total number of image blocks;
step S42: the slide-level feature h_{slide,n} is synthesized from the feature vector and attention score of each small image block:

$$h_{slide,n} = \sum_{k=1}^{N} a_{k,n}\, h_k$$

where h_{slide,n} represents the feature of each full-field digital image for the n-th class;
step S43: the slide-level feature vector h_{slide,n} is fed into the classification layer of the corresponding class to obtain the classification result, realizing the classification of the full-field digital image. A sketch of steps S41-S43 is given below.
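A minimal PyTorch sketch of steps S41-S43, assuming the branch composition tanh(W(V(h))) ⊙ σ(L(J(G(h)))), a hidden width of 256, and one linear classifier head per class; the patent names the layers V, W, G, J, L and P_{a,n} but their exact arrangement here is an assumption.

```python
import torch
import torch.nn as nn

class DeepGatedChannelAttention(nn.Module):
    def __init__(self, dim=512, hidden=256, num_classes=2):
        super().__init__()
        self.V, self.W = nn.Linear(dim, hidden), nn.Linear(hidden, hidden)
        self.G, self.J = nn.Linear(dim, hidden), nn.Linear(hidden, hidden)
        self.L = nn.Linear(hidden, hidden)
        self.P_a = nn.Linear(hidden, num_classes)        # per-class attention scorer (S41)
        self.heads = nn.ModuleList(nn.Linear(dim, 1) for _ in range(num_classes))  # S43

    def forward(self, h):                                # h: (K, 512) tile features h_k
        gate = torch.tanh(self.W(self.V(h))) * torch.sigmoid(self.L(self.J(self.G(h))))
        a = torch.softmax(self.P_a(gate), dim=0)         # (K, num_classes) scores a_{k,n}
        h_slide = a.t() @ h                              # (num_classes, 512): h_slide,n (S42)
        logits = torch.cat([head(h_slide[n]) for n, head in enumerate(self.heads)])
        return logits, a                                 # per-class logits, scores for the heat map
```

Because the feature extractor is frozen after step S2, only this attention module and the classification heads need retraining when the attention design is changed, which is the adaptability noted in the beneficial effects below.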
Step S5: the attention scores of all small image blocks for the class predicted by the model, generated in step S4, are extracted; matplotlib is used to generate color patches of the corresponding colors, which are overlaid at the corresponding positions on the original full-field digital image with a certain transparency; after blurring and smoothing operations, the detection heat map of the region of interest is obtained. A sketch of this overlay is given below.
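A minimal sketch of the heat-map overlay of step S5, assuming coords holds the tile coordinates saved in step S31 and scores the attention scores of the predicted class; the colormap, blur radius, and the 0.5 alpha are illustrative choices (the claims allow a transparency of roughly 0.4 to 0.7).

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage import gaussian_filter

def roi_heatmap(slide_rgb, coords, scores, n=256, alpha=0.5):
    """Overlay per-tile attention scores on the slide image (same resolution as coords)."""
    H, W = slide_rgb.shape[:2]
    heat = np.zeros((H, W), dtype=np.float32)
    for (x, y), s in zip(coords, scores):         # paint each tile with its score
        heat[y:y + n, x:x + n] = s
    heat = gaussian_filter(heat, sigma=n / 2)     # blur/smooth the block pattern
    plt.imshow(slide_rgb)
    plt.imshow(heat, cmap="jet", alpha=alpha)     # colored patches at fixed transparency
    plt.axis("off")
    plt.savefig("roi_heatmap.png", bbox_inches="tight")
```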
Preferably, the preprocessing performs color normalization on the collected full-field digital images according to the input image template, as sketched below.
Preferably, the transparency is 0.4 to 0.6.
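A minimal sketch of one possible color normalization against an input template; the patent specifies normalization to a template but not the algorithm, so Reinhard-style matching of LAB channel statistics is an assumption here.

```python
import cv2
import numpy as np

def reinhard_normalize(img_bgr, template_bgr):
    """Match the per-channel LAB mean/std of img to those of the template image."""
    img = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    tpl = cv2.cvtColor(template_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    for c in range(3):
        m_i, s_i = img[..., c].mean(), img[..., c].std()
        m_t, s_t = tpl[..., c].mean(), tpl[..., c].std()
        img[..., c] = (img[..., c] - m_i) / (s_i + 1e-8) * s_t + m_t
    out = np.clip(img, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)
```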
Compared with the prior art, the invention has the beneficial effects that:
(1) The method can be generalized to a variety of full-field digital image classification and region-of-interest detection tasks, and thus has universality.
(2) The full-field digital image classification and region-of-interest detection method based on semi-supervised learning and attention provided by the invention uses only a small number of region-of-interest-annotated full-field digital images when training the feature extraction model, and uses full-field digital images without region-of-interest annotation when training the classification network; it greatly improves the accuracy of the classification network on the full-field digital image classification task while reducing dataset preparation work, combining simplicity with high accuracy.
(3) The invention separates the feature extraction module from the classification module, so the attention module between them can be freely added or replaced, giving strong adaptability. After such a change, the whole network does not need retraining; only the newly added attention module and the classification layer need to be retrained, greatly shortening training time.
(4) The deep gated channel attention network provided by the invention can capture channel information; it uses deeper attention branches to strengthen the discrimination of the attention scores layer by layer, making the attention scores of the small image blocks more robust and accurate. It effectively improves the accuracy of full-field digital image classification, is easy to implement, and is highly practical.
(5) The invention constructs a full-field digital image classification and detection framework that can directly output classification results and visually display the region of interest, assisting the user in accurately judging the image type while quickly locating the region of interest.
Drawings
FIG. 1 is a flow chart of a full-field digital image classification and region of interest detection method based on semi-supervised learning and attention
FIG. 2 is a flow chart of a pre-training feature extraction network of the present invention
Detailed Description
The invention will be described in further detail with reference to specific examples and figures.
As shown in fig. 1, this example addresses the classification and detection of lung adenocarcinoma and lung squamous carcinoma; the collected data comprise 1724 lung adenocarcinoma samples, 1707 lung squamous carcinoma samples, and 30 normal tissue samples. Full-field digital pathology images with lesion-area annotations account for only 1.75% of all samples. The feature extraction network uses Resnet18. The lung cancer pathological image classification and lesion detection based on semi-supervised learning and attention comprises the following steps:
Step S1: collect 3431 full-field digital pathological images of lung adenocarcinoma and lung squamous carcinoma in total, plus 30 normal tissue samples; read all pathological image information and perform color normalization on all pathological images to eliminate color differences caused by differing stain proportions, staining, and scanning factors.
Step S2: the pre-training feature extraction network Resnet18 is used for extracting features of all lung cancer pathological images, as shown in fig. 2, and specifically comprises the following steps:
step S21: selecting 30 lung adenocarcinoma, lung squamous carcinoma and normal tissue samples, framing the cancerous tissue samples by a specialized pathologist on a focus area, and framing the normal tissue samples on a tissue area by a marking frame.
Step S22: and generating a mask with the same size as the original pathological size through a calibration frame marked by a doctor.
Step S23: the pathological section is segmented into a plurality of 256×256 small image blocks by using a sliding window.
Step S24: overlapping the mask with the original pathological image, removing small image blocks at non-overlapping positions, and storing the small image blocks at overlapping positions.
Step S25: and (3) sending the small image blocks stored in the step S24 into a Resnet18 network for training, and storing and outputting the trained network structure and parameters thereof.
Step S3: all lung adenocarcinoma and lung squamous carcinoma full-field digital pathological images are sent to the Resnet18 network pre-trained in the previous step to extract features, and the specific steps are as follows:
step S31: automatically segment the pathological images of all cancerous samples with opencv, filter out the background and artificially formed cavities, and keep only the tissue portions of the pathological images; segment the tissue portions into 256 × 256 small image blocks and save the image blocks together with their coordinates. A sketch of this tissue filtering is given below.
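A minimal sketch of the opencv tissue filtering in step S31, assuming Otsu thresholding on the saturation channel and a 0.25 tissue-fraction threshold per block; the patent names opencv but not the exact segmentation recipe.

```python
import cv2
import numpy as np

def tissue_tile_coords(slide_rgb, n=256, min_tissue=0.25):
    """Return top-left coordinates of n x n blocks that contain tissue."""
    hsv = cv2.cvtColor(slide_rgb, cv2.COLOR_RGB2HSV)
    sat = hsv[..., 1]
    # Otsu threshold: tissue is saturated, blank background and cavities are not
    _, mask = cv2.threshold(sat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    coords = []
    H, W = mask.shape
    for y in range(0, H - n + 1, n):
        for x in range(0, W - n + 1, n):
            if (mask[y:y + n, x:x + n] > 0).mean() >= min_tissue:
                coords.append((x, y))
    return coords
```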
Step S32: the small image blocks are fed into a pre-trained Resnet18 network and converted into 512-dimensional feature vectors h at a fourth residual block k I.e. the features extracted per small image block.
Step S4: and (3) sending the features extracted in the step (S3) into a depth-gating channel attention module, comprehensively generating Slide-level features, and classifying lung cancer pathological images through a classification layer. The method comprises the following specific steps:
step S41: the feature vector h k Sending the attention score into a depth gating channel attention module to obtain an attention score a corresponding to each small image block k,n
Wherein a is k,n Representing the attention score, P, of the kth small image block belonging to the nth class a,n Represents a linear layer belonging to the nth class, sigma (·) represents a sigmoid activation function, tanh (·) represents a tanh activation function, V (·),
W (·), G (·), J (·), L (·) represent different linear layers, respectively, N being the total number of image blocks.
Step S42: comprehensively generating a Slide-level feature h by the feature vector and the attention score corresponding to each small image block slide,n
Step S43: feature vector h of Slide level slide,n Entering classification layers of corresponding categoriesAnd obtaining a classification result, and realizing classification of lung cancer pathological images.
Step S5: and (3) extracting attention scores of all the small image blocks generated in the step (4) corresponding to the model prediction class, generating color blocks with corresponding colors by using matplotlib, covering the corresponding positions on the original full-view digital image with transparency of 0.5, and obtaining a detection heat map of the region of interest after fuzzy and smoothing operations.
The above embodiments do not limit the present invention; the invention is not restricted to them, and any implementation that meets the requirements of the present invention falls within its scope.
What is not described in detail in the present specification belongs to the prior art known to those skilled in the art.

Claims (3)

1. A full-field digital image classification and detection method based on semi-supervision and attention, characterized by comprising the following steps:
step S1: collecting full-field digital images and preprocessing them;
step S2: pre-training the feature extraction network Resnet18 to extract the features of the full-field digital image, specifically comprising:
step S21: selecting a subset of the full-field digital images and standard control samples, framing the region of interest with an annotation box, and framing the content portion of the standard control samples with an annotation box;
step S22: generating, on the preprocessed full-field digital image, a mask with the same size and position as the region of interest from the region-of-interest annotation box;
step S23: dividing the preprocessed full-field digital image into a plurality of n × n small image blocks with a sliding window, wherein n is the pixel width and height of a small image block;
step S24: overlapping the mask with the preprocessed full-field digital image, removing the small image blocks at non-overlapping positions, and keeping the small image blocks at overlapping positions;
step S25: feeding the small image blocks kept in step S24 into the Resnet18 network for training, and saving and outputting the trained network structure and its parameters;
step S3: feeding all full-field digital images into the Resnet18 network pre-trained in the previous step to extract features, specifically comprising:
step S31: automatically segmenting all full-field digital images with opencv, filtering out blank background and artificially formed holes, dividing the remaining content into n × n small image blocks, and saving the coordinates of each image block;
step S32: feeding the small image blocks into the pre-trained Resnet18 network, where each block is converted at the fourth residual block into a 512-dimensional feature vector h_k, i.e., the feature extracted from that small image block;
step S4: feeding the features h_k extracted in step S3 into the deep gated channel attention module, synthesizing the slide-level feature, and classifying the full-field digital image through the classification layer, specifically comprising:
step S41: feeding the feature vector h_k into the deep gated channel attention module to obtain the attention score a_{k,n} of each small image block:

$$a_{k,n} = \frac{\exp\left\{ P_{a,n}\left( \tanh\left(W(V(h_k))\right) \odot \sigma\left(L(J(G(h_k)))\right) \right) \right\}}{\sum_{j=1}^{N} \exp\left\{ P_{a,n}\left( \tanh\left(W(V(h_j))\right) \odot \sigma\left(L(J(G(h_j)))\right) \right) \right\}}$$

wherein a_{k,n} represents the attention score of the k-th small image block for the n-th class, P_{a,n} represents the linear layer belonging to the n-th class, σ(·) represents the sigmoid activation function, tanh(·) represents the tanh activation function, V(·), W(·), G(·), J(·), L(·) respectively represent different linear layers, ⊙ denotes element-wise multiplication, and N is the total number of image blocks;
step S42: generating the slide-level feature vector h_{slide,n} by combining the feature vector and attention score of each small image block:

$$h_{slide,n} = \sum_{k=1}^{N} a_{k,n}\, h_k$$

wherein h_{slide,n} represents the feature of each full-field digital image for the n-th class;
step S43: feeding the slide-level feature vector h_{slide,n} into the classification layer of the corresponding class to obtain the classification result, realizing the classification of the full-field digital image;
step S5: extracting the attention scores of all small image blocks for the class predicted by the model, generated in step S4, using matplotlib to generate color patches of the corresponding colors, overlaying them at the corresponding positions on the original full-field digital image with a certain transparency, and obtaining the detection heat map of the region of interest after blurring and smoothing operations.
2. The semi-supervised and attention based full field digital image classification and detection method as claimed in claim 1, wherein: the preprocessing is to perform color normalization on the collected full-field digital image according to an input image template.
3. The semi-supervised and attention based full field digital image classification and detection method as claimed in claim 1, wherein: the transparency is 0.4-0.7.
CN202210208369.6A 2022-03-04 2022-03-04 Full-field digital image classification and detection method based on semi-supervision and attention Active CN114565593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210208369.6A CN114565593B (en) 2022-03-04 2022-03-04 Full-field digital image classification and detection method based on semi-supervision and attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210208369.6A CN114565593B (en) 2022-03-04 2022-03-04 Full-field digital image classification and detection method based on semi-supervision and attention

Publications (2)

Publication Number Publication Date
CN114565593A CN114565593A (en) 2022-05-31
CN114565593B (granted) 2024-04-02

Family

ID=81717968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210208369.6A Active CN114565593B (en) 2022-03-04 2022-03-04 Full-field digital image classification and detection method based on semi-supervision and attention

Country Status (1)

Country Link
CN (1) CN114565593B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082743B (en) * 2022-08-16 2022-12-06 之江实验室 Full-field digital pathological image classification system considering tumor microenvironment and construction method


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329867A (en) * 2020-11-10 2021-02-05 宁波大学 MRI image classification method based on task-driven hierarchical attention network
CN112529042A (en) * 2020-11-18 2021-03-19 南京航空航天大学 Medical image classification method based on dual-attention multi-instance deep learning

Also Published As

Publication number Publication date
CN114565593A (en) 2022-05-31

Similar Documents

Publication Publication Date Title
US11681418B2 (en) Multi-sample whole slide image processing in digital pathology via multi-resolution registration and machine learning
US11615559B2 (en) Methods and systems for human imperceptible computerized color transfer
Li et al. Example-based image colorization using locality consistent sparse representation
CN111340824B (en) Image feature segmentation method based on data mining
Kainz et al. Semantic segmentation of colon glands with deep convolutional neural networks and total variation segmentation
CN104408449B (en) Intelligent mobile terminal scene literal processing method
CN106023151B (en) Tongue object detection method under a kind of open environment
CN112241762B (en) Fine-grained identification method for pest and disease damage image classification
CN106934386B (en) A kind of natural scene character detecting method and system based on from heuristic strategies
CN110838100A (en) Colonoscope pathological section screening and segmenting system based on sliding window
CN105825216A (en) Method of locating text in complex background image
CN111401426A (en) Small sample hyperspectral image classification method based on pseudo label learning
CN110929746A (en) Electronic file title positioning, extracting and classifying method based on deep neural network
CN110992374A (en) Hair refined segmentation method and system based on deep learning
CN114565593B (en) Full-field digital image classification and detection method based on semi-supervision and attention
CN115810191A (en) Pathological cell classification method based on multi-attention fusion and high-precision segmentation network
CN111210447B (en) Hematoxylin-eosin staining pathological image hierarchical segmentation method and terminal
Fueten et al. An artificial neural net assisted approach to editing edges in petrographic images collected with the rotating polarizer stage
CN115410258A (en) Human face expression recognition method based on attention image
CN114782948A (en) Global interpretation method and system for cervical liquid-based cytology smear
CN109886325B (en) Template selection and accelerated matching method for nonlinear color space classification
CN116311403A (en) Finger vein recognition method of lightweight convolutional neural network based on FECAGhostNet
Chandraprabha et al. Texture feature extraction for batik images using glcm and glrlm with neural network classification
CN112966774B (en) Picture Bert-based tissue pathology picture classification method
Tang et al. Ethnic costume grayscale image coloring method with improved Pix2Pix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant