CN111985536B - Gastroscopic pathology image classification method based on weakly supervised learning - Google Patents
- Publication number: CN111985536B (application CN202010690425.5A)
- Authority: CN (China)
- Prior art keywords: image, feature extraction, gastroscope, pathological, network model
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/24323—Pattern recognition; classification techniques; tree-organised classifiers
- G06N3/045—Neural networks; architecture; combinations of networks
- G06N3/08—Neural networks; learning methods
- G06V10/267—Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners; connectivity analysis
- G06V10/56—Extraction of image or video features relating to colour
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06T5/30—Image enhancement or restoration; erosion or dilatation, e.g. thinning
- G06T5/73—Deblurring; Sharpening
Abstract
The invention provides a gastroscopic pathology image classification method based on weakly supervised learning. Because annotating digital gastroscopic pathology images is costly and large labeled datasets are hard to obtain, the invention uses easily acquired coarse-grained labels to construct a large-scale gastroscopic pathology image dataset and trains a weakly supervised network model on it. Features of a gastroscopic pathology image are extracted with the trained weakly supervised model, the global and local qualitative features of the image are then fused, and a random forest classifier finally performs the negative/positive classification of the whole image. The method can provide pathology data screening information for pathologists, assist their diagnostic work, and improve their efficiency.
Description
Technical Field
The invention relates to a gastroscopic pathology image classification method based on weakly supervised learning, and belongs to the technical field of computer-aided diagnosis of medical images.
Background
A whole-slide image (Whole Slide Image, WSI) is a full-field digital image produced by scanning a conventional glass slide at high precision on an automated microscope scanning platform and stitching the scans together seamlessly. The resulting digital pathology image is extremely large: an electronic slide at 40x magnification is usually composed of billions of pixels. A doctor must therefore spend a great deal of time examining the digital pathology image, searching for tiny cancer cells among billions of pixels, which is an enormous workload.
Deep Learning (DL) has been a highly popular machine learning approach in recent years. Convolutional neural networks (Convolutional Neural Network, CNN) in particular have been widely applied in medical imaging and have made great progress in target classification and identification, localization and detection, and segmentation of tissues, organs and lesions, providing scientific methods and advanced techniques for the screening, diagnosis, treatment planning, efficacy evaluation and follow-up of serious diseases in clinical medicine. Deep learning in medicine currently relies mainly on supervised training followed by testing and deployment, and has achieved great success, but it depends heavily on data resources and requires large numbers of labeled samples to be effective. For digital pathology images, however, acquiring a large labeled dataset is very difficult: regions of interest must be found and annotated on 40x-magnified images, and to support effective model learning the dataset must cover a broad range of cases, which makes the time and labor costs very high.
Weakly supervised learning (Weakly Supervised Learning) is a machine learning approach to the problem of insufficient labeled samples, and can generally be divided into three categories:
First category, incomplete supervision (incomplete supervision): only a portion of the training data has labels.
Second category, inexact supervision (inexact supervision): the training data carry only coarse-grained labels.
Third category, inaccurate supervision (inaccurate supervision): the labels of the training data are not always correct.
Under inexact supervision, a deep learning model can learn effective features from a sufficiently large dataset using only image-level coarse-grained labels as training targets, which preserves the model's broad applicability. Gastric cancer pathology images exhibit distinctive characteristics, with pronounced glandular or cellular atypia, so their feature information differs markedly from that of negative pathology images.
Disclosure of Invention
The purpose of the invention is to exploit the sensitivity of weakly supervised learning to such distinctive information and, through a reasonable algorithm and model design, solve the problem of insufficient labeled samples and realize automatic classification and discrimination of gastroscopic digital pathology images.
To this end, the technical scheme of the invention provides a gastroscopic pathology image classification method based on weakly supervised learning, characterized by comprising the following steps:
step 1, obtaining pathological images and constructing a database
Collect digital gastroscopic biopsy pathology images, clean the collected data, and have clinical specialists apply coarse-grained labels that only concern the benign/malignant classification of each image, forming a gastroscopic pathology image database;
Step 2, obtain from the gastroscopic pathology image database a number of gastroscopic biopsy pathology images, together with their corresponding labeling results, for training the feature extraction network model, then proceed to step 3;
step 3, image preprocessing:
Extract the tissue portion of each gastroscopic biopsy pathology image, remove invalid areas, cut the extracted images into small patches, and apply color standardization and data enhancement to the cut image patches;
Step 4, after the gastroscopic biopsy pathology images obtained in step 2 pass through the image preprocessing of step 3, a training dataset is obtained consisting of a group of multi-instance bags with classification labels. Each gastroscopic biopsy pathology image is regarded as one multi-instance bag; each bag contains a number of instances without classification labels, and each instance is an image patch obtained in step 3. If a bag contains at least one positive instance, it is labeled as a positive bag; if all of its instances are negative, it is labeled as a negative bag;
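The bag-labeling rule of step 4 can be sketched as a one-line predicate (a minimal illustration; the function name is ours, and note that during training the individual instance labels are unknown, only the slide-level bag label is given):

```python
def bag_label(instance_labels):
    """Multi-instance rule: a bag is positive (1) if it contains at
    least one positive instance, otherwise negative (0)."""
    return int(any(label == 1 for label in instance_labels))
```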
Step 5, obtaining a feature extraction network model based on weakly supervised learning:
Construct the feature extraction network architecture, design the corresponding feature outputs, train the model, and obtain a trained feature extraction network, wherein:
The feature extraction network adopts a multiple-instance learning algorithm. It learns the mapping between the instances in a bag and the bag's label in the training dataset: the image data pass through an encoder and several feature extraction convolution layers to produce the final output. The network outputs both a feature vector and a probability value for each image patch:
Feature vector information: after an image patch passes through the feature extraction module of the network, its dimensionality is reduced, its features are extracted automatically, and a one-dimensional feature vector is output;
Probability value: the patch feature vector obtained above is compressed by a fully connected layer into a vector of length two, whose first element represents the weight for the patch being negative and whose second element the weight for it being positive. A normalized exponential function finally maps the two element values into the (0, 1) interval, yielding the output probabilities that the patch is negative or positive;
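The normalized exponential function named above is the softmax; a minimal NumPy sketch of the length-two mapping (function name ours):

```python
import numpy as np

def patch_probabilities(logits):
    """Map a length-2 weight vector to (negative, positive) probabilities
    in (0, 1) with the normalized exponential (softmax) function."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())   # subtract the max for numerical stability
    return e / e.sum()
```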
Step 6, after a gastroscopic biopsy pathology image is input in real time, preprocess it with step 3, feed the resulting multi-instance bag to the feature extraction network constructed and trained in step 5, and have the network output the corresponding feature vector information and probability values;
step 7, feature extraction and feature fusion, comprising the following steps:
step 701, first feature extraction:
Acquire a heatmap of the full-field gastroscopic biopsy pathology image, extract suspicious tissue regions from the heatmap, and then extract the feature information of those suspicious regions as the first feature, specifically:
Step 7011, generate a heatmap: take the set of probability values output by the feature extraction network and stitch them together by reversing the overlapping patch-cutting process, obtaining a heatmap of the whole gastroscopic biopsy pathology image;
Step 7012, determine a mask of the suspicious lesion area: set every heatmap pixel greater than or equal to a preset threshold I to 1 and every pixel below the threshold to 0, obtaining a mask of the suspicious region;
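Step 7012 amounts to a single NumPy comparison (a minimal sketch; the function name is ours and the threshold I is a preset hyperparameter whose value the text does not fix):

```python
import numpy as np

def lesion_mask(heatmap, threshold=0.5):
    """Binarize the whole-slide probability heatmap: pixels >= threshold
    become 1 (suspicious), all others become 0."""
    return (np.asarray(heatmap) >= threshold).astype(np.uint8)
```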
Step 7013, suspicious tissue region extraction: select the lowest-resolution level of the gastroscopic biopsy pathology image, convert it to a gray image, and extract the suspicious-region pathology image using the suspicious-region mask, where the mask is first resampled to the size of the lowest-resolution image with a nearest-neighbor interpolation algorithm;
Step 7014, extract the suspicious tissue region features;
step 702, second feature extraction:
Using the probability values of the image patches in each multi-instance bag, select the N patches with the highest probability values, and retrieve by index the feature vector information output for those patches by the feature extraction network, as the second feature;
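The top-N selection and index-based feature lookup can be sketched with an argsort (function name ours; tie-breaking order is unspecified in the text):

```python
import numpy as np

def top_n_patches(probs, features, n):
    """Select the n patches with the highest positive probability and
    gather their feature vectors by index."""
    idx = np.argsort(np.asarray(probs))[::-1][:n]   # n largest, descending
    return idx, np.asarray(features)[idx]
```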
step 703, third feature extraction:
Using the probability values of the patches in each multi-instance bag, select the N patches with the highest probability values and extract their cell features as the third feature, as follows:
Step 7031, cell counting: convert each selected patch to a gray image and then, with an appropriate threshold, to a binary image in which the target area has pixel value 1 and the background 0, serving as a valid-area mask. Apply mathematical-morphology operations to this binary image: a morphological opening (erosion followed by dilation) removes fine impurity regions, and a watershed algorithm separates touching cells, giving a discrete cell mask in which the cells are counted;
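The opening-then-counting part of step 7031 can be sketched with scipy.ndimage (function name ours; the watershed step for separating touching cells, available for example as skimage.segmentation.watershed, is omitted from this minimal sketch):

```python
import numpy as np
from scipy import ndimage

def count_cells(binary_mask):
    """Count discrete cell regions in a binary mask: a morphological
    opening (erosion then dilation) removes small impurity specks,
    then connected components are counted."""
    opened = ndimage.binary_opening(binary_mask, structure=np.ones((3, 3)))
    _, n_cells = ndimage.label(opened)
    return n_cells
```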
Step 7032, cell perimeter and area: approximating each cell outline as an ellipse, count the number of pixels of each cell in the discrete cell mask obtained above as the cell area, measure the cell's longest and shortest axes, and compute the outline perimeter with an ellipse perimeter formula;
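The patent names no specific ellipse perimeter formula; Ramanujan's approximation is one common choice and is sketched here (function name ours):

```python
import math

def ellipse_perimeter(a, b):
    """Approximate ellipse perimeter for semi-axes a, b (half of the
    cell's longest and shortest axes) using Ramanujan's formula."""
    h = ((a - b) / (a + b)) ** 2
    return math.pi * (a + b) * (1 + 3 * h / (10 + math.sqrt(4 - 3 * h)))
```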
Step 7033, cell texture feature statistics: convert each selected patch to a gray image, locate the cell regions on it with the discrete cell mask, and within the current cell region extract texture features with a gray-level co-occurrence matrix (GLCM). The common texture features defined on the GLCM are:
Angular second moment (ASM): corresponds to texture uniformity; the smaller the ASM value, the more uniform the staining presented by the cell nuclei;
Entropy (ENT): measures the amount of texture information in the image; texture richness and the entropy value are positively correlated;
Inverse difference moment (IDM): describes the amount of local variation in the cell texture; the larger the value, the smaller the variation between different texture regions, i.e. the more locally uniform the texture;
Contrast (CON): reflects the clarity of the cell image and the depth of the texture grooves; the larger the CON value, the more high-contrast pixel pairs there are;
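The four GLCM statistics above can be sketched in pure NumPy for a horizontal neighbour at distance 1 (function name and the single-offset choice are ours; a tested implementation is available as skimage.feature.graycomatrix / graycoprops):

```python
import numpy as np

def glcm_features(gray):
    """Build a gray-level co-occurrence matrix for horizontal neighbours
    at distance 1 and compute ASM, entropy (ENT), inverse difference
    moment (IDM) and contrast (CON)."""
    g = np.asarray(gray)
    levels = int(g.max()) + 1
    glcm = np.zeros((levels, levels), dtype=float)
    for a, b in zip(g[:, :-1].ravel(), g[:, 1:].ravel()):
        glcm[a, b] += 1
    glcm /= glcm.sum()                       # joint probabilities
    i, j = np.indices(glcm.shape)
    asm = (glcm ** 2).sum()                  # texture uniformity
    nz = glcm[glcm > 0]
    ent = -(nz * np.log2(nz)).sum()          # texture richness
    idm = (glcm / (1.0 + (i - j) ** 2)).sum()  # local homogeneity
    con = (glcm * (i - j) ** 2).sum()        # contrast
    return asm, ent, idm, con
```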
Step 7034, feature fusion:
For each selected patch, concatenate the second and third features extracted from the patch with the first feature extracted from the whole pathology image, row-wise, to obtain the patch's feature vector; then stack the feature vectors of the N patches column-wise to obtain the feature matrix of the current gastroscopic biopsy pathology image. Normalize the feature matrix, mapping features of different attributes into the same distribution space so that they carry the same initial weight; finally average the normalized matrix over its columns, compressing it into a one-dimensional feature vector;
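The normalize-then-average fusion of step 7034 can be sketched as follows (a hedged illustration: the patent does not name the normalization, so min-max per column is our assumption, and the function name is ours):

```python
import numpy as np

def fuse_features(feature_matrix):
    """Min-max normalize each feature column to [0, 1] so features of
    different attributes start with equal weight, then average over the
    N patch rows to obtain one slide-level feature vector."""
    m = np.asarray(feature_matrix, dtype=float)
    lo, hi = m.min(axis=0), m.max(axis=0)
    span = hi - lo
    span[span == 0] = 1.0     # constant columns become zero after shifting
    return ((m - lo) / span).mean(axis=0)
```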
Step 8, discrimination and classification of gastroscopic pathology images:
Input the one-dimensional feature vector obtained in step 7034 into a pre-trained random forest classifier to obtain the benign/malignant classification of the current gastroscopic biopsy pathology image.
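The slide-level classifier of step 8 can be sketched with scikit-learn's RandomForestClassifier (a hedged illustration: the tree count, random_state, and the toy feature dimensionality below are our choices, not values specified by the patent):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_slide_classifier(slide_vectors, slide_labels):
    """Fit a random forest on one fused feature vector per slide
    (labels: 0 = negative/benign, 1 = positive/malignant)."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(slide_vectors, slide_labels)
    return clf
```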
Preferably, step 1 includes the following steps:
Step 101, data acquisition and screening:
Determine a unified staining protocol, collect digital gastroscopic biopsy pathology images, screen the collected images, and discard pathology data with defective images or inaccurate information;
Step 102, data desensitization:
Desensitize every acquired gastroscopic biopsy pathology image;
Step 103, data labeling:
A professional pathologist applies image-level coarse-grained labels to the gastroscopic biopsy pathology images screened in step 101 and desensitized in step 102. The coarse-grained labels only concern the benign/malignant classification of the images; no pixel-level segmentation labeling of lesion regions is required. Images judged positive are labeled '1' and images judged negative are labeled '0'; the original gastroscopic biopsy pathology images together with their labeling results form the gastroscopic digital pathology image database.
Preferably, step 3 includes the following steps:
Step 301, background and invalid region removal:
Extract the tissue regions of each gastroscopic biopsy pathology image with Otsu's method and quality control rules, and filter out invalid tissue regions;
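Otsu's method used in step 301 picks the gray level that maximizes the between-class variance of background and foreground; a minimal NumPy sketch for 8-bit images (function name ours; libraries such as OpenCV and scikit-image ship tested implementations):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level that maximizes between-class variance,
    used to separate tissue from the bright slide background."""
    hist = np.bincount(np.asarray(gray).ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (levels[:t] * prob[:t]).sum() / w0
        mu1 = (levels[t:] * prob[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t
```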
Step 302, image cutting:
For the gastroscopic biopsy pathology image processed in step 301, cut the pathology image into fixed-size patches of the same size with overlap, the patch size matching the input image size of the weakly supervised feature extraction network;
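The overlapping cut of step 302 can be sketched as a strided sliding window (function name ours; edge remainders smaller than the patch size are simply dropped in this sketch, padding being another common choice):

```python
import numpy as np

def tile_image(image, patch_size, stride):
    """Cut a slide-level image into fixed-size patches; overlap occurs
    whenever stride < patch_size."""
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return patches
```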
Step 303, color standardization:
Map the image patches of valid tissue into the same color gamut space with a color standardization algorithm, ensuring that the pixel value distributions of patches with the same tissue structure follow the same normal distribution, and eliminating the potential influence of inconsistent gastroscopic pathology imaging caused by uncontrollable differences;
Step 304, image enhancement:
Apply random image enhancement: on the basis of spatial translation invariance, perform random rotation, random translation, random mirroring and random distortion on the color-standardized patches, simulating gastroscopic pathology images under different fields of view and ensuring that the model fully learns and extracts the patch features.
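A minimal sketch of the spatial augmentation in step 304 (function name ours; random translation and distortion from the text are omitted, and any augmentation library would do the same job):

```python
import random
import numpy as np

def augment_patch(patch, rng=random):
    """Random spatial augmentation of a color-standardized patch:
    rotation by a multiple of 90 degrees plus optional mirroring."""
    out = np.rot90(patch, k=rng.randrange(4))
    if rng.random() < 0.5:
        out = np.fliplr(out)
    return out
```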
Preferably, in step 5 the training dataset is divided, in a certain proportion, into three sub-datasets: a training set, a validation set and a test set. The training set is used to train the feature extraction network, the validation set determines its hyperparameters and the point at which training stops, and the test set verifies the final performance of the trained model.
Addressing the problems that labeling gastroscopic digital pathology images is costly and large-scale labeled datasets are hard to obtain, the invention trains a weakly supervised network model on a large-scale gastroscopic pathology image dataset constructed from easily acquired coarse-grained labels. Features of the gastroscopic pathology image are extracted with the trained weakly supervised model, the global and local qualitative features of the image are fused, and a random forest classifier finally performs the negative/positive classification of the whole image. The method can provide pathology data screening information for pathologists, assist their diagnostic work, and improve their efficiency.
Drawings
FIG. 1 is a flow chart of an overall framework of the present invention;
FIG. 2 is a database construction flow;
FIG. 3 is an image preprocessing flow;
FIG. 4 is a CNN-based feature extraction flow;
FIG. 5 is a feature extraction and feature fusion flow.
Detailed Description
The invention will be further illustrated with reference to specific examples. It should be understood that these examples are intended only to illustrate the invention, not to limit its scope. Further, various changes and modifications may be made by those skilled in the art after reading the teachings of the invention, and such equivalents likewise fall within the scope of the appended claims.
The gastroscopic pathology image classification method based on weakly supervised learning exploits the fact that coarse-grained labels for pathology images are easy to obtain: gastroscopic pathology images are collected into a large database, and the weakly supervised method is combined with a random forest classifier to classify them. The whole pathology image is first diced by image preprocessing, patch features are then extracted automatically through weakly supervised learning, the global and local qualitative features of the image are fused, and finally a random forest classifier performs the negative/positive classification of the whole gastroscopic pathology image.
Specifically, as shown in fig. 1, the present invention includes the steps of:
s1, obtaining pathological images and constructing a database
Collect digital gastroscopic biopsy pathology images, clean the collected data, and have clinical experts label the images, forming a gastroscopic pathology image database.
As shown in fig. 2, the step S1 specifically includes the following steps:
step 101, data acquisition and screening:
and determining a unified staining mode, and collecting digital pathological images of the gastroscope biopsy. And screening the acquired digital pathology images of the gastroscope biopsy, and removing pathology data with wrong images or inaccurate information.
Step 102, data desensitization:
Pathology data and pathology reports typically involve private patient information, and using pathology data that has not been desensitized would disclose the patient's personal privacy. Therefore, the sensitive information in each case of collected pathology data is desensitized by anonymization, information transformation and similar means;
step 103, data marking:
The gastroscopic biopsy digital pathology images screened in step 101 and desensitized in step 102 are given image-level coarse-grained labels by a professional pathologist. Coarse-grained labeling only concerns the benign/malignant classification of the whole gastroscopic biopsy digital pathology image and does not require pixel-level segmentation labels within the lesion area. A gastroscopic biopsy digital pathology image judged positive is labeled "1", and one judged negative is labeled "0". The original gastroscopic biopsy digital pathology images and the corresponding labels together form the gastroscopic digital pathology image database.
Step S2, image preprocessing:
obtaining a plurality of gastroscope biopsy digital pathological images from a gastroscope pathological image database, extracting a tissue part of each gastroscope biopsy digital pathological image, removing an invalid area, and performing small block cutting image processing on the extracted images. And performing color standardization and data enhancement processing on the cut image small blocks, wherein the image small blocks corresponding to all the processed gastroscope biopsy digital pathology images are used for forming a training data set of a training feature extraction network model.
As shown in fig. 3, the step S2 specifically includes the following steps:
step 201, background and invalid region removal:
After a number of gastroscopic biopsy digital pathology images are obtained from the gastroscopic pathology image database, the tissue region of each image is extracted by the Otsu method (OTSU) together with a quality control method, and invalid tissue regions containing handwriting, overlap, smudges or out-of-focus blur are filtered out. The aim is to remove invalid information regions irrelevant to the tissue and lesion texture morphology of the gastroscopic pathology image, and to reduce errors when the model localizes lesion areas in the pathology image.
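The Otsu thresholding step above can be sketched with a histogram-based implementation; this is a generic illustration of the method, not the patent's code, and it omits the quality-control filtering:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold for a uint8 grayscale image by
    maximizing the between-class variance over all candidate cuts."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    cum_w = np.cumsum(hist)                    # pixels in class 0 (below cut)
    cum_m = np.cumsum(hist * np.arange(256))   # intensity mass in class 0
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0 = cum_w[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = cum_m[t - 1] / w0
        m1 = (cum_m[-1] - cum_m[t - 1]) / w1
        between = w0 * w1 * (m0 - m1) ** 2     # between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t
```

Pixels below the returned threshold (typically the bright slide background after inversion, depending on convention) are discarded as non-tissue.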
Step 202, cutting a graph:
For the gastroscopic biopsy digital pathology image processed in step 201, overlapped cutting at a fixed size is adopted to convert the gigapixel-scale pathology image into fixed-size image small blocks, whose size matches the input image size of the weakly supervised feature extraction network model. Overlapped cutting preserves the continuity of spatial features between adjacent image small blocks and improves the weakly supervised model's utilization of the edge features of the image small blocks.
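A minimal sketch of overlapped fixed-size tiling. The patch size and stride (which determines the overlap) are configurable; the function name and the edge-covering behavior are illustrative assumptions, not the patent's specification:

```python
def tile_coords(width, height, tile, stride):
    """Top-left corners of overlapping fixed-size tiles covering the slide.
    stride < tile gives overlapping tiles; edge tiles are shifted inward
    so the right/bottom borders are still covered."""
    xs = list(range(0, max(width - tile, 0) + 1, stride))
    ys = list(range(0, max(height - tile, 0) + 1, stride))
    if xs[-1] + tile < width:      # make sure the right edge is covered
        xs.append(width - tile)
    if ys[-1] + tile < height:     # make sure the bottom edge is covered
        ys.append(height - tile)
    return [(x, y) for y in ys for x in xs]
```

Each coordinate pair is then used to crop one image small block from the slide.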
Step 203, color normalization processing:
The image small blocks corresponding to valid tissue are mapped into the same color-gamut space with a color standardization algorithm, ensuring that the pixel-value distribution of image small blocks with the same tissue structure follows a normal distribution, and eliminating the potential influence of inconsistent gastroscopic pathology image appearance caused by uncontrollable differences such as different scanning instruments and staining depths.
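One common way to realize such color standardization is mean/standard-deviation matching against a reference slide. The sketch below does this per RGB channel for simplicity (Reinhard normalization, a standard choice in pathology, does the same thing in LAB space); the reference statistics are assumed inputs:

```python
import numpy as np

def normalize_to_reference(img, ref_mean, ref_std):
    """Match each channel's mean/std to reference statistics.
    Simplified per-RGB-channel version of Reinhard-style normalization."""
    img = img.astype(float)
    out = np.empty_like(img)
    for c in range(img.shape[2]):
        mu = img[..., c].mean()
        sd = img[..., c].std() + 1e-8          # avoid division by zero
        out[..., c] = (img[..., c] - mu) / sd * ref_std[c] + ref_mean[c]
    return np.clip(out, 0, 255).astype(np.uint8)
```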
Step 204, image enhancement: the random image enhancement technology is adopted, and on the basis of guaranteeing the invariance of space translation, the operations of random rotation, random translation, random mirroring and random distortion are carried out on the image small blocks subjected to the color standardization processing, so that gastroscopic pathological images under different fields of view are simulated, and the characteristics of the image small blocks are guaranteed to be fully learned and extracted by a model.
Step S3, obtaining a feature extraction network model based on weak supervision learning:
constructing a feature extraction network model architecture, designing corresponding feature output, training a feature extraction network model and obtaining a trained network model.
The multiple instance learning (Multiple Instance Learning, MIL) algorithm is a very effective inexact-supervision algorithm that can use simple coarse-grained labels, combined with a deep learning network, to perform statistical analysis of gigapixel image features. In multiple instance learning, the training set consists of a set of multi-instance bags with classification labels, where each bag contains a number of instances without classification labels. If a bag contains at least one positive instance, the bag is labeled a positive-class multi-instance package (positive bag); if all instances of the bag are negative, the bag is labeled a negative-class multi-instance package (negative bag).
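The bag-labeling rule above can be stated in a few lines; `slide_prediction` additionally sketches one common way (an assumption here, not stated at this point in the text) to turn patch probabilities into a bag-level decision via max pooling:

```python
def bag_label(instance_labels):
    """MIL rule: a bag is positive iff it contains at least one positive instance."""
    return 1 if any(l == 1 for l in instance_labels) else 0

def slide_prediction(patch_probs, threshold=0.5):
    """Max-pooling inference: the slide score is the highest patch
    positive-probability; the slide is called positive above `threshold`."""
    score = max(patch_probs)
    return score, 1 if score >= threshold else 0
```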
The whole model is processed as shown in fig. 4, and comprises the following steps:
step 301, constructing a training data set:
Each gastroscopic biopsy digital pathology image in the gastroscopic pathology image database processed in step S2 can be regarded as a multi-instance bag that contains all the image small blocks of that image, with each image small block serving as an instance. This yields a training data set whose unit is the multi-instance bag and whose labels are those of the original gastroscopic biopsy digital pathology images.
Step 302, designing a network model:
A feature extraction network model adopting the multiple instance learning algorithm is built, and the mapping between the instances in a bag and the bag's label in the training data set is learned through the feature extraction network model. Concretely, the image data passes through an encoder and several feature extraction convolution layers, and the result is finally output.
Step 303, outputting a result:
in order to ensure the analysis requirement of the subsequent feature extraction, the feature extraction network model outputs the feature vector and the probability value of the image small block at the same time.
Feature vector information: after an image small block passes through the feature extraction module of the feature extraction network model, dimensionality reduction is performed, features such as texture, tissue and morphology of the small block are extracted automatically, and finally a one-dimensional feature vector is output;
Probability value: the feature vector of the image small block obtained above is compressed into a vector of length two through a fully connected layer. The first element of this vector represents the weight of the image small block being a negative patch, and the second element represents the weight of it being a positive patch. Finally, the two element values are mapped into the (0, 1) interval by a normalized exponential function (softmax), yielding the negative and positive probability values of the image small block, which are output.
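The length-two softmax described above, as a small numerically stable sketch:

```python
import numpy as np

def patch_probabilities(logits):
    """Map a length-2 logit vector to (negative, positive) probabilities
    in (0, 1) that sum to 1, via the normalized exponential function."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()          # shift for numerical stability (does not change result)
    e = np.exp(z)
    return e / e.sum()
```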
Step 304, model training: and dividing the training data set into three groups of sub data sets of a training set, a verification set and a test set according to a certain proportion for model training, and obtaining a final trained feature extraction network model. The training set is used for training the feature extraction network model, the verification data set is used for determining the super parameters and the training stopping positions of the feature extraction network model, and the test data set is used for checking the final effect of the feature extraction network model.
Step S4, feature extraction and feature fusion
In order to ensure the reliability of the classification model, qualitative analysis feature extraction is added on the basis of weak supervision learning automatic feature extraction, wherein the feature extraction comprises global features of a whole gastroscopic pathological image and local features of a suspected pathological region small image. A flow chart of feature extraction is shown in fig. 5.
After a real-time input gastroscopic biopsy digital pathology image is obtained, it is preprocessed through step S2; the resulting multi-instance bag is fed into the feature extraction network model constructed and trained in step S3, and the model outputs the corresponding feature vector information and probability values. The method then comprises the following steps:
step 401, first feature extraction:
Obtain a heatmap of the full-field gastroscopic biopsy digital pathology image, extract suspicious tissue regions according to the heatmap, and then extract the characteristic information of the suspicious tissue regions of the pathology image as the first features. The method specifically comprises the following steps:
a. Generating a heatmap: using the set of probability values output by the feature extraction network model, the heatmap of the whole gastroscopic biopsy digital pathology image is obtained by stitching through the reverse of the overlapped cutting process.
b. Determining the lesion suspicious-region mask: a threshold is set; pixel values in the heatmap greater than or equal to the threshold are reset to 1 and pixel values below the threshold to 0, giving the mask of the suspicious region.
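Step b reduces to a threshold comparison; the sketch below covers it, plus the nearest-neighbor resampling used in step c. Function names and the shape convention are illustrative assumptions:

```python
import numpy as np

def suspicious_mask(heatmap, threshold):
    """Binary lesion mask: 1 where the stitched probability heatmap
    reaches the threshold, 0 elsewhere."""
    return (np.asarray(heatmap, dtype=float) >= threshold).astype(np.uint8)

def resample_nearest(mask, out_h, out_w):
    """Nearest-neighbor resampling of the mask to the size of the
    lowest-resolution pathology image."""
    h, w = mask.shape
    rows = np.arange(out_h) * h // out_h   # source row index for each output row
    cols = np.arange(out_w) * w // out_w   # source column index for each output column
    return mask[np.ix_(rows, cols)]
```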
c. Suspicious tissue region extraction: the lowest-resolution image is selected from the gastroscopic biopsy digital pathology image and converted into a grayscale image, and the pathology image of the suspicious region is extracted by combining it with the mask of the suspicious region. The mask of the suspicious region is first resampled, by the nearest-neighbor interpolation algorithm, to the size of the lowest-resolution pathology image.
d. Suspicious tissue region feature extraction: the features include, but are not limited to, the maximum pixel value, average pixel value, pixel-value variance, and the skewness and kurtosis of the pixel-value distribution within the suspicious region; the same statistics over the suspicious-region boundary; the area ratio of the suspicious region to the whole tissue region; the longest diameter of the largest connected component of the suspicious region; and the number of pixels of the suspicious region. The suspicious-region boundary is defined as the set of pixels on the boundary line of the suspicious region together with their four adjacent pixels (above, below, left and right).
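A few of the listed statistics, sketched with numpy. Skewness and kurtosis here use the standard moment definitions, which the text does not spell out, and the connected-component and boundary features are omitted:

```python
import numpy as np

def region_features(gray, mask, tissue_mask):
    """A subset of the first-feature statistics over the suspicious region."""
    vals = gray[mask == 1].astype(float)
    mu = vals.mean()
    sd = vals.std() + 1e-12            # guard against constant regions
    z = (vals - mu) / sd
    return {
        "max": vals.max(),
        "mean": mu,
        "var": vals.var(),
        "skew": (z ** 3).mean(),                     # third standardized moment
        "kurtosis": (z ** 4).mean() - 3.0,           # excess kurtosis
        "area_ratio": mask.sum() / max(tissue_mask.sum(), 1),
        "n_pixels": int(mask.sum()),
    }
```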
Step 402, second feature extraction:
The probability values of the image small blocks in each multi-example bag are used to screen out the 5 image small blocks with the highest probability values, and the feature vector information output by the feature extraction network model for each of these image small blocks is retrieved by index to serve as the second features. The number of screened image small blocks can be increased or decreased according to the classification performance requirement.
Step 403, third feature extraction:
The probability values of the image small blocks in each multi-example bag are used to screen out the 5 image small blocks with the highest probability values, and the cell features of these 5 image small blocks are extracted as the third features.
Quantitative cell characteristic information plays an important role in processing and analyzing pathological images. In clinic, the rapid, accurate, reliable and objective parameter analysis can avoid the influence caused by subjective factors. Based on the method, the cell characteristic extraction is carried out, and more commonly extracted characteristic parameters include cell area, perimeter, centroid, length of long and short axes, color, texture and the like.
Step 403 specifically includes the steps of:
a. Cell number calculation: the selected image small block is converted into a grayscale image and then into a binary image according to a corresponding threshold, with the pixel value of the target area set to 1 and that of the background area set to 0, serving as a valid-area mask. An image processing operation based on mathematical morphology is applied to the binary image: a morphological opening (erosion followed by dilation) removes fine impurity regions; a watershed algorithm then separates touching cells, giving a discrete cell mask image, and the cells in the discrete cell mask are counted.
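The counting itself can be illustrated with plain connected-component labeling. Note this simplification counts touching cells as a single region, which is exactly what the opening-plus-watershed steps in the text are there to avoid:

```python
import numpy as np

def count_cells(binary):
    """Count connected foreground regions (4-connectivity) in a binary mask.
    Simplified stand-in: the described method additionally applies a
    morphological opening and a watershed to split touching cells."""
    binary = np.asarray(binary)
    visited = np.zeros(binary.shape, dtype=bool)
    h, w = binary.shape
    count = 0
    for i in range(h):
        for j in range(w):
            if binary[i, j] and not visited[i, j]:
                count += 1                      # new region found
                stack = [(i, j)]
                visited[i, j] = True
                while stack:                    # flood-fill the region
                    y, x = stack.pop()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not visited[ny, nx]:
                            visited[ny, nx] = True
                            stack.append((ny, nx))
    return count
```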
b. Cell perimeter and area calculation: the cell contour is taken approximately as a standard ellipse; the number of pixels of each cell in the discrete cell mask obtained above is counted as the cell area, the longest and shortest axes of the cell are measured, and the perimeter of the cell contour is obtained with an approximate ellipse-perimeter formula, where a denotes the semi-minor axis radius of the ellipse, b the semi-major axis radius, and C the approximate perimeter.
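The text references the approximate ellipse-perimeter formula without reproducing it; Ramanujan's first approximation is one standard choice and is assumed in this sketch:

```python
import math

def ellipse_perimeter(a, b):
    """Approximate perimeter of an ellipse with semi-axes a and b.
    Assumption: Ramanujan's approximation
        C ~ pi * [3(a + b) - sqrt((3a + b)(a + 3b))]
    The patent does not show which approximation it uses."""
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))
```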
c. Cell texture feature statistics: the selected image small blocks are converted into grayscale images, and the cell areas are located on the grayscale images with the discrete cell mask. Cell texture features within the current cell area are extracted with a gray-level co-occurrence matrix (GLCM); the common texture features defined on the GLCM are as follows:
Angular second moment (ASM): corresponds to texture uniformity; a larger ASM value indicates more uniform staining of the nuclei.
Entropy (ENT): texture is one kind of image information; texture richness and the entropy value are positively correlated.
Inverse difference moment (IDM): describes how much the cell image texture changes locally; the larger the value, the smaller the change across different areas of the cell texture, i.e. the texture is locally very uniform.
Contrast (CON): reflects the sharpness of the cell image and the depth of texture grooves; the greater the CON value, the more high-contrast pixel pairs there are.
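The four GLCM statistics can be computed from a single co-occurrence matrix; the quantization to 8 gray levels and the single horizontal (0, 1) offset used below are simplifying assumptions (libraries such as scikit-image average over several offsets):

```python
import numpy as np

def glcm_features(gray, levels=8):
    """ASM, entropy, inverse difference moment and contrast from a
    horizontal-offset gray-level co-occurrence matrix."""
    gray = np.asarray(gray, dtype=float)
    if gray.max() > 0:                       # quantize to `levels` gray levels
        q = (gray / gray.max() * (levels - 1)).astype(int)
    else:
        q = np.zeros(gray.shape, dtype=int)
    glcm = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[i, j] += 1                      # count horizontally adjacent pairs
    p = glcm / glcm.sum()                    # normalize to joint probabilities
    ii, jj = np.indices(p.shape)
    nz = p[p > 0]
    return {
        "ASM": float((p ** 2).sum()),                      # texture uniformity
        "ENT": float(-(nz * np.log2(nz)).sum()),           # texture richness
        "IDM": float((p / (1.0 + (ii - jj) ** 2)).sum()),  # local homogeneity
        "CON": float((p * (ii - jj) ** 2).sum()),          # contrast
    }
```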
Step 404, feature fusion:
The second and third features extracted from each selected image small block are concatenated row-wise with the first features extracted from the whole pathology image to obtain the feature vector of the current small block, and the feature vectors of the 5 image small blocks are stacked column-wise to obtain the feature matrix of the current gastroscopic biopsy digital pathology image. The feature matrix is normalized, mapping features of different attributes into the same distribution space so that they have the same initial weight. The normalized feature matrix is then averaged by column, compressing it into a one-dimensional feature vector that fuses the global and local features of the current gastroscopic biopsy digital pathology image and thus describes the whole pathology image well.
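The fusion step, sketched with numpy. Min-max normalization is assumed here as the "mapping to the same distribution space", since the text does not name a specific normalization:

```python
import numpy as np

def fuse_features(global_feats, patch_feats):
    """Row-concatenate the global (first) features onto each patch's
    (second + third) features, stack into a matrix, min-max normalize
    per column, then average rows into one 1-D descriptor."""
    rows = [np.concatenate([p, global_feats]) for p in patch_feats]
    m = np.stack(rows)                        # one row per selected patch
    lo, hi = m.min(axis=0), m.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)    # constant columns map to 0
    m = (m - lo) / span                       # each column into [0, 1]
    return m.mean(axis=0)                     # compress to one 1-D vector
```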
Step S5, distinguishing and classifying gastroscope pathological images:
The current gastroscopic biopsy digital pathology image is classified with a trained random forest classifier. Random Forest is chosen because its convergence is proved via the strong law of large numbers, it does not overfit as the number of trees increases, it is insensitive to noisy data, and it keeps a small generalization error. Because the splitting features of decision tree nodes are selected randomly, the model can be trained efficiently even when the feature dimension of the samples is very high, and a stable model is obtained.
Specifically, the one-dimensional feature vector obtained in step 404 is input into a pre-trained random forest classifier to obtain the benign and malignant classification of the current gastroscope biopsy digital pathology image.
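With scikit-learn (an assumed implementation; the patent does not name a library), the final classification stage might look as follows. The hyperparameters are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_classifier(feature_vectors, labels, seed=0):
    """Fit the final benign/malignant classifier on the fused
    one-dimensional feature vectors (one row per slide)."""
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(feature_vectors, labels)
    return clf
```

At inference time, the fused vector from step 404 is passed to `clf.predict` (or `clf.predict_proba` for a score) to obtain the negative/positive label of the slide.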
Claims (4)
1. The gastroscope pathological image classification method based on weak supervision learning is characterized by comprising the following steps of:
step 1, obtaining pathological images and constructing a database
Collecting and acquiring a gastroscope biopsy digital pathology image, cleaning the collected data, and carrying out coarse granularity labeling on the gastroscope biopsy digital pathology image by clinical specialists, wherein the coarse granularity labeling only relates to benign and malignant classification of the gastroscope biopsy digital pathology image, so as to form a gastroscope pathology image database;
step 2, obtaining a plurality of gastroscope biopsy digital pathological images used for training the feature extraction network model from a gastroscope pathological image database, and entering step 3 after corresponding labeling results;
step 3, image preprocessing:
extracting tissue parts of each gastroscope biopsy digital pathological image, removing invalid areas, performing small block image cutting processing on the extracted images, and performing color standardization and data enhancement processing on cut image small blocks;
step 4, after the gastroscope biopsy digital pathology image obtained in the step 2 is subjected to the image preprocessing process in the step 3, a training data set consisting of a group of multi-example package bag with classification labels is obtained, wherein each gastroscope biopsy digital pathology image is regarded as a multi-example package bag, each multi-example package bag contains a plurality of example instances without classification labels, and each example instance is an image patch obtained in the step 3; if the multi-instance package bag contains at least one positive instance, the multi-instance package bag is marked as a positive type multi-instance package, and if all the instances of the multi-instance package bag are negative instances, the multi-instance package bag is marked as a negative type multi-instance package;
step 5, obtaining a feature extraction network model based on weak supervision learning:
constructing a feature extraction network model architecture, designing corresponding feature output, training a feature extraction network model and obtaining a trained feature extraction network model, wherein:
the feature extraction network model adopts a multi-example learning algorithm, the mapping relation between example instance in a multi-example packet bag and labels of the multi-example packet bag in a training data set is learned through the feature extraction network model, specific image data is subjected to feature extraction through an encoder and a plurality of feature extraction convolution layers, finally, result output is achieved, and feature vectors and probability values of image small blocks are output simultaneously through the feature extraction network model:
feature vector information: after the image small blocks pass through a feature extraction module in the feature extraction network model, the dimension reduction is realized, the features of the small block images are automatically extracted, and finally, a feature vector with one dimension is output;
probability value: the characteristic vector of the image small block is obtained through the steps, then the characteristic vector is compressed into a vector with the length of two through full connection, the first position element value of the vector with the length of two represents the weight of the image small block which is a negative image small block, the second position element value represents the weight of the image small block which is a positive image small block, finally the vector element value with the length of two is mapped to a (0, 1) interval through a normalized exponential function, and the probability value that the image small block is negative and positive is obtained and output;
step 6, after obtaining the real-time input digital pathology image of the gastroscope biopsy, preprocessing the digital pathology image of the gastroscope biopsy through the step 3, constructing and training a feature extraction network model by the step 5 through a plurality of example package bag obtained after preprocessing, and outputting corresponding feature vector information and probability values by the feature extraction network model;
step 7, feature extraction and feature fusion, comprising the following steps:
step 701, first feature extraction:
acquiring a heatmap of the full-field gastroscopic biopsy digital pathology image, extracting suspicious tissue regions according to the heatmap, and then extracting characteristic information of the suspicious tissue regions of the pathology image as first characteristics, wherein the method specifically comprises the following steps:
step 7011, generating a heatmap: using the set of probability values output by the feature extraction network model, splicing by the reverse process of the overlapped image small blocks to obtain the heatmap of the whole gastroscopic biopsy digital pathology image;
step 7012, determining a mask of a suspicious lesion area: resetting pixel values in the heatmap that are greater than or equal to a preset threshold to 1 and pixel values below the threshold to 0, to obtain the mask of the suspicious region;
step 7013, suspicious tissue region extraction: selecting an image with the lowest resolution from the digital pathology image of the gastroscope biopsy, converting the image into a gray image, and combining a mask of a suspicious region to extract the pathology image of the suspicious region, wherein the mask of the suspicious region is subjected to a nearest neighbor interpolation algorithm, and resampling the mask size to the size of the pathology image with the lowest resolution;
step 7014, extracting suspicious tissue region characteristics;
step 702, second feature extraction:
screening N image small blocks with highest probability values by using probability values of the image small blocks in each multi-example packet bag, and obtaining feature vector information corresponding to the current image small blocks output by the feature extraction network model according to indexes to serve as second features;
step 703, third feature extraction:
screening N small block images with highest probability values by using probability values of the small blocks in each multi-example packet bag, and extracting cell characteristics of the N small blocks as third characteristics, wherein the method comprises the following steps:
step 7031, calculating cell number: converting the selected image small blocks into grayscale images, converting the grayscale images into binary images according to corresponding thresholds, setting the pixel value of the target area to 1 and the pixel value of the background area to 0, and taking the binary image as a valid-area mask; performing an image processing operation based on mathematical morphology on the obtained binary image, namely a morphological opening operation of erosion followed by dilation, to remove fine impurity areas; separating touching cells by using a watershed algorithm to obtain a discrete cell mask image; and counting the cells in the discrete cell mask;
step 7032, calculating cell perimeter area: taking the outline of the cell as a standard ellipse approximately, counting the number of pixel points of each cell in the discrete cell mask obtained in the steps as the cell area, counting the longest axis and the shortest axis of the cell, and obtaining the perimeter of the outline of the cell by using an ellipse perimeter calculation formula;
step 7033, cell texture feature statistics: converting the selected image small blocks into grayscale images, positioning cell areas on the grayscale images by using the discrete cell mask, and extracting cell texture features in the current cell areas by using a gray-level co-occurrence matrix, wherein the common texture features defined by the gray-level co-occurrence matrix are as follows:
angular second moment ASM: corresponding to the uniformity of the texture; the larger the ASM value, the more uniform the staining presented by the cell nucleus;
entropy ENT: the texture information is one of image information, and the image texture richness and the entropy value are in positive correlation;
inverse difference moment IDM: describing the amount of local change in the cell image texture; the larger the value, the smaller the change across different areas of the cell texture, i.e. the texture is locally very uniform;
contrast CON: reflecting the definition of the cell image and the depth of the texture groove, the greater the CON value is, the more pixels with large contrast are;
step 7034, feature fusion:
splicing the second and third features extracted from each selected image small block with the first features extracted from the whole pathological image according to rows to obtain feature vectors of the current small block image, and splicing the feature vectors of the N image small blocks according to columns to obtain a feature matrix of the current gastroscope biopsy digital pathological image; carrying out normalization processing on the feature matrix, and mapping the features of different attributes to the same distribution space so that the features of different attributes have the same initial weight; averaging the normalized feature matrix according to columns, and compressing the feature matrix into a one-dimensional feature vector;
step 8, distinguishing and classifying gastroscope pathological images:
inputting the one-dimensional feature vector obtained in the step 7034 into a pre-trained random forest classifier to obtain benign and malignant classification of the current gastroscope biopsy digital pathological image.
2. The gastroscopic pathology image classification method based on weak supervised learning as set forth in claim 1, wherein the step 1 comprises the steps of:
step 101, data acquisition and screening:
determining a unified dyeing mode, collecting a digital pathology image of the gastroscope biopsy, screening the collected digital pathology image of the gastroscope biopsy, and eliminating pathology data with wrong image or inaccurate information;
step 102, data desensitization:
desensitizing each acquired digital pathology image of gastroscope biopsy;
step 103, data marking:
the professional pathologist carries out image-level coarse granularity labeling on the gastroscope biopsy digital pathological images screened in the step 101 and desensitized in the step 102, wherein the coarse granularity labeling only relates to benign and malignant classification of the gastroscope biopsy digital pathological images, and does not need pixel-level image segmentation labeling in a pathological region; the marking result of the gastroscope biopsy digital pathological image which is judged to be positive is '1', and the marking result of the gastroscope biopsy digital pathological image which is judged to be negative is '0'; the original gastroscopic biopsy digital pathology image and the corresponding labeling result jointly form the gastroscopic pathology image database.
3. The gastroscopic pathology image classification method based on weak supervised learning as set forth in claim 1, wherein the step 3 comprises the steps of:
step 301, background and invalid region removal:
extracting tissue areas of the gastroscopic biopsy digital pathology images by using the Otsu method and a quality control method, and filtering invalid tissue areas;
step 302, graph cutting processing:
for the gastroscope biopsy digital pathology image processed in the step 301, adopting a mode of overlapping and cutting images under the same size, and transferring the pathology image into an image small block with a fixed size, wherein the size of the image small block is matched with the size of an input image of a feature extraction network model based on weak supervision learning;
step 303, color standardization processing:
mapping the image small blocks corresponding to the valid tissue into the same color-gamut space by using a color standardization algorithm, ensuring that the pixel-value distribution of image small blocks with the same tissue structure follows a normal distribution, and eliminating the potential influence caused by inconsistent imaging of the gastroscopic pathology images due to uncontrollable differences;
step 304, image enhancement:
the random image enhancement technology is adopted, and on the basis of guaranteeing the invariance of space translation, the operations of random rotation, random translation, random mirroring and random distortion are carried out on the image small blocks subjected to the color standardization processing, so that gastroscopic pathological images under different fields of view are simulated, and the characteristics of the image small blocks are guaranteed to be fully learned and extracted by a model.
4. The gastroscope pathological image classification method based on weak supervision learning according to claim 1, wherein in the step 5, a training dataset is divided into three sub-datasets of a training dataset, a verification dataset and a test dataset according to a certain proportion for model training, and a final trained feature extraction network model is obtained, wherein the training dataset is used for training the feature extraction network model, the verification dataset determines super parameters and training stop positions of the feature extraction network model, and the test dataset checks the final effect of the feature extraction network model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010690425.5A CN111985536B (en) | 2020-07-17 | 2020-07-17 | Based on weak supervised learning gastroscopic pathology image Classification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111985536A CN111985536A (en) | 2020-11-24 |
CN111985536B true CN111985536B (en) | 2024-02-09 |
Family
ID=73438801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010690425.5A Active CN111985536B (en) | 2020-07-17 | 2020-07-17 | Gastroscopic pathology image classification method based on weakly supervised learning
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111985536B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112560957B (en) * | 2020-12-17 | 2023-04-07 | 北京邮电大学 | Neural network training and detecting method, device and equipment |
CN113096079B (en) * | 2021-03-30 | 2023-12-29 | 四川大学华西第二医院 | Image analysis system and construction method thereof |
CN113221978A (en) * | 2021-04-27 | 2021-08-06 | 浙江师范大学 | Colorectal cancer digital pathological image discrimination method and system based on weak supervised learning |
CN113256572B (en) * | 2021-05-12 | 2023-04-07 | 中国科学院自动化研究所 | Gastroscope image analysis system, method and equipment based on restoration and selective enhancement |
CN113674288B (en) * | 2021-07-05 | 2024-02-02 | 华南理工大学 | Automatic segmentation method for digital pathological image tissue of non-small cell lung cancer |
CN113378792B (en) * | 2021-07-09 | 2022-08-02 | 合肥工业大学 | Weak supervision cervical cell image analysis method fusing global and local information |
CN113793305A (en) * | 2021-08-23 | 2021-12-14 | 上海派影医疗科技有限公司 | Pathological image classification and identification method and system integrating multiple information |
CN113436191B (en) * | 2021-08-26 | 2021-11-30 | 深圳科亚医疗科技有限公司 | Pathological image classification method, pathological image classification system and readable medium |
CN114511523B (en) * | 2022-01-21 | 2024-05-31 | 中山大学 | Gastric cancer molecular subtype classification method and device based on self-supervision learning |
CN115063592B (en) * | 2022-08-16 | 2022-12-06 | 之江实验室 | Multi-scale-based full-scanning pathological feature fusion extraction method and system |
CN115861212B (en) * | 2022-11-25 | 2023-07-14 | 中国医学科学院肿瘤医院 | System and apparatus for distinguishing related disorders based on gastric mucosa biopsy samples |
CN116580397B (en) * | 2023-07-12 | 2023-11-10 | 北京大学 | Pathological image recognition method, device, equipment and storage medium |
CN117038023A (en) * | 2023-10-08 | 2023-11-10 | 中国医学科学院北京协和医院 | dMMR germ line mutation subtype classification method and system based on colorectal cancer pathological image |
CN117392468B (en) * | 2023-12-11 | 2024-02-13 | 山东大学 | Cancer pathology image classification system, medium and equipment based on multi-example learning |
CN118097093A (en) * | 2024-01-29 | 2024-05-28 | 北京透彻未来科技有限公司 | System for searching images on digital pathological section data set based on pathological large model |
CN117893450B (en) * | 2024-03-15 | 2024-05-24 | 西南石油大学 | Digital pathological image enhancement method, device and equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105550651A (en) * | 2015-12-14 | 2016-05-04 | 中国科学院深圳先进技术研究院 | Method and system for automatically analyzing panoramic image of digital pathological section |
CN109492711A (en) * | 2018-12-07 | 2019-03-19 | Malignant melanoma and non-malignant melanotic nevus classification method based on deep learning
CN109670510A (en) * | 2018-12-21 | 2019-04-23 | Deep learning based gastroscopic biopsy pathological data screening system and method
CN109754879A (en) * | 2019-01-04 | 2019-05-14 | Deep learning based lung cancer computer-aided detection method and system
CN110472676A (en) * | 2019-08-05 | 2019-11-19 | Early gastric cancer tissue image classification system based on deep neural network
CN110751172A (en) * | 2019-09-10 | 2020-02-04 | 中南大学 | Weakly supervised learning pathology full-image category inference method and system |
CN111265234A (en) * | 2020-03-05 | 2020-06-12 | 上海市肺科医院(上海市职业病防治院) | Method and system for judging properties of lung mediastinal lymph nodes |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018106783A1 (en) * | 2016-12-06 | 2018-06-14 | Siemens Energy, Inc. | Weakly supervised anomaly detection and segmentation in images |
EP4361947A3 (en) * | 2018-03-23 | 2024-06-19 | Systems and methods for multiple instance learning for classification and localization in biomedical imaging
US11730387B2 (en) * | 2018-11-02 | 2023-08-22 | University Of Central Florida Research Foundation, Inc. | Method for detection and diagnosis of lung and pancreatic cancers from imaging scans |
Non-Patent Citations (6)
Title |
---|
Constrained Deep Weak Supervision for Histopathology Image Segmentation; Zhipeng Jia et al.; IEEE Transactions on Medical Imaging; Vol. 36, No. 11; pp. 2376-2388 *
Deep Generative Models for Weakly-Supervised Multi-Label Classification; Hong-Min Chu et al.; ECCV 2018: Computer Vision; Vol. 11206; pp. 409-425 *
A hierarchical multi-label classification and diagnosis method for breast diseases based on tree search; Jin Chengxiao et al.; Intelligent Computer and Applications; Vol. 10, No. 2; pp. 34-39 *
A machine learning method for automatic annotation of pathological images; Zhang Gang et al.; Journal of Computer Research and Development; Vol. 52, No. 9; pp. 2135-2144 *
Weakly supervised learning based segmentation of cancer pathology images; Hu Kongtao; China Masters' Theses Full-text Database, Medicine and Health Sciences; No. 6; p. E072-31 *
Diabetic retinopathy diagnosis via multi-kernel multi-instance learning; Ren Fulong et al.; Journal of Image and Graphics; No. 4; pp. 90-101 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111985536B (en) | Gastroscopic pathology image classification method based on weakly supervised learning | |
Li et al. | A comprehensive review of computer-aided whole-slide image analysis: from datasets to feature extraction, segmentation, classification and detection approaches | |
CN110599448B (en) | Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network | |
CN110120040B (en) | Slice image processing method, slice image processing device, computer equipment and storage medium | |
CN112101451B (en) | Breast cancer tissue pathological type classification method based on generation of antagonism network screening image block | |
WO2021139258A1 (en) | Image recognition based cell recognition and counting method and apparatus, and computer device | |
Pan et al. | Mitosis detection techniques in H&E stained breast cancer pathological images: A comprehensive review | |
CN111798425B (en) | Intelligent detection method for mitotic image in gastrointestinal stromal tumor based on deep learning | |
Tang et al. | Segnet-based gland segmentation from colon cancer histology images | |
CN113034462B (en) | Method and system for processing gastric cancer pathological section image based on graph convolution | |
CN111415352B (en) | Cancer metastasis panoramic pathological section analysis method based on deep cascade network | |
US20230005140A1 (en) | Automated detection of tumors based on image processing | |
CN110766670A (en) | Mammary gland molybdenum target image tumor localization algorithm based on deep convolutional neural network | |
CN110021019B (en) | AI-assisted hair thickness distribution analysis method for AGA clinical image | |
CN112102332A (en) | Cancer WSI segmentation method based on local classification neural network | |
CN112348059A (en) | Deep learning-based method and system for classifying multiple dyeing pathological images | |
CN114782948A (en) | Global interpretation method and system for cervical liquid-based cytology smear | |
Ning et al. | Multiscale context-cascaded ensemble framework (MsC²EF): application to breast histopathological image | |
CN112927215A (en) | Automatic analysis method for digestive tract biopsy pathological section | |
CN115880266B (en) | Intestinal polyp detection system and method based on deep learning | |
CN116524315A (en) | Mask R-CNN-based lung cancer pathological tissue section identification and segmentation method | |
CN114170415A (en) | TMB classification method and system based on histopathology image depth domain adaptation | |
CN109948706B (en) | Micro-calcification cluster detection method combining deep learning and feature multi-scale fusion | |
CN111783571A (en) | Cervical cell automatic classification model establishment and cervical cell automatic classification method | |
CN115100187B (en) | Glaucoma image detection method based on federal learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||