CN106845551B - Tissue pathology image identification method - Google Patents

Tissue pathology image identification method Download PDF

Info

Publication number
CN106845551B
CN106845551B (application CN201710059300.0A)
Authority
CN
China
Prior art keywords
disease
dictionary
free
samples
sparse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710059300.0A
Other languages
Chinese (zh)
Other versions
CN106845551A (en)
Inventor
汤红忠
李骁
王翔
毛丽珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangtan University
Original Assignee
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangtan University filed Critical Xiangtan University
Priority to CN201710059300.0A
Publication of CN106845551A
Application granted
Publication of CN106845551B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/03: Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a histopathology image identification method comprising the following steps: selecting disease-free and diseased training samples and disease-free and diseased test samples; establishing a disease-free dictionary learning model and a diseased dictionary learning model from the disease-free and diseased training samples, alternately and iteratively optimizing the two objective functions until the maximum number of iterations is reached, and learning a disease-free dictionary and a diseased dictionary; sparsely representing the test samples with the disease-free and diseased dictionaries and computing the sparse reconstruction error vectors of the test samples under each dictionary; and deriving a classification statistic from the sparse reconstruction error vectors and determining the class of each test sample by comparing the statistic with a threshold. The invention provides a new model and method for applying dictionary learning to histopathology image classification; the learned class-labeled dictionaries give better sparse reconstruction and intra-class robustness for samples of the same class and better inter-class discrimination for samples of other classes.

Description

Tissue pathology image identification method
Technical Field
The invention relates to a tissue pathology image identification method.
Background
With the development of computer-aided diagnosis, research on "digital pathology" has gradually attracted researchers' attention. How to accurately and automatically extract the discriminative features hidden in images and provide the information needed for subsequent histopathology image analysis or classification, so that disease grades and classes can be given quickly and accurately, has become one of the most challenging research topics in digital pathology.
Traditional feature extraction methods fall mainly into two categories. The first is based on domain- or task-specific features, such as the size and morphology of cells and the grayscale, color or texture information of the image; the second is based mainly on spatial structure and multi-scale features, such as morphological features, graph-based methods, scale-invariant features and wavelet features. Most of these are pixel-level or hand-crafted features: they are generally suited only to specific data, have a limited range of application, and tend to be highly redundant and weakly discriminative.
In recent years, sparse representation has received considerable attention owing to its strong performance on many computer vision problems. Its basic idea is to represent an original signal as a sparse combination of atoms from an overcomplete dictionary. Sparse representation has been highly successful in image denoising and restoration, face recognition, image classification and related fields. As the technique has developed, how to learn a dictionary suited to a specific problem (for example, image classification) has become a focus of research, giving rise to the theoretical framework of dictionary learning.
The key to dictionary learning is whether the constructed dictionary is both reconstructive and discriminative. To this end, Zhang et al. proposed the discriminative K-SVD (DK-SVD) dictionary learning method; Jiang et al. proposed a dictionary learning method based on label-consistent K-SVD (LC-KSVD); Yang et al. proposed the Fisher discrimination dictionary learning (FDDL) method, which improves the discriminability of the dictionary indirectly by constraining the sparse representation coefficients; and Vu et al. proposed Discriminative Feature-oriented Dictionary Learning (DFDL) and applied it to histopathology image classification. These methods achieve good results in image classification.
However, different types of histopathology images present different characteristics: cell morphology and geometric structure vary considerably even within images of the same class, and pathological appearances are diverse. As a result, the feature differences between pathology image samples of the same class can exceed those between samples of different classes, so the diseased and disease-free dictionaries learned by the above methods remain highly similar, the discriminability between disease-free and diseased samples is still low, and their classification performance still needs to be improved.
Disclosure of Invention
In order to solve the technical problems, the invention provides a histopathology image identification method with high accuracy and high robustness.
The technical scheme for solving the above problems is as follows: a tissue pathology image recognition method comprising the following steps:
step one, selecting a number of image blocks from the disease-free and diseased images of a given tissue as disease-free and diseased training samples and as disease-free and diseased test samples;
step two, optimizing and learning the disease-free dictionary: establishing a disease-free dictionary learning model from the disease-free and diseased training samples, and learning the disease-free dictionary by minimizing the objective function with a two-step alternating iterative optimization;
step three, optimizing and learning the diseased dictionary: establishing a diseased dictionary learning model from the diseased and disease-free training samples, and learning the diseased dictionary by minimizing the objective function with a two-step alternating iterative optimization;
step four, judging whether the maximum number of iterations has been reached; if so, proceeding to step five, otherwise returning to step two;
step five, obtaining the reconstruction error vectors of the test samples: performing sparse representation of the test samples with the learned disease-free and diseased dictionaries, then computing the sparse reconstruction error vectors of the test samples under the disease-free and the diseased dictionary respectively;
step six, obtaining the classification result of the test samples: obtaining a classification statistic from the sparse reconstruction error vectors, then determining the class of the test sample by comparing the classification statistic with a threshold.
In the above histopathology image identification method, the first step specifically comprises: selecting the same number of image blocks from the disease-free and the diseased images of a given tissue; splitting each image block into its R, G and B channels; converting the pixel values of the three channels into column vectors and concatenating them into one feature vector; and finally juxtaposing the feature vectors column by column as the disease-free and diseased training samples Y and Ȳ (formula image). The test samples are obtained in the same manner.
In the above histopathology image identification method, the second step specifically comprises:
2-1: randomly select n column vectors from the disease-free and the diseased training samples, respectively, as the initialized disease-free dictionary D and diseased dictionary D̄;
2-2: establish the disease-free dictionary learning model (formula image), where argmin denotes the value of the variables that minimizes the objective function; Y and Ȳ denote the disease-free and diseased training samples; X and X̄ denote the sparse representation coefficients of the disease-free and diseased training samples; N and N̄ denote the numbers of feature vectors of the disease-free and diseased images; L1 is the coding sparsity of the disease-free and diseased samples under the disease-free dictionary; ρ is a regularization parameter with ρ > 0. In the formula, (formula image) is the sparse reconstruction error of the disease-free training samples under the disease-free dictionary, (formula image) is the reconstruction error of the diseased training samples under the disease-free dictionary, the subscript F denotes the Frobenius norm, and Ψ(D) is the Fisher-criterion constraint term of the disease-free dictionary, expressed as (formula image), where m is the mean of all atoms in the disease-free dictionary D, M is the matrix formed from the atom mean m of the disease-free dictionary D, m̄ is the mean of all atoms in the diseased dictionary D̄, and α and β are the penalty coefficients of the intra-class and inter-class distances, with α > 0;
2-3: fix the disease-free dictionary D and update the sparse coding coefficients; the objective function is now (formula image). Let the training samples be stacked as (formula image) and the coding coefficient matrix as (formula image); L1 is the coding sparsity of the disease-free and diseased samples under the disease-free dictionary, and the optimal sparse solution is (formula image). The solution of the objective function is then completed by iterating two steps, the sparse representation of the disease-free training samples under the disease-free dictionary D and the sparse representation of the diseased training samples under the disease-free dictionary D, written in the unified simplified form (formula image). The sparse solutions (formula image) of the training samples under the disease-free dictionary D are obtained with the OMP algorithm in the SPAMS toolbox;
2-4: fix the sparse coding coefficients and update the disease-free dictionary D; the objective function is now (formula image), which simplifies to (formula image), where tr denotes the trace of a matrix and (formula image). The optimal solution of the disease-free dictionary D is obtained by a coordinate gradient descent method.
In the above tissue pathology image identification method, the third step specifically comprises:
3-1: randomly select n column vectors from the disease-free and the diseased training samples, respectively, as the initialized disease-free dictionary D and diseased dictionary D̄;
3-2: establish the diseased dictionary learning model (formula image), where Y and Ȳ denote the disease-free and diseased training samples; X and X̄ denote the sparse representation coefficients of the disease-free and diseased training samples; N and N̄ denote the numbers of feature vectors of the disease-free and diseased images; L2 is the coding sparsity of the disease-free and diseased samples under the diseased dictionary; ρ is a regularization parameter with ρ > 0. In the formula, (formula image) is the sparse reconstruction error of the diseased samples under the diseased dictionary, (formula image) is the reconstruction error of the disease-free samples under the diseased dictionary, and (formula image) is the Fisher-criterion constraint term of the diseased dictionary, expressed as (formula image), where m is the mean of all atoms in the disease-free dictionary D, m̄ is the mean of all atoms in the diseased dictionary D̄, and M is the matrix formed from the atom mean m̄ of the diseased dictionary D̄;
3-3: fix the diseased dictionary D̄ and update the sparse coding coefficients; the objective function is now (formula image). Let the training samples be stacked as (formula image) and the coding coefficient matrix as (formula image); L2 is the coding sparsity of the disease-free and diseased samples under the diseased dictionary, and the optimal sparse solution is (formula image). The solution of the objective function is divided into two iterative steps, the sparse representation of the disease-free training samples under the diseased dictionary D̄ and the sparse representation of the diseased training samples under the diseased dictionary D̄, written in the unified simplified form (formula image). The sparse solutions (formula image) of the training samples under the diseased dictionary D̄ are obtained with the OMP algorithm in the SPAMS toolbox;
3-4: fix the sparse coding coefficients and update the diseased dictionary D̄; the objective function is now (formula image), which simplifies to (formula image), where (formula image). The optimal solution of the diseased dictionary D̄ is obtained by a coordinate gradient descent method.
In the above histopathology image identification method, the fifth step specifically comprises:
5-1: partition the test image into blocks, regard each block as a column vector, and randomly select u blocks to form the matrix H as the test sample; using (formula image), solve the sparse coding (formula image) of the test sample H under the class-labeled dictionaries (formula image);
5-2: calculate the sparse reconstruction error vectors of the test sample under the disease-free dictionary D and the diseased dictionary D̄, namely diag((H − DX)(H − DX)^T) and (formula image), where diag(·) denotes the elements on the main diagonal of a matrix.
In the above tissue pathology image identification method, the sixth step specifically comprises:
6-1: define a vector C (formula image), where Nt is the number of test samples;
6-2: obtain a classification statistic S from the vector C: (formula image);
when the classification statistic S is greater than or equal to the threshold Th, the test sample is a disease-free sample; otherwise, when the classification statistic S is smaller than the threshold Th, the test sample is a diseased sample.
The beneficial effects of the invention are as follows. The method first randomly selects a number of image blocks from a histopathology image data set as training and test samples; it then feeds the training samples of the different classes into the models, solves them by alternating iteration so that the objective functions are continuously optimized, and learns the class-labeled dictionaries; finally, it sparsely represents the test matrix on the learned class-labeled dictionaries and determines its class by comparing the reconstruction error vectors with a threshold. The invention provides a new model and method for applying dictionary learning to histopathology image classification; the learned class-labeled dictionaries give better sparse reconstruction and intra-class robustness for samples of the same class and better inter-class discrimination for samples of other classes, thereby effectively improving the classification performance on histopathology images.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of histopathology of lung, spleen and kidney in the ADL database, wherein (a) from left to right are non-diseased images of lung, spleen and kidney, respectively, and (b) from left to right are diseased images of lung, spleen and kidney, respectively.
FIG. 3 is a schematic diagram of the histopathology of adenosis and leaf cancer in the BreaKHis database, wherein (a) is the histopathology image of adenosis, and (b) is the histopathology image of leaf cancer.
Detailed Description
The invention is further described below with reference to the figures and examples.
As shown in fig. 1, the present invention comprises the steps of:
the method comprises the following steps: a plurality of image blocks are respectively selected from two images of a certain tissue, namely a disease-free image and a disease-suffering image, and used as disease-free training samples and disease-suffering training samples. The method comprises the following specific steps:
respectively randomly selecting 40 images from two images of a certain tissue without diseases and with diseases, randomly extracting 250 image blocks from each image, wherein the block size is 20 × 20, 10000 color image blocks are counted, then dividing each color image block into three channels of RGB, converting pixel values of the three channels into column vectors and then connecting in series to obtain a feature vector, and finally connecting the feature vectors in parallel to be used as training samples, then Y,
Figure BDA0001218063270000081
R1200×10000the size of the matrix is shown, and 110 images of the disease-free image are randomly selected from the rest certain tissue images to be used as test sets.
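As a reading aid, the following Python sketch illustrates this sampling step, assuming the disease-free and diseased images are already loaded as H×W×3 uint8 arrays; the function name, the NumPy-based implementation and the random-number handling are illustrative assumptions and not part of the patent.

import numpy as np

def build_training_matrix(images, n_blocks_per_image=250, block=20, seed=0):
    # Randomly crop n_blocks_per_image color blocks of size block x block from each image,
    # split each block into its R, G, B channels, flatten each channel to a column vector,
    # concatenate the three channels into one 3*block*block feature vector,
    # and juxtapose all feature vectors as the columns of the training matrix.
    rng = np.random.default_rng(seed)
    columns = []
    for img in images:                      # e.g. 40 disease-free or 40 diseased images
        h, w, _ = img.shape
        for _ in range(n_blocks_per_image):
            y = rng.integers(0, h - block + 1)
            x = rng.integers(0, w - block + 1)
            patch = img[y:y + block, x:x + block, :]     # block x block x 3
            feature = np.concatenate([patch[:, :, c].reshape(-1) for c in range(3)])
            columns.append(feature.astype(np.float64))
    return np.stack(columns, axis=1)        # e.g. 1200 x 10000 for 40 images x 250 blocks

# Y  = build_training_matrix(disease_free_images)   # disease-free training samples
# Yb = build_training_matrix(diseased_images)       # diseased training samples (Ȳ)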
Step two, optimizing and learning the disease-free dictionary: establish the disease-free dictionary learning model from the disease-free and diseased training samples, and learn the disease-free dictionary by minimizing the objective function with a two-step alternating iterative optimization. The specific steps are as follows:
2-1: randomly select n column vectors from the disease-free and the diseased training samples, respectively, as the initialized disease-free dictionary D and diseased dictionary D̄;
2-2: establish the disease-free dictionary learning model (formula image), where argmin denotes the value of the variables that minimizes the objective function; Y and Ȳ denote the disease-free and diseased training samples; X and X̄ denote the sparse representation coefficients of the disease-free and diseased training samples; N and N̄ denote the numbers of feature vectors of the disease-free and diseased images; L1 is the coding sparsity of the disease-free and diseased samples under the disease-free dictionary; ρ is a regularization parameter with ρ > 0. In the formula, (formula image) is the sparse reconstruction error of the disease-free training samples under the disease-free dictionary, (formula image) is the reconstruction error of the diseased training samples under the disease-free dictionary, the subscript F denotes the Frobenius norm, and Ψ(D) is the Fisher-criterion constraint term of the disease-free dictionary, expressed as (formula image), where m is the mean of all atoms in the disease-free dictionary D, M is the matrix formed from the atom mean m of the disease-free dictionary D, m̄ is the mean of all atoms in the diseased dictionary D̄, and α and β are the penalty coefficients of the intra-class and inter-class distances, with α > 0. The aim of the model is to minimize the 1st and 3rd terms while maximizing the 2nd term, so that the learned class-labeled dictionary reconstructs samples of its own class well, reconstructs samples of the other class poorly or not at all, and is strongly discriminative with respect to the other learned dictionary; in this way discriminative features are obtained and the samples can subsequently be classified better;
2-3: fix the disease-free dictionary D and update the sparse coding coefficients; the objective function is now (formula image). Let the training samples be stacked as (formula image) and the coding coefficient matrix as (formula image); L1 is the coding sparsity of the disease-free and diseased samples under the disease-free dictionary, and the optimal sparse solution is (formula image). The solution of the objective function is then completed by iterating two steps, the sparse representation of the disease-free training samples under the disease-free dictionary D and the sparse representation of the diseased training samples under the disease-free dictionary D, written in the unified simplified form (formula image). The sparse solutions (formula image) of the training samples under the disease-free dictionary D are obtained with the OMP algorithm in the SPAMS toolbox;
2-4: fix the sparse coding coefficients and update the disease-free dictionary D; the objective function is now (formula image), which simplifies to (formula image), where tr denotes the trace of a matrix and (formula image). This function is convex, and the optimal solution of the disease-free dictionary D is obtained by a coordinate gradient descent method.
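The objective functions of this step are reproduced only as images in the original; as a reading aid, the following display gives a plausible reconstruction of the disease-free dictionary learning model of step 2-2 from the surrounding definitions. The normalizations by N and N̄ and the exact form of Ψ(D) are assumptions, patterned on the DFDL model cited in the background; the diseased dictionary model of step three follows by exchanging the roles of the two classes.

\begin{aligned}
(D^{*},X^{*},\bar{X}^{*}) = \arg\min_{D,X,\bar{X}}\;
 &\frac{1}{N}\lVert Y - DX\rVert_F^{2}
 -\frac{\rho}{\bar{N}}\lVert \bar{Y} - D\bar{X}\rVert_F^{2}
 +\Psi(D)\\
\text{s.t. } &\lVert x_{i}\rVert_{0}\le L_{1},\quad \lVert \bar{x}_{j}\rVert_{0}\le L_{1},\qquad
\Psi(D)=\alpha\lVert D - M\rVert_F^{2}-\beta\lVert m-\bar{m}\rVert_{2}^{2}
\end{aligned}

Under this reading, the first term keeps D reconstructive for the disease-free samples, the second term penalizes good reconstruction of the diseased samples (the term that is maximized), and Ψ(D) draws the atoms of D toward their mean m (intra-class compactness, weight α) while pushing m away from the atom mean m̄ of the diseased dictionary D̄ (inter-class separation, weight β).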
Step three, optimizing and learning the diseased dictionary: establish the diseased dictionary learning model from the diseased and disease-free training samples, and learn the diseased dictionary by minimizing the objective function with a two-step alternating iterative optimization. The specific steps are as follows:
3-1: randomly select n column vectors from the disease-free and the diseased training samples, respectively, as the initialized disease-free dictionary D and diseased dictionary D̄;
3-2: establish the diseased dictionary learning model (formula image), where Y and Ȳ denote the disease-free and diseased training samples; X and X̄ denote the sparse representation coefficients of the disease-free and diseased training samples; N and N̄ denote the numbers of feature vectors of the disease-free and diseased images; L2 is the coding sparsity of the disease-free and diseased samples under the diseased dictionary; ρ is a regularization parameter with ρ > 0. In the formula, (formula image) is the sparse reconstruction error of the diseased samples under the diseased dictionary, (formula image) is the reconstruction error of the disease-free samples under the diseased dictionary, and (formula image) is the Fisher-criterion constraint term of the diseased dictionary, expressed as (formula image), where m is the mean of all atoms in the disease-free dictionary D, m̄ is the mean of all atoms in the diseased dictionary D̄, and M is the matrix formed from the atom mean m̄ of the diseased dictionary D̄. The aim of the model is to minimize the 1st and 3rd terms while maximizing the 2nd term, so that the learned class-labeled dictionary reconstructs samples of its own class well, reconstructs samples of the other class poorly or not at all, and is strongly discriminative with respect to the other learned dictionary; in this way discriminative features are obtained and the samples can subsequently be classified better;
3-3: fix the diseased dictionary D̄ and update the sparse coding coefficients; the objective function is now (formula image). Let the training samples be stacked as (formula image) and the coding coefficient matrix as (formula image); L2 is the coding sparsity of the disease-free and diseased samples under the diseased dictionary, and the optimal sparse solution is (formula image). The solution of the objective function is divided into two iterative steps, the sparse representation of the disease-free training samples under the diseased dictionary D̄ and the sparse representation of the diseased training samples under the diseased dictionary D̄, written in the unified simplified form (formula image). The sparse solutions (formula image) of the training samples under the diseased dictionary D̄ are obtained with the OMP algorithm in the SPAMS toolbox;
3-4: fix the sparse coding coefficients and update the diseased dictionary D̄; the objective function is now (formula image), which simplifies to (formula image), where (formula image). The optimal solution of the diseased dictionary D̄ is obtained by a coordinate gradient descent method;
3-5: return to step two and alternate the optimization of the disease-free dictionary and the optimization of the diseased dictionary until the maximum number of iterations is reached, then stop.
Step four, judge whether the maximum number of iterations has been reached; if so, go to step five, otherwise return to step two.
Step five, obtaining the reconstruction error vectors of the test samples: perform sparse representation of the test samples with the learned disease-free and diseased dictionaries, then calculate the sparse reconstruction error vectors of the test samples under the disease-free and the diseased dictionary respectively. The specific steps are as follows:
5-1: partition the test image into blocks, regard each block as a column vector, and randomly select 250 blocks to form the matrix H as the test sample; using (formula image), solve the sparse coding (formula image) of the test sample H under the class-labeled dictionaries (formula image);
5-2: calculate the sparse reconstruction error vectors of the test sample under the disease-free dictionary D and the diseased dictionary D̄, namely diag((H − DX)(H − DX)^T) and (formula image), where diag(·) denotes the elements on the main diagonal of a matrix.
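A minimal Python sketch of step 5-2 follows; it assumes the test matrix H and the sparse codes under the two dictionaries have already been computed with the same OMP routine, writes D̄ as Db and its codes as Xb, follows the diag((H − DX)(H − DX)^T) form given above, and uses e1 and e2 as illustrative names for the two error vectors.

import numpy as np

def reconstruction_error_vector(H, D, X):
    # diag((H - D X)(H - D X)^T), computed without forming the full matrix:
    # each entry accumulates the squared reconstruction error along one row of the residual.
    R = H - D @ X
    return np.sum(R * R, axis=1)

# e1 = reconstruction_error_vector(H, D,  X)    # errors under the disease-free dictionary
# e2 = reconstruction_error_vector(H, Db, Xb)   # errors under the diseased dictionary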
Step six, obtaining the classification result of the test samples: obtain a classification statistic from the sparse reconstruction error vectors, then determine the class of the test sample by comparing the classification statistic with a threshold. The specific steps are as follows:
6-1: define a vector C (formula image), where Nt is the number of test samples;
6-2: obtain a classification statistic S from the vector C: (formula image);
when the classification statistic S is greater than or equal to the threshold Th, the test sample is a disease-free sample; otherwise, when the classification statistic S is smaller than the threshold Th, the test sample is a diseased sample.
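Because the exact expressions for the vector C and the statistic S appear only as images in the original, the following sketch is one plausible instantiation of step six rather than the patent's definition: each entry votes for the dictionary with the smaller reconstruction error, C collects the votes, and S is the fraction of disease-free votes compared against the threshold Th.

import numpy as np

def classify(e1, e2, Th=0.5):
    # Hypothetical decision rule (the patent's exact C and S are not reproduced here):
    # C marks where the disease-free dictionary reconstructs better, S is the mean of C,
    # and the test image is declared disease-free when S >= Th.
    C = (np.asarray(e1) < np.asarray(e2)).astype(float)
    S = C.mean()
    return ("disease-free" if S >= Th else "diseased"), S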
Table 1 compares the classification results of the present invention and other methods on lung images from the ADL database.
TABLE 1
(table reproduced as an image in the original)
Table 2 compares the classification results of the present invention and other methods on spleen images from the ADL database.
TABLE 2
(table reproduced as an image in the original)
Table 3 compares the classification results of the present invention and other methods on kidney images from the ADL database.
TABLE 3
(table reproduced as an image in the original)
As can be seen from Tables 1, 2 and 3, the diagnostic performance of the proposed model on diseases of the three organs is clearly better than that of the other methods, and the correct classification rates of both the disease-free and the diseased samples are improved. The improvement is most evident for the lung results in Table 1, where the classification accuracy of the method is 2 to 3% higher than that of DFDL. As can be seen from Fig. 2, the disease-free lung images contain large alveoli, whereas the diseased lung images contain small alveoli filled with bluish-purple inflammatory cells and show more complex texture, so the difference between disease-free and diseased lung images is markedly larger than for the spleen and kidney images. The disease-free and diseased spleen images are highly similar in texture and structure but differ considerably in color, so their discriminability and classification performance are second best; the disease-free and diseased kidney images are highly similar in both texture and structure and in color, so their discriminability is the worst and their classification performance the weakest. The experimental results in the tables correspond exactly to Fig. 2, again illustrating the effectiveness of the proposed model.
To verify the universality of the discriminative feature learning framework for histopathology images constructed by the invention, the proposed model is further applied to the diagnosis of disease types in the BreaKHis data set.
Table 4 compares the classification results of the present invention and other methods on the BreaKHis database.
TABLE 4
(table reproduced as an image in the original)
The classification results of the different methods on the BreaKHis database are shown in Table 4. The experimental results show that the proposed model classifies the two types of breast histopathology images in Fig. 3 better than the other methods, which indicates that the model effectively improves the reconstructability and robustness of the sparse representation of same-class samples under the class-labeled dictionaries while also addressing the poor discrimination of samples from other classes.

Claims (4)

1. A tissue pathology image recognition method, comprising the steps of:
firstly, selecting a number of image blocks from the disease-free and diseased images of a given tissue as disease-free and diseased training samples and as disease-free and diseased test samples;
step two, optimizing and learning the disease-free dictionary: establishing a disease-free dictionary learning model from the disease-free and diseased training samples, and learning the disease-free dictionary by minimizing the objective function with a two-step alternating iterative optimization;
the second step comprises the following specific steps
2-1: respectively randomly selecting n column vectors from the training samples without diseases and with diseases as initialized dictionary D without diseases and dictionary with diseases
Figure FDA0002551522900000011
2-2: establishing a disease-free dictionary learning model, wherein the model is as follows:
Figure FDA0002551522900000012
wherein argmin represents a variable value at which the objective function is minimized, Y,
Figure FDA0002551522900000013
Respectively represent the training samples of no disease and disease, X,
Figure FDA0002551522900000019
Sparse representation coefficients representing the training samples of disease-free and disease respectively, N and N representing the number of feature vectors of disease-free and disease images respectively, L1The encoding sparsity of the disease-free samples and the disease-containing samples under the disease-free dictionary, rho is a regularization parameter, and rho>0; in the formula
Figure FDA0002551522900000014
Representing the sparse reconstruction error of the disease-free dictionary and the disease-free training sample,
Figure FDA0002551522900000015
representing the reconstruction error of the disease-free dictionary and the disease-containing training sample, wherein F represents a norm, psi (D) is a Fisher criterion constraint term of the disease-free dictionary, and the expression is as follows:
Figure FDA0002551522900000016
wherein M is the mean value of all atoms in the disease-free dictionary D, M is a matrix formed by the mean values M of the atoms in the disease-free dictionary D,
Figure FDA0002551522900000017
for having a fault dictionary
Figure FDA0002551522900000018
The mean values of all atoms in (α), (β) represent the penalty coefficients of the intra-class spacing and the inter-class spacing, α>0;
2-3: fixing the disease-free dictionary D, and updating the sparse coding coefficient, wherein the objective function at the moment is as follows:
Figure FDA0002551522900000021
order training sample
Figure FDA0002551522900000022
Coding coefficient matrix
Figure FDA0002551522900000023
L1The coding sparsity of the disease-free samples and the disease-containing samples under the disease-free dictionary is optimally solved as
Figure FDA0002551522900000024
Then, the solution of the objective function is completed by two steps of iteration of the sparse representation of the disease-free training sample in the disease-free dictionary D and the sparse representation of the disease-free training sample in the disease-free dictionary D, and the unified simplification is as follows:
Figure FDA0002551522900000025
respectively solving sparse solutions of training samples in the disease-free dictionary D by utilizing OMP algorithm in SPAMS toolbox
Figure FDA0002551522900000026
2-4: fixing the sparse coding coefficient, and updating the disease-free dictionary D, wherein the objective function at the moment is as follows:
Figure FDA0002551522900000027
through simplification, the method comprises the following steps:
Figure FDA0002551522900000028
where tr denotes the trace of the matrix
Figure FDA0002551522900000029
Solving an optimal solution of the disease-free dictionary D by adopting a coordinate gradient descent method; step three, optimizing and learning the sick dictionary: establishing a diseased dictionary learning model by combining a diseased training sample and a disease-free training sample, and learning to obtain a diseased dictionary by minimizing a target function in a two-step alternate iteration optimization mode;
the third step comprises the following specific steps:
3-1: randomly selecting n column vectors from the disease-free and the diseased training samples, respectively, as the initialized disease-free dictionary D and diseased dictionary D̄;
3-2: establishing the diseased dictionary learning model (formula image), wherein Y and Ȳ denote the disease-free and diseased training samples; X and X̄ denote the sparse representation coefficients of the disease-free and diseased training samples; N and N̄ denote the numbers of feature vectors of the disease-free and diseased images; L2 is the coding sparsity of the disease-free and diseased samples under the diseased dictionary; ρ is a regularization parameter with ρ > 0; in the formula, (formula image) is the sparse reconstruction error of the diseased samples under the diseased dictionary, (formula image) is the reconstruction error of the disease-free samples under the diseased dictionary, and (formula image) is the Fisher-criterion constraint term of the diseased dictionary, expressed as (formula image), wherein m is the mean of all atoms in the disease-free dictionary D, m̄ is the mean of all atoms in the diseased dictionary D̄, and M̄ is the matrix formed from the atom mean m̄ of the diseased dictionary D̄;
3-3: fixing the diseased dictionary D̄ and updating the sparse coding coefficients, the objective function now being (formula image); letting the training samples be stacked as (formula image) and the coding coefficient matrix as (formula image), L2 being the coding sparsity of the disease-free and diseased samples under the diseased dictionary and the optimal sparse solution being (formula image), the solution of the objective function is divided into two iterative steps, the sparse representation of the disease-free training samples under the diseased dictionary D̄ and the sparse representation of the diseased training samples under the diseased dictionary D̄, written in the unified simplified form (formula image); the sparse solutions (formula image) of the training samples under the diseased dictionary D̄ are obtained with the OMP algorithm in the SPAMS toolbox;
3-4: fixing the sparse coding coefficients and updating the diseased dictionary D̄, the objective function now being (formula image), which simplifies to (formula image), wherein (formula image); solving the optimal solution of the diseased dictionary D̄ by a coordinate gradient descent method;
step four, judging whether the maximum number of iterations has been reached; if so, proceeding to step five, otherwise returning to step two;
step five, obtaining the reconstruction error vectors of the test samples: performing sparse representation of the test samples with the learned disease-free and diseased dictionaries, then computing the sparse reconstruction error vectors of the test samples under the disease-free and the diseased dictionary respectively;
step six, obtaining the classification result of the test samples: obtaining a classification statistic from the sparse reconstruction error vectors, then determining the class of the test sample by comparing the classification statistic with a threshold.
2. The histopathological image recognition method according to claim 1, wherein: in the first step, the same number of image blocks are selected from the disease-free and the diseased images of a given tissue, each image block is then split into its R, G and B channels, the pixel values of the three channels are converted into column vectors and concatenated into one feature vector, and the feature vectors are finally juxtaposed as the disease-free and diseased training samples Y and Ȳ (formula image); the test samples are obtained in the same manner.
3. The histopathological image recognition method according to claim 2, wherein the fifth step specifically comprises:
5-1: partitioning the test image into blocks, regarding each block as a column vector, and randomly selecting u blocks to form the matrix H as the test sample; using (formula image), solving the sparse coding (formula image) of the test sample H under the class-labeled dictionaries (formula image);
5-2: calculating the sparse reconstruction error vectors of the test sample under the disease-free dictionary D and the diseased dictionary D̄, namely diag((H − DX)(H − DX)^T) and (formula image), where diag(·) denotes the elements on the main diagonal of a matrix.
4. The histopathological image recognition method according to claim 3, wherein the sixth step specifically comprises:
6-1: defining a vector C (formula image), where Nt is the number of test samples;
6-2: obtaining a classification statistic S from the vector C: (formula image);
when the classification statistic S is greater than or equal to the threshold Th, the test sample is a disease-free sample; otherwise, when the classification statistic S is smaller than the threshold Th, the test sample is a diseased sample.
CN201710059300.0A 2017-01-24 2017-01-24 Tissue pathology image identification method Active CN106845551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710059300.0A CN106845551B (en) 2017-01-24 2017-01-24 Tissue pathology image identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710059300.0A CN106845551B (en) 2017-01-24 2017-01-24 Tissue pathology image identification method

Publications (2)

Publication Number Publication Date
CN106845551A CN106845551A (en) 2017-06-13
CN106845551B true CN106845551B (en) 2020-08-11

Family

ID=59122438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710059300.0A Active CN106845551B (en) 2017-01-24 2017-01-24 Tissue pathology image identification method

Country Status (1)

Country Link
CN (1) CN106845551B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832786B (en) * 2017-10-31 2019-10-25 济南大学 A face recognition and classification method based on dictionary learning
CN109063766B (en) * 2018-07-31 2021-11-30 湘潭大学 Image classification method based on discriminant prediction sparse decomposition model
CN109308485B (en) * 2018-08-02 2022-11-29 中国矿业大学 Migrating sparse coding image classification method based on dictionary field adaptation
CN109376802B (en) * 2018-12-12 2021-08-03 浙江工业大学 Gastroscope organ classification method based on dictionary learning
CN111027594B (en) * 2019-11-18 2022-08-12 西北工业大学 Step-by-step anomaly detection method based on dictionary representation
CN113627556B (en) * 2021-08-18 2023-03-24 广东电网有限责任公司 Method and device for realizing image classification, electronic equipment and storage medium
CN113793319B (en) * 2021-09-13 2023-08-25 浙江理工大学 Fabric image flaw detection method and system based on category constraint dictionary learning model
CN114428873B (en) * 2022-04-07 2022-06-28 源利腾达(西安)科技有限公司 Thoracic surgery examination data sorting method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9946931B2 (en) * 2015-04-20 2018-04-17 Los Alamos National Security, Llc Change detection and change monitoring of natural and man-made features in multispectral and hyperspectral satellite imagery
CN104866810B (en) * 2015-04-10 2018-07-13 北京工业大学 A kind of face identification method of depth convolutional neural networks
CN105844223A (en) * 2016-03-18 2016-08-10 常州大学 Face expression algorithm combining class characteristic dictionary learning and shared dictionary learning

Also Published As

Publication number Publication date
CN106845551A (en) 2017-06-13

Similar Documents

Publication Publication Date Title
CN106845551B (en) Tissue pathology image identification method
CN110717354B (en) Super-pixel classification method based on semi-supervised K-SVD and multi-scale sparse representation
CN108509854B (en) Pedestrian re-identification method based on projection matrix constraint and discriminative dictionary learning
CN102609681B (en) Face recognition method based on dictionary learning models
Xie et al. Texture classification via patch-based sparse texton learning
CN109447123B (en) Pedestrian re-identification method based on label consistency constraint and stretching regularization dictionary learning
CN109753950B (en) Dynamic facial expression recognition method
CN109815357B (en) Remote sensing image retrieval method based on nonlinear dimension reduction and sparse representation
CN112836671B (en) Data dimension reduction method based on maximized ratio and linear discriminant analysis
CN108509925B (en) Pedestrian re-identification method based on visual bag-of-words model
Ensafi et al. A bag of words based approach for classification of HEp-2 cell images
CN111652273B (en) Deep learning-based RGB-D image classification method
CN104077742B (en) Human face sketch synthetic method and system based on Gabor characteristic
CN110796022B (en) Low-resolution face recognition method based on multi-manifold coupling mapping
CN105608478A (en) Combined method and system for extracting and classifying features of images
CN106529586A (en) Image classification method based on supplemented text characteristic
Zheng et al. Probability fusion decision framework of multiple deep neural networks for fine-grained visual classification
CN108460400A (en) A kind of hyperspectral image classification method of combination various features information
Suo et al. Structured dictionary learning for classification
CN111695455B (en) Low-resolution face recognition method based on coupling discrimination manifold alignment
CN105868711B (en) Sparse low-rank-based human behavior identification method
CN113793319A (en) Fabric image flaw detection method and system based on class constraint dictionary learning model
CN110097499B (en) Single-frame image super-resolution reconstruction method based on spectrum mixing kernel Gaussian process regression
Li et al. Supervised learning on local tangent space
CN110543845B (en) Face cascade regression model training method and reconstruction method for three-dimensional face

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant