CN106845551B - Tissue pathology image identification method - Google Patents

Tissue pathology image identification method Download PDF

Info

Publication number
CN106845551B
CN106845551B (application CN201710059300.0A)
Authority
CN
China
Prior art keywords
disease
dictionary
free
samples
sparse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710059300.0A
Other languages
Chinese (zh)
Other versions
CN106845551A (en)
Inventor
汤红忠
李骁
王翔
毛丽珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangtan University
Original Assignee
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangtan University filed Critical Xiangtan University
Priority to CN201710059300.0A
Publication of CN106845551A
Application granted
Publication of CN106845551B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24: Classification techniques
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/03: Recognition of patterns in medical or anatomical images

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a histopathology image identification method comprising the following steps: selecting disease-free and diseased training samples and disease-free and diseased test samples; establishing a disease-free dictionary learning model and a diseased dictionary learning model from the disease-free and diseased training samples, alternately and iteratively optimizing the two objective functions until the maximum number of iterations is reached, and learning a disease-free dictionary and a diseased dictionary; sparsely representing the test samples with the disease-free and diseased dictionaries and computing the sparse reconstruction error vectors of the test samples under each dictionary; and deriving a classification statistic from the sparse reconstruction error vectors and determining the class of each test sample by comparing the statistic with a threshold. The invention provides a new model and method for applying dictionary learning to histopathology image classification; the learned class-labeled dictionaries give better sparse reconstruction and intra-class robustness for samples of the same class and better inter-class discrimination for samples of other classes.

Description

Tissue pathology image identification method
Technical Field
The invention relates to a tissue pathology image identification method.
Background
With the development of computer-aided diagnosis, research on "digital pathology" has gradually attracted researchers' attention. How to accurately and automatically extract the discriminative features hidden in images and provide the information needed for subsequent histopathology image analysis or classification, so that disease grades and classes can be given quickly and accurately, has become one of the most challenging research topics in digital pathology.
Traditional feature extraction methods fall mainly into two categories. The first is based on domain- or task-specific features, such as the size and morphology of cells and the grayscale, color or texture information of the image; the second is based mainly on spatial structure and multi-scale features, such as morphological features, graph-based methods, scale-invariant features and wavelet features. Most of these are pixel-level or hand-crafted features: they are generally suited only to specific data, have a limited range of application, and tend to be highly redundant and weakly discriminative.
In recent years, sparse representation has received considerable attention owing to its strong performance on many computer vision problems. Its basic idea is to represent an original signal as a sparse combination of atoms from an overcomplete dictionary. Sparse representation has been highly successful in image denoising and restoration, face recognition, image classification and related fields. As the technique has developed, how to learn a dictionary suited to a specific problem (for example, image classification) has become a focus of research, giving rise to the theoretical framework of dictionary learning.
The key to dictionary learning is whether the constructed dictionary is both reconstructive and discriminative. To this end, Zhang et al. proposed the discriminative K-SVD (DK-SVD) dictionary learning method; Jiang et al. proposed a dictionary learning method based on label-consistent K-SVD (LC-KSVD); Yang et al. proposed the Fisher discrimination dictionary learning (FDDL) method, which improves the discriminability of the dictionary indirectly by constraining the sparse representation coefficients; and Vu et al. proposed Discriminative Feature-oriented Dictionary Learning (DFDL) and applied it to histopathology image classification. These methods achieve good results in image classification.
However, different types of histopathology images present different characteristics: cell morphology and geometric structure vary considerably even within images of the same class, and pathological appearances are diverse. As a result, the feature differences between pathology image samples of the same class can exceed those between samples of different classes, so the diseased and disease-free dictionaries learned by the above methods remain highly similar, the discriminability between disease-free and diseased samples is still low, and their classification performance still needs to be improved.
Disclosure of Invention
In order to solve the technical problems, the invention provides a histopathology image identification method with high accuracy and high robustness.
The technical scheme for solving the above problems is as follows: a tissue pathology image recognition method comprising the following steps:
step one, selecting a number of image blocks from the disease-free and diseased images of a given tissue as disease-free and diseased training samples and as disease-free and diseased test samples;
step two, optimizing and learning the disease-free dictionary: establishing a disease-free dictionary learning model from the disease-free and diseased training samples, and learning the disease-free dictionary by minimizing the objective function with a two-step alternating iterative optimization;
step three, optimizing and learning the diseased dictionary: establishing a diseased dictionary learning model from the diseased and disease-free training samples, and learning the diseased dictionary by minimizing the objective function with a two-step alternating iterative optimization;
step four, judging whether the maximum number of iterations has been reached; if so, proceeding to step five, otherwise returning to step two;
step five, obtaining the reconstruction error vectors of the test samples: performing sparse representation of the test samples with the learned disease-free and diseased dictionaries, then computing the sparse reconstruction error vectors of the test samples under the disease-free and the diseased dictionary respectively;
step six, obtaining the classification result of the test samples: obtaining a classification statistic from the sparse reconstruction error vectors, then determining the class of the test sample by comparing the classification statistic with a threshold.
In the above histopathology image identification method, the first step specifically comprises: selecting the same number of image blocks from the disease-free and the diseased images of a given tissue; splitting each image block into its R, G and B channels; converting the pixel values of the three channels into column vectors and concatenating them into one feature vector; and finally juxtaposing the feature vectors column by column as the disease-free and diseased training samples Y and Ȳ (formula image). The test samples are obtained in the same manner.
In the above histopathology image identification method, the second step specifically comprises:
2-1: randomly select n column vectors from the disease-free and the diseased training samples, respectively, as the initialized disease-free dictionary D and diseased dictionary D̄;
2-2: establish the disease-free dictionary learning model (formula image), where argmin denotes the value of the variables that minimizes the objective function; Y and Ȳ denote the disease-free and diseased training samples; X and X̄ denote the sparse representation coefficients of the disease-free and diseased training samples; N and N̄ denote the numbers of feature vectors of the disease-free and diseased images; L1 is the coding sparsity of the disease-free and diseased samples under the disease-free dictionary; ρ is a regularization parameter with ρ > 0. In the formula, (formula image) is the sparse reconstruction error of the disease-free training samples under the disease-free dictionary, (formula image) is the reconstruction error of the diseased training samples under the disease-free dictionary, the subscript F denotes the Frobenius norm, and Ψ(D) is the Fisher-criterion constraint term of the disease-free dictionary, expressed as (formula image), where m is the mean of all atoms in the disease-free dictionary D, M is the matrix formed from the atom mean m of the disease-free dictionary D, m̄ is the mean of all atoms in the diseased dictionary D̄, and α and β are the penalty coefficients of the intra-class and inter-class distances, with α > 0;
2-3: fix the disease-free dictionary D and update the sparse coding coefficients; the objective function is now (formula image). Let the training samples be stacked as (formula image) and the coding coefficient matrix as (formula image); L1 is the coding sparsity of the disease-free and diseased samples under the disease-free dictionary, and the optimal sparse solution is (formula image). The solution of the objective function is then completed by iterating two steps, the sparse representation of the disease-free training samples under the disease-free dictionary D and the sparse representation of the diseased training samples under the disease-free dictionary D, written in the unified simplified form (formula image). The sparse solutions (formula image) of the training samples under the disease-free dictionary D are obtained with the OMP algorithm in the SPAMS toolbox;
2-4: fix the sparse coding coefficients and update the disease-free dictionary D; the objective function is now (formula image), which simplifies to (formula image), where tr denotes the trace of a matrix and (formula image). The optimal solution of the disease-free dictionary D is obtained by a coordinate gradient descent method.
In the above tissue pathology image identification method, the third step specifically comprises:
3-1: randomly select n column vectors from the disease-free and the diseased training samples, respectively, as the initialized disease-free dictionary D and diseased dictionary D̄;
3-2: establish the diseased dictionary learning model (formula image), where Y and Ȳ denote the disease-free and diseased training samples; X and X̄ denote the sparse representation coefficients of the disease-free and diseased training samples; N and N̄ denote the numbers of feature vectors of the disease-free and diseased images; L2 is the coding sparsity of the disease-free and diseased samples under the diseased dictionary; ρ is a regularization parameter with ρ > 0. In the formula, (formula image) is the sparse reconstruction error of the diseased samples under the diseased dictionary, (formula image) is the reconstruction error of the disease-free samples under the diseased dictionary, and (formula image) is the Fisher-criterion constraint term of the diseased dictionary, expressed as (formula image), where m is the mean of all atoms in the disease-free dictionary D, m̄ is the mean of all atoms in the diseased dictionary D̄, and M is the matrix formed from the atom mean m̄ of the diseased dictionary D̄;
3-3: fix the diseased dictionary D̄ and update the sparse coding coefficients; the objective function is now (formula image). Let the training samples be stacked as (formula image) and the coding coefficient matrix as (formula image); L2 is the coding sparsity of the disease-free and diseased samples under the diseased dictionary, and the optimal sparse solution is (formula image). The solution of the objective function is divided into two iterative steps, the sparse representation of the disease-free training samples under the diseased dictionary D̄ and the sparse representation of the diseased training samples under the diseased dictionary D̄, written in the unified simplified form (formula image). The sparse solutions (formula image) of the training samples under the diseased dictionary D̄ are obtained with the OMP algorithm in the SPAMS toolbox;
3-4: fix the sparse coding coefficients and update the diseased dictionary D̄; the objective function is now (formula image), which simplifies to (formula image), where (formula image). The optimal solution of the diseased dictionary D̄ is obtained by a coordinate gradient descent method.
In the above histopathology image identification method, the fifth step specifically comprises:
5-1: partition the test image into blocks, regard each block as a column vector, and randomly select u blocks to form the matrix H as the test sample; using (formula image), solve the sparse coding (formula image) of the test sample H under the class-labeled dictionaries (formula image);
5-2: calculate the sparse reconstruction error vectors of the test sample under the disease-free dictionary D and the diseased dictionary D̄, namely diag((H − DX)(H − DX)^T) and (formula image), where diag(·) denotes the elements on the main diagonal of a matrix.
In the above tissue pathology image identification method, the sixth step specifically comprises:
6-1: define a vector C (formula image), where Nt is the number of test samples;
6-2: obtain a classification statistic S from the vector C: (formula image);
when the classification statistic S is greater than or equal to the threshold Th, the test sample is a disease-free sample; otherwise, when the classification statistic S is smaller than the threshold Th, the test sample is a diseased sample.
The beneficial effects of the invention are as follows. The method first randomly selects a number of image blocks from a histopathology image data set as training and test samples; it then feeds the training samples of the different classes into the models, solves them by alternating iteration so that the objective functions are continuously optimized, and learns the class-labeled dictionaries; finally, it sparsely represents the test matrix on the learned class-labeled dictionaries and determines its class by comparing the reconstruction error vectors with a threshold. The invention provides a new model and method for applying dictionary learning to histopathology image classification; the learned class-labeled dictionaries give better sparse reconstruction and intra-class robustness for samples of the same class and better inter-class discrimination for samples of other classes, thereby effectively improving the classification performance on histopathology images.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of histopathology of lung, spleen and kidney in the ADL database, wherein (a) from left to right are non-diseased images of lung, spleen and kidney, respectively, and (b) from left to right are diseased images of lung, spleen and kidney, respectively.
FIG. 3 is a schematic diagram of the histopathology of adenosis and leaf cancer in the BreaKHis database, wherein (a) is the histopathology image of adenosis, and (b) is the histopathology image of leaf cancer.
Detailed Description
The invention is further described below with reference to the figures and examples.
As shown in fig. 1, the present invention comprises the steps of:
the method comprises the following steps: a plurality of image blocks are respectively selected from two images of a certain tissue, namely a disease-free image and a disease-suffering image, and used as disease-free training samples and disease-suffering training samples. The method comprises the following specific steps:
respectively randomly selecting 40 images from two images of a certain tissue without diseases and with diseases, randomly extracting 250 image blocks from each image, wherein the block size is 20 × 20, 10000 color image blocks are counted, then dividing each color image block into three channels of RGB, converting pixel values of the three channels into column vectors and then connecting in series to obtain a feature vector, and finally connecting the feature vectors in parallel to be used as training samples, then Y,
Figure BDA0001218063270000081
R1200×10000the size of the matrix is shown, and 110 images of the disease-free image are randomly selected from the rest certain tissue images to be used as test sets.
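As a reading aid, the following Python sketch illustrates this sampling step, assuming the disease-free and diseased images are already loaded as H×W×3 uint8 arrays; the function name, the NumPy-based implementation and the random-number handling are illustrative assumptions and not part of the patent.

import numpy as np

def build_training_matrix(images, n_blocks_per_image=250, block=20, seed=0):
    # Randomly crop n_blocks_per_image color blocks of size block x block from each image,
    # split each block into its R, G, B channels, flatten each channel to a column vector,
    # concatenate the three channels into one 3*block*block feature vector,
    # and juxtapose all feature vectors as the columns of the training matrix.
    rng = np.random.default_rng(seed)
    columns = []
    for img in images:                      # e.g. 40 disease-free or 40 diseased images
        h, w, _ = img.shape
        for _ in range(n_blocks_per_image):
            y = rng.integers(0, h - block + 1)
            x = rng.integers(0, w - block + 1)
            patch = img[y:y + block, x:x + block, :]     # block x block x 3
            feature = np.concatenate([patch[:, :, c].reshape(-1) for c in range(3)])
            columns.append(feature.astype(np.float64))
    return np.stack(columns, axis=1)        # e.g. 1200 x 10000 for 40 images x 250 blocks

# Y  = build_training_matrix(disease_free_images)   # disease-free training samples
# Yb = build_training_matrix(diseased_images)       # diseased training samples (Ȳ)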
Step two, optimizing and learning the disease-free dictionary: establish the disease-free dictionary learning model from the disease-free and diseased training samples, and learn the disease-free dictionary by minimizing the objective function with a two-step alternating iterative optimization. The specific steps are as follows:
2-1: randomly select n column vectors from the disease-free and the diseased training samples, respectively, as the initialized disease-free dictionary D and diseased dictionary D̄;
2-2: establish the disease-free dictionary learning model (formula image), where argmin denotes the value of the variables that minimizes the objective function; Y and Ȳ denote the disease-free and diseased training samples; X and X̄ denote the sparse representation coefficients of the disease-free and diseased training samples; N and N̄ denote the numbers of feature vectors of the disease-free and diseased images; L1 is the coding sparsity of the disease-free and diseased samples under the disease-free dictionary; ρ is a regularization parameter with ρ > 0. In the formula, (formula image) is the sparse reconstruction error of the disease-free training samples under the disease-free dictionary, (formula image) is the reconstruction error of the diseased training samples under the disease-free dictionary, the subscript F denotes the Frobenius norm, and Ψ(D) is the Fisher-criterion constraint term of the disease-free dictionary, expressed as (formula image), where m is the mean of all atoms in the disease-free dictionary D, M is the matrix formed from the atom mean m of the disease-free dictionary D, m̄ is the mean of all atoms in the diseased dictionary D̄, and α and β are the penalty coefficients of the intra-class and inter-class distances, with α > 0. The aim of the model is to minimize the 1st and 3rd terms while maximizing the 2nd term, so that the learned class-labeled dictionary reconstructs samples of its own class well, reconstructs samples of the other class poorly or not at all, and is strongly discriminative with respect to the other learned dictionary; in this way discriminative features are obtained and the samples can subsequently be classified better;
2-3: fix the disease-free dictionary D and update the sparse coding coefficients; the objective function is now (formula image). Let the training samples be stacked as (formula image) and the coding coefficient matrix as (formula image); L1 is the coding sparsity of the disease-free and diseased samples under the disease-free dictionary, and the optimal sparse solution is (formula image). The solution of the objective function is then completed by iterating two steps, the sparse representation of the disease-free training samples under the disease-free dictionary D and the sparse representation of the diseased training samples under the disease-free dictionary D, written in the unified simplified form (formula image). The sparse solutions (formula image) of the training samples under the disease-free dictionary D are obtained with the OMP algorithm in the SPAMS toolbox;
2-4: fix the sparse coding coefficients and update the disease-free dictionary D; the objective function is now (formula image), which simplifies to (formula image), where tr denotes the trace of a matrix and (formula image). This function is convex, and the optimal solution of the disease-free dictionary D is obtained by a coordinate gradient descent method.
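The objective functions of this step are reproduced only as images in the original; as a reading aid, the following display gives a plausible reconstruction of the disease-free dictionary learning model of step 2-2 from the surrounding definitions. The normalizations by N and N̄ and the exact form of Ψ(D) are assumptions, patterned on the DFDL model cited in the background; the diseased dictionary model of step three follows by exchanging the roles of the two classes.

\begin{aligned}
(D^{*},X^{*},\bar{X}^{*}) = \arg\min_{D,X,\bar{X}}\;
 &\frac{1}{N}\lVert Y - DX\rVert_F^{2}
 -\frac{\rho}{\bar{N}}\lVert \bar{Y} - D\bar{X}\rVert_F^{2}
 +\Psi(D)\\
\text{s.t. } &\lVert x_{i}\rVert_{0}\le L_{1},\quad \lVert \bar{x}_{j}\rVert_{0}\le L_{1},\qquad
\Psi(D)=\alpha\lVert D - M\rVert_F^{2}-\beta\lVert m-\bar{m}\rVert_{2}^{2}
\end{aligned}

Under this reading, the first term keeps D reconstructive for the disease-free samples, the second term penalizes good reconstruction of the diseased samples (the term that is maximized), and Ψ(D) draws the atoms of D toward their mean m (intra-class compactness, weight α) while pushing m away from the atom mean m̄ of the diseased dictionary D̄ (inter-class separation, weight β).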
Step three, optimizing and learning the diseased dictionary: establish the diseased dictionary learning model from the diseased and disease-free training samples, and learn the diseased dictionary by minimizing the objective function with a two-step alternating iterative optimization. The specific steps are as follows:
3-1: randomly select n column vectors from the disease-free and the diseased training samples, respectively, as the initialized disease-free dictionary D and diseased dictionary D̄;
3-2: establish the diseased dictionary learning model (formula image), where Y and Ȳ denote the disease-free and diseased training samples; X and X̄ denote the sparse representation coefficients of the disease-free and diseased training samples; N and N̄ denote the numbers of feature vectors of the disease-free and diseased images; L2 is the coding sparsity of the disease-free and diseased samples under the diseased dictionary; ρ is a regularization parameter with ρ > 0. In the formula, (formula image) is the sparse reconstruction error of the diseased samples under the diseased dictionary, (formula image) is the reconstruction error of the disease-free samples under the diseased dictionary, and (formula image) is the Fisher-criterion constraint term of the diseased dictionary, expressed as (formula image), where m is the mean of all atoms in the disease-free dictionary D, m̄ is the mean of all atoms in the diseased dictionary D̄, and M is the matrix formed from the atom mean m̄ of the diseased dictionary D̄. The aim of the model is to minimize the 1st and 3rd terms while maximizing the 2nd term, so that the learned class-labeled dictionary reconstructs samples of its own class well, reconstructs samples of the other class poorly or not at all, and is strongly discriminative with respect to the other learned dictionary; in this way discriminative features are obtained and the samples can subsequently be classified better;
3-3: fix the diseased dictionary D̄ and update the sparse coding coefficients; the objective function is now (formula image). Let the training samples be stacked as (formula image) and the coding coefficient matrix as (formula image); L2 is the coding sparsity of the disease-free and diseased samples under the diseased dictionary, and the optimal sparse solution is (formula image). The solution of the objective function is divided into two iterative steps, the sparse representation of the disease-free training samples under the diseased dictionary D̄ and the sparse representation of the diseased training samples under the diseased dictionary D̄, written in the unified simplified form (formula image). The sparse solutions (formula image) of the training samples under the diseased dictionary D̄ are obtained with the OMP algorithm in the SPAMS toolbox;
3-4: fix the sparse coding coefficients and update the diseased dictionary D̄; the objective function is now (formula image), which simplifies to (formula image), where (formula image). The optimal solution of the diseased dictionary D̄ is obtained by a coordinate gradient descent method;
3-5: return to step two and alternate the optimization of the disease-free dictionary and the optimization of the diseased dictionary until the maximum number of iterations is reached, then stop.
Step four, judge whether the maximum number of iterations has been reached; if so, go to step five, otherwise return to step two.
Step five, obtaining the reconstruction error vectors of the test samples: perform sparse representation of the test samples with the learned disease-free and diseased dictionaries, then calculate the sparse reconstruction error vectors of the test samples under the disease-free and the diseased dictionary respectively. The specific steps are as follows:
5-1: partition the test image into blocks, regard each block as a column vector, and randomly select 250 blocks to form the matrix H as the test sample; using (formula image), solve the sparse coding (formula image) of the test sample H under the class-labeled dictionaries (formula image);
5-2: calculate the sparse reconstruction error vectors of the test sample under the disease-free dictionary D and the diseased dictionary D̄, namely diag((H − DX)(H − DX)^T) and (formula image), where diag(·) denotes the elements on the main diagonal of a matrix.
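A minimal Python sketch of step 5-2 follows; it assumes the test matrix H and the sparse codes under the two dictionaries have already been computed with the same OMP routine, writes D̄ as Db and its codes as Xb, follows the diag((H − DX)(H − DX)^T) form given above, and uses e1 and e2 as illustrative names for the two error vectors.

import numpy as np

def reconstruction_error_vector(H, D, X):
    # diag((H - D X)(H - D X)^T), computed without forming the full matrix:
    # each entry accumulates the squared reconstruction error along one row of the residual.
    R = H - D @ X
    return np.sum(R * R, axis=1)

# e1 = reconstruction_error_vector(H, D,  X)    # errors under the disease-free dictionary
# e2 = reconstruction_error_vector(H, Db, Xb)   # errors under the diseased dictionary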
Step six, obtaining the classification result of the test samples: obtain a classification statistic from the sparse reconstruction error vectors, then determine the class of the test sample by comparing the classification statistic with a threshold. The specific steps are as follows:
6-1: define a vector C (formula image), where Nt is the number of test samples;
6-2: obtain a classification statistic S from the vector C: (formula image);
when the classification statistic S is greater than or equal to the threshold Th, the test sample is a disease-free sample; otherwise, when the classification statistic S is smaller than the threshold Th, the test sample is a diseased sample.
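Because the exact expressions for the vector C and the statistic S appear only as images in the original, the following sketch is one plausible instantiation of step six rather than the patent's definition: each entry votes for the dictionary with the smaller reconstruction error, C collects the votes, and S is the fraction of disease-free votes compared against the threshold Th.

import numpy as np

def classify(e1, e2, Th=0.5):
    # Hypothetical decision rule (the patent's exact C and S are not reproduced here):
    # C marks where the disease-free dictionary reconstructs better, S is the mean of C,
    # and the test image is declared disease-free when S >= Th.
    C = (np.asarray(e1) < np.asarray(e2)).astype(float)
    S = C.mean()
    return ("disease-free" if S >= Th else "diseased"), S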
Table 1 compares the classification results of the present invention and other methods on lung images from the ADL database.
TABLE 1
(table reproduced as an image in the original)
Table 2 compares the classification results of the present invention and other methods on spleen images from the ADL database.
TABLE 2
(table reproduced as an image in the original)
Table 3 compares the classification results of the present invention and other methods on kidney images from the ADL database.
TABLE 3
(table reproduced as an image in the original)
As can be seen from Tables 1, 2 and 3, the diagnostic performance of the proposed model on diseases of the three organs is clearly better than that of the other methods, and the correct classification rates of both the disease-free and the diseased samples are improved. The improvement is most evident for the lung results in Table 1, where the classification accuracy of the method is 2 to 3% higher than that of DFDL. As can be seen from Fig. 2, the disease-free lung images contain large alveoli, whereas the diseased lung images contain small alveoli filled with bluish-purple inflammatory cells and show more complex texture, so the difference between disease-free and diseased lung images is markedly larger than for the spleen and kidney images. The disease-free and diseased spleen images are highly similar in texture and structure but differ considerably in color, so their discriminability and classification performance are second best; the disease-free and diseased kidney images are highly similar in both texture and structure and in color, so their discriminability is the worst and their classification performance the weakest. The experimental results in the tables correspond exactly to Fig. 2, again illustrating the effectiveness of the proposed model.
To verify the universality of the discriminative feature learning framework for histopathology images constructed by the invention, the proposed model is further applied to the diagnosis of disease types in the BreaKHis data set.
Table 4 compares the classification results of the present invention and other methods on the BreaKHis database.
TABLE 4
(table reproduced as an image in the original)
The classification results of the different methods on the BreaKHis database are shown in Table 4. The experimental results show that the proposed model classifies the two types of breast histopathology images in Fig. 3 better than the other methods, which indicates that the model effectively improves the reconstructability and robustness of the sparse representation of same-class samples under the class-labeled dictionaries while also addressing the poor discrimination of samples from other classes.

Claims (4)

1. A tissue pathology image recognition method, comprising the steps of:
firstly, selecting a number of image blocks from the disease-free and diseased images of a given tissue as disease-free and diseased training samples and as disease-free and diseased test samples;
step two, optimizing and learning the disease-free dictionary: establishing a disease-free dictionary learning model from the disease-free and diseased training samples, and learning the disease-free dictionary by minimizing the objective function with a two-step alternating iterative optimization;
the second step comprises the following specific steps
2-1: respectively randomly selecting n column vectors from the training samples without diseases and with diseases as initialized dictionary D without diseases and dictionary with diseases
Figure FDA0002551522900000011
2-2: establishing a disease-free dictionary learning model, wherein the model is as follows:
Figure FDA0002551522900000012
wherein argmin represents a variable value at which the objective function is minimized, Y,
Figure FDA0002551522900000013
Respectively represent the training samples of no disease and disease, X,
Figure FDA0002551522900000019
Sparse representation coefficients representing the training samples of disease-free and disease respectively, N and N representing the number of feature vectors of disease-free and disease images respectively, L1The encoding sparsity of the disease-free samples and the disease-containing samples under the disease-free dictionary, rho is a regularization parameter, and rho>0; in the formula
Figure FDA0002551522900000014
Representing the sparse reconstruction error of the disease-free dictionary and the disease-free training sample,
Figure FDA0002551522900000015
representing the reconstruction error of the disease-free dictionary and the disease-containing training sample, wherein F represents a norm, psi (D) is a Fisher criterion constraint term of the disease-free dictionary, and the expression is as follows:
Figure FDA0002551522900000016
wherein M is the mean value of all atoms in the disease-free dictionary D, M is a matrix formed by the mean values M of the atoms in the disease-free dictionary D,
Figure FDA0002551522900000017
for having a fault dictionary
Figure FDA0002551522900000018
The mean values of all atoms in (α), (β) represent the penalty coefficients of the intra-class spacing and the inter-class spacing, α>0;
2-3: fixing the disease-free dictionary D, and updating the sparse coding coefficient, wherein the objective function at the moment is as follows:
Figure FDA0002551522900000021
order training sample
Figure FDA0002551522900000022
Coding coefficient matrix
Figure FDA0002551522900000023
L1The coding sparsity of the disease-free samples and the disease-containing samples under the disease-free dictionary is optimally solved as
Figure FDA0002551522900000024
Then, the solution of the objective function is completed by two steps of iteration of the sparse representation of the disease-free training sample in the disease-free dictionary D and the sparse representation of the disease-free training sample in the disease-free dictionary D, and the unified simplification is as follows:
Figure FDA0002551522900000025
respectively solving sparse solutions of training samples in the disease-free dictionary D by utilizing OMP algorithm in SPAMS toolbox
Figure FDA0002551522900000026
2-4: fixing the sparse coding coefficient, and updating the disease-free dictionary D, wherein the objective function at the moment is as follows:
Figure FDA0002551522900000027
through simplification, the method comprises the following steps:
Figure FDA0002551522900000028
where tr denotes the trace of the matrix
Figure FDA0002551522900000029
Solving an optimal solution of the disease-free dictionary D by adopting a coordinate gradient descent method; step three, optimizing and learning the sick dictionary: establishing a diseased dictionary learning model by combining a diseased training sample and a disease-free training sample, and learning to obtain a diseased dictionary by minimizing a target function in a two-step alternate iteration optimization mode;
the third step comprises the following specific steps:
3-1: randomly selecting n column vectors from the disease-free and the diseased training samples, respectively, as the initialized disease-free dictionary D and diseased dictionary D̄;
3-2: establishing the diseased dictionary learning model (formula image), wherein Y and Ȳ denote the disease-free and diseased training samples; X and X̄ denote the sparse representation coefficients of the disease-free and diseased training samples; N and N̄ denote the numbers of feature vectors of the disease-free and diseased images; L2 is the coding sparsity of the disease-free and diseased samples under the diseased dictionary; ρ is a regularization parameter with ρ > 0; in the formula, (formula image) is the sparse reconstruction error of the diseased samples under the diseased dictionary, (formula image) is the reconstruction error of the disease-free samples under the diseased dictionary, and (formula image) is the Fisher-criterion constraint term of the diseased dictionary, expressed as (formula image), wherein m is the mean of all atoms in the disease-free dictionary D, m̄ is the mean of all atoms in the diseased dictionary D̄, and M̄ is the matrix formed from the atom mean m̄ of the diseased dictionary D̄;
3-3: fixing the diseased dictionary D̄ and updating the sparse coding coefficients, the objective function now being (formula image); letting the training samples be stacked as (formula image) and the coding coefficient matrix as (formula image), L2 being the coding sparsity of the disease-free and diseased samples under the diseased dictionary and the optimal sparse solution being (formula image), the solution of the objective function is divided into two iterative steps, the sparse representation of the disease-free training samples under the diseased dictionary D̄ and the sparse representation of the diseased training samples under the diseased dictionary D̄, written in the unified simplified form (formula image); the sparse solutions (formula image) of the training samples under the diseased dictionary D̄ are obtained with the OMP algorithm in the SPAMS toolbox;
3-4: fixing the sparse coding coefficients and updating the diseased dictionary D̄, the objective function now being (formula image), which simplifies to (formula image), wherein (formula image); solving the optimal solution of the diseased dictionary D̄ by a coordinate gradient descent method;
step four, judging whether the maximum number of iterations has been reached; if so, proceeding to step five, otherwise returning to step two;
step five, obtaining the reconstruction error vectors of the test samples: performing sparse representation of the test samples with the learned disease-free and diseased dictionaries, then computing the sparse reconstruction error vectors of the test samples under the disease-free and the diseased dictionary respectively;
step six, obtaining the classification result of the test samples: obtaining a classification statistic from the sparse reconstruction error vectors, then determining the class of the test sample by comparing the classification statistic with a threshold.
2. The histopathological image recognition method according to claim 1, wherein: in the first step, the same number of image blocks are selected from the disease-free and the diseased images of a given tissue, each image block is then split into its R, G and B channels, the pixel values of the three channels are converted into column vectors and concatenated into one feature vector, and the feature vectors are finally juxtaposed as the disease-free and diseased training samples Y and Ȳ (formula image); the test samples are obtained in the same manner.
3. The histopathological image recognition method according to claim 2, wherein the fifth step specifically comprises:
5-1: partitioning the test image into blocks, regarding each block as a column vector, and randomly selecting u blocks to form the matrix H as the test sample; using (formula image), solving the sparse coding (formula image) of the test sample H under the class-labeled dictionaries (formula image);
5-2: calculating the sparse reconstruction error vectors of the test sample under the disease-free dictionary D and the diseased dictionary D̄, namely diag((H − DX)(H − DX)^T) and (formula image), where diag(·) denotes the elements on the main diagonal of a matrix.
4. The histopathological image recognition method according to claim 3, wherein the sixth step specifically comprises:
6-1: defining a vector C (formula image), where Nt is the number of test samples;
6-2: obtaining a classification statistic S from the vector C: (formula image);
when the classification statistic S is greater than or equal to the threshold Th, the test sample is a disease-free sample; otherwise, when the classification statistic S is smaller than the threshold Th, the test sample is a diseased sample.
CN201710059300.0A 2017-01-24 2017-01-24 Tissue pathology image identification method Active CN106845551B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710059300.0A CN106845551B (en) 2017-01-24 2017-01-24 Tissue pathology image identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710059300.0A CN106845551B (en) 2017-01-24 2017-01-24 Tissue pathology image identification method

Publications (2)

Publication Number Publication Date
CN106845551A CN106845551A (en) 2017-06-13
CN106845551B true CN106845551B (en) 2020-08-11

Family

ID=59122438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710059300.0A Active CN106845551B (en) 2017-01-24 2017-01-24 Tissue pathology image identification method

Country Status (1)

Country Link
CN (1) CN106845551B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832786B (en) * 2017-10-31 2019-10-25 济南大学 A face recognition and classification method based on dictionary learning
CN109063766B (en) * 2018-07-31 2021-11-30 湘潭大学 Image classification method based on discriminant prediction sparse decomposition model
CN109308485B (en) * 2018-08-02 2022-11-29 中国矿业大学 Migrating sparse coding image classification method based on dictionary field adaptation
CN109376802B (en) * 2018-12-12 2021-08-03 浙江工业大学 Gastroscope organ classification method based on dictionary learning
CN111027594B (en) * 2019-11-18 2022-08-12 西北工业大学 Step-by-step anomaly detection method based on dictionary representation
CN113627556B (en) * 2021-08-18 2023-03-24 广东电网有限责任公司 Method and device for realizing image classification, electronic equipment and storage medium
CN113793319B (en) * 2021-09-13 2023-08-25 浙江理工大学 Fabric image flaw detection method and system based on category constraint dictionary learning model
CN114428873B (en) * 2022-04-07 2022-06-28 源利腾达(西安)科技有限公司 Thoracic surgery examination data sorting method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9946931B2 (en) * 2015-04-20 2018-04-17 Los Alamos National Security, Llc Change detection and change monitoring of natural and man-made features in multispectral and hyperspectral satellite imagery
CN104866810B (en) * 2015-04-10 2018-07-13 北京工业大学 A kind of face identification method of depth convolutional neural networks
CN105844223A (en) * 2016-03-18 2016-08-10 常州大学 Face expression algorithm combining class characteristic dictionary learning and shared dictionary learning

Also Published As

Publication number Publication date
CN106845551A (en) 2017-06-13

Similar Documents

Publication Publication Date Title
CN106845551B (en) Tissue pathology image identification method
CN110717354B (en) Super-pixel classification method based on semi-supervised K-SVD and multi-scale sparse representation
CN108509854B (en) Pedestrian re-identification method based on projection matrix constraint and discriminative dictionary learning
CN102609681B (en) Face recognition method based on dictionary learning models
Xie et al. Texture classification via patch-based sparse texton learning
CN109447123B (en) Pedestrian re-identification method based on label consistency constraint and stretching regularization dictionary learning
CN109753950B (en) Dynamic facial expression recognition method
CN109815357B (en) Remote sensing image retrieval method based on nonlinear dimension reduction and sparse representation
CN112836671B (en) Data dimension reduction method based on maximized ratio and linear discriminant analysis
CN108509925B (en) Pedestrian re-identification method based on visual bag-of-words model
Ensafi et al. A bag of words based approach for classification of HEp-2 cell images
CN111652273B (en) Deep learning-based RGB-D image classification method
CN104077742B (en) Human face sketch synthetic method and system based on Gabor characteristic
CN110796022B (en) Low-resolution face recognition method based on multi-manifold coupling mapping
CN105608478A (en) Combined method and system for extracting and classifying features of images
CN106529586A (en) Image classification method based on supplemented text characteristic
Zheng et al. Probability fusion decision framework of multiple deep neural networks for fine-grained visual classification
CN108460400A (en) A kind of hyperspectral image classification method of combination various features information
Suo et al. Structured dictionary learning for classification
CN111695455B (en) Low-resolution face recognition method based on coupling discrimination manifold alignment
CN105868711B (en) Sparse low-rank-based human behavior identification method
CN113793319A (en) Fabric image flaw detection method and system based on class constraint dictionary learning model
CN110097499B (en) Single-frame image super-resolution reconstruction method based on spectrum mixing kernel Gaussian process regression
Li et al. Supervised learning on local tangent space
CN110543845B (en) Face cascade regression model training method and reconstruction method for three-dimensional face

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant