WO2022117081A1 - Application of morphological feature of circulating tumor cell in clinical diagnosis and treatment of gastric cancer - Google Patents

Info

Publication number
WO2022117081A1
WO2022117081A1 PCT/CN2021/135426 CN2021135426W WO2022117081A1 WO 2022117081 A1 WO2022117081 A1 WO 2022117081A1 CN 2021135426 W CN2021135426 W CN 2021135426W WO 2022117081 A1 WO2022117081 A1 WO 2022117081A1
Authority
WO
WIPO (PCT)
Prior art keywords
ctc
gastric cancer
circulating tumor
ctcs
tumor cells
Prior art date
Application number
PCT/CN2021/135426
Other languages
French (fr)
Chinese (zh)
Inventor
孙益红
汪学非
金炜翔
唐兆庆
刘天舒
石红岩
方勇
彭海翔
温冬
Original Assignee
复旦大学附属中山医院
骏实生物科技(上海)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 复旦大学附属中山医院 and 骏实生物科技(上海)有限公司
Publication of WO2022117081A1

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/569Immunoassay; Biospecific binding assay; Materials therefor for microorganisms, e.g. protozoa, bacteria, viruses
    • G01N33/56966Animal cells
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/569Immunoassay; Biospecific binding assay; Materials therefor for microorganisms, e.g. protozoa, bacteria, viruses
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/574Immunoassay; Biospecific binding assay; Materials therefor for cancer
    • G01N33/57407Specifically defined cancers
    • G01N33/57446Specifically defined cancers of stomach or intestine
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00Detection or diagnosis of diseases
    • G01N2800/52Predicting or monitoring the response to treatment, e.g. for selection of therapy based on assay results in personalised medicine; Prognosis
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2800/00Detection or diagnosis of diseases
    • G01N2800/56Staging of a disease; Further complications associated with the disease

Definitions

  • The present invention relates to the use of morphological characteristics of circulating tumor cells in constructing a system for predicting or judging the gastric cancer condition of a subject.
  • The invention establishes a correlation between CTC morphology and cancer, especially gastric cancer, and enables the gastric cancer condition to be predicted or judged from the morphological classification of circulating tumor cells.
  • Circulating tumor cells (CTCs) is the general term for the various types of tumor cells present in peripheral blood. Typically, pre-existing tumor metastases or the release of circulating tumor cells are responsible for the formation of secondary tumors. Circulating tumor cells are shed from solid tumor lesions (primary tumors or metastases) either spontaneously or as a result of diagnostic and therapeutic procedures. Most circulating tumor cells undergo apoptosis or are phagocytosed after entering the peripheral blood; a few escape, anchor, and develop into metastases, which is the so-called metastasis of malignant tumors.
  • Invasion and metastasis of malignant tumors are the key steps in recurrence. They often lead to the failure of tumor treatment and increase the risk of death in patients with malignant tumors. Detecting circulating tumor cells soon after they enter the peripheral blood is therefore of great significance, because it allows medical personnel to take effective treatment measures at an early stage.
  • CTC testing can be performed on cancer patients to support early diagnosis and to confirm and guide the treatment of primary or secondary cancer growth. Separating and enriching CTCs requires only a small amount of peripheral blood drawn from the patient and has no side effects, so testing can be repeated at high frequency to monitor disease progression in real time. More importantly, CTCs can serve as a real-time sample for analyzing the biological characteristics of a patient's tumor: they reveal biological changes as they occur, so the treatment plan can be adjusted promptly according to the results to achieve real-time individualized treatment. In addition, since the number, type, and characteristics of the CTCs present in a patient's circulating blood correlate with overall prognosis and response to therapy, CTC testing may also provide an effective tool for the prognosis of cancer patients.
  • CTC-related sequences include, but are not limited to, nucleic acid sequences of the CTCs themselves and sequences of proteins expressed by CTCs.
  • CN103597354A discloses a method for predicting and improving the survival of gastric cancer patients, in which an attempt is made to predict the postoperative survival of subjects with early gastric cancer after tumor surgery.
  • That method relies on detecting the activation state or level of a particular combination of signal transduction protein analytes in cancer cells obtained from the subject, the cancer cells being selected from the group consisting of primary tumor cells, circulating tumor cells (CTCs), ascites tumor cells (ATCs), and combinations thereof. However, that method is not based on cell morphology.
  • By morphologically classifying circulating tumor cells, the inventors established a correlation between CTC morphology and cancer, especially gastric cancer, and found that CTC morphology correlates with the cancer condition, so that the cancer condition can be predicted or judged from the morphological classification of CTCs.
  • The present invention is based on the above findings. Accordingly, one aspect of the present invention relates to the use of morphological characteristics of circulating tumor cells in constructing a system for predicting or judging a cancer condition in a subject.
  • The prediction or determination is made for a subject who has gastric cancer, who is at risk of developing gastric cancer, or who has had gastric cancer and has been cured.
  • The morphological characteristics of the circulating tumor cells are obtained by immunofluorescence staining.
  • The gastric cancer condition comprises one or more of gastric cancer classification, gastric cancer pTNM staging, gastric cancer overall survival, and gastric cancer progression-free survival.
  • The system includes a classification module. The morphological features of circulating tumor cells obtained by immunofluorescence staining are input into the classification module, which outputs the predicted gastric cancer condition of the subject from whom the circulating tumor cells were obtained.
  • The classification module includes a classifier constructed by machine learning. The classifier classifies the circulating tumor cells according to their morphological features in order to determine how the class of the circulating tumor cells relates to the gastric cancer condition of the subject.
  • The classifier employs a k-means clustering algorithm.
  • The morphological characteristics characterize one or more of: the size of the nucleus, the morphology of the nucleus, the size of the cell membrane and/or cytoplasm, the morphology of the cell membrane and/or cytoplasm, the expression of circulating tumor cell markers, and the distribution of circulating tumor cell markers in the cell membrane and/or cytoplasm.
  • Further screening and/or principal component analysis is performed on the parameters obtained to characterize the morphology of circulating tumor cells.
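To illustrate how a k-means classifier groups cells by morphology, the following is a minimal sketch of Lloyd's algorithm in plain Python. The function name `kmeans`, the choice of the first k points as initial centroids, and the two-feature example are assumptions made for illustration only; the text does not specify the initialization, the number of clusters, or the feature set actually used.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means on a list of equal-length feature tuples.

    Assumption: the first k points are used as the initial centroids,
    which keeps this sketch deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each cell goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical, already-scaled features per cell: (nucleus size, cell size).
cells = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.9), (0.9, 1.1), (4.8, 5.0)]
labels, centroids = kmeans(cells, k=2)
```

In this toy input the three small cells end up in one cluster and the three large cells in the other. In practice the inputs would be the screened or PCA-reduced morphological parameters rather than two hand-picked values.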
  • the morphological characteristics of circulating tumor cells of claim 1 are used.
  • Figure 1 shows the overall survival curves of patients when each type of CTC in the preoperative CTC detection is observed independently.
  • Figure 2 shows the overall survival curve of patients in the case of a combination of T4 and T6 CTCs in the preoperative CTC detection.
  • Figure 3 shows the progression-free survival curve of patients under the condition of independent observation of each type of CTC in postoperative CTC detection.
  • Figure 4 shows the progression-free survival curve of patients in the case of combining T3 type and T6 type of CTC in postoperative CTC detection.
  • The present invention provides the use of machine-learning-based morphological classification of CTCs in predicting or judging cancer conditions, and methods of predicting or judging cancer conditions using machine-learning-based morphologically classified CTCs.
  • Circulating tumor cells (CTCs) refer to cells exfoliated from solid tumors.
  • CTCs are often epithelial cells shed from solid tumors and are present at very low concentrations in the circulation of patients with advanced cancer.
  • CTCs can also be mesothelial cells from sarcomas or melanocytes from melanomas.
  • CTCs can also be cells derived from primary, secondary, or tertiary tumors.
  • CTCs can also be circulating cancer stem cells.
  • CTCs can also be any cells in a sample from a subject that are indicative of the presence of cancer or other disorders.
  • The term "subject sample" means a sample collected from a subject for detecting the presence or absence of circulating tumor cells in it.
  • The subject sample need not originate from the circulatory system, i.e., it need not be blood.
  • The subject sample can be any sample containing CTCs suitable for detection, from sources including whole blood, bone marrow, pleural fluid, peritoneal fluid, cerebrospinal fluid, milk, urine, tears, sweat, saliva, organ secretions, and bronchial, nasal, or throat washes.
  • The subject sample is blood, including, for example, whole blood or any portion or component thereof.
  • Blood samples suitable for use in the present invention may be extracted from any known source that includes blood cells or components thereof, such as venous, arterial, peripheral, tissue, and cord sources.
  • The sample can be obtained and processed using well-known, conventional clinical methods (e.g., procedures for drawing and processing whole blood).
  • An exemplary sample can be peripheral blood drawn from a cancer subject.
  • A "clinical CTC sample" refers to a subject sample that contains at least one CTC, including a subject sample judged by a CTC/WBC classifier to contain at least one CTC.
  • Cancer includes a variety of cancer types well known in the art, including, but not limited to, dysplasia, hyperplasia, solid tumors, and hematopoietic cancers. Many types of cancer are known to metastasize and shed circulating tumor cells, or are themselves metastatic, such as secondary cancers arising from primary cancers that have metastasized. Additional cancers may include, but are not limited to, cancers of the following organs or systems: brain, heart, lung, gastrointestinal, genitourinary, liver, bone, nervous system, gynecological, hematological, skin, breast, and adrenal glands.
  • Examples of cancer cells include gliomas (schwannomas, glioblastomas, astrocytomas), neuroblastomas, pheochromocytomas, gangliomas, meningiomas, adrenocortical carcinomas, medulloblastomas, rhabdomyosarcoma, kidney cancer, various types of vascular cancer, osteoblast cancer, prostate cancer, ovarian cancer, uterine fibroids, salivary gland cancer, choroid plexus cancer, breast cancer, pancreatic cancer, colon cancer, and megakaryocytic carcinoma; and skin cancers including malignant melanoma, basal cell carcinoma, squamous cell carcinoma, Kaposi sarcoma, structurally abnormal nevi, lipoma, hemangioma, dermatofibroma, keloid, sarcomas (e.g., fibrosarcoma or vascular endothelial tumors), and melanoma.
  • The cancer is preferably gastric cancer.
  • The term "condition" refers to the various information about a subject's cancer, including but not limited to tumor type, tumor location, tumor stage, the subject's overall survival, and the subject's progression-free survival.
  • The pTNM staging is the pathological staging in the TNM staging system for gastric cancer, as recorded in the seventh edition of the "AJCC Cancer Staging Manual" by F.L. Greene, D.L. Page, I.D. Fleming, A.G. Tumors at different sites have their own staging standards and are staged according to the tumor concerned.
  • The T in TNM staging refers to the condition of the primary tumor. As the tumor volume and the extent of involvement of adjacent tissues increase, it is represented by T1 to T4 in turn, and can be further subdivided.
  • The N in TNM staging refers to the involvement of regional lymph nodes. When the lymph nodes are not involved, it is represented by N0. As the degree and extent of lymph node involvement increase, it is represented by N1 to N3 in turn.
  • The M in TNM staging refers to distant metastasis. Cases without distant metastasis are indicated by M0, and cases with distant metastasis are indicated by M1.
  • The following are the staging standards of T, N, and M in the TNM staging of gastric cancer:
  • Tx: Primary tumor cannot be assessed
  • T1: Tumor invades the lamina propria, muscularis mucosae, or submucosa
  • T1a: Tumor invades the lamina propria or muscularis mucosae
  • T1b: Tumor invades the submucosa
  • T2: Tumor invades the muscularis propria
  • T3: Tumor penetrates the subserosal connective tissue without invading the visceral peritoneum or adjacent structures
  • T4: Tumor invades the serosa or adjacent structures
  • T4a: Tumor penetrates the serosa
  • T4b: Tumor invades adjacent organs
  • N1: 1-2 lymph node metastases
  • N2: 3-6 lymph node metastases
  • N3a: 7-15 lymph node metastases
  • N3b: 16 or more lymph node metastases
  • Tumors that penetrate the muscularis propria and extend into the gastrocolic or hepatogastric ligament, or into the omentum, without penetrating the visceral peritoneum covering these structures should be classified as T3. If the visceral peritoneum covering these structures is penetrated, the tumor should be classified as T4.
  • The adjacent structures of the stomach mentioned above include the spleen, transverse colon, liver, diaphragm, pancreas, abdominal wall, adrenal glands, kidneys, small intestine, and retroperitoneum.
  • When the tumor extends intramurally to the duodenum or esophagus, the T stage is determined by the deepest site of invasion in any of these sites, including the stomach.
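The N categories listed above map a count of metastatic regional lymph nodes to a category. As a small illustration, that mapping can be written as a lookup function. The function name `n_category` is hypothetical, and the sketch covers only the thresholds stated in the text (N0 for no involvement, then N1, N2, N3a, N3b).

```python
def n_category(positive_nodes: int) -> str:
    """Map the number of metastatic regional lymph nodes to an N category.

    Thresholds follow the list in the text: 0 -> N0, 1-2 -> N1,
    3-6 -> N2, 7-15 -> N3a, 16 or more -> N3b.
    """
    if positive_nodes < 0:
        raise ValueError("the number of lymph node metastases cannot be negative")
    if positive_nodes == 0:
        return "N0"
    if positive_nodes <= 2:
        return "N1"
    if positive_nodes <= 6:
        return "N2"
    if positive_nodes <= 15:
        return "N3a"
    return "N3b"
```

For example, `n_category(5)` returns `"N2"`. Combining T, N, and M into an overall stage requires the staging table, which this sketch does not attempt to reproduce.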
  • A specific stage is delineated using a combination of the three TNM indicators.
  • Table 1 below shows a typical staging schedule.
  • Intestinal-type gastric cancer originates from mucosa with intestinal metaplasia and generally has an obvious glandular structure. The tumor cells are columnar or cuboidal, have a brush border, and secrete acidic mucus, resembling the structure of intestinal cancer. It is often accompanied by atrophic gastritis and intestinal metaplasia, is more common in elderly men, and has a longer disease course, a higher incidence, and a better prognosis.
  • Diffuse-type gastric cancer originates from the intrinsic gastric mucosa. The cancer cells are poorly differentiated, grow diffusely, lack cell connections, and generally do not form glandular ducts.
  • The term "overall survival" describes the clinical survival time of a patient following diagnosis of a disease (e.g., cancer) or treatment of the disease.
  • The term "progression-free survival" describes the length of time, during and after treatment for a particular disease (e.g., cancer), in which a patient lives with the disease without additional symptoms of the disease.
  • The term "classifier" refers to a specific combination of an algorithm, a data processing method, and parameters. Classifiers are regarded as different if they differ in the algorithm, in the data processing method, or in the type or value of any parameter.
  • Data processing refers to the processing performed on the data to eliminate the influence of the data themselves on the classification result. Data processing methods include, but are not limited to, data centering, data normalization, and preprocessing for unbalanced data.
  • CTC/WBC classifier: circulating tumor cell/leukocyte classifier
  • CTC type classifier: circulating tumor cell type classifier
  • By correlating the morphological classification of CTCs with the condition of cancer, especially gastric cancer, the present invention provides a basis for predicting or judging the cancer stage, the cancer type, the overall survival of patients, and the progression-free survival of patients.
  • The CTCs of the present invention can be obtained from patients who have been confirmed to have cancer, or from ordinary subjects for whom it is not known whether they have cancer. In the latter case, it is necessary first to confirm that CTCs are present in the subject's sample. As mentioned above, subject samples from patients with cancer, i.e., samples that contain CTCs, are referred to as "clinical CTC samples".
  • The sample is made into a fluorescence image, and image recognition is then performed on the original fluorescence image to determine whether the subject sample contains CTCs.
  • After subject samples are collected from subjects, they are preprocessed, enriched, and stained to produce the original fluorescence images.
  • The peripheral blood is incubated with a combination of erythrocyte-specific and leukocyte-specific antibodies to couple the erythrocytes and leukocytes in the whole blood sample. The cells in the blood can then be separated into layers according to their density by density gradient centrifugation.
  • After density gradient centrifugation, the blood sample is divided into 4 layers, as shown in Figure 1. From top to bottom these are plasma, mononuclear cells, the density gradient medium, and red and white blood cells. CTCs, being mononuclear cells, are found in the mononuclear cell layer, so extracting the mononuclear cell layer enriches the CTCs in the blood.
  • The mononuclear cell layer contains CTCs and leukocytes (WBCs) coupled to the CTCs.
  • After the mononuclear cell layer is extracted, it is washed and then subjected to immunofluorescence staining, which includes a series of steps such as fixation, permeabilization, and fluorescent antibody staining.
  • The fluorescent reagents used in the staining process include a stain for cell nuclei.
  • DAPI is a blue fluorescent DNA stain that binds to the AT regions of dsDNA, which increases its fluorescence approximately 20-fold. It is excited by a violet (405 nm) laser line and is commonly used as a nuclear counterstain in fluorescence microscopy, flow cytometry, and chromosomal staining.
  • TRITC is a high-performance derivative of rhodamine dye, activated to label antibodies, proteins, and other molecules used as fluorescent probes easily and reliably.
  • Cy5 is a bright, far-red fluorescent dye whose excitation is well suited to 633 nm or 647 nm laser lines; it is used to label protein and nucleic acid conjugates.
  • The reagents that can be used for fluorescent antibody staining are not limited to the above. Other conventional immunofluorescence dyes, such as FITC, RB200, and PE, and conventional markers, such as EpCAM, CK, and CD45, can also be used. Those skilled in the art can select appropriate immunofluorescent dyes and staining methods according to common knowledge in the art.
  • After the immunofluorescence staining process, about 200 µL of sample remains, containing about 0 to 100 CTCs together with leukocytes, platelets, and impurities.
  • The number of leukocytes is 0-200, such as 50-150, 75-125, 85-115, 95-110, 100-105, or 103-104.
  • The criterion for a CTC is DAPI+, TRITC+, and CY5-; the criterion for a WBC is DAPI+ and CY5+.
  • Here "+" means that there is a fluorescent signal and "-" means that there is no fluorescent signal.
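The staining criteria above amount to a simple gating rule on the three channels. The sketch below expresses it in Python; the function names and the per-channel intensity threshold are assumptions for illustration, since the text defines "+" only as the presence of a fluorescent signal and does not give a numeric cutoff.

```python
def is_positive(intensity: float, threshold: float) -> bool:
    """Hypothetical helper: a channel counts as '+' above a chosen threshold."""
    return intensity > threshold


def classify_cell(dapi: bool, tritc: bool, cy5: bool) -> str:
    """Apply the staining criteria from the text.

    CTC: DAPI+ and TRITC+ and CY5-.
    WBC: DAPI+ and CY5+ (regardless of TRITC).
    Anything else (e.g. no nuclear signal) is reported as 'other'.
    """
    if dapi and cy5:
        return "WBC"
    if dapi and tritc and not cy5:
        return "CTC"
    return "other"
```

A DAPI+/TRITC+/CY5- object is reported as `"CTC"`, a DAPI+/CY5+ object as `"WBC"`, and objects without a nuclear signal, such as platelets or debris, fall into `"other"`.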
  • The stained sample is transferred to one of the wells of a 96-well plate and then scanned using equipment commonly used in the field, such as the commercial scanner ThermoFisher CX5.
  • The scanner takes pictures using an automatic moving stage, so that the sample in a single well is scanned completely. Because the sample is immunofluorescence-stained, each area can be photographed in the different fluorescence channels in turn, and the channel images are superimposed afterwards.
  • Each set of images includes the images from each fluorescence channel; this is the original fluorescence image of the clinical sample that is input to the image preprocessing module of the present invention.
  • Each set of images contains images from three fluorescence channels: DAPI, TRITC, and CY5.
  • The stained samples generated 169 sets of images.
  • The images of the entire sample area are used for image recognition, which includes the following steps:
  • Step 1: Input the original fluorescence image of the subject sample. The original fluorescence image is obtained by scanning the subject sample after the preprocessing, enrichment, and fluorescence staining described above.
  • Step 2: Perform image preprocessing on the original fluorescence image of the subject sample. The image preprocessing includes the following sub-steps:
  • (1) Image correction: correct the uneven image signals and backgrounds caused by uneven illumination intensity.
  • (2) Identify the primary target: identify a target that has a signal in the channel set as the primary target, and cut the picture into images centered on that target, i.e., single-cell images of the primary target. There can be one or more primary targets. In one example, the primary target is the DAPI channel.
  • (3) Identify the secondary target: starting from the primary targets that have a signal, identify the secondary target and cut the picture into images centered on that target, i.e., single-cell images of the secondary target. One or more secondary targets can be identified simultaneously. In addition to the primary and secondary targets, there can be one or more further targets with lower priority than the secondary target. That is, all targets are divided into multiple levels according to recognition priority, each level has one or more targets, and the total number of targets is the number of staining channels.
  • (4) Calculate the characteristic parameters: calculate the morphological parameters of the identified primary and secondary targets and the fluorescence signal intensity parameters of each channel.
  • (5) Export and save the data: export and save the morphological parameters together with the single-cell images of the primary and secondary targets.
  • In one example, the primary target is the DAPI channel and the secondary targets are the TRITC and CY5 channels. In another example, the primary target is the TRITC channel and the secondary targets are the DAPI and CY5 channels. In a further example, the primary target is the CY5 channel and the secondary targets are the TRITC and DAPI channels.
  • Preprocessing can be performed with conventional cell image analysis software, including but not limited to Cellprofiler, Celleste, and CMIS. In one example, preprocessing is performed with Cellprofiler.
  • Step 3: Output the images and characteristic parameters of the single cells in the subject sample. The single cells include CTCs and WBCs, and the single-cell images and characteristic parameters are output separately for the primary and secondary targets. The characteristic parameters calculated include, but are not limited to, the morphological parameters of the primary and secondary targets and the fluorescence signal intensity of each channel. The morphological parameters include, but are not limited to, size and shape (Area&Shape), signal intensity (Intensity), surface structure (Texture), correlation (Correlation), and other parameters that can characterize cell morphology. Morphological parameters also include relationships between the above parameters, such as the distribution of fluorescent signals within cells.
  • The CTC/WBC classifier is then used for automatic interpretation (classification) of the single cells to determine whether the subject sample contains CTCs. A subject sample containing at least one CTC is judged to be a clinical CTC sample.
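The sample-level decision described above reduces to counting the cells that the CTC/WBC classifier labels as CTC. A minimal sketch follows, assuming the classifier output is available as a list of per-cell labels; the function name `summarize_sample` and the dictionary keys are hypothetical.

```python
from collections import Counter


def summarize_sample(cell_labels):
    """Summarize per-cell classifier output for one subject sample.

    A sample is treated as a clinical CTC sample when at least one cell
    is labelled "CTC", as stated in the text.
    """
    counts = Counter(cell_labels)
    return {
        "ctc_count": counts["CTC"],
        "wbc_count": counts["WBC"],
        "is_clinical_ctc_sample": counts["CTC"] >= 1,
    }
```

For a sample whose cells were labelled `["WBC", "WBC", "CTC"]`, the summary reports one CTC, two WBCs, and `is_clinical_ctc_sample` set to `True`.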
  • The CTC/WBC classifier can be an existing classifier or a classifier optimized according to the characteristics of the subject samples and clinical CTC samples.
  • A classifier optimized according to the characteristics of subject samples and/or clinical CTC samples can be one established from existing subject samples and/or clinical CTC samples, or one trained on a portion of subject samples and/or clinical CTC samples collected or obtained ad hoc.
  • The CTC/WBC classifier can be established according to the methods used in the art for building a supervised machine learning classifier, which include, in sequence, establishing a training set, establishing candidate CTC/WBC classifiers, optimizing the candidate CTC/WBC classifiers, and selecting the best CTC/WBC classifier.
  • The training set is the dataset used to train the CTC/WBC classifier.
  • The training set includes CTC data and WBC data.
  • Each piece of data is one cell, and each piece of data has multiple dimensions, namely the characteristic parameters described above.
  • Each piece of data also includes a label that indicates whether the data (cell) is a CTC or a WBC.
  • The label can come from existing cell morphology databases, immunofluorescence databases, papers, and other literature, or from manual judgment and labeling of the single-cell images and characteristic parameters obtained in Steps 1 to 3 above.
  • When the single-cell images and characteristic parameters are judged and labeled manually, only two types of cells, CTC and WBC, exist at this stage, so the only possible labels are "CTC" and "non-CTC" (i.e., "WBC").
  • When the labels are derived from existing cell morphology databases, immunofluorescence databases, papers, and other literature, the data of cells marked as CTC or WBC are selected and used directly. Those skilled in the art know how to select suitable data from existing databases and literature.
  • Different characteristic parameters carry different biological meanings. Some characteristic parameters indicate effectively whether a cell is a CTC, while others are less relevant to that question. Screening for the characteristic parameters that best characterize CTCs and assigning them appropriate weights therefore helps to improve the generalization performance of the CTC/WBC classifier and effectively reduces redundant computation.
  • The process of screening the optimal feature parameter set includes:
  • The scale() function of the R language is used for data centering and normalization. Specifically, for the dataset of each feature parameter, the following processing is performed:
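By default, R's scale() centers each column on its mean and divides it by its sample standard deviation (denominator n - 1), i.e. x' = (x - mean) / sd. The following is a minimal Python sketch of that default behaviour, applied column by column to a table of feature parameters. The function name `scale_columns` is an assumption, and mapping constant columns to 0.0 is a choice made for this sketch (R would return NaN there).

```python
from statistics import mean, stdev


def scale_columns(rows):
    """Center and normalize each feature column: (x - mean) / sample sd.

    rows: list of equal-length lists, one row per cell and one column
    per feature parameter (at least two rows are required).
    Assumption: a column with zero variance is mapped to 0.0.
    """
    scaled_columns = []
    for column in zip(*rows):
        m, s = mean(column), stdev(column)
        scaled_columns.append([(v - m) / s if s > 0 else 0.0 for v in column])
    # Transpose back to one row per cell.
    return [list(row) for row in zip(*scaled_columns)]
```

For example, `scale_columns([[1, 10], [2, 20], [3, 30]])` gives `[[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]`, so features measured on different scales become directly comparable.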
  • Some morphological parameters are highly correlated, and using them together unreasonably increases their importance (weight).
  • For example, the size of the nucleus affects the overall size of the cell to some extent. If both the size of the nucleus and the overall size of the cell are used as parameters, the weight of the nucleus-size parameter is effectively increased.
  • the present invention adopts the Pearson correlation coefficient, which is widely used to measure the degree of linear correlation between two variables, to eliminate highly correlated morphological parameters. In its standard form, $r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$, where:
  • n is the sample size;
  • $x_i$, $y_i$ are the values of the i-th sample, and $\bar{x}$, $\bar{y}$ are the mean values of the datasets x and y, respectively;
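A minimal sketch of correlation-based elimination, under stated assumptions: the cut-off of 0.9, the rule of keeping the first parameter of a correlated pair, and all feature names and values are hypothetical, since the excerpt does not state them.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)

def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical measurements for five cells; the threshold is an assumed value.
features = {
    "cell_area": [110.0, 150.0, 95.0, 180.0, 130.0],
    "nucleus_area": [52.0, 71.0, 44.0, 86.0, 61.0],
    "nucleus_roundness": [0.91, 0.62, 0.88, 0.70, 0.95],
}
selected = drop_correlated(features)
```

Here the nucleus area tracks the cell area almost linearly, so only one of the two survives, which avoids counting essentially the same information twice.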
  • RFE feature parameter importance
  • the unimportant feature parameters are eliminated to obtain a new training set.
  • the amount of data in the new training set is unchanged (the number of CTCs and WBCs in the training set remains the same), but the number of feature parameters in each piece of data is reduced (i.e., dimensionality reduction).
  • on the new training set, a variety of supervised machine learning algorithms and fusion model algorithms, multiple preprocessing methods for imbalanced training sets, multiple evaluation methods, and cross-validation are used to optimize the parameters of the CTC/WBC classifier and to determine the best performance that each candidate CTC/WBC classifier can achieve.
  • Supervised machine learning algorithms that can be used include but are not limited to K-Nearest Neighbors (KNN), Stochastic Gradient Boosting (GBM), AdaBoost Classification Trees (ADABOOST), Support Vector Machines (SVM), Random Forest (RF), Naive Bayes (NB), Extreme Gradient Boosting (XGB), Artificial Neural Networks (ANN), Decision Trees, Logistic Regression, Linear Regression, and other algorithms commonly used in the field.
  • When the training set is derived from temporarily collected/obtained subject samples and/or clinical CTC samples, the numbers of positive and negative data in the training set differ greatly; such a training set is called an imbalanced training set (also called a biased training set). This causes some commonly used metrics for evaluating CTC/WBC classifiers to fail, so the imbalanced training set needs to be pre-processed.
  • the methods that can be used to pre-process the imbalanced training set include but are not limited to using the original data without resampling (Original), up-sampling, down-sampling, the Synthetic Minority Over-sampling Technique (SMOTE), Random Over-Sampling Examples (ROSE), etc.
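As an illustration of the simplest of these options, the sketch below performs plain random up-sampling: minority-class records are duplicated at random until every class matches the largest one. It is not SMOTE or ROSE, and the function name, seed, and data are assumptions made for the example.

```python
import random

def up_sample(records, labels, seed=0):
    """Randomly duplicate minority-class records until all classes have equal counts."""
    rng = random.Random(seed)
    by_class = {}
    for record, label in zip(records, labels):
        by_class.setdefault(label, []).append(record)
    target = max(len(group) for group in by_class.values())
    out_records, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra copies (with replacement) only for classes below the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for record in group + extra:
            out_records.append(record)
            out_labels.append(label)
    return out_records, out_labels

# Hypothetical imbalanced training set: 10 WBC records and 2 CTC records.
cells = [{"id": i} for i in range(12)]
cell_labels = ["WBC"] * 10 + ["CTC"] * 2
balanced_cells, balanced_labels = up_sample(cells, cell_labels)
```

Down-sampling is the mirror image: majority-class records are discarded at random until the classes match the smallest one.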
  • Evaluation metrics include but are not limited to confusion matrix, ROC, PR, mAP, AUC, etc.
  • the F1 score, recall rate, and TPR focus on the sensitivity of identifying CTCs: the higher these values, the higher the sensitivity of the system in identifying CTCs. The CTC/WBC classifier with the best performance is selected on this basis.
  • the F1 score is the harmonic mean of the precision rate and the recall rate. It lies closer to the smaller of the two numbers, so the F1 value is largest when the precision rate and the recall rate are close.
  • TP: true positive, predicted as a positive sample and actually a positive sample.
  • TN: true negative, predicted as a negative sample and actually a negative sample.
  • TPR True Positive Rate
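The relationship between these quantities can be sketched as follows, using the standard definitions (FP and FN, the false positives and false negatives, are not defined in this excerpt and are included here as the usual counterparts of TP and TN). The counts in the example are hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall (equal to TPR), FPR and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # recall and TPR are the same quantity
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "fpr": fpr, "f1": f1}

# Hypothetical counts: 60 true CTCs and 940 WBCs in a test set.
metrics = classification_metrics(tp=45, fp=5, fn=15, tn=935)
```

With a precision of 0.90 and a recall of 0.75, the F1 score falls between the two but nearer the smaller value, which is the behavior of a harmonic mean.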
  • PR is a curve composed of Precision (Y-axis) and Recall (X-axis).
  • ROC Receiver Operating Characteristic
  • FPR the abscissa of the plane
  • TPR the ordinate
  • a TPR and FPR point pair can be obtained according to its performance on the test sample. In this way, this classifier can be mapped to a point on the ROC plane.
  • a curve passing through (0, 0), (1, 1) can be obtained, which is the ROC curve of this classifier. In general, this curve should be above the line connecting (0, 0) and (1, 1). The larger the area under the ROC curve, the better the classifier.
  • AUC Area Under Curve
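A hedged sketch of how the ROC points and the area under the curve can be computed: the decision threshold is lowered one distinct score at a time, each threshold yields one (FPR, TPR) pair, and the area is summed with the trapezoidal rule. The scores and labels are hypothetical, with label 1 standing for CTC.

```python
def roc_points(scores, labels):
    """ROC points (FPR, TPR) from (0, 0) to (1, 1), obtained by sweeping the threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume all samples sharing this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical classifier scores and true labels (1 = CTC, 0 = WBC).
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

An area of 0.5 corresponds to the diagonal from (0, 0) to (1, 1), and values closer to 1 indicate a better classifier.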
  • Screening efficiency = 100% × (number of non-CTCs excluded by the image recognition system / total number of cells in the sample). The higher the screening efficiency, the higher the specificity of the image recognition system in identifying CTCs.
  • the generalization performance of the CTC/WBC classifier can also be directly detected and evaluated by using cells that are clinically known to be CTCs.
  • evaluation indicators include but are not limited to CTC concordance, Positive sample concordance, Screening efficiency, etc.
  • the best CTC/WBC classifier is selected according to the following principles: the F1 score, recall rate, and TPR should be as high as possible and the positive sample concordance should reach 100%; on this basis, it is preferred that the CTC concordance be higher than 90% or the screening efficiency be higher than 95%. If multiple CTC/WBC classifiers qualify, one of them is selected according to actual needs, for example the one with the highest CTC concordance.
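The selection rule can be sketched as a filter followed by a tie-break. This is only one reading of the principles above: it encodes the concordance and screening-efficiency thresholds and the example tie-break (highest CTC concordance), and it does not rank by F1, recall, or TPR. The candidate names and all numbers are hypothetical.

```python
def screening_efficiency(excluded_non_ctc, total_cells):
    """Screening efficiency in percent: excluded non-CTCs over all cells in the sample."""
    return 100.0 * excluded_non_ctc / total_cells

def pick_classifier(candidates):
    """Return the qualified candidate with the highest CTC concordance, or None."""
    qualified = [
        c for c in candidates
        if c["positive_sample_concordance"] == 100.0
        and (c["ctc_concordance"] > 90.0 or c["screening_efficiency"] > 95.0)
    ]
    if not qualified:
        return None
    return max(qualified, key=lambda c: c["ctc_concordance"])

# Hypothetical evaluation results (all values in percent).
candidates = [
    {"name": "RF + SMOTE", "positive_sample_concordance": 100.0,
     "ctc_concordance": 93.0, "screening_efficiency": 96.5},
    {"name": "SVM + up-sampling", "positive_sample_concordance": 100.0,
     "ctc_concordance": 95.5, "screening_efficiency": 94.0},
    {"name": "KNN + original", "positive_sample_concordance": 98.0,
     "ctc_concordance": 97.0, "screening_efficiency": 97.0},
]
best = pick_classifier(candidates)
```

The third candidate has the highest CTC concordance but is excluded because its positive sample concordance falls short of 100%.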
  • the type of CTC can be classified using the CTC type classifier. Multiple cells known to be identified as CTCs can also be provided directly for classification.
  • a class of CTCs refers to CTCs with similar characteristics in terms of cell morphology and immunofluorescence response.
  • the morphological parameters of CTC cells and CTC immunofluorescence markers are used as features, and CTCs are divided into multiple types.
  • When the CTC cells screened by the CTC/WBC classifier are classified, fluorescence images of those cells have already been produced, so the images used in the CTC/WBC classifier can be used directly.
  • Fluorescence images can be obtained using the same method as in the CTC/WBC classifier.
  • When the CTC cells screened by the CTC/WBC classifier are classified, image recognition has already been performed on those cells, so the cell morphological parameters used in the CTC/WBC classifier can be used directly.
  • When cells that are provided directly and already known to be CTCs are classified, image recognition must be performed on the cells from scratch.
  • Image recognition and cytomorphological parameters can be obtained using the same method as in the CTC/WBC classifier.
  • CTC type classification is performed using unsupervised machine learning.
  • In unsupervised machine learning, a labeled training set is not needed; instead, the feature parameters of all the data are fed directly into the classifier.
  • the selection and processing of feature parameters will help improve the classification generalization performance and optimize the classification results.
  • data centralization and normalization are performed on the characteristic parameters as required.
  • Some morphological parameters are highly correlated, and using them repeatedly would unreasonably increase their importance (weight). The same method as before is therefore adopted here: the Pearson correlation coefficient is used to eliminate highly correlated morphological parameters, implemented as described above.
  • Principal component analysis (PCA) is a statistical method that uses dimensionality reduction to convert multiple indicators into a few composite indicators. Specifically, by compressing the data matrix, principal component analysis reduces the dimension of the matrix while retaining its main characteristics as far as possible, which greatly saves space and data volume, reduces storage cost and computational complexity, and helps obtain higher accuracy.
  • the data is transformed into a new coordinate system through the orthogonalized linear transformation of principal component analysis.
  • the entire data set includes N characteristic parameters and n data (i.e., n CTC cells)
  • the coordinate system is a high-dimensional coordinate system.
  • the center of the coordinate axis is moved to the center of the data, and then the coordinate axis is rotated so that the variance of the data on a certain axis is the largest, that is, the projection of all n data individuals in this direction is the most dispersed.
  • PC1 axis which is the first principal component.
  • PC2 the second principal component
  • PC3 the third principal component
  • PCN Nth principal component
  • the composition of the final principal components is the smallest number of principal components that retains more than 95% of the cumulative variance information (cumulative proportion). For example, as shown in the figure above, after retaining the 8 principal components of PC1 to PC8, about 96% of the cumulative variance information can be retained.
  • the above 95% value is a commonly used value of confidence in statistics, and a value greater than 95% or less than 95% can also be used as required.
  • this value can also be adjusted based on the subsequent validation effect (ie the effect of the final cell clustering).
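The choice of how many principal components to retain can be sketched as a cumulative sum over the variance explained by each component. The variances below are hypothetical values chosen so that eight components retain 96% of the variance, echoing the example in the text; the 95% threshold is exposed as a parameter because the text allows it to be adjusted.

```python
def components_to_keep(variances, threshold=0.95):
    """Smallest number of components whose cumulative variance share exceeds threshold."""
    total = sum(variances)
    cumulative = 0.0
    for k, variance in enumerate(sorted(variances, reverse=True), start=1):
        cumulative += variance
        if cumulative / total > threshold:
            return k, cumulative / total
    # Threshold not exceeded (e.g. threshold >= 1): keep every component.
    return len(variances), 1.0

# Hypothetical variances of PC1..PC11 (they sum to 100 for easy reading).
pc_variances = [30, 20, 15, 10, 8, 6, 4, 3, 2, 1, 1]
n_components, retained = components_to_keep(pc_variances)
```

Raising the threshold keeps more components and more detail; lowering it gives a more compact representation at the cost of discarded variance.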
  • to build the CTC type classifier, commonly used unsupervised machine learning algorithms can be used, including but not limited to hierarchical clustering, expectation maximization (EM), restricted Boltzmann machines, artificial neural networks, k-means clustering, anomaly detection, auto-encoders, deep belief networks (DBN), Hebbian learning, generative adversarial networks (GAN), self-organizing maps (SOM), mean-shift clustering, DBSCAN clustering, agglomerative and divisive hierarchical clustering, etc.
  • the above algorithms can be implemented through language packages, software, and scripts commonly used in the art, and the selection method thereof is well known to those skilled in the art.
  • the R language package is used to implement the above algorithm.
  • k-means clustering is employed to build a CTC type classifier.
  • the k-means algorithm is a cluster analysis algorithm, which is an algorithm to calculate data aggregation.
  • the algorithm steps are as follows:
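The steps themselves are not reproduced in this excerpt. As a hedged illustration, the sketch below follows the textbook Lloyd iteration for k-means (choose k initial centroids, assign each point to its nearest centroid, recompute each centroid as the mean of its members, repeat until the assignments stop changing); it is not claimed to be the exact procedure of the patent. The function signature and the two-dimensional points are assumptions made for the example.

```python
import random

def kmeans(points, k, init=None, seed=0, max_iter=100):
    """Lloyd's k-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, k)
    centroids = [tuple(c) for c in start]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical principal-component scores (PC1, PC2) for six CTCs.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = kmeans(points, k=2, init=[points[0], points[3]])
```

Passing `init` makes the run reproducible; when it is omitted, the sketch draws the initial centroids at random from the data, and different seeds may converge to different clusterings.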
  • the CTCs used in the examples can be obtained from patients who have been confirmed to have cancer, or can be obtained from ordinary subjects who are not sure whether they have cancer. If CTCs are obtained from ordinary subjects who are uncertain whether they have cancer, the aforementioned classification methods or other diagnostic methods need to be used to confirm the presence of CTC cells in their subject samples. After confirming that the subject sample is a clinical CTC sample, the CTC contained in the clinical CTC sample is classified according to the above method, and the classification is associated with the disease condition.
  • Example 1 CTC classification and pTNM staging of gastric cancer
  • Example 1 studied the correlation between the number of various types of CTCs in the preoperative CTC detection and the pTNM staging of gastric cancer, and observed which type of CTC number was most correlated with the pTNM staging of gastric cancer.
  • the Kruskal-Wallis test was used to determine whether the number of CTCs of all types, and the numbers of T1, T2, T3, T4, T5 and T6 type CTCs, differed significantly (P value) among the groups (early, middle, late). A p value < 0.05 indicates a significant difference in the number of that type of CTC among the groups.
  • the Kruskal-Wallis test was used to determine whether the number of CTCs of all types, and the numbers of T1, T2, T3, T4, T5 and T6 type CTCs, differed significantly (P value) among the groups (diffuse type, intestinal type, mixed type). A p value less than 0.05 indicates a significant difference in the number of that type of CTC among the populations.
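The analysis in the text was run in SPSS. Purely to illustrate what the test computes, the sketch below implements the Kruskal-Wallis H statistic with tie correction for exactly three groups, using the large-sample chi-square approximation; with three groups there are 2 degrees of freedom, for which the chi-square survival function is exactly exp(-H/2). The function name and the CTC counts are hypothetical, and the approximation is rough for groups this small.

```python
import math

def kruskal_wallis_3groups(g1, g2, g3):
    """Kruskal-Wallis H test for exactly three groups; returns (H, approximate p)."""
    groups = [g1, g2, g3]
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # tied values share the mean of ranks i+1..j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    if tie_term == n ** 3 - n:
        raise ValueError("all values are identical; the test is undefined")
    rank_sum_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)  # tie correction
    # Chi-square with 2 degrees of freedom: survival function is exp(-H / 2).
    return h, math.exp(-h / 2)

# Hypothetical CTC counts per 5 mL in three patient groups.
early = [0, 1, 1, 2]
middle = [1, 2, 3, 3]
late = [3, 4, 5, 6]
h_stat, p_value = kruskal_wallis_3groups(early, middle, late)
```

Because the test works on ranks rather than raw counts, it does not assume that CTC numbers are normally distributed.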
  • the patient population (a total of 211 cases) with positive preoperative CTC detection (CTC ≥ 1/5 mL) was divided into groups.
  • the population grouping is shown in Table 4 below:
  • the above analysis methods were implemented in SPSS software.
  • T4_positive & T6_negative population (N = 71) had the longest overall survival (median 825, 95% confidence interval: 796 to 854)
  • T4_negative & T6_positive population (N = 21) had the shortest overall survival (median 726, 95% confidence interval: 606 to 847).
  • the patient population (a total of 229 cases) detected by postoperative CTC was grouped, and the population grouping is shown in Table 7 below.
  • the χ² test was used to compare gender between the populations, and the Wilcoxon rank-sum test was used to compare age and gastric cancer stage between the populations. P < 0.05 indicates a significant difference between the two groups.
  • the above analysis methods were implemented in SPSS software.
  • Figure 3 shows the progression-free survival curve of patients under the condition of independent observation of each type of CTC in postoperative CTC detection.

Abstract

An application of a morphological feature of a circulating tumor cell in clinical diagnosis and treatment of gastric cancer, in which the association between circulating tumor cell morphology and cancer, especially gastric cancer, is established, and the condition of gastric cancer is predicted or determined according to morphological classification of the circulating tumor cells.

Description

Application of morphological characteristics of circulating tumor cells in clinical diagnosis and treatment of gastric cancer
Technical Field
The present invention relates to the use of morphological characteristics of circulating tumor cells in constructing a system for predicting or judging the condition of gastric cancer in a subject. The invention establishes the correlation between CTC morphology and cancer, especially gastric cancer, and enables the condition of gastric cancer to be predicted or judged according to the morphological classification of circulating tumor cells.
Background Art
Circulating tumor cells (CTCs) is the general term for the various types of tumor cells present in peripheral blood. Generally, metastasis of an existing tumor or its release of circulating tumor cells is what leads to the formation of secondary tumors. Circulating tumor cells are shed from solid tumor lesions (primary or metastatic) either spontaneously or as a result of diagnostic or therapeutic procedures. Most of them undergo apoptosis or are phagocytosed after entering the peripheral blood, but a few escape, anchor, and develop into metastatic lesions, i.e., the metastasis of malignant tumors. Invasion and metastasis of malignant tumors are the key steps in recurrence and metastasis; they often lead to treatment failure, increase the risk of death, and endanger patients' lives. Therefore, detecting circulating tumor cells early after they enter the peripheral blood is of great significance for enabling medical personnel to take effective treatment measures at an early stage.
CTC testing can generally be performed on cancer patients to provide early diagnosis, to assess primary or secondary cancer growth, and to determine cancer treatment. Isolating and enriching CTCs requires only a small amount of the patient's peripheral blood and has no side effects on the patient, so it can be repeated at high frequency to monitor disease progression in real time. More importantly, CTCs can serve as a real-time sample for analyzing the biological characteristics of a patient's tumor, revealing real-time biological changes so that the treatment plan can be adjusted promptly according to the results and individualized treatment can be delivered in real time. In addition, because the number, type, and characteristics of CTCs present in a patient's circulating blood correlate with overall prognosis and response to therapy, CTCs can further provide an effective tool for the prognosis of cancer patients.
However, there is currently no literature describing the prediction or judgment of cancer condition and tumor development based on CTC morphology. Existing techniques typically start from CTC-related sequences (including but not limited to the nucleic acid sequences of the CTCs themselves and the protein sequences they express) and detect and/or quantify these sequences in biological samples isolated from patients in order to diagnose, detect, or monitor tumor disease.
CN103597354A discloses a method for predicting and improving the survival of gastric cancer patients, which attempts to predict the postoperative survival of subjects with early gastric cancer after tumor surgery. The method relies on detecting the activation state or level of a particular combination of signal transduction protein analytes in cancer cells obtained from the subject, the cancer cells being selected from primary tumor cells, circulating tumor cells (CTCs), ascites tumor cells (ATCs), and combinations thereof. However, this method does not perform detection on the basis of cell morphology.
Summary of the Invention
By morphologically classifying circulating tumor cells, the inventors established the correlation between CTC morphology and cancer, especially gastric cancer, and found that CTC morphology correlates with the cancer condition, thereby making it possible to predict or judge the condition of a cancer according to the morphological classification of CTCs.
The present invention is based on the above finding. Accordingly, one aspect of the present invention relates to the use of morphological characteristics of circulating tumor cells in constructing a system for predicting or judging a cancer condition in a subject.
In some embodiments, the prediction or judgment is made for a subject who has gastric cancer, is at risk of developing gastric cancer, or has had gastric cancer that has been cured.
In some embodiments, the morphological characteristics of the circulating tumor cells are obtained by immunofluorescence staining.
In some embodiments, the gastric cancer condition comprises one or more of the gastric cancer type, the pTNM stage of the gastric cancer, overall survival in gastric cancer, and progression-free survival in gastric cancer.
In some embodiments, the system includes a classification module; the morphological characteristics of circulating tumor cells obtained by immunofluorescence staining are input into the classification module, which outputs a prediction of the gastric cancer condition of the subject to whom the circulating tumor cells belong.
In some embodiments, the classification module contains a classifier constructed by machine learning; the classifier classifies the circulating tumor cells by their morphological characteristics and determines the correlation between the classification of the circulating tumor cells and the gastric cancer condition of the subject.
In some embodiments, the classifier employs a k-means clustering algorithm.
In some embodiments, the morphological characteristics characterize one or more of: nucleus size, nucleus morphology, size of the cell membrane and/or cytoplasm, morphology of the cell membrane and/or cytoplasm, expression level of circulating tumor cell markers, and distribution of circulating tumor cell markers in the cell membrane and/or cytoplasm.
In some embodiments, before the characteristics are input into the classifier, the parameters obtained to characterize circulating tumor cell morphology are further subjected to screening and/or principal component analysis.
In some embodiments, the morphological characteristics of circulating tumor cells as described in claim 1 are used.
Brief Description of the Drawings
Figure 1 shows the overall survival curves of patients when each type of CTC is observed independently in preoperative CTC detection.
Figure 2 shows the overall survival curves of patients when T4 type and T6 type CTCs are considered together in preoperative CTC detection.
Figure 3 shows the progression-free survival curves of patients when each type of CTC is observed independently in postoperative CTC detection.
Figure 4 shows the progression-free survival curves of patients when T3 type and T6 type CTCs are considered together in postoperative CTC detection.
Detailed Description
The present invention provides the use of machine-learning-based, morphologically classified CTCs in predicting or judging cancer conditions, and a method of predicting or judging cancer conditions using machine-learning-based, morphologically classified CTCs.
Before the methods and subjects of the present invention are described, it is to be understood that the invention is not limited to the particular compositions, methods, and experimental conditions described, as such compositions, methods, and conditions may vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting, since the scope of the invention is defined by the claims.
Furthermore, the terms first, second, and so on in the specification and in the claims are used to distinguish between similar elements and not necessarily to describe a sequence, whether temporal, spatial, by ranking, or in any other manner. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in sequences other than those described or illustrated herein.
It is to be noted that the term "comprising", used in the claims, should not be read as being limited to the means listed thereafter; it does not exclude other elements or steps. It is thus to be read as specifying the presence of the stated features, integers, steps, or components, but not as precluding the presence or addition of one or more other features, integers, steps, or components, or groups thereof.
Definitions
As used herein, the term "circulating tumor cell" or "CTC" denotes any cancer cell or cluster of cancer cells present in a subject sample. In general, CTCs are cells exfoliated from solid tumors. CTCs are often epithelial cells shed from solid tumors and present at very low concentrations in the circulation of patients with advanced cancer. CTCs can also be mesothelial cells from sarcomas or melanocytes from melanomas. CTCs can also be cells derived from primary, secondary, or tertiary tumors. CTCs can also be circulating cancer stem cells. CTCs can also be any cells in a subject sample that are indicative of the presence of cancer or another disorder.
As used herein, the term "subject sample" means a sample collected from a subject for detecting the presence or absence of circulating tumor cells therein. Although the term "circulating tumor cells" is used, the subject sample need not come from the circulatory system, i.e., from the blood. The subject sample can be any sample containing CTCs suitable for detection, from sources including whole blood, bone marrow, pleural fluid, peritoneal fluid, central spinal fluid, milk, urine, tears, sweat, saliva, organ secretions, and washings of the bronchi, nasal cavity, throat, and the like. In one example, the subject sample is blood, including, for example, whole blood or any fraction or component thereof. Blood samples suitable for use in the present invention may be drawn from any known source that includes blood cells or components thereof, such as veins, arteries, peripheral blood, tissue, spinal cord, and the like. For example, samples can be obtained and processed using well-known, routine clinical methods (e.g., procedures for drawing and processing whole blood). An exemplary sample is peripheral blood drawn from a subject with cancer. As used herein, a "clinical CTC sample" refers to a subject sample that contains at least one CTC, including a subject sample judged by the CTC/WBC classifier to contain at least one CTC.
As used herein, the term "cancer" includes a variety of cancer types well known in the art, including but not limited to dysplasias, hyperplasias, solid tumors, and hematopoietic cancers. Many types of cancer are known to metastasize and shed circulating tumor cells, or are themselves metastatic, such as secondary cancers arising from primary cancers that have metastasized. Additional cancers may include, but are not limited to, cancers of the following organs or systems: brain, heart, lung, gastrointestinal tract, genitourinary tract, liver, bone, nervous system, gynecological, hematological, skin, breast, and adrenal glands. Additional types of cancer cells include gliomas (schwannoma, glioblastoma, astrocytoma), neuroblastoma, pheochromocytoma, ganglioma, meningioma, adrenocortical carcinoma, medulloblastoma, rhabdomyosarcoma, kidney cancer, various types of vascular cancer, osteoblastic carcinoma, prostate cancer, ovarian cancer, uterine fibroids, salivary gland cancer, choroid plexus carcinoma, breast cancer, pancreatic cancer, colon cancer, and megakaryoblastic carcinoma; and skin cancers including malignant melanoma, basal cell carcinoma, squamous cell carcinoma, Kaposi's sarcoma, dysplastic nevi, lipoma, hemangioma, dermatofibroma, keloids, and sarcomas (e.g., fibrosarcoma or hemangioendothelioma, and melanoma). In the present invention, the cancer is preferably gastric cancer.
In the present invention, "condition" refers to the various kinds of information about a subject who has developed cancer, including but not limited to tumor type, tumor site, tumor classification, tumor stage, the subject's overall survival, the subject's progression-free survival, and so on.
pTNM分期是胃癌的TNM分期***中的病理学分期,其具体如F.L.格林尼D.L.佩基I.D.弗莱明A.G.弗瑞兹C.M.拜耳赤著《AJCC癌症分期手册》第七版所记载。不同部位的肿瘤均具有各自的分期标准,并根据肿瘤的不同各自分类。TNM分期中的T是指肿瘤原发灶的情况,随着肿瘤体积的增加和邻近组织受累范围的增加,依次用T1~T4来表示,并可以进一步细分。TNM分期中的N指区域***受累情况,***未受累时,用N0表示。随着***受累程度和范围的增加,依次用N1~N3表示。TNM分期中的M指远处转移,没有远处转移者用M0表示,有远处转移者用M1表示。以下是胃癌TNM分期中T、N、和M的分期标准:The pTNM staging is the pathological staging in the TNM staging system for gastric cancer, which is specifically recorded in the seventh edition of the "AJCC Cancer Staging Manual" by F.L. Greene, D.L. Peggy, I.D. Fleming, A.G. Tumors in different parts have their own staging standards, and they are classified according to different tumors. The T in TNM staging refers to the condition of the primary tumor. With the increase of tumor volume and the increase of the scope of involvement of adjacent tissues, it is represented by T1 to T4 in turn, and can be further subdivided. N in the TNM staging refers to the involvement of regional lymph nodes. When the lymph nodes are not involved, it is represented by N0. With the increase of the degree and scope of lymph node involvement, it is represented by N1-N3 in turn. M in the TNM staging refers to distant metastasis, those without distant metastasis are indicated by M0, and those with distant metastases are indicated by M1. The following are the staging standards of T, N, and M in the TNM staging of gastric cancer:
Tx: Primary tumor cannot be assessed
Tis: Carcinoma in situ: intraepithelial carcinoma without invasion of the lamina propria
T1: Tumor invades the lamina propria, muscularis mucosae, or submucosa
T1a: Tumor invades the lamina propria or muscularis mucosae
T1b: Tumor invades the submucosa
T2: Tumor invades the muscularis propria
T3: Tumor penetrates the subserosal connective tissue without invading the peritoneum or adjacent structures
T4: Tumor invades the serosa or adjacent structures
T4a: Tumor penetrates the serosa
T4b: Tumor invades adjacent organs
Nx: Regional lymph nodes (LN) cannot be assessed
N0: No regional lymph node metastasis
N1: Metastasis in 1-2 lymph nodes
N2: Metastasis in 3-6 lymph nodes
N3a: Metastasis in 7-15 lymph nodes
N3b: Metastasis in 16 or more lymph nodes
M0: No distant metastasis
M1: Distant metastasis
A tumor that penetrates the muscularis propria and extends into the gastrocolic or hepatogastric ligament, or into the greater or lesser omentum, without perforating the visceral peritoneum covering these structures should be classified as T3. If the visceral peritoneum covering these structures is perforated, the tumor should be classified as T4. The adjacent structures of the stomach referred to above include the spleen, transverse colon, liver, diaphragm, pancreas, abdominal wall, adrenal gland, kidney, small intestine, and retroperitoneum. When a tumor extends intramurally into the duodenum or esophagus, the T stage is determined by the deepest site of invasion in any of these sites, including the stomach.
A specific stage is assigned from the combination of the three indicators T, N, and M. Table 1 below shows a typical staging table.
Table 1 (provided as image PCTCN2021135426-appb-000001 in the original publication)
In 1965, Lauren classified gastric cancer into intestinal and diffuse types according to its histological structure and biological behavior. Intestinal-type gastric cancer originates from mucosa with intestinal metaplasia and generally has a distinct glandular structure; the tumor cells are columnar or cuboidal, a brush border is visible, and the cells secrete acidic mucus, resembling the structure of intestinal cancer. It is often accompanied by atrophic gastritis and intestinal metaplasia, is more common in elderly men, and has a longer course, a higher incidence, and a better prognosis. Diffuse-type gastric cancer originates from the native gastric mucosa; the cancer cells are poorly differentiated, grow diffusely, lack cell junctions, and generally do not form glands. Many poorly differentiated adenocarcinomas and signet ring cell carcinomas belong to this type. It is more common in young women, is prone to lymph node metastasis and distant metastasis, and has a poorer prognosis. A survey by Henson et al. in the United States showed that the incidence of intestinal-type gastric cancer declined among American men, women, African Americans, and whites, whereas the incidence of diffuse-type gastric cancer rose in the same populations, from 0.3 per 100,000 in 1978 to 1.8 per 100,000 in 2000, with signet ring cell carcinoma showing the most marked increase. Other studies have shown that some diffuse-type gastric cancers show familial aggregation and heritability, and family linkage studies have identified germline mutations of the CDH1 gene as the cause. The Lauren classification reflects not only the biological behavior of the tumor but also its etiology, pathogenesis, and epidemiological characteristics. A further advantage is that gastric cancer can be classified from gastroscopic biopsy tissue, which guides surgical treatment. The Lauren classification is concise and effective and is widely used in Western countries. However, 10% to 20% of cases have both intestinal and diffuse features and cannot readily be assigned to either type; these are referred to as the mixed type.
The term "overall survival" covers the clinical survival time of a patient following diagnosis of a disease (e.g., cancer) or treatment of the disease.
The term "progression-free survival" covers the length of time during and after treatment of a particular disease (e.g., cancer) in which the patient survives with the disease but without further symptoms of the disease.
As used herein, a "classifier" is a combination of a specific algorithm, a data processing method, and parameters. Two classifiers are regarded as different whenever any one of the algorithm, the data processing method, or the parameters differs in kind or in value. Data processing here means processing applied to the data in order to remove the influence that the data themselves have on the classification result. Data processing methods include but are not limited to data centering, data normalization (standardization), and preprocessing for imbalanced data.
In the present invention, two classifiers are used: a circulating tumor cell/white blood cell classifier (hereinafter also called the CTC/WBC classifier) and a circulating tumor cell type classifier (hereinafter also called the CTC type classifier). The CTC/WBC classifier distinguishes CTCs from WBCs, and the CTC type classifier distinguishes different kinds of CTCs from one another. The CTC/WBC classifier can therefore be applied first to find the CTCs, after which the CTC type classifier divides the CTCs found into multiple categories.
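As a non-limiting sketch of how the two classifiers can be chained, the Python example below filters cells with a CTC/WBC classifier and then counts the remaining cells per CTC type. The function names, feature names, thresholds, and the "large"/"small" type labels are hypothetical placeholders introduced only for illustration; they are not the trained classifiers or categories of the invention.

```python
from collections import Counter
from typing import Callable, Dict, List

Cell = Dict[str, float]  # one cell = one record of feature parameters


def classify_sample(
    cells: List[Cell],
    is_ctc: Callable[[Cell], bool],
    ctc_type: Callable[[Cell], str],
) -> Counter:
    """Run the CTC/WBC classifier first, then the CTC type classifier."""
    ctcs = [cell for cell in cells if is_ctc(cell)]   # stage 1: keep CTCs only
    return Counter(ctc_type(cell) for cell in ctcs)   # stage 2: count per CTC type


# Toy stand-ins for the two trained classifiers (hypothetical thresholds).
def toy_is_ctc(cell: Cell) -> bool:
    return cell["tritc"] > 0.5 and cell["cy5"] <= 0.5


def toy_ctc_type(cell: Cell) -> str:
    return "large" if cell["area"] > 100 else "small"


cells = [
    {"tritc": 0.9, "cy5": 0.1, "area": 150.0},
    {"tritc": 0.8, "cy5": 0.2, "area": 60.0},
    {"tritc": 0.1, "cy5": 0.9, "area": 80.0},  # WBC-like record, removed in stage 1
]
counts = classify_sample(cells, toy_is_ctc, toy_ctc_type)
```

Passing the classifiers in as callables keeps the two stages independent, so either classifier can be replaced without changing the pipeline.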
Method
By correlating the morphological classification of CTCs with the condition of cancer, in particular gastric cancer, the present invention provides a basis for predicting or determining the stage and subtype of the cancer and the overall survival and progression-free survival of the patient.
The CTCs of the present invention may be obtained from patients already confirmed to have cancer, or from ordinary subjects for whom it is not known whether they have cancer. If the CTCs are to be obtained from such an ordinary subject, it must first be confirmed whether CTCs are present in the subject sample. As stated above, a subject sample from a patient with cancer, i.e., a sample containing CTCs, is referred to as a "clinical CTC sample".
The overall correlation procedure comprises the following steps:
(1) determining whether the subject sample is a clinical CTC sample;
(2) classifying the CTCs in the clinical CTC sample;
(3) establishing the correlation between the CTC classification and the various conditions of the cancer.
Each step is described below.
Determining whether the subject sample is a clinical CTC sample
After a subject sample is collected from the subject, a fluorescence image of the sample is prepared, and image recognition is then performed on the raw fluorescence image to determine whether the subject sample contains CTCs.
〔Preparation of the fluorescence image〕
After the subject sample is collected from the subject, it is pretreated, enriched, and stained to produce the raw fluorescence image.
The peripheral blood is incubated with a combination of red-blood-cell-specific and white-blood-cell-specific antibodies so that the red blood cells and white blood cells in the whole blood sample are coupled together. Density gradient centrifugation then separates the blood cells into layers according to their densities. After density gradient centrifugation the blood sample separates into four layers, as shown in Figure 1; from top to bottom these are plasma, mononuclear cells, the density gradient medium, and red and white blood cells. CTCs, being mononuclear cells, lie in the mononuclear cell layer, and extracting this layer enriches the CTCs in the blood.
The mononuclear cell layer contains CTCs and white blood cells (WBCs) coupled to the CTCs. After the mononuclear cell layer is extracted, it is washed and then subjected to immunofluorescence staining, which includes fixation, permeabilization, fluorescent antibody staining, and related steps. Examples of the fluorescent reagents used in the staining include the dye DAPI, which stains cell nuclei (DAPI channel); the stain TRITC, which specifically recognizes the EpCAM/CK epitopes on CTCs (TRITC channel); and the stain CY5, which specifically recognizes the CD45 epitope on WBCs (CY5 channel). DAPI is a blue fluorescent DNA stain whose fluorescence increases about 20-fold on binding to AT regions of dsDNA. It is excited by the violet (405 nm) laser line and is commonly used as a nuclear counterstain in fluorescence microscopy, flow cytometry, and chromosome staining. TRITC is a high-performance derivative of the rhodamine dye that, once activated, easily and reliably labels antibodies, proteins, and other molecules used as fluorescent probes. Cy5 is a bright far-red fluorescent dye whose excitation is well suited to the 633 nm or 647 nm laser lines; it is used to label protein and nucleic acid conjugates. The reagents usable for fluorescent antibody staining are not limited to the above; various conventionally used immunofluorescence staining reagents such as FITC, RB200, PE, EpCAM, CK, and CD45 may also be used. Those skilled in the art can select suitable immunofluorescent dyes and staining methods according to common knowledge in the art.
After the immunofluorescence staining procedure, about 200 µL of sample remains, containing about 0 to 100 CTCs together with white blood cells, platelets, impurities, and the like. In one example, the number of white blood cells is 0 to 200, for example 50 to 150, 75 to 125, 85 to 115, 95 to 110, 100 to 105, or 103 to 104. After the fluorescent antibody staining described above, a cell is judged to be a CTC if it is DAPI+, TRITC+, and CY5-, and to be a WBC if it is DAPI+ and CY5+, where "+" means a fluorescence signal is present and "-" means no fluorescence signal is present.
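The staining criteria above amount to a simple decision rule, sketched below in Python for illustration only. The function name and the "other" label for objects meeting neither criterion are assumptions, and how a channel is decided to be "+" or "-" (for example, an intensity threshold) is left outside the sketch.

```python
def call_cell(dapi: bool, tritc: bool, cy5: bool) -> str:
    """Apply the staining criteria: CTC = DAPI+ TRITC+ CY5-, WBC = DAPI+ CY5+."""
    if dapi and cy5:
        return "WBC"
    if dapi and tritc and not cy5:
        return "CTC"
    return "other"  # assumed label for objects meeting neither criterion


calls = [
    call_cell(dapi=True, tritc=True, cy5=False),
    call_cell(dapi=True, tritc=False, cy5=True),
    call_cell(dapi=False, tritc=True, cy5=False),
]
```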
The stained sample is transferred into one well of a 96-well plate and then scanned. Scanning can use equipment commonly used in the art, for example the commercial scanner ThermoFisher CX5. When the ThermoFisher CX5 is used, the area of the well bottom may be larger than the area covered by a single exposure, so images can be taken with an automatically moving stage: different positions are photographed and the images are stitched afterwards, giving a complete scan of the sample in a single well. Because the sample is immunofluorescence stained, each region can be photographed in each fluorescence channel in turn and the channel images overlaid afterwards.
Scanning the stained sample generates multiple image sets covering the entire sample area, each set containing one image from each fluorescence channel; these are the raw fluorescence images of the clinical sample that are input to the image preprocessing module of the present invention. In one example, each image set contains images from the three fluorescence channels DAPI, TRITC, and CY5. In one example, the stained sample yields 169 image sets.
〔Image recognition〕
The images of the entire sample area are used for image recognition, which comprises the following steps:
Step 1: Input the raw fluorescence image of the subject sample. The raw fluorescence image is obtained by scanning the subject sample after it has been pretreated, enriched, and fluorescently stained as described above.
Step 2: Perform image preprocessing on the raw fluorescence image of the subject sample. Image preprocessing comprises the following steps:
(1) Image correction: correct the non-uniform image signal and background caused by non-uniform illumination intensity.
(2) Identification of the primary target: identify objects that have a signal in the channel set as the primary target, and crop the picture into images centered on each such object, i.e., single-cell images for the primary target. There may be one or more primary targets. In one example, the primary target is the DAPI channel.
(3) Identification of the secondary targets: for objects in which the primary target has been found to have a signal, identify the secondary targets and crop the picture into images centered on each such object, i.e., single-cell images for the secondary targets. There may be one or more secondary targets, which are identified simultaneously. Beyond the primary and secondary targets there may also be one or more tertiary targets, as well as further targets of lower priority than the tertiary targets. That is, all targets are divided into levels by identification priority, each level containing one or more targets; targets in the same level are identified simultaneously, and the levels are processed in order of priority until all targets have been identified. The total number of targets equals the number of staining channels.
(4) Calculation of the feature parameters: calculate the morphological parameters of the identified primary and secondary targets and the fluorescence signal intensity parameters of each channel.
(5) Data export and storage: export and save the data for the morphological parameters and the single-cell images for the primary and secondary targets.
In one example, the primary target is the DAPI channel and the secondary targets are the TRITC and CY5 channels; in another example, the primary target is the TRITC channel and the secondary targets are the DAPI and CY5 channels; in yet another example, the primary target is the CY5 channel and the secondary targets are the TRITC and DAPI channels. Preprocessing can be performed with conventional cell image analysis software, including but not limited to CellProfiler, Celleste, and CMIS. In one example, preprocessing is performed with CellProfiler.
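For illustration only, the following Python sketch shows one simple way primary-target identification and cropping could work: pixels of the primary channel above a threshold are grouped into 4-connected objects, and a window is cut around a chosen center. The threshold, the connectivity rule, the window size, and the function names are assumptions made for this sketch; it is a simplified stand-in and does not reproduce the algorithms of CellProfiler or the other software named above.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[float]]


def find_primary_objects(img: Image, threshold: float) -> List[Tuple[float, float, int]]:
    """Return (row centroid, column centroid, area) of each 4-connected blob above threshold."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or img[r][c] <= threshold:
                continue
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one blob
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and img[ny][nx] > threshold:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            area = len(pixels)
            objects.append((sum(p[0] for p in pixels) / area,
                            sum(p[1] for p in pixels) / area, area))
    return objects


def crop(img: Image, center: Tuple[int, int], half: int) -> Image:
    """Cut a window of up to (2*half+1) pixels per side around center, clipped at the border."""
    r0, c0 = max(center[0] - half, 0), max(center[1] - half, 0)
    return [row[c0:center[1] + half + 1] for row in img[r0:center[0] + half + 1]]


dapi = [  # toy primary-channel image
    [0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 0],
    [0, 0, 0, 0, 0, 8],
    [0, 0, 0, 0, 0, 8],
]
objects = find_primary_objects(dapi, threshold=5)
window = crop(dapi, (1, 1), half=1)
```

The same pixel coordinates could then be read from the secondary-channel images to obtain the per-channel intensity parameters of each object.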
Step 3: Output the images and feature parameters of the single cells in the subject sample. The single cells include CTCs and WBCs, and the single-cell images and feature parameters are output separately for the primary and secondary targets. Calculating the feature parameters includes but is not limited to calculating the morphological parameters of the primary and secondary targets and the fluorescence signal intensity of each channel. The morphological parameters include but are not limited to parameters that characterize cell morphology, such as the size and shape (Area&Shape), signal intensity (Intensity), surface structure (Texture), and correlation (Correlation) of the cell itself and of organelles such as the nucleus. The morphological parameters also include relationships among the above parameters, for example the distribution of the fluorescence signal within the cell.
〔Performing CTC/WBC classification〕
Based on the cell morphological parameters obtained by image recognition, the CTC/WBC classifier performs automated interpretation (classification) to determine whether the subject sample contains CTCs; a subject sample containing at least one CTC is judged to be a clinical CTC sample.
The CTC/WBC classifier may be an existing classifier or a classifier optimized for the characteristics of the subject samples and clinical CTC samples. When an optimized classifier is used, it may be one built from existing subject samples and/or clinical CTC samples, or one trained on a portion of ad hoc collected or obtained subject samples and/or clinical CTC samples.
In the technical solution of the present invention, the CTC/WBC classifier can be built by the methods commonly used in the art for building supervised machine learning classifiers, comprising in order: building a training set, building candidate CTC/WBC classifiers, optimizing the candidate CTC/WBC classifiers, and selecting the best CTC/WBC classifier.
[Building the training set]
The training set is the data set used to train the CTC/WBC classifier. In the present invention, the training set includes CTC data and WBC data. Each record corresponds to one cell and has multiple dimensions, namely the feature parameters described above. Each record also carries a label indicating whether that record (cell) is a CTC or a WBC.
The labels may come from existing cell morphology databases, immunofluorescence databases, and published literature, or from manual judgment and annotation of the single-cell images and feature parameters obtained in Steps 1 to 3 above. In manual judgment and annotation, because the cells at this stage are only of two kinds, CTC and WBC, the only possible results are "CTC" and "non-CTC" (i.e., "WBC"). When the labels come from existing cell morphology databases, immunofluorescence databases, or published literature, the data of cells annotated as CTC or WBC are selected directly and used. Those skilled in the art know how to select suitable data from existing databases and literature.
The records labeled "CTC" and "non-CTC" are used as the positive data set and the negative data set, respectively, for supervised machine learning in the subsequent steps.
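A minimal sketch of such a labeled training set is given below in Python, assuming a simple record type. The class name, field names, and feature values are hypothetical and serve only to show one record per cell, one label per record, and the split into positive and negative sets.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CellRecord:
    features: Dict[str, float]  # feature parameters of one cell
    label: str                  # "CTC" or "WBC", from manual or database annotation


def split_by_label(records: List[CellRecord]) -> Tuple[List[CellRecord], List[CellRecord]]:
    """Return (positive set, negative set): CTC records and non-CTC (WBC) records."""
    positives = [r for r in records if r.label == "CTC"]
    negatives = [r for r in records if r.label != "CTC"]
    return positives, negatives


records = [  # made-up values for illustration
    CellRecord({"area": 152.0, "tritc_intensity": 0.81}, "CTC"),
    CellRecord({"area": 75.0, "tritc_intensity": 0.05}, "WBC"),
    CellRecord({"area": 68.0, "tritc_intensity": 0.07}, "WBC"),
]
positives, negatives = split_by_label(records)
```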
[Selecting feature parameters]
Among the single-cell images and feature parameters, different feature parameters carry different biological meaning. Some feature parameters indicate more effectively whether a cell is a CTC, while others are only weakly related to whether a cell is a CTC. Selecting the feature parameters that better characterize CTCs and assigning them suitable weights therefore helps improve the generalization performance of the CTC/WBC classifier and effectively reduces redundant computation.
The procedure for selecting the optimal set of feature parameters comprises:
(1) data centering and normalization;
Data centering and normalization are performed with the scale() function of the R language. Specifically, the data set of each feature parameter is processed as follows:
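(Formula images PCTCN2021135426-appb-000002 and PCTCN2021135426-appb-000003 in the original publication. With its default arguments, scale() centers each column on its mean and divides by its sample standard deviation:)
$$x_i' = \frac{x_i - \bar{x}}{s}$$
where
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$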
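The source performs this step with the scale() function of R. As an illustrative Python equivalent, assuming the default behavior of scale() (subtract the column mean, divide by the sample standard deviation with an n - 1 denominator), one feature column could be standardized as follows; the function name is hypothetical.

```python
from statistics import mean, stdev
from typing import List


def scale_column(values: List[float]) -> List[float]:
    """Center a feature column on its mean and divide by its sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # n - 1 denominator, as in the default behavior of R's scale()
    return [(v - m) / s for v in values]


scaled = scale_column([2.0, 4.0, 6.0])  # mean 4, sample standard deviation 2
```

A column whose values are all identical has a standard deviation of zero and would need to be handled, or removed, before this step.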
(2) based on the scatter plot of each feature parameter, manually select the feature parameters that clearly separate the two classes of cells;
(3) remove highly correlated feature parameters;
Some morphological parameters are highly correlated, and using them together would unreasonably increase their importance (weight). For example, the size of the nucleus itself affects the overall size of the cell to some extent; if both the size of the nucleus and the overall size of the cell were used, the weight of the nucleus-size feature parameter would in effect be increased. The present invention uses the Pearson correlation coefficient, which is widely used to measure the degree of linear correlation between two variables, to remove highly correlated morphological parameters.
The absolute value of the Pearson correlation coefficient, $|r_{xy}|$, is:
$$|r_{xy}| = \left|\frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}\right|$$
where $n$ is the sample size, $x_i$ and $y_i$ are the values of the $i$-th sample, and $\bar{x}$ and $\bar{y}$ are the means of the data sets $x$ and $y$, respectively (formula images PCTCN2021135426-appb-000004 and PCTCN2021135426-appb-000005 in the original publication).
$|r_{xy}|$ is used to assess whether the data sets of any two morphological parameters are linearly correlated. If $|r_{xy}| > 0.75$, the two morphological parameters are considered highly correlated and one of them is removed. The threshold of 0.75 is a commonly used threshold in the art: values above 0.75 are generally regarded as strong correlation, values from 0.45 to 0.75 as moderate correlation, and values below 0.45 as weak correlation. The threshold may nevertheless be raised or lowered as needed. These calculations are all implemented in the R language, specifically with the cor() and findCorrelation() functions.
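The correlation filter can be illustrated with the Python sketch below. It computes the Pearson coefficient directly and applies a simple greedy rule: a feature is kept only if its |r| with every feature already kept does not exceed the cutoff. This greedy rule is an assumption made for brevity and is not the selection heuristic of the findCorrelation() function named above; the feature names and values are made up.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns: Dict[str, List[float]], cutoff: float = 0.75) -> List[str]:
    """Greedily keep a feature only if |r| <= cutoff against every feature already kept."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept


columns = {
    "cell_area": [100.0, 120.0, 140.0, 160.0],
    "nucleus_area": [50.0, 61.0, 69.0, 80.0],  # nearly proportional to cell_area
    "texture": [0.3, 0.1, 0.4, 0.2],
}
kept = drop_correlated(columns)
```

Here nucleus_area is removed because it is almost perfectly correlated with cell_area, mirroring the nucleus-size example in the text.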
(4) calculate the feature-parameter importance (RFE), i.e., the weight, and finalize the set of feature parameters used for model building;
The feature-parameter importance (RFE) is calculated as follows: in each iteration, a different subset of features is selected, a model is trained on it and evaluated, and the sum of its decision coefficients is calculated; this ultimately yields the importance of the different features, and the best combination of features is retained.
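One way to picture this iterative elimination is the Python sketch below, which repeatedly fits a small logistic-regression model and drops the feature with the smallest absolute weight. The choice of model, the use of the absolute weight as the importance measure, the absence of an intercept, the learning settings, and the toy data are all assumptions made for illustration; the source does not specify these details.

```python
from math import exp
from typing import List


def fit_logistic(X: List[List[float]], y: List[int],
                 steps: int = 200, lr: float = 0.5) -> List[float]:
    """Fit logistic-regression weights (no intercept) by batch gradient descent."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            p = 1.0 / (1.0 + exp(-sum(wi * xi for wi, xi in zip(w, row))))
            for j, xj in enumerate(row):
                grad[j] += (p - target) * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def rfe(X: List[List[float]], y: List[int], names: List[str], n_keep: int) -> List[str]:
    """Repeatedly refit the model and drop the feature with the smallest |weight|."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        w = fit_logistic([[row[j] for j in active] for row in X], y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [names[j] for j in active]


# Toy data: the first feature separates the classes, the second carries no information.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1, 1, 0, 0]
selected = rfe(X, y, ["tritc_intensity", "noise"], n_keep=1)
```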
These steps remove the unimportant feature parameters and yield a new training set. The amount of data in the new training set (the numbers of CTCs and WBCs) is unchanged, but the number of feature parameters in each record is reduced (i.e., dimensionality reduction).
[Building and optimizing candidate CTC/WBC classifiers]
For the new training set, multiple supervised machine learning algorithms and fusion model algorithms, multiple preprocessing methods for imbalanced training sets, and multiple evaluation methods are applied with cross-validation to optimize the parameters of the CTC/WBC classifiers and so establish the best performance that each candidate CTC/WBC classifier can achieve.
Supervised machine learning algorithms that can be used include but are not limited to algorithms commonly used in the art such as K-Nearest Neighbors (KNN), Stochastic Gradient Boosting (GBM), AdaBoost Classification Trees (ADABOOST), Support Vector Machines (SVM), Random Forest (RF), Naive Bayes (NB), Extreme Gradient Boosting (XGB), Artificial Neural Network (ANN), Decision Tree, Logistic Regression, and Linear Regression. Fusion model algorithms that can be used include but are not limited to Stacking, i.e., an algorithm that combines multiple supervised machine learning algorithms.
In addition, CTCs are rare cells whereas WBCs are common cells. If the training set is derived from ad hoc collected or obtained subject samples and/or clinical CTC samples, the numbers of positive and negative records in the training set therefore differ greatly, and the training set becomes an imbalanced training set (also called a biased training set). This causes some metrics commonly used to evaluate CTC/WBC classifiers to become uninformative, so the imbalanced training set needs to be preprocessed.
Methods that can be used to preprocess an imbalanced training set include but are not limited to Original (no resampling), Up-sampling, Down-sampling, Synthetic Minority Over-sampling Technique (SMOTE), and Random Over Sampling Examples (ROSE).
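For illustration, the two simplest of these methods, up-sampling and down-sampling, can be sketched in Python as follows. The function names, the fixed random seed, and the toy record identifiers are assumptions; SMOTE and ROSE generate synthetic records and are not shown.

```python
import random
from typing import List, Sequence, Tuple


def up_sample(minority: Sequence, majority: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Draw minority records with replacement until both classes have equal size."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(minority) + extra, list(majority)


def down_sample(minority: Sequence, majority: Sequence, seed: int = 0) -> Tuple[List, List]:
    """Keep a random subset of the majority class equal in size to the minority class."""
    rng = random.Random(seed)
    return list(minority), rng.sample(list(majority), len(minority))


ctc = ["ctc_1", "ctc_2"]                # rare positive class
wbc = [f"wbc_{i}" for i in range(10)]   # common negative class
up_ctc, up_wbc = up_sample(ctc, wbc)
down_ctc, down_wbc = down_sample(ctc, wbc)
```

Up-sampling keeps every WBC record but repeats CTC records, whereas down-sampling discards WBC records; resampling is normally applied to the training portion only, so that evaluation reflects the true class ratio.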
[Selecting the best CTC/WBC classifier]
After the new training set has been preprocessed, the supervised machine learning algorithms and fusion model algorithms are applied to it. Once the parameters of each CTC/WBC classifier have been optimized, giving the best performance each candidate CTC/WBC classifier can achieve, the performance of each CTC/WBC classifier is evaluated. Evaluation metrics include but are not limited to the confusion matrix, ROC, PR, mAP, and AUC.
Among the performance metrics, for an imbalanced data set and provided that all metrics perform well, the F1 score, recall, and TPR focus more on the sensitivity of CTC identification; the higher the F1 score, recall, and TPR, the more sensitively the system identifies CTCs. A CTC/WBC classifier with good performance is selected on this basis.
The F1 score is the harmonic mean of precision and recall and lies closer to the smaller of the two; for a given sum of precision and recall, F1 is therefore largest when the two are equal.
F1 = 2 × (precision × recall) / (precision + recall)
where:
Precision: the number of correctly predicted positive samples divided by the total number of samples predicted positive; precision = TP/(TP+FP).
Recall: the number of correctly predicted positive samples divided by the total number of actual positive samples; recall = TP/(TP+FN).
TP: true positive, predicted positive and actually positive.
FP: false positive, predicted positive but actually negative.
FN: false negative, predicted negative but actually positive.
TN: true negative, predicted negative and actually negative.
TPR: true positive rate, the number of correctly predicted positive samples divided by the total number of actual positive samples; TPR = TP/(TP+FN).
FPR: false positive rate, the number of samples wrongly predicted positive divided by the total number of actual negative samples; FPR = FP/(FP+TN).
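The definitions above translate directly into code. The Python sketch below computes them from the four confusion-matrix counts; the function name and the example counts are made up for illustration and are not results from the source.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute precision, recall (equal to TPR), FPR, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to the true positive rate
    fpr = fp / (fp + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "tpr": recall, "fpr": fpr, "f1": f1}


# Made-up counts for an imbalanced set: 10 CTCs among 100 cells.
m = classification_metrics(tp=8, fp=2, fn=2, tn=88)
```

The sketch assumes every denominator is nonzero; a classifier that predicts no positives at all would need separate handling.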
The PR curve plots precision (Y axis) against recall (X axis).
ROC (Receiver Operating Characteristic) analysis is a standard way to measure the quality of a classifier. Its main tool is a curve drawn on a two-dimensional plane, the ROC curve, whose abscissa is FPR and whose ordinate is TPR. For a given classifier, one (FPR, TPR) pair is obtained from its performance on the test samples, so the classifier maps to one point on the ROC plane. Varying the threshold the classifier uses for classification yields a curve passing through (0,0) and (1,1), which is the ROC curve of that classifier. In general, this curve should lie above the line connecting (0,0) and (1,1); the larger the area under the ROC curve, the better the classifier.
AUC (Area Under Curve) is a numerical measure of classifier quality: it is the area under the ROC curve, and the larger the AUC, the better the classifier performs.
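The threshold sweep that produces the ROC curve, and the AUC computed from it, can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the function names, labels, and scores are hypothetical, a higher score is assumed to mean "more CTC-like", and the area is obtained with the trapezoidal rule.

```python
def roc_curve_points(labels, scores):
    """Return ROC points (FPR, TPR) obtained by lowering the decision threshold.

    labels: 1 for CTC (positive), 0 for WBC (negative).
    scores: classifier scores, higher meaning more CTC-like.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes are required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
roc = roc_curve_points(labels, scores)
print(roc, auc(roc))
```

The curve starts at (0,0), where nothing is called positive, and ends at (1,1), where everything is, matching the description above.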
CTC concordance is the agreement between the CTCs identified by the image recognition system and those identified by manual interpretation: CTC concordance = 100% * (number of CTCs identified by both the image recognition system and manual interpretation / number of CTCs obtained by manual interpretation). The higher the CTC concordance, the higher the sensitivity of the image recognition system in identifying CTCs.
Positive sample concordance is the agreement between the positive samples (≥1 CTC) identified by the image recognition system and the results of manual interpretation: positive sample concordance = 100% * (number of samples judged positive by both manual interpretation and the image recognition system / number of samples judged positive by manual interpretation). The higher the positive sample concordance, the higher the sensitivity of the image recognition system in identifying positive samples.
Screening efficiency = 100% * (number of non-CTCs excluded by the image recognition system / total number of cells in the sample). The higher the screening efficiency, the higher the specificity of the image recognition system in identifying CTCs.
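The three clinical metrics defined above reduce to simple ratios. The following Python sketch shows one way to compute them; the function names, cell identifiers, and sample identifiers are hypothetical and serve only to illustrate the formulas.

```python
def ctc_concordance(system_ctc_ids, manual_ctc_ids):
    """100% * (CTCs found by both system and manual review / CTCs from manual review)."""
    manual = set(manual_ctc_ids)
    return 100.0 * len(manual & set(system_ctc_ids)) / len(manual)


def positive_sample_concordance(system_counts, manual_counts):
    """Both arguments map sample id -> CTC count; a sample is positive when count >= 1."""
    manual_positive = {s for s, n in manual_counts.items() if n >= 1}
    both_positive = {s for s in manual_positive if system_counts.get(s, 0) >= 1}
    return 100.0 * len(both_positive) / len(manual_positive)


def screening_efficiency(excluded_non_ctc, total_cells):
    """100% * (non-CTCs excluded by the system / total cells in the sample)."""
    return 100.0 * excluded_non_ctc / total_cells


# Hypothetical data: cell ids flagged as CTC, and per-sample CTC counts
print(ctc_concordance({1, 2, 3, 9}, {1, 2, 3, 4}))
print(positive_sample_concordance({"s1": 2, "s2": 0, "s3": 1},
                                  {"s1": 1, "s2": 1, "s3": 0}))
print(screening_efficiency(9800, 10000))
```

Note that both concordance metrics use the manual interpretation as the denominator, so cells or samples flagged only by the system do not lower them; that cost shows up in the screening efficiency instead.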
The generalization performance of the CTC/WBC classifier can also be tested and evaluated directly with cells whose CTC status is clinically known. In that case, the evaluation metrics include, but are not limited to, CTC concordance, positive sample concordance, and screening efficiency. In clinical CTC detection, the positive sample concordance should first of all be as high as possible, ideally 100%; once the positive sample concordance requirement is met, a CTC/WBC classifier with high CTC concordance and high screening efficiency is selected wherever possible.
When both cross-validation and clinical samples are used to evaluate the CTC/WBC classifier, the best classifier is selected according to the following principles: the F1 score, recall, and TPR are as high as possible and the positive sample concordance reaches 100%; on this basis, the CTC concordance is preferably higher than 90% or the screening efficiency higher than 95%. If several CTC/WBC classifiers qualify, one is chosen according to actual needs, for example the one with the highest CTC concordance.
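The selection principles above can be expressed as a filter followed by a ranking. The Python sketch below is one possible reading, not the procedure actually used: the dictionary keys and candidate values are hypothetical, and using the F1 score as a tie-breaker is an assumption, since the text only asks for F1, recall, and TPR to be "as high as possible".

```python
def select_classifier(candidates):
    """Pick a CTC/WBC classifier following the stated principles.

    Each candidate is a dict with the (hypothetical) keys
    'name', 'f1', 'positive_sample_concordance', 'ctc_concordance',
    'screening_efficiency'; the last three are percentages.
    """
    eligible = [
        c for c in candidates
        if c["positive_sample_concordance"] >= 100.0
        and (c["ctc_concordance"] > 90.0 or c["screening_efficiency"] > 95.0)
    ]
    if not eligible:
        return None
    # Example rule from the text: highest CTC concordance wins.
    # Tie-break on F1 is an assumption of this sketch.
    return max(eligible, key=lambda c: (c["ctc_concordance"], c["f1"]))


candidates = [
    {"name": "A", "f1": 0.95, "positive_sample_concordance": 100.0,
     "ctc_concordance": 92.0, "screening_efficiency": 96.0},
    {"name": "B", "f1": 0.97, "positive_sample_concordance": 98.0,
     "ctc_concordance": 99.0, "screening_efficiency": 99.0},
    {"name": "C", "f1": 0.93, "positive_sample_concordance": 100.0,
     "ctc_concordance": 94.0, "screening_efficiency": 90.0},
]
best = select_classifier(candidates)
print(best["name"] if best else "no classifier qualifies")
```

Candidate B is rejected despite its high scores because its positive sample concordance falls short of 100%, which the text treats as the first requirement.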
Classification of CTCs in clinical CTC samples
After CTCs have been screened out with the CTC/WBC classifier, the CTC type classifier can be used to classify them by type. Alternatively, a plurality of cells already known to be CTCs can be provided directly and classified.
A class of CTCs refers to CTCs with similar characteristics in terms of cell morphology, immunofluorescence response, and the like. In the present invention, the morphological parameters of CTCs and the CTC immunofluorescence markers are used as features to divide CTCs into multiple types.
〔Fluorescence image production〕
When classifying CTCs screened out by the CTC/WBC classifier, fluorescence images of the CTCs have already been produced, and the images used in the CTC/WBC classifier can be used directly.
When classifying cells that are provided directly and already known to be CTCs, fluorescence images of the cells must be produced from scratch. They can be obtained by the same method as for the CTC/WBC classifier.
〔Image recognition〕
When classifying CTCs screened out by the CTC/WBC classifier, the CTCs have already undergone image recognition, and the cell morphological parameters used in the CTC/WBC classifier can be used directly.
When classifying cells that are provided directly and already known to be CTCs, image recognition must be performed on the cells from scratch. Image recognition can be performed, and the cell morphological parameters obtained, by the same method as for the CTC/WBC classifier.
〔CTC type classification〕
In the present invention, unsupervised machine learning is used for CTC type classification. Unsupervised machine learning requires no labeled training set; instead, the feature parameters of all the data are fed directly into the classifier.
[Selecting feature parameters]
As mentioned above, selecting and processing the feature parameters helps improve the generalization performance of the classification and optimize the classification results. For CTC type classification, the feature parameters are centered and normalized as needed, using the same method as for the data centering and normalization described above.
The feature parameters are then selected and processed as follows to obtain the optimal set of feature parameters:
(a) removal of highly correlated feature parameters;
(b) principal component analysis (PCA).
Either (a) removal of highly correlated feature parameters or (b) principal component analysis can be performed as needed, or (a) and (b) can be applied to the feature parameters in sequence.
<Removal of highly correlated feature parameters>
As mentioned above, some morphological parameters are highly correlated, and using them repeatedly would unreasonably increase their importance (weight). The same method as described above is therefore used here: highly correlated morphological parameters are removed on the basis of the Pearson correlation coefficient. The specific implementation is also as described above.
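A correlation-based filter of this kind might look like the following Python sketch. It is an illustration only: the cut-off of 0.9, the greedy keep-first strategy, and the feature names and values are all assumptions of this sketch, since the specific implementation is described elsewhere in the document.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features, threshold=0.9):
    """Greedy filter: keep a feature only if |r| with every kept feature <= threshold.

    features: dict mapping feature name -> list of values (one per cell).
    The threshold of 0.9 is an assumed value, not taken from the source.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical morphological parameters for five cells
features = {
    "area":      [10.0, 12.0, 15.0, 20.0, 26.0],
    "perimeter": [11.0, 13.1, 16.0, 21.2, 27.0],  # nearly linear in area
    "roundness": [0.9, 0.5, 0.8, 0.4, 0.7],
}
print(drop_correlated(features))
```

Because the filter is greedy, the result depends on the order of the features; the first member of each highly correlated group is the one retained.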
<Principal component analysis>
Principal component analysis (PCA) is a statistical method that uses dimensionality reduction to convert many indicators into a few composite indicators. It is a technique for simplifying a data set, and the simplified set of indicators is called the principal components. Specifically, PCA compresses the data matrix, reducing its dimension while retaining its main characteristics as far as possible; this greatly reduces storage space and data volume, so that high accuracy is obtained at low storage cost and computational complexity.
In the present invention, the data are transformed into a new coordinate system by the orthogonal linear transformation of PCA. For example, suppose that after the data processing above the whole data set comprises N feature parameters and n data points (that is, n CTCs). The data are transformed into a new coordinate system with N coordinate axes, which is a high-dimensional coordinate system. In this new coordinate system, the origin is moved to the center of the data and the axes are then rotated so that the variance of the data along one axis is maximal, that is, the projections of all n data points onto that direction are most dispersed. The larger the variance along an axis, and the more dispersed the projections of the data points onto it, the more information that axis retains. The axes are ordered by their contribution to the variance of the data set, from largest to smallest; the axis with the largest variance is called the PC1 axis and is the first principal component. The second principal component (PC2), the third principal component (PC3), and so on up to the N-th principal component (PCN) are found in the same way.
In one example, this is implemented as follows:
i. Principal component analysis is performed with the prcomp() function, and the results are ordered from PC1 to PCN by decreasing contribution to the variance of the data set (proportion of variance).
ii. The final set of principal components is the smallest number of components that retains at least 95% of the cumulative variance (cumulative proportion). For example, as shown in the figure above, retaining the 8 principal components PC1 to PC8 retains about 96% of the cumulative variance. The 95% figure is a conventional confidence level in statistics; a value larger or smaller than 95% can also be used as needed. The larger the value, the more complete the information contained in the final principal components, but the weaker the dimensionality reduction; the smaller the value, the less information the final principal components contain, but the stronger the dimensionality reduction. Those skilled in the art can adjust the value according to actual needs to balance the two, as is well known in the art. The value can also be adjusted according to the subsequent validation results (that is, the quality of the final cell clustering).
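Step ii amounts to a cumulative sum over the per-component variances. The Python sketch below shows only that selection step; it does not reproduce R's prcomp(), and the function name and variance values are hypothetical. The input is assumed to be the component variances (for example, squared standard deviations) already sorted from PC1 downward.

```python
def components_to_keep(variances, min_cumulative=0.95):
    """Return (k, cumulative proportion) for the smallest k reaching the target.

    variances: per-component variances sorted from PC1 downward.
    min_cumulative: required cumulative proportion of variance (0.95 = 95%).
    """
    total = sum(variances)
    cumulative = 0.0
    for k, variance in enumerate(variances, start=1):
        cumulative += variance / total  # proportion of variance of PC k
        if cumulative >= min_cumulative:
            return k, cumulative
    return len(variances), cumulative


# Hypothetical variances of six principal components
variances = [5.0, 2.5, 1.5, 0.6, 0.3, 0.1]
k, reached = components_to_keep(variances)
print(k, reached)
```

Raising `min_cumulative` keeps more components and more information; lowering it gives stronger dimensionality reduction, which is the trade-off described above.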
〔Building the CTC type classifier〕
For the CTC type classifier, commonly used unsupervised machine learning algorithms can be used to build the classifier, including but not limited to hierarchical clustering, expectation-maximization (EM) clustering, restricted Boltzmann machines, artificial neural networks, k-means clustering, anomaly detection, auto-encoders, deep belief networks (DBN), Hebbian learning, generative adversarial networks (GAN), self-organizing maps (SOM), Mean-Shift clustering, DBSCAN clustering, and agglomerative and divisive hierarchical clustering.
These algorithms can be implemented with language packages, software, or scripts commonly used in the art; how to choose among them is well known to those skilled in the art. In one example, R packages are used to implement the algorithms.
In one example, k-means clustering is used to build the CTC type classifier. The k-means algorithm is a cluster analysis algorithm that computes how the data group together. Its steps are as follows:
i. Select k objects from the data as the initial cluster centers;
ii. Compute the distance from each object to each cluster center and assign each object accordingly;
iii. Recompute each cluster center;
iv. Compute the standard measure function; stop if the maximum number of iterations has been reached, otherwise continue;
v. Determine the optimal cluster centers.
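Steps i to v can be sketched in plain Python as follows. This is a minimal illustration rather than the R implementation mentioned above. It makes several assumptions that the source does not state: the "standard measure function" is taken to be the within-cluster sum of squared distances, distance is Euclidean, initial centers are drawn at random with a fixed seed, and iteration also stops early once the centers no longer move. The data points are hypothetical.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Return (labels, centers, sse) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step i: choose k objects as the initial cluster centers
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step ii: assign each object to its nearest center
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Step iii: recompute each center as the mean of its members
        new_centers = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        # Step iv: stop early when the centers are stable (assumed criterion)
        if new_centers == centers:
            break
        centers = new_centers
    # Measure function: within-cluster sum of squared distances (assumed)
    sse = sum(sum((a - b) ** 2 for a, b in zip(p, centers[label]))
              for p, label in zip(points, labels))
    # Step v: the final centers are returned as the cluster centers
    return labels, centers, sse


# Hypothetical two-dimensional feature vectors (e.g. PC1, PC2) for six cells
cells = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers, sse = kmeans(cells, k=2)
print(labels, centers, sse)
```

Because the result depends on the initial centers, a practical run would typically repeat the procedure with several seeds and keep the solution with the smallest measure value.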
Establishing the correlation between CTC classification and various cancer conditions
On the basis of the CTC classification, we established the correlation between CTC classification and gastric cancer pTNM stage, gastric cancer Lauren classification, overall survival in gastric cancer, and progression-free survival in gastric cancer. The examples show that CTC classification is positively correlated with gastric cancer pTNM stage, Lauren classification, overall survival, and progression-free survival.
Examples
The CTCs used in the examples may be obtained from patients already confirmed to have cancer, or from ordinary subjects whose cancer status is uncertain. In the latter case, the classification method described above or another diagnostic means must first be used to confirm that CTCs are present in the subject's sample. Once the sample is confirmed to be a clinical CTC sample, the CTCs it contains are classified by the method described above and the classification is associated with the condition.
Example 1 CTC classification and pTNM staging of gastric cancer
Research purpose:
Example 1 studied the correlation between the number of each type of CTC in preoperative CTC detection and the pTNM stage of gastric cancer, to determine which CTC type's count is most strongly correlated with pTNM stage.
Enrollment criteria:
Among operable gastric cancer patients, those who tested positive for CTCs before surgery (CTC ≥ 1/5 mL; 211 cases in total) were grouped by pTNM stage: early-to-middle stage (pTNM stage 1 and stage 2, N=115) and advanced stage (pTNM stage 3 and stage 4, N=96).
Analysis method:
The Kruskal-Wallis test was used to determine whether CTC counts differed significantly between groups. The test was applied separately to the count of all CTC types and to the counts of T1, T2, T3, T4, T5, and T6 type CTCs, comparing the groups (early-to-middle stage, advanced stage) and yielding a P value for each. A P value < 0.05 indicates that the count of that CTC type differs significantly between the two groups. All analyses were performed in SPSS software.
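For readers unfamiliar with the test, the Kruskal-Wallis statistic can be sketched in plain Python. The analyses in the examples were performed in SPSS; this sketch is not that implementation. It uses average ranks for ties and the usual tie correction, and it obtains the P value from the chi-square approximation, implemented here only for two or three groups (1 or 2 degrees of freedom). The function name and the CTC counts are hypothetical.

```python
import math


def kruskal_wallis(*groups):
    """Return (H, p) for 2 or 3 groups of numeric observations."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                               # size of this group of ties
        rank_of[pooled[i]] = (i + 1 + j) / 2    # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    h = max(h / correction, 0.0)
    df = len(groups) - 1
    if df == 1:
        p = math.erfc(math.sqrt(h / 2))   # chi-square survival function, 1 df
    elif df == 2:
        p = math.exp(-h / 2)              # chi-square survival function, 2 df
    else:
        raise ValueError("this sketch supports only 2 or 3 groups")
    return h, p


# Hypothetical per-patient counts of one CTC type in two stage groups
early_to_middle = [0, 1, 1, 2, 0, 1]
advanced = [2, 3, 1, 4, 2, 3]
h, p = kruskal_wallis(early_to_middle, advanced)
print(h, p, "significant" if p < 0.05 else "not significant")
```

CTC counts are small integers with many ties, so the tie correction matters here; a rank-based test is also a natural choice because such counts are rarely normally distributed.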
Analysis results:
Between the two groups (early-to-middle stage, advanced stage), the overall CTC count (all CTC types) did not differ significantly (P=0.812), whereas the T1 type CTC count (P=0.044<0.05) and the T6 type CTC count (P=0.039<0.05) did; both counts were significantly higher in the advanced-stage group than in the early-to-middle-stage group. Thus, through morphological classification of CTCs, we identified two CTC types (T1 and T6) that are significantly correlated with gastric cancer pTNM stage, and whose numbers are significantly higher in the advanced-stage group (pTNM stage 3 and stage 4) than in the early-to-middle-stage group (pTNM stage 1 and stage 2), as detailed in Table 2 below.
Figure PCTCN2021135426-appb-000007
Table 2 Comparison of the numbers of different CTC types between the early-to-middle-stage and advanced-stage groups
Example 2 CTC classification and Lauren classification of gastric cancer
Research purpose:
To study the correlation between the number of each type of CTC in preoperative CTC detection and the Lauren classification of gastric cancer, and to determine which CTC type's count is most strongly correlated with the Lauren classification.
Enrollment criteria:
Among operable gastric cancer patients, those who tested positive for CTCs before surgery (CTC ≥ 1/5 mL; 167 cases in total) were grouped by Lauren classification: diffuse type (N=35), intestinal type (N=66), and mixed type (N=66).
Analysis method:
The Kruskal-Wallis test was used to determine whether CTC counts differed significantly between groups. The test was applied separately to the count of all CTC types and to the counts of T1, T2, T3, T4, T5, and T6 type CTCs, comparing the groups (diffuse type, intestinal type, mixed type) and yielding a P value for each. A P value < 0.05 indicates that the count of that CTC type differs significantly between the groups.
Analysis results:
Among the groups (diffuse type, intestinal type, mixed type), the overall CTC count (all CTC types) did not differ significantly (P=0.464>0.05), whereas the T1 type CTC count (P=0.024<0.05) and the T6 type CTC count (P=0.001<0.05) did (Table 3 below); both counts were significantly higher in the diffuse-type group than in the intestinal-type or mixed-type group. When only the intestinal-type and mixed-type groups were compared, no CTC type showed a significant difference in count. Thus, through morphological classification of CTCs, we identified two CTC types (T1 and T6) that are significantly correlated with the Lauren classification of gastric cancer, and whose numbers are significantly higher in the diffuse-type group than in the intestinal-type or mixed-type group.
Figure PCTCN2021135426-appb-000008
Table 3 Comparison of the numbers of different CTC types among the diffuse-type, intestinal-type, and mixed-type groups
Example 3 CTC classification and overall survival in gastric cancer
Research purpose:
To explore the correlation between the number of each type of CTC in preoperative CTC detection and overall survival in gastric cancer, and to determine which CTC type's count is most strongly correlated with overall survival.
Enrollment criteria:
Among operable gastric cancer patients, those who tested positive for CTCs before surgery (CTC ≥ 1/5 mL; 211 cases in total) were grouped as shown in Table 4 below:
Figure PCTCN2021135426-appb-000009
Table 4
Analysis method:
According to the table above, within each analysis batch the patients were divided into two groups, Tn_negative and Tn_positive (n = 1, 2, 3, 4, 5, 6), and overall survival curves were plotted by the Kaplan-Meier method. Overall survival was calculated in days from surgery to the patient's death during follow-up, and the curves were compared with the log-rank test. Sex differences between groups were compared with the χ² test, and age and gastric cancer stage with the Wilcoxon rank-sum test. P<0.05 indicates a significant difference between the two groups. These analyses were performed in SPSS software.
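The Kaplan-Meier estimate behind such survival curves can be sketched in plain Python. The analyses in the examples were performed in SPSS; this sketch is not that implementation and omits the log-rank comparison. The function names and the follow-up data are hypothetical; an event flag of 1 marks an observed death and 0 a censored patient (still alive at last follow-up).

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each time with at least one observed event.

    times: follow-up in days; events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Handle all patients sharing the same follow-up time together
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # censored patients leave the risk set without an event
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Hypothetical follow-up data for six patients
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve, median_survival(curve))
```

Censored patients contribute to the number at risk until their last follow-up, which is what distinguishes this estimate from a simple proportion of deaths.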
Analysis results:
Across the analysis batches, the overall survival curves of the two groups defined by T4 type CTCs (T4_negative, T4_positive) differed significantly (P=0.027): overall survival in the T4_negative group (median 799, 95% confidence interval: 751 to 846) was shorter than in the T4_positive group (median 817, 95% confidence interval: 784 to 850). Figure 1 shows the overall survival curves of patients in preoperative CTC detection when each CTC type is considered independently.
We also found that the overall survival curves of the two groups defined by T6 type CTCs (T6_negative, T6_positive) differed significantly (P=0.032): overall survival in the T6_negative group (median 841, 95% confidence interval: 808 to 873) was longer than in the T6_positive group (median 719, 95% confidence interval: 604 to 833). To rule out the possibility that these significant differences were driven by other clinical characteristics, we further analyzed whether the groups differed significantly in sex, age, or gastric cancer stage. The statistical analysis (Tables 5 and 6 below) showed no significant difference in sex, age, or gastric cancer stage between the groups defined by T4 type CTCs, and likewise none in these basic clinical characteristics between the groups defined by T6 type CTCs.
Figure PCTCN2021135426-appb-000010
Table 5 Correlation analysis of the basic clinical characteristics of the two groups defined by T4 type CTCs in preoperative CTC detection
Figure PCTCN2021135426-appb-000011
Table 6 Correlation analysis of the basic clinical characteristics of the two groups defined by T6 type CTCs in preoperative CTC detection
In summary, after classifying CTCs by morphology: considering T4 type CTCs alone, patients without T4 type CTCs had shorter overall survival than patients with T4 type CTCs; considering T6 type CTCs alone, patients with T6 type CTCs had shorter overall survival than patients without T6 type CTCs.
Because T4 and T6 type CTCs may both be present among the CTCs detected in a patient, we re-divided the patients to examine the combined effect of the two CTC types on overall survival. Overall survival curves of the resulting four groups were again plotted by the Kaplan-Meier method (Figure 2); the T4_positive & T6_positive group was too small (N=4) and was not included in further statistical analysis. Figure 2 shows the overall survival curves of patients in preoperative CTC detection when T4 and T6 type CTCs are considered together.
Statistical analysis showed that the overall survival curves of the three groups (T4_negative & T6_negative, T4_negative & T6_positive, T4_positive & T6_negative) differed significantly (P=0.033). The T4_positive & T6_negative group (N=71) had the longest overall survival (median 825, 95% confidence interval: 796 to 854), and the T4_negative & T6_positive group (N=21) had the shortest (median 726, 95% confidence interval: 606 to 847).
In summary, after morphological classification of CTCs, both T4 type and T6 type CTCs in preoperative CTC detection were found to be significantly correlated with patients' overall survival. Combining the two types, patients who were T4_positive & T6_negative had significantly longer overall survival than patients who were T4_negative & T6_positive.
Example 4 CTC classification and progression-free survival in gastric cancer
Research purpose:
To explore the correlation between the number of each type of CTC in postoperative CTC detection and progression-free survival in gastric cancer, and to determine which CTC type's count is most strongly correlated with progression-free survival.
Enrollment criteria:
Among postoperative gastric cancer patients, those who underwent postoperative CTC detection (229 cases in total) were grouped as shown in Table 7 below.
Figure PCTCN2021135426-appb-000012
Table 7
Analysis method:
According to Table 7, within each analysis batch the patients were divided into two groups, Tn_negative and Tn_positive (n = 1, 2, 3, 4, 5, 6), and progression-free survival curves were plotted by the Kaplan-Meier method. Progression-free survival was calculated in days from the time of postoperative CTC detection to the time of recurrence or metastasis during follow-up, and the curves were compared with the log-rank test. Sex differences between groups were compared with the χ² test, and age and gastric cancer stage with the Wilcoxon rank-sum test. P<0.05 indicates a significant difference between the two groups. These analyses were performed in SPSS software.
Analysis results:
在不同分析批次中发现,按照T3类型CTC分类的两个人群(T3_阴性,T3_阳性)的无进展生存期曲线存在显著差异(P=0.011),T3_阴性人群的无进展生存期(中位值743,95%置信区间:700至786)短于T3_阳性人群的无进展生存期(中位值776,95%置信区间:741至811)。图3示出了术后CTC检测中,独立观察各类型CTC情况下,患者的无进展生存期曲线。Significant differences in progression-free survival curves (P=0.011) were found between the two populations (T3_negative, T3_positive) classified by T3 type of CTCs across the different analysis batches, and progression-free survival in the T3_negative population was significantly different (P=0.011). (median 743, 95% confidence interval: 700 to 786) was shorter than progression-free survival in the T3_positive population (median 776, 95% confidence interval: 741 to 811). Figure 3 shows the progression-free survival curve of patients under the condition of independent observation of each type of CTC in postoperative CTC detection.
The progression-free survival curves of the two populations classified by T6-type CTCs (T6_negative, T6_positive) also differed significantly (P = 0.039): progression-free survival in the T6_negative population (median 779 days, 95% confidence interval: 744 to 814) was longer than in the T6_positive population (median 572 days, 95% confidence interval: 439 to 705) (see the figure above). To rule out the possibility that these differences were driven by other clinical characteristics, we further analyzed whether the populations differed significantly in sex, age, or gastric cancer stage. The statistical results (Tables 8 and 9 below) show no significant difference in sex, age, or gastric cancer stage between the populations classified by T3-type CTCs, and likewise no significant difference in these basic clinical characteristics between the populations classified by T6-type CTCs.
Figure PCTCN2021135426-appb-000013
Table 8. Correlation analysis of the basic clinical characteristics of the two populations classified by T3-type CTCs in postoperative CTC testing
Figure PCTCN2021135426-appb-000014
Table 9. Correlation analysis of the basic clinical characteristics of the two populations classified by T6-type CTCs in postoperative CTC testing
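The baseline comparisons summarized in Tables 8 and 9 use a χ² test for sex and a Wilcoxon rank-sum test for age and stage. As an illustration only, the sketch below implements a Pearson χ² test for a 2×2 table (no continuity correction) and a rank-sum test using the normal approximation with tie correction. Whether SPSS was run with these exact options is not stated in the application, and all counts and ages below are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]; returns (chi2, p), df = 1."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def rank_sum_test(x, y):
    """Wilcoxon rank-sum test (normal approximation, tie-corrected); returns (z, p)."""
    values = sorted(list(x) + list(y))
    ranks, tie_term, i = {}, 0.0, 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        ranks[values[i]] = (i + 1 + j) / 2.0  # midrank of positions i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    n1, n2 = len(x), len(y)
    n = n1 + n2
    w = sum(ranks[v] for v in x)              # rank sum of the first group
    mean = n1 * (n + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 0.0, 1.0
    z = (w - mean) / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value


# Hypothetical example: sex counts (male, female) and ages in two groups.
print(chi2_2x2(30, 17, 110, 72))
print(rank_sum_test([55, 61, 63, 70, 72], [52, 58, 64, 66, 75]))
```

A p-value above 0.05 from both tests would correspond to the "no significant difference" reading used in the text.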
In summary, after classifying CTCs by morphology: judged on T3-type CTCs alone, patients without T3-type CTCs had shorter progression-free survival than patients with T3-type CTCs; judged on T6-type CTCs alone, patients with T6-type CTCs had shorter progression-free survival than patients without T6-type CTCs.
Because the CTCs detected in a patient may include both T3-type and T6-type cells, we re-partitioned the population to examine how considering both CTC types together affects progression-free survival. Kaplan-Meier progression-free survival curves were again plotted for the four resulting populations (Figure 4); the T3_positive&T6_positive population was too small (N = 6) and was not included in the further statistical analysis. Figure 4 shows the progression-free survival curves of patients when T3-type and T6-type CTCs are considered together in postoperative CTC testing.
Statistical analysis showed that the progression-free survival curves of the three groups (T3_negative&T6_negative, T3_negative&T6_positive, T3_positive&T6_negative) differed significantly (P = 0.002). The T3_positive&T6_negative population (N = 47) had the longest progression-free survival (median 775 days, 95% confidence interval: 737 to 812), and the T3_negative&T6_positive population (N = 13) had the shortest (median 505 days, 95% confidence interval: 335 to 675).
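The combined analysis above amounts to labelling each patient by T3 and T6 status and comparing the three retained groups with a log-rank test. The sketch below shows one way this could be done in pure Python: a labelling helper and a three-group log-rank test with 2 degrees of freedom. The helper names and the sample data are hypothetical, and the application does not say how SPSS was configured for this comparison.

```python
import math


def combined_group(t3_positive, t6_positive):
    """Build a group label in the style used in the text, e.g. 'T3_negative&T6_positive'."""
    t3 = "T3_positive" if t3_positive else "T3_negative"
    t6 = "T6_positive" if t6_positive else "T6_negative"
    return f"{t3}&{t6}"


def logrank_three_groups(groups):
    """Log-rank test across exactly three groups.

    groups -- list of three (times, events) pairs; events are 1 (observed) or 0 (censored)
    Returns (chi2, p) with 2 degrees of freedom.
    """
    pooled = [(t, e, g) for g, (ts, es) in enumerate(groups) for t, e in zip(ts, es)]
    u = [0.0, 0.0]                    # observed minus expected, groups 0 and 1
    v = [[0.0, 0.0], [0.0, 0.0]]      # covariance of u
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for s, _, _ in pooled if s >= t)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        n_g = [sum(1 for s, _, g in pooled if s >= t and g == k) for k in (0, 1)]
        d_g = [sum(1 for s, e, g in pooled if s == t and e and g == k) for k in (0, 1)]
        scale = d * (n - d) / (n - 1) if n > 1 else 0.0
        for j in (0, 1):
            u[j] += d_g[j] - d * n_g[j] / n
            for l in (0, 1):
                delta = 1.0 if j == l else 0.0
                v[j][l] += scale * (n_g[j] / n) * (delta - n_g[l] / n)
    det = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    if det <= 0:
        return 0.0, 1.0
    # Quadratic form u^T V^{-1} u using the closed-form 2x2 inverse.
    chi2 = (u[0] * (v[1][1] * u[0] - v[0][1] * u[1])
            + u[1] * (v[0][0] * u[1] - v[1][0] * u[0])) / det
    return chi2, math.exp(-chi2 / 2.0)  # chi-square upper tail, df = 2


# Hypothetical follow-up data (days) for three groups; every event is observed here.
example = [([300, 420, 505, 610], [1, 1, 1, 1]),
           ([650, 700, 760, 820], [1, 1, 1, 1]),
           ([700, 775, 830, 900], [1, 1, 1, 1])]
print(combined_group(False, True), logrank_three_groups(example))
```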

Claims (10)

  1. Use of morphological features of circulating tumor cells in constructing a system for predicting or determining a gastric cancer condition in a subject.
  2. The use according to claim 1, wherein the prediction or determination is performed on a subject who has gastric cancer, is at risk of developing gastric cancer, or has had gastric cancer and been cured.
  3. The use according to claim 1, wherein the morphological features of the circulating tumor cells are obtained by immunofluorescence staining.
  4. The use according to claim 1, wherein the gastric cancer condition comprises one or more of gastric cancer classification, gastric cancer pTNM stage, gastric cancer overall survival, and gastric cancer progression-free survival.
  5. The use according to claim 1, wherein the system comprises a classification module, the morphological features of circulating tumor cells obtained by immunofluorescence staining are input into the classification module, and the module outputs a predicted gastric cancer condition for the subject to whom the circulating tumor cells belong.
  6. The use according to claim 5, wherein the classification module comprises a classifier constructed by machine learning, and the classifier classifies the circulating tumor cells by their morphological features and determines the correlation between the classification of the circulating tumor cells and the subject's gastric cancer condition.
  7. The use according to claim 6, wherein the classifier employs a k-means clustering algorithm.
  8. The use according to claim 3, wherein the morphological features characterize one or more of: nucleus size, nucleus morphology, size of the cell membrane and/or cytoplasm, morphology of the cell membrane and/or cytoplasm, expression level of circulating tumor cell markers, and distribution of circulating tumor cell markers in the cell membrane and/or cytoplasm.
  9. The use according to claim 8, wherein, before the features are input into the classifier, the parameters obtained to characterize circulating tumor cell morphology are further subjected to screening and/or principal component analysis.
  10. A system for predicting or determining a gastric cancer condition in a subject, wherein the morphological features of circulating tumor cells according to claim 1 are used.
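For orientation only, the k-means clustering named in claim 7 can be sketched as follows. This is a generic textbook version (Lloyd's algorithm) applied to standardized feature vectors, not the applicant's implementation. The feature columns (nucleus area, marker intensity), the deterministic seeding from the first k points, and all values are assumptions made for illustration; the screening and principal component analysis steps of claim 9 are not shown.

```python
import math


def standardize(rows):
    """Scale each feature column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Assumption: the first k points seed the centroids,
    which keeps the sketch deterministic but is a poor choice for real data."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-cell features: [nucleus area, marker intensity].
cells = [[40.0, 0.9], [120.0, 0.2], [42.0, 1.0], [118.0, 0.3], [39.0, 0.8], [125.0, 0.25]]
labels, _ = kmeans(standardize(cells), k=2)
print(labels)
```

In practice the number of clusters and the seeding strategy would need to be chosen and validated against the data.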
PCT/CN2021/135426 2020-12-03 2021-12-03 Application of morphological feature of circulating tumor cell in clinical diagnosis and treatment of gastric cancer WO2022117081A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011412056.XA CN114594252A (en) 2020-12-03 2020-12-03 Application of morphological characteristics of circulating tumor cells in clinical diagnosis and treatment of gastric cancer
CN202011412056.X 2020-12-03

Publications (1)

Publication Number Publication Date
WO2022117081A1 true WO2022117081A1 (en) 2022-06-09

Family

ID=81802365

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/135426 WO2022117081A1 (en) 2020-12-03 2021-12-03 Application of morphological feature of circulating tumor cell in clinical diagnosis and treatment of gastric cancer

Country Status (2)

Country Link
CN (1) CN114594252A (en)
WO (1) WO2022117081A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090317836A1 (en) * 2006-01-30 2009-12-24 The Scripps Research Institute Methods for Detection of Circulating Tumor Cells and Methods of Diagnosis of Cancer in Mammalian Subject
CN103597354A (en) * 2011-04-04 2014-02-19 雀巢产品技术援助有限公司 Methods for predicting and improving the survival of gastric cancer patients
CN107407626A (en) * 2014-09-26 2017-11-28 加利福尼亚大学董事会 The method for assessing the disease condition of cancer
CN107850587A (en) * 2015-05-26 2018-03-27 创新微技术公司 Application of the circulating tumor cell mitotic index in cancer is layered and is diagnosed
CN110998318A (en) * 2017-06-02 2020-04-10 艾匹克科学公司 Method for determining therapy based on single cell characterization of Circulating Tumor Cells (CTCs) in metastatic disease
US20200333235A1 (en) * 2019-04-22 2020-10-22 Rutgers, The State University Of New Jersey Use of multi-frequency impedance cytometry in conjunction with machine learning for classification of biological particles

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG YAN-LI; LIU BIN; SU QIN-JUN; JIN LIANG-LIANG; MA YING-CHUN; HAN XIAO-PENG; YU JIAN-PING; YANG YAN-FEI; WANG TING-RU; ZHAO LAN-JUN: "Circulating Tumor Cells and Their Karyotyping Correlates with Pathological Staging in Gastric Cancer", CHINESE JOURNAL OF DIAGNOSTIC PATHOLOGY, vol. 25, no. 11, 22 November 2018 (2018-11-22), pages 754 - 760, XP055935995, ISSN: 1007-8096, DOI: 10.3969/j.issn.1007-8096.2018.11.006 *

Also Published As

Publication number Publication date
CN114594252A (en) 2022-06-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21900105; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21900105; Country of ref document: EP; Kind code of ref document: A1)