CN116469473B - Model training method, device, equipment and storage medium for T cell subtype identification - Google Patents
- Publication number
- CN116469473B (application number CN202310708381.8A)
- Authority
- CN
- China
- Prior art keywords
- cells
- model
- sequencing data
- data
- candidate
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a model training method, device, equipment and storage medium for T cell subtype identification, and relates to the technical field of biology. The method comprises: acquiring a preset modeling dataset; extracting sequencing data of T cells from the modeling dataset based on the expression levels of Marker genes corresponding to the sequencing data in the modeling dataset; determining a first correspondence between the sequencing data of the T cells and tumor-specific T cells when the cells corresponding to the sequencing data of the T cells have annotation information supporting tumor recognition; determining a second correspondence between the sequencing data of the T cells and non-tumor-specific T cells when those cells do not have annotation information supporting tumor recognition; and training a preset model to be trained, using the first correspondence and the second correspondence as training data, to obtain a T cell subtype identification model.
Description
Technical Field
The invention relates to the technical field of biology, and in particular to a model training method, device, equipment and storage medium for T cell subtype identification.
Background
Tumor-specific T cells are the primary lymphocytes that recognize and kill tumors. In addition, identifying the T cell receptors (TCRs) of tumor-specific T cells can provide biomarkers for clinical monitoring of patient treatment, which can be used to track the clinical efficacy of anti-tumor immune responses and to study in depth the biological mechanisms of tumor immunotherapy.
Currently, the conventional method for identifying tumor-specific T cells is an ex vivo T cell functional test.
However, this identification process places high demands on the laboratory platform and has a long identification period. Moreover, it may miss a large proportion of tumor-specific T cells, for example T cells recognizing endogenous viral antigens, or terminally exhausted T cells that cannot be activated in vitro, so the identification accuracy for tumor-specific T cells is low.
Disclosure of Invention
The invention provides a model training method, device, equipment and storage medium for T cell subtype identification, which address the problems of the prior art in identifying tumor-specific T cells: high demands on the laboratory platform, a long identification period, and low identification accuracy.
The invention provides a model training method for T cell subtype identification, which comprises the following steps:
acquiring a preset modeling dataset, wherein the modeling dataset comprises at least single-cell sequencing data of tumor-specific T cells;
extracting sequencing data of T cells from the modeling dataset based on the expression levels of Marker genes corresponding to the sequencing data in the modeling dataset;
determining a first correspondence between the sequencing data of the T cells and tumor-specific T cells when the cells corresponding to the sequencing data of the T cells have annotation information supporting tumor recognition; and determining a second correspondence between the sequencing data of the T cells and non-tumor-specific T cells when those cells do not have annotation information supporting tumor recognition;
and training a preset model to be trained, using the first correspondence and the second correspondence as training data, to obtain a T cell subtype identification model.
According to the model training method for T cell subtype identification provided by the invention, acquiring the preset modeling dataset comprises:
acquiring a preset candidate dataset;
filtering the sequencing data of the candidate dataset to obtain the modeling dataset;
wherein the filtering comprises:
removing, from the candidate dataset, sequencing data whose number of detected genes is smaller than a first threshold;
removing, from the candidate dataset, sequencing data whose number of unique molecular identifiers (UMIs) is smaller than a second threshold;
removing, from the candidate dataset, sequencing data in which the proportion of UMIs from mitochondrial genes is greater than a third threshold;
and removing sequencing data corresponding to doublets from the candidate dataset.
According to the model training method for T cell subtype identification provided by the invention, extracting the sequencing data of T cells from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data in the modeling dataset comprises:
extracting first candidate sequencing data from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data in the modeling dataset;
and removing T cell receptor genes and tissue dissociation-induced genes from the highly variable genes of the first candidate sequencing data to obtain the sequencing data of the T cells.
According to the model training method for T cell subtype identification provided by the invention, removing the T cell receptor genes and the tissue dissociation-induced genes from the highly variable genes of the first candidate sequencing data to obtain the sequencing data of the T cells comprises:
removing the T cell receptor genes and the tissue dissociation-induced genes from the highly variable genes of the first candidate sequencing data to obtain second candidate sequencing data;
and processing the second candidate sequencing data with a preset SCTransform algorithm to obtain the sequencing data of the T cells.
According to the model training method for T cell subtype identification provided by the invention, training the preset model to be trained, using the first correspondence and the second correspondence as training data, to obtain the T cell subtype identification model comprises:
setting parameters of a preset first candidate model with the extreme gradient boosting (XGBoost) algorithm to obtain a preliminary identification model, wherein the parameters include at least one of: maximum tree depth, learning rate and sampling percentage;
taking a preset logistic regression model as the classification model;
and obtaining the model to be trained based on the preliminary identification model and the classification model.
According to the model training method for T cell subtype identification provided by the invention, obtaining the model to be trained based on the preliminary identification model and the classification model comprises:
obtaining a second candidate model based on the preliminary identification model and the classification model;
and calculating target hyperparameters of the second candidate model with a preset 10-fold cross-validation algorithm, and optimizing the second candidate model based on the target hyperparameters to obtain the model to be trained.
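The 10-fold cross-validation step can be sketched as below. This is a hypothetical pure-Python illustration, not the patent's implementation: the search grid, the scoring function and the helper names `kfold_indices` and `select_hyperparameters` are assumptions, and `fit`/`score` stand in for whatever second candidate model and metric are actually used.

```python
import random
from itertools import product
from statistics import mean


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices once, split them into k folds, yield (train, validation) pairs."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


def select_hyperparameters(X, y, grid, fit, score, k=10, seed=0):
    """Return the grid point with the best mean validation score over k folds.

    grid:  dict mapping hyperparameter name -> list of candidate values (hypothetical).
    fit:   fit(X_train, y_train, **params) -> model
    score: score(model, X_val, y_val) -> float, higher is better
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train, val in kfold_indices(len(X), k, seed):
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            fold_scores.append(score(model, [X[i] for i in val], [y[i] for i in val]))
        avg = mean(fold_scores)
        if avg > best_score:
            best_params, best_score = params, avg
    return best_params, best_score
```

The same fold assignment is reused for every grid point, so candidate hyperparameters are compared on identical splits.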
The invention also provides a model training device for T cell subtype identification, comprising:
an acquisition module, configured to acquire a preset modeling dataset, wherein the modeling dataset comprises at least single-cell sequencing data of tumor-specific T cells;
an extraction module, configured to extract sequencing data of T cells from the modeling dataset based on the expression levels of Marker genes corresponding to the sequencing data in the modeling dataset;
a determining module, configured to determine a first correspondence between the sequencing data of the T cells and tumor-specific T cells when the cells corresponding to the sequencing data of the T cells have annotation information supporting tumor recognition, and to determine a second correspondence between the sequencing data of the T cells and non-tumor-specific T cells when those cells do not have annotation information supporting tumor recognition;
and a training module, configured to train a preset model to be trained, using the first correspondence and the second correspondence as training data, to obtain a T cell subtype identification model.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a model training method for T cell subtype identification as described in any one of the above when executing the program.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which when executed by a processor implements a model training method of T cell subtype identification as described in any of the above.
The invention also provides a computer program product comprising a computer program which when executed by a processor implements a model training method for T cell subtype identification as described in any one of the above.
In the related art, tumor-specific T cells are identified by ex vivo T cell functional tests, which place high demands on the laboratory platform, have a long identification period and have low identification accuracy. By contrast, identifying tumor-specific T cells with the T cell subtype identification model trained by the method, device, equipment and storage medium provided by the invention is simple to operate and efficient to analyze, effectively shortens the identification period, and improves the identification accuracy for tumor-specific T cells.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a model training method for T cell subtype identification provided by the invention;
FIG. 2 is a second schematic flow chart of the model training method for T cell subtype identification provided by the invention;
FIG. 3 is a schematic diagram of an example of an identification result in a model training method for T cell subtype identification provided by the invention;
FIG. 4 is a bar graph of the distribution ratio of tumor-specific T cells and other T cell clones in the model training method for T cell subtype identification provided by the present invention;
FIG. 5 is a receiver operating characteristic (ROC) curve in the model training method for T cell subtype identification provided by the invention;
FIG. 6 is a precision-recall curve in the model training method for T cell subtype identification provided by the invention;
FIG. 7 is a graph of validation in a model training method for T cell subtype identification provided by the present invention;
FIG. 8 is a schematic diagram of the structure of a model training device for T cell subtype identification provided by the invention;
FIG. 9 is a schematic structural diagram of an electronic device provided by the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The model training method, device, equipment and storage medium for T cell subtype identification of the present invention are described below with reference to the accompanying drawings.
FIG. 1 is a schematic flow chart of the model training method for T cell subtype identification provided by the invention. As shown in FIG. 1, the method comprises steps 101 to 104, wherein:
Step 101: acquiring a preset modeling dataset, wherein the modeling dataset comprises at least single-cell sequencing data of tumor-specific T cells;
Step 102: extracting sequencing data of T cells from the modeling dataset based on the expression levels of Marker genes corresponding to the sequencing data in the modeling dataset;
Step 103: determining a first correspondence between the sequencing data of the T cells and tumor-specific T cells when the cells corresponding to the sequencing data of the T cells have annotation information supporting tumor recognition; and determining a second correspondence between the sequencing data of the T cells and non-tumor-specific T cells when those cells do not have annotation information supporting tumor recognition;
Step 104: training a preset model to be trained, using the first correspondence and the second correspondence as training data, to obtain a T cell subtype identification model.
In the related art, the conventional method for identifying tumor-specific T cells is an ex vivo T cell functional test. This screening process places high demands on the laboratory platform, has a long detection period, and can miss a large proportion of tumor-specific T cells, such as T cells recognizing endogenous viral antigens or terminally exhausted T cells that cannot be activated in vitro.
These shortcomings greatly limit the clinical application of TCR-engineered adoptive T cell therapies. In recent years, research applications of single-cell sequencing have gradually revealed the biological properties of tumor-specific T cells; for example, these T cells exhibit a higher exhaustion index. This makes it possible to identify tumor-specific T cells from the single-cell transcriptomic features of T cells.
In the embodiment of the invention, a modeling dataset comprising single-cell sequencing data of tumor-specific T cells is first acquired; for example, the modeling dataset may be downloaded from a published public database.
Optionally, the tumor-specific T cells may include CD8+ T cells and CD4+ T cells.
After the modeling dataset is acquired, the expression levels of the Marker genes can be counted, and single-cell subpopulations can be classified based on the expression levels of the Marker genes corresponding to the sequencing data in the modeling dataset, so as to extract the sequencing data of T cells from the modeling dataset.
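A minimal sketch of marker-based extraction, assuming cells have already been clustered upstream and that T cell clusters are recognized by their mean expression of CD3D, CD3E and CD3G. The marker panel, the `min_mean` cutoff and the function name are hypothetical; the text does not list the Marker genes or cutoffs used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical marker panel; the patent does not name its Marker genes here.
T_CELL_MARKERS = ("CD3D", "CD3E", "CD3G")


def extract_t_cells(expression, clusters, markers=T_CELL_MARKERS, min_mean=1.0):
    """Keep cells belonging to clusters with high mean T cell marker expression.

    expression: dict cell_id -> dict gene -> normalized expression value
    clusters:   dict cell_id -> cluster label from an upstream clustering step
    """
    per_cluster = defaultdict(list)
    for cell, genes in expression.items():
        # Average expression of the marker panel in this cell.
        marker_level = sum(genes.get(m, 0.0) for m in markers) / len(markers)
        per_cluster[clusters[cell]].append(marker_level)
    t_clusters = {c for c, levels in per_cluster.items() if mean(levels) >= min_mean}
    return {cell: genes for cell, genes in expression.items() if clusters[cell] in t_clusters}
```

Deciding at the cluster level rather than per cell mirrors the subpopulation classification described above and tolerates dropout of marker transcripts in individual cells.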
After the sequencing data of the T cells are obtained, each T cell can be classified as a tumor-specific T cell or a non-tumor-specific T cell according to whether it has annotation information supporting tumor recognition. Accordingly, the embodiment of the invention determines whether the cell corresponding to each piece of T cell sequencing data has such annotation information, and classifies the cell as tumor-specific or non-tumor-specific. This yields a first correspondence between tumor-specific T cells and their sequencing data, and a second correspondence between non-tumor-specific T cells and their sequencing data. The first and second correspondences are then used as training data, and the T cell subtype identification model is obtained through supervised learning.
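The labeling step can be sketched as follows. Assumptions: each cell's annotation has already been reduced to a boolean (True when the annotation supports tumor recognition), cells with no annotation fall into the non-tumor-specific group, and `build_training_pairs` and `feature_genes` are illustrative names rather than anything defined in the patent.

```python
def build_training_pairs(t_cells, annotations, feature_genes):
    """Turn T cell profiles plus annotations into labeled (feature vector, label) pairs.

    t_cells:       dict cell_id -> dict gene -> expression value
    annotations:   dict cell_id -> True if the annotation supports tumor recognition
    feature_genes: ordered list of genes used as model features
    Returns (first, second): tumor-specific pairs labeled 1, non-tumor-specific labeled 0.
    """
    first, second = [], []
    for cell, genes in sorted(t_cells.items()):
        vector = [genes.get(g, 0.0) for g in feature_genes]
        if annotations.get(cell, False):
            first.append((vector, 1))   # first correspondence: tumor-specific T cell
        else:
            second.append((vector, 0))  # second correspondence: non-tumor-specific T cell
    return first, second
```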
Optionally, the first correspondence and the second correspondence obtained above may be used as the input dataset, of which 70% of the data is used as the training set and the remaining 30% as the validation set.
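A sketch of the 70/30 split, assuming the labeled pairs are shuffled with a fixed seed so the split is reproducible; the function name and seed are hypothetical.

```python
import random


def train_validation_split(samples, train_fraction=0.7, seed=42):
    """Shuffle labeled samples and split them into training and validation sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

A plain shuffle is used here for brevity; if tumor-specific cells are a small minority, splitting each label separately (stratification) would keep their fraction equal in both sets.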
Optionally, the performance of the trained T cell subtype identification model may also be evaluated, for example by calculating the accuracy, recall and F value of identification, and by plotting the receiver operating characteristic (ROC) curve and computing the area under it (AUC, Area Under Curve).
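The evaluation metrics can be sketched in plain Python as below; the function names are hypothetical. Precision is included because the F value combines precision and recall. The AUC uses its rank interpretation: the probability that a randomly chosen positive cell scores higher than a randomly chosen negative one, with ties counting one half, which equals the area under the ROC curve.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = tumor-specific)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking of positives against negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic and meant for clarity; a sort-based rank computation gives the same value for large validation sets.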
Optionally, after the T cell subtype identification model is obtained, tumor-specific T cells, such as CD8+ T lymphocytes, may be identified with the model.
In the model training method for T cell subtype identification provided by the embodiment of the invention, a modeling dataset comprising single-cell sequencing data of tumor-specific T cells is first acquired, and sequencing data of T cells are extracted from it based on the Marker gene expression levels corresponding to its sequencing data. It is then determined whether the cell corresponding to each piece of T cell sequencing data has annotation information supporting tumor recognition, so that the cell is classified as a tumor-specific or non-tumor-specific T cell. A first correspondence between tumor-specific T cells and their sequencing data and a second correspondence between non-tumor-specific T cells and their sequencing data are determined, and these correspondences are used as training data to train the model to be trained by supervised learning, yielding the T cell subtype identification model. In the related art, tumor-specific T cells are identified by ex vivo T cell functional tests, which place high demands on the laboratory platform, have a long identification period and have low identification accuracy. By contrast, identifying tumor-specific T cells with the T cell subtype identification model trained according to the embodiment of the invention is simple to operate and efficient to analyze, effectively shortens the identification period, and improves the identification accuracy for tumor-specific T cells.
Optionally, acquiring the preset modeling dataset may be implemented as follows:
acquiring a preset candidate dataset;
filtering the sequencing data of the candidate dataset to obtain the modeling dataset;
wherein the filtering comprises the following steps:
1) Removing, from the candidate dataset, sequencing data whose number of detected genes is smaller than a first threshold.
Specifically, for example, if a gene identified in a single cell is detected in fewer than 3 cells, the corresponding sequencing data may be removed from the candidate dataset.
2) Removing, from the candidate dataset, sequencing data whose number of unique molecular identifiers (UMIs) is smaller than a second threshold.
Specifically, when the sequencing count data are abnormal, the number of UMIs may fall below the second threshold; for example, if the total number of UMIs in a single cell is less than 200, the corresponding sequencing data may be removed from the candidate dataset.
3) Removing, from the candidate dataset, sequencing data in which the proportion of UMIs from mitochondrial genes is greater than a third threshold.
Specifically, when the mitochondrial proportion is too high, the proportion of UMIs from mitochondrial genes may exceed the third threshold; for example, if more than 20% of the UMIs in a single cell come from mitochondrial genes, the corresponding sequencing data may be removed from the candidate dataset.
4) Removing sequencing data corresponding to doublets from the candidate dataset.
Specifically, the sequencing data of the candidate dataset can be analyzed with a preset DoubletFinder algorithm to identify the sequencing data corresponding to doublets, which are then removed.
In the embodiment of the invention, quality control is performed on the sequencing data of the candidate dataset to filter out sequencing data from low-quality cells and obtain the modeling dataset. This effectively improves the quality of the training data and thereby the identification accuracy of the trained T cell subtype identification model.
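The four quality-control filters can be sketched as below, using the example thresholds from the text (3 cells, 200 UMIs, 20% mitochondrial UMIs). Assumptions: counts are held as a cell -> {gene: UMI count} mapping, mitochondrial genes carry the human `MT-` prefix, and the first filter follows the worked example (genes detected in fewer than 3 cells). DoubletFinder is not reimplemented; its calls are assumed to arrive as a boolean map. `quality_filter` is a hypothetical name.

```python
from collections import Counter


def quality_filter(counts, doublets, min_cells_per_gene=3, min_umis=200,
                   max_mito_fraction=0.20, mito_prefix="MT-"):
    """Apply the four QC filters to a cell -> {gene: UMI count} mapping.

    doublets: cell_id -> True if an upstream doublet caller flagged the cell.
    Returns surviving cells, restricted to genes that pass the per-gene filter.
    """
    # 1) Genes detected (count > 0) in fewer than `min_cells_per_gene` cells are dropped.
    detected_in = Counter(g for genes in counts.values() for g, c in genes.items() if c > 0)
    kept_genes = {g for g, n in detected_in.items() if n >= min_cells_per_gene}

    filtered = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        if total == 0 or total < min_umis:      # 2) too few UMIs
            continue
        mito = sum(c for g, c in genes.items() if g.startswith(mito_prefix))
        if mito / total > max_mito_fraction:     # 3) mitochondrial UMI fraction too high
            continue
        if doublets.get(cell, False):            # 4) flagged as a doublet
            continue
        filtered[cell] = {g: c for g, c in genes.items() if g in kept_genes}
    return filtered
```

Per-cell totals are computed on the raw counts before any genes are dropped, so the UMI and mitochondrial thresholds refer to the full library.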
Optionally, extracting the sequencing data of T cells from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data in the modeling dataset may be implemented as follows:
extracting first candidate sequencing data from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data in the modeling dataset;
and removing T cell receptor genes and tissue dissociation-induced genes from the highly variable genes of the first candidate sequencing data to obtain the sequencing data of the T cells.
Specifically, after the first candidate sequencing data are extracted from the modeling dataset based on the Marker gene expression levels, the highly variable genes of the first candidate sequencing data can be filtered by removing T cell receptor genes and tissue dissociation-induced genes, yielding the sequencing data of the T cells.
Highly variable genes are the genes whose expression differs most across the cells being compared. Using them helps distinguish different cell types and thereby improves the identification accuracy for tumor-specific T cells.
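A sketch of highly variable gene selection followed by the two exclusions. Assumptions, all hypothetical: genes are ranked by plain variance across cells (common single-cell tools usually model the mean-variance relationship instead), TCR segment genes follow human HGNC naming (TRAV1-2, TRBC1, TRDD3, ...), and the dissociation list is a short illustrative subset rather than the list used in the patent.

```python
import re
from statistics import pvariance

# TCR V/D/J segments (e.g. TRBV7-9, TRDD3, TRGJP) and constant genes (TRAC, TRBC1, TRDC);
# written so that unrelated genes such as TRADD or TRAF1 are not matched.
TCR_GENE = re.compile(r"^TR[ABGD](?:[VDJ]\d|[VJ]P|C\d?$)")
# Illustrative subset of dissociation-induced genes (hypothetical list).
DISSOCIATION_GENES = {"FOS", "FOSB", "JUN", "JUNB", "ATF3", "EGR1",
                      "HSPA1A", "HSPA1B", "IER2", "DUSP1"}


def highly_variable_genes(expression, n_top=2000):
    """Take the n_top most variable genes, then drop TCR and dissociation genes.

    expression: dict cell_id -> dict gene -> normalized expression value
    """
    genes = sorted({g for profile in expression.values() for g in profile})
    variances = {g: pvariance([p.get(g, 0.0) for p in expression.values()]) for g in genes}
    ranked = sorted(genes, key=lambda g: variances[g], reverse=True)[:n_top]
    return [g for g in ranked if not TCR_GENE.match(g) and g not in DISSOCIATION_GENES]
```

TCR genes are excluded because clonotype-specific V/J usage would let a classifier memorize clones instead of learning transcriptional state; dissociation genes reflect sample handling rather than biology.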
Optionally, removing the T cell receptor genes and tissue dissociation-induced genes from the hypervariable genes of the first candidate sequencing data to obtain the sequencing data of the T cells may include:
removing the T cell receptor genes and tissue dissociation-induced genes from the hypervariable genes of the first candidate sequencing data to obtain second candidate sequencing data;
and processing the second candidate sequencing data with a preset SCTransform algorithm to obtain the sequencing data of the T cells.
Specifically, after the hypervariable genes of the first candidate sequencing data have been filtered, the data can be processed with a preset SCTransform algorithm to obtain the sequencing data of the T cells. SCTransform scales the sequencing data and reduces its dimensionality, making expression levels comparable across cells and removing the effect of sequencing depth; this improves the quality of the training data and hence the identification accuracy of the trained T cell subtype identification model.
For example, the sequencing data may be processed with the SCTransform algorithm provided by single-cell analysis software (such as Seurat) to normalize expression levels and remove sequencing-depth effects.
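SCTransform itself is an R function in Seurat that fits a regularized negative binomial regression per gene; it is not reproduced here. As a simpler stand-in, the sketch below computes analytic Pearson residuals under a negative binomial model with a fixed overdispersion θ = 100, which likewise removes the dependence on sequencing depth, and then keeps the genes with the largest residual variance, analogous to the top-1500 selection described in step S4. The fixed θ, the clipping at √n_cells and the function names are assumptions of this sketch.

```python
import numpy as np


def pearson_residuals(counts, theta=100.0):
    """Analytic Pearson residuals for a cells x genes UMI count matrix.

    Expected counts mu_ij = n_i * p_j, where n_i is the cell's total UMIs and
    p_j is the gene's share of all UMIs. Residuals use the negative binomial
    variance mu + mu^2 / theta and are clipped to [-sqrt(n_cells), sqrt(n_cells)].
    """
    X = np.asarray(counts, dtype=float)
    cell_totals = X.sum(axis=1, keepdims=True)   # shape (n_cells, 1)
    gene_totals = X.sum(axis=0, keepdims=True)   # shape (1, n_genes)
    mu = cell_totals @ gene_totals / X.sum()     # expected counts
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (X - mu) / np.sqrt(mu + mu ** 2 / theta)
    z = np.nan_to_num(z)  # all-zero genes or cells give 0/0; set those to 0
    clip = np.sqrt(X.shape[0])
    return np.clip(z, -clip, clip)


def top_variable_genes(residuals, gene_names, n_top=1500):
    """Return the n_top genes with the largest residual variance, highest first."""
    order = np.argsort(-residuals.var(axis=0), kind="stable")[:n_top]
    return [gene_names[i] for i in order]
```

Genes whose counts simply track each cell's depth get small residuals, whereas genes concentrated in a subset of cells get large ones, which is why residual variance works as a ranking score.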
Optionally, training the preset model to be trained with the first correspondence and the second correspondence as training data to obtain the T cell subtype identification model may include:
setting parameters of a preset first candidate model with the extreme gradient boosting (XGBoost) algorithm to obtain a preliminary identification model, wherein the parameters include at least one of: maximum tree depth, learning rate, and sampling percentage;
using a preset logistic regression model as the classification model;
and obtaining the model to be trained based on the preliminary identification model and the classification model.
Specifically, the parameters of the first candidate model (at least one of maximum tree depth, learning rate, and sampling percentage) can be set with the extreme gradient boosting algorithm to obtain a preliminary identification model; a logistic regression model is selected as the classification model; and the model to be trained is obtained from the preliminary identification model and the classification model. This embodiment thus provides a concrete way of obtaining the model to be trained.
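The text does not spell out how the XGBoost-configured model and the logistic regression are combined. One plausible reading is a gradient-boosted tree ensemble whose summed output is passed through the logistic (sigmoid) link to give a probability, which is what XGBoost's `binary:logistic` objective does. The sketch below illustrates that reading in plain NumPy; it is not the xgboost library and not the authors' code. `max_depth`, `learning_rate` and `subsample` mirror the three parameters named above; `n_rounds`, `reg_lambda` and all function names are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_tree(X, g, h, max_depth, reg_lambda=1.0, depth=0):
    """Grow one regression tree on gradients g and Hessians h.

    Uses the XGBoost-style split gain (without the 1/2 factor and gamma penalty).
    """
    G, H = g.sum(), h.sum()
    leaf = {"value": -G / (H + reg_lambda)}  # optimal leaf weight
    if depth >= max_depth or len(g) < 2:
        return leaf
    parent = G * G / (H + reg_lambda)
    best_gain, best_split = 0.0, None
    for j in range(X.shape[1]):
        order = np.argsort(X[:, j], kind="stable")
        xs = X[order, j]
        GL, HL = np.cumsum(g[order]), np.cumsum(h[order])
        for i in range(len(xs) - 1):
            if xs[i] == xs[i + 1]:
                continue  # cannot split between equal values
            gain = (GL[i] ** 2 / (HL[i] + reg_lambda)
                    + (G - GL[i]) ** 2 / (H - HL[i] + reg_lambda) - parent)
            if gain > best_gain:
                best_gain, best_split = gain, (j, (xs[i] + xs[i + 1]) / 2)
    if best_split is None:
        return leaf
    j, thr = best_split
    left = X[:, j] <= thr
    return {"feature": j, "threshold": thr,
            "left": fit_tree(X[left], g[left], h[left], max_depth, reg_lambda, depth + 1),
            "right": fit_tree(X[~left], g[~left], h[~left], max_depth, reg_lambda, depth + 1)}


def predict_tree(tree, X):
    if "value" in tree:
        return np.full(X.shape[0], tree["value"])
    out = np.empty(X.shape[0])
    left = X[:, tree["feature"]] <= tree["threshold"]
    out[left] = predict_tree(tree["left"], X[left])
    out[~left] = predict_tree(tree["right"], X[~left])
    return out


def fit_boosted_logistic(X, y, n_rounds=50, max_depth=3, learning_rate=0.1,
                         subsample=0.8, seed=0):
    """Gradient-boosted trees on the log-loss; probabilities = sigmoid(margin)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    rng = np.random.default_rng(seed)
    margin, trees = np.zeros(len(y)), []
    for _ in range(n_rounds):
        p = sigmoid(margin)
        g, h = p - y, p * (1.0 - p)  # gradient and Hessian of the log-loss
        rows = rng.random(len(y)) < subsample  # row subsampling per round
        if rows.sum() < 2:
            continue
        tree = fit_tree(X[rows], g[rows], h[rows], max_depth)
        trees.append(tree)
        margin += learning_rate * predict_tree(tree, X)
    return {"trees": trees, "learning_rate": learning_rate}


def predict_proba(model, X):
    X = np.asarray(X, float)
    margin = np.zeros(X.shape[0])
    for tree in model["trees"]:
        margin += model["learning_rate"] * predict_tree(tree, X)
    return sigmoid(margin)
```

Under this reading, the logistic link supplies the classification output while the boosted trees supply a flexible, non-linear score; in practice the xgboost package would replace these hand-written functions.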
Optionally, obtaining the model to be trained based on the preliminary identification model and the classification model may include:
obtaining a second candidate model based on the preliminary identification model and the classification model;
and calculating target hyperparameters of the second candidate model with a preset 10-fold cross-validation algorithm, and optimizing the second candidate model with the target hyperparameters to obtain the model to be trained.
Specifically, the optimal hyperparameters of the model are determined by 10-fold cross-validation and used as the target hyperparameters; the resulting optimized model serves as the T cell subtype identification model, which improves its identification accuracy.
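A generic sketch of hyperparameter selection by 10-fold cross-validation: every combination in a parameter grid is scored by mean held-out accuracy over ten folds, and the best combination is kept. The `fit_predict` callable stands in for training the model described above with candidate values of, for example, `max_depth`, `learning_rate` and `subsample`. The k-nearest-neighbour classifier is a hypothetical stand-in chosen only to keep the example self-contained, and using accuracy as the selection criterion is an assumption, since the text does not say which score is optimized.

```python
import itertools

import numpy as np


def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and split into k nearly equal folds."""
    perm = np.random.default_rng(seed).permutation(n)
    return np.array_split(perm, k)


def grid_search_cv(fit_predict, X, y, grid, k=10, seed=0):
    """Return (best_params, best_mean_accuracy, all_results)."""
    folds = kfold_indices(len(y), k, seed)
    keys = sorted(grid)
    results = []
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        accs = []
        for i, test_idx in enumerate(folds):
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
            pred = fit_predict(X[train_idx], y[train_idx], X[test_idx], **params)
            accs.append(np.mean(pred == y[test_idx]))
        results.append((float(np.mean(accs)), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results


def knn_fit_predict(X_train, y_train, X_test, n_neighbors=5):
    """Hypothetical stand-in model: majority vote of the nearest neighbours."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d, axis=1)[:, :n_neighbors]
    return (y_train[nn].mean(axis=1) >= 0.5).astype(int)
```

Usage: `grid_search_cv(knn_fit_predict, X, y, {"n_neighbors": [1, 5, 15]})` evaluates three settings on the same ten folds, so the comparison between settings is paired.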
The model training method for T cell subtype identification provided by the embodiments of the invention is illustrated below.
The method uses single-cell transcriptome data of infiltrating lymphocytes from surgically resected tumor specimens or needle biopsy samples of cancer patients, which are readily obtainable. Compared with the conventional experimental workflow for identifying tumor-specific T cells, the T cell subtype identification model established by the invention greatly shortens the time and reduces the cost of identifying tumor-specific T cells, and the analysis results show that 99% of the tumor-infiltrating CD8+ T lymphocytes tested were correctly classified. The method is simple to operate and efficient to analyze. Combining the identification results with single-cell immune repertoire sequencing data directly yields the T cell receptor sequences of the tumor-specific T cells, laying a foundation for subsequent TCR-engineered T cell therapy. The T cell subtype identification model trained by this method can therefore serve as an effective screening tool for adoptive cell therapy and can be widely applied in tumor immunotherapy.
1. The model training method for T cell subtype identification comprises the following steps:
S1. Acquire a single-cell sequencing dataset (the modeling dataset) containing neoantigen-specific CD8+ T cells (tumor-specific T cells), downloaded from a published public database;
S2. Perform quality control on the single-cell transcriptome sequencing data in the modeling dataset: based on the number of genes detected in each cell, the sequencing count, and the proportion of mitochondrial genome reads, remove cells with too many or too few detected genes, abnormal sequencing counts, or a high mitochondrial proportion, and filter out doublets;
Specifically, low-quality cells and genes are filtered according to the following criteria:
1) a gene is detected in fewer than 3 cells;
2) a cell has fewer than 200 UMIs in total;
3) mitochondrial genes account for more than 20% of a cell's UMIs;
4) a cell is identified as a doublet by DoubletFinder analysis.
S3. Based on the quality-controlled single-cell transcriptome sequencing data, quantify Marker gene expression, classify cell subsets, and extract the single-cell transcriptome sequencing data of tumor-infiltrating CD8+ T cells; then filter the hypervariable genes, removing T cell receptor genes and genes induced during tissue dissociation (tissue dissociation-induced genes);
S4. Scale the single-cell transcriptome sequencing data of the tumor-infiltrating CD8+ T cells, specifically using the SCTransform algorithm of the single-cell analysis software Seurat to normalize expression levels and remove sequencing-depth effects; the scaled data of the top 1500 hypervariable genes ranked by residuals may be extracted for subsequent use;
alternatively, the genes may be ranked by the relationship between mean expression and variance.
S5. Label each CD8+ T cell as a tumor-specific or non-tumor-specific T cell according to the annotation of whether it recognizes the tumor, and combine these labels with the hypervariable gene scaled data from S4 to form the input dataset of the machine learning model (the model to be trained);
specifically, 70% of the input dataset may be used as the training set and the remaining 30% as the validation set;
S6. Use the extreme gradient boosting (XGBoost) algorithm to set the parameters of the preliminary identification model, including maximum tree depth, learning rate, and sampling percentage, and select a logistic regression model as the classification model;
S7. Determine the optimal hyperparameters of the model by 10-fold cross-validation to obtain the optimized neoantigen-specific CD8+ T cell subtype identification model;
S8. Evaluate the performance of the established machine learning model (the T cell subtype identification model), including its accuracy, recall, F-score, and ROC curve/AUC.
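The evaluation in step S8 can be computed as follows. This minimal NumPy sketch assumes binary labels (1 = tumor-specific) and predicted probabilities from the trained model; the 0.5 decision threshold is an assumption, "F value" is taken to mean the F1 score, and the AUC is computed from its rank interpretation (the probability that a random positive outscores a random negative) rather than by integrating a plotted ROC curve. Both classes must be present in `y_true`.

```python
import numpy as np


def classification_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall and F1 at a threshold, plus ROC AUC."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(scores, dtype=float)
    pred = s >= threshold
    tp = np.sum(pred & y)
    fp = np.sum(pred & ~y)
    fn = np.sum(~pred & y)
    accuracy = np.mean(pred == y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC: share of (positive, negative) pairs ranked correctly; ties count 1/2.
    pos, neg = s[y], s[~y]
    auc = (np.mean(pos[:, None] > neg[None, :])
           + 0.5 * np.mean(pos[:, None] == neg[None, :]))
    return {"accuracy": float(accuracy), "precision": float(precision),
            "recall": float(recall), "f1": float(f1), "roc_auc": float(auc)}
```

For instance, labels `[1, 1, 0, 0]` with scores `[0.9, 0.4, 0.6, 0.1]` give accuracy, precision, recall and F1 of 0.5 and an AUC of 0.75, since three of the four positive-negative pairs are ranked correctly.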
2. FIG. 2 is a second flow chart of the model training method for T cell subtype identification provided by the invention. As shown in FIG. 2, the method comprises the following steps:
1. tumor-specific T cell data collection;
2. quality control and expression quantification of sequencing data;
3. annotation of CD8+ T cells;
4. cleaning of the CD8+ T cell expression matrix data;
5. establishment of the machine learning model.
Specifically, the collected raw data (the modeling dataset) are first filtered, aligned, quantified, and annotated to obtain the gene expression matrix of CD8+ T cells (their single-cell transcriptome sequencing data); the data are then further filtered, normalized, and scaled; finally, neoantigen-specific CD8+ T cells are identified with a machine learning algorithm for tumor-specific T cells (the T cell subtype identification model). The analysis proceeds as follows:
(1) Data quality control: select single-cell sequencing data of tumor tissue and perform quality-control filtering with the Seurat software;
(2) CD8+ T cell identification: based on the quality-controlled single-cell sequencing data, quantify the expression of the Marker genes (CD3D, CD3G, CD8A, CD8B, and CD45), identify CD8+ T cells, and extract their single-cell transcriptome sequencing data;
(3) Scaling and filtering of the CD8+ T cell single-cell transcriptome sequencing data: normalize expression levels with the SCTransform algorithm of the single-cell analysis software Seurat to remove sequencing-depth effects; filter the hypervariable genes, removing T cell receptor genes and tissue dissociation-induced genes; and extract the scaled data of the top 1500 hypervariable genes;
(4) Use the hypervariable gene scaled data from step (3) as the input dataset of the machine learning model for identifying tumor-specific T cells (the model to be trained) and perform identification. FIG. 3 is a schematic diagram of example identification results in the model training method for T cell subtype identification provided by the invention; as shown in FIG. 3, a uniform manifold approximation and projection (UMAP) plot shows the distribution of neoantigen-specific T cells (i.e., tumor-specific T cells) among the infiltrating CD8+ T cells of a tumor patient;
(5) Assess clonal expansion from the TCR sequence information of the identified neoantigen-specific CD8+ T cells. The results are shown in FIG. 4, a bar graph of the clonal distribution proportions of tumor-specific T cells and other T cells in the model training method for T cell subtype identification provided by the invention.
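Step (5) can be illustrated by counting clonotypes. This sketch assumes each cell carries a clonotype identifier (for example a paired CDR3 sequence from single-cell TCR sequencing) and a predicted group label; treating a clone as expanded when it contains at least two cells, the input format and the function name are assumptions, not taken from the patent.

```python
from collections import Counter


def expanded_fraction_by_group(cells, min_clone_size=2):
    """Fraction of cells in each group that belong to an expanded clone.

    cells: iterable of (clonotype_id, group_label) pairs, one per cell.
    Clone sizes are counted across all cells; a cell counts as expanded
    if its clonotype occurs in at least min_clone_size cells.
    """
    cells = list(cells)
    clone_size = Counter(clone for clone, _ in cells)
    totals, expanded = Counter(), Counter()
    for clone, group in cells:
        totals[group] += 1
        if clone_size[clone] >= min_clone_size:
            expanded[group] += 1
    return {group: expanded[group] / totals[group] for group in totals}
```

A higher expanded fraction in the predicted tumor-specific group than in other T cells is the kind of contrast a bar graph like FIG. 4 would display.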
In addition, FIG. 5 is the receiver operating characteristic (ROC) curve in the model training method for T cell subtype identification provided by the invention; as shown in FIG. 5, it demonstrates the specificity and sensitivity of the model.
FIG. 6 is the precision-recall curve in the model training method for T cell subtype identification provided by the invention; as shown in FIG. 6, it demonstrates the precision and recall of the model.
FIG. 7 is the validation curve in the model training method for T cell subtype identification provided by the invention; as shown in FIG. 7, it shows that the model is neither overfitted nor underfitted.
The model training device for T cell subtype identification provided by the invention is described below; the device described below and the model training method for T cell subtype identification described above may be cross-referenced.
Fig. 8 is a schematic structural diagram of a model training device for T cell subtype identification provided by the present invention, and as shown in fig. 8, a model training device 800 for T cell subtype identification includes:
an obtaining module 801, configured to obtain a preset modeling dataset, wherein the modeling dataset includes at least single-cell sequencing data of tumor-specific T cells;
an extraction module 802, configured to extract sequencing data of T cells from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data of the modeling dataset;
a determining module 803, configured to determine a first correspondence between the sequencing data of the T cells and tumor-specific T cells if the annotation information of the cells corresponding to the sequencing data of the T cells indicates that they recognize a tumor, and to determine a second correspondence between the sequencing data of the T cells and non-tumor-specific T cells if it does not;
a training module 804, configured to train a preset model to be trained with the first correspondence and the second correspondence as training data to obtain a T cell subtype identification model.
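The labelling performed by the determining module, together with the 70%/30% training/validation split described in step S5, can be sketched as below. It assumes the annotation is available as one boolean per cell (True when the cell was annotated as recognizing the tumor); the random, non-stratified split, the seed and the function name are assumptions.

```python
import numpy as np


def label_and_split(features, tumor_reactive, train_fraction=0.7, seed=0):
    """Attach binary labels (1 = tumor-specific, 0 = non-tumor-specific) and split.

    features: array of shape (n_cells, n_genes), e.g. scaled hypervariable-gene data.
    tumor_reactive: boolean annotation per cell.
    Returns (X_train, y_train, X_val, y_val).
    """
    X = np.asarray(features)
    y = np.asarray(tumor_reactive, dtype=bool).astype(int)
    if X.shape[0] != y.shape[0]:
        raise ValueError("one annotation per cell is required")
    order = np.random.default_rng(seed).permutation(len(y))
    n_train = int(round(train_fraction * len(y)))
    train, val = order[:n_train], order[n_train:]
    return X[train], y[train], X[val], y[val]
```

When tumor-specific cells are rare, a stratified split (shuffling each class separately) would keep the class ratio equal in both sets; the plain shuffle above does not guarantee that.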
In the model training device for T cell subtype identification provided by this embodiment, the obtaining module first acquires a modeling dataset containing single-cell sequencing data of tumor-specific T cells. The extraction module extracts the sequencing data of T cells from the modeling dataset based on the corresponding Marker gene expression levels. The determining module checks whether the annotation information of the cells corresponding to the T cell sequencing data indicates tumor recognition, thereby classifying each cell as a tumor-specific or non-tumor-specific T cell and determining the first correspondence (between tumor-specific T cells and their sequencing data) and the second correspondence (between non-tumor-specific T cells and their sequencing data). The training module then trains the model to be trained by supervised learning, using the first and second correspondences as training data, to obtain the T cell subtype identification model. In the related art, tumor-specific T cells are identified by in-vitro T cell function assays, which demand a well-equipped laboratory platform, take a long time, and have low identification accuracy. By contrast, identifying tumor-specific T cells with the T cell subtype identification model trained in this embodiment is simple to operate and efficient to analyze, shortens the identification period, and improves identification accuracy.
Optionally, the obtaining module 801 is specifically configured to:
acquiring a preset candidate dataset;
filtering the sequencing data of the candidate dataset to obtain the modeling dataset;
wherein the filtering operation comprises:
removing from the candidate dataset sequencing data in which the number of detected genes is smaller than a first threshold;
removing from the candidate dataset sequencing data in which the number of unique molecular identifiers (UMIs) is smaller than a second threshold;
removing from the candidate dataset sequencing data in which the proportion of UMIs from mitochondrial genes is greater than a third threshold;
and removing from the candidate dataset sequencing data corresponding to doublets.
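The filtering operations can be sketched on a dense cells × genes UMI count matrix. The defaults for the second and third thresholds follow the criteria in step S2 (fewer than 200 UMIs, more than 20% mitochondrial UMIs), and the gene-level rule "detected in fewer than 3 cells" from S2 is included as `min_cells`. The first threshold (minimum detected genes per cell) is not given numerically in the text, so it defaults to 0, which disables it. Identifying mitochondrial genes by the human `MT-` prefix and passing doublet calls as a precomputed boolean array (for example from DoubletFinder) are assumptions.

```python
import numpy as np


def qc_filter(counts, gene_names, is_doublet, min_genes=0, min_umis=200,
              max_mito_frac=0.20, min_cells=3):
    """Quality-control a cells x genes UMI matrix.

    Returns (filtered_counts, kept_cell_mask, kept_gene_mask).
    Cell metrics and gene detection counts are computed on the unfiltered
    matrix, so the mitochondrial fraction reflects all genes.
    """
    X = np.asarray(counts)
    genes = np.asarray(gene_names).astype(str)
    is_mito = np.char.startswith(genes, "MT-")  # human naming convention (assumption)
    umis = X.sum(axis=1)
    n_genes = (X > 0).sum(axis=1)
    mito_frac = np.divide(X[:, is_mito].sum(axis=1), umis,
                          out=np.zeros(X.shape[0]), where=umis > 0)
    keep_cells = ((n_genes >= min_genes)                 # first threshold
                  & (umis >= min_umis)                   # second threshold
                  & (mito_frac <= max_mito_frac)         # third threshold
                  & ~np.asarray(is_doublet, dtype=bool))  # doublet removal
    keep_genes = (X > 0).sum(axis=0) >= min_cells
    return X[keep_cells][:, keep_genes], keep_cells, keep_genes
```

Returning the two masks alongside the filtered matrix makes it straightforward to subset per-cell annotations and gene names consistently.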
Optionally, the extracting module 802 is specifically configured to:
extracting first candidate sequencing data from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data of the modeling dataset;
and removing T cell receptor genes and tissue dissociation-induced genes from the hypervariable genes of the first candidate sequencing data to obtain the sequencing data of the T cells.
Optionally, the extraction module 802 is further specifically configured to perform:
removing the T cell receptor genes and tissue dissociation-induced genes from the hypervariable genes of the first candidate sequencing data to obtain second candidate sequencing data;
and processing the second candidate sequencing data with a preset SCTransform algorithm to obtain the sequencing data of the T cells.
Optionally, the training module 804 is specifically configured to:
setting parameters of a preset first candidate model with the extreme gradient boosting (XGBoost) algorithm to obtain a preliminary identification model, wherein the parameters include at least one of: maximum tree depth, learning rate, and sampling percentage;
using a preset logistic regression model as the classification model;
and obtaining the model to be trained based on the preliminary identification model and the classification model.
Optionally, the training module 804 is further specifically configured to perform:
obtaining a second candidate model based on the preliminary identification model and the classification model;
and calculating target hyperparameters of the second candidate model with a preset 10-fold cross-validation algorithm, and optimizing the second candidate model with the target hyperparameters to obtain the model to be trained.
FIG. 9 is a schematic structural diagram of an electronic device provided by the invention. As shown in FIG. 9, the electronic device may include a processor 910, a communications interface 920, a memory 930, and a communication bus 940, where the processor 910, the communications interface 920, and the memory 930 communicate with one another via the communication bus 940. The processor 910 may invoke logic instructions in the memory 930 to perform the model training method for T cell subtype identification, the method comprising:
acquiring a preset modeling dataset, wherein the modeling dataset includes at least single-cell sequencing data of tumor-specific T cells;
extracting sequencing data of T cells from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data of the modeling dataset;
determining a first correspondence between the sequencing data of the T cells and tumor-specific T cells if the annotation information of the cells corresponding to the sequencing data of the T cells indicates that they recognize a tumor, and determining a second correspondence between the sequencing data of the T cells and non-tumor-specific T cells if it does not;
and training a preset model to be trained with the first correspondence and the second correspondence as training data to obtain a T cell subtype identification model.
Furthermore, the logic instructions in the memory 930 may be implemented as software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. On this understanding, the technical solution of the invention, in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied as a software product stored in a storage medium and comprising instructions that cause a computer device (a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods in the embodiments of the invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In another aspect, the invention further provides a computer program product comprising a computer program that can be stored on a non-transitory computer-readable storage medium; when executed by a processor, the computer program performs the model training method for T cell subtype identification provided by the methods above, the method comprising:
acquiring a preset modeling dataset, wherein the modeling dataset includes at least single-cell sequencing data of tumor-specific T cells;
extracting sequencing data of T cells from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data of the modeling dataset;
determining a first correspondence between the sequencing data of the T cells and tumor-specific T cells if the annotation information of the cells corresponding to the sequencing data of the T cells indicates that they recognize a tumor, and determining a second correspondence between the sequencing data of the T cells and non-tumor-specific T cells if it does not;
and training a preset model to be trained with the first correspondence and the second correspondence as training data to obtain a T cell subtype identification model.
In yet another aspect, the invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the model training method for T cell subtype identification provided by the methods above, the method comprising:
acquiring a preset modeling dataset, wherein the modeling dataset includes at least single-cell sequencing data of tumor-specific T cells;
extracting sequencing data of T cells from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data of the modeling dataset;
determining a first correspondence between the sequencing data of the T cells and tumor-specific T cells if the annotation information of the cells corresponding to the sequencing data of the T cells indicates that they recognize a tumor, and determining a second correspondence between the sequencing data of the T cells and non-tumor-specific T cells if it does not;
and training a preset model to be trained with the first correspondence and the second correspondence as training data to obtain a T cell subtype identification model.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, which a person of ordinary skill in the art can understand and implement without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (8)
1. A model training method for T cell subtype identification, comprising:
acquiring a preset modeling dataset, wherein the modeling dataset includes at least single-cell sequencing data of tumor-specific T cells;
extracting sequencing data of T cells from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data of the modeling dataset;
determining a first correspondence between the sequencing data of the T cells and tumor-specific T cells if the annotation information of the cells corresponding to the sequencing data of the T cells indicates that they recognize a tumor, and determining a second correspondence between the sequencing data of the T cells and non-tumor-specific T cells if it does not;
training a preset model to be trained with the first correspondence and the second correspondence as training data to obtain a T cell subtype identification model;
wherein extracting the sequencing data of the T cells from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data of the modeling dataset comprises:
extracting first candidate sequencing data from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data of the modeling dataset;
and removing T cell receptor genes and tissue dissociation-induced genes from the hypervariable genes of the first candidate sequencing data to obtain the sequencing data of the T cells.
2. The model training method for T cell subtype identification according to claim 1, wherein acquiring the preset modeling dataset comprises:
acquiring a preset candidate dataset;
filtering the sequencing data of the candidate dataset to obtain the modeling dataset;
wherein the filtering operation comprises:
removing from the candidate dataset sequencing data in which the number of detected genes is smaller than a first threshold;
removing from the candidate dataset sequencing data in which the number of unique molecular identifiers (UMIs) is smaller than a second threshold;
removing from the candidate dataset sequencing data in which the proportion of UMIs from mitochondrial genes is greater than a third threshold;
and removing from the candidate dataset sequencing data corresponding to doublets.
3. The model training method for T cell subtype identification according to claim 1, wherein removing the T cell receptor genes and tissue dissociation-induced genes from the hypervariable genes of the first candidate sequencing data to obtain the sequencing data of the T cells comprises:
removing the T cell receptor genes and tissue dissociation-induced genes from the hypervariable genes of the first candidate sequencing data to obtain second candidate sequencing data;
and processing the second candidate sequencing data with a preset SCTransform algorithm to obtain the sequencing data of the T cells.
4. The model training method for T cell subtype identification according to claim 1, wherein training the preset model to be trained with the first correspondence and the second correspondence as training data to obtain the T cell subtype identification model comprises:
setting parameters of a preset first candidate model with the extreme gradient boosting (XGBoost) algorithm to obtain a preliminary identification model, wherein the parameters include at least one of: maximum tree depth, learning rate, and sampling percentage;
using a preset logistic regression model as a classification model;
and obtaining the model to be trained based on the preliminary identification model and the classification model.
5. The model training method for T cell subtype identification according to claim 4, wherein obtaining the model to be trained based on the preliminary identification model and the classification model comprises:
obtaining a second candidate model based on the preliminary identification model and the classification model;
and calculating target hyperparameters of the second candidate model with a preset 10-fold cross-validation algorithm, and optimizing the second candidate model with the target hyperparameters to obtain the model to be trained.
6. A model training device for T cell subtype identification, comprising:
an acquisition module, configured to acquire a preset modeling dataset, wherein the modeling dataset includes at least single-cell sequencing data of tumor-specific T cells;
an extraction module, configured to extract sequencing data of T cells from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data of the modeling dataset;
a determining module, configured to determine a first correspondence between the sequencing data of the T cells and tumor-specific T cells if the annotation information of the cells corresponding to the sequencing data of the T cells indicates that they recognize a tumor, and to determine a second correspondence between the sequencing data of the T cells and non-tumor-specific T cells if it does not;
a training module, configured to train a preset model to be trained with the first correspondence and the second correspondence as training data to obtain a T cell subtype identification model;
wherein the extraction module is specifically configured to:
extract first candidate sequencing data from the modeling dataset based on the expression levels of the Marker genes corresponding to the sequencing data of the modeling dataset;
and remove T cell receptor genes and tissue dissociation-induced genes from the hypervariable genes of the first candidate sequencing data to obtain the sequencing data of the T cells.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the model training method of T cell subtype identification of any one of claims 1 to 5.
8. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the model training method of T cell subtype identification of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310708381.8A CN116469473B (en) | 2023-06-15 | 2023-06-15 | Model training method, device, equipment and storage medium for T cell subtype identification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116469473A CN116469473A (en) | 2023-07-21 |
CN116469473B true CN116469473B (en) | 2023-09-22 |
Family ID: 87181055
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310708381.8A Active CN116469473B (en) | 2023-06-15 | 2023-06-15 | Model training method, device, equipment and storage medium for T cell subtype identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116469473B (en) |
Citations (7) (* cited by examiner)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104195227A (en) * | 2008-11-07 | 2014-12-10 | 赛昆塔公司 | Methods of monitoring conditions by sequence analysis |
CN111276252A (en) * | 2020-01-15 | 2020-06-12 | 北京吉因加科技有限公司 | Construction method and device of tumor benign and malignant identification model |
CN111315390A (en) * | 2017-09-05 | 2020-06-19 | 磨石肿瘤生物技术公司 | Novel antigen identification for T cell therapy |
CN113160887A (en) * | 2021-04-23 | 2021-07-23 | 哈尔滨工业大学 | Screening method of tumor neoantigen fused with single cell TCR sequencing data |
CN115798723A (en) * | 2023-01-18 | 2023-03-14 | 北京泽桥医疗科技股份有限公司 | Construction method of cancer recurrence risk prediction model |
WO2023037164A2 (en) * | 2021-09-10 | 2023-03-16 | Immunoscape Pte Ltd | Systems and methods for the identification of target-specific t cells and their receptor sequences using machine learning |
CN115896242A (en) * | 2022-11-25 | 2023-04-04 | 绵溢(河北雄安)生物科技有限公司 | Intelligent cancer screening model and method based on peripheral blood immune characteristics |
Similar Documents
Publication | Title |
---|---|
US20030017481A1 (en) | Methods for classifying samples and ascertaining previously unknown classes |
CN111009286A (en) | Method and apparatus for microbiological analysis of host samples |
CN108319813A (en) | Method and device for detecting copy number variation in circulating tumor DNA |
CN108021788B (en) | Method and device for extracting biomarkers based on deep sequencing data of cell-free DNA |
CN112086129A (en) | Method and system for predicting cfDNA of tumor tissue |
CN112289376B (en) | Method and device for detecting somatic cell mutation |
CN107208131A (en) | Method for lung cancer subtyping |
CN110910950A (en) | Workflow for combined analysis of single-cell scRNA-seq and scATAC-seq |
CN107849613A (en) | Method for lung cancer subtyping |
CN107463797B (en) | Bioinformatic analysis method, device, equipment and storage medium for high-throughput sequencing |
CN111584064A (en) | Colorectal cancer metastasis prediction system and application method thereof |
CN113096737B (en) | Method and system for automatically analyzing pathogen type |
CN116469473B (en) | Model training method, device, equipment and storage medium for T cell subtype identification |
CN116385441B (en) | Method and system for risk stratification of oligodendroglioma based on MRI |
CN116580768B (en) | Method for detecting tumor minimal residual disease based on a customized strategy |
CN113862351A (en) | Kit and method for identifying extracellular RNA biomarkers in body fluid sample |
CN109215736B (en) | High-throughput detection method and application of enterovirus group |
Liu et al. | TSDLPP: a novel two-stage deep learning framework for prognosis prediction based on whole slide histopathological images |
KR20190114351A (en) | Methods for Identifying Microdeletion or Microamplification of Fetal Chromosomes Using Non-invasive Prenatal testing |
CN110619926B (en) | Analysis method and analysis system for recognizing all RNA (ribonucleic acid) cleavage sites |
CN113918786A (en) | Intelligent method for determining cell subtypes |
CN114037662A (en) | Circulating tumor cell identification system based on random forest algorithm |
CN112382341A (en) | Method for identifying biomarkers related to esophageal squamous carcinoma prognosis |
CN104450922A (en) | Method for performing chromosome aneuploidy detection based on single cell amplification by using chromosome specific sites |
Jakubiak et al. | The spatial landscape of glial pathology and T-cell response in Parkinson’s disease substantia nigra |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |