CN113053458B - Method and device for predicting tumor neoantigen load - Google Patents

Publication number
CN113053458B
Authority
CN
China
Legal status
Active
Application number
CN202110067368.XA
Other languages
Chinese (zh)
Other versions
CN113053458A (en
Inventor
高志博
金皓玄
王佳茜
苏小凡
Current Assignee
Shenzhen Yukang Medical Laboratory
Original Assignee
Shenzhen Yukang Medical Laboratory
Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30: Detection of binding sites or motifs
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10: Ploidy or copy number detection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Abstract

The invention discloses a method and a device for predicting tumor neoantigen load, and provides a specific method for predicting the degree of tumor neoantigen load based on tumor immune editing theory. The method can accurately quantify the tumor neoantigen load, which can then be used as a biomarker to predict the efficacy of the immune checkpoint inhibitor anti-PD-(L)1.

Description

Method and device for predicting tumor neoantigen load
Technical Field
The invention belongs to the technical field of bioinformatics, relates to a tumor neoantigen load prediction method and device, and particularly relates to a tumor neoantigen load prediction method and device based on a tumor immunity editing theory.
Background
Antitumor targeted drugs and immune checkpoint inhibitors are effective means of treating cancer. However, it is currently accepted that the potential biomarkers used to evaluate the efficacy of the immune checkpoint inhibitor anti-PD-(L)1, such as tumor mutation burden (TMB) and microsatellite instability (MSI), cannot fully identify the patients who benefit from immune checkpoint inhibitors.
Tumor neoantigens are the point at which genomics and tumor immunotherapy meet, and research on the various biomarkers sensitive to the efficacy of immunotherapy (such as TMB, MSI and human leukocyte antigen (HLA)) can ultimately be attributed to the search for high-quality tumor neoantigens. Tumor immunotherapy relies on the immunogenicity of the tumor. A tumor neoantigen is a protein that is specifically expressed only in tumor cells and can be recognized by T cells of the immune system, which makes it an ideal target for tumor immunotherapy: the more tumor neoantigens there are, the stronger the immunogenicity, the higher the tumor neoantigen load, and the better the effect of immunotherapy.
The occurrence and development of cancer cells in the body is a dynamic process of interaction with the immune system, which is able not only to clear tumor cells but also to promote tumor growth. From the viewpoint of immunology, the academic community currently explains the occurrence and development of tumors with the theory of tumor immune editing. Tumor immune editing is the process by which the adaptive and innate immune systems control tumor growth and shape tumor immunogenicity, and it includes three stages: clearance, equilibrium and escape. Clearance, or tumor immune surveillance, is the process by which the adaptive and innate arms of the immune system identify and destroy newly formed cancer cells. Equilibrium is the longest stage, a state of balance in which tumor growth is held in check while the immunogenicity of a small number of tumor cells is shaped. In the escape stage, less immunogenic tumor cells gradually grow and spread into visible tumors. The extent of tumor immune editing can be determined by studying the relationship between tumor neoantigens and tumor mutations.
However, there is currently no method that accurately quantifies the tumor neoantigen load, calculates it based on tumor immune editing theory, and uses it as a biomarker to predict the efficacy of the immune checkpoint inhibitor anti-PD-(L)1. Tumor neoantigen load calculated by other similar methods either has not been shown to assess the efficacy of anti-PD-(L)1, or has been verified in only a small number of data sets.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is as follows: efficacy evaluation of the immune checkpoint inhibitor anti-PD-(L)1 lacks a sufficiently accurate biomarker, and no method exists that applies a tumor neoantigen load optimized on the basis of tumor immune editing theory to evaluate the efficacy of the immune checkpoint inhibitor PD-1 across a wide range of cancer types. The invention accordingly provides a method for predicting tumor neoantigen load based on tumor immune editing theory.
In order to solve the technical problems, the technical scheme of the invention is as follows:
the first aspect of the present invention provides a method for predicting tumor neoantigen load, comprising the steps of:
acquiring sequencing data of the same tumor tissue sample and a control sample;
respectively carrying out somatic mutation detection on the tumor tissue sample and the control sample to obtain somatic mutation sites;
performing cluster analysis on the somatic cell mutation sites to obtain tumor clones containing different mutation sites;
predicting potential tumor neoantigen peptide sequences based on binding affinity of peptide fragments to HLA according to the somatic mutation sites;
calculating the ratio of the neoantigen peptide fragment to the non-synonymous mutation site based on the clustering result of the somatic mutation site and the tumor neoantigen peptide fragment sequence to obtain an immune editing score;
calculating the tumor neoantigen load.
In a second aspect, the present invention provides a tumor neoantigen predicting apparatus, comprising:
the data acquisition module is used for acquiring sequencing data of the same tumor tissue sample and the control sample, and respectively carrying out somatic variation detection on the tumor tissue sample and the control sample to obtain somatic variation sites;
the detection module is used for respectively carrying out somatic mutation detection on the tumor tissue sample and the control sample to obtain somatic mutation sites;
the clustering module is used for clustering the somatic mutation sites to obtain tumor clones containing different mutation sites;
a prediction module for predicting potential tumor neoantigen peptide sequences based on binding affinity of peptide fragments to HLA according to the somatic mutation sites;
the immune editing calculation module is used for calculating the proportion of the neoantigen peptide fragment and the non-synonymous mutation site based on the clustering result of the somatic mutation site and the tumor neoantigen peptide fragment sequence to obtain an immune editing score;
a tumor neoantigen calculation module for calculating tumor neoantigen load.
Compared with the prior art, the technical scheme of the invention has the following advantages:
the invention provides a specific method for predicting tumor neoantigen load degree based on a tumor immunity editing theory, which fully utilizes each mutation site of a tumor tissue sample and potential tumor neoantigen to estimate a specific numerical value corresponding to the tumor neoantigen load predicted based on the tumor immunity editing theory, and provides a numerically-referenced basis for the follow-up prediction of the immune treatment effect. The method can accurately quantify the tumor neoantigen load, and can be used as a biomarker to predict the curative effect of an immune checkpoint inhibitor anti-PD- (L) 1.
Drawings
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings, in which
FIG. 1 is a flow chart of a prediction method according to a first embodiment of the present invention;
FIG. 2 is a graph showing the result of predicting the efficacy of immunotherapy based on the ioTNL score of lung cancer tissue samples according to a fifth embodiment of the invention;
FIG. 3 is a graph showing the result of predicting the efficacy of immunotherapy based on the ioTNL score of tissue samples from a nasopharyngeal carcinoma cohort according to a sixth embodiment of the invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning.
It must be noted that, as used in this specification and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the content clearly dictates otherwise.
As used herein, the term "patient" preferably refers to a human, but other mammals are also contemplated. The terms "organism," "individual," "subject," or "patient" are used interchangeably as synonyms.
The invention is applicable to the prognosis of all cancers. The cancer may be a respiratory cancer, or a subtype and stage thereof (phase), the respiratory system including the respiratory tract (nasal, pharyngeal, laryngeal, tracheal, bronchial) and the lung, and in some embodiments, the cancer includes, but is not limited to, lung cancer, nasopharyngeal cancer, laryngeal cancer, pharyngeal cancer, tracheal cancer, and the like. In some embodiments, the cancer may also include, but is not limited to, breast cancer, lung cancer, prostate cancer, colorectal cancer, brain cancer, esophageal cancer, gastric cancer, bladder cancer, pancreatic cancer, cervical cancer, head and neck cancer, ovarian cancer, melanoma, and multi-drug resistant cancer; or subtypes and phases thereof (phase).
In some embodiments, the subject may also be a solid tumor patient, including but not limited to a lung cancer, nasopharyngeal cancer, or melanoma patient.
As used herein, the term "tumor" refers to all tumor cell growth and proliferation, both malignant and benign, as well as all pre-cancerous cells and tissues and cancerous cells and tissues. Such cancers include, but are not limited to, respiratory cancers including, but not limited to, lung cancer, nasopharyngeal cancer, laryngeal cancer, pharyngeal cancer, tracheal cancer, and the like; the cancer may also be other lymphoproliferative cancers such as precursor B lymphoblastic leukemia/lymphoblastic lymphoma, follicular B cell non-hodgkin's lymphoma, hodgkin's lymphoma precursor T cell lymphoblastic leukemia/lymphoblastic lymphoma, immature T cell neoplasm, peripheral post-thymic T cell neoplasm, T cell prolymphocytic leukemia, peripheral T cell lymphoma, undefined anaplastic large cell lymphoma, adult T cell leukemia/lymphoma, chronic lymphocytic leukemia, mantle cell lymphoma, follicular lymphoma, marginal zone lymphoma, hairy cell leukemia, diffuse large B cell lymphoma, burkitt's lymphoma, lymphoplasmacytic lymphoma, precursor T lymphoblastic leukemia/lymphoblastic lymphoma, T cell prolymphocytic leukemia, angioimmunoblastic lymphoma or nodular lymphoblastic-predominant hodgkin's lymphoma.
As used herein, the variant allele sequencing depth $V_i$ is also known as the mutation depth, and the variant allele frequency $VAF_i$ may also be referred to as the mutation frequency; the somatic mutation site $var_i$ may be referred to as the variant $var_i$, and the candidate peptide $neoAg_i$ may be referred to as the tumor neoantigen peptide $neoAg_i$.
To solve the problem that efficacy assessment of the immune checkpoint inhibitor anti-PD-(L)1 lacks a sufficiently accurate biomarker, a first embodiment of the present application provides a method for predicting tumor neoantigen load (ioTNL), as shown in FIG. 1, comprising the following steps:
s1, acquiring sequencing data of the same tumor tissue sample and a control sample.
Specifically, the sequencing data of the tumor tissue sample are analyzed across the whole exome by variant detection software to obtain somatic variation, depth and mutation frequency information. The variant detection software may be either VarScan or MuTect, for example VarScan (v2.4.1) or MuTect (v4.0.12.0). In this embodiment, the tumor sample and the control sample of the same subject need to be analyzed together. The subject may be, for example, an individual who has been clinically diagnosed as a tumor patient. A tumor sample generally refers to a sample derived from the diseased part or tissue of a tumor patient, such as a lung tissue sample of a lung cancer patient. A control sample generally refers to a sample derived from a non-diseased part or tissue of the same tumor patient, such as a leukocyte sample isolated from peripheral blood. Second-generation genomic sequencing data of the tumor sample and the control sample are typically first aligned to a reference genome. Thus, in a preferred embodiment, the data acquisition step acquires the alignment files of the second-generation genomic sequencing data of the tumor sample and the control sample against a reference genome. In a more preferred embodiment, the reference genome may specifically be the human reference genome hg19.
In some specific implementations of the present examples, the tumor tissue sample has a sequencing depth of > 200×, in other implementations > 300×, in other implementations > 400×, and in other implementations > 500×. Additionally, in some specific embodiments, the control sample has a sequencing depth of > 50×, in other embodiments > 100×, and in other embodiments > 200×.
S2, respectively carrying out somatic mutation detection on the tumor tissue sample and the control sample to obtain somatic mutation sites.
It will be appreciated by those skilled in the art that somatic variations may also be referred to as somatic mutations $var_i$ ($i = 1, \ldots, n$), and that a variant site may also be referred to as a mutation site. In this embodiment, "the same tumor tissue sample and control sample" refers to a tumor tissue sample and a control sample from the same subject. Somatic copy number variation of the sample across the whole exome is detected by copy number detection software, which also yields an estimate of tumor purity. The copy number detection software may be either CNVkit or ascatNgs, for example CNVkit (v0.8.1) or ascatNgs (v3.1.0).
S3, carrying out cluster analysis on the somatic cell mutation sites to obtain tumor clones containing different mutation sites.
Specifically, the cluster analysis may be performed using PyClone software, preferably PyClone (v0.13.0). The somatic variation (SNV/indel/SV) sites are clustered as follows: the results obtained in steps S1 and S2 are input into PyClone, and the variants are clustered according to the somatic mutation sites of the sample together with the variant allele sequencing depth, variant allele frequency, mutation site copy number and tumor purity of the corresponding sites. Other clustering software, such as CloneSig (v0.1), may also be used.
Here, the variant allele sequencing depth $V_i$ is the number of sequenced reads in the sequencing data that carry the somatic variation at the corresponding site. The variant allele frequency $VAF_i$ (Variant Allele Fraction) is calculated by the following formula:

$$VAF_i = \frac{V_i}{V_i + R_i}$$

where $R_i$ is the reference allele sequencing depth, i.e., the number of normal reads in the sequencing data that carry no somatic variation at the corresponding site.
The mutation site copy number is calculated as follows: according to the copy number variation $CNV_i$ of the region in which the somatic mutation site $var_i$ is located, the reference copy number $NCN_i$ and the actual total copy number $TCN_i$ of that region are calculated, wherein: [formula not reproduced in the source text]
The allele-specific copy number variations $CNV_{i,major}$ and $CNV_{i,minor}$ on the two chromosomes at the somatic mutation site $var_i$ are obtained, where $CNV_{i,major} \geq CNV_{i,minor}$, and the actual mutation site copy numbers $CN_{i,major}$ and $CN_{i,minor}$ are then calculated, wherein: [formula not reproduced in the source text]
Tumor purity is the ratio Pur of the number of tumor cells to the total number of cells in the tumor tissue sample, with a value range of (0, 1); here, tumor cells means all cells that carry somatic variation.
In the above cluster analysis, for any type of somatic variation, the cells in a subject's tumor tissue sample can be divided into three categories: normal cells ($N$), tumor cells that do not carry the variant ($T_{wt}$), and tumor cells that carry the variant ($T_{mut}$). The proportion of variant-carrying tumor cells ($T_{mut}$) among all tumor cells (the sum of $T_{wt}$ and $T_{mut}$) is called the tumor cell proportion of the mutation site. If the variant tumor cell proportions of two or more mutation sites fit the same distribution model, those variants are given the same cluster label and grouped into one cluster, which is called a clone. Each cluster label $C_j$ ($j = 1, \ldots, c$) of each subject has a corresponding tumor cell cluster proportion, which is calculated as follows: [formula not reproduced in the source text]
In some embodiments, other versions of PyClone or other variant clustering software such as CloneSig (v0.1) may also be used for variant clustering.
S4, predicting potential tumor neoantigen peptide sequences based on binding affinity of peptide fragments and HLA according to the somatic mutation sites.
Specifically, potential tumor neoantigen peptide sequences are predicted from the detected variant data on the basis of the binding affinity of peptides to HLA. Polypeptides within a 21-mer window centered on the variant amino acid $aa_i$ translated from the somatic mutation site $var_i$ are scanned to determine candidate peptides that bind to HLA class I molecules. The binding affinity of 8- to 11-mer peptides to HLA class I molecules is then predicted using NetMHCpan 3.0, and epitopes meeting any of the following conditions are filtered out, the remaining epitopes being taken as candidate peptides $neoAg_i$: (1) epitopes from mutations that are not expressed according to the RNA sequencing data (a mutation with ≥ 1 mutant allele reads in the RNA sequencing data is confirmed as expressed); (2) epitopes whose sequence is homologous to a self sequence; (3) epitopes whose half maximal inhibitory concentration (IC50) predicted by NetMHCpan 3.0 is greater than 500 nM. In this embodiment, the tumor neoantigen peptide prediction software may be NetMHCpan 3.0 or NetMHCpan 4.0.
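The window scan and the screening step can be sketched as follows. This is only an illustration under stated assumptions: the three listed conditions are treated as exclusion filters, the affinity cutoff is read as 500 nM, and the `Prediction` record with all of its field names is hypothetical. The sketch does not call NetMHCpan; affinity values are assumed to have been obtained separately.

```python
from typing import Iterable, List, NamedTuple


class Prediction(NamedTuple):
    """One predicted epitope; every field name here is hypothetical."""
    peptide: str
    ic50_nm: float       # predicted binding affinity, lower is stronger
    rna_alt_reads: int   # mutant-allele reads seen in RNA sequencing
    matches_self: bool   # True if the peptide is homologous to a self sequence


def windows_covering_center(peptide_21mer: str,
                            lengths: Iterable[int] = range(8, 12)) -> List[str]:
    """List every 8- to 11-mer window of the input that contains its central residue."""
    center = len(peptide_21mer) // 2  # index 10 for a 21-mer
    out = []
    for k in lengths:
        for start in range(len(peptide_21mer) - k + 1):
            if start <= center < start + k:
                out.append(peptide_21mer[start:start + k])
    return out


def candidate_peptides(preds: Iterable[Prediction],
                       ic50_cutoff_nm: float = 500.0) -> List[str]:
    """Keep expressed, non-self epitopes whose IC50 does not exceed the cutoff."""
    return [
        p.peptide
        for p in preds
        if p.rna_alt_reads >= 1 and not p.matches_self and p.ic50_nm <= ic50_cutoff_nm
    ]


# Made-up 21-mer with the variant residue "W" at the center.
windows = windows_covering_center("A" * 10 + "W" + "A" * 10)
print(len(windows))
```

Every window returned contains the variant residue, so each peptide passed on to affinity prediction differs from the wild-type sequence.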
S5, calculating the ratio of the neoantigen peptide to the non-synonymous mutation site based on the clustering result of the somatic mutation site and the tumor neoantigen peptide sequence, and obtaining an immune editing score so as to quantify and determine the immune editing stage (clearance, balance or escape) of each tumor clone.
Specifically, when calculating the immune editing score of each tumor clone, the non-synonymous somatic mutation sites $var_i$ and the predicted candidate peptides $neoAg_i$ corresponding to each tumor clone cluster label $C_j$ are counted separately to obtain the number of non-synonymous mutation sites $num_{var,j}$ and the number of candidate peptides $num_{neoAg,j}$. The ratio of the number of candidate peptides to the number of non-synonymous mutation sites gives the immune editing score $es_j$, calculated as follows:

$$num_{var,j} = var_1 + var_2 + \cdots + var_i$$

$$num_{neoAg,j} = neoAg_1 + neoAg_2 + \cdots + neoAg_i$$

$$es_j = \frac{num_{neoAg,j}}{num_{var,j}}$$
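A minimal sketch of the per-clone score follows, assuming each non-synonymous site contributes one to the site count and its number of retained candidate peptides to the peptide count. The input layout (pairs of cluster label and peptide count) and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def immune_editing_scores(sites: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return candidate-peptide count divided by non-synonymous site count per cluster.

    Each item of `sites` is (cluster_label, n_candidate_peptides) for one
    non-synonymous somatic mutation site.
    """
    num_var = defaultdict(int)
    num_neoag = defaultdict(int)
    for cluster, n_peptides in sites:
        num_var[cluster] += 1
        num_neoag[cluster] += n_peptides
    return {c: num_neoag[c] / num_var[c] for c in num_var}


# Made-up example: cluster "0" has 4 sites and 2 peptides, cluster "1" has 2 sites and 3 peptides.
scores = immune_editing_scores([("0", 1), ("0", 0), ("0", 1), ("0", 0), ("1", 2), ("1", 1)])
print(scores)  # {'0': 0.5, '1': 1.5}
```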
S6, calculating the tumor neoantigen load.
The tumor cell proportions of all tumor clones defined as being in the immune clearance state are added to give the final tumor neoantigen load ioTNL. In this example, when calculating the tumor neoantigen load ioTNL based on the tumor clone clusters defined as being in the immune clearance state and on tumor immune editing theory, tumor clone clusters with an immune editing score $es_j$ less than 0.9 are defined as being in the "immune clearance" state, and the tumor cell cluster proportions corresponding to the clusters in this state are added to give the value of ioTNL, whose value range is [0, 3]. Writing $P_j$ for the tumor cell cluster proportion of cluster $C_j$ (the symbol $P_j$ is a stand-in, since the original symbol is not legible in the source), the calculation is:

$$ioTNL = \sum_{j:\, es_j < 0.9} P_j$$
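The summation can be sketched as below, assuming each clone is described by its immune editing score and its tumor cell cluster proportion. The function name, the pair layout and the example values are hypothetical; only the 0.9 cutoff comes from the text.

```python
from typing import Iterable, Tuple


def io_tnl(clones: Iterable[Tuple[float, float]], es_cutoff: float = 0.9) -> float:
    """Sum the tumor cell cluster proportions of clones whose score is below the cutoff.

    Each item of `clones` is (immune_editing_score, tumor_cell_cluster_proportion).
    """
    return sum(proportion for score, proportion in clones if score < es_cutoff)


# Made-up clones: the first and third fall below the 0.9 cutoff.
print(io_tnl([(0.5, 0.5), (1.5, 0.25), (0.25, 0.125)]))  # 0.625
```

Clones with a score of exactly 0.9 are not counted, because the condition in the text is a strict inequality.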
Preferably, after step S6, the method further includes setting a threshold for the tumor neoantigen load ioTNL predicted for the tumor tissue sample on the basis of tumor immune editing theory. Subjects whose samples have an ioTNL greater than or equal to the threshold are judged to be low-risk subjects, which indicates a lower degree of tumor immune escape; all other subjects are not classified as low-risk. In addition, in some implementations of this embodiment, the tumor sample is preferably from lung cancer, nasopharyngeal carcinoma or melanoma.
In some implementations of this embodiment, the somatic variation is at least one of point mutation (SNV), insertion/deletion (indel), structural variation (SV) and copy number variation (CNV). For example, in some implementations the somatic variation may be SNV and indel; in others SNV, indel and SV; and in others SNV, indel, SV and CNV.
In some embodiments of the present examples, the method of sequencing tumor tissue samples and control samples is whole genome sequencing, whole exome sequencing, or probe capture sequencing, preferably whole exome sequencing.
A second embodiment of the present application provides a tumor neoantigen predicting apparatus, which includes:
the data acquisition module is used for acquiring sequencing data of the same tumor tissue sample and the control sample, and respectively carrying out somatic variation detection on the tumor tissue sample and the control sample to obtain somatic variation sites;
the detection module is used for respectively carrying out somatic mutation detection on the tumor tissue sample and the control sample to obtain somatic mutation sites;
the clustering module is used for clustering the somatic mutation sites to obtain tumor clones containing different mutation sites;
a prediction module for predicting potential tumor neoantigen peptide sequences based on binding affinity of peptide fragments to HLA according to the somatic mutation sites;
the immune editing calculation module is used for calculating the proportion of the neoantigen peptide fragment and the non-synonymous mutation site based on the clustering result of the somatic mutation site and the tumor neoantigen peptide fragment sequence to obtain an immune editing score;
a tumor neoantigen calculation module for calculating tumor neoantigen load.
In some embodiments of the present embodiment, the tumor neoantigen calculating module is configured to determine tumor clone clusters in an "immune clearing" state, and accumulate the tumor cell clusters corresponding to the tumor clone in the "immune clearing" state in proportion, where the value is a tumor neoantigen load value predicted based on a tumor immune editing theory.
A third embodiment of the present application provides an electronic device, including:
a memory for storing a program; and
a processor for implementing the method according to the first embodiment by executing the program stored in the memory.
A fourth embodiment of the present application provides a computer-readable storage medium, which may be provided in the electronic device of the third embodiment and may be the memory of that embodiment. The storage medium stores a program executable by a processor to implement the prediction method described in the first embodiment. The storage medium may include read-only memory, random access memory, a magnetic disk, an optical disk, a hard disk and the like, and the program is executed by a computer to realize the above functions. For example, the program is stored in the memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above are realized. In addition, when all or part of the functions in the above embodiments are implemented by means of a computer program, the program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk or a removable hard disk, and be downloaded or copied into the memory of a local device, or used to update the system version of the local device; all or part of the functions in the above embodiments are then realized when the program in the memory is executed by a processor.
According to the embodiment, the method for predicting the tumor neoantigen load based on the tumor immunity editing theory is provided, the specific numerical value corresponding to the tumor neoantigen load predicted based on the tumor immunity editing theory is estimated by fully utilizing each mutation site and potential tumor neoantigen of a sample, and a numerical reference is provided for the follow-up prediction of the immune treatment effect.
The fifth embodiment of the present application provides a predictive example of a specific tumor neoantigen:
In this embodiment, the samples used are tumor tissue samples and control blood samples from a lung cancer immunotherapy cohort of 65 patients; the control blood samples are leukocyte samples isolated from peripheral blood. The tumor tissue samples are formalin-fixed paraffin-embedded (FFPE) samples prepared from single-site pathological needle biopsies of the lung lesion of each patient, and the sampling institution is Sun Yat-sen University Cancer Center.
The method for predicting the sample comprises the following steps:
1. Tumor tissue samples and control blood samples were obtained and subjected to whole exome sequencing, which was provided by Nanjing age and Gene Biotechnology Inc. After the sequencing data were obtained, variants (SNV/indel) of the test sample across the whole exome were detected using the variant detection software VarScan (v2.4.1), with the minimum coverage set to 20, the minimum number of supporting reads set to 5, and default values for all other parameters. The reference allele sequencing depth and variant allele sequencing depth of each mutation site were thereby obtained. The copy number detection software ascatNgs (v3.1.0) was used to detect somatic copy number variation of the test sample across the whole exome, with the mode set to rule_count and default values for all other parameters, yielding the allele-specific copy numbers and the estimated tumor purity.
2. Using the parameters obtained in the preceding step, the variants were clustered with PyClone (v0.13.0), with the --prior parameter set to major_copy_number, the --num_iters parameter set to 10,000, the --burnin parameter set to 1000, and the --tumour_contents parameter set to the tumor purity Pur of the subject. The variant number, reference allele sequencing depth, variant allele sequencing depth, reference copy number of the region in which the variant is located and actual allele-specific copy numbers of each variant were written to a tab-delimited file sample.tsv, which was passed as the --in_files parameter. The content of the sample.tsv file for one patient in the lung cancer cohort (patient number: F17120989277) is shown as an example in Table 1, where mutation_id is the variant number of each mutation site, ref_counts is the reference allele sequencing depth, var_counts is the variant allele sequencing depth, normal_cn is the reference copy number of the region in which the variant is located, minor_cn is the minor allele-specific copy number of that region, and major_cn is the major allele-specific copy number of that region.
TABLE 1
After PyClone was run, the mutation cluster analysis results of each sample in the cohort were obtained. The PyClone results for example patient F17120989277 are shown in Table 2, where sample_id is the subject's sample number, cluster_id is the number of each mutation cluster identified by PyClone, size is the number of mutations contained in each cluster, mean is the estimated proportion of tumor cells occupied by the corresponding mutation cluster, and std is the standard deviation of that estimate.
TABLE 2
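The tab-delimited input file described in step 2 could be assembled as in the sketch below. The six column names are the ones listed in the text; the function name and every value in the example rows are hypothetical and do not come from Table 1.

```python
import csv
import io

# Column names as described for sample.tsv in step 2.
PYCLONE_COLUMNS = [
    "mutation_id", "ref_counts", "var_counts",
    "normal_cn", "minor_cn", "major_cn",
]


def write_pyclone_tsv(rows, handle):
    """Write per-mutation records to a tab-delimited handle with a header row."""
    writer = csv.DictWriter(
        handle, fieldnames=PYCLONE_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical records for illustration only.
example_rows = [
    {"mutation_id": "mut_1", "ref_counts": 40, "var_counts": 12,
     "normal_cn": 2, "minor_cn": 1, "major_cn": 1},
    {"mutation_id": "mut_2", "ref_counts": 55, "var_counts": 30,
     "normal_cn": 2, "minor_cn": 0, "major_cn": 2},
]

buffer = io.StringIO()
write_pyclone_tsv(example_rows, buffer)
tsv_text = buffer.getvalue()
```

In practice the handle would be an open file named sample.tsv rather than an in-memory buffer; the buffer is used here only to keep the sketch self-contained.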
3. Based on the results of somatic mutation detection, polypeptides within a 21-mer range centered on the mutated amino acid corresponding to each mutation site were translated. The binding affinity of peptides in the 8-11-mer range to HLA class I molecules was predicted using the NetMHCpan 3.0 binding algorithm. Epitopes satisfying the following conditions were screened as candidate peptides neoAg_i: (1) mutations that are not expressed according to the RNA sequencing data (a mutation with ≥1 mutant allele reads in the RNA sequencing data is confirmed as expressed); (2) the sequence is homologous to self; (3) the half-maximal inhibitory concentration (IC50) according to NetMHCpan 3.0 is greater than 500 nM. The tumor neoantigen peptide prediction results for example patient F17120989277 are shown in Table 3, where NeoRank is the rank of the predicted neoantigen peptide, HLA is the HLA class I molecule bound by the predicted peptide, Peptide is the predicted neoantigen peptide, NeoepitopeScore and WildtypeScore are the predicted affinity values of the variant peptide and the wild-type peptide for the HLA molecule, respectively (a lower value indicates stronger affinity), WildtypePeptide is the wild-type peptide, and Gene and CDSMutation are the gene and the coding-sequence mutation from which the predicted neoantigen peptide originates.
TABLE 3
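The peptide enumeration in step 3 (a 21-mer window centered on the mutated residue, then every 8-11-mer that covers that residue) can be sketched as follows. This only generates the peptide strings that would be passed to a binding predictor; it does not call NetMHCpan. The function names and the example protein string are hypothetical.

```python
def mutation_window(protein, mut_index, flank=10):
    """Return the window of up to 2*flank+1 residues centered on mut_index,
    plus the offset of the mutated residue inside that window."""
    start = max(0, mut_index - flank)
    end = min(len(protein), mut_index + flank + 1)
    return protein[start:end], mut_index - start


def candidate_peptides(window, mut_offset, lengths=(8, 9, 10, 11)):
    """Enumerate every peptide of the given lengths that contains the
    mutated residue at position mut_offset of the window."""
    peptides = []
    for k in lengths:
        first = max(0, mut_offset - k + 1)        # leftmost start covering the residue
        last = min(mut_offset, len(window) - k)   # rightmost start that still fits
        for start in range(first, last + 1):
            peptides.append(window[start:start + k])
    return peptides


# Hypothetical mutant protein sequence (30 residues), mutated residue at index 15.
protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
window, offset = mutation_window(protein, 15)
peptides = candidate_peptides(window, offset)
```

With a full 21-residue window, this yields 8 + 9 + 10 + 11 = 38 peptides, each of which contains the mutated residue.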
4. From the results of the previous steps, an immune editing score was calculated for each tumor clone of each sample. The immune editing scores for the tumor clones of example patient F17120989277 are shown in Table 4, where sample_id is the subject's sample number, cluster_id is the number of each mutation cluster identified by PyClone, size is the number of mutations contained in each cluster, and coding_score is the immune editing score of that clone. No tumor neoantigens meeting the criteria of step 3 were found in the patient's other clones, so only clone No. 4 received an immune editing score.
TABLE 4
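The per-clone score in step 4 is described in the claims as the ratio of the number of candidate peptides to the number of non-synonymous mutation sites in a cluster. A minimal sketch under that reading is given below. The function name, the input layout (one cluster label per site and per peptide), and the choice to return None for clusters with no candidate peptide (mirroring the statement that such clones received no score) are assumptions.

```python
from collections import Counter


def immune_editing_scores(site_clusters, peptide_clusters):
    """Compute, per cluster, (number of candidate peptides) /
    (number of non-synonymous mutation sites).

    site_clusters: one cluster label per non-synonymous mutation site.
    peptide_clusters: one cluster label per candidate peptide.
    Clusters without any candidate peptide are given None (assumption).
    """
    n_sites = Counter(site_clusters)
    n_peptides = Counter(peptide_clusters)
    scores = {}
    for cluster_id, sites in n_sites.items():
        peptides = n_peptides.get(cluster_id, 0)
        scores[cluster_id] = peptides / sites if peptides > 0 else None
    return scores


# Hypothetical labels: cluster 1 has 3 sites and no peptides,
# cluster 4 has 5 sites and 3 candidate peptides.
scores = immune_editing_scores(
    site_clusters=[1, 1, 1, 4, 4, 4, 4, 4],
    peptide_clusters=[4, 4, 4],
)
```

In this hypothetical input, cluster 4 scores 3/5 and cluster 1 has no score.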
5. The immune editing state of each clone was determined: a tumor clone cluster with an immune editing score below 0.9 was defined as being in an immune clearance state, and the tumor cell cluster proportions of all clusters in the immune clearance state were summed to give the ioTNL value. After the tumor neoantigen load predicted on the basis of tumor immune editing theory (ioTNL) had been calculated for each sample, the ioTNL threshold was set to 60; samples with ioTNL less than or equal to 60 were classified as low ioTNL (ioTNL-L), and samples with ioTNL greater than 60 as high ioTNL (ioTNL-H). After efficacy and progression-free survival (PFS) information had been collected for the tested patients, survival analysis was performed on this batch of samples (see FIG. 2; the abscissa is time in days). The tumor immune editing level assessed by ioTNL was significantly predictive of PFS (p=0.00068), with patients with high ioTNL at higher risk of progression (HR=2.83), as shown in Table 5. This result supports the effectiveness and accuracy of the tumor neoantigen load predicted using tumor immune editing theory, and indicates that ioTNL can serve as a biomarker for predicting the efficacy of immunotherapy in lung cancer.
TABLE 5
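The aggregation in step 5 can be sketched as a sum over clusters followed by a threshold comparison. The 0.9 and 60 cut-offs are the ones stated in the text; the function names, the dictionary layout, and the assumption that cluster proportions are expressed as percentages (which the threshold of 60 suggests but the text does not state) are illustrative only.

```python
def io_tnl(cluster_proportions, editing_scores, cleared_below=0.9):
    """Sum the tumor cell cluster proportions of clusters whose immune
    editing score is below the cut-off (immune clearance state).

    Clusters with no score (None or missing) are not counted.
    """
    total = 0.0
    for cluster_id, proportion in cluster_proportions.items():
        score = editing_scores.get(cluster_id)
        if score is not None and score < cleared_below:
            total += proportion
    return total


def classify_io_tnl(value, threshold=60.0):
    """Label a sample ioTNL-H if above the threshold, otherwise ioTNL-L."""
    return "ioTNL-H" if value > threshold else "ioTNL-L"


# Hypothetical sample: proportions in percent, scores per cluster.
proportions = {1: 20.0, 2: 35.0, 4: 45.0}
scores = {1: None, 2: 0.95, 4: 0.6}
value = io_tnl(proportions, scores)
label = classify_io_tnl(value)
```

Here only cluster 4 is in the immune clearance state, so the hypothetical sample has an ioTNL of 45.0 and is labelled ioTNL-L at the threshold of 60.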
The sixth embodiment of the present application provides another example of prediction of specific tumor neoantigens:
In this example, a total of 61 patients receiving immunotherapy for nasopharyngeal carcinoma were included. The control blood sample was a leukocyte sample isolated from peripheral blood. The tumor tissue sample of each patient was obtained by single-point pathological puncture sampling at the lung lesion and prepared as a formalin-fixed paraffin-embedded (FFPE) sample; the samples were collected at Sun Yat-sen University Cancer Center.
The prediction method is basically the same as the fifth embodiment, and includes the steps of:
1. Tumor tissue samples and control blood samples were obtained and subjected to whole exome sequencing, which was performed by Nanjing age and Gene Biotechnology Inc.
2. Using the parameters obtained in step 1, the mutations were clustered with PyClone (v0.13.0). The content of the sample.tsv file for one tested patient (patient number: F17120989297) in this nasopharyngeal carcinoma cohort is shown as an example in Table 6.
TABLE 6
After PyClone was run, the mutation cluster analysis results of each sample in the cohort were obtained. The PyClone results for the example patient are shown in Table 7, where sample_id is the subject's sample number, cluster_id is the number of each mutation cluster identified by PyClone, size is the number of mutations contained in each cluster, mean is the estimated proportion of tumor cells occupied by the corresponding mutation cluster, and std is the standard deviation of that estimate.
TABLE 7
3. The potential tumor neoantigen peptide sequences were predicted. The tumor neoantigen peptide prediction results for example patient F17120989297 are shown in Table 8, where NeoRank is the rank of the predicted neoantigen peptide, HLA is the HLA class I molecule bound by the predicted peptide, Peptide is the predicted neoantigen peptide, NeoepitopeScore and WildtypeScore are the predicted affinity values of the variant peptide and the wild-type peptide for the HLA molecule, respectively (a lower value indicates stronger affinity), WildtypePeptide is the wild-type peptide, and Gene and CDSMutation are the gene and the coding-sequence mutation from which the predicted neoantigen peptide originates.
TABLE 8
4. From the results of the previous steps, the immune editing score of each tumor clone of each sample was calculated. The immune editing scores for the tumor clones of example patient F17120989297 are shown in Table 9, where sample_id is the subject's sample number, cluster_id is the number of each mutation cluster identified by PyClone, size is the number of mutations contained in each cluster, and coding_score is the immune editing score of that clone. No tumor neoantigens meeting the criteria of step 3 were found in the patient's other clones, so only clone No. 4 received an immune editing score.
TABLE 9
5. The ioTNL value was calculated. After ROC curve analysis and correction, the ioTNL threshold was set to 24.5; samples with ioTNL less than or equal to 24.5 were classified as low ioTNL (ioTNL-L), and samples with ioTNL greater than 24.5 as high ioTNL (ioTNL-H). After efficacy and progression-free survival (PFS) information had been collected for the tested patients, survival analysis was performed on this batch of samples (see FIG. 3; the abscissa is time in days). The tumor immune editing level assessed by ioTNL was significantly predictive of PFS (p=0.047), with patients with high ioTNL at higher risk of progression (HR=1.77), as shown in Table 10. The results again support, in a separate nasopharyngeal carcinoma cohort, the effectiveness and accuracy of using ioTNL analysis to evaluate the tumor neoantigen load predicted on the basis of tumor immune editing theory, and indicate that ioTNL can serve as a biomarker for predicting the efficacy of immunotherapy in nasopharyngeal carcinoma.
TABLE 10
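The sixth embodiment states that the ioTNL threshold was set after ROC curve analysis but does not say which criterion was used. As one common possibility, the sketch below selects the cut-off that maximizes Youden's J (sensitivity + specificity - 1). The use of Youden's J, the function name, the binary outcome labels, and the example values are all assumptions and are not taken from the patent.

```python
def youden_threshold(values, labels):
    """Pick the cut-off maximizing Youden's J for the rule 'value > cut'.

    values: ioTNL-like scores, one per sample.
    labels: True for samples with the event of interest (hypothetical
            binary outcome), False otherwise.
    Returns (best_cut, best_j).
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both outcome classes are required")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        true_pos = sum(1 for v, y in zip(values, labels) if y and v > cut)
        false_pos = sum(1 for v, y in zip(values, labels) if not y and v > cut)
        j = true_pos / positives - false_pos / negatives
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical scores and outcomes for illustration.
cut, j = youden_threshold([10, 20, 30, 40], [False, False, True, True])
```

Candidate cut-offs are restricted to observed values here; a real analysis might instead use midpoints between adjacent values, which would be one way to arrive at a non-integer threshold.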
In summary, the embodiments of the invention provide a specific method for calculating, from whole exome sequencing (WES) data, the tumor neoantigen load of a sample predicted on the basis of tumor immune editing theory. The method makes full use of every mutation site detected in the sample and of the predicted tumor neoantigens to estimate the immune editing state of each tumor clone, thereby yielding a specific numerical value for the degree of tumor immune editing and providing a numerical reference for subsequent prediction of the efficacy of immunotherapy.
It is apparent that the above examples are given by way of illustration only and do not limit the embodiments. Other variations or modifications in different forms may be made by those of ordinary skill in the art on the basis of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Obvious variations or modifications derived therefrom remain within the scope of protection of the invention.

Claims (6)

1. A method for predicting tumor neoantigen burden, comprising the steps of:
acquiring sequencing data of the same tumor tissue sample and a control sample in a whole exon range;
respectively carrying out somatic mutation detection on the tumor tissue sample and the control sample to obtain somatic mutation sites in the whole exon range;
performing cluster analysis on the somatic cell mutation sites to obtain tumor clones containing different mutation sites;
predicting potential tumor neoantigen peptide sequences based on binding affinity of peptide fragments to HLA according to the somatic mutation sites;
calculating the ratio of the neoantigen peptide fragment to the non-synonymous mutation site based on the clustering result of the somatic mutation site and the tumor neoantigen peptide fragment sequence to obtain an immune editing score;
calculating the tumor neoantigen load;
wherein the predicting of potential tumor neoantigen peptide sequences based on the binding affinity of peptide fragments to HLA according to the somatic mutation sites comprises:
scanning polypeptides in the 21-mer range centered on the variant amino acid translated from each somatic mutation site, to determine candidate peptides that bind to HLA class I molecules;
predicting the binding affinity of the 8-11-mer peptides to HLA class I molecules;
screening candidate peptides neoAg_i;
the screening of candidate peptides neoAg_i comprises: screening RNA data expressing mutation, sequence homologous epitope and half-maximal inhibitory concentration greater than 500 nM;
wherein the calculating of the ratio of the neoantigen peptide fragments to the non-synonymous mutation sites based on the clustering result of the somatic mutation sites and the tumor neoantigen peptide fragment sequences to obtain the immune editing score comprises:
for each cluster label, separately counting all the corresponding non-synonymous somatic mutation sites and the predicted candidate peptides neoAg_i, to obtain the number of non-synonymous mutation sites and the number of candidate peptides;
calculating the ratio of the number of candidate peptides to the number of non-synonymous mutation sites to obtain the immune editing score;
and wherein the calculating of the tumor neoantigen load comprises:
defining tumor clone clusters with an immune editing score less than 0.9 as being in an immune clearance state;
adding the tumor cell cluster proportions corresponding to the tumor clone clusters in the immune clearance state to obtain the tumor neoantigen load.
2. The method of claim 1, wherein said clustering of said somatic mutation sites to obtain tumor clones containing different mutation sites comprises:
based on the somatic mutation sites, calculating mutation allele sequencing depth, mutation allele frequency, mutation site copy number and tumor purity values, carrying out cluster analysis according to the calculated mutation allele sequencing depth, mutation allele frequency, mutation site copy number and tumor purity values, and clustering the somatic mutation sites according to clusters.
3. The method of claim 2, wherein the variant allele sequencing depth is the number of variant sequences in the sequencing data at the site where the somatic variant occurs; the variant allele frequency VAF_i is calculated by the following formula:
VAF_i = V_i / (V_i + R_i)
wherein V_i is the variant allele sequencing depth and R_i is the reference allele sequencing depth.
4. A method of predicting tumor neoantigen burden according to any one of claims 2-3, wherein, during the cluster analysis, the cells comprise normal cells N, tumor cells T_wt not carrying the variation and tumor cells T_mut carrying the variation; the proportion of the tumor cells T_mut carrying the variation in the sum of the tumor cells T_wt not carrying the variation and the tumor cells T_mut carrying the variation is the proportion of mutated tumor cells; if the proportions of mutated tumor cells of two or more mutation sites follow the same distribution model, the mutation sites in that distribution model are given the same cluster label and clustered into one cluster; each cluster label has a corresponding tumor cell cluster proportion, which is calculated by the following method:
5. The method according to claim 4, wherein the somatic mutation site is at least one selected from the group consisting of a point mutation, an insertion or deletion, a structural mutation, and a copy number mutation; the method for acquiring the sequencing data comprises one of whole genome sequencing, whole exome sequencing or probe capture sequencing.
6. A tumor neoantigen load prediction apparatus, comprising:
the data acquisition module is used for acquiring sequencing data of the same tumor tissue sample and the same control sample in a whole exon range, and respectively carrying out somatic variation detection on the tumor tissue sample and the control sample to obtain somatic variation sites;
the detection module is used for respectively carrying out somatic mutation detection on the tumor tissue sample and the control sample to obtain somatic mutation sites in the whole exon range;
the clustering module is used for clustering the somatic mutation sites to obtain tumor clones containing different mutation sites;
a prediction module for predicting potential tumor neoantigen peptide sequences based on binding affinity of peptide fragments to HLA according to the somatic mutation sites;
the immune editing calculation module is used for calculating the proportion of the neoantigen peptide fragment and the non-synonymous mutation site based on the clustering result of the somatic mutation site and the tumor neoantigen peptide fragment sequence to obtain an immune editing score;
a tumor neoantigen calculation module for calculating tumor neoantigen load;
wherein the predicting of potential tumor neoantigen peptide sequences based on the binding affinity of peptide fragments to HLA according to the somatic mutation sites comprises:
scanning polypeptides in the 21-mer range centered on the variant amino acid translated from each somatic mutation site, to determine candidate peptides that bind to HLA class I molecules;
predicting the binding affinity of the 8-11-mer peptides to HLA class I molecules;
screening candidate peptides neoAg_i;
the screening of candidate peptides neoAg_i comprises: screening RNA data expressing mutation, sequence homologous epitope and half-maximal inhibitory concentration greater than 500 nM;
wherein the calculating of the ratio of the neoantigen peptide fragments to the non-synonymous mutation sites based on the clustering result of the somatic mutation sites and the tumor neoantigen peptide fragment sequences to obtain the immune editing score comprises:
for each cluster label, separately counting all the corresponding non-synonymous somatic mutation sites and the predicted candidate peptides neoAg_i, to obtain the number of non-synonymous mutation sites and the number of candidate peptides;
calculating the ratio of the number of candidate peptides to the number of non-synonymous mutation sites to obtain the immune editing score;
and wherein the calculating of the tumor neoantigen load comprises:
defining tumor clone clusters with an immune editing score less than 0.9 as being in an immune clearance state;
adding the tumor cell cluster proportions corresponding to the tumor clone clusters in the immune clearance state to obtain the tumor neoantigen load.
CN202110067368.XA 2021-01-19 2021-01-19 Method and device for predicting tumor neoantigen load Active CN113053458B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110067368.XA CN113053458B (en) 2021-01-19 2021-01-19 Method and device for predicting tumor neoantigen load

Publications (2)

Publication Number Publication Date
CN113053458A CN113053458A (en) 2021-06-29
CN113053458B true CN113053458B (en) 2023-08-04

Family

ID=76508497

Country Status (1)

Country Link
CN (1) CN113053458B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114141304B (en) * 2021-12-15 2024-04-30 上海科技大学 Tumor sample immune editing state quantification method and application thereof
CN114882951B (en) * 2022-05-27 2022-12-27 深圳裕泰抗原科技有限公司 Method and device for detecting MHC II tumor neoantigen based on next generation sequencing data
CN115424740B (en) * 2022-09-30 2023-11-17 四川大学华西医院 Tumor immunotherapy effect prediction system based on NGS and deep learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013192339A1 (en) * 2012-06-19 2013-12-27 The Regents Of The University Of California Biomarkers for the immunoediting-escape phase
WO2016047715A1 (en) * 2014-09-24 2016-03-31 北海道公立大学法人札幌医科大学 Tumor antigen peptides
CN109584960A (en) * 2018-12-14 2019-04-05 上海鲸舟基因科技有限公司 Predict the method, apparatus and storage medium of tumor neogenetic antigen
CN109706065A (en) * 2018-12-29 2019-05-03 深圳裕策生物科技有限公司 Tumor neogenetic antigen load detection device and storage medium
CN110387419A (en) * 2019-08-20 2019-10-29 裕策医疗器械江苏有限公司 Solid tumor polygenes detects genetic chip and preparation method thereof and detection device
CN111402952A (en) * 2020-03-27 2020-07-10 深圳裕策生物科技有限公司 Method and system for detecting tumor heterogeneity degree
WO2020253643A1 (en) * 2019-06-19 2020-12-24 上海交通大学医学院 Tumor neoantigen polypeptide and use thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
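Exploration of the immunogenomic characteristics and immune escape mechanisms of multifocal liver cancer; Gao Qiang et al.; Journal of Clinical Hepatology (《临床肝胆病杂志》); 20200615 (No. 06); pp. 152-155 *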

Also Published As

Publication number Publication date
CN113053458A (en) 2021-06-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant