CN106676178B - Method and system for evaluating tumor heterogeneity - Google Patents

Publication number
CN106676178B
Authority
CN
China
Prior art keywords
variation
sequencing
variant
ctdna
copy number
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710043183.9A
Other languages
Chinese (zh)
Other versions
CN106676178A (en)
Inventor
吴爱伟
常连鹏
李进
龚玉华
管彦芳
易鑫
杨玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiyingjia Technology Co.,Ltd.
SUZHOU JIYINJIA BIOMEDICAL ENGINEERING Co.,Ltd.
Original Assignee
Geneplus - Beijing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Geneplus - Beijing filed Critical Geneplus - Beijing
Priority to CN201710043183.9A priority Critical patent/CN106676178B/en
Publication of CN106676178A publication Critical patent/CN106676178A/en
Application granted granted Critical
Publication of CN106676178B publication Critical patent/CN106676178B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Biophysics (AREA)
  • Biotechnology (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Immunology (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a method and a system for evaluating tumor heterogeneity. Specifically, the invention provides a molecular clone analysis method, which divides all variations into different molecular clones based on the detection results of multiple types of variations in circulating tumor DNA by high-throughput sequencing, and evaluates the heterogeneity of tumors by using molecular clone levels. The method of the invention realizes tumor heterogeneity assessment based on ctDNA high-throughput variation detection, and effectively assists tumor prognosis and treatment scheme formulation.

Description

Method and system for evaluating tumor heterogeneity
Technical Field
The present invention is in the field of biotechnology, and more particularly, the present invention relates to methods and systems for assessing tumor heterogeneity.
Background
Tumors are diseases caused by genetic changes. They often involve several types of genetic variation, including single nucleotide variations (SNV), short insertions and deletions (indel), copy number variations (CNV), structural variations (SV), and the like. Variants begin to accumulate once the first variant arises in a tumor cell. As the tumor evolves, the deleterious variants that occur first create favorable conditions for maintaining later variants, so that tumor cells continually acquire or enhance capabilities such as inhibition of apoptosis, unlimited replication, and immune escape, and therefore accumulate variants much faster than normal cells. The resulting tumor is in fact a mixture of cell populations with different genetic characteristics: some cells carry only early variants, while others also carry later variants. The later a variant arose, the smaller the proportion of tumor cells that carry it; variants that arose in a cell at the same time are retained or lost together during tumor evolution and therefore involve the same proportion of cells. The complexity of the distribution of these variant cell proportions reflects tumor heterogeneity, which is the most direct and important manifestation of tumor complexity and is closely related to patient prognosis and survival time.
At present, tumor heterogeneity is mostly evaluated by multi-site sampling and high-throughput sequencing of the same patient: after pathological samples are taken from several positions or several lesions of the patient's tissue, the variants in each sample are analyzed by high-throughput sequencing, and the shared variants and the cell proportions corresponding to the variants are described and hierarchically summarized. This method has the following disadvantages: (1) multi-site clinical sampling is biased; it represents only the molecular variation of the sampled sites and cannot represent the complexity of the whole tumor; (2) it carries a certain clinical risk; (3) some types of metastases, such as pleural or peritoneal metastases, are difficult to obtain; (4) it is inaccurate: heterogeneity analysis based on shared variants treats all shared variants as one level and does not subdivide them, which makes part of the analysis results inaccurate (Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. The New England Journal of Medicine 366, 883-892, doi:10.1056/NEJMoa1113205 (2012); Hao, J.J. et al. Spatial intratumoral heterogeneity and temporal clonal evolution in esophageal squamous cell carcinoma. Nature Genetics, doi:10.1038/ng.3683 (2016)). In addition, there are methods that assess tumor heterogeneity only from the copy number variation results of single-site sampling (Oesper, L., Satas, G. & Raphael, B.J. Quantifying tumor heterogeneity in whole-genome and whole-exome sequencing data. Bioinformatics 30, 3532-3540, doi:10.1093/bioinformatics/btu651 (2014)); besides sampling bias, these have the disadvantage of low population coverage, i.e., they can only cover cancer types or populations with a large amount of copy number variation.
Therefore, there is a need in the art for analytical methods to more accurately assess tumor heterogeneity to effectively aid in tumor prognosis and treatment planning.
Disclosure of Invention
In order to more accurately evaluate the heterogeneity of tumors, the invention provides a Molecular Clone (mClone) analysis method, which is based on the detection results of multiple types of variation in circulating tumor DNA (ctDNA) by high-throughput sequencing, divides all the variation into different Molecular clones, and evaluates the heterogeneity of tumors by using Molecular Clone levels. The method of the invention realizes tumor heterogeneity assessment based on ctDNA high-throughput variation detection, and effectively assists tumor prognosis and treatment scheme formulation.
Accordingly, in a first aspect, the present invention provides a method of assessing tumor heterogeneity, the method comprising:
1) sequencing (preferably high-throughput sequencing) free DNA (cell-free DNA, cfDNA) of a patient to obtain sequencing information;
2) determining ctDNA variation by using the sequencing information, calculating variation allelic frequency according to the sequencing information and the determined ctDNA variation, determining the actual total copy number of the region where the variation is located, and calculating the ratio of ctDNA to cfDNA;
3) clustering the ctDNA variation according to the proportion determined in the step 2), and sequencing information and copy number information of the ctDNA variation, wherein each cluster obtained by clustering is determined as a molecular clone to obtain a clustered clone level;
4) assessing the patient's tumor heterogeneity based on their clonal hierarchy, the more clonal hierarchies the more heterogeneous the patient's tumor.
In a second aspect, the present invention provides a method of comparing tumor heterogeneity in different patients, the method comprising:
calculating a molecular clone hierarchy for each of said patients using steps 1)-3) of the method of the first aspect of the invention, wherein the more clone hierarchies a patient has, the more heterogeneous that patient's tumor is relative to those of the other patients.
In one embodiment of the first or second aspect of the present invention, step 2) comprises:
2.1) obtaining, using said sequencing information, the reference allelic sequencing depth (R_i) and the variant allelic sequencing depth (M_i) of each variant V_i (i = 1, …, n; said variant V being selected from SNV, indel and SV), and calculating the variant allelic frequency (VAF_i),
$$\mathrm{VAF}_i = \frac{M_i}{R_i + M_i}$$
wherein the reference allelic sequencing depth (R_i) is the number of normal sequences without the variation at the corresponding site in the sequencing result, and the variant allelic sequencing depth (M_i) is the number of variant sequences carrying the variation at the corresponding site in the sequencing result;
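2.2) using the CNV of the region where the variant V_i is located (CNV_i, i = 1, …, n), calculating the reference copy number (rCN_i) and the actual total copy number (CN_i) of the region where the variant V_i is located,
$$rCN_i = \begin{cases} 1, & V_i \text{ is on a male sex chromosome} \\ 2, & \text{otherwise} \end{cases}$$
$$CN_i = \mathrm{ceil}(rCN_i \times CNV_i)$$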
If an accurate CNV detection method (e.g., SNP chip detection) is used in step 1), then for variants that are not on a male sex chromosome, allele-specific copy number variation information (CNV_{i,major}, CNV_{i,minor}, CNV_{i,major} ≥ CNV_{i,minor}) is obtained for both chromosomes, and the actual allele-specific copy numbers (CN_{i,major}, CN_{i,minor}) are calculated,
Figure BDA0001213440480000034
Figure BDA0001213440480000035
2.3) ctDNA ratio assessment: the proportion of ctDNA in cfDNA (CTF) is estimated as the maximum variant allelic frequency,
$$\mathrm{CTF} = \max(\mathrm{VAF}_i), \quad i = 1, \dots, n \quad \text{(equation 5)}$$
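As an illustration of step 2), the quantities above can be computed as in the following Python sketch. The function names and read depths are hypothetical. The VAF is taken as variant reads over total reads; the copy number rules (one reference copy on a male sex chromosome and two otherwise, and rounding up the product of the reference copy number and the CNV fold) are assumptions inferred from the definitions and the worked example in this text, not a reproduction of the original formula images.

```python
import math


def variant_allele_frequency(ref_depth: int, var_depth: int) -> float:
    """VAF_i: variant reads divided by all reads covering the site."""
    total = ref_depth + var_depth
    if total == 0:
        raise ValueError("no reads cover this site")
    return var_depth / total


def reference_copy_number(on_male_sex_chromosome: bool) -> int:
    """rCN_i (assumed rule): 1 on a male sex chromosome, otherwise 2."""
    return 1 if on_male_sex_chromosome else 2


def total_copy_number(ref_cn: int, cnv_fold: float) -> int:
    """CN_i (assumed rule): reference copy number times CNV fold, rounded up."""
    return math.ceil(ref_cn * cnv_fold)


def ctdna_fraction(vafs):
    """CTF (equation 5): the maximum variant allelic frequency."""
    return max(vafs)


# Hypothetical (R_i, M_i) read depths for three variants of one patient.
depths = [(900, 100), (950, 50), (700, 300)]
vafs = [variant_allele_frequency(r, m) for r, m in depths]
ctf = ctdna_fraction(vafs)
```

With these made-up depths the three VAFs are 0.1, 0.05 and 0.3, so the CTF passed on to clustering would be 0.3.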
In one embodiment, in step 3) of the method of the invention, the variants are clustered by the predicted variant cell proportion, for example using PyClone software (v0.13, the latest version at the time of filing, which is the version referred to below unless otherwise specified).
In one embodiment, in step 3) of the method of the invention, the reference and variant allelic depth data (R_i, M_i) of variant V (SNV/indel/SV) are used, together with CTF and CNV, to evaluate the proportion of variant tumor cells. In one embodiment, in step 3) of the method of the present invention, the proportion of all tumor cells represented by the cell population carrying each variant is predicted using PyClone software, and the software parameters can be set as follows: total tumor cell ratio (CTF) = highest value of the variant allele frequencies; number of iterations = 20000; other parameters are defaults.
In one embodiment, in step 3) of the method of the invention, the n detected variants V (SNV/indel/SV) are clustered using PyClone, with default parameters except for the following:
(a) --tumour_contents CTF;
(b) --num_iters 20000;
(c) --prior total_copy_number; when allele-specific CNV data are used as input, this parameter is set to parental_copy_number;
(d) --density pyclone_beta_binomial; this parameter is set to pyclone_binomial when a whole-genome sequencing technique with a lower sequencing depth is used in step 1);
(e) --in_files property.tsv, where property.tsv is a tab-delimited file; apart from the header row, each row contains the information of one variant V (SNV/indel/SV); the file comprises six columns, in the following order: mutation_id, ref_counts, var_counts, normal_cn, minor_cn, and major_cn.
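The parameter choices in (a) to (e) can be assembled into a command line as in the sketch below. The helper name `build_pyclone_args` and its arguments are hypothetical. The `run_analysis_pipeline` subcommand and the `--working_dir` option are assumptions about the PyClone v0.13 command-line interface that this text does not state; only the five flags listed above come from the text. The sketch only builds the argument list and does not run PyClone.

```python
def build_pyclone_args(in_file, ctf, allele_specific=False,
                       low_depth_wgs=False, working_dir="pyclone_out"):
    """Build an argument list using the non-default parameters (a) to (e)."""
    # (c): parental_copy_number only when allele-specific CNV data are available.
    prior = "parental_copy_number" if allele_specific else "total_copy_number"
    # (d): plain binomial density for low-depth whole-genome sequencing.
    density = "pyclone_binomial" if low_depth_wgs else "pyclone_beta_binomial"
    return [
        "PyClone", "run_analysis_pipeline",   # assumed subcommand
        "--in_files", in_file,                # (e)
        "--working_dir", working_dir,         # assumed option, not in the text
        "--tumour_contents", str(ctf),        # (a)
        "--num_iters", "20000",               # (b)
        "--prior", prior,
        "--density", density,
    ]


# Hypothetical usage for one patient with CTF = 0.3.
args = build_pyclone_args("property.tsv", ctf=0.3)
```

The list can be handed to `subprocess.run(args, check=True)` on a machine where PyClone is installed.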
In a third aspect, the present invention provides a system for assessing tumor heterogeneity, the system comprising:
1) a module for sequencing (preferably high-throughput sequencing) cfDNA of a patient;
2) means for performing the steps of:
a) receiving sequencing information from module 1);
b) obtaining ctDNA variations in cfDNA by comparing with sequence information of a normal gene sequence;
c) calculating variant allelic frequency according to the sequencing information and the ctDNA variation, determining the actual total copy number of the region where the variation is located or the actual allele-specific copy number, and calculating the ratio of ctDNA to cfDNA;
d) clustering the ctDNA variants according to the ratio determined in step c) and the sequencing information and copy number information of the ctDNA variants to determine molecular clones, and calculating the molecular clone level;
3) a result output module:
outputting a result of tumor heterogeneity based on the patient's molecular clonal hierarchy, the more clonal hierarchies the higher the tumor heterogeneity of the patient.
In a fourth aspect, the present invention provides a system for comparing tumor heterogeneity in different patients, the system comprising:
1) a module for sequencing (preferably high-throughput sequencing) cfDNA of a patient;
2) means for performing the steps of:
a) receiving sequencing information from module 1);
b) obtaining ctDNA variations in cfDNA by comparing with sequence information of a normal gene sequence;
c) calculating variant allelic frequency according to the sequencing information and the variant result, determining the actual total copy number or the actual specific copy number of the allele of the region in which the variant is positioned, and calculating the ratio of ctDNA to cfDNA;
d) clustering the ctDNA variants according to the ratio determined in step c) and the sequencing information and copy number information of the ctDNA variants to determine molecular clones, and calculating the molecular clone level;
3) a result output module:
comparing the molecular clone levels of different patients and outputting the result of comparing the tumor heterogeneity of different patients, wherein the more clone levels of patients, the higher the tumor heterogeneity.
In one embodiment of the third or fourth aspect of the present invention, step c) of module 2) comprises the steps of:
c.1) obtaining, using said sequencing information, the reference allelic sequencing depth (R_i) and the variant allelic sequencing depth (M_i) of each variant V_i (i = 1, …, n; said variant V being selected from SNV, indel and SV), and calculating the variant allelic frequency (VAF_i),
$$\mathrm{VAF}_i = \frac{M_i}{R_i + M_i}$$
wherein the reference allelic sequencing depth (R_i) is the number of normal sequences without the variation at the corresponding site in the sequencing result, and the variant allelic sequencing depth (M_i) is the number of variant sequences carrying the variation at the corresponding site in the sequencing result;
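c.2) using the CNV of the region where the variant V_i is located (CNV_i, i = 1, …, n), calculating the reference copy number (rCN_i) and the actual total copy number (CN_i) of the region where the variant V_i is located,
$$rCN_i = \begin{cases} 1, & V_i \text{ is on a male sex chromosome} \\ 2, & \text{otherwise} \end{cases}$$
$$CN_i = \mathrm{ceil}(rCN_i \times CNV_i)$$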
If an accurate CNV detection method (e.g., SNP chip detection) is used in module 1), then for variants that are not on a male sex chromosome, allele-specific copy number variation information (CNV_{i,major}, CNV_{i,minor}, CNV_{i,major} ≥ CNV_{i,minor}) is obtained for both chromosomes, and the actual allele-specific copy numbers (CN_{i,major}, CN_{i,minor}) are calculated,
Figure BDA0001213440480000054
Figure BDA0001213440480000061
c.3) ctDNA ratio assessment: the proportion of ctDNA in cfDNA (CTF) is estimated as the maximum variant allelic frequency,
$$\mathrm{CTF} = \max(\mathrm{VAF}_i), \quad i = 1, \dots, n \quad \text{(equation 5)}$$
In one embodiment of the third or fourth aspect of the invention, module 2) is a computer-readable medium storing instructions for performing said steps, and module 3) is a computer-readable medium storing instructions for performing said steps.
In one embodiment, in step d) of module 2) of the system of the invention, the reference and variant allelic depth data (R_i, M_i) of variant V (SNV/indel/SV) are used, together with CTF and CNV, to evaluate the proportion of variant tumor cells. In one embodiment, in step d) of module 2) of the system of the invention, the proportion of all tumor cells represented by the cell population carrying each variant is predicted using PyClone software, and the software parameters can be set as follows: total tumor cell ratio (CTF) = highest value of the variant allele frequencies; number of iterations = 20000; other parameters are defaults.
In one embodiment, in step d) of module 2) of the system of the invention, the variants are clustered by the predicted variant cell proportion, for example using PyClone software.
In one embodiment, in step d) of module 2) of the system of the invention, the n detected variants V (SNV/indel/SV) are clustered using PyClone, with default parameters except for the following:
(a) --tumour_contents CTF;
(b) --num_iters 20000;
(c) --prior total_copy_number; when allele-specific CNV data are used as input, this parameter is set to parental_copy_number;
(d) --density pyclone_beta_binomial; this parameter is set to pyclone_binomial when a whole-genome sequencing technique with a lower sequencing depth is used in module 1);
(e) --in_files property.tsv, where property.tsv is a tab-delimited file; apart from the header row, each row contains the information of one variant V (SNV/indel/SV); the file comprises six columns, in the following order: mutation_id, ref_counts, var_counts, normal_cn, minor_cn, and major_cn.
Based on tumor evolution theory and ctDNA high-throughput variant detection, the invention provides a heterogeneity assessment method that analyzes tumor variation at the clonal level and is more consistent with how tumors arise and develop.
The present invention finds that higher tumor heterogeneity is associated with a higher risk of tumor progression.
Compared with other analysis methods, the advantages of the invention are as follows:
1) comprehensiveness of information: compared with tissue sampling at one or several sites, which is subject to sampling bias, ctDNA reflects the molecular characteristics of the tumor more comprehensively;
2) sampling convenience: tissue samples usually come from surgery or needle biopsy; compared with tissue sampling, especially multi-site tissue sampling, ctDNA detection requires only noninvasive blood sampling and is easier and more feasible clinically;
3) high accuracy: variant information is fully used, covering SNV, indel and SV and retaining the specific frequency of each variant rather than a detected/undetected binary value, and heterogeneity is evaluated at the clonal level rather than at the level of individual variants, on the basis of tumor evolution theory.
By means of the three points, the method and the system can more accurately and reasonably evaluate the heterogeneity of the tumor.
Drawings
The invention is illustrated by the following figures.
Fig. 1 is a flow chart of mClone analysis, with the steps marked by x being performed separately for each patient.
Fig. 2 shows the survival analysis; the left curve is the high-heterogeneity group and the right curve is the low-heterogeneity group.
Detailed Description
In the present invention, gene names follow the Official Symbol in NCBI Gene, and gene mutations and protein mutations are written in the notation common in the art. For example, c.518T>C (p.V173A) represents a missense mutation in which the T base at position 518 of the coding region is changed to a C base, changing the amino acid at position 173 from valine (V) to alanine (A); c.2235_2249delGGAATTAAGAGAAGC (p.E746_A750del) represents a small deletion in which the bases GGAATTAAGAGAAGC at positions 2235 to 2249 of the coding region are deleted, resulting in the deletion of the 5 amino acids at positions 746 to 750; c.2663+1G>A represents a splice-site mutation in which the first base of the intron immediately following the 3' end of the exon containing position 2663 of the coding region is changed from G to A; c.7081C>T (p.Q2361*) represents a nonsense mutation in which the C base at position 7081 of the coding region is changed to a T base, changing the glutamine (Q) at position 2361 to a stop codon.
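The protein-level part of this notation can be read mechanically. The following Python sketch decodes single-letter amino acid substitutions and nonsense changes only; the function name and the regular expression are illustrative assumptions and do not cover deletions, splice-site changes, or the full nomenclature.

```python
import re

# Standard single-letter codes for the 20 amino acids.
AMINO_ACIDS = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartic acid",
    "C": "cysteine", "Q": "glutamine", "E": "glutamic acid", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}


def describe_protein_change(change: str) -> str:
    """Describe a change written like 'p.V173A' or 'p.Q2361*'."""
    match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z*])", change)
    if match is None:
        raise ValueError(f"unsupported notation: {change}")
    ref, position, alt = match.groups()
    if ref not in AMINO_ACIDS or (alt != "*" and alt not in AMINO_ACIDS):
        raise ValueError(f"unknown amino acid code in: {change}")
    new = "a stop codon" if alt == "*" else AMINO_ACIDS[alt]
    return f"{AMINO_ACIDS[ref]} at position {position} changed to {new}"


print(describe_protein_change("p.V173A"))
print(describe_protein_change("p.Q2361*"))
```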
In the present invention, the mathematical notation ceil refers to rounding up.
In the present invention, the cfDNA may be DNA from samples such as blood (plasma), saliva, pleural effusion, urine, feces, and the like.
In the present invention, the tumor is selected from, but not limited to: lung cancer, colorectal cancer, gastric cancer, breast cancer, kidney cancer, pancreatic cancer, ovarian cancer, endometrial cancer, thyroid cancer, cervical cancer, esophageal cancer, and liver cancer. In a specific embodiment, the tumor is lung cancer and the variation is a variation listed in table 1.
The flow chart of the method of the invention is shown in Fig. 1. For each tested patient, after ctDNA variants are detected by high-throughput sequencing, the ratio of ctDNA to cfDNA is estimated from the sequencing results of the ctDNA variants. This ratio, together with the detected variants, is used as input to cluster the variants; each cluster obtained by clustering is determined to be a molecular clone, and the clone level is then calculated. Finally, the tumor heterogeneity of each patient is evaluated against the clone levels of all patients. The present inventors found that, for lung cancer, patients with high heterogeneity had a clone level above 3.5 and patients with low heterogeneity had a clone level below 3.5.
The following is a description of the main technical process and principle of the method of the present invention:
1. high throughput sequencing to detect ctDNA variations
First, for a plurality of patients of the same cancer species selected as subjects, mutation detection and parameter calculation were performed for each patient:
1) sequencing cfDNA of a subject by high-throughput sequencing technologies such as whole genome, whole exome or probe capture sequencing and corresponding informatics analysis methods to obtain variations contained in the ctDNA, including SNV, indel, SV, CNV and the like;
2) obtaining, according to the sequencing result in step 1), the reference allelic sequencing depth (R_i) and the variant allelic sequencing depth (M_i) of each variant V_i (i = 1, …, n; variant V being selected from SNV, indel and SV), and calculating the variant allelic frequency (VAF_i),
$$\mathrm{VAF}_i = \frac{M_i}{R_i + M_i}$$
wherein the reference allelic sequencing depth (R_i) is the number of normal sequences without the variation at the corresponding site in the sequencing result, and the variant allelic sequencing depth (M_i) is the number of variant sequences carrying the variation at the corresponding site in the sequencing result;
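3) using the CNV of the region where the variant V_i is located (CNV_i, i = 1, …, n), calculating the reference copy number (rCN_i) and the actual total copy number (CN_i) of the region where the variant V_i is located,
$$rCN_i = \begin{cases} 1, & V_i \text{ is on a male sex chromosome} \\ 2, & \text{otherwise} \end{cases}$$
$$CN_i = \mathrm{ceil}(rCN_i \times CNV_i)$$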
If an accurate CNV detection method (e.g., SNP chip detection) is used in 1), then for variants that are not on a male sex chromosome, allele-specific copy number variation information (CNV_{i,major}, CNV_{i,minor}, CNV_{i,major} ≥ CNV_{i,minor}) is obtained for both chromosomes, and the actual allele-specific copy numbers (CN_{i,major}, CN_{i,minor}) are calculated,
Figure BDA0001213440480000084
Figure BDA0001213440480000085
Accurate CNV detection refers to obtaining allele-specific copy number variation of both chromosomes, for example using SNP chip detection.
2. Variant clustering and clone-level computation
Then, for each patient, cluster analysis and clone hierarchy calculation were performed on the detected variation according to the parameters obtained in 1:
1) ctDNA ratio evaluation: the proportion of ctDNA in cfDNA (CTF) is estimated as the maximum variant allelic frequency,
$$\mathrm{CTF} = \max(\mathrm{VAF}_i), \quad i = 1, \dots, n \quad \text{(equation 5)}$$
2) Variant clustering:
For any variant (SNV/indel/SV), the cells from which the cfDNA originates fall into three categories: normal cells (N), tumor cells not carrying the variant (C0), and tumor cells carrying the variant (C1). The ratio of tumor cells carrying the variant (C1) to all tumor cells (C1 + C0) is called the variant tumor cell proportion. If two or more variants have equivalent variant tumor cell proportions, they are considered to have arisen at similar times; they are given the same cluster label and grouped into one cluster, namely a molecular clone.
Therefore, the following data are needed for mutation clustering:
a) reference and variant allelic depth data (R_i, M_i) for variant V (SNV/indel/SV): used together with CTF and CNV to assess the proportion of variant tumor cells;
b) the reference copy number (rCN_i) and actual total copy number (CN_i) from step 1.3), or the actual allele-specific copy numbers (CN_{i,major}, CN_{i,minor}): for a given variant, amplification or deletion of the variant allele can falsely raise or lower the estimated proportion of variant tumor cells, so adding the copy number variation data allows the genotype of the C1 cells to be judged more accurately, the variant frequency to be corrected, and the proportion of variant tumor cells to be evaluated correctly;
c) CTF: to estimate the composition of cfDNA-derived cells, i.e. the proportion of tumor cells (C0+ C1) among all cells (N + C0+ C1), accurate setting of this parameter helps to correctly calculate the quantitative ratio of reference alleles from normal cells to reference alleles from tumor cells.
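The role of the three inputs a) to c) can be illustrated with a deliberately simplified mixture model. This is not PyClone's Bayesian model: it assumes all tumor cells share the same total copy number at the locus and that every variant-carrying cell has the same number of mutated copies. All names and numbers below are hypothetical.

```python
def expected_vaf(ctf, prevalence, mutant_copies, tumor_cn, normal_cn=2):
    """Expected VAF when a fraction `prevalence` of tumor cells (C1) carries
    `mutant_copies` mutated copies, and tumor cells make up `ctf` of all cells."""
    mutant_alleles = ctf * prevalence * mutant_copies
    all_alleles = (1 - ctf) * normal_cn + ctf * tumor_cn
    return mutant_alleles / all_alleles


def prevalence_from_vaf(vaf, ctf, mutant_copies, tumor_cn, normal_cn=2):
    """Invert expected_vaf to recover the variant tumor cell proportion."""
    all_alleles = (1 - ctf) * normal_cn + ctf * tumor_cn
    return vaf * all_alleles / (ctf * mutant_copies)


# Hypothetical variant observed at VAF 0.10 in a sample with CTF 0.40.
naive = prevalence_from_vaf(0.10, 0.40, mutant_copies=1, tumor_cn=2)
# Same observation, but the mutated allele has gained one copy (3 copies, 2 mutated).
corrected = prevalence_from_vaf(0.10, 0.40, mutant_copies=2, tumor_cn=3)
```

Under these assumptions the estimate drops from about 0.5 to about 0.3 once the gain of the mutated allele is taken into account, which is the kind of false increase that item b) describes.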
For example, the n detected variants V (SNV/indel/SV) are clustered using PyClone v0.13 (the latest version at the time of filing), with default parameters except for the following:
(a) --tumour_contents CTF;
(b) --num_iters 20000;
(c) --prior total_copy_number; when allele-specific CNV data are used as input, this parameter is set to parental_copy_number;
(d) --density pyclone_beta_binomial; this parameter is set to pyclone_binomial when a whole-genome sequencing technique with a low sequencing depth is used in 1.1);
(e) --in_files property.tsv, where property.tsv is a tab-delimited file; apart from the header row, each row contains the information of one variant V (SNV/indel/SV); the file comprises six columns, in the following order: mutation_id, ref_counts, var_counts, normal_cn, minor_cn, and major_cn.
PyClone (Roth, A. et al. PyClone: statistical inference of clonal population structure in cancer. Nature Methods 11, 396-398, doi:10.1038/nmeth.2883 (2014)) estimates, from the variant V (SNV/indel/SV) and CNV information, the proportion of all tumor cells that carry V_i, and assigns each variant a cluster label (C_i, i = 1, …, n, C_i ∈ {1, …, c}, where c is the number of clusters).
Other versions of PyClone or other variant clustering software may also be employed for variant clustering.
3) Clone level calculation:
The clone level is the number c of molecular clones into which the variants are clustered. As a tumor develops, its evolutionary tree gradually grows and becomes more complex, the number of molecular clones increases, and the clone level deepens; the clone level is therefore closely related to tumor heterogeneity.
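Given the cluster label assigned to each variant, the clone level is simply the number of distinct labels. A minimal Python sketch, with hypothetical variant identifiers and labels:

```python
from collections import defaultdict


def group_by_cluster(cluster_labels):
    """Map each cluster label to the variants assigned to it (one molecular clone each)."""
    clones = defaultdict(list)
    for mutation_id, label in cluster_labels.items():
        clones[label].append(mutation_id)
    return dict(clones)


def clone_level(cluster_labels):
    """Clone level c: the number of distinct cluster labels."""
    return len(set(cluster_labels.values()))


# Hypothetical cluster labels C_i for four variants of one patient.
labels = {"mut_a": 1, "mut_b": 1, "mut_c": 2, "mut_d": 3}
clones = group_by_cluster(labels)
level = clone_level(labels)
```

Here two variants share label 1, so the four variants form three molecular clones and the clone level is 3.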
3. Assessment of tumor heterogeneity
The median of the clone levels of all tested patients is taken as the threshold for judging whether the tumor heterogeneity of each patient is high or low: patients whose clone level is below this threshold have lower tumor heterogeneity, and the remaining patients have higher tumor heterogeneity.
Since genomic variation differs significantly between cancer types, comparing heterogeneity across cancer types with the method of the invention is not recommended.
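The median rule can be sketched in Python as follows, using the clone levels reported for the ten patients in Table 4 of the example. The variable names are illustrative.

```python
from statistics import median

# Clone levels of the ten lung cancer patients (Table 4).
clone_levels = {
    "S1": 2, "S2": 1, "S3": 4, "S4": 4, "S5": 6,
    "S6": 6, "S7": 3, "S8": 3, "S9": 5, "S10": 2,
}

# Threshold: median clone level of all tested patients.
threshold = median(clone_levels.values())

# Below the threshold means low heterogeneity; otherwise high.
calls = {
    sample: ("low" if level < threshold else "high")
    for sample, level in clone_levels.items()
}
```

For these data the threshold is 3.5, and the resulting calls agree with the tumor heterogeneity column of Table 4.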
In the method of the present invention, the steps other than the sequencing step may be stored as instructions in a computer-readable medium; once the sequencing result is input to a computing device, the computing device can read these instructions and perform the remaining steps of the method. The computing device includes, but is not limited to, a computer, portable computer, tablet, smartphone, smart wristband, and the like.
Examples
In this example, 10 lung cancer patients are taken as an example to explain the present invention. It should be noted that the examples are for illustrative purposes only and should not be construed as limiting the present application in any way.
List of variants detected by ctDNA high throughput sequencing
1) Variation V (SNV/indel/SV)
Between 2 and 8 variants were detected in each of the 10 lung cancer patients; the detected variants V (SNV/indel/SV) are listed in Table 1.
TABLE 1 list of variant V (SNV/indel/SV) detection
Figure BDA0001213440480000101
Figure BDA0001213440480000111
2)CNV
Of the 10 lung cancer patients, EGFR amplification was detected only in S5, at a fold of 1.73, as shown in Table 2. The actual total copy number corresponding to the EGFR deletion mutation detected in S5 is therefore estimated to be 4.
TABLE 2 CNV detection list
Sample number | Gene | Copy number variation status | Copy number variation fold
S5 | EGFR | gain | 1.73
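The copy number of 4 quoted for S5 is consistent with rounding up the product of a diploid reference copy number and the detected fold. A worked check, assuming the total copy number is computed as ceil(rCN × CNV), which is inferred from the ceil notation defined in the Detailed Description rather than taken from the original formula image:

```latex
CN_{\mathrm{EGFR}} = \mathrm{ceil}(rCN \times CNV) = \mathrm{ceil}(2 \times 1.73) = \mathrm{ceil}(3.46) = 4
```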
Statistics of mClone analysis results
PyClone clustering
The detected variants were clustered using PyClone v0.13, with default parameters except for the following:
a)--tumour_contents
b)--num_iters 20000
c)--prior total_copy_number
d)--density pyclone_beta_binomial
e)--in_files
Parameters a) and e) specify the CTF and the input file, respectively. The CTF and input file contents for each patient are shown in Table 3:
TABLE 3 PyClone input data
Figure BDA0001213440480000121
Figure BDA0001213440480000131
Wherein, mutation _ id represents mutation number, ref _ counts represents reference count, var _ counts represents mutation count, normal _ CN represents normal copy number, i.e. CNiMinor _ CN represents a small copy number, i.e. CNi,minorThe major _ CN represents the large copy number, i.e. CNi,major
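A tab-delimited input file with these six columns can be written with Python's standard `csv` module, as sketched below. The row values are hypothetical and are not taken from Table 3. Putting the whole total copy number into major_cn with minor_cn set to 0 when only a total copy number is known is an assumption about how such input is usually prepared, not something this text specifies.

```python
import csv
import io

COLUMNS = ["mutation_id", "ref_counts", "var_counts",
           "normal_cn", "minor_cn", "major_cn"]


def write_pyclone_tsv(rows, handle):
    """Write one header row plus one row per variant, separated by tabs."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical variant: 700 reference reads, 300 variant reads, total copy number 4.
rows = [{"mutation_id": "EGFR_del", "ref_counts": 700, "var_counts": 300,
         "normal_cn": 2, "minor_cn": 0, "major_cn": 4}]

buffer = io.StringIO()
write_pyclone_tsv(rows, buffer)
text = buffer.getvalue()
```

Replacing the `io.StringIO` buffer with `open("property.tsv", "w", newline="")` writes the same content to disk.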
The mClone analysis results obtained with the method of the present invention, together with the follow-up data, are shown in Table 4. The cut-off was set at the median of all clonal hierarchy levels, 3.5: patients with a clonal hierarchy level greater than 3.5 were classified as having high heterogeneity, and patients with a level less than 3.5 as having low heterogeneity.
Table 4. mClone analysis results compared with clinical information
Sample | Clonal hierarchy level | Tumor heterogeneity | Progression-free survival (weeks)
S1 | 2 | Low | 54
S2 | 1 | Low | 49
S3 | 4 | High | 11
S4 | 4 | High | 27
S5 | 6 | High | 9
S6 | 6 | High | 17
S7 | 3 | Low | 17
S8 | 3 | Low | 34
S9 | 5 | High | 22
S10 | 2 | Low | 36
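The median cut-off and the resulting grouping can be checked with a short sketch. The clonal hierarchy levels are those listed in Table 4; the function name `classify_heterogeneity` and the handling of a level exactly equal to the cut-off are assumptions, since the text only defines the groups above and below the cut-off.

```python
from statistics import median

# Clonal hierarchy levels per sample, as listed in Table 4.
clone_levels = {"S1": 2, "S2": 1, "S3": 4, "S4": 4, "S5": 6,
                "S6": 6, "S7": 3, "S8": 3, "S9": 5, "S10": 2}


def classify_heterogeneity(levels):
    """Split samples into high/low heterogeneity at the cohort median."""
    cutoff = median(levels.values())
    labels = {}
    for sample, level in levels.items():
        if level > cutoff:
            labels[sample] = "high"
        elif level < cutoff:
            labels[sample] = "low"
        else:
            # Not defined in the text; left unclassified here (assumption).
            labels[sample] = "unclassified"
    return cutoff, labels


cutoff, labels = classify_heterogeneity(clone_levels)
print(cutoff)
print(labels)
```

With these ten values the median is 3.5, so no sample falls exactly on the cut-off and the grouping matches the one given in Table 4.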
Survival analysis of these samples (see Figure 2) showed that tumor heterogeneity assessed by clonal hierarchy level was significantly predictive of patient prognosis, measured as progression-free survival (p = 0.044): patients with higher tumor heterogeneity had a higher risk of progression (hazard ratio 9.386). These results verify the effectiveness and accuracy of assessing tumor heterogeneity with the mClone analysis technique.
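As a purely descriptive check of the direction of this effect, the sketch below compares the median progression-free survival of the two heterogeneity groups using the values of Table 4. It is not the survival analysis reported above: censoring information is not given in the text, so no p-value or ratio is computed, and the function name `median_pfs_by_group` is hypothetical.

```python
from statistics import median

# (heterogeneity group, progression-free survival in weeks) from Table 4.
table4 = {
    "S1": ("low", 54), "S2": ("low", 49), "S3": ("high", 11), "S4": ("high", 27),
    "S5": ("high", 9), "S6": ("high", 17), "S7": ("low", 17), "S8": ("low", 34),
    "S9": ("high", 22), "S10": ("low", 36),
}


def median_pfs_by_group(records):
    """Return the median PFS (weeks) of each heterogeneity group."""
    groups = {}
    for group, weeks in records.values():
        groups.setdefault(group, []).append(weeks)
    return {group: median(values) for group, values in groups.items()}


pfs_medians = median_pfs_by_group(table4)
print(pfs_medians)
```

For the Table 4 data this gives a median of 36 weeks in the low-heterogeneity group and 17 weeks in the high-heterogeneity group.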
The molecular clonal hierarchy level obtained by the mClone molecular clone analysis method can be used to assess tumor heterogeneity. Tumor heterogeneity reflects the developmental stage of the tumor: the greater the heterogeneity, the more advanced the patient's tumor and the more likely it is to progress in the near term. The experimental data above confirm this.

Claims (9)

1. A system for assessing tumor heterogeneity, the system comprising:
1) a module for sequencing cfDNA of a patient;
2) a module for performing the following steps:
a) receiving sequencing information from module 1);
b) obtaining the ctDNA variations in the cfDNA by comparison with the sequence information of a normal gene sequence;
c) calculating the variant allele frequency from the sequencing information and the ctDNA variations, determining the actual total copy number or the actual allele-specific copy number of the region where each variation is located, and calculating the proportion of ctDNA in cfDNA;
d) clustering the ctDNA variations according to the proportion determined in step c) and the sequencing information and copy number information of the ctDNA variations to determine molecular clones, and calculating the molecular clonal hierarchy level;
3) a result output module for:
outputting a tumor heterogeneity result based on the patient's molecular clonal hierarchy level, wherein the more clonal hierarchy levels a patient has, the higher the patient's tumor heterogeneity.
2. A system for comparing tumor heterogeneity in different patients, the system comprising:
1) a module for sequencing cfDNA of a patient;
2) a module for performing the following steps:
a) receiving sequencing information from module 1);
b) obtaining the ctDNA variations in the cfDNA by comparison with the sequence information of a normal gene sequence;
c) calculating the variant allele frequency from the sequencing information and the ctDNA variations, determining the actual total copy number or the actual allele-specific copy number of the region where each variation is located, and calculating the proportion of ctDNA in cfDNA;
d) clustering the ctDNA variations according to the proportion determined in step c) and the sequencing information and copy number information of the ctDNA variations to determine molecular clones, and calculating the molecular clonal hierarchy level;
3) a result output module for:
comparing the molecular clonal hierarchy levels of different patients and outputting a comparison of their tumor heterogeneity, wherein the more clonal hierarchy levels a patient has, the higher the patient's tumor heterogeneity.
3. The system of claim 1 or 2, wherein the sequencing in 1) is high throughput sequencing.
4. The system of claim 1 or 2, wherein step c) of 2) comprises the following steps:
c.1) using the sequencing information, obtaining the reference allele sequencing depth $R_i$ and the variant allele sequencing depth $M_i$ of each variation $V_i$, $i = 1, \ldots, n$, and calculating the variant allele frequency $VAF_i$:
Figure FDA0002151336280000021
wherein the reference allele sequencing depth $R_i$ is the number of normal sequences in the sequencing result that do not carry the variation at the corresponding site, and the variant allele sequencing depth $M_i$ is the number of variant sequences in the sequencing result that carry the variation at the corresponding site;
c.2) using the copy number variation $CNV_i$, $i = 1, \ldots, n$, of the region where variation $V_i$ is located, calculating the reference copy number $rCN_i$ and the actual total copy number $CN_i$ of that region:
Figure FDA0002151336280000022
Figure FDA0002151336280000023
c.3) ctDNA proportion assessment: the proportion of ctDNA in cfDNA, CTF, is estimated as the largest variant allele frequency among the variations $V_i$, $i = 1, \ldots, n$:
$CTF = \max(VAF_i),\ i = 1, \ldots, n.$
5. The system of claim 4, wherein, if accurate copy number variation (CNV) detection is performed in 1) and the variation is not on a male sex chromosome, the allele-specific copy numbers $CN_{i,major}$ and $CN_{i,minor}$ are calculated from the allele-specific copy number variations $CNV_{i,major}$ and $CNV_{i,minor}$ on the two chromosomes, where $CNV_{i,major} \ge CNV_{i,minor}$:
Figure FDA0002151336280000024
Figure FDA0002151336280000025
6. The system of claim 1 or 2, wherein 2) and/or 3) is a computer-readable medium storing a plurality of instructions for performing said steps.
7. The system of claim 1 or 2, wherein in step d) of 2), the reference and variant allele depth data $R_i$ and $M_i$ of the variations $V_i$, $i = 1, \ldots, n$, are used together with the CTF and CNV to estimate the proportion of tumor cells carrying each variation.
8. The system of claim 1 or 2, wherein in step d) of 2), the variations are clustered according to the estimated proportion of cells carrying each variation.
9. The system of claim 8, wherein the clustering is performed using PyClone software.
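The calculations named in steps c.1) and c.3) of claim 4 can be sketched as follows. The formula for $VAF_i$ is given only as an image in this text, so the sketch assumes the standard definition $VAF_i = M_i / (R_i + M_i)$, which matches the definitions of $R_i$ and $M_i$ in the claim. The depth values and the function names are hypothetical.

```python
def variant_allele_frequency(ref_depth, var_depth):
    """VAF_i = M_i / (R_i + M_i), the assumed standard definition."""
    total = ref_depth + var_depth
    if total == 0:
        raise ValueError("no reads cover this site")
    return var_depth / total


def ctdna_fraction(depths):
    """CTF = max(VAF_i) over all variations, as stated in step c.3)."""
    return max(variant_allele_frequency(r, m) for r, m in depths.values())


# Hypothetical (R_i, M_i) sequencing depths for three variations of one sample.
depths = {"V1": (900, 100), "V2": (950, 50), "V3": (1960, 40)}
vafs = {name: variant_allele_frequency(r, m) for name, (r, m) in depths.items()}
ctf = ctdna_fraction(depths)
print(vafs)
print(ctf)
```

Here the three variations have allele frequencies of 10%, 5% and 2%, so the estimated CTF is 10%.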
CN201710043183.9A 2017-01-19 2017-01-19 Method and system for evaluating tumor heterogeneity Active CN106676178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710043183.9A CN106676178B (en) 2017-01-19 2017-01-19 Method and system for evaluating tumor heterogeneity

Publications (2)

Publication Number Publication Date
CN106676178A CN106676178A (en) 2017-05-17
CN106676178B true CN106676178B (en) 2020-03-24

Family

ID=58860011

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210610

Address after: Room 501, 5 / F, building 2, area 1, No.8, life Garden Road, Zhongguancun Life Science Park, Longguan Town, Changping District, Beijing

Patentee after: Beijing Jiyingjia Technology Co.,Ltd.

Patentee after: SUZHOU JIYINJIA BIOMEDICAL ENGINEERING Co.,Ltd.

Address before: 102206 Room 501, 5 / F, building 2, area 1, No.8, shengshengyuan Road, Zhongguancun Life Science Park, Huilongguan town, Changping District, Beijing

Patentee before: Beijing Jiyingjia Technology Co.,Ltd.
