WO2022054086A1 - A system and a method for identifying genomic abnormalities associated with cancer and implications thereof

Info

Publication number
WO2022054086A1
Authority
WO
WIPO (PCT)
Prior art keywords
gene
genomic
cancer
genes
identifying
Application number
PCT/IN2021/050877
Other languages
French (fr)
Inventor
Gowhar SHAFI
Krithika SRINIVASAN
Shruti DESAI
Mohan UTTARWAR
Original Assignee
Indx Technology (India) Private Limited
Application filed by Indx Technology (India) Private Limited
Publication of WO2022054086A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q1/6886: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material for cancer
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10: Ploidy or copy number detection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

Definitions

  • The present invention broadly relates to a system and method for predicting and identifying genomic abnormalities associated with cancer in a subject. More specifically, the invention relates to a system and a method for identifying genomic abnormalities in a set of genes that lead to cancer, determine cancer prognosis, and predict response to treatment in a subject.
  • a gene may contain various genomic alterations in either the coding or the untranslated regions which could affect the biological activity of a gene in an individual. Such genomic alterations include, but are not limited to, mutations, insertions, deletions, and substitutions of one or more nucleotides.
  • biomarkers and genetic aberrations help to discriminate between cancer and healthy subjects as well as being useful in the prognosis and monitoring of cancer.
  • Genomic panel testing offers the potential to evaluate a large number of biomarkers at a single time and to identify any mutation, deletion and/or amplification of genes in specific biologic pathways associated with cancer. A significant difference in the expression of these biomarkers in the sample, as compared to a predetermined standard for each, results in the diagnosis, or aids in the diagnosis, of cancer.
  • An object of the present invention is to provide a novel and improved system for identifying genomic abnormalities in a gene panel associated with cancer in a test sample.
  • Another important object of the present invention is to provide a method for identifying genomic abnormalities associated with cancer in the gene panel.
  • Yet another object of the present invention is to predict the presence, prognosis or absence of genomic abnormalities associated with cancer.
  • Another object of the invention is to provide an assay that can assess the therapeutic efficacy of a cancer treatment and determine whether a subject potentially is developing cancer or not.
  • Yet another object of the present invention is to provide a specific and cost-effective system for identifying genomic abnormalities associated with cancer.
  • The present disclosure seeks to provide a solution to the existing problems associated with the conventional approach of using a short panel of gene markers for predicting genomic alteration due to a deletion, mutation and/or amplification of the cancer-specific genes.
  • Chemotherapy response prediction by genomic tests also needs a large panel of gene markers, because the action of the drugs depends on the liver metabolic rate.
  • the present invention discloses a system for identifying genomic abnormalities in a gene panel associated with cancer in one or more test samples, the system comprising: a memory configured to receive genomic data associated with the one or more test samples and store the genomic data; a processor communicatively coupled to the memory, wherein the processor is configured to implement an artificial intelligence platform to: determine a presence of at least one genomic alteration in the genomic data; upon detecting the presence of the at least one genomic alteration, identifying one or more gene variants related to the at least one genomic alteration; determining values associated with technical characteristic of the one or more gene variants in the gene panel, wherein the technical characteristic include at least one of a coverage depth of the gene variant, number of mutant reads supporting the gene variant, location of the gene variant, and a frequency of the gene variant; optimizing uniform sequencing of genes and gene regions from the one or more test samples by selecting baits for selection of target genes or gene regions to be sequenced; comparing the values to threshold values to determine prognosis of cancer; and identifying genomic abnorm
  • the present invention discloses a method for identifying genomic abnormalities associated with cancer in the gene panel, the method comprising: receiving genomic data associated with one or more tumor samples and store the genomic data; determining a presence of at least one genomic alteration in the genomic data; identifying one or more gene variants related to the at least one genomic alteration; determining values associated with technical characteristic of the one or more gene variants, wherein the technical characteristic include at least one of a coverage depth of the gene variant, number of mutant reads supporting the gene variant, location of the gene variant, and a frequency of the gene variant; optimizing uniform sequencing of genes and gene regions from the one or more test samples by selecting baits for selection of target genes or gene regions to be sequenced; comparing the values to threshold values to determine prognosis of cancer; identifying genomic abnormalities associated with cancer upon exceedance of the values associated with technical characteristic corresponding to the threshold values; and displaying data associated with the prognosis of cancer.
  • The term cancer refers to or describes the physiological condition in mammals that is typically characterized by unregulated cell growth.
  • the term cancer includes, but is not limited to, solid tumors, such as breast, ovarian, prostate, lung, kidney, gastric, colon, testicular, head and neck, pancreas, brain, melanoma, and other tumors of tissue organs and hematological tumors, such as lymphomas and leukemias, including acute myelogenous leukemia, chronic myelogenous leukemia, chronic lymphocytic leukemia, T cell lymphocytic leukemia, and B cell lymphomas.
  • prognosis may be used herein to refer to the prediction of the likelihood of cancer-attributable death or progression, including recurrence, metastatic spread, and drug resistance, of a neoplastic disease in a patient.
  • prediction may be used herein to refer to the likelihood that a patient will have a particular clinical outcome, whether positive or negative, following one or more treatments such as surgical removal or chemotherapy or radiation therapy of the primary tumor.
  • the predictive methods of the present invention can be used clinically to make treatment decisions by choosing the most appropriate treatment modalities for any particular patient.
  • The system and method disclosed herein comprise a cancer-specific gene panel, whose genes are selected mainly on the basis of targeted cancer therapy and management.
  • The gene panel covers the majority of genes or biomarkers that are differentially expressed in the tissue of cancer patients versus normal healthy tissue.
  • the said assay may also cover DNA fingerprinting markers in order to avoid gender misinterpretation and mix-up of specimens from different individuals.
  • The assay offers high and uniform coverage depths of 1000x, thereby preventing the loss of genomic information relevant to the cancer treatment.
  • the assay also covers DNA fingerprinting markers to avoid any gender misinterpretation and mix-up of specimens from different individuals
  • the invention further comprises a computer implemented system including a data processing module and pharmacogenomics database for evaluating the response to targeted cancer therapy.
  • the said system characterizes the altered gene by locating gene variant or type of variant for preparation of a library comprising plurality of targeted genes obtained from a sample, e.g. tumor sample.
  • Gene sequencing protocols are optimized for capturing one or more gene variants through clinical annotation of each genomic alteration, using the pharmacogenomics database to provide the high accuracy that is important for assessing cancer progression, therapeutic response and resistance.
  • the data processing module is configured to generate interactive reports based on variants detected in the patients and provides the generated interactive report via a user interface (UI) to users.
  • UI user interface
  • the plurality of cancer specific genes and genomic regions are carefully identified and collected to design a gene panel which can be used for deducing the constitutional genome and the tumor-associated genomic alterations relevant to cancer diagnosis, prognosis and treatment response.
  • the gene panel of the present invention may help in identification of those cancer genes which are difficult to sequence and may provide vital information pertaining to cancer prognosis and treatment.
  • the said gene panel covers genomic alterations in a test sample from a group comprising Single Nucleotide Variants (SNVs), Indels, rearrangements and Copy Number Variations (CNV) specific to the cancer type.
  • SNVs Single Nucleotide Variants
  • CNV Copy Number Variations
  • The assay also reports Tumor Mutation Burden (TMB) and Microsatellite Instability (MSI). The assay may simultaneously screen a plurality of genes in a test sample to identify the genomic abnormalities.
  • The assay is useful in determining the ability of a subject to respond to a particular therapy for treating cancer, and also in predicting the presence as well as the stage of cancer, by assessing the therapeutic efficacy of a cancer treatment and determining cancer development and progression in a subject. Accordingly, embodiments of the present disclosure substantially eliminate the aforementioned problems in the prior art and help in meeting the current clinical and economic needs of cancer diagnostics and therapy at an affordable cost.
  • the present disclosure provides a system for identifying genomic abnormalities in a gene panel associated with cancer in one or more test samples, the system comprising: a memory configured to receive genomic data associated with the one or more test samples and store the genomic data; a processor communicatively coupled to the memory, wherein the processor is configured to implement an artificial intelligence platform to: determine a presence of at least one genomic alteration in the genomic data; upon detecting the presence of the at least one genomic alteration, identifying one or more gene variants related to the at least one genomic alteration; determining values associated with technical characteristic of the one or more gene variants in the gene panel, wherein the technical characteristic include at least one of a coverage depth of the gene variant, number of mutant reads supporting the gene variant, location of the gene variant, and a frequency of the gene variant; optimizing uniform sequencing of genes and gene regions from the one or more test samples by selecting baits for selection of target genes or gene regions to be sequenced; comparing the values to threshold values to determine prognosis of cancer; and identifying genomic abnor
  • The at least one genomic alteration is a mutation, insertion, deletion, or substitution of one or more nucleotides.
  • At least one genomic alteration comprises single nucleotide variants (SNVs), indels, rearrangements, tumor mutation burden (TMB), microsatellite instability (MSI) and copy number variations (CNV).
  • the processor is configured to determine the presence of the at least one genomic alteration by identifying the genes to obtain the gene panel.
  • The system further comprises: a data repository; and a data processing module communicatively coupled to the data repository, wherein the data processing module is configured to: train a model associated with the artificial intelligence platform using at least one training dataset containing the genomic data; and store the trained model having the at least one training dataset in the data repository.
  • The test sample is selected from, but not limited to, blood, plasma, semen, and tissue biopsy.
  • Determining the presence of the at least one genomic alteration is performed by: i. identifying and sequencing the genes to obtain the gene panel; ii. acquiring deep reads for genes or gene regions with a next generation sequencing method; and iii. mapping said reads by an alignment method to assign a nucleotide base to identify the gene alteration.
  • the gene panel is obtained by: i. identifying the genes, specific for cancer; ii. designing custom DNA probes for targeted sequencing of all exons and selected introns; iii. validating the genes using KOL; iv. obtaining the gene panel comprising cancer hallmark factors involved in cell cycle, cell adhesion and cell proliferation, local tissue invasion and apoptosis, metabolism, angiogenesis and immunosurveillance for an unambiguous prediction.
  • said gene panel comprises genomic alterations selected from, but not limited to the Single Nucleotide Variations (SNVs), Indels, Copy Number Variations (CNVs) and Structural Variants (SVs)
  • SNVs Single Nucleotide Variations
  • CNVs Copy Number Variations
  • SVs Structural Variants
  • the gene panel contains probes capturing a 100-bp region in the TERT promoter.
  • the gene panel contains the probes having tile positions of >1000 common SNVs.
  • the assay comprises detecting genomic alteration in plurality of biomarker(s) in a test sample, wherein any mutation, deletion and/or amplification in a biomarker or combination of biomarkers is indicative of cancer prognosis.
  • The test sample used in the present invention for identifying one or more genomic alterations may include, but is not limited to, blood, amniotic fluid, plasma, semen, bone marrow, and tissue biopsy.
  • The test sample from an individual is screened to identify the presence or absence of one or more genomic alterations in cancer-specific genes and to predict the response to an anti-cancer therapy.
  • sequence optimization was performed by DNA extraction, library preparation and gene sequencing.
  • the sequencing can be performed by a variety of methods that are known to those skilled in the art.
  • The assay comprising such optimized sequencing protocols renders uniform and high coverage depths of 1000x across the targeted regions of the genome, thereby preserving the vital information needed to deliver appropriate and useful sequencing information for efficient and interactive report generation.
  • A sample with blood, serum or Formalin-Fixed Paraffin-Embedded (FFPE) tumor tissues was analyzed to deliver the most useful sequencing information for efficient and interactive report generation.
  • FFPE Formalin-Fixed Paraffin-Embedded
  • Circulating tumor DNA (ctDNA), which is the fragment of genomic material or DNA content released by tumors, cancer, and malignant cells into the blood circulation, may also be used to perform liquid biopsy. Samples identified with mutations were further used in screening, identification and analysis of gene variants for prognosis and monitoring of cancer in a subject.
  • Custom DNA probes were designed for targeted sequencing of all exons and selected introns of the 500 genes (gene coordinates provided), thoroughly studied and validated by KOLs to be clinically relevant, as listed in Table 1.
  • The table includes more than 5000 coding exons of canonical transcript isoforms, more than 100 exons of non-canonical transcripts, as well as probes targeting 35 introns of 15 recurrently rearranged genes.
  • the gene panel was designed to include all major cancer hallmark factors involved in cell cycle, cell adhesion and cell proliferation, local tissue invasion and apoptosis, metabolism, angiogenesis and immunosurveillance for an unambiguous prediction.
  • the gene panel covers all four genomic alterations for better prediction of prognosis and drug recommendations such as Single Nucleotide Variations (SNVs), Indels, Copy Number Variations (CNVs) and Structural Variants (SVs).
  • the term “gene panel” is used in a broadest sense and covers plurality of genes that are likely to respond to therapeutics comprising immunotherapy, chemotherapy, targeted therapy where said specific genes are highly cancer specific (Table 1).
  • Table 1: List of cancer-specific genes screened in the assay.
  • The panel covers probes that tile positions of >1000 common SNVs, which serve four purposes: i. ADME polymorphic sites to determine the impact of chemotherapy (both efficacy and toxicity); ii. patient-specific fingerprint markers to identify sample mix-up; iii. to detect trace amounts of contaminating DNA by identifying the presence of alternate alleles at homozygous sites; and iv. to supplement CNV analysis in regions where few target genes are located, where the probes work similarly to a low-density SNP tiling array with locations evenly distributed across the genome and coverage values taken at these positions.
  • Targeted exome sequencing libraries were prepared from QC-qualified samples using the iNDX IP customized kit. Briefly, the workflow involves shearing of DNA, repairing ends, and adenylation of 3’ ends, followed by adapter ligation. 10-50 ng of gDNA/ctDNA was used for fragmentation. The adapter sequences were added onto the ends of DNA fragments to generate paired-end libraries. The resulting adapter-ligated libraries were purified, qualified and hybridized to the biotinylated capture library of the custom panel. After hybridization, the targeted molecules were captured on streptavidin beads. The resulting DNA libraries were enriched and multiplexed by adding index tags for amplification, followed by purification, and assessed for both quality and quantity.
  • PCR-enriched adapter-ligated libraries were then quantitated by Qubit dsDNA HS assay (Thermo Fisher Scientific, Waltham, MA, USA) and validated using TapeStation (Agilent Technologies, Santa Clara, CA). The validated libraries were pooled in equimolar ratio and sequenced on the NextSeq500 platform (Illumina, USA) to generate 2x150 bp sequence reads at 1000x sequencing depth (~2.5 GB raw data/sample). The raw data were processed further after the necessary quality check, with an average Q30 >70%.
  • BCL2FASTQ (Illumina) was used to demultiplex the base calls into individual FASTQ files using the following options: --force --no-eamss --fastq-cluster-count 0 --mismatches 1. Reads for which a matching index could not be identified were stored in a set of FASTQ files labeled as “Undetermined Indices”. To monitor possible barcode contamination, all known barcode indices were scanned for over-representation within the undetermined indices. The baits in the intergenic and intronic custom probes target >1000 genomic regions covering common SNPs, apart from the fingerprint markers used to identify potential sample mix-up. Further, alternate allele reads at homozygous variant sites in the patient genome were used to detect contamination. Samples with an average minor allele frequency >1% at homozygous sites were flagged; where a matched normal sample was available, it was used to define the homozygous sites.
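The contamination flag described above can be illustrated with a minimal Python sketch. Only the 1% threshold comes from the text. The `HomozygousSite` structure and the function names are hypothetical, and the sketch assumes read counts at homozygous sites have already been extracted from the alignments.

```python
from dataclasses import dataclass
from typing import Iterable

# Threshold taken from the text: samples whose average minor allele
# frequency at homozygous sites exceeds 1% are flagged.
CONTAMINATION_THRESHOLD = 0.01


@dataclass(frozen=True)
class HomozygousSite:
    """Read counts at one site called homozygous (hypothetical structure)."""
    major_reads: int  # reads supporting the homozygous genotype allele
    minor_reads: int  # reads supporting any other allele


def mean_minor_allele_fraction(sites: Iterable[HomozygousSite]) -> float:
    """Average, over covered sites, of the fraction of reads with an unexpected allele."""
    fractions = []
    for site in sites:
        depth = site.major_reads + site.minor_reads
        if depth > 0:  # skip uncovered sites to avoid division by zero
            fractions.append(site.minor_reads / depth)
    if not fractions:
        raise ValueError("no covered homozygous sites")
    return sum(fractions) / len(fractions)


def is_contaminated(sites: Iterable[HomozygousSite]) -> bool:
    """Flag the sample when the average minor allele fraction exceeds 1%."""
    return mean_minor_allele_fraction(sites) > CONTAMINATION_THRESHOLD
```

When a matched normal is available, the list of sites passed in would be the homozygous sites defined from that normal sample, as the text describes.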
  • CNVs were identified by comparing sequence coverage of targeted regions in a tumor sample with a standard diploid normal sample. The depth of coverage was computed using standard tools and the Loess normalization procedure. Target regions in the lowest fifth percentile of coverage in >20% of all normal control samples were removed from analysis. The following criteria were used to determine significance of whole-gene gain or loss events: fold change >2.0 (gain) and < -2.0 (loss). Matched normal samples were subjected to the same copy number variant calling algorithm, using the same set of normal control samples, with modified thresholds for detecting germline events: fold change >1.3x (single copy gain, ideally 1.5x) and < -1.8x (single copy loss, ideally -2.0x). The resulting germline calls are subtracted from the total set of copy number calls made on the tumor sample, to ensure that the final set of copy number variants from the tumor sample is somatic.
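The thresholding and germline subtraction described above can be sketched in Python as follows. The numeric thresholds are the ones quoted in the text. The signed fold-change convention (the ratio when it is at least 1, otherwise the negative inverse ratio) is an assumption, chosen because it matches the quoted "ideal" values of 1.5x for a single copy gain and -2.0x for a single copy loss. All function names are hypothetical, and coverage normalization is assumed to have been done upstream.

```python
from typing import Dict, Optional

# Thresholds quoted in the text (signed fold change relative to normal).
SOMATIC_GAIN, SOMATIC_LOSS = 2.0, -2.0
GERMLINE_GAIN, GERMLINE_LOSS = 1.3, -1.8


def signed_fold_change(sample_cov: float, normal_cov: float) -> float:
    """Assumed convention: ratio if >= 1, otherwise -1/ratio."""
    if sample_cov <= 0 or normal_cov <= 0:
        raise ValueError("coverage must be positive")
    ratio = sample_cov / normal_cov
    return ratio if ratio >= 1.0 else -1.0 / ratio


def call_event(fold_change: float, gain: float, loss: float) -> Optional[str]:
    """Return 'gain', 'loss', or None using strict inequalities as in the text."""
    if fold_change > gain:
        return "gain"
    if fold_change < loss:
        return "loss"
    return None


def somatic_calls(tumor_fc: Dict[str, float],
                  normal_fc: Dict[str, float]) -> Dict[str, str]:
    """Call whole-gene events in the tumor, then subtract germline events."""
    germline = {
        (gene, event)
        for gene, fc in normal_fc.items()
        if (event := call_event(fc, GERMLINE_GAIN, GERMLINE_LOSS)) is not None
    }
    result = {}
    for gene, fc in tumor_fc.items():
        event = call_event(fc, SOMATIC_GAIN, SOMATIC_LOSS)
        if event is not None and (gene, event) not in germline:
            result[gene] = event
    return result
```

Both dictionaries map a gene name to its signed fold change against the same pool of normal controls, so a gain seen in the tumor is dropped when the matched normal shows a gain in the same gene.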
  • An integrative approach that combines long- and short-range paired-end and split-read analysis was used for the identification of SVs at single-base resolution.
  • This approach to SV calling, which covers deletions, tandem duplications, inversions and translocations, delivers high specificity, sensitivity and precision across a wide size spectrum. It increases specificity by eliminating germline structural aberrations as well as false-positive events, such as systematic sequencing/mapping artifacts. All candidate structural aberrations were filtered, annotated using in-house tools, and manually reviewed using the Integrative Genomics Viewer (IGV).
  • IGV Integrative Genomics Viewer
  • Annotated SNV and indel calls were subjected to a series of filtering steps to ensure that high-confidence calls were retained for the final step of manual review. The filters include: i) evidence in the literature for being an oncogenic or recurrent hotspot mutation; ii) occurrence of the variant in a previously run pool of normal controls (i.e., reproducible assay artifacts); iii) technical characteristics of the variant call: coverage depth, number of mutant reads supporting the variant, and variant frequency; and iv) annotation-based filters: location (e.g., exonic versus non-exonic) and effect (e.g., nonsynonymous versus silent).
  • Variants listed in COSMIC were considered hotspot point mutations if they presented with >5 mentions and occurred in exons of the 500 genes targeted in this assay. These variants were subjected to lower requirements on coverage, number of mutant reads, and variant frequency to be considered high-confidence calls, and machine learning was applied to prioritize variants of clinical importance.
  • variant prioritisation is a complicated and ever-changing field.
  • the implementation of the ACMG guidelines in 2015 aided in driving consistency and transparency.
  • This standard process, and its reliance on structured data, has ultimately paved the way for the development of machine learning (ML) models.
  • ML Machine learning models
  • Variant prioritisation denotes the conclusion that the available evidence is sufficient to prove the role of the variant in disease development.
  • the development of the ML method aids in the identification and prioritisation of pathogenic variants relevant to disease/phenotype.
  • a diseased patient's whole exome sequencing contains approximately 50,000 variants, 50 of which are pathogenic variants relevant to disease.
  • Prioritizing these 50 pathogenic variants relevant to disease can be done with the help of ML methods by using different attributes such as variant feature (secondary structure, 3D structure feature, conservation score, amino acid).
  • First-tier variants, i.e., well-characterized hotspot mutations
  • The filtering criteria for first-tier variants were: coverage depth >20x, mutant reads >8, and variant frequency >2%. Second-tier variants were required to meet coverage depth >20x, mutant reads >10, and variant frequency >5%.
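The two-tier technical filter described above can be expressed as a short Python sketch. The threshold values come from the text. The class and function names are hypothetical, the `is_hotspot` flag is assumed to be computed beforehand (for example from the COSMIC criterion mentioned earlier), and variant frequency is assumed to be mutant reads divided by coverage depth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Exclusive lower bounds; a call must be strictly above each one."""
    depth: int
    mutant_reads: int
    variant_frequency: float


# Values quoted in the text for each tier.
FIRST_TIER = Thresholds(depth=20, mutant_reads=8, variant_frequency=0.02)
SECOND_TIER = Thresholds(depth=20, mutant_reads=10, variant_frequency=0.05)


@dataclass(frozen=True)
class VariantCall:
    depth: int          # total reads covering the position
    mutant_reads: int   # reads supporting the variant allele
    is_hotspot: bool    # True for well-characterized hotspot mutations

    @property
    def variant_frequency(self) -> float:
        # Assumption: frequency is mutant reads over total depth.
        return self.mutant_reads / self.depth if self.depth else 0.0


def passes_filter(call: VariantCall) -> bool:
    """Apply first-tier thresholds to hotspots, second-tier to everything else."""
    t = FIRST_TIER if call.is_hotspot else SECOND_TIER
    return (call.depth > t.depth
            and call.mutant_reads > t.mutant_reads
            and call.variant_frequency > t.variant_frequency)
```

For example, a call with 9 mutant reads at 400x depth passes when it is a known hotspot but is rejected as a second-tier variant, because it fails the stricter mutant-read requirement.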
  • Genomic sites with evidence of length instability on sequencing were first identified. To calculate MSI status, more than 100 intronic homopolymer repeat loci with adequate coverage on the iTREAT bait set were analyzed for length variability and compiled into an overall MSI score. Each chosen locus has an hg19 reference repeat length of 10-20 bp: long enough to produce a high rate of DNA polymerase slippage, while short enough to fit within the 50-bp read length of NGS to facilitate alignment to the human reference genome. The baseline reference value was first calculated as the mean number of unique repeat lengths at each mononucleotide tract across a population of MS-Stable samples. For each locus, a minimum read depth of 30 or more was set for inclusion in the baseline calculation.
  • (1) The number of alleles detectable at a site was required to be proportional to its read depth, with a higher read number giving greater ability to discriminate low-prevalence alleles.
  • (2) The number of reads from alleles of each observed length compared to the reference genome, i.e., -2, -1, +1, and +2, was expressed as a percentage of the number of reads counted for the most frequently occurring allele. (3) Alleles with <5% of the reads counted for the most frequently observed allele were excluded. Tumor instability could potentially be reflected by low-prevalence alleles present at <5% abundance.
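The per-locus allele counting and baseline calculation described above can be sketched in Python. The 5% allele cutoff and the minimum depth of 30 come from the text. The function names and the input layout (a mapping from repeat-length offset relative to the reference to read count) are assumptions. How the per-locus values are compiled into an overall MSI score is not specified in this excerpt, so it is left out.

```python
from typing import Iterable, Mapping

MIN_ALLELE_FRACTION = 0.05  # alleles below 5% of the modal allele are excluded
MIN_BASELINE_DEPTH = 30     # minimum read depth for inclusion in the baseline


def count_alleles(length_counts: Mapping[int, int]) -> int:
    """Number of distinct repeat lengths kept after the 5% cutoff.

    Keys are repeat-length offsets from the reference (e.g. -2, -1, 0, +1, +2);
    values are read counts. This layout is a hypothetical choice.
    """
    if not length_counts:
        return 0
    modal = max(length_counts.values())
    return sum(1 for n in length_counts.values()
               if n > 0 and n >= MIN_ALLELE_FRACTION * modal)


def baseline_allele_count(stable_samples: Iterable[Mapping[int, int]]) -> float:
    """Mean number of unique repeat lengths at one locus across MS-stable samples."""
    counts = [count_alleles(sample) for sample in stable_samples
              if sum(sample.values()) >= MIN_BASELINE_DEPTH]
    if not counts:
        raise ValueError("no stable sample meets the minimum read depth")
    return sum(counts) / len(counts)
```

A tumor locus whose allele count is well above the baseline for that locus would then be a candidate unstable site under whatever scoring rule is applied downstream.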
  • MSI-High MSI-High
  • MSI-ambiguous MSI-ambiguous
  • MSS microsatellite stable
  • TMB Tumor Mutation Burden
  • TMB is defined as the number of true somatic, coding, base substitution, and indel mutations per megabase of genome. All base substitutions and indels in the coding region of targeted genes, including synonymous alterations, were initially considered before filtering as described below. Though synonymous mutations are not likely to be directly involved in immunogenicity, their presence could be a signal of mutational processes that may have resulted in nonsynonymous mutations and neoantigens elsewhere in the genome.
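The definition above reduces to a simple ratio, shown here as a minimal Python sketch. The function name and both parameters are hypothetical. The mutation count is assumed to have already passed whatever filtering the assay applies, and the size of the sequenced coding region is an input because it is not stated in this excerpt.

```python
def tumor_mutation_burden(mutation_count: int, coding_region_bp: int) -> float:
    """Mutations per megabase over the interrogated coding region.

    mutation_count: somatic coding base substitutions and indels after filtering.
    coding_region_bp: size of the sequenced coding region in base pairs.
    """
    if mutation_count < 0:
        raise ValueError("mutation_count must be non-negative")
    if coding_region_bp <= 0:
        raise ValueError("coding_region_bp must be positive")
    return mutation_count / (coding_region_bp / 1_000_000)
```

For instance, 15 qualifying mutations over a 1.5 Mb coding region give a TMB of 10 mutations per megabase.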
  • The claimed systems and methods are affordable and cover a plurality of genomic alterations for predicting cancer or its prognosis.
  • the systems and methods can be used on both tissue as well as liquid biopsy.
  • the method can be used in predicting response to various anti-cancer therapies.
  • The method helps in generating actionable, interactive reports based on one or more gene variants detected in the patient.
  • The method utilizes user-friendly, User Interface (UI) based software to develop one or more algorithms for detecting gene variants in an individual.
  • ADME Absorption, Distribution, Metabolism & Excretion

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • Analytical Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Organic Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Immunology (AREA)
  • Zoology (AREA)
  • Pathology (AREA)
  • Wood Science & Technology (AREA)
  • Hospice & Palliative Care (AREA)
  • Microbiology (AREA)
  • Oncology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

In the present disclosure, genomic abnormalities are identified by receiving genomic data associated with one or more tumor samples and determining a presence of at least one genomic alteration in the genomic data. Further, gene variants related to the at least one genomic alteration are identified, and values associated with technical characteristics of the gene variants are determined. Uniform sequencing of genes and gene regions from the one or more test samples is optimized by selecting baits for selection of target genes or gene regions to be sequenced. The values are compared to threshold values to determine prognosis of cancer. The genomic abnormalities associated with cancer are identified when the values exceed the corresponding threshold values. Finally, the data associated with the prognosis of cancer is displayed.

Description

FIELD OF THE INVENTION
[001] The present invention broadly relates to a system and method for predicting and identifying genomic abnormalities associated with cancer in a subject. More specifically, the invention relates to a system and a method for identifying genomic abnormalities in a set of genes that lead to cancer, determining cancer prognosis, and predicting response to treatment in a subject.
BACKGROUND OF THE INVENTION
[002] It is known that the population of cells that make up a cancer are heterogeneous and complex in nature and contain different genomic modifications and variations affecting growth characteristic and sensitivity to several drugs and other anti-cancer therapies. A gene may contain various genomic alterations in either the coding or the untranslated regions which could affect the biological activity of a gene in an individual. Such genomic alterations include, but are not limited to, mutations, insertions, deletions, and substitutions of one or more nucleotides.
[003] Accordingly, prediction and prognosis of cancer typically involve the study of alterations of certain indicator genes, such as molecular biomarkers, gene mutations, etc. The biomarkers and genetic aberrations help to discriminate between cancer and healthy subjects and are also useful in the prognosis and monitoring of cancer. In this regard, genomic panel testing offers the potential to evaluate a large number of biomarkers at a single time and identify any mutation, deletion and/or amplification of genes in specific biologic pathways associated with cancer. A significant difference in the expression of these biomarkers in the sample, as compared to a predetermined standard for each, results in the diagnosis or aids in the diagnosis of cancer.
[004] The prior art suggests that a few recent technologies are based on panels of markers for identifying genomic abnormalities; however, all of them use shorter gene panels, which cannot cover a larger spectrum of chemotherapeutic markers and are therefore not suitable where specific and targeted results are desired. Hence, the conventional approach of using a short gene panel of gene markers may impose various challenges for identifying a deletion, mutation and/or amplification of the marker genes. Moreover, the shorter gene panels that are currently available are not sufficient to deliver comprehensive diagnostic and therapeutic profiling, especially in cancer.
[005] Therefore, with the incidence of cancer increasing every year worldwide, more accessible and affordable tests are essential to cater to the clinical needs of the people. In light of the foregoing discussion, there exists a need to overcome the aforementioned drawbacks associated with the conventional assays. Hence, there is a pressing need for an assay or system for predicting genomic abnormalities associated with cancer through data analysis, and for providing wide gene coverage with selected genes that are highly cancer specific, at an affordability that meets the needs of cancer patients in all economic zones.
OBJECT OF THE INVENTION
[006] An object of the present invention is to provide a novel and improved system for identifying genomic abnormalities in a gene panel associated with cancer in a test sample.
[007] Another important object of the present invention is to provide a method for identifying genomic abnormalities associated with cancer in the gene panel.
[008] Yet another object of the present invention is to predict the presence, prognosis or absence of genomic abnormalities associated with cancer.
[009] Another object of the invention is to provide an assay that can assess the therapeutic efficacy of a cancer treatment and determine whether a subject potentially is developing cancer or not.
[0010] Yet another object of the present invention is to provide a specific and cost-effective system for identifying genomic abnormalities associated with cancer.
SUMMARY OF THE INVENTION
[0011] The present disclosure seeks to provide a solution to the existing problems associated with the conventional approach of using a short gene panel of gene markers for predicting genomic alteration due to a deletion, mutation and/or amplification of the cancer specific genes. Chemotherapy response prediction by genomic tests also needs a large panel of gene markers, as the drugs follow liver metabolic rate for their action.
In an embodiment, the present invention discloses a system for identifying genomic abnormalities in a gene panel associated with cancer in one or more test samples, the system comprising: a memory configured to receive genomic data associated with the one or more test samples and store the genomic data; a processor communicatively coupled to the memory, wherein the processor is configured to implement an artificial intelligence platform to: determine a presence of at least one genomic alteration in the genomic data; upon detecting the presence of the at least one genomic alteration, identifying one or more gene variants related to the at least one genomic alteration; determining values associated with technical characteristic of the one or more gene variants in the gene panel, wherein the technical characteristic include at least one of a coverage depth of the gene variant, number of mutant reads supporting the gene variant, location of the gene variant, and a frequency of the gene variant; optimizing uniform sequencing of genes and gene regions from the one or more test samples by selecting baits for selection of target genes or gene regions to be sequenced; comparing the values to threshold values to determine prognosis of cancer; and identifying genomic abnormities upon exceedance of the values associated with technical characteristic corresponding to the threshold values; and a display unit communicatively coupled to the processor to display data associated with the prognosis of cancer.
In another embodiment, the present invention discloses a method for identifying genomic abnormalities associated with cancer in the gene panel, the method comprising: receiving genomic data associated with one or more tumor samples and store the genomic data; determining a presence of at least one genomic alteration in the genomic data; identifying one or more gene variants related to the at least one genomic alteration; determining values associated with technical characteristic of the one or more gene variants, wherein the technical characteristic include at least one of a coverage depth of the gene variant, number of mutant reads supporting the gene variant, location of the gene variant, and a frequency of the gene variant; optimizing uniform sequencing of genes and gene regions from the one or more test samples by selecting baits for selection of target genes or gene regions to be sequenced; comparing the values to threshold values to determine prognosis of cancer; identifying genomic abnormalities associated with cancer upon exceedance of the values associated with technical characteristic corresponding to the threshold values; and displaying data associated with the prognosis of cancer.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0012] Detailed embodiments of the present disclosure are described herein; however, it is to be understood that disclosed embodiments are merely exemplary of the present disclosure, which may be embodied in various alternative forms. Specific process details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the present disclosure in any appropriate process.
[0013] The terms used herein are for the purpose of describing exemplary embodiments only and are not intended to be limiting. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” or “comprising,” when used in this specification, do not preclude the presence or addition of one or more components, steps, operations, and/or elements other than a mentioned component, step, operation, and/or element.
[0014] The term “cancer” refers to or describes the physiological condition in mammals that is typically characterized by unregulated cell growth. In a preferred embodiment, the term cancer includes, but is not limited to, solid tumors, such as breast, ovarian, prostate, lung, kidney, gastric, colon, testicular, head and neck, pancreas, brain, melanoma, and other tumors of tissue organs and hematological tumors, such as lymphomas and leukemias, including acute myelogenous leukemia, chronic myelogenous leukemia, chronic lymphocytic leukemia, T cell lymphocytic leukemia, and B cell lymphomas.
[0015] The term “prognosis” may be used herein to refer to the prediction of the likelihood of cancer-attributable death or progression, including recurrence, metastatic spread, and drug resistance, of a neoplastic disease in a patient.
[0016] The term “prediction” may be used herein to refer to the likelihood that a patient will have a particular clinical outcome, whether positive or negative, following one or more treatments such as surgical removal or chemotherapy or radiation therapy of the primary tumor. The predictive methods of the present invention can be used clinically to make treatment decisions by choosing the most appropriate treatment modalities for any particular patient.
[0017] The system and method disclosed in the present invention comprise a cancer-specific gene panel whose genes are selected mainly on the basis of targeted cancer therapy and management. The gene panel covers the majority of genes or biomarkers that are differentially expressed in tissue of cancer patients versus normal healthy tissue. The said assay may also cover DNA fingerprinting markers in order to avoid gender misinterpretation and mix-up of specimens from different individuals.
[0018] The assay offers high and uniform coverage depths of 1000x, thereby preventing the loss of genomic information relevant to the cancer treatment. The assay also covers DNA fingerprinting markers to avoid any gender misinterpretation and mix-up of specimens from different individuals, wherein the invention further comprises a computer implemented system including a data processing module and a pharmacogenomics database for evaluating the response to targeted cancer therapy. The said system characterizes the altered gene by locating the gene variant or type of variant for preparation of a library comprising a plurality of targeted genes obtained from a sample, e.g. a tumor sample. Further, gene sequencing protocols are optimized for capturing one or more gene variants through clinical annotation of each genomic alteration, using the pharmacogenomics database to provide the high accuracy important for assessing cancer progression, therapeutic response and resistance. It is worth noting that the data processing module is configured to generate interactive reports based on variants detected in the patients and provides the generated interactive report to users via a user interface (UI).
[0019] Moreover, the plurality of cancer specific genes and genomic regions are carefully identified and collected to design a gene panel which can be used for deducing the constitutional genome and the tumor-associated genomic alterations relevant to cancer diagnosis, prognosis and treatment response. The gene panel of the present invention may help in identification of those cancer genes which are difficult to sequence and may provide vital information pertaining to cancer prognosis and treatment. The said gene panel covers genomic alterations in a test sample from a group comprising Single Nucleotide Variants (SNVs), Indels, rearrangements and Copy Number Variations (CNV) specific to the cancer type. In addition to the above mentioned, the assay also renders Tumor Mutation Burden (TMB) and Microsatellite instability (MSI). It is pertinent to note that the assay may simultaneously screen plurality of genes in a test sample to identify the genomic abnormalities.
[0020] Altogether, the assay is useful in determining the ability of a subject to respond to a particular therapy for treating cancer and also in predicting the presence as well as the stage of cancer, by assessing the therapeutic efficacy of a cancer treatment and determining cancer development and progression in a subject. Accordingly, embodiments of the present disclosure substantially eliminate the aforementioned problems in the prior art and help in meeting the current clinical and economic needs of cancer diagnostics and therapy at an affordable cost.
In a preferred embodiment, the present disclosure provides a system for identifying genomic abnormalities in a gene panel associated with cancer in one or more test samples, the system comprising: a memory configured to receive genomic data associated with the one or more test samples and store the genomic data; a processor communicatively coupled to the memory, wherein the processor is configured to implement an artificial intelligence platform to: determine a presence of at least one genomic alteration in the genomic data; upon detecting the presence of the at least one genomic alteration, identifying one or more gene variants related to the at least one genomic alteration; determining values associated with technical characteristic of the one or more gene variants in the gene panel, wherein the technical characteristic include at least one of a coverage depth of the gene variant, number of mutant reads supporting the gene variant, location of the gene variant, and a frequency of the gene variant; optimizing uniform sequencing of genes and gene regions from the one or more test samples by selecting baits for selection of target genes or gene regions to be sequenced; comparing the values to threshold values to determine prognosis of cancer; and identifying genomic abnormities upon exceedance of the values associated with technical characteristic corresponding to the threshold values; and a display unit communicatively coupled to the processor to display data associated with the prognosis of cancer.
In another embodiment, at least one genomic alteration is mutation, insertion, deletion, and substitution of one or more nucleotides.
In still another embodiment, at least one genomic alteration comprises single nucleotide variants (SNVs), indels, rearrangements, tumor mutation burden (TMB), microsatellite instability (MSI) and copy number variations (CNV). In yet another embodiment, the processor is configured to determine the presence of the at least one genomic alteration by identifying the genes to obtain the gene panel.
In still another embodiment, the system further comprises: a data repository; and a data processing module communicatively coupled to the data repository, wherein the data processing module is configured to: train a model associated with the artificial intelligence platform using at least one training dataset containing the genomic data; and store the trained model having the at least one training dataset in the data repository.
In yet another embodiment said test sample is selected from, but not limited to blood, plasma, semen, and tissue biopsy.
In another embodiment, the present invention discloses a method for identifying genomic abnormalities associated with cancer in the gene panel, the method comprising: receiving genomic data associated with one or more tumor samples and store the genomic data; determining a presence of at least one genomic alteration in the genomic data; identifying one or more gene variants related to the at least one genomic alteration; determining values associated with technical characteristic of the one or more gene variants, wherein the technical characteristic include at least one of a coverage depth of the gene variant, number of mutant reads supporting the gene variant, location of the gene variant, and a frequency of the gene variant; optimizing uniform sequencing of genes and gene regions from the one or more test samples by selecting baits for selection of target genes or gene regions to be sequenced; comparing the values to threshold values to determine prognosis of cancer; identifying genomic abnormalities associated with cancer upon exceedance of the values associated with technical characteristic corresponding to the threshold values; and displaying data associated with the prognosis of cancer.
In still another embodiment, determining the presence of the at least one genomic alteration is performed by: i. identifying and sequencing the genes to obtain the gene panel; ii. acquiring deep reads for genes or gene regions with a next generation sequencing method; and iii. mapping said reads by an alignment method to assign a nucleotide base to identify the gene alteration.
In still another embodiment, the gene panel is obtained by: i. identifying the genes, specific for cancer; ii. designing custom DNA probes for targeted sequencing of all exons and selected introns; iii. validating the genes using KOL; iv. obtaining the gene panel comprising cancer hallmark factors involved in cell cycle, cell adhesion and cell proliferation, local tissue invasion and apoptosis, metabolism, angiogenesis and immunosurveillance for an unambiguous prediction.
In still another embodiment, said gene panel covers genomic alterations selected from, but not limited to, Single Nucleotide Variations (SNVs), Indels, Copy Number Variations (CNVs) and Structural Variants (SVs).
In yet another embodiment, the gene panel contains probes capturing a 100-bp region in the TERT promoter.
In still another embodiment, the gene panel contains the probes having tile positions of >1000 common SNVs.
[0021] In patients with or at risk of developing cancer, identifying certain prognostic criteria and biomarkers provides some guidance in selecting an appropriate course of treatment and in assessing response to such treatment and cancer prognosis. Preferably, the assay comprises detecting genomic alteration in a plurality of biomarker(s) in a test sample, wherein any mutation, deletion and/or amplification in a biomarker or combination of biomarkers is indicative of cancer prognosis.
[0022] In a preferred embodiment, the test sample used in the present invention for identifying one or more genomic alteration may include, but are not limited to, blood, amniotic fluid, plasma, semen, bone marrow, and tissue biopsy. The test sample from an individual is screened to identify the presence or absence of one or more genomic alteration in cancer specific genes and predict the response to an anti-cancer therapy.
EXAMPLES:
[0023] The embodiments of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings (if any), which form a part hereof, and which show, by way of illustration, specific example embodiments. Moreover, the examples and limitations disclosed herein are intended to be not limiting in any manner, and modifications may be made without departing from the spirit of the present disclosure. Those skilled in the art will recognize that many variations are possible within the spirit and scope of the disclosure, and their equivalents, in which all terms are to be understood in their broadest possible sense unless otherwise indicated. The following detailed description is not intended to be taken in a limiting sense.
Example 1. Design and Sequencing Protocol
In order to determine the presence or absence of genomic alteration in said assay, sequence optimization was performed through DNA extraction, library preparation and gene sequencing. The sequencing can be performed by a variety of methods that are known to those skilled in the art. The assay comprising such optimized sequencing protocols renders uniform and high coverage depths of 1000x across the targeted regions of the genome, thereby preserving the vital information needed to deliver appropriate and useful sequencing information for efficient and interactive report generation.
Briefly, to identify cancer mutations and determine the mutational load of an individual, a sample of blood, serum, or Formalin-Fixed Paraffin-Embedded (FFPE) tumor tissue was analyzed to deliver the most useful sequencing information for efficient and interactive report generation. Circulating tumor DNA (ctDNA), the fragmented genomic material released by tumor and other malignant cells into the blood circulation, may also be used to perform liquid biopsy. Samples identified with mutations were further used in screening, identification and analysis of gene variants for prognosis and monitoring of cancer in a subject.
[0024] In order to perform the experiment, custom DNA probes were designed for targeted sequencing of all exons and selected introns of the 500 genes (gene coordinates provided), which were thoroughly studied and validated by KOLs as clinically relevant, as listed in Table 1. The table includes more than 5000 coding exons of canonical transcript isoforms, more than 100 exons of non-canonical transcripts, as well as probes targeting 35 introns of 15 recurrently rearranged genes. The gene panel was designed to include all major cancer hallmark factors involved in cell cycle, cell adhesion and cell proliferation, local tissue invasion and apoptosis, metabolism, angiogenesis and immunosurveillance for an unambiguous prediction. In order to capture the target genes/gene regions, custom designed gene panel probes/primers were used, and libraries were prepared from a sample (the Agilent SureSelect XT HS Target Enrichment Paired End kit was used). Deep reads for the genes/gene regions were acquired with a next generation sequencing method, followed by mapping said reads by an alignment method to assign a nucleotide base (e.g., calling a mutation with a Bayesian method), followed by annotation using various public and in-house curated databases.
Further, the gene panel covers all four genomic alterations for better prediction of prognosis and drug recommendations such as Single Nucleotide Variations (SNVs), Indels, Copy Number Variations (CNVs) and Structural Variants (SVs). Preferably, the term “gene panel” is used in a broadest sense and covers plurality of genes that are likely to respond to therapeutics comprising immunotherapy, chemotherapy, targeted therapy where said specific genes are highly cancer specific (Table 1).
Table 1: List of cancer specific genes screened in the assay
[Table 1 appears as images in the original publication (imgf000015_0001 to imgf000017_0001); the gene list is not reproduced here.]
promoter. The panel covers probes that tile positions of >1000 common SNVs, which serve four purposes: i. ADME polymorphic sites to determine the impact of chemotherapy (both efficacy & toxicity); ii. patient-specific fingerprint markers to identify sample mix-up; iii. to detect trace amounts of contaminating DNA by identifying the presence of alternate alleles at homozygous sites; and iv. to supplement CNV analysis in regions wherein few target genes are located, where the probes work similarly to a low-density SNP tiling array with locations evenly distributed across the genome, based on coverage values at these positions.
Further, targeted exome sequencing libraries were prepared from QC qualified samples using the iNDX IP customized Kit. Briefly, the workflow involves shearing of DNA, repairing ends, adenylation of 3’ ends, followed by adapter ligation. 10-50 ng of gDNA/ctDNA was used for fragmentation. The adapter sequences were added onto the ends of DNA fragments to generate paired-end libraries. The resulting adapter-ligated libraries were purified, qualified and hybridized to the biotinylated capture library of the custom panel. After hybridization, the targeted molecules were captured on streptavidin beads. The resulting DNA libraries were enriched and multiplexed by adding index tags during amplification, followed by purification, and were assessed for both quality and quantity. The PCR enriched adapter-ligated libraries were then quantitated by Qubit dsDNA HS assay (Thermo Fisher Scientific, Waltham, MA, USA) and validated using TapeStation (Agilent Technologies, Santa Clara, CA). The resulting validated libraries were pooled in equimolar ratio and sequenced on the NextSeq500 platform (Illumina, USA) to generate 2x150 bp sequence reads at 1000X sequencing depth (~2.5 GB raw data/sample). The raw data was processed further after the necessary quality check, with an average Q30 >70%.
Example 2. Demultiplexing and QC
[0025] BCL2FASTQ (Illumina) was used to demultiplex the base calls into individual FASTQ files using the following options: --force --no-eamss --fastq-cluster-count 0 --mismatches 1. Reads for which a matching index could not be identified were stored in a set of FASTQ files labeled as “Undetermined Indices”. To monitor possible barcode contamination, all known barcode indices were scanned for over-representation within the undetermined indices. The baits among the intergenic and intronic custom probes target >1000 genomic regions covering common SNPs, in addition to the fingerprint markers used to identify potential sample mix-up. Further, alternate allele reads at homozygous variant sites in the patient genome were used to detect contamination. Samples with an average minor allele frequency at homozygous sites >1% were flagged; where a matched normal sample was available, it was used to define the homozygous sites.
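The contamination flag described above can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: the function names and the (reference reads, alternate reads) input layout are assumptions, and only the 1% threshold comes from the text.

```python
from statistics import mean

# Threshold from the text: average minor allele frequency > 1% flags the sample.
CONTAMINATION_THRESHOLD = 0.01


def minor_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads carrying the less common allele at one site."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no coverage")
    return min(ref_reads, alt_reads) / total


def is_contaminated(homozygous_sites: list[tuple[int, int]]) -> bool:
    """homozygous_sites: (ref_reads, alt_reads) at sites expected to be homozygous."""
    fractions = [minor_allele_fraction(ref, alt) for ref, alt in homozygous_sites]
    return mean(fractions) > CONTAMINATION_THRESHOLD


# Hypothetical read counts for two samples.
clean_sample = [(1000, 2), (0, 800), (995, 5)]
mixed_sample = [(950, 50), (30, 970)]
print(is_contaminated(clean_sample), is_contaminated(mixed_sample))
```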
Example 3. Alignment
[0026] Removal of adapter sequences is necessary prior to alignment and was done using a tool with highly automated quality control and data filtering features that analyses paired sequence overlap for paired-end sequencing data. Read pairs with insert size length <25 bp were discarded. A local alignment tool was used to transform regions with misalignments (due to indels) into clean reads. A sequence data processing tool was utilized for identifying systematic errors in base quality scores in a two-step process: building a covariation model and adjusting the base quality scores. Recalibrated quality scores were subjected to a base quality threshold of 20, corresponding to a 1/100 chance of error.
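The relationship between a Phred quality score and its error probability, together with the two cut-offs stated above, can be sketched as follows. The function names are hypothetical, and treating a score of exactly 20 as passing is an assumption, since the text does not say whether the threshold is inclusive.

```python
def phred_to_error_probability(q: float) -> float:
    """Standard Phred relation: P(error) = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def passes_base_quality(q: float, threshold: float = 20) -> bool:
    """Assumes the threshold is inclusive (Q >= 20)."""
    return q >= threshold


def keep_read_pair(insert_size_bp: int, min_insert_bp: int = 25) -> bool:
    """Read pairs with insert size < 25 bp are discarded, per the text."""
    return insert_size_bp >= min_insert_bp


print(phred_to_error_probability(20))   # 1 in 100 chance of error
print(passes_base_quality(18), keep_read_pair(24))
```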
Example 4. Variant Calling
Example 4.1. SNV and Indel Calling
[0027] Paired-sample variant calling on tumor samples and their respective matched normal samples was performed to identify point mutations/SNVs and small indels (<50 bp in length). In instances where a matched normal sample was unavailable, tumor samples were considered as un-matched samples, and variant calling was performed using a within-batch normal control sample. For accurate variant calling, the pipeline included a tool which simultaneously calls all four categories of variants by local reassembly and scales linearly with sequencing depth. The following standard filters were applied to the raw output as a first pass: variant frequency in tumor/variant frequency in normal >5x, number of mutant allele reads in tumor sample >5, variant frequency in tumor sample >1%. For unmatched normal sample variant calling, variants with a MAF of >1% as per the 1000 Genomes/ExAC databases were excluded, owing to the possibility of these being common somatic mutations in a population. Further, the variants were annotated with gene-based, region-based, as well as filter- and functionality-based fields compliant with the Human Genome Variation Society (HGVS) standards.
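The three first-pass filters above can be expressed as a single predicate. This is a minimal sketch under stated assumptions: the `VariantCall` record and its field names are hypothetical, and a variant with zero frequency in the normal sample is assumed to satisfy the tumor/normal ratio test.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    # Hypothetical record layout; field names are not from the disclosure.
    tumor_alt_reads: int
    tumor_depth: int
    normal_alt_reads: int
    normal_depth: int


def variant_frequency(alt_reads: int, depth: int) -> float:
    return alt_reads / depth if depth > 0 else 0.0


def passes_first_pass(call: VariantCall) -> bool:
    tumor_vf = variant_frequency(call.tumor_alt_reads, call.tumor_depth)
    normal_vf = variant_frequency(call.normal_alt_reads, call.normal_depth)
    # Tumor/normal frequency ratio > 5x; absent from normal counts as passing (assumption).
    ratio_ok = normal_vf == 0.0 or tumor_vf / normal_vf > 5.0
    return ratio_ok and call.tumor_alt_reads > 5 and tumor_vf > 0.01


calls = [
    VariantCall(40, 1000, 1, 1000),   # passes all three filters
    VariantCall(4, 1000, 0, 1000),    # too few mutant reads
    VariantCall(30, 1000, 10, 1000),  # tumor/normal ratio only 3x
]
print([passes_first_pass(c) for c in calls])
```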
Example 4.2. CNV Calling
[0028] CNVs were identified by comparing the sequence coverage of targeted regions in a tumor sample with that of a standard diploid normal sample; the depth of coverage was computed using standard tools and the Loess normalization procedure. Target regions in the lowest fifth percentile of coverage in >20% of all normal control samples were removed from analysis. The following criteria were used to determine the significance of whole-gene gain or loss events: fold change >2.0 (gain) and < -2.0 (loss). Matched normal samples were subjected to the same copy number variant calling algorithm, using the same set of normal control samples, with modified thresholds for detecting germline events: fold change >1.3x (single copy gain, ideally 1.5x) and < -1.8x (single copy loss, ideally -2.0x). The resulting germline calls are subtracted from the total set of copy number calls made on the tumor sample, to ensure that the final set of copy number variants from the tumor sample is somatic.
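The gain/loss thresholds and the germline subtraction step can be sketched as follows. The text does not define how a negative fold change is computed; this sketch assumes the common signed convention (ratio when the ratio is at least 1, negative reciprocal otherwise). All function names and the example coverage values are hypothetical.

```python
def signed_fold_change(sample_cov: float, reference_cov: float) -> float:
    """Assumed convention: ratio if >= 1, otherwise the negative reciprocal."""
    if sample_cov <= 0 or reference_cov <= 0:
        raise ValueError("coverage values must be positive")
    ratio = sample_cov / reference_cov
    return ratio if ratio >= 1 else -1 / ratio


def classify(fold_change: float, gain: float, loss: float) -> str:
    if fold_change > gain:
        return "gain"
    if fold_change < loss:
        return "loss"
    return "neutral"


def somatic_calls(tumor: dict[str, str], germline: dict[str, str]) -> dict[str, str]:
    """Drop tumor events that are also called as the same event in the matched normal."""
    return {
        gene: event
        for gene, event in tumor.items()
        if event != "neutral" and germline.get(gene) != event
    }


# Thresholds from the text: tumor > 2.0 / < -2.0, germline > 1.3 / < -1.8.
tumor = {
    "GENE_A": classify(signed_fold_change(300, 100), 2.0, -2.0),
    "GENE_B": classify(signed_fold_change(40, 100), 2.0, -2.0),
}
germline = {"GENE_B": classify(signed_fold_change(50, 100), 1.3, -1.8)}
print(somatic_calls(tumor, germline))
```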
Example 4.3. Structural Variant Calling
[0029] An integrative approach that combines long- and short-range paired-end and split-read analysis was used for the identification of SVs at single base resolution. This approach to SV calling, which covers deletions, tandem duplications, inversions and translocations, delivers high specificity, sensitivity and precision across a wide size spectrum. It increases specificity by eliminating germline structural aberrations as well as false-positive events, such as systematic sequencing/mapping artifacts. All candidate structural aberrations were filtered, annotated using in-house tools, and manually reviewed using the Integrative Genomics Viewer (IGV). Similar to SNVs and indels, known rearrangements with strong literature support were subjected to less stringent filtering criteria (ie, three paired or split reads, mapping quality >5, length >500 bp) compared to novel rearrangements (ie, five paired or split reads, mapping quality >20, length >500 bp).
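The two sets of rearrangement criteria can be captured in one predicate. This is a sketch only: the function name is hypothetical, and "three" and "five" paired or split reads are assumed to mean "at least" that many.

```python
def passes_sv_filter(
    supporting_reads: int,
    mapping_quality: int,
    length_bp: int,
    known_rearrangement: bool,
) -> bool:
    """Apply the known vs. novel rearrangement thresholds stated in the text."""
    # Known events: >= 3 reads, MAPQ > 5. Novel events: >= 5 reads, MAPQ > 20.
    min_reads, min_mapq = (3, 5) if known_rearrangement else (5, 20)
    return (
        supporting_reads >= min_reads
        and mapping_quality > min_mapq
        and length_bp > 500
    )


# The same evidence passes as a known rearrangement but not as a novel one.
print(passes_sv_filter(3, 10, 1200, known_rearrangement=True))
print(passes_sv_filter(3, 10, 1200, known_rearrangement=False))
```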
Example 5. Prioritization and Interpretation
Example 5.1. Hotspots with high-confidence filtering
[0030] Annotated SNV and indel calls were subjected to a series of filtering steps, listed below, to ensure that only high-confidence calls were retained for the final step of manual review. The filters include: i) evidence in literature for being an oncogenic or recurrent hotspot mutation; ii) occurrence of the variant in a previously run pool of normal controls (ie, reproducible assay artifacts); iii) technical characteristics of the variant call: coverage depth, number of mutant reads supporting the variant, and variant frequency; and iv) annotation-based filters: location (eg, exonic versus non-exonic) and effect (eg, nonsynonymous versus silent).
[0031] Prior knowledge from the literature was incorporated in the analysis through a two-tiered variant filtering scheme: 1. Variants corresponding to known hotspot mutations with extensive supporting evidence in the medical literature which included i) hotspot in COSMIC; ii) mutation hotspots in The Cancer Genome Atlas (TCGA); and iii) indels in selected exons of established oncogenes (ie, KIT exons 9 and 11, ERBB2 exon 20, EGFR exons 18, 19 and 20).
Variants listed in COSMIC were considered hotspot point mutations if they presented with >5 mentions and occurred in exons of the 500 genes targeted in this assay. These variants were subjected to lower requirements on coverage, number of mutant reads, and variant frequency to be considered as high-confidence calls where machine learning was applied to prioritize variants of clinical importance.
Briefly, variant prioritisation is a complicated and ever-changing field. By establishing a common language and standard process, the implementation of the ACMG guidelines in 2015 aided in driving consistency and transparency. This standard process, and its reliance on structured data, has ultimately paved the way for the development of Machine learning models (ML). Variant prioritisation denotes the conclusion that the available evidence is sufficient to prove the role of the variant in disease development. The development of the ML method aids in the identification and prioritisation of pathogenic variants relevant to disease/phenotype. For example, a diseased patient's whole exome sequencing contains approximately 50,000 variants, 50 of which are pathogenic variants relevant to disease. Prioritizing these 50 pathogenic variants relevant to disease can be done with the help of ML methods by using different attributes such as variant feature (secondary structure, 3D structure feature, conservation score, amino acid).
Example 5.2. Filtering Based on Technical Characteristics of the Variant Call
[0032] To reject false-positive calls effectively, an empirical analysis was performed by comparing replicates of normal samples against each other based on thresholds on coverage depth, number of mutant reads and variant frequency. First-tier variants (i.e., well-characterized hotspot mutations) were considered in a separate class from novel second-tier variants. The filtering criteria for first-tier variants were coverage depth >20x, mutant reads >8, and variant frequency >2%, whereas those for second-tier variants were coverage depth >20x, mutant reads >10, and variant frequency >5%.
Example 5.3. Position and Impact of Annotation based Filters
[0033] Because the objective of the iTREAT assay was to identify clinically relevant mutations, variants with definite effects on protein function and transcription were prioritized for manual review. Accordingly, non-exonic variants passing all previous criteria (intronic, untranslated region, intergenic, upstream) were redirected to a separate output file, with the exception of TERT promoter variants that created new binding sites for ETS transcription factors. Synonymous (i.e., silent) exonic variants were handled similarly. Only calls that affected the protein primary sequence (i.e., nonsynonymous: missense and nonsense, splice site, frameshift indel, in-frame indel) were retained and sent to the final output file for manual review using the IGV. Because functional mutations may exist among the unreported synonymous and non-exonic variants, these variants were stored in a database for retrospective research.
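The tier-specific technical thresholds of Example 5.2 and the annotation-based routing described above can be combined in a short Python sketch. All names (`VariantCall`, `passes_technical_filter`, `route`, the effect labels and the output labels) are hypothetical, and the variant frequency is assumed to be stored as a fraction between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCall:
    """Hypothetical annotated SNV/indel call."""
    depth: int
    mutant_reads: int
    vaf: float                 # variant frequency as a fraction (0..1), an assumption
    first_tier: bool           # True for well-characterized hotspot mutations
    effect: str                # e.g. "missense", "synonymous", "intronic"
    tert_promoter_ets: bool = False  # TERT promoter variant creating an ETS site


# (coverage depth floor, mutant read floor, variant frequency floor)
TIER_THRESHOLDS = {
    True: (20, 8, 0.02),    # first tier
    False: (20, 10, 0.05),  # second tier
}

# Effects that change the protein primary sequence, per paragraph [0033].
PROTEIN_ALTERING = {
    "missense", "nonsense", "splice_site", "frameshift_indel", "inframe_indel",
}


def passes_technical_filter(call: VariantCall) -> bool:
    """Apply strict 'greater than' thresholds, as written in the text."""
    depth_floor, reads_floor, vaf_floor = TIER_THRESHOLDS[call.first_tier]
    return (
        call.depth > depth_floor
        and call.mutant_reads > reads_floor
        and call.vaf > vaf_floor
    )


def route(call: VariantCall) -> str:
    """Decide where a call goes after technical and annotation filters."""
    if not passes_technical_filter(call):
        return "rejected"
    if call.effect in PROTEIN_ALTERING or call.tert_promoter_ets:
        return "manual_review"
    # Synonymous and non-exonic calls are kept apart for retrospective research.
    return "separate_output"
```

In this sketch a synonymous or intronic call that passes the technical filter is not discarded; it is routed to a separate output, mirroring the retrospective research database described in the text.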
Example 5.3. Microsatellite Instability
[0034] Genomic sites with evidence of length instability on sequencing were first identified. To calculate MSI status, more than 100 intronic homopolymer repeat loci with adequate coverage on the iTREAT bait set were analyzed for length variability and compiled into an overall MSI score. Each chosen locus has an hg19 reference repeat length of 10-20 bp: long enough to produce a high rate of DNA polymerase slippage, yet short enough to fit within the 50-bp read length of NGS and so facilitate alignment to the human reference genome. The baseline reference value was first calculated as the mean number of unique repeat lengths at each mononucleotide tract across a population of MS-Stable samples. For each locus, a minimum read depth of 30 was required for inclusion in the baseline calculation. The number of alleles detectable at a site was expected to be proportional to its read depth, with a larger number of reads giving greater ability to discriminate low-prevalence alleles. To normalize the number of alleles observed in a sample with respect to its read depth, the number of reads from alleles of each observed length relative to the reference genome (i.e., -2, -1, +1, and +2) was expressed as a percentage of the number of reads counted for the most frequently occurring allele. Alleles with <5% of the reads counted for the most frequently observed allele were excluded. Tumor instability could potentially be reflected by low-prevalence alleles present at <5% abundance; however, this cut-off allows comparison among samples with disparate amounts of sequence coverage. For each locus, the mean and SD of the number of alleles were calculated. In general, most samples demonstrated sufficient read coverage to contribute to the calculation of summary statistics at all loci. However, rare loci for which fewer than 3 samples could be used to generate summary statistics were excluded from the final panel.
[0035] The results were then compared against the baseline reference values at each locus to assess the instability of microsatellite loci. Data were processed in the same fashion as when establishing the MSI-negative sample baselines. For each sample, microsatellite loci with a read depth of <30 were reported as missing information. After normalization, the total number of repeats of different lengths with a read count of 5% or more of that for the most frequently observed allele was tallied. This tally was compared against the baseline for the same microsatellite locus. If the tally of alleles exceeded the MSI-stable reference value [mean number of alleles + (3 x SD)], the locus was scored as unstable, and otherwise as stable. This metric provides a statistical framework for evaluating the stability or instability of any particular marker. Finally, the fraction of unstable loci out of the total number of loci analysed was calculated for each test sample. Using unsupervised categorization of specimens, the computed MSI score was then assigned to one of the following classes: MSI-High (MSI-H), MSI-ambiguous, or microsatellite stable (MSS).
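The baseline construction and per-locus scoring described in paragraphs [0034] and [0035] can be sketched in Python as follows. This is an illustration under stated assumptions, not the assay's implementation: the function names are hypothetical, the read depth at a locus is taken to be the sum of reads over all observed repeat lengths, and the sample standard deviation (`statistics.stdev`) is used because the text does not say which SD estimator applies.

```python
from statistics import mean, stdev

MIN_DEPTH = 30              # loci below this depth are treated as missing
MIN_ALLELE_FRACTION = 0.05  # relative to the most frequent allele
MIN_BASELINE_SAMPLES = 3    # loci with fewer usable MSS samples are dropped


def count_alleles(length_counts):
    """Count repeat lengths with >= 5% of the reads of the top allele.

    length_counts maps repeat length -> read count for one locus.
    Returns None when depth is below MIN_DEPTH (missing information).
    """
    depth = sum(length_counts.values())
    if depth < MIN_DEPTH:
        return None
    top = max(length_counts.values())
    return sum(1 for c in length_counts.values() if c >= MIN_ALLELE_FRACTION * top)


def build_baseline(mss_samples):
    """Return locus -> (mean, SD) of allele counts across MSS samples."""
    per_locus = {}
    for sample in mss_samples:
        for locus, counts in sample.items():
            n = count_alleles(counts)
            if n is not None:
                per_locus.setdefault(locus, []).append(n)
    return {
        locus: (mean(values), stdev(values))
        for locus, values in per_locus.items()
        if len(values) >= MIN_BASELINE_SAMPLES
    }


def msi_score(sample, baseline):
    """Fraction of analysed loci whose allele tally exceeds mean + 3 * SD."""
    unstable = 0
    analysed = 0
    for locus, (mu, sd) in baseline.items():
        n = count_alleles(sample.get(locus, {}))
        if n is None:
            continue  # insufficient depth: locus not analysed
        analysed += 1
        if n > mu + 3 * sd:
            unstable += 1
    return unstable / analysed if analysed else None
```

The resulting score is only the fraction of unstable loci. Assigning MSI-H, MSI-ambiguous or MSS to that fraction is a separate categorization step for which the text gives no cut-offs, so it is not sketched here.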
Example 5.4. Tumor Mutation Burden (TMB)
[0036] TMB is defined as the number of true somatic, coding, base substitution, and indel mutations per megabase of genome. All base substitutions and indels in the coding region of targeted genes, including synonymous alterations, were initially considered before filtering as described below. Though synonymous mutations are not likely to be directly involved in immunogenicity, their presence could be a signal of mutational processes that may have resulted in nonsynonymous mutations and neoantigens elsewhere in the genome.
• Non-coding alterations were not included.
• Hotspot mutations were not included so as to eliminate bias towards cancer-specific hotspot genes.
• Somatic alterations in COSMIC and truncations in tumor suppressor genes were not included.
• Alterations predicted to be germline by the somatic-germline-zygosity algorithm were not included.
• Germline alterations in dbSNP were not included.
• Germline alterations occurring with two or more counts in the ExAC database were not included.
TMB per megabase was calculated by dividing the total number of true somatic mutations by the size of the coding region of the targeted territory.
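The exclusion rules and the per-megabase calculation described above can be illustrated with a minimal Python sketch. The `Alteration` record and its boolean flags are hypothetical stand-ins for upstream annotations (COSMIC, dbSNP, ExAC, the somatic-germline-zygosity call), which are assumed to have been computed already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alteration:
    """Hypothetical annotated base substitution or indel."""
    coding: bool
    hotspot: bool = False
    in_cosmic: bool = False
    tsg_truncation: bool = False      # truncation in a tumor suppressor gene
    predicted_germline: bool = False  # somatic-germline-zygosity prediction
    in_dbsnp: bool = False
    exac_count: int = 0               # occurrences in the ExAC database


def counts_toward_tmb(a: Alteration) -> bool:
    """Apply the exclusion list; synonymous coding alterations are kept."""
    return (
        a.coding
        and not a.hotspot
        and not a.in_cosmic
        and not a.tsg_truncation
        and not a.predicted_germline
        and not a.in_dbsnp
        and a.exac_count < 2
    )


def tumor_mutation_burden(alterations, coding_region_bp: int) -> float:
    """Mutations per megabase of targeted coding territory."""
    if coding_region_bp <= 0:
        raise ValueError("coding_region_bp must be positive")
    kept = sum(1 for a in alterations if counts_toward_tmb(a))
    return kept / (coding_region_bp / 1_000_000)
```

For example, 12 retained mutations over a 1.5 Mb coding territory would give a TMB of 8 mutations per megabase; the territory size here is purely illustrative.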
Representation of case concordance as analysed and compared in the iTREAT assay and routine diagnostics
ADVANTAGES OF THE INVENTION
• The claimed systems and methods are affordable and cover a plurality of genomic alterations for predicting cancer or its prognosis.
• The systems and methods can be used on both tissue as well as liquid biopsy.
• The method can be used in predicting response to various anti-cancer therapies.
• The method helps in generating user active actionable interactive reports based on one or more gene variants detected in the patient.
• The method utilizes user-friendly, User Interface (UI)-based software to develop one or more algorithms for detecting gene variants in an individual.
• Pan-cancer comprehensive multigene panel
• The only multigene panel that captures ADME gene markers for predicting dosage, chemotherapy drug response and toxicity.
• Fast turn-around time (within 7-14 days of receipt of sample)
• Affordable and cost effective
• Provides an electronic, active, actionable report to the oncologist so that all relevant information is available at the fingertips to make the right decision
• The design includes specific genes/gene regions important for ADME (Absorption, Distribution, Metabolism & Excretion), which is very important for dose selection and toxicity determination.

Claims

We Claim:
1. A system for identifying genomic abnormalities in a gene panel associated with cancer in one or more test samples, the system comprising: a memory configured to receive genomic data associated with the one or more test samples and store the genomic data; a processor communicatively coupled to the memory, wherein the processor is configured to implement an artificial intelligence platform to: determine a presence of at least one genomic alteration in the genomic data; upon detecting the presence of the at least one genomic alteration, identifying one or more gene variants related to the at least one genomic alteration; determining values associated with technical characteristic of the one or more gene variants in the gene panel, wherein the technical characteristic include at least one of a coverage depth of the gene variant, number of mutant reads supporting the gene variant, location of the gene variant, and a frequency of the gene variant; optimizing uniform sequencing of genes and gene regions from the one or more test samples by selecting baits for selection of target genes or gene regions to be sequenced; comparing the values to threshold values to determine prognosis of cancer; and identifying genomic abnormalities upon exceedance of the values associated with technical characteristic corresponding to the threshold values; and a display unit communicatively coupled to the processor to display data associated with the prognosis of cancer.
2. The system as claimed in claim 1, wherein the at least one genomic alteration is mutation, insertion, deletion, and substitution of one or more nucleotides.
3. The system as claimed in claim 2, wherein the at least one genomic alteration comprises single nucleotide variants (SNVs), indels, rearrangements, tumor mutation burden (TMB), microsatellite instability (MSI) and copy number variations (CNV).
4. The system as claimed in claim 1, wherein the processor is configured to determine the presence of the at least one genomic alteration by identifying the genes to obtain the gene panel.
5. The system as claimed in claim 1, wherein the system further comprises: a data repository; and a data processing module communicatively coupled to the data repository, wherein the data processing module is configured to: train a model associated with the artificial intelligence platform using at least one training dataset containing the genomic data; and store the trained model having the at least one training dataset in the data repository.
6. The system as claimed in claim 1, wherein said test sample is selected from, but not limited to blood, plasma, semen, and tissue biopsy.
7. A method for identifying genomic abnormalities associated with cancer in the gene panel, the method comprising: receiving genomic data associated with one or more tumor samples and store the genomic data; determining a presence of at least one genomic alteration in the genomic data; identifying one or more gene variants related to the at least one genomic alteration; determining values associated with technical characteristic of the one or more gene variants, wherein the technical characteristic include at least one of a coverage depth of the gene variant, number of mutant reads supporting the gene variant, location of the gene variant, and a frequency of the gene variant; optimizing uniform sequencing of genes and gene regions from the one or more test samples by selecting baits for selection of target genes or gene regions to be sequenced; comparing the values to threshold values to determine prognosis of cancer; identifying genomic abnormalities associated with cancer upon exceedance of the values associated with technical characteristic corresponding to the threshold values; and displaying data associated with the prognosis of cancer.
8. The method as claimed in claim 7, wherein determining the presence of the at least one genomic alteration is performed by: i. identifying and sequencing the genes to obtain the gene panel; ii. acquiring deep reads for genes or gene regions with a next generation sequencing method; and iii. mapping said reads by an alignment method to assign a nucleotide base to identify the gene alteration.
9. The method as claimed in claim 7, wherein the gene panel is obtained by: i. identifying the genes, specific for cancer; ii. designing custom DNA probes for targeted sequencing of all exons and selected introns; iii. validating the genes using KOL; iv. obtaining the gene panel comprising cancer hallmark factors involved in cell cycle, cell adhesion and cell proliferation, local tissue invasion and apoptosis, metabolism, angiogenesis and immunosurveillance for an unambiguous prediction.
10. The method of obtaining the gene panel as claimed in claim 9, wherein said gene panel comprises genomic alterations selected from, but not limited to, Single Nucleotide Variations (SNVs), Indels, Copy Number Variations (CNVs) and Structural Variants (SVs).
11. The method as claimed in claim 9, wherein the gene panel contains probes capturing a 100-bp region in the TERT promoter.
12. The method as claimed in claim 9, wherein the gene panel contains the probes having tile positions of >1000 common SNVs.
PCT/IN2021/050877 2020-09-08 2021-09-08 A system and a method for identifying genomic abnormalities associated with cancer and implications thereof WO2022054086A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202021038658 2020-09-08

Publications (1)

Publication Number Publication Date
WO2022054086A1 true WO2022054086A1 (en) 2022-03-17

Family

ID=80632174

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2021/050877 WO2022054086A1 (en) 2020-09-08 2021-09-08 A system and a method for identifying genomic abnormalities associated with cancer and implications thereof

Country Status (1)

Country Link
WO (1) WO2022054086A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114882943A (en) * 2022-03-29 2022-08-09 深圳裕康医学检验实验室 Method and device for analyzing somatic cell variation

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014014497A1 (en) * 2012-07-20 2014-01-23 Verinata Health, Inc. Detecting and classifying copy number variation in a cancer genome
WO2015175705A1 (en) * 2014-05-13 2015-11-19 Board Of Regents, The University Of Texas System Gene mutations and copy number alterations of egfr, kras and met
WO2017151524A1 (en) * 2016-02-29 2017-09-08 Foundation Medicine, Inc. Methods and systems for evaluating tumor mutational burden
WO2017196728A2 (en) * 2016-05-09 2017-11-16 Human Longevity, Inc. Methods of determining genomic health risk
WO2018081130A1 (en) * 2016-10-24 2018-05-03 The Chinese University Of Hong Kong Methods and systems for tumor detection
WO2018236852A1 (en) * 2017-06-19 2018-12-27 Jungla Inc. Interpretation of genetic and genomic variants via an integrated computational and experimental deep mutational learning framework
WO2019109086A1 (en) * 2017-12-01 2019-06-06 Illumina, Inc. Methods and systems for determining somatic mutation clonality
WO2019125864A1 (en) * 2017-12-18 2019-06-27 Personal Genome Diagnostics Inc. Machine learning system and method for somatic mutation discovery

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ABEL HJ ET AL.: "Detection of structural DNA variation from next-generation sequencing data: a review of informatics approaches", CANCER GENETICS, vol. 206, no. 12, 20 November 2013 (2013-11-20), pages 432 - 40, XP055324335, DOI: 10.1016/j.cancergen.2013.11.002 *
KATO MAMORU, NAKAMURA HIROMI, NAGAI MOMOKO, KUBO TAKASHI, ELZAWAHRY ASMAA, TOTOKI YASUSHI, TANABE YUKO, FURUKAWA EISAKU, MIYAMOTO : "A computational tool to detect DNA alterations tailored to formalin-fixed paraffin-embedded samples in cancer clinical sequencing", GENOME MEDICINE, vol. 10, no. 1, 1 December 2018 (2018-12-01), XP055913542, DOI: 10.1186/s13073-018-0547-0 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21866244

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21866244

Country of ref document: EP

Kind code of ref document: A1