WO2019140518A1 - Method of use of fat3 in scoliosis - Google Patents

Method of use of fat3 in scoliosis

Info

Publication number
WO2019140518A1
Authority
WO
WIPO (PCT)
Prior art keywords
variant
risk
fat3
subject
gene
Prior art date
Application number
PCT/CA2019/050055
Other languages
French (fr)
Inventor
Mark E. Samuels
Dina Tarek NADA
Alain Moreau
Original Assignee
Chu Sainte-Justine
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chu Sainte-Justine
Publication of WO2019140518A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K: PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00: Medicinal preparations containing peptides
    • A61K38/16: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • A61K38/17: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from animals; from humans
    • A61K38/177: Receptors; Cell surface antigens; Cell surface determinants
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/156: Polymorphic or mutational markers
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection

Definitions

  • the present disclosure relates to Idiopathic Scoliosis (IS). More specifically, the present disclosure is concerned with novel markers for the risk of developing IS, including the risk of scoliosis progression.
  • Idiopathic Scoliosis is a common complex disorder of the spine. It is a three-dimensional deformity of the skeleton characterized by a lateral curvature of ≥10° on a standing radiograph (Cobb method), combined with vertebral rotation. It is the most common form of spinal disorder.
  • Genome-wide association studies have identified several candidate genes for IS susceptibility, including CHL1, LBX1, GPR126, BNC2, and PAX1 [12-16].
  • SNPs: single nucleotide polymorphisms
  • the present disclosure concerns a method of determining whether a subject is at risk of developing Idiopathic scoliosis (IS) comprising: (i) in a biological sample from the subject, detecting the presence or absence of at least one risk variant in at least one allele of the FAT3 gene or a marker in linkage disequilibrium therewith, wherein the detection of at least one risk variant is indicative that the subject is at risk of developing IS.
  • IS: Idiopathic scoliosis
  • the risk of developing IS is a risk of developing a severe scoliosis. In embodiments, the risk of developing IS is a risk of scoliosis progression. In embodiments, the risk of developing IS is a risk of developing a severe scoliosis progression.
  • the method comprises determining the genotype of the subject (i.e., the presence or absence of a given variant in both alleles of the FAT3 gene of the subject) for the at least one risk variant.
  • the present disclosure concerns a method of genotyping a subject having IS or at risk of developing IS comprising determining the genotype of the subject (homozygous or heterozygous or wild type/ancestral allele) for at least one variant of the FAT3 gene.
  • the present disclosure concerns a method of determining the risk of future parents of having a child suffering from IS (or at risk of developing IS) comprising (i) determining the presence or absence of at least two risk variants in at least one allele of the FAT3 gene in a first biological sample from the first future parent; (ii) determining the presence or absence of the at least two risk variants in at least one allele of the FAT3 gene in a second biological sample from the second future parent; and (iii) determining the risk of the future parents of having a child suffering from IS based on the presence or absence of the at least two risk variants in the first and second biological samples.
  • the at least one IS risk variant is within the coding sequence (i.e., within an exon) of the FAT3 gene. In embodiments, the at least one IS risk variant introduces a mutation at an amino acid which is conserved between human, mouse and rat FAT3 protein sequences. In embodiments, the at least one IS risk variant introduces a mutation at an amino acid which is conserved between species set forth in FIG. 1C. In embodiments, the at least one IS risk variant introduces a mutation at an amino acid located within a cadherin repeat of the FAT3 protein. In embodiments, the at least one IS risk variant introduces a mutation in an amino acid located in the C-terminal region of the FAT3 protein.
  • the mutation introduced by the at least one IS risk variant is a non-conservative substitution. In embodiments, the mutation introduced by the at least one IS risk variant is a silent mutation. In embodiments, the variant is located at a position on the FAT3 gene which is set forth in Table 4. In embodiments, the at least one IS risk variant comprises a variant which introduces a mutation at an amino acid position listed in Table 4 (FIG. 1A and FIGs. 2A-J). In embodiments, the at least one IS risk variant is selected from the variants listed in Table 4. In embodiments, the at least one IS risk variant comprises a variant which introduces a mutation at amino acid L517 and/or L4544. In embodiments, the at least one IS risk variant comprises a variant which introduces the mutation L517S and/or L4544F in the FAT3 protein.
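The substitution labels used above (L517S, L4544F) follow the usual one-letter pattern of ancestral residue, position, variant residue. A minimal sketch of how such a label could be parsed and range-checked is given below; the function and class names are hypothetical, and the protein length of 4557 residues is the figure quoted elsewhere in this document for the NCBI-annotated FAT3 protein.

```python
import re
from typing import NamedTuple

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

# Length of the FAT3 protein as quoted in this document (assumed here as default).
FAT3_LENGTH = 4557


class Substitution(NamedTuple):
    ancestral: str  # residue in the reference (ancestral) protein
    position: int   # 1-based amino acid position
    variant: str    # residue introduced by the variant


def parse_substitution(label: str, protein_length: int = FAT3_LENGTH) -> Substitution:
    """Parse a label such as 'L517S' into (ancestral, position, variant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label.strip())
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    ancestral, position, variant = match.group(1), int(match.group(2)), match.group(3)
    if ancestral not in AMINO_ACIDS or variant not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code in {label!r}")
    if not 1 <= position <= protein_length:
        raise ValueError(f"position {position} is outside 1..{protein_length}")
    return Substitution(ancestral, position, variant)


for label in ("L517S", "L4544F"):
    print(label, parse_substitution(label))
```

The sketch only validates the label format; it does not check that the ancestral residue actually occurs at that position in SEQ ID NO: 2.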
  • the method comprises determining the haplotype of the subject for at least two variants shown in
  • the method comprises determining the presence or absence of at least two risk variants. In embodiments, the method comprises determining the presence or absence of at least three risk variants. In embodiments, the method comprises determining the presence or absence of at least four risk variants. In embodiments, the method comprises determining the presence or absence of at least five risk variants. In embodiments, the method comprises determining the presence or absence of at least six risk variants. In embodiments, the method comprises determining the presence or absence of all risk variants identified herein.
  • the biological sample is a blood sample. In embodiments, the biological sample is a cell sample. In embodiments, the biological sample is a protein sample. In embodiments, the biological sample is a nucleic acid sample.
  • the above methods comprise the use of an oligonucleotide probe or primer.
  • the oligonucleotide probe or primer is specific for the detection of a variant sequence (allele) set forth in Table 4 or FIGs. 2A-J.
  • the oligonucleotide probe or primer is specific for the detection of an ancestral sequence (allele) set forth in Table 4 or FIGs. 2A-J.
  • the oligonucleotide probe or primer specific for the ancestral (wild-type or native) allele comprises at least 12 consecutive nucleotides of SEQ ID NO: 1 or the complement thereof.
  • the subject is a male. In embodiments, the subject is a female. In embodiments, the subject is a pediatric subject. In embodiments, the pediatric subject is between 6 and 18 years old. In embodiments, the pediatric subject is between 10 and 15 years old. In embodiments, the subject is a subject at risk of developing IS.
  • the subject has at least one family member diagnosed with IS (first, second or third degree relative). In embodiments, the subject is diagnosed with IS. In embodiments, the subject is a subject diagnosed with IS and belonging to the FG1 endophenotype. In embodiments, the subject is a subject diagnosed with IS and belonging to the FG2 endophenotype. In embodiments, the subject is a subject diagnosed with IS and belonging to the FG3 endophenotype.
  • the present disclosure relates to a method of treating or preventing IS (e.g., AIS) in a subject comprising: (i) identifying a subject at risk of developing Idiopathic scoliosis (IS) using the method disclosed herein; and (ii) providing a suitable therapy to the subject so as to treat or prevent IS.
  • the therapy comprises wearing a brace.
  • the present disclosure also concerns a method of treating or preventing IS (e.g., AIS) comprising increasing the level of FAT3 protein (native FAT3 protein) in the subject.
  • the method comprises administering an exogenous or recombinant FAT3 polypeptide, or a cell expressing a FAT3 polypeptide, to the subject.
  • the method comprises increasing the expression of the endogenous FAT3 polypeptide or correcting a defective FAT3 gene, e.g. using a genome-editing technique such as the CRISPR/Cas9 system.
  • the present disclosure also concerns the use of an exogenous or recombinant FAT3 polypeptide for treating or preventing IS (e.g., AIS) in a subject, or for manufacture of a medicament for treating or preventing Idiopathic scoliosis in a subject.
  • the present disclosure also concerns an exogenous or recombinant FAT3 polypeptide for treating or preventing IS (e.g., AIS) in a subject.
  • the present disclosure concerns a composition or kit for use in methods disclosed herein (for example, a kit for (i) detecting a variant in at least one allele of the FAT3 gene in a biological sample; (ii) determining whether a subject is at risk of developing IS; (iii) determining the risk of future parents of having a child suffering from IS; (iv) genotyping a subject for at least one variant in the FAT3 gene, etc.).
  • the kit may comprise for example one or more oligonucleotide probes or primers and/or one or more antibodies specific for detection of at least one FAT3 gene variant.
  • the composition or kit further comprises a biological sample (e.g., polynucleotide or protein sample) from the subject.
  • the present disclosure concerns a DNA chip comprising at least one oligonucleotide for detecting the presence or absence of at least one FAT3 gene variant set forth in Table 4 and a substrate on which the oligonucleotide is immobilized.
  • the variant is an IS risk variant set forth in Table 4.
  • the present disclosure provides oligonucleotide probes or primers for use in the above described methods, compositions, kits, DNA chips, etc.
  • the oligonucleotide is for the specific detection of a variant of the present disclosure and comprises or consists of a nucleotide sequence having a variant nucleotide at a position corresponding to that defined in Table 4.
  • the variant is a risk variant defined in Table 4.
  • the oligonucleotide primer or probe hybridizes to a reference (ancestral) or a variant polynucleotide sequence set forth in Table 4, or FIGs. 2A-J or to its complementary sequence.
  • the oligonucleotide primer or probe further comprises a label.
  • the oligonucleotide primer or probe comprises or consists of at least 10 nucleotides of a polynucleotide sequence set forth in any one of SEQ ID NOs: 7-58 or the complement thereof and includes the polymorphic nucleotide from the ancestral or variant allele set forth in Table 4 (or its complement).
  • the oligonucleotide primer or probe comprises or consists of at least 10 nucleotides of a polynucleotide sequence set forth in any one of SEQ ID NOs: 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56 and 58 or the complement thereof and includes the nucleotide from the variant allele set forth in Table 4 (or its complement).
  • the oligonucleotide primer or probe comprises or consists of at least 10 nucleotides of a polynucleotide sequence set forth in any one of SEQ ID NOs: 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55 and 57 or the complement thereof and includes the nucleotide from the ancestral allele set forth in Table 4 (or its complement).
  • the oligonucleotide primer or probe consists of 10 to 100 nucleotides, preferably 10 to 60, 50 or 40 nucleotides. In embodiments, the oligonucleotide primer or probe consists of at least 12 nucleotides.
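The probe requirements above (a stretch of 10 to 100 nucleotides that includes the polymorphic nucleotide of either the ancestral or the variant allele, or its complement) can be illustrated with a short sketch. The reference sequence, position and alleles below are made-up toy values, not sequences from Table 4 or the SEQ ID NOs, and the helper names are hypothetical.

```python
# Complement table for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def allele_probe(reference: str, position: int, allele: str, length: int = 12) -> str:
    """Return a probe of `length` nt covering `position` (1-based) and carrying `allele`."""
    if not 10 <= length <= 100:
        raise ValueError("probe length must be between 10 and 100 nucleotides")
    if length > len(reference):
        raise ValueError("reference is shorter than the requested probe")
    if not 1 <= position <= len(reference):
        raise ValueError("position is outside the reference sequence")
    index = position - 1
    # Centre the window on the polymorphic site, then clamp it to the sequence ends.
    start = min(max(index - length // 2, 0), len(reference) - length)
    window = reference[start:start + length]
    offset = index - start
    return window[:offset] + allele + window[offset + 1:]


# Toy 20-nt reference with a hypothetical G>A variant at position 10.
toy_reference = "ATGGCCTTAGACCGTTAGCA"
ancestral_probe = allele_probe(toy_reference, 10, "G")
variant_probe = allele_probe(toy_reference, 10, "A")
print(ancestral_probe, variant_probe, reverse_complement(variant_probe))
```

Centring the polymorphic site in the window is a design choice of this sketch; the text only requires that the probe include it.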
  • the present disclosure relates to the use of methods, compositions, kits, oligonucleotide primers or probes, and DNA chips of the present disclosure for (i) detecting a variant in at least one allele of the FAT3 gene in a biological sample; (ii) determining whether a subject is at risk of developing IS; (iii) determining the risk of future parents of having a child likely to suffer from IS; (iv) genotyping a subject for at least one variant in the FAT3 gene, etc.
  • FIGs. 1A to C show variants identified in the FAT3 gene in IS subjects.
  • FIG. 1A shows the organization of the FAT3 protein, which, as annotated by NCBI, is 4557 amino acids long and includes multiple functional homology domains. The positions of the 26 rare variants identified in our study among the IS cases are labelled from 1 to 26 and are indicated by vertical arrows above the protein schema. The locations of two heterozygous mutations present in a multiplex IS family are indicated by the boxes (i.e., #1 and #26).
  • FIG. 1B shows a simplified pedigree and segregation of the FAT3 mutations in one family (ID1581), which consisted of three affected sisters and two unaffected parents.
  • FIG. 1C shows a sequence alignment across different species showing that both mutations (L517S and L4544F) affect invariantly conserved amino acids in FAT3 orthologues.
  • FIGs. 2A-J show the amino acid sequence of the human FAT3 protein (SEQ ID NO: 2, NP_001008781.2). Positions of nucleotide and amino acid variants identified in IS subjects and listed in Table 4 are underlined and in bold. Novel polymorphic sites are in italic. The nucleotides and amino acids shown in the figure are those of the ancestral (wild-type) sequences, SEQ ID NO: 1 (nucleic acid, NM_001008781.2) and SEQ ID NO: 2 (protein, NP_001008781.2).
  • FIGs. 2K-L show the nucleotide sequence of the human FAT3 transcript (SEQ ID NO: 1, NM_001008781.2).
  • FAT3 is expressed in primary osteoblasts, but no significant difference was found in the level of RNA expression in cells from scoliotic patients versus controls. Thus, FAT3 was established as a new genetic factor in the etiology of idiopathic scoliosis.
  • the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, un-recited elements or method steps and are used interchangeably with the phrases “including but not limited to” and “comprising but not limited to”.
  • each intervening number there between with the same degree of precision is explicitly contemplated.
  • the numbers 18, 19 and 20 are explicitly contemplated.
  • the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, and 7.0 are explicitly contemplated.
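The rule that every intervening number with the same degree of precision is contemplated can be made concrete with a small sketch. It assumes both endpoints are written with the same number of decimal places; the function name is hypothetical, and `decimal.Decimal` is used so that values such as 6.1 are stepped exactly rather than with floating-point error.

```python
from decimal import Decimal


def contemplated_values(low: str, high: str) -> list[str]:
    """List every value from low to high at the precision the endpoints are written in."""
    low_d, high_d = Decimal(low), Decimal(high)
    exponent = low_d.as_tuple().exponent
    if exponent != high_d.as_tuple().exponent:
        raise ValueError("endpoints must be written with the same precision")
    # One unit in the last written decimal place, e.g. 1 for '18' and 0.1 for '6.0'.
    step = Decimal(1).scaleb(exponent)
    values = []
    current = low_d
    while current <= high_d:
        values.append(str(current))
        current += step
    return values


print(contemplated_values("18", "20"))
print(contemplated_values("6.0", "7.0"))
```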
  • MOLECULAR CLONING: A LABORATORY MANUAL, Second edition, Cold Spring Harbor Laboratory Press, 1989 and Third edition, 2001; Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, 1987 and periodic updates; the series METHODS IN ENZYMOLOGY, Academic Press, San Diego; Wolffe, CHROMATIN STRUCTURE AND FUNCTION, Third edition, Academic Press, San Diego, 1998; METHODS IN ENZYMOLOGY, Vol. 304, "Chromatin" (P. M. Wassarman and A. P.
  • the FAT3 gene refers to the gene encoding the FAT Atypical Cadherin 3 protein (FAT3 protein) (GCID: GC11P092314; HGNC: 23112; Entrez Gene: 120114; Ensembl: ENSG00000165323; OMIM: 612483; UniProtKB: Q8TDW7; chromosomal location: 11q14.3; 1; RefSeqGene: NG_052813.1)
  • FAT3 protein: FAT Atypical Cadherin 3 protein
  • the term “Idiopathic scoliosis” or “IS” refers to the common complex disorder of the spine. It is a three-dimensional deformity of the skeleton characterized by a lateral curvature of ≥10° on a standing radiograph (Cobb method), combined with vertebral rotation. It is the most common form of spinal disorder. It mostly occurs at the age of adolescence and affects 1-4% [1] of the global pediatric population with higher prevalence in females who are generally more severely affected than males.
  • IS includes Infantile (age of onset <3 years old), Juvenile (age of onset between 3 and 9 years old) and Adolescent (age of onset between 10 and 15 years old) idiopathic scoliosis.
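The age-of-onset categories just listed map directly onto a lookup. The sketch below is illustrative only: the function name is hypothetical, ages are taken as completed years, and the infantile boundary is read as "under 3 years", which is an assumption because the comparison sign is not legible in the scraped text.

```python
from typing import Optional


def onset_category(age_of_onset_years: int) -> Optional[str]:
    """Map an age of onset (completed years) to the IS category named in the text."""
    if age_of_onset_years < 0:
        raise ValueError("age of onset cannot be negative")
    if age_of_onset_years < 3:      # assumed reading of the infantile boundary
        return "infantile"
    if age_of_onset_years <= 9:     # between 3 and 9 years old
        return "juvenile"
    if age_of_onset_years <= 15:    # between 10 and 15 years old
        return "adolescent"
    return None                     # not covered by the listed ranges


for age in (1, 7, 12, 17):
    print(age, onset_category(age))
```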
  • a subject “diagnosed with IS” is a subject having a minimum curvature in the coronal plane of 10°, as shown, for example, on a standing posteroanterior spinal radiograph by the Cobb method, with vertebral rotation and without any congenital or genetic disorder which could be the source of the spinal deformity observed.
  • the term “risk of developing IS” refers to a genetic or metabolic predisposition of a subject to develop a scoliosis (i.e., spinal deformity) and/or a more severe scoliosis at a future time (i.e., curve progression of the spine).
  • an increase of the Cobb’s angle of a subject e.g., from 40° to 50° or from 18° to 25°
  • a “development” of a scoliosis, i.e., a scoliosis progression.
  • a subject at risk of developing IS includes asymptomatic subjects who are more likely than the general population to suffer from IS at a future time, and includes subjects (e.g., children) having at least one parent, sibling or family member suffering from a scoliosis (either first degree, second degree or third degree relative). It also includes subjects who carry one or more known IS susceptibility markers (SNPs or other mutation/genetic variations).
  • subjects at risk of developing a scoliosis include asymptomatic subjects (i.e., subjects who do not yet have a spinal deformity of over 10°) who have been identified as having a GiPCR signaling defect and classified in the FG1, FG2 or FG3 endophenotype using well known methods (e.g., cAMP measurement, cellular impedance, etc.) as disclosed for example in WO/2003/073102, WO/2010/040234, WO/2012/045176, WO/2015/032005, WO/2014/201560, WO/2014/201557.
  • the terms “severe scoliosis”, “severe IS” or “severe scoliosis progression” refer to a scoliosis (or scoliosis progression) with a Cobb’s angle of 40° or more.
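The angle thresholds defined above (a minimum curvature of 10° for a diagnosis, 40° or more for severe scoliosis, and an increase in Cobb angle for progression) can be summarized in a short sketch. The function names and category strings are hypothetical, and the code looks at the angle only; the diagnostic definition in the text additionally requires vertebral rotation and the exclusion of congenital or genetic causes.

```python
def classify_cobb_angle(angle_deg: float) -> str:
    """Classify a Cobb angle using the thresholds quoted in the text."""
    if angle_deg < 0:
        raise ValueError("Cobb angle cannot be negative")
    if angle_deg >= 40:
        return "severe scoliosis"
    if angle_deg >= 10:
        return "scoliosis"
    return "below diagnostic threshold"


def has_progressed(previous_deg: float, current_deg: float) -> bool:
    """Progression is described in the text as an increase of the Cobb angle."""
    return current_deg > previous_deg


print(classify_cobb_angle(18), classify_cobb_angle(45))
print(has_progressed(18, 25), has_progressed(40, 50))
```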
  • biological fluid sample refers to blood, saliva, tears, sweat, urine, semen and milk.
  • blood sample is meant to refer to blood, plasma or serum.
  • “polynucleotide sample” or “nucleic acid sample” is meant to refer to a sample comprising DNA or RNA (including cDNA) from a test subject.
  • the sample should contain a sufficient amount of polynucleotides for determining the presence or absence of SNPs and/or haplotypes (i.e., for genotyping) disclosed herein according to the selected method.
  • the choice of the sample type will of course depend on the specific conditions of the assay. For example, gene variants (e.g., SNPs) found in intronic (or other untranscribed) sequences may not be detected using an RNA (or cDNA) sample, as known in the art.
  • the sample is a cell sample from the subject but is not so limited as long as the polynucleotide sample allows for the detection of the gene variant.
  • the term “subject” is meant to refer to any mammal including human, mouse, rat, dog, chicken, cat, pig, monkey, horse, etc.
  • the subject is a human, for example a pediatric human subject.
  • Polymorphism or “variant”.
  • the genomic sequence within populations is not identical when individuals are compared. Rather, the genome exhibits sequence variability between individuals at many locations in the genome. Such variations in sequence are commonly referred to as polymorphisms, and there are many such sites within each genome. For example, the human genome exhibits sequence variations which occur on average every 500 base pairs.
  • a “polymorphism” or “variant” refers to a variation in the sequence of nucleic acid (e.g., a gene sequence). Such variation includes insertion, deletion, and substitutions in one or more nucleotides.
  • SNPs: Single Nucleotide Polymorphisms
  • SNPs that vary between paired chromosomes in an individual. Each individual is in this instance either homozygous for one allele of the polymorphism (i.e.
  • an SNP thus refers to a variation at a single nucleotide in a given nucleic acid sequence.
  • each version of the sequence with respect to the polymorphic site represents a specific allele of the polymorphic site.
  • sequence variants can all be referred to as polymorphisms, occurring at specific polymorphic sites characteristic of the sequence variant in question.
  • polymorphisms can comprise any number of specific alleles.
  • reference is made to different alleles at a variant/polymorphic site without choosing a reference allele.
  • a reference sequence can be referred to for a particular polymorphic site.
  • the reference allele is sometimes referred to as the “wild-type” allele or “ancestral allele” and refers herein to the allele from a “non-affected” or control/reference individual (e.g., an individual that does not display a trait or disease phenotype, i.e., which does not suffer from a scoliosis or which has a lower risk of (or predisposition to) developing a scoliosis).
  • A “gene variant” refers to a variation (mutation or alteration) in a gene sequence that occurs in a given population.
  • Each polymorphic marker/gene variant has at least two sequence variations characteristic of particular alleles at the polymorphic site.
  • the marker/gene variant can comprise any allele of any variant type found in the genome, including variations in a single nucleotide (SNPs), microsatellites, insertions, deletions, duplications and translocations.
  • the polymorphic marker/gene variant if found in a transcribed region of the genome can be detected not only in genomic DNA but also in RNA.
  • Polymorphic markers or gene variants of the present disclosure are found in transcribed regions of the genome (they were identified following exome sequencing).
  • when the polymorphism/variant is found in the gene portion that is translated into a polypeptide or protein, the polymorphic marker/gene variant can be detected at the protein/polypeptide level.
  • the term “defective FAT3 gene” as used herein refers to a FAT3 gene comprising one or more mutations that affect the expression of the FAT3 gene and/or that result in a FAT3 protein having reduced activity relative to the native protein.
  • the defective FAT3 gene comprises one or more of the SNPs (variant allele) disclosed herein (e.g., variant allele in Table 4).
  • the polymorphic marker/gene variant of the present disclosure and its specific sequence variation can be detected by various means such as by sequencing the nucleic acid or protein.
  • the biological activity can be evaluated in order to identify which allele is present in the subject’s sample. For example, if a particular risk allele (comprising a risk variant or combination of risk variants) affects the enzymatic activity of the protein, then, the presence of the allele or variant(s) can be assessed by performing an enzymatic test.
  • the presence of the variant(s) can be determined by assessing the expression level (e.g., immunoassays, amplification assays, etc.) of such protein or nucleic acid and comparing it to a reference level in a control sample (e.g., sample from a subject not suffering from a scoliosis or at risk of developing a scoliosis).
  • an “allele” refers to the nucleotide sequence of a given locus (position) on a chromosome.
  • a polymorphic marker allele thus refers to the composition (i.e., sequence) of the marker on a chromosome.
  • Genomic DNA from an individual contains two alleles for any given polymorphic marker, representative of each copy of the marker on each chromosome.
  • A “risk allele”, a “susceptibility allele”, a “predisposition allele” or a “risk variant” is a nucleic acid sequence variation that is associated with an increased risk of (i.e., compared to a control/reference) or predisposition to suffering from IS.
  • a “protective allele” or “protective variant” is a sequence variation of a polymorphic marker that is associated with a lower risk of (i.e., compared to a control/reference) or predisposition to suffering from IS.
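Because genomic DNA carries two alleles per marker, genotyping a subject for one variant amounts to counting how many of the two alleles carry the variant nucleotide. A minimal sketch follows; the function names and the example nucleotides are hypothetical, and the risk call simply mirrors the statement in the text that a risk variant in at least one allele is indicative of risk.

```python
def call_genotype(allele_1: str, allele_2: str, ancestral: str, variant: str) -> str:
    """Return the genotype at one polymorphic site from the two observed alleles."""
    observed = [allele_1.upper(), allele_2.upper()]
    expected = {ancestral.upper(), variant.upper()}
    if not set(observed) <= expected:
        raise ValueError("observed allele is neither the ancestral nor the variant allele")
    variant_count = observed.count(variant.upper())
    return {
        0: "homozygous ancestral",
        1: "heterozygous",
        2: "homozygous variant",
    }[variant_count]


def carries_risk_variant(allele_1: str, allele_2: str, ancestral: str, risk: str) -> bool:
    """True when at least one of the two alleles carries the risk variant."""
    return call_genotype(allele_1, allele_2, ancestral, risk) != "homozygous ancestral"


# Hypothetical site where the ancestral nucleotide is T and the variant is C.
print(call_genotype("T", "C", ancestral="T", variant="C"))
print(carries_risk_variant("T", "T", ancestral="T", risk="C"))
```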
  • non-conservative mutation or “non-conservative substitution” in the context of polypeptides refers to a mutation in a polypeptide that changes an amino acid to a different amino acid with different biochemical properties (i.e., charge, hydrophobicity and/or size).
  • a non-conservative substitution includes one that changes an amino acid of one group with another amino acid of another group (e.g., an aliphatic amino acid for a basic, a cyclic, an aromatic or a polar amino acid; a basic amino acid for an acidic amino acid; a negatively charged amino acid (aspartic acid or glutamic acid) for a positively charged amino acid (lysine, arginine or histidine), etc.).
  • a “conservative substitution” or “conservative mutation” in the context of polypeptides is a mutation that changes an amino acid to a different amino acid with similar biochemical properties (e.g., charge, hydrophobicity and size).
  • leucine and isoleucine are both aliphatic, branched, hydrophobic residues.
  • aspartic acid and glutamic acid are both small, negatively charged residues. Therefore, changing a leucine for an isoleucine (or vice versa) or changing an aspartic acid for a glutamic acid (or vice versa) are examples of conservative substitutions.
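The group-based definition of conservative and non-conservative substitutions can be sketched as a lookup. The assignment of residues to groups below is an assumption made for illustration (the text names the group types but does not list their members, and other groupings are in common use); the function name is hypothetical.

```python
# Assumed grouping of the 20 standard amino acids (one-letter codes).
GROUPS = {
    "aliphatic": "GAVLI",
    "cyclic": "P",
    "aromatic": "FWY",
    "polar": "STCMNQ",
    "basic": "KRH",
    "acidic": "DE",
}

# Reverse lookup: residue -> group name.
GROUP_OF = {residue: name for name, residues in GROUPS.items() for residue in residues}


def classify_substitution(ancestral: str, variant: str) -> str:
    """Classify an amino acid change under the assumed grouping above."""
    if ancestral not in GROUP_OF or variant not in GROUP_OF:
        raise ValueError("unknown amino acid code")
    if ancestral == variant:
        return "no amino acid change"
    if GROUP_OF[ancestral] == GROUP_OF[variant]:
        return "conservative"
    return "non-conservative"


print(classify_substitution("L", "I"))  # leucine to isoleucine
print(classify_substitution("D", "E"))  # aspartic acid to glutamic acid
print(classify_substitution("L", "S"))  # the change in L517S
print(classify_substitution("L", "F"))  # the change in L4544F
```

Under this grouping the two examples from the text (leucine/isoleucine and aspartic acid/glutamic acid) come out as conservative, while leucine to serine and leucine to phenylalanine cross group boundaries.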
  • “Complement” or “complementary” as used herein refers to Watson-Crick (e.g., A-T/U and C-G) or Hoogsteen base pairing between nucleotides or nucleotide analogs of nucleic acid molecules. “Complementarity” refers to a property shared between two nucleic acid sequences, such that when they are aligned antiparallel to each other, the nucleotide bases at each position will be complementary.
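The antiparallel definition of complementarity can be checked mechanically: read one strand forwards and the other backwards, and require a Watson-Crick pair at every position. The sketch below covers only the A-T/U and C-G pairs named in the text; Hoogsteen pairing and nucleotide analogs are not modelled, and the function name is hypothetical.

```python
# Watson-Crick pairs, including U for RNA strands.
WATSON_CRICK = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("C", "G"), ("G", "C"),
}


def are_complementary(strand: str, other: str) -> bool:
    """True if the two strands pair at every position when aligned antiparallel."""
    if len(strand) != len(other):
        return False
    pairs = zip(strand.upper(), reversed(other.upper()))
    return all(pair in WATSON_CRICK for pair in pairs)


print(are_complementary("ATGC", "GCAT"))
print(are_complementary("ATGC", "ATGC"))
```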
  • “Homology” and “homologous” refer to sequence similarity between two peptides or two nucleic acid molecules. Homology can be determined by comparing each position in the aligned sequences. A degree of homology between nucleic acid or between amino acid sequences is a function of the number of identical or matching nucleotides or amino acids at positions shared by the sequences. As the term is used herein, a nucleic acid sequence is “substantially homologous” to another sequence if the two sequences are substantially identical and the functional activity of the sequences is conserved (as used herein, the term “homologous” does not imply evolutionary relatedness, but rather refers to substantial sequence identity, and thus is interchangeable with the terms “identity”/“identical”).
  • sequence similarity in optimally aligned substantially identical sequences may be at least 60%, 70%, 75%, 80%, 85%, 90% or 95%.
  • the units e.g., 66, 67...81, 82, ...91, 92%.
  • Substantially complementary nucleic acids are nucleic acids in which the complement of one molecule is substantially identical to the other molecule. Two nucleic acid or protein sequences are considered substantially identical if, when optimally aligned, they share at least about 70% sequence identity. In alternative embodiments, sequence identity may for example be at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 98% or at least 99%. Optimal alignment of sequences for comparisons of identity may be conducted using a variety of algorithms, such as the local homology algorithm of Smith and Waterman, 1981, Adv. Appl. Math 2: 482, the homology alignment algorithm of Needleman and Wunsch, 1970, J. Mol. Biol.
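Once two sequences have been optimally aligned, the identity threshold described above reduces to counting matching columns. The sketch below assumes the alignment has already been produced by some other tool and that gaps are written as "-"; dividing by the number of alignment columns is one of several conventions in use, and the function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("alignment has no usable columns")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


def substantially_identical(aligned_a: str, aligned_b: str, threshold: float = 70.0) -> bool:
    """Apply the 'at least about 70% sequence identity' criterion from the text."""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACGTACGTAC", "ACGTACGTTT"))
print(substantially_identical("ACGTACGTAC", "ACGTACGTTT"))
```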
  • Sequence identity may also be determined using the BLAST algorithm, described in Altschul et al. 1990 (using the published default settings). Software for performing BLAST analysis may be available through the National Center for Biotechnology Information (through the internet at http://www.ncbi.nlm.nih.gov/).
  • the BLAST algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence that either match or satisfy some positive valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold.
  • Initial neighborhood word hits act as seeds for initiating searches to find longer HSPs.
  • the word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Extension of the word hits in each direction is halted when the following parameters are met: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached.
  • the BLAST algorithm parameters W, T and X determine the sensitivity and speed of the alignment.
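The extension step described above has three stopping rules: the running score falls by X from its maximum, the running score reaches zero or below, or either sequence ends. The sketch below illustrates those rules for an ungapped extension to the right of a word hit. It is a simplified teaching example rather than the BLAST implementation; the function name and the +1/-1 match and mismatch scores are assumptions.

```python
def extend_right(query: str, subject: str, q_start: int, s_start: int,
                 seed_score: int, x_drop: int,
                 match: int = 1, mismatch: int = -1) -> tuple[int, int]:
    """Extend a word hit to the right without gaps.

    q_start and s_start are the 0-based positions just after the word hit.
    Returns (best_score, best_length), where best_length is the number of
    extended positions at which the maximum score was reached.
    """
    score = best_score = seed_score
    best_length = 0
    i = 0
    # Rule 3: stop when the end of either sequence is reached.
    while q_start + i < len(query) and s_start + i < len(subject):
        score += match if query[q_start + i] == subject[s_start + i] else mismatch
        i += 1
        if score > best_score:
            best_score, best_length = score, i
        # Rule 1: score fell off by X from its maximum. Rule 2: score is zero or below.
        if best_score - score >= x_drop or score <= 0:
            break
    return best_score, best_length


# Word hit of length W = 4 at the start of both toy sequences (seed score 4).
print(extend_right("ACGTACGTTTTT", "ACGTACGTAAAA", 4, 4, seed_score=4, x_drop=3))
```

A full search would run the same loop to the left of the word hit and combine the two results into one high scoring sequence pair.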
  • One measure of the statistical similarity between two sequences using the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
  • Nucleotide or amino acid sequences are considered substantially identical if the smallest sum probability in a comparison of the test sequences is less than about 1, preferably less than about 0.1, more preferably less than about 0.01, and most preferably less than about 0.001.
  • Substantially complementary sequences may hybridize to each other under moderately stringent, or preferably stringent, conditions. Hybridization to filter-bound sequences under moderately stringent conditions may, for example, be performed in 0.5 M NaHPO4, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 65°C, and washing in 0.2 x SSC/0.1% SDS at 42°C (Ausubel 2010).
  • Hybridization to filter-bound sequences under stringent conditions may, for example, be performed in 0.5 M NaHPO4, 7% SDS, 1 mM EDTA at 65°C, and washing in 0.1 x SSC/0.1% SDS at 68°C (Ausubel 2010).
  • Hybridization conditions may be modified in accordance with known methods depending on the sequence of interest (Tijssen 1993).
  • Stringent conditions are selected to be about 5°C lower than the thermal melting point for the specific sequence at a defined ionic strength and pH.
  • The present disclosure identifies gene variants (polymorphic markers) in the FAT3 gene which are associated with a risk of developing IS.
  • Detecting the presence of a risk allele (risk variant(s)) in polymorphic markers of one or more of the above genes is indicative of a risk of developing a scoliosis (or predisposition to IS).
  • The level of risk or the likelihood of developing a scoliosis is determined depending on the number of risk-associated variants that are present in cells from a subject.
  • The level of risk is determined by calculating a genetic score (odds ratio), as is well known in the art.
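As a hedged illustration of the odds ratio mentioned above, the sketch below computes it from a 2x2 table of carrier counts. The function name and the counts in the usage example are hypothetical and are not data from the present disclosure.

```python
def odds_ratio(case_carriers, case_noncarriers,
               control_carriers, control_noncarriers):
    """Odds ratio for carrying a risk variant in cases versus controls.

    Hypothetical helper for illustration: OR = (a * d) / (b * c) for the
    2x2 table [[a, b], [c, d]] = [[case carriers, case non-carriers],
                                  [control carriers, control non-carriers]].
    """
    if case_noncarriers == 0 or control_carriers == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (case_carriers * control_noncarriers) / (
        case_noncarriers * control_carriers
    )


if __name__ == "__main__":
    # Illustrative counts only (not study data).
    print(odds_ratio(20, 80, 10, 90))
```

An odds ratio above 1 indicates that the variant is more frequent among cases than among controls; a confidence interval would normally be reported alongside it.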
  • The present disclosure encompasses detecting the presence or absence of at least one polymorphic marker (e.g., SNP) in the FAT3 gene (e.g., a combination of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25 or 26 variants).
  • Alleles for gene variants as referred to herein refer to the bases A, C, G or T as they occur at the polymorphic site in the SNP assay employed.
  • The assay employed may be designed to specifically detect the presence of one or both of the two bases possible, i.e. A and G.
  • The presence of the complementary bases T and C can be measured. Quantitatively (for example, in terms of relative risk), identical results would be obtained from measurement of either DNA strand (+ strand or - strand).
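The strand equivalence described above can be sketched as a simple base complement mapping. The helper names are hypothetical; the point is only that an A/G call on one strand corresponds to a T/C call on the other.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def opposite_strand_allele(allele):
    """Return the base that would be read on the opposite strand."""
    return COMPLEMENT[allele.upper()]


def opposite_strand_genotype(genotype):
    """Convert a genotype (tuple of alleles) to the opposite strand."""
    return tuple(opposite_strand_allele(a) for a in genotype)


if __name__ == "__main__":
    # An A/G heterozygote on the + strand reads as T/C on the - strand.
    print(opposite_strand_genotype(("A", "G")))
```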
  • Detecting specific gene variants or polymorphic markers and/or haplotypes of the present disclosure can be accomplished by methods known in the art. Such detection can be made at the nucleic acid or amino acid (protein) level.
  • Standard techniques for genotyping for the presence of gene variants can be used, such as sequencing, fluorescence-based techniques (Chen, X. et al., Genome Res. 9(5): 492-98 (1999)), methods utilizing PCR, LCR, Nested PCR and other methods for nucleic acid amplification.
  • SNP genotyping examples include, but are not limited to, TaqMan™ genotyping assays and SNPlex™ platforms (Applied Biosystems), mass spectrometry (e.g., MassARRAY™ system from Sequenom), minisequencing methods, real-time PCR, Bio-Plex™ system (BioRad), CEQ and SNPstream™ systems (Beckman), Molecular Inversion Probe™ array technology (e.g., Affymetrix GeneChip™), and BeadArray™ Technologies (e.g., Illumina GoldenGate and Infinium assays).
  • Linkage disequilibrium (LD) is defined as the non-random association of alleles at different loci across the genome. Alleles at two or more loci are in LD if their combination occurs more or less frequently than expected by chance in the population.
  • For example, if a particular genetic element (e.g., an allele of a polymorphic marker, or a haplotype) occurs in a population at a frequency of 0.50 (50%) and another element occurs at a frequency of 0.50 (50%), then the predicted occurrence of a person’s having both elements is 0.25 (25%), assuming a random distribution of the elements. If the two elements occur together at a frequency higher than 0.25, then the elements are said to be in linkage disequilibrium, since they tend to be inherited together at a higher rate than what their independent frequencies of occurrence (e.g., allele or haplotype frequencies) would predict.
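The arithmetic in this example can be written out as a short sketch. The function names are hypothetical, and the observed joint frequency of 0.35 is an illustrative assumption; only the 0.50 frequencies and the 0.25 expectation come from the text.

```python
def expected_joint_frequency(freq_a, freq_b):
    """Expected frequency of both elements together under independence."""
    return freq_a * freq_b


def disequilibrium(observed_joint, freq_a, freq_b):
    """Coefficient D: observed joint frequency minus the expected one."""
    return observed_joint - expected_joint_frequency(freq_a, freq_b)


if __name__ == "__main__":
    # Two elements at 0.50 each are expected together at 0.25.
    print(expected_joint_frequency(0.50, 0.50))
    # Illustrative observed value of 0.35 gives a positive D (about 0.10),
    # i.e. the elements co-occur more often than independence predicts.
    print(disequilibrium(0.35, 0.50, 0.50))
```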
  • Identification of additional SNPs in linkage disequilibrium with a given SNP involves: (a) amplifying a fragment from the gene comprising a first SNP from a plurality of individuals; (b) identifying second SNPs in the gene comprising said first SNP; (c) conducting a linkage disequilibrium analysis between said first SNP and second SNPs; and (d) selecting said second SNPs as being in linkage disequilibrium with said first marker. Subcombinations comprising steps (b) and (c) are also contemplated.
  • Genomic LD maps have been generated across the genome, and such LD maps have been proposed to serve as a framework for mapping disease genes (Risch et al., 1996; Maniatis et al., 2002; Reich et al., 2001). If all polymorphisms in the genome were independent at the population level (i.e., no LD), then every single one of them would need to be investigated in association studies, to assess all the different polymorphic states. However, due to linkage disequilibrium between polymorphisms, tightly linked polymorphisms are strongly correlated, which reduces the number of polymorphisms that need to be investigated in an association study to observe a significant association. Another consequence of LD is that many polymorphisms may give an association signal because these polymorphisms are strongly correlated.
  • The two metrics most commonly used to measure LD are D’ and r², which can be written in terms of each other and allele frequencies. Both measures range from 0 (the two alleles are independent or in equilibrium) to 1 (the two alleles are completely dependent or in complete disequilibrium), but with different interpretations.
  • D’ is equal to 1 if at most two or three of the possible haplotypes defined by two markers are present, and <1 if all four possible haplotypes are present.
  • r² measures the statistical correlation between two markers and is equal to 1 if only two haplotypes are present.
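A minimal sketch of how the two metrics can be computed for a pair of biallelic markers is given below, using the standard textbook definitions (D’ as |D| divided by its maximum attainable value, r² as D² divided by the product of the four allele frequencies). The function name and the example frequencies are assumptions for illustration.

```python
def ld_metrics(p_ab, p_a, p_b):
    """Return (D', r^2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B.
    p_a, p_b: frequencies of allele A and allele B.
    Hypothetical helper using the standard definitions; D' is reported
    as an absolute value so that it ranges from 0 to 1.
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both markers must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


if __name__ == "__main__":
    # Only two haplotypes present (AB and ab, 0.5 each): D' = 1 and r^2 = 1.
    print(ld_metrics(0.5, 0.5, 0.5))
    # Three haplotypes present (AB 0.4, aB 0.2, ab 0.4): D' = 1 but r^2 < 1.
    print(ld_metrics(0.4, 0.4, 0.6))
```

The second case shows why the two measures are interpreted differently: D’ stays at 1 as long as one of the four haplotypes is absent, whereas r² reaches 1 only when the two markers are perfect proxies for one another.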
  • Events such as recombination may decrease LD between markers. However, moderate (i.e. 0.5 ≤ r² < 0.8) to high (i.e. 0.8 ≤ r² ≤ 1) LD conserves the "surrogate" properties of markers. In LD-based association studies, when LD exists between markers and an unknown pathogenic allele, all markers show a similar association with the disease.
  • SNPs have alleles that show strong LD (or high LD, defined as r² ≥ 0.80) with other nearby SNP alleles, and in regions of the genome with strong LD, a selection of evenly spaced SNPs, or those chosen on the basis of their LD with other SNPs (proxy SNPs or Tag SNPs), can capture most of the genetic information of SNPs which are not genotyped, with only slight loss of statistical power.
  • This region of LD is adequately covered using few SNPs (Tag SNPs), and a statistical association between a SNP and the phenotype under study means that the SNP is a causal variant or is in LD with a causal variant.
  • A proxy (or Tag SNP) is defined as a SNP in LD (r² ≥ 0.8) with one or more other SNPs.
  • The genotype of the proxy SNP could predict the genotype of the other SNP via LD, and vice versa.
  • Any SNP in LD with one of the SNPs used herein may be replaced by one or more proxy SNPs defined according to their LD as r² ≥ 0.8.
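Selecting proxy SNPs by an r² cutoff can be sketched as follows. The sketch assumes the threshold reads as r² ≥ 0.8, and the function name, the data layout and the SNP identifiers are hypothetical placeholders, not markers from the present disclosure.

```python
def find_proxies(target, pairwise_r2, threshold=0.8):
    """Return SNPs whose r^2 with `target` meets the threshold.

    pairwise_r2 maps an unordered SNP pair (a, b) to its r^2 value.
    Hypothetical helper; the 0.8 default is the cutoff assumed above.
    """
    proxies = []
    for (snp_a, snp_b), r2 in pairwise_r2.items():
        if r2 < threshold:
            continue
        if snp_a == target:
            proxies.append(snp_b)
        elif snp_b == target:
            proxies.append(snp_a)
    return sorted(proxies)


if __name__ == "__main__":
    # Placeholder identifiers and r^2 values for illustration only.
    r2_table = {
        ("snp_1", "snp_2"): 0.92,
        ("snp_1", "snp_3"): 0.55,
        ("snp_4", "snp_1"): 0.80,
    }
    print(find_proxies("snp_1", r2_table))
```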
  • SNPs in linkage disequilibrium can also be used in the methods according to the present disclosure, and more particularly in the diagnostic methods according to the present disclosure.
  • The presence of SNPs in linkage disequilibrium (LD) with the above identified SNPs may be genotyped, in place of, or in addition to, said identified SNPs.
  • The SNPs in linkage disequilibrium with the above identified SNP are within the same gene as the above identified SNP. Therefore, in the present disclosure, the presence of SNPs in linkage disequilibrium (LD) with a SNP of interest and located within the same gene as the SNP of interest may be genotyped, in place of, or in addition to, said SNP of interest.
  • Such an SNP and the SNP of interest have r² ≥ 0.70, preferably r² ≥ 0.75, more preferably r² ≥ 0.80, and/or have D’ ≥ 0.60, preferably D’ ≥ 0.65, D’ ≥ 0.7, D’ ≥ 0.75, more preferably D’ ≥ 0.80.
  • Such an SNP and the SNP of interest have r² ≥ 0.80, which is used as the reference value to define "LD" between SNPs.
  • Compositions and kits for use in the methods of the present disclosure may include, for example, one or more reagents for detecting the presence or absence of one or more FAT3 variants (e.g., one or more variants listed in Table 4 and/or shown in FIGs. 2A-K) or a substitute marker in linkage disequilibrium therewith.
  • The one or more variants can be detected at the protein level.
  • Compositions and kits for use in the methods of the present disclosure may include one or more reagents for detecting the presence or absence of one or more mutations in the FAT3 protein (e.g., a mutation listed in Table 4).
  • Compositions and kits can comprise oligonucleotide primers and hybridization probes (e.g., allele-specific oligonucleotide primers and hybridization probes for determining the presence or absence of a variant in the FAT3 gene (e.g., listed in Table 4, FIG. 1 and/or FIG.
  • restriction enzymes (e.g., for RFLP analysis)
  • antibodies that bind to a mutated FAT3 polypeptide (polymorphic polypeptide) which is encoded by a nucleic acid comprising the gene variant of the present disclosure (e.g., a nucleic acid comprising a variant (polymorphic marker) as defined in Table 4), and reagents (e.g., antibodies) for detecting a mutation in the FAT3 protein listed in Table 4 or for detecting the wild-type amino acid.
  • The kit may also include any necessary buffers, enzymes (e.g., DNA polymerase) and/or reagents necessary for performing the methods of the present disclosure.
  • The kit may comprise one or more labeled nucleic acids (or labeled antibody) capable of specific detection of one or more gene variants of the present disclosure (e.g., gene variants defined in Table 4) or any markers in linkage disequilibrium therewith, as well as reagents for the detection of the label.
  • Reagents may be provided in separate containers or premixed depending on the requirements of the method.
  • Suitable labels are well known in the art and will be chosen according to the specific method used.
  • Suitable labels include a radioisotope, a fluorescent label, a magnetic label, an enzyme, etc.
  • A FAT3 gene variant (e.g., defined in Table 4, FIGs. 1 and 3) associated with IS in accordance with the present disclosure may be determined by DNA chip analysis.
  • A DNA chip or nucleic acid microarray consists of different nucleic acid probes that are chemically attached to a substrate, which can be a microchip, a glass slide or a microsphere-sized bead.
  • A microchip may be constituted of polymers, plastics, resins, polysaccharides, silica or silica-based materials, carbon, metals, inorganic glasses, or nitrocellulose.
  • Probes comprise nucleic acids such as cDNAs or oligonucleotides that may be about 10, 11, 12, 13, 14 or 15 to about 60, 50, 40 or 30 base pairs.
  • A sample from a test subject is labelled and contacted with the microarray in hybridization conditions, leading to the formation of complexes between target nucleic acids and the complementary probe sequences attached to the microarray surface.
  • The presence of labelled hybridized complexes is then detected.
  • Many variants of the microarray hybridization technology are available to the man skilled in the art.
  • a composition (e.g., a diagnostic composition)
  • an assay mixture which is generated following one or more steps of the methods described herein and which includes a biological sample (e.g., cell sample, blood sample, etc.) from the subject to be tested.
  • The preparation of such a composition occurs while testing a subject’s biological sample for the risk of developing a scoliosis (including the risk of developing a more severe scoliosis); for aiding in the prevention and treatment of scoliosis, including for determining the best treatment regimen; for adapting an ongoing treatment regimen; for selecting a new treatment regimen; or for determining the frequency of a specific treatment regimen or follow-up schedule.
  • Such compositions may be prepared using kits described herein.
  • Compositions and kits of the present disclosure may thus comprise one or more oligonucleotide probes or amplification primers for the detection (e.g., amplification or hybridization) of a FAT3 gene variant of the present disclosure (e.g., a variant or reference sequence defined in Table 4).
  • Oligonucleotide probes are provided in the form of a microarray or DNA chip.
  • The kit may further include instructions to use the kit in accordance with the methods of the present disclosure (e.g., for determining the risk of (or predisposition to) developing a scoliosis; for genotyping a subject or for classifying a subject suffering from a scoliosis or at risk of developing a scoliosis in a specific genetic or functional group).
  • Second French-Canadian IS cohort (replication cohort).
  • Ninety-six (96) patients of French-Canadian origin, unrelated to each other or to the cases in the discovery cohort, were selected for the replication study. Since 93% of the initial cohort were females and 68% were severe cases, the second cohort was chosen to be all female and severely affected.
  • Thirty-six healthy French-Canadian females were recruited from Montreal’s schools, plus an additional sixty French-Canadian females from the CARTAGENE project [25, 26].
  • French-Canadian multiplex family. A rare multiplex French-Canadian family with three affected sisters and healthy parents was ascertained and analyzed by WES.
  • The proband was diagnosed at the age of 13 years with IS, with a right lumbar curve and a Cobb angle measuring 15°.
  • Her first sister was diagnosed with IS with a left lumbar curve measuring 23°, and the second sister was also diagnosed with IS, with a right thoracic curve measuring 13°.
  • Genomic DNA extraction. Blood was obtained by standard venipuncture. Genomic DNA was extracted from peripheral leukocytes using the PureLink™ genomic DNA kit (Thermo Fisher Scientific, Waltham, Massachusetts, USA).
  • Exome and targeted sequencing. Whole-exome sequencing (WES) for a discovery French-Canadian IS cohort. Exome capture was performed using Agilent SureSelect™ Human All Exon 50 Mb v3 according to the manufacturer’s recommendations. Sequencing was done using a SOLiD™ 5500xl from Applied Biosystems by Life Technologies at the Sainte-Justine University Hospital genomic platform. The average coverage of targeted sites was approximately 100X.
  • Targeted sequencing for selected genes in a replication French-Canadian IS cohort. Twenty-four genes were chosen for resequencing in a second French-Canadian cohort. Enrichment of coding exons of these genes was done using Roche Nimblegen EZ Choice custom baits, with bar code multiplexing of 96 samples per lane of sequencing. Sequencing was done on an Illumina HiSeq™ 2000 at the McGill University and Genome Quebec Innovation Centre (MUGQIC). The average coverage of targeted sites was approximately 400X.
  • Library PCR Primer 1 and Library PCR Primer 2 were replaced by SureSelect™ Pre-Capture Primers provided in the SureSelect™ AB Barcoding Library Kit (Agilent).
  • Exome capture was performed using Agilent SureSelect™ XT Human All Exon 50 Mb v3 according to the manufacturer’s recommendations.
  • Final libraries were quantified using the SOLiD™ Library TaqMan® Quantitation Kit. Standard steps were taken thereafter to create enriched, templated beads for the SOLiD™ 5500xl system. Pools of 8 libraries were loaded on each flowchip (6 lanes). Sequencing was performed in paired-end mode, with 50 bases in the forward read and 25 bases in the reverse read.
  • HiSeq2500 WES sequencing for the French-Canadian family. Genomic DNA was quantified using the QuantiFluor™ dsDNA System (Promega) and 3 µg was used as input. Libraries were constructed using the SureSelect™ Target Enrichment System for Illumina Paired-End Multiplexed Sequencing Library protocol and the SureSelect™ XT Human All Exon 50 Mb v5 capture kit (Agilent). Final libraries were qualified using a Bioanalyzer (Agilent) and quantified using the KAPA Library Quantification kit for Illumina. The clustering was done on an Illumina cBot using 16 pM of pooled libraries. Pools of 4 libraries were loaded on each lane of a High Output flowcell (8 lanes). Sequencing was performed on a HiSeq2500 for 125 cycles in paired-end mode using HCS 2.2.38 and RTA 1.18.61.
  • gDNA was quantified using the Quant-iT™ PicoGreen™ dsDNA Assay Kit (Life Technologies). Libraries were generated robotically using the KAPA HTP Library Preparation Kit Illumina® platforms (Kapa Biosystems) as per the manufacturer’s recommendations. TruSeq™ adapters and PCR primers were purchased from BioO. Libraries were quantified using the Kapa Illumina GA with Revised Primers-SYBR Fast Universal kit (D-Mark). Average fragment size was determined using a LabChip™ GX (PerkinElmer) instrument.
  • The clustering was done on an Illumina cBot using 11 pM of each capture pool (2 captures per lane), and the flowcell was run on a HiSeq™ 2000 for 100 cycles in paired-end mode using HCS 2.2.58 and RTA 1.18.63, following the manufacturer’s instructions.
  • Bioinformatics. Only protein-coding and nearby intronic regions were analyzed. The analysis included SNPs and small insertions or deletions (indels). The SIFT (Sorting Intolerant from Tolerant) [27], PolyPhen-2 (Polymorphism Phenotyping v2) [28] and MutationTaster2 [29] algorithms were used to predict the possible impact of amino acid substitutions on the structure and function of the human FAT3 protein in IS patients harboring different FAT3 gene variants.
  • The discovery cohort was analyzed with the following pipeline: *.xsq files were converted to csfasta and qual files using XSQTools™. 5’ and 3’ reads were filtered independently using SOLiD™-aware software (http://bioinformatics.oxfordjournals.Org/content/26/6/849.full) and re-balanced to retain only read pairs, i.e. singletons were removed. The process was tested to ensure that the settings used removed only poor-quality reads and not too much information. Reads were mapped in color space using bfast+bwa-0.7.0a to hg19 at the library level and then merged by sample.
  • SNPs and indels were called with samtools 0.1.19 in batch mode, i.e. all samples were used during calling.
  • SNPEff™ 3.3h was used to add genetic variant information and effect prediction (http://snpeff.sourceforge.net/).
  • GATK IndelRealigner™ (2.5-2) was used to help resolve indels, Picard Mark Duplicates (1.96) to label PCR duplicates, and GATK base recalibration (2.5-2, SOLiD™-specific settings) to re-calibrate base qualities affected by various sources of systematic technical error. Variants were annotated using Gemini 0.11.1a (http://gemini.readthedocs.io/en/latest/index.html).
  • Both the replication French-Canadian IS cohort and the French-Canadian family were analyzed using the same pipeline which was slightly different from the discovery cohort.
  • The reads were trimmed and aligned to the reference human genome (hg19) using Picard, BWA (0.5.9) (Li and Durbin, 2009) and Samtools (v.0.1.12a) (Li, et al., 2009).
  • Variants were called using the Pileup and varFilter commands, followed by filtering to keep single nucleotide polymorphisms (SNPs) and insertion-deletions with Phred-like quality scores of more than 20 and 50, respectively.
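The quality filtering step described above can be sketched as follows. This is not the Samtools implementation; the dictionary layout (`type`, `qual`) and the function names are hypothetical, and only the two thresholds (more than 20 for SNPs, more than 50 for indels) come from the text.

```python
SNP_MIN_QUAL = 20    # SNPs are kept when quality is more than 20
INDEL_MIN_QUAL = 50  # indels are kept when quality is more than 50


def passes_quality(variant):
    """Apply the type-specific Phred-like quality threshold.

    `variant` is a hypothetical record: {"type": "snp" | "indel", "qual": float}.
    """
    if variant["type"] == "snp":
        threshold = SNP_MIN_QUAL
    elif variant["type"] == "indel":
        threshold = INDEL_MIN_QUAL
    else:
        raise ValueError(f"unknown variant type: {variant['type']}")
    return variant["qual"] > threshold


def filter_variants(variants):
    """Keep only the variants that pass their quality threshold."""
    return [v for v in variants if passes_quality(v)]


if __name__ == "__main__":
    # Illustrative calls only (not study data).
    calls = [
        {"type": "snp", "qual": 35},
        {"type": "snp", "qual": 20},
        {"type": "indel", "qual": 45},
        {"type": "indel", "qual": 80},
    ]
    print(filter_variants(calls))
```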
  • The coverage was approximately 400x for the targeted sequencing and 100x for the five members of the family.
  • Variants were annotated using ANNOVAR (Wang, et al., 2010), according to the type of mutation and frequency in the different databases. Both SNPs and small insertions/deletions (indels) were considered, and those located in neither an exonic region nor a neighboring potential splice site were filtered out.
  • FAT3 gene structure. The gene model for FAT3 used by RefSeq does not appear to be supported by long individual human cDNA clones and seems to be based on homology to several long rodent cDNAs. Therefore, to confirm the gene structure, in-house brain RNA-Seq data and WGBS data (Whole Genome Bisulfite) from an unrelated individual not part of the cohorts, as well as GENCODE public annotations, were analyzed. FAT3 expression was also profiled using the Genotype-Tissue Expression (GTEx) Transcriptome Portal (www.gtexportal.org/home/gene/FAT3).
  • qRT-PCR Quantitative RT-Polymerase Chain Reaction
  • qPCR assay. For each qPCR assay, a standard curve was performed to ensure that the efficiency of the assay was between 90% and 110%. qPCR reactions were performed using PERFECTA QPCR FASTMIX™ II (Quanta), 2 µM of each primer and 1 µM of the corresponding UPL probe. The Viia7 qPCR instrument (Life Technologies) was used to detect the amplification level and was programmed with an initial step of 20 sec at 95°C, followed by 40 cycles of 1 sec at 95°C and 20 sec at 60°C.
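The standard-curve check can be sketched with the commonly used relation between the slope of the curve (quantification cycle versus log10 of template quantity) and amplification efficiency, E = 10^(-1/slope) - 1. The text does not state how efficiency was computed, so this formula, the function names and the dilution series below are assumptions for illustration.

```python
import math


def standard_curve_slope(quantities, cq_values):
    """Least-squares slope of Cq against log10(template quantity)."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def amplification_efficiency(slope):
    """Efficiency as a fraction (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1 / slope) - 1


def efficiency_acceptable(efficiency, low=0.90, high=1.10):
    """Check the 90% to 110% acceptance window stated in the text."""
    return low <= efficiency <= high


if __name__ == "__main__":
    # Illustrative tenfold dilution series with perfect doubling per cycle.
    quantities = [1000, 100, 10, 1]
    cq_values = [20 + k * math.log2(10) for k in range(4)]
    slope = standard_curve_slope(quantities, cq_values)
    efficiency = amplification_efficiency(slope)
    print(round(slope, 3), round(efficiency, 3), efficiency_acceptable(efficiency))
```

A slope near -3.32 corresponds to an efficiency near 100%; steeper slopes indicate less efficient amplification.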
  • EXAMPLE 3 TARGETED SEQUENCING OF THE SELECTED TWENTY-FOUR GENES IN A REPLICATION
  • The replication cohort was chosen to be more homogeneous; all cases were severely affected females. For comparison, in the initial discovery cohort, 93% of IS patients were females and only 68% were severe cases.
  • The replication cohort comprised 96 patients and 96 gender-matched controls. Coding exons of the 24 candidate genes were sequenced in the 192 replication samples, using a custom capture library. After calling and annotating variants, a filtration step was performed to remove poor quality calls and variants with a MAF greater than 1% (according to 1000 Genomes (EUR) and ESP-EA). For the replication study, a collapsing gene burden test was again employed. Of the 24 candidate genes, only one gene, FAT3, showed significant enrichment for rare variants in the IS patients (Table 2).
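A collapsing burden test of this kind can be sketched as follows: each subject is collapsed to a carrier/non-carrier indicator for rare variants (MAF of at most 1%) in a gene, and carrier counts in cases and controls are then compared. The text does not specify the statistic used, so the one-sided Fisher's exact test here is an assumption, as are the function names and the counts in the usage example.

```python
from math import comb


def is_carrier(variant_mafs, maf_cutoff=0.01):
    """Collapse a subject's variants in one gene to a single indicator.

    variant_mafs: population MAFs of the variants the subject carries.
    A subject is a carrier if at least one variant is rare (MAF <= cutoff).
    """
    return any(maf <= maf_cutoff for maf in variant_mafs)


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact p-value for enrichment of carriers in cases.

    Sums hypergeometric probabilities of observing at least
    `case_carriers` carriers among the cases.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    numerator = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return numerator / comb(total, n_cases)


if __name__ == "__main__":
    # A subject carrying one common and one rare variant counts as a carrier.
    print(is_carrier([0.12, 0.004]))
    # Illustrative counts only (not study data): 96 cases and 96 controls.
    print(fisher_one_sided(10, 96, 1, 96))
```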
  • EXAMPLE 4 EXOME SEQUENCING OF AN INDEPENDENT IS FAMILY
  • A recessive model was considered: either a homozygous variant in the three sisters which is heterozygous in the parents, or compound heterozygosity, in which the three sisters have two heterozygous variants in the same gene, each coming from one parent. No genes were found consistent with the de novo dominant or recessive homozygous models. However, the presence of compound heterozygous variants in FAT3 was found to be consistent with the recessive model.
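The compound heterozygous model can be sketched as a check over pairs of variants within one gene: every affected child must be heterozygous for both variants, with one variant inherited from each parent. The genotype encoding (0, 1 or 2 copies of the variant allele), the function name, the sample labels and the genotypes below are hypothetical and do not reproduce the family data.

```python
from itertools import combinations


def compound_het_pairs(variants, affected, father, mother):
    """Find variant pairs consistent with compound heterozygosity.

    variants: {variant_name: {sample_name: alt_allele_count}} for one gene.
    A pair qualifies when all affected samples are heterozygous for both
    variants and each variant is heterozygous in a different parent and
    absent from the other parent.
    """
    pairs = []
    for v1, v2 in combinations(sorted(variants), 2):
        g1, g2 = variants[v1], variants[v2]
        if not all(g1[s] == 1 and g2[s] == 1 for s in affected):
            continue
        v1_paternal = g1[father] == 1 and g1[mother] == 0
        v1_maternal = g1[father] == 0 and g1[mother] == 1
        v2_paternal = g2[father] == 1 and g2[mother] == 0
        v2_maternal = g2[father] == 0 and g2[mother] == 1
        if (v1_paternal and v2_maternal) or (v1_maternal and v2_paternal):
            pairs.append((v1, v2))
    return pairs


if __name__ == "__main__":
    sisters = ["sister_1", "sister_2", "sister_3"]
    # Hypothetical genotypes for illustration only.
    gene_variants = {
        "variant_1": {"father": 1, "mother": 0, "sister_1": 1, "sister_2": 1, "sister_3": 1},
        "variant_2": {"father": 0, "mother": 1, "sister_1": 1, "sister_2": 1, "sister_3": 1},
        "variant_3": {"father": 1, "mother": 1, "sister_1": 1, "sister_2": 1, "sister_3": 1},
    }
    print(compound_het_pairs(gene_variants, sisters, "father", "mother"))
```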
  • The selection of candidate genes from the WES cohort was done before the family study was performed, and the selection of the targeted resequencing genes was unbiased.
  • The two variants of the FAT3 gene in the multiplex family are non-synonymous: p.L517S and p.L4544F (FIG. 1). The first variant was present in four cases and one control in the replication cohort, and the second variant was present in one case in the discovery cohort. Both variants in FAT3 were confirmed by Sanger sequencing of DNA from all members of the family.
  • The gene model for FAT3 used by RefSeq appears to be supported only by long individual rodent cDNA clones in the NCBI database, whereas there are only fragmentary human cDNA clones documented in the public genome browsers. Therefore, to confirm the human FAT3 gene structure, the in-house brain RNA-Seq data and WGBS data for one individual were analyzed. The results were consistent with the RefSeq gene model (NM_001008781.2) with two exceptions. Just upstream of the 3’ terminal exon, evidence was found for two alternative exons which were either included or excluded together in various RNA-Seq reads. The two exons are also annotated by the GENCODE project website (version 24).
  • FAT3 expression in several tissues was profiled using the GTEx Transcriptome Portal, and a strong enrichment in brain and artery tissues was observed.
  • The tissues in which FAT3 expression was measured are the following (presented in descending order of FAT3 expression level): Brain-Nucleus Accumbens (basal ganglia); Brain-Caudate (basal ganglia); Brain-Putamen (basal ganglia); Artery-Coronary; Brain-Frontal Cortex (BA9); Brain-Cortex; Brain-Anterior cingulate cortex (BA24); Artery-Tibial; Artery-Aorta; Brain-Hypothalamus; Brain-Amygdala; Brain-Cerebellum; Esophagus-Gastroesophageal junction; Brain-Hippocampus; Bladder; Brain-Substantia nigra; Prostate; Testis; Lung; Esophagus-Muscularis; Vagina; Brain-Cerebell
  • The alternative exons 25 and 26 were not captured in the replication capture sequencing because they are not annotated by RefSeq. Hence, direct Sanger sequencing was performed for these two exons in 72 cases of the replication cohort (DNA was not available for the rest of the cases). No rare potentially protein-altering variants were observed among the sequenced cases for either of these two (very small) exons.
  • Li W, Li Y, Zhang L, et al. AKAP2 identified as a novel gene mutated in a Chinese family with adolescent idiopathic scoliosis. Journal of Medical Genetics 2016:jmedgenet-2015-103684.
  • Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. The American Journal of Human Genetics 2008;83(3):311-21.

Abstract

Disclosed herein are novel FAT3 molecular markers associated with idiopathic scoliosis (IS). Accordingly, the present disclosure concerns novel methods of identifying subjects at risk of developing IS or suffering from IS and of genotyping and classifying IS subjects into genetic and functional groups. Also provided are compositions, DNA chips and kits for applying the methods.

Description

METHOD OF USE OF FAT3 IN SCOLIOSIS
CROSS REFERENCE TO RELATED APPLICATIONS
This application is a PCT application filed on January 16, 2019 and published in English under PCT Article 21(2), which itself claims benefit of U.S. provisional application Serial No. 62/617,811, filed on January 16, 2018. All documents above are incorporated herein in their entirety by reference.
FIELD OF THE INVENTION
The present disclosure relates to Idiopathic Scoliosis (IS). More specifically, the present disclosure is concerned with novel markers for the risk of developing IS, including the risk of scoliosis progression.
REFERENCE TO SEQUENCE LISTING
Pursuant to 37 C.F.R. 1.821(c), a sequence listing is submitted herewith as an ASCII compliant text file named 14033_173_SL_ST25, which was created on December 19, 2018 and has a size of 76 kilobytes. The content of the aforementioned file named 14033_173_SL_ST25 is hereby incorporated by reference in its entirety.
BACKGROUND OF THE INVENTION
Idiopathic Scoliosis (IS) is a common complex disorder of the spine. It is a three-dimensional deformity of the skeleton characterized by a lateral curvature of ≥10° on a standing radiograph (Cobb method), combined with vertebral rotation. It is the most common form of spinal disorder. It mostly occurs during adolescence and affects 1-4% [1] of the global pediatric population, with higher prevalence in females, who are generally more severely affected than males [2]. In most cases the underlying cause of idiopathic scoliosis is unknown, although a genetic component is well recognized [3, 4]. Twin and family studies have documented high rates of concordance among twins and increased risk to relatives of IS patients [5, 6]. The mode of inheritance is still unclear [7]. The genetic nature of the disease is complex, with an apparent high level of heterogeneity between different families [8-10]. A number of candidate genes and loci have been suggested by different studies, but few have been successfully replicated [11]. Human genetic studies have used both linkage and association methods. The results of linkage studies have been poorly reproducible [11]. Genome wide association studies (GWAS) have identified several candidate genes for IS susceptibility including CHL1, LBX1, GPR126, BNC2, and PAX1 [12-16]. The associated common single nucleotide polymorphisms (SNPs) identified to date only explain a small portion of the genetic component of the disease. Genetic interactions [17] and rare variants [18] might well explain this “missing heritability” [19].
Few studies have attempted to detect rare causal variants in IS and this field of research is still in its infancy. Sequencing using either whole exome or targeted gene panels has identified several genes which might contribute to the occurrence and/or severity of scoliosis, such as FBN1, FBN2 [20], HSPG2 [21], POC5 [22] and AKAP2 [23]. Another study suggested that accumulation of rare variants in a group of genes of the extracellular matrix might contribute to disease risk [24]. In sum, the genetic component of idiopathic scoliosis is incompletely understood, leaving significant room for further research. The present description refers to a number of documents, the content of which is herein incorporated by reference in their entirety.
SUMMARY OF THE INVENTION
As described herein, whole exome sequencing (WES) was performed in a French-Canadian IS cohort, followed by a second phase of targeted sequencing of the 24 best candidate genes in a replication cohort. In parallel, WES was performed in a unique multiplex family of three affected sisters with healthy parents. The goal was to find new genes enriched with rare variants, which may be involved in the etiology of the disease. Variants in the FAT3 gene were found to be associated with IS development and/or progression.
Accordingly, in a first aspect, the present disclosure concerns a method of determining whether a subject is at risk of developing Idiopathic scoliosis (IS) comprising: (i) in a biological sample from the subject, detecting the presence or absence of at least one risk variant in at least one allele of the FAT3 gene or a marker in linkage disequilibrium therewith, wherein the detection of at least one risk variant is indicative that the subject is at risk of developing IS.
In embodiments, the risk of developing IS is a risk of developing a severe scoliosis. In embodiments, the risk of developing IS is a risk of scoliosis progression. In embodiments, the risk of developing IS is a risk of developing a severe scoliosis progression.
In embodiments, the method comprises determining the genotype of the subject (i.e., the presence or absence of a given variant in both alleles of the FAT3 gene of the subject) for the at least one risk variant.
In a further aspect, the present disclosure concerns a method of genotyping a subject having IS or at risk of developing IS comprising determining the genotype of the subject (homozygous or heterozygous or wild type/ancestral allele) for at least one variant of the FAT3 gene.
In another aspect, the present disclosure concerns a method of determining the risk of future parents of having a child suffering from IS (or at risk of developing IS) comprising (i) determining the presence or absence of at least two risk variants in at least one allele of the FAT3 gene in a first biological sample from the first future parent; (ii) determining the presence or absence of the at least two risk variants in at least one allele of the FAT3 gene in a second biological sample from the second future parent; and (iii) determining the risk of the future parents of having a child suffering from IS based on the presence or absence of the at least two risk variants in the first and second biological samples.
In embodiments, the at least one IS risk variant is within the coding sequence (i.e., within an exon) of the FAT3 gene. In embodiments, the at least one IS risk variant introduces a mutation at an amino acid which is conserved between human, mouse and rat FAT3 protein sequences. In embodiments, the at least one IS risk variant introduces a mutation at an amino acid which is conserved between species set forth in FIG. 1C. In embodiments, the at least one IS risk variant introduces a mutation at an amino acid located within a cadherin repeat of the FAT3 protein. In embodiments, the at least one IS risk variant introduces a mutation in an amino acid located in the C-terminal region of the FAT3 protein. In embodiments, the mutation introduced by the at least one IS risk variant is a non-conservative substitution. In embodiments, the mutation introduced by the at least one IS risk variant is a silent mutation. In embodiments, the variant is located at a position on the FAT3 gene which is set forth in Table 4. In embodiments, the at least one IS risk variant comprises a variant which introduces a mutation at an amino acid position listed in Table 4 (FIG. 1A and FIGs. 2A-J). In embodiments, the at least one IS risk variant is selected from the variants listed in Table 4. In embodiments, the at least one IS risk variant comprises a variant which introduces a mutation at amino acid L517 and/or L4544. In embodiments, the at least one IS risk variant comprises a variant which introduces the mutation L517S and/or L4544F in the FAT3 protein.
In embodiments, the method comprises determining the haplotype of the subject for at least two variants shown in FIGs. 2A-J or Table 4.
In embodiments, the method comprises determining the presence or absence of at least two risk variants. In embodiments, the method comprises determining the presence or absence of at least three risk variants. In embodiments, the method comprises determining the presence or absence of at least four risk variants. In embodiments, the method comprises determining the presence or absence of at least five risk variants. In embodiments, the method comprises determining the presence or absence of at least six risk variants. In embodiments, the method comprises determining the presence or absence of all risk variants identified herein.
In embodiments, the biological sample is a blood sample. In embodiments, the biological sample is a cell sample. In embodiments, the biological sample is a protein sample. In embodiments, the biological sample is a nucleic acid sample.
In embodiments, the above methods comprise the use of an oligonucleotide probe or primer. In embodiments, the oligonucleotide probe or primer is specific for the detection of a variant sequence (allele) set forth in Table 4 or FIGs. 2A-J. In embodiments, the oligonucleotide probe or primer is specific for the detection of an ancestral sequence (allele) set forth in Table 4 or FIGs. 2A-J. In embodiments, the oligonucleotide probe or primer specific for the ancestral (wild-type or native) allele comprises at least 12 consecutive nucleotides of SEQ ID NO: 1 or the complement thereof.
In embodiments, the subject is a male. In embodiments, the subject is a female. In embodiments, the subject is a pediatric subject. In embodiments, the pediatric subject is between 6 and 18 years old. In embodiments, the pediatric subject is between 10 and 15 years old. In embodiments, the subject is a subject at risk of developing IS.
In embodiments, the subject has at least one family member diagnosed with IS (first, second or third degree relative). In embodiments, the subject is diagnosed with IS. In embodiments, the subject is a subject diagnosed with IS and belonging to the FG1 endophenotype. In embodiments, the subject is a subject diagnosed with IS and belonging to the FG2 endophenotype. In embodiments, the subject is a subject diagnosed with IS and belonging to the FG3 endophenotype.
In another aspect, the present disclosure relates to a method of treating or preventing IS (e.g., AIS) in a subject comprising: (i) identifying a subject at risk of developing Idiopathic scoliosis (IS) using the method disclosed herein; and (ii) providing a suitable therapy to the subject so as to treat or prevent IS. In an embodiment, the therapy comprises wearing a brace.
The present disclosure also concerns a method of treating or preventing IS (e.g., AIS) comprising increasing the level of FAT3 protein (native FAT3 protein) in the subject. In an embodiment, the method comprises administering an exogenous or recombinant FAT3 polypeptide, or a cell expressing a FAT3 polypeptide, to the subject. In another embodiment, the method comprises increasing the expression of the endogenous FAT3 polypeptide or correcting a defective FAT3 gene, e.g., using a genome-editing technique such as the CRISPR/Cas9 system. The present disclosure also concerns the use of an exogenous or recombinant FAT3 polypeptide for treating or preventing IS (e.g., AIS) in a subject, or for the manufacture of a medicament for treating or preventing Idiopathic scoliosis in a subject. The present disclosure also concerns an exogenous or recombinant FAT3 polypeptide for treating or preventing IS (e.g., AIS) in a subject.
In another aspect, the present disclosure concerns a composition or kit for use in methods disclosed herein (e.g., a kit for (i) detecting a variant in at least one allele of the FAT3 gene in a biological sample; (ii) determining whether a subject is at risk of developing IS; (iii) determining the risk of future parents of having a child suffering from IS; (iv) genotyping a subject for at least one variant in the FAT3 gene, etc.). The kit may comprise, for example, one or more oligonucleotide probes or primers and/or one or more antibodies specific for the detection of at least one FAT3 gene variant. In embodiments, the composition or kit further comprises a biological sample (e.g., polynucleotide or protein sample) from the subject.
In another aspect, the present disclosure concerns a DNA chip comprising at least one oligonucleotide for detecting the presence or absence of at least one FAT3 gene variant set forth in Table 4 and a substrate on which the oligonucleotide is immobilized. In embodiments, the variant is an IS risk variant set forth in Table 4.
In a further aspect, the present disclosure provides oligonucleotide probes or primers for use in the above-described methods, compositions, kits, DNA chips, etc. In embodiments, the oligonucleotide is for the specific detection of a variant of the present disclosure and comprises or consists of a nucleotide sequence having a variant nucleotide at a position corresponding to that defined in Table 4. In embodiments, the variant is a risk variant defined in Table 4. In embodiments, the oligonucleotide primer or probe hybridizes to a reference (ancestral) or a variant polynucleotide sequence set forth in Table 4 or FIGs. 2A-J, or to its complementary sequence. In embodiments, the oligonucleotide primer or probe further comprises a label. In embodiments, the oligonucleotide primer or probe comprises or consists of at least 10 nucleotides of a polynucleotide sequence set forth in any one of SEQ ID NOs: 7-58 or the complement thereof and includes the polymorphic nucleotide from the ancestral or variant allele set forth in Table 4 (or its complement). In embodiments, the oligonucleotide primer or probe comprises or consists of at least 10 nucleotides of a polynucleotide sequence set forth in any one of SEQ ID NOs: 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56 and 58 or the complement thereof and includes the nucleotide from the variant allele set forth in Table 4 (or its complement). In embodiments, the oligonucleotide primer or probe comprises or consists of at least 10 nucleotides of a polynucleotide sequence set forth in any one of SEQ ID NOs: 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55 and 57 or the complement thereof and includes the nucleotide from the ancestral allele set forth in Table 4 (or its complement). In embodiments, the oligonucleotide primer or probe consists of 10 to 100 nucleotides, preferably 10 to 60, 50 or 40 nucleotides.
In embodiments, the oligonucleotide primer or probe consists of at least 12 nucleotides.
In a further aspect, the present disclosure relates to the use of methods, compositions, kits, oligonucleotide primers or probes, and DNA chips of the present disclosure for (i) detecting a variant in at least one allele of the FAT3 gene in a biological sample; (ii) determining whether a subject is at risk of developing IS; (iii) determining the risk of future parents of having a child likely to suffer from IS; (iv) genotyping a subject for at least one variant in the FAT3 gene, etc.
Other objects, advantages and features of the present disclosure will become more apparent upon reading of the following non-restrictive description of specific embodiments thereof, given by way of example only with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
In the appended drawings:
FIGs. 1A to C show variants identified in the FAT3 gene in IS subjects. (A) shows the organization of the FAT3 protein which, as annotated by NCBI, is 4557 amino acids long and includes multiple functional homology domains. The positions of the 26 rare variants identified in our study among the IS cases are labelled from 1 to 26 and are indicated by vertical arrows above the protein schema. The locations of the two heterozygous mutations present in a multiplex IS family are indicated by boxes (i.e., #1 and #26). (B) shows a simplified pedigree and the segregation of the FAT3 mutations in one family (ID1581), which consisted of three affected sisters and two unaffected parents. (C) shows a sequence alignment across different species, indicating that both mutations (L517S and L4544F) affect invariantly conserved amino acids in FAT3 orthologues.
FIGs. 2A-J show the amino acid sequence of the human FAT3 protein (SEQ ID NO: 2, NP_001008781.2). Positions of nucleotide and amino acid variants identified in IS subjects and listed in Table 4 are underlined and in bold. Novel polymorphic sites are in italics. The nucleotides and amino acids shown in the figure are those of the ancestral (wild-type) sequences, SEQ ID NO: 1 (nucleic acid, NM_001008781.2) and SEQ ID NO: 2 (protein, NP_001008781.2).
FIGs. 2K-L show the nucleotide sequence of the human FAT3 transcript (SEQ ID NO: 1, NM_001008781.2).
DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
Whole exome sequencing was performed on a cohort of French-Canadian patients with idiopathic scoliosis and analyzed in a collapsing gene burden test for rare protein-altering variants using case-control statistics. No single gene achieved statistical significance; therefore, targeted exon sequencing was performed for the 24 genes with the smallest p-values in an independent replication cohort of severely affected females. One gene, FAT3, achieved statistical significance in the replication cohort using a similar collapsing gene burden test, with multiple different rare variants (26 variants; see Table 4 and FIGs. 2A-K) in the gene present in the replication samples. Independently, we sequenced the exomes of all members of a rare multiplex family having three affected sisters and unaffected parents. All three sisters were compound heterozygous for two rare protein-altering variants in FAT3. The parents were singly heterozygous for each variant. The two variants in the family were also present in the case-control cohort. FAT3 is expressed in primary osteoblasts, but no significant difference was found in the level of RNA expression in cells from scoliotic patients versus controls. Thus, FAT3 was established as a new genetic factor in the etiology of idiopathic scoliosis.
Definitions
In order to provide clear and consistent understanding of the terms in the instant application, the following definitions are provided.
The articles "a," "an" and "the" are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article.
As used in this specification and claim(s), the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), “including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, un-recited elements or method steps and are used interchangeably with the phrases "including but not limited to" and "comprising but not limited to".
For the recitation of numeric ranges herein, each intervening number there between with the same degree of precision is explicitly contemplated. For example, for the range of 18-20, the numbers 18, 19 and 20 are explicitly contemplated, and for the range 6.0-7.0, the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9, and 7.0 are explicitly contemplated.
The term "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
Unless otherwise defined herein, scientific and technical terms used in connection with the present disclosure shall have the meanings that are commonly understood by those of ordinary skill in the art. For example, any nomenclature used in connection with, and techniques of, cell and tissue culture, molecular biology, immunology, microbiology, genetics and protein and nucleic acid chemistry and hybridization described herein are those that are well known and commonly used in the art. The meaning and scope of the terms should be clear; in the event however of any latent ambiguity, definitions provided herein take precedence over any dictionary or extrinsic definition. Further, unless otherwise required by context, singular terms shall include pluralities and plural terms shall include the singular.
Practice of the methods, as well as preparation and use of the products and compositions disclosed herein employ, unless otherwise indicated, conventional techniques in molecular biology, biochemistry, chromatin structure and analysis, computational chemistry, cell culture, recombinant DNA and related fields as are within the skill of the art. These techniques are fully explained in the literature. See, for example, Sambrook et al. MOLECULAR CLONING: A LABORATORY MANUAL, Second edition, Cold Spring Harbor Laboratory Press, 1989 and Third edition, 2001; Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, 1987 and periodic updates; the series METHODS IN ENZYMOLOGY, Academic Press, San Diego; Wolffe, CHROMATIN STRUCTURE AND FUNCTION, Third edition, Academic Press, San Diego, 1998; METHODS IN ENZYMOLOGY, Vol. 304, "Chromatin" (P. M. Wassarman and A. P. Wolffe, eds.), Academic Press, San Diego, 1999; and METHODS IN MOLECULAR BIOLOGY, Vol. 119, "Chromatin Protocols" (P. B. Becker, ed.) Humana Press, Totowa, 1999.
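The range convention can be illustrated with a minimal Python sketch. The function name `intervening_values` is hypothetical and not part of the disclosure; it simply lists every value of a range at the precision with which the endpoints are written.

```python
from decimal import Decimal

def intervening_values(low, high):
    """List every value from low to high at the precision of the endpoints.

    Endpoints are passed as strings so that their written precision
    (e.g. "6.0" versus "6") is preserved.
    """
    lo, hi = Decimal(low), Decimal(high)
    # One unit in the last written decimal place (never coarser than 1).
    exponent = min(lo.as_tuple().exponent, hi.as_tuple().exponent, 0)
    step = Decimal(1).scaleb(exponent)
    values = []
    current = lo.quantize(step)
    while current <= hi:
        values.append(str(current))
        current += step
    return values

print(intervening_values("18", "20"))    # the three integers of the range
print(intervening_values("6.0", "7.0"))  # eleven values in steps of 0.1
```

`Decimal` is used rather than `float` so that steps of 0.1 accumulate exactly.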
As used herein, the “FAT3 gene” refers to the gene encoding the FAT Atypical Cadherin 3 protein (FAT3 protein) (GCID: GC11P092314; HGNC: 23112; Entrez Gene: 120114; Ensembl: ENSG00000165323; OMIM: 612483; UniProt™ KB: Q8TDW7; chromosomal location: 11q14.3; 1; RefSeqGene: NG_052813.1).
As used herein, the term “Idiopathic scoliosis” or “IS” refers to the common complex disorder of the spine. It is a three-dimensional deformity of the skeleton characterized by a lateral curvature of ≥10° on a standing radiograph (Cobb method), combined with vertebral rotation. It is the most common form of spinal disorder. It mostly occurs during adolescence and affects 1-4% [1] of the global pediatric population, with higher prevalence in females, who are generally more severely affected than males. The term “IS” includes Infantile (age of onset < 3 years old), Juvenile (age of onset between 3 and 9 years old) and Adolescent (age of onset between 10 and 15 years old) idiopathic scoliosis. A subject “diagnosed with IS” is a subject having a minimum curvature in the coronal plane of 10°, shown by, for example, a standing posteroanterior spinal radiograph, by the Cobb method, with vertebral rotation and without any congenital or genetic disorder which could be the source of the spinal deformity observed.
As used herein, the terms “risk of developing IS” (e.g., a subject at risk of developing IS) or the like refer to a genetic or metabolic predisposition of a subject to develop a scoliosis (i.e., spinal deformity) and/or a more severe scoliosis at a future time (i.e., curve progression of the spine). For instance, an increase of the Cobb’s angle of a subject (e.g., from 40° to 50° or from 18° to 25°) is a “development” of a scoliosis (i.e., a scoliosis progression). The terminology “a subject at risk of developing IS” includes asymptomatic subjects which are more likely than the general population to suffer from IS at a future time and includes subjects (e.g., children) having at least one parent, sibling or family member suffering from a scoliosis (either first degree, second degree or third degree relative). It also includes subjects which carry one or more known IS susceptibility markers (SNPs or other mutations/genetic variations). Also included in the terminology “a subject at risk of developing a scoliosis” are asymptomatic subjects (i.e., subjects which do not yet have a spinal deformity of over 10°) but which have been identified as having a GiPCR signaling defect and classified in the FG1, FG2 or FG3 endophenotype using well known methods (e.g., cAMP measurement, cellular impedance, etc.) as disclosed for example in WO/2003/073102, WO/2010/040234, WO/2012/045176, WO/2015/032005, WO/2014/201560, WO/2014/201557.
As used herein, the terms “severe scoliosis”, “severe IS” or “severe scoliosis progression” refer to a scoliosis (or scoliosis progression) with a Cobb’s angle of 40° or more.
As used herein, the term “biological fluid sample” refers to blood, saliva, tears, sweat, urine, semen and milk. As used herein, the terminology “blood sample” is meant to refer to blood, plasma or serum.
As used herein, the terminology “polynucleotide sample” or “nucleic acid sample” is meant to refer to a sample comprising DNA or RNA (including cDNA) from a test subject. The sample should contain a sufficient amount of polynucleotides for determining the presence or absence of SNPs and/or haplotypes (i.e., for genotyping) disclosed herein according to the selected method. The choice of the sample type will of course depend on the specific conditions of the assay. For example, gene variants (e.g., SNPs) found in intronic (or other untranscribed) sequences may not be detected using an RNA (or cDNA) sample, as known in the art. Preferably, the sample is a cell sample from the subject but is not so limited as long as the polynucleotide sample allows for the detection of the gene variant.
As used herein, the term “subject” is meant to refer to any mammal including human, mouse, rat, dog, chicken, cat, pig, monkey, horse, etc. Preferably, the subject is a human, for example a pediatric human subject.
“Polymorphism” or “variant”. The genomic sequence within populations is not identical when individuals are compared. Rather, the genome exhibits sequence variability between individuals at many locations in the genome. Such variations in sequence are commonly referred to as polymorphisms, and there are many such sites within each genome. For example, the human genome exhibits sequence variations which occur on average every 500 base pairs. Thus, as used herein, a “polymorphism” or “variant” refers to a variation in the sequence of a nucleic acid (e.g., a gene sequence). Such variation includes insertions, deletions and substitutions of one or more nucleotides.
The most common sequence variation (or polymorphism) consists of base variations at a single base position in the genome, and such sequence variants, or polymorphisms, are commonly called Single Nucleotide Polymorphisms ("SNPs"). There are usually two possibilities (or two alleles) at each SNP site: the original allele and the mutated allele (although there may be 3 or 4 possibilities for each SNP site). Due to natural genetic drift and possibly also selective pressure, the original mutation has resulted in a polymorphism characterized by a particular frequency of its alleles in any given population. There may also exist SNPs that vary between paired chromosomes in an individual. Each individual is in this instance either homozygous for one allele of the polymorphism (i.e., both chromosomal copies of the individual have the same nucleotide at the SNP location), or the individual is heterozygous (i.e., the two homologous chromosomes of the individual contain different nucleotides). As used herein, an SNP thus refers to a variation at a single nucleotide in a given nucleic acid sequence.
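The Cobb-angle thresholds recited in these definitions (10° for a diagnosis, 40° for a severe scoliosis) can be summarized in a short Python sketch. The function names and category labels are hypothetical, and the sketch considers the angle alone; the diagnosis defined above additionally requires vertebral rotation and the exclusion of congenital or genetic causes.

```python
DIAGNOSTIC_THRESHOLD_DEG = 10.0  # minimum coronal curvature for a diagnosis of IS
SEVERE_THRESHOLD_DEG = 40.0      # Cobb's angle defining a severe scoliosis

def classify_cobb_angle(angle_deg):
    """Map a Cobb angle (degrees) to the categories used in the definitions."""
    if angle_deg < 0:
        raise ValueError("Cobb angle cannot be negative")
    if angle_deg >= SEVERE_THRESHOLD_DEG:
        return "severe scoliosis"
    if angle_deg >= DIAGNOSTIC_THRESHOLD_DEG:
        return "scoliosis"
    return "below diagnostic threshold"

def is_progression(previous_deg, current_deg):
    """An increase of the Cobb angle between two measurements is a progression."""
    return current_deg > previous_deg

print(classify_cobb_angle(25))   # scoliosis
print(classify_cobb_angle(40))   # severe scoliosis
print(is_progression(18, 25))    # True
```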
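As an illustration of the homozygous/heterozygous distinction, the following Python sketch classifies a genotype at a single SNP site. The function name `zygosity` and the returned labels are assumptions made for the example; the alleles shown are arbitrary.

```python
def zygosity(allele_1, allele_2, reference_allele):
    """Classify the two alleles observed at one SNP site in one individual."""
    if allele_1 == allele_2:
        # Both chromosomal copies carry the same nucleotide.
        if allele_1 == reference_allele:
            return "homozygous ancestral"
        return "homozygous variant"
    # The two chromosomal copies carry different nucleotides.
    return "heterozygous"

print(zygosity("T", "T", reference_allele="T"))  # homozygous ancestral
print(zygosity("T", "C", reference_allele="T"))  # heterozygous
print(zygosity("C", "C", reference_allele="T"))  # homozygous variant
```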
In general terms, each version of the sequence with respect to the polymorphic site represents a specific allele of the polymorphic site. These sequence variants can all be referred to as polymorphisms, occurring at specific polymorphic sites characteristic of the sequence variant in question. In general terms, polymorphisms can comprise any number of specific alleles.
In some instances, reference is made to different alleles at a variant/polymorphic site without choosing a reference allele. Alternatively, a reference sequence can be referred to for a particular polymorphic site. The reference allele is sometimes referred to as the "wild-type" allele or “ancestral allele” and refers herein to the allele from a "non-affected" or control/reference individual (e.g., an individual that does not display a trait or disease phenotype, i.e., which does not suffer from a scoliosis or which has a lower risk of (or predisposition to) developing a scoliosis).
A “gene variant”, "genetic marker" or "polymorphic marker", as described herein, refers to a variation (mutation or alteration) in a gene sequence that occurs in a given population. Each polymorphic marker/gene variant has at least two sequence variations characteristic of particular alleles at the polymorphic site. The marker/gene variant can comprise any allele of any variant type found in the genome, including variations in a single nucleotide (SNPs), microsatellites, insertions, deletions, duplications and translocations. The polymorphic marker/gene variant, if found in a transcribed region of the genome, can be detected not only in genomic DNA but also in RNA. Polymorphic markers or gene variants of the present disclosure (identified in Table 4 and FIGs. 2A-K) are found in transcribed regions of the genome (they were identified following exome sequencing). In addition, when the polymorphism/variant is found in the gene portion that is translated into a polypeptide or protein, the polymorphic marker/gene variant can be detected at the protein/polypeptide level.
The term “defective FAT3 gene” as used herein refers to a FAT3 gene comprising one or more mutations that affect the expression of the FAT3 gene and/or that result in a FAT3 protein having reduced activity relative to the native protein. In an embodiment, the defective FAT3 gene comprises one or more of the SNPs (variant allele) disclosed herein (e.g., variant allele in Table 4).
The polymorphic marker/gene variant of the present disclosure and its specific sequence variation can be detected by various means such as by sequencing the nucleic acid or protein. Alternatively, when the polymorphism/variation affects the function of the gene or of its translated protein/polypeptide, the biological activity can be evaluated in order to identify which allele is present in the subject’s sample. For example, if a particular risk allele (comprising a risk variant or combination of risk variants) affects the enzymatic activity of the protein, then the presence of the allele or variant(s) can be assessed by performing an enzymatic test. Alternatively, if the risk allele (comprising a gene variant or combination of variants) affects the expression level of a polypeptide or nucleic acid, then the presence of the variant(s) can be determined by assessing the expression level (e.g., immunoassays, amplification assays, etc.) of such protein or nucleic acid and comparing it to a reference level in a control sample (e.g., sample from a subject not suffering from a scoliosis or at risk of developing a scoliosis).
An "allele" refers to the nucleotide sequence of a given locus (position) on a chromosome. A polymorphic marker allele thus refers to the composition (i.e., sequence) of the marker on a chromosome. Genomic DNA from an individual contains two alleles for any given polymorphic marker, representative of each copy of the marker on each chromosome. A “risk allele”, a “susceptibility allele”, a “predisposition allele” or a “risk variant” is a nucleic acid sequence variation that is associated with an increased risk of (i.e., compared to a control/reference) or predisposition to suffering from IS. Conversely, a “protective allele” or “protective variant” is a sequence variation of a polymorphic marker that is associated with a lower risk of (i.e., compared to a control/reference) or predisposition to suffering from IS.
As used herein, the term “non-conservative mutation” or “non-conservative substitution” in the context of polypeptides refers to a mutation in a polypeptide that changes an amino acid to a different amino acid with different biochemical properties (i.e., charge, hydrophobicity and/or size). Although there are many ways to classify amino acids, they are often sorted into six main groups on the basis of their structure and the general chemical characteristics of their R groups: (i) Aliphatic (Glycine, Alanine, Valine, Leucine, Isoleucine); (ii) Hydroxyl or Sulfur/Selenium-containing (also known as polar amino acids) (Serine, Cysteine, Selenocysteine, Threonine, Methionine); (iii) Cyclic (Proline); (iv) Aromatic (Phenylalanine, Tyrosine, Tryptophan); (v) Basic (Histidine, Lysine, Arginine); and (vi) Acidic and their Amide (Aspartate, Glutamate, Asparagine, Glutamine). Thus, a non-conservative substitution includes one that changes an amino acid of one group with another amino acid of another group (e.g., an aliphatic amino acid for a basic, a cyclic, an aromatic or a polar amino acid; a basic amino acid for an acidic amino acid; a negatively charged amino acid (aspartic acid or glutamic acid) for a positively charged amino acid (lysine, arginine or histidine), etc.).
Conversely, a “conservative substitution” or “conservative mutation” in the context of polypeptides is a mutation that changes an amino acid to a different amino acid with similar biochemical properties (e.g., charge, hydrophobicity and size). For example, leucine and isoleucine are both aliphatic, branched and hydrophobic. Similarly, aspartic acid and glutamic acid are both small, negatively charged residues. Therefore, changing a leucine for an isoleucine (or vice versa) or changing an aspartic acid for a glutamic acid (or vice versa) are examples of conservative substitutions.
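The six-group classification can be turned into a simple lookup, sketched below in Python. This is an illustration only: treating "same group" as conservative and "different group" as non-conservative is a simplifying assumption (for example, aspartate and asparagine share a group but differ in charge), the function name `classify_substitution` is hypothetical, and the positions in L100I and D100E are made up for the example. L517S and L4544F are the two mutations named in the disclosure.

```python
import re

# The six groups recited above, keyed by one-letter amino acid code
# (U is selenocysteine).
AMINO_ACID_GROUPS = {
    "aliphatic": "GAVLI",
    "hydroxyl or sulfur/selenium-containing": "SCUTM",
    "cyclic": "P",
    "aromatic": "FYW",
    "basic": "HKR",
    "acidic and their amide": "DENQ",
}
GROUP_OF = {aa: group
            for group, members in AMINO_ACID_GROUPS.items()
            for aa in members}

def classify_substitution(mutation):
    """Classify a substitution written as <original><position><new>, e.g. 'L517S'."""
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"unrecognized mutation: {mutation!r}")
    original, _position, new = parsed.groups()
    if original not in GROUP_OF or new not in GROUP_OF:
        raise ValueError(f"unknown amino acid code in {mutation!r}")
    if original == new:
        return "no amino acid change"
    if GROUP_OF[original] == GROUP_OF[new]:
        return "conservative"
    return "non-conservative"

for mutation in ("L517S", "L4544F", "L100I", "D100E"):
    print(mutation, classify_substitution(mutation))
```

Under this grouping, both L517S (aliphatic to hydroxyl-containing) and L4544F (aliphatic to aromatic) fall in the non-conservative category.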
"Complement" or "complementary" as used herein refers to Watson-Crick (e.g, A-T/U and C-G) or Hoogsteen base pairing between nucleotides or nucleotide analogs of nucleic acid molecules. "Complementarity" refers to a property shared between two nucleic acid sequences, such that when they are aligned antiparallel to each other, the nucleotide bases at each position will be complementary.
Figure imgf000011_0001
"Homology" and“homologous” refers to sequence similarity between two peptides or two nucleic acid molecules. Homology can be determined by comparing each position in the aligned sequences. A degree of homology between nucleic acid or between amino acid sequences is a function of the number of identical or matching nucleotides or amino acids at positions shared by the sequences. As the term is used herein, a nucleic acid sequence is "substantially homologous" to another sequence if the two sequences are substantially identical and the functional activity of the sequences is conserved (as used herein, the term “homologous” does not infer evolutionary relatedness, but rather refers to substantial sequence identity, and thus is interchangeable with the terms “identity’Tidentical”). Two nucleic acid sequences are considered substantially identical if, when optimally aligned (with gaps permitted), they share at least about 50% sequence similarity or identity, or if the sequences share defined functional motifs. In alternative embodiments, sequence similarity in optimally aligned substantially identical sequences may be at least 60%, 70%, 75%, 80%, 85%, 90% or 95%. For the sake of brevity, the units (e.g., 66, 67...81 , 82, ...91 , 92%....) have not systematically been recited but are considered, nevertheless, within the scope of the present disclosure.
Substantially complementary nucleic acids are nucleic acids in which the complement of one molecule is substantially identical to the other molecule. Two nucleic acid or protein sequences are considered substantially identical if, when optimally aligned, they share at least about 70% sequence identity. In alternative embodiments, sequence identity may for example be at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 98% or at least 99%. Optimal alignment of sequences for comparisons of identity may be conducted using a variety of algorithms, such as the local homology algorithm of Smith and Waterman, 1981 , Adv. Appl. Math 2: 482, the homology alignment algorithm of Needleman and Wunsch, 1970, J. Mol. Biol. 48:443, the search for similarity method of Pearson and Lipman (Pearson and Lipman 1988), and the computerized implementations of these algorithms (such as GAP, BESTFIT, FASTA and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, Madison, Wl, U.S.A.). Sequence identity may also be determined using the BLAST algorithm, described in Altschul et al. (Altschul et al. 1990) 1990 (using the published default settings). Software for performing BLAST analysis may be available through the National Center for Biotechnology Information (through the internet at http://www.ncbi.nlm.nih.gov/). The BLAST algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence that either match or satisfy some positive valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold. Initial neighborhood word hits act as seeds for initiating searches to find longer HSPs. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. 
Extension of the word hits in each direction is halted when the following parameters are met: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T and X determine the sensitivity and speed of the alignment. One measure of the statistical similarity between two sequences using the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. In alternative embodiments of the invention, nucleotide or amino acid sequences are considered substantially identical if the smallest sum probability in a comparison of the test sequences is less than about 1 , preferably less than about 0.1 , more preferably less than about 0.01 , and most preferably less than about 0.001.
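The three halting conditions for word-hit extension can be illustrated with a simplified, ungapped, one-directional Python sketch. It is not the BLAST implementation: the function name `extend_right`, the match/mismatch scores (+1/-3) and the example sequences are assumptions chosen for illustration, and the seed is assumed to be an exact word match.

```python
def extend_right(query, subject, q_start, s_start, word_len, x_drop,
                 match=1, mismatch=-3):
    """Extend an exact word hit to the right without gaps.

    Returns (length, score) of the best-scoring extension found.
    """
    score = word_len * match          # score of the seed word itself
    best_score, best_len = score, word_len
    i = word_len
    # Halting condition 3: the end of either sequence is reached.
    while q_start + i < len(query) and s_start + i < len(subject):
        same = query[q_start + i] == subject[s_start + i]
        score += match if same else mismatch
        if score > best_score:
            best_score, best_len = score, i + 1
        # Halting condition 1: score fell off by X from its maximum.
        # Halting condition 2: cumulative score went to zero or below.
        elif best_score - score >= x_drop or score <= 0:
            break
        i += 1
    return best_len, best_score

query = "ACGTACGTTTTT"
subject = "ACGTACGAAAAA"
print(extend_right(query, subject, q_start=0, s_start=0, word_len=4, x_drop=5))
```

In this example the seed `ACGT` is extended through three further matches, then the run of mismatches drops the score by more than X and extension stops, so the best extension covers the first seven positions.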
An alternative indication that two nucleic acid sequences are substantially complementary is that the two sequences hybridize to each other under moderately stringent, or preferably stringent, conditions. Hybridization to filter-bound sequences under moderately stringent conditions may, for example, be performed in 0.5 M NaHPO4, 7% sodium dodecyl sulfate (SDS), 1 mM EDTA at 65°C, and washing in 0.2× SSC/0.1% SDS at 42°C (Ausubel 2010). Alternatively, hybridization to filter-bound sequences under stringent conditions may, for example, be performed in 0.5 M NaHPO4, 7% SDS, 1 mM EDTA at 65°C, and washing in 0.1× SSC/0.1% SDS at 68°C (Ausubel 2010). Hybridization conditions may be modified in accordance with known methods depending on the sequence of interest (Tijssen 1993). Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point for the specific sequence at a defined ionic strength and pH.
Methods of detecting FAT3 gene variants
As noted above, the present disclosure identifies gene variants (polymorphic markers) in the FAT3 gene which are associated with a risk of developing IS.
In the above methods, detecting the presence of a risk allele (risk variant(s)) in polymorphic markers of one or more of the above genes is indicative of a risk of developing a scoliosis (or predisposition to IS). The level of risk or the likelihood of developing a scoliosis is determined depending on the number of risk-associated variants that are present in cells from a subject. The level of risk is determined by calculating a genetic score (odds ratio), as is well known in the art.
Accordingly, the present disclosure encompasses detecting the presence or absence of at least one polymorphic marker (e.g., SNP) in the FAT3 gene (e.g., a combination of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25 or 26 variants).
Alleles for gene variants as referred to herein refer to the bases A, C, G or T as they occur at the polymorphic site in the SNP assay employed. The person skilled in the art will realize that by assaying or reading the opposite DNA strand, the complementary allele can in each case be measured. Thus, for a polymorphic site (polymorphic marker) characterized by an A/G polymorphism, the assay employed may be designed to specifically detect the presence of one or both of the two bases possible, i.e. A and G. Alternatively, by designing an assay that is designed to detect the opposite strand on the DNA template, the presence of the complementary bases T and C can be measured. Quantitatively (for example, in terms of relative risk), identical results would be obtained from measurement of DNA strands (+ strand or - strand).
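As a small illustration of the strand equivalence described above, the following Python sketch maps the alleles read on one strand to the bases that would be read on the opposite strand. The function name is hypothetical and the example uses the A/G polymorphism mentioned in the text.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def opposite_strand_alleles(alleles):
    """Return the alleles observed when the opposite DNA strand is assayed."""
    return tuple(COMPLEMENT[base] for base in alleles)


# An A/G polymorphism read on one strand is a T/C polymorphism on the other.
print(opposite_strand_alleles(("A", "G")))
```

Because the mapping is one-to-one, genotype counts and any risk estimate derived from them are unchanged by the choice of strand.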
Detecting specific gene variants or polymorphic markers and/or haplotypes of the present disclosure can be accomplished by methods known in the art. Such detection can be made at the nucleic acid or amino acid (protein) level.
For example, standard techniques for genotyping for the presence of gene variants (e.g., SNPs and/or microsatellite markers) can be used, such as sequencing, fluorescence-based techniques (Chen, X. et al., Genome Res. 9(5): 492-98 (1999)), methods utilizing PCR, LCR, Nested PCR and other methods for nucleic acid amplification. Specific methodologies available for SNP genotyping include, but are not limited to, TaqMan™ genotyping assays and SNPlex™ platforms (Applied Biosystems), mass spectrometry (e.g., MassARRAY™ system from Sequenom), minisequencing methods, real-time PCR, Bio-Plex™ system (BioRad), CEQ and SNPstream™ systems (Beckman), Molecular Inversion Probe™ array technology (e.g., Affymetrix GeneChip™), and BeadArray™ Technologies (e.g., Illumina GoldenGate and Infinium assays). By these or other methods available to the person skilled in the art, one or more alleles at gene variants, including microsatellites, SNPs or other types of gene variants/polymorphic markers, can be identified.
In order to determine the risk of developing a scoliosis (or predisposition to IS) it is also possible to assess the presence of a gene variant (such as a SNP) in linkage disequilibrium with any of the gene variants identified herein (e.g., SNPs/variants listed in Table 4).
Once a first SNP has been identified in a genomic region of interest, the practitioner of ordinary skill in the art can easily identify additional SNPs in linkage disequilibrium with this first SNP. In the context of the invention, the additional SNPs in linkage disequilibrium with a first SNP are within the same gene of said first SNP. Linkage disequilibrium (LD) is defined as the non-random association of alleles at different loci across the genome. Alleles at two or more loci are in LD if their combination occurs more or less frequently than expected by chance in the population.
For example, if a particular genetic element (e.g., an allele of a polymorphic marker, or a haplotype) occurs in a population at a frequency of 0.50 (50%) and another element occurs at a frequency of 0.50 (50%), then the predicted occurrence of a person’s having both elements is 0.25 (25%), assuming a random distribution of the elements. However, if it is discovered that the two elements occur together at a frequency higher than 0.25, then the elements are said to be in linkage disequilibrium, since they tend to be inherited together at a higher rate than what their independent frequencies of occurrence (e.g., allele or haplotype frequencies) would predict.
When there is a causal locus in a DNA region, due to LD, one or more SNPs nearby are likely associated with the trait too. Therefore, any SNPs in LD with a first SNP associated with IS or an associated disorder will be associated with this trait. Identification of additional SNPs in linkage disequilibrium with a given SNP involves: (a) amplifying a fragment from the gene comprising a first SNP from a plurality of individuals; (b) identifying second SNPs in the gene comprising said first SNP; (c) conducting a linkage disequilibrium analysis between said first SNP and second SNPs; and (d) selecting said second SNPs as being in linkage disequilibrium with said first SNP. Subcombinations comprising steps (b) and (c) are also contemplated.
Methods to identify SNPs and to conduct linkage disequilibrium analysis can be carried out by the skilled person without undue experimentation by using well-known methods.
Thus, the practitioner of ordinary skill in the art can easily identify SNPs or combination of SNPs within haplotypes in linkage disequilibrium with the at-risk gene variant (e.g., risk SNP).
Such markers are mapped and listed in public databases such as HapMap, as is well known to the skilled person. Genomic LD maps have been generated across the genome, and such LD maps have been proposed to serve as a framework for mapping disease genes (Risch et al., 1996; Maniatis et al., 2002; Reich et al., 2001). If all polymorphisms in the genome were independent at the population level (i.e., no LD), then every single one of them would need to be investigated in association studies, to assess all the different polymorphic states. However, due to linkage disequilibrium between polymorphisms, tightly linked polymorphisms are strongly correlated, which reduces the number of polymorphisms that need to be investigated in an association study to observe a significant association. Another consequence of LD is that many polymorphisms may give an association signal due to the fact that these polymorphisms are strongly correlated.
The two metrics most commonly used to measure LD are D’ and r², which can be written in terms of each other and allele frequencies. Both measures range from 0 (the two alleles are independent or in equilibrium) to 1 (the two alleles are completely dependent or in complete disequilibrium), but with different interpretations. D’ is equal to 1 if at most two or three of the possible haplotypes defined by two markers are present, and <1 if all four possible haplotypes are present. r² measures the statistical correlation between two markers and is equal to 1 if only two haplotypes are present.
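The relationship between these LD measures and allele frequencies can be sketched as follows. This is a minimal Python illustration using the standard textbook definitions for two biallelic markers; the function and parameter names are assumptions, and D’ is reported as an absolute value so that it ranges from 0 to 1 as described above.

```python
def ld_metrics(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A (marker 1) and B (marker 2)
    p_a:  frequency of allele A at marker 1
    p_b:  frequency of allele B at marker 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both markers must be polymorphic")
    # D is the excess of the observed haplotype frequency over the
    # frequency expected if the two alleles were independent.
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Two alleles each at frequency 0.50: a joint frequency of 0.25 is what
# independence predicts, whereas 0.50 means they always occur together.
print(ld_metrics(0.25, 0.5, 0.5))
print(ld_metrics(0.50, 0.5, 0.5))
```

With both allele frequencies at 0.50, a joint haplotype frequency of 0.25 corresponds to no LD, and any higher joint frequency gives positive values of both measures.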
Most SNPs in humans probably arose by single-base modifying events that took place within chromosomes long ago. A single newly created allele, at its time of origin, would have been surrounded by a series of alleles at other polymorphic loci such as SNPs, establishing a unique grouping of alleles (i.e., a haplotype). If this specific haplotype is transmitted intact to subsequent generations, complete LD exists between the new allele and each of the nearby polymorphisms, meaning that these alleles would be 100% predictive of the new allele. Thus, because of complete LD (D’ = 1 or r² = 1), an allele of one polymorphic marker can be used as a surrogate for a specific allele of another. Events such as recombination may decrease LD between markers. However, moderate (i.e., 0.5 ≤ r² < 0.8) to high (i.e., 0.8 ≤ r² < 1) LD preserves the "surrogate" properties of markers. In LD-based association studies, when LD exists between markers and an unknown pathogenic allele, all markers show a similar association with the disease.
It is well known that many SNPs have alleles that show strong LD (or high LD, defined as r² ≥ 0.80) with other nearby SNP alleles. In regions of the genome with strong LD, a selection of evenly spaced SNPs, or those chosen on the basis of their LD with other SNPs (proxy SNPs or Tag SNPs), can capture most of the genetic information of SNPs that are not genotyped, with only slight loss of statistical power. In association studies, this region of LD is adequately covered using few SNPs (Tag SNPs), and a statistical association between a SNP and the phenotype under study means that the SNP is a causal variant or is in LD with a causal variant. It is a general consensus that a proxy (or Tag SNP) is defined as a SNP in LD (r² ≥ 0.8) with one or more other SNPs. The genotype of the proxy SNP can predict the genotype of the other SNP via LD, and vice versa. In particular, any SNP in LD with one of the SNPs used herein may be replaced by one or more proxy SNPs defined according to their LD as r² ≥ 0.8.
These SNPs in linkage disequilibrium can also be used in the methods according to the present disclosure, and more particularly in the diagnostic methods according to the present disclosure. In particular, SNPs in linkage disequilibrium (LD) with the above identified SNPs may be genotyped, in place of, or in addition to, said identified SNPs. In the context of the present disclosure, the SNPs in linkage disequilibrium with the above identified SNP are within the same gene as the above identified SNP. Therefore, in the present disclosure, SNPs in linkage disequilibrium (LD) with a SNP of interest and located within the same gene as the SNP of interest may be genotyped, in place of, or in addition to, said SNP of interest. Preferably, such an SNP and the SNP of interest have r² ≥ 0.70, preferably r² ≥ 0.75, more preferably r² ≥ 0.80, and/or have D’ ≥ 0.60, preferably D’ ≥ 0.65, D’ ≥ 0.7, D’ ≥ 0.75, more preferably D’ ≥ 0.80. Most preferably, such an SNP and the SNP of interest have r² ≥ 0.80, which is used as the reference value to define "LD" between SNPs.
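The selection of proxy SNPs by an LD threshold within the same gene can be sketched as below. This is only an illustrative Python outline: the SNP identifiers, the pairwise LD values and the gene assignments are hypothetical placeholders, and the data structures are assumptions rather than the format of any particular LD database.

```python
def select_proxies(pairwise_r2, gene_of, snp_of_interest, r2_min=0.80):
    """Return SNPs in LD (r2 >= r2_min) with the SNP of interest, same gene only.

    pairwise_r2: dict mapping (snp_x, snp_y) -> r2 value
    gene_of:     dict mapping snp -> gene symbol
    """
    proxies = []
    for (snp_x, snp_y), r2 in pairwise_r2.items():
        if snp_of_interest not in (snp_x, snp_y) or r2 < r2_min:
            continue
        other = snp_y if snp_x == snp_of_interest else snp_x
        # Keep only candidates located in the same gene as the SNP of interest.
        if gene_of.get(other) == gene_of.get(snp_of_interest):
            proxies.append(other)
    return sorted(proxies)


# Hypothetical identifiers and LD values, for illustration only.
pairwise_r2 = {
    ("snpA", "snpB"): 0.92,
    ("snpA", "snpC"): 0.65,
    ("snpD", "snpA"): 0.80,
    ("snpA", "snpE"): 0.95,
}
gene_of = {"snpA": "FAT3", "snpB": "FAT3", "snpC": "FAT3",
           "snpD": "FAT3", "snpE": "OTHER"}
print(select_proxies(pairwise_r2, gene_of, "snpA"))
```

Lowering `r2_min` to 0.70 or 0.75 would correspond to the less stringent thresholds mentioned in the text.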
Compositions and kits
Compositions and kits for use in the methods of the present disclosure (i.e., for determining the risk of developing a scoliosis; for genotyping a subject and for classifying a subject suffering from a scoliosis or at risk of developing a scoliosis) may include, for example, one or more reagents for detecting the presence or absence of one or more FAT3 variants (e.g., one or more variants listed in Table 4 and/or shown in FIGs. 2A-K) or a substitute marker in linkage disequilibrium therewith. Alternatively, or in addition, the one or more variants can be detected at the protein level. Accordingly, compositions and kits for use in the methods of the present disclosure may include one or more reagents for detecting the presence or absence of one or more mutations in the FAT3 protein (e.g., a mutation listed in Table 4). Compositions and kits can comprise oligonucleotide primers and hybridization probes (e.g., allele-specific oligonucleotide primers and hybridization probes for determining the presence or absence of a variant in the FAT3 gene (e.g., listed in Table 4, FIG. 1 and/or FIG. 3)), restriction enzymes (e.g., for RFLP analysis) and/or antibodies that bind to a mutated FAT3 polypeptide (polymorphic polypeptide) which is encoded by a nucleic acid comprising the gene variant of the present disclosure (e.g., a nucleic acid comprising a variant (polymorphic marker) as defined in Table 4), and reagents (e.g., antibodies) for detecting a mutation in the FAT3 protein listed in Table 4 or for detecting the wild-type amino acid.
The kit (or composition) may also include any necessary buffers, enzymes (e.g., DNA polymerase) and/or reagents necessary for performing the methods of the present disclosure. The kit may comprise one or more labeled nucleic acids (or labeled antibody) capable of specific detection of one or more gene variants of the present disclosure (e.g., gene variants defined in Table 4) or any markers in linkage disequilibrium therewith as well as reagents for the detection of the label.
Reagents may be provided in separate containers or premixed depending on the requirements of the method. Suitable labels are well known in the art and will be chosen according to the specific method used. Non-limiting examples of suitable labels (including non-naturally occurring labels/synthetic labels) include a radioisotope, a fluorescent label, a magnetic label, an enzyme, etc.
The detection of a FAT3 gene variant (e.g., defined in Table 4, FIGs. 1 and 3) associated with IS in accordance with the present disclosure may be determined by DNA chip analysis. Such a DNA chip or nucleic acid microarray consists of different nucleic acid probes that are chemically attached to a substrate, which can be a microchip, a glass slide or a microsphere-sized bead. A microchip may be constituted of polymers, plastics, resins, polysaccharides, silica or silica-based materials, carbon, metals, inorganic glasses, or nitrocellulose. Probes comprise nucleic acids such as cDNAs or oligonucleotides that may be about 10, 11, 12, 13, 14 or 15 to about 60, 50, 40 or 30 base pairs. To determine the alteration of the genes, a sample from a test subject is labelled and contacted with the microarray in hybridization conditions, leading to the formation of complexes between target nucleic acids that are complementary to probe sequences attached to the microarray surface. The presence of labelled hybridized complexes is then detected. Many variants of the microarray hybridization technology are available to the person skilled in the art.
In embodiments, there is provided a composition (e.g., a diagnostic composition) or assay mixture which is generated following one or more steps of the methods described herein and which includes a biological sample (e.g., cell sample, blood sample, etc.) from the subject to be tested. The preparation of such a composition occurs while testing a subject’s biological sample for the risk of developing a scoliosis (including the risk of developing a more severe scoliosis); for aiding in the prevention and treatment of scoliosis including for determining the best treatment regimen; for adapting an ongoing treatment regimen; for selecting a new treatment regimen or for determining the frequency of a specific treatment regimen or follow-up schedule. Such compositions may be prepared using kits described herein.
In embodiments, compositions and kits of the present disclosure may thus comprise one or more oligonucleotide probes or amplification primers for the detection (e.g., amplification or hybridization) of a FAT3 gene variant of the present disclosure (e.g., a variant or reference sequence defined in Table 4). In embodiments, oligonucleotide probes are provided in the form of a microarray or DNA chip. The kit may further include instructions to use the kit in accordance with the methods of the present disclosure (e.g., for determining the risk of (or predisposition to) developing a scoliosis; for genotyping a subject or for classifying a subject suffering from a scoliosis or at risk of developing a scoliosis in a specific genetic or functional group).
The present disclosure is illustrated in further detail by the following non-limiting examples.
EXAMPLE 1: MATERIALS AND METHODS
Ethics approval. This study was approved by the institutional review boards of Sainte-Justine University Hospital, Montreal Children’s Hospital, Shriners Hospital for Children, Montreal and McGill University, as well as the Affluent and Montreal English School Boards. Written informed consent was given by parents or legal guardians and assent was given by minors.
Subjects. All patients with IS were examined by orthopedic surgeons from the three pediatric centers participating in this study. A diagnosis of IS required both history and physical examination, with a minimum curvature in the coronal plane of 10° measured by the Cobb method on a standing posteroanterior spinal radiograph, with vertebral rotation and without any congenital or genetic disorder. Healthy children were recruited from Montreal’s schools and examined by one orthopedic surgeon.
First French-Canadian IS cohort (discovery cohort). Seventy-three (73) unrelated IS cases and seventy (70) age- and gender-matched healthy controls were selected; all were of French-Canadian ancestry. Fifty of the cases were severe (Cobb angle ≥ 40°) and twenty-three were moderate (Cobb angle < 40°).
Second French-Canadian IS cohort (replication cohort). Ninety-six (96) patients of French-Canadian origin were selected for the replication study, unrelated to each other or to the cases in the discovery cohort. Since 93% of the initial cohort were females and 68% were severe cases, the second cohort was chosen to be all females and severely affected. As controls, thirty-six healthy French-Canadian females were recruited from Montreal’s schools, plus an additional sixty French-Canadian females from the CARTAGENE project [25, 26].
French-Canadian multiplex family. A rare multiplex French-Canadian family with three affected sisters and healthy parents was ascertained and analyzed by WES analysis. The proband was diagnosed at the age of 13 years old with IS with a right lumbar curve and Cobb angle measuring 15°. Her first sister was diagnosed with IS with a left lumbar curve measuring 23° and the second sister was also diagnosed with IS with right thoracic curve measuring 13°.
DNA extraction. Blood was obtained by standard venipuncture. Genomic DNA was extracted from peripheral leukocytes using PureLink™ genomic DNA kit (Thermo Fisher Scientific, Waltham, Massachusetts, USA).
Exome and targeted sequencing. Whole-exome sequencing (WES) for a discovery French-Canadian IS cohort. Exome capture was performed using Agilent SureSelect™ Human All Exon 50 Mb v3 according to the manufacturer’s recommendations. Sequencing was done using a SOLiD™ 5500xl from Applied Biosystems by Life Technologies at the Sainte-Justine University Hospital genomic platform. The average coverage of targeted sites was approximately 100X.
Targeted sequencing for selected genes in a replication French-Canadian IS cohort. Twenty-four genes were chosen for resequencing in a second French-Canadian cohort. Enrichment of coding exons of these genes was done using Roche Nimblegen EZ Choice custom baits, with bar code multiplexing of 96 samples per lane of sequencing. Sequencing was done on an Illumina HiSeq™ 2000 at the McGill University and Genome Quebec Innovation Centre (MUGQIC). The average coverage of targeted sites was approximately 400X.
WES for a French-Canadian family. Exome capture for the multiplex family was performed using Agilent SureSelect™ Human All Exon 50 Mb v3 according to the manufacturer’s recommendations. Sequencing was done on an Illumina HiSeq2500 at the Sainte-Justine University Hospital genomic platform.
SOLiD™ 5500xl WES sequencing of the discovery cohort. Libraries were constructed using a modified version of the Fragment Library Preparation 5500 Series SOLiD™ Systems User Guide. Genomic DNA was fragmented with a Covaris® S2 System, then 3 μg amounts as measured with a Bioanalyzer (Agilent) were used for library construction. Truncated adaptors were used to minimize nonspecific capture during SureSelect™ in-solution hybridization. Therefore, the P1-T and barcode-T-0XX adaptors were replaced by: (1) Tr5500P1: 5'-CCTCTCTATGGGCAGTCGGTGA*T-3' (SEQ ID NO: 3) and 3'-C*C*GGAGAGATACCCGTCAGCCACT-5' (SEQ ID NO: 4); and (2) Tr5500IA: 5'-CGCCTTGGCCGTACAGC-3' (SEQ ID NO: 5) and 3'-T*GCGGAACCGGCATGTCG*T*C-5' (SEQ ID NO: 6) (* Phosphorothioate bond).
In addition, Library PCR Primer 1 and Library PCR Primer 2 were replaced by SureSelect™ Pre-Capture Primers provided in the SureSelect™ AB Barcoding Library Kit (Agilent). Exome capture was performed using Agilent SureSelect™ XT Human All Exon 50 Mb v3 according to the manufacturer’s recommendations. Final libraries were quantified using the SOLiD™ Library TaqMan® Quantitation Kit. Standard steps were taken thereafter to create enriched, templated beads for the SOLiD™ 5500xl system. Pools of 8 libraries were loaded on each flowchip (6 lanes). Paired-end sequencing was performed with 50 bases in the forward direction and 25 bases in the reverse direction.
HiSeq2500 WES sequencing for the French-Canadian family. Genomic DNA was quantified using the QuantiFluor™ dsDNA System (Promega) and 3 μg was used as input. Libraries were constructed using the SureSelect™ Target Enrichment System for Illumina Paired-End Multiplexed Sequencing Library protocol and the SureSelect™ XT Human All Exon 50 Mb v5 capture kit (Agilent). Final libraries were qualified using a Bioanalyzer (Agilent) and quantified using the KAPA Library Quantification kit for Illumina. The clustering was done on an Illumina cBot using 16 pM of pooled libraries. Pools of 4 libraries were loaded on each lane of a High Output flowcell (8 lanes). Sequencing was performed on a HiSeq2500 for 125 cycles in paired-end mode using HCS 2.2.38 and RTA 1.18.61.
HiSeq2000 targeted gene sequencing in the replication cohort. gDNA was quantified using the Quant-iT™ PicoGreen™ dsDNA Assay Kit (Life Technologies). Libraries were generated robotically using the KAPA HTP Library Preparation Kit for Illumina® platforms (Kapa Biosystems) as per the manufacturer’s recommendations. TruSeq™ adapters and PCR primers were purchased from BioO. Libraries were quantified using the Kapa Illumina GA with Revised Primers-SYBR Fast Universal kit (D-Mark). Average fragment size was determined using a LabChip™ GX (PerkinElmer) instrument. 20 ng of each of 48 libraries were pooled together (total of 1000 ng per capture) prior to proceeding with the enrichment of the targeted regions using the Roche Nimblegen™ EZ Choice custom baits. Captures were performed robotically according to the manufacturer’s recommendations. Final libraries were quantified using the Quant-iT™ PicoGreen™ dsDNA Assay Kit (Life Technologies) and the Kapa Illumina GA with Revised Primers-SYBR Fast Universal kit (D-Mark). Average fragment size was determined using a LabChip™ GX (PerkinElmer) instrument. The clustering was done on an Illumina cBot using 11 pM of each capture pool (2 captures per lane) and the flowcell was run on a HiSeq™ 2000 for 100 cycles in paired-end mode using HCS 2.2.58 and RTA 1.18.63, following the manufacturer’s instructions.
Bioinformatics. Only protein-coding and nearby intronic regions were analyzed. The analysis included SNPs and small insertions or deletions (indels). The SIFT (Sorting Intolerant from Tolerant) [27], PolyPhen-2 (Polymorphism Phenotyping v2) [28] and MutationTaster2 [29] algorithms were used to predict the possible impact of amino acid substitutions on the structure and function of the human FAT3 protein in IS patients harboring different FAT3 gene variants.
The discovery cohort was analyzed with the following pipeline: *.xsq files were converted to csfasta and qual files using XSQTools™. 5’ and 3’ reads were filtered independently using SOLiD™-aware software (http://bioinformatics.oxfordjournals.org/content/26/6/849.full) and re-balanced to retain only read pairs, i.e., singletons were removed. The process was tested to ensure the settings used did not remove too much information, just poor-quality reads. Reads were mapped in color space using bfast+bwa-0.7.0a to hg19 at the library level and then merged by sample. SNPs and INDELs were called with samtools 0.1.19 in batch mode, i.e., all samples were used during calling. SNPEff™ 3.3h was used to add genetic variant information and effect prediction (http://snpeff.sourceforge.net/). GATK IndelRealigner™ (2.5-2) was used to help resolve indels, Picard Mark Duplicates (1.96) to label PCR duplicates, and GATK base recalibration (2.5-2, SOLiD™ specific settings) to re-calibrate base qualities due to various sources of systematic technical error. Variants were annotated using Gemini 0.11.1a (http://gemini.readthedocs.io/en/latest/index.html).
For quality control validation, Sanger sequencing was performed for more than 100 different variants throughout the exome and results showed consistency in 85% of the genotypes obtained using both sequencing techniques. Based on this, the quality criteria were set to coverage > 10x, call rate > 90% and map quality > 20. Only variants having a minor allele frequency < 0.05 in 1000 Genomes, Exome Sequencing Project, ExAC and dbSNP were retained.
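The quality and frequency criteria stated above can be expressed as a simple filter. The following Python sketch is illustrative only: the record layout, field names and example values are assumptions and do not reflect the actual pipeline or its file formats; only the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    variant_id: str            # hypothetical identifier
    coverage: float            # read depth at the site
    call_rate: float           # fraction of samples with a genotype call
    map_quality: float         # mapping quality
    max_population_maf: float  # highest MAF across the reference databases


def passes_filters(v, min_coverage=10, min_call_rate=0.90,
                   min_map_quality=20, max_maf=0.05):
    """Apply the thresholds given in the text: coverage > 10x,
    call rate > 90%, map quality > 20, minor allele frequency < 0.05."""
    return (v.coverage > min_coverage
            and v.call_rate > min_call_rate
            and v.map_quality > min_map_quality
            and v.max_population_maf < max_maf)


# Hypothetical calls, for illustration only.
calls = [
    VariantCall("var1", coverage=85, call_rate=0.98, map_quality=55, max_population_maf=0.004),
    VariantCall("var2", coverage=8, call_rate=0.99, map_quality=60, max_population_maf=0.001),
    VariantCall("var3", coverage=120, call_rate=0.97, map_quality=50, max_population_maf=0.12),
]
retained = [v.variant_id for v in calls if passes_filters(v)]
print(retained)
```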
Both the replication French-Canadian IS cohort and the French-Canadian family were analyzed using the same pipeline, which was slightly different from that used for the discovery cohort. The reads were trimmed and aligned to the reference human genome (hg19) using Picard, BWA (0.5.9) (Li and Durbin, 2009) and Samtools (v.0.1.12a) (Li et al., 2009). Variants were called using the Pileup and varFilter commands, followed by filtering to keep single nucleotide polymorphisms (SNPs) and insertion-deletions with Phred-like quality scores above 20 and 50, respectively. The coverage was approximately 400x for the targeted sequencing and 100x for the five members of the family. Variants were annotated using ANNOVAR (Wang et al., 2010), according to the type of mutation and frequency in the different databases. Both SNPs and small insertion/deletions (indels) were considered, and those located neither in an exonic region nor in a neighboring potential splice site were filtered out.
Sanger sequencing. Sanger sequencing was performed at the Genome Quebec Innovation Centre at McGill University. Primers were designed using the program Primer3. Sanger sequence chromatograms were analyzed using Mutation Surveyor (Soft Genetics, Inc.). Exons 25 and 26 of FAT3 were not initially sequenced in the replication cohort because the custom baits that were used to capture the selected genes for sequencing were designed according to the RefSeq gene model, which did not include those 2 alternative exons. Hence, Sanger sequencing was performed for the 2 additional exons in 72 patients of the replication cohort. DNA of the other patients was not available. Numbering of variants in FAT3 is based on NCBI reference sequence entries NM_001008781.2 (SEQ ID NO: 1) and NP_001008781.2 (SEQ ID NO: 2).
Statistical Analysis. In both phases of the case/control study, a collapsing gene burden test was employed for significance, under the assumption that all rare, potentially protein-altering variants act in the same phenotypic direction with the same magnitude, independent of specific allele frequencies. In the instances where an individual carried two rare variants in the same gene, this was counted as a single event. In the first, whole-exome discovery phase, chi-square p-values were calculated to compare the accumulation of rare variants (MAF < 0.01) in genes throughout the exome in patients versus controls, assuming a threshold of p = 6×10⁻⁶ (0.05/8150), based on the number of genes harboring at least one rare variant among either the cases or controls in the exome data set. In the second, targeted gene phase (24 selected genes), Fisher’s exact test was used to calculate the p-values for comparisons between patients and controls, where p = 2×10⁻³ (0.05/24) was considered significant.
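The collapsing rule and the multiple-testing thresholds described above can be sketched in a few lines of Python. The function names, subject identifiers and variant labels are hypothetical and serve only to illustrate the counting; the thresholds are simply 0.05 divided by the number of genes tested in each phase.

```python
def significance_threshold(alpha, n_genes):
    """Per-gene significance threshold after correcting for n_genes tests."""
    return alpha / n_genes


def count_carriers(rare_variants_by_subject):
    """Collapse variants per gene: a subject carrying one or several rare
    variants in the same gene is counted as a single event."""
    return sum(1 for variants in rare_variants_by_subject.values() if variants)


print(significance_threshold(0.05, 8150))  # discovery phase
print(significance_threshold(0.05, 24))    # targeted phase

# Hypothetical subjects and variant labels for one gene.
cases = {"case1": ["v1"], "case2": ["v1", "v2"], "case3": []}
print(count_carriers(cases))
```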
Validation of FAT3 gene structure. The gene model for FAT3 used by RefSeq does not appear to be supported by long individual human cDNA clones and seems to be based on homology to several long rodent cDNAs. Therefore, to confirm the gene structure, in-house brain RNA-Seq data and WGBS (Whole Genome Bisulfite Sequencing) data from an unrelated individual not part of the cohorts, as well as GENCODE public annotations, were analyzed. FAT3 expression was also profiled using the Genotype-Tissue Expression (GTEx) Transcriptome Portal (www.gtexportal.org/home/gene/FAT3).
Cell culture and RNA extraction. Primary osteoblasts were derived from bone specimens that were obtained intraoperatively from IS patients and, as controls, from trauma patients unaffected by IS. Briefly, cells were grown in 10 cm² culture dishes in Alpha Modification of Eagle’s Medium (αMEM) containing 10% fetal bovine serum (FBS, Hyclone™) and 1% penicillin/streptomycin (antibiotic/antimycotic, Invitrogen) at 37°C and 5% CO₂. Cells were grown until they reached confluency. The cells were then washed twice with phosphate-buffered saline (PBS 1x), treated with 1 ml TRIzol™, lysed, transferred to a 1.5 ml tube and stored at -80°C. RNA was extracted using TRIzol™ (Life Technologies), following the manufacturer’s instructions.
Quantitative RT-Polymerase Chain Reaction (qRT-PCR). Expression analysis by qRT-PCR was done at the Institute for Research in Immunology and Cancer (IRIC), Universite de Montreal. Tests were done in triplicate using GAPDH and PPIA (Peptidylprolyl isomerase A) as normalizing housekeeping genes. Total RNA was treated with DNase and reverse transcribed using the Maxima First Strand cDNA synthesis kit with ds DNase (Thermo Scientific). Before use, RT samples were diluted 1:5. Gene expression was determined using assays designed with the Universal Probe Library from Roche (www.universalprobelibrary.com). For each qPCR assay, a standard curve was performed to ensure that the efficiency of the assay was between 90% and 110%. qPCR reactions were performed using PERFECTA QPCR FASTMIX™ II (Quanta), 2 mM of each primer and 1 mM of the corresponding UPL probe. The Viia7 qPCR instrument (Life Technologies) was used to detect the amplification level and was programmed with an initial step of 20 sec at 95°C, followed by 40 cycles of 1 sec at 95°C and 20 sec at 60°C.
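The standard-curve check mentioned above can be illustrated as follows. The text does not state how efficiency was computed; this Python sketch assumes the commonly used relation E = 10^(-1/slope) - 1, where the slope is that of Ct plotted against log10 of the template amount. The dilution series and Ct values are hypothetical.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def amplification_efficiency(slope):
    """Assumed relation: E = 10^(-1/slope) - 1 (1.0 means 100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def efficiency_is_acceptable(slope, low=0.90, high=1.10):
    """Check the 90% to 110% window stated in the text."""
    return low <= amplification_efficiency(slope) <= high


# Hypothetical 10-fold dilution series and measured Ct values.
log10_amounts = [0, 1, 2, 3]
ct_values = [30.00, 26.68, 23.36, 20.04]
slope = least_squares_slope(log10_amounts, ct_values)
print(slope, amplification_efficiency(slope), efficiency_is_acceptable(slope))
```

A slope near -3.32 corresponds to a doubling of product per cycle; markedly steeper or shallower slopes fall outside the accepted window.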
EXAMPLE 2: IDENTIFICATION OF TWENTY-FOUR CANDIDATE GENES FROM WES OF THE DISCOVERY COHORT
An initial discovery cohort of 73 unrelated IS patients (68 females and 5 males), with 70 age- and gender-matched controls, all of French-Canadian ancestry, was studied. Fifty of the patients were considered severely affected as their Cobb angles were at least 40°, and the other twenty-three were considered moderate cases. WES was performed for this cohort, followed by variant annotation and filtering to identify rare variants contributing to IS. Because the frequency of individual rare variants in the cohort was too low to yield sufficient statistical power, genes harboring an overall excess of rare variants in the discovery patient cohort were sought. A collapsing gene burden test was performed, in which the enrichment of rare variants per gene in patients versus controls was compared. To define rare variants, a minor allele frequency (MAF) of <1% was used as an initial cutoff, and a MAF of <0.5% as a more stringent cutoff (MAFs according to 1000 Genomes European ancestry (EUR) and the Exome Sequencing Project European ancestry (ESP-EA)). Due to the total cohort size and technical limits of SOLiD™ sequencing, only 8150 genes harbored at least one such rare variant among all case and control samples. Therefore, a threshold for statistical significance was set at 0.05/8150 = 6×10⁻⁶. None of the 8150 genes met the required p-value threshold. The 24 best candidate genes were selected for follow-up validation in a separate replication cohort (Table 1). For the selection of candidate genes, both the p-value and the absolute number of patients and controls who carried rare variants were taken into consideration.
Table 1. Genes selected from the discovery cohort with SNPs of MAF < 1%
Figure imgf000021_0001
Figure imgf000022_0001
EXAMPLE 3: TARGETED SEQUENCING OF THE SELECTED TWENTY-FOUR GENES IN A REPLICATION COHORT
The replication cohort was chosen to be more homogeneous; all cases were severely affected females. For comparison, in the initial discovery cohort, 93% of IS patients were females and only 68% were severe cases. The replication cohort comprised 96 patients and 96 gender-matched controls. Coding exons of the 24 candidate genes were sequenced in the 192 replication samples, using a custom capture library. After calling and annotating variants, a filtration step was performed to remove poor quality calls and variants with a MAF greater than 1% (according to 1000 Genomes (EUR) and ESP-EA). For the replication study, a collapsing gene burden test was again employed. Of the 24 candidate genes, only one gene, FAT3, showed significant enrichment for rare variants in the IS patients (Table 2). Specifically, in FAT3, 21 of the replication IS cases harbored rare protein-altering variants versus only 2 of the controls (odds ratio (OR) 13.16, p-value = 2.38x10⁻⁵, Fisher's exact test) (Table 2). Interestingly, when the synonymous variants were included, the p-value became more significant (OR 8.27, p-value = 3.23x10⁻⁶) (Table 3). None of the other 23 genes had p-values approaching the required threshold. The non-synonymous variants in FAT3 in both the discovery and replication cohorts were distributed across much of the protein (FIG. 1A).
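By way of illustration only, the odds ratio and a two-sided Fisher's exact test for the FAT3 carrier counts reported above (21 of 96 cases versus 2 of 96 controls) may be computed as sketched below. The sketch uses the usual definition of the two-sided p-value (the sum of the probabilities of all tables that are no more likely than the observed one); it is an assumption that the reported p-value was obtained with this definition, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def odds_ratio(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers):
    """Sample odds ratio of a 2x2 carrier table."""
    return (case_carriers * ctrl_noncarriers) / (case_noncarriers * ctrl_carriers)


def fisher_exact_two_sided(case_carriers, case_noncarriers,
                           ctrl_carriers, ctrl_noncarriers):
    """Two-sided Fisher's exact test p-value for a 2x2 table.

    Probabilities are hypergeometric and are computed exactly with
    Fraction so that equally likely tables compare as equal.
    """
    n_cases = case_carriers + case_noncarriers
    n_carriers = case_carriers + ctrl_carriers
    n_total = n_cases + ctrl_carriers + ctrl_noncarriers
    denominator = comb(n_total, n_cases)

    def table_probability(k):
        # Probability that exactly k of the carriers fall among the cases.
        return Fraction(
            comb(n_carriers, k) * comb(n_total - n_carriers, n_cases - k),
            denominator,
        )

    k_min = max(0, n_cases - (n_total - n_carriers))
    k_max = min(n_carriers, n_cases)
    p_observed = table_probability(case_carriers)
    p_value = sum(
        p for p in (table_probability(k) for k in range(k_min, k_max + 1))
        if p <= p_observed
    )
    return float(p_value)


# Carrier counts reported for FAT3 in the replication cohort.
fat3_or = odds_ratio(21, 75, 2, 94)
fat3_p = fisher_exact_two_sided(21, 75, 2, 94)
print(round(fat3_or, 2), fat3_p)
```

With these counts the odds ratio evaluates to 13.16, in agreement with the value given in Table 2.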
Table 2. Statistical analysis for all selected genes and SNPs in the replication cohort with SNPs with MAF < 1% (after removing the synonymous SNPs)
Figure imgf000022_0002
Table 3. Statistical analysis for all selected genes and SNPs in the replication cohort with SNPs with MAF < 1% (including the synonymous SNPs)
Figure imgf000022_0003
Figure imgf000023_0001
EXAMPLE 4: EXOME SEQUENCING OF AN INDEPENDENT IS FAMILY
In parallel, a very rare multiplex family in the cohort, in which three sisters were affected with IS while the parents were unaffected, was studied (FIG. 1). This family was not included in the previous cohorts. WES was performed for all 5 members of this family. As in the case-control exome and candidate gene sequencing, the analysis was restricted to rare (MAF ≤ 1%), potentially protein-altering SNPs or small indels. The exome data were analyzed with different inheritance models, given the unaffected status of the parents. First, a dominant de novo mutation model, in which the three sisters would share a heterozygous variant absent in the parents, was considered. Second, a recessive model was considered: either a homozygous variant in the three sisters which is heterozygous in the parents, or compound heterozygosity, in which the three sisters have two heterozygous variants in the same gene, each coming from one parent. No genes were found consistent with the de novo dominant or recessive homozygous models. However, the presence of compound heterozygous variants in FAT3 was found consistent with the recessive model. Of note, the selection of candidate genes from the WES cohort was done before the family study was performed, and the selection of the targeted resequencing genes was unbiased. The two variants of the FAT3 gene in the multiplex family are non-synonymous: p.L517S and p.L4544F (FIG. 1). The first variant was present in four cases and one control in the replication cohort, and the second variant was present in one case in the discovery cohort. Both variants in FAT3 were confirmed by Sanger sequencing of DNA from all members of the family.
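By way of illustration only, the compound heterozygous filter described above may be sketched as follows. Genotypes are encoded as the number of alternate alleles (0, 1 or 2). The record layout, the function name and the example gene and variant identifiers are hypothetical; in particular, the example does not represent the actual parental origin of the FAT3 variants, which is not stated here.

```python
from collections import defaultdict


def compound_het_genes(variants):
    """Return genes consistent with compound heterozygosity in all sisters.

    variants: iterable of dicts with keys
      "gene", "id", "father", "mother" (alt-allele counts) and
      "sisters" (list of alt-allele counts, one per affected sister).
    A gene qualifies when every sister is heterozygous for at least one
    variant inherited from the father only and at least one variant
    inherited from the mother only.
    """
    paternal = defaultdict(list)
    maternal = defaultdict(list)
    for variant in variants:
        if not all(count == 1 for count in variant["sisters"]):
            continue  # every affected sister must be heterozygous
        if variant["father"] == 1 and variant["mother"] == 0:
            paternal[variant["gene"]].append(variant["id"])
        elif variant["mother"] == 1 and variant["father"] == 0:
            maternal[variant["gene"]].append(variant["id"])
    return {
        gene: (paternal[gene], maternal[gene])
        for gene in paternal
        if gene in maternal
    }


# Hypothetical family data: GENE_X fits the model, GENE_Y does not.
example_variants = [
    {"gene": "GENE_X", "id": "var1", "father": 1, "mother": 0, "sisters": [1, 1, 1]},
    {"gene": "GENE_X", "id": "var2", "father": 0, "mother": 1, "sisters": [1, 1, 1]},
    {"gene": "GENE_Y", "id": "var3", "father": 1, "mother": 0, "sisters": [1, 1, 0]},
    {"gene": "GENE_Y", "id": "var4", "father": 0, "mother": 1, "sisters": [1, 1, 1]},
]
print(compound_het_genes(example_variants))
```

The de novo dominant and recessive homozygous models could be expressed as analogous genotype predicates over the same records.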
EXAMPLE 5: VALIDATION OF FAT3 GENE STRUCTURE AND IDENTIFICATION OF A NOVEL UNANNOTATED EXON
The gene model for FAT3 used by RefSeq appears to be supported only by long individual rodent cDNA clones in the NCBI database, whereas there are only fragmentary human cDNA clones documented in the public genome browsers. Therefore, to confirm the human FAT3 gene structure, the in-house brain RNA-Seq data and WGBS data for one individual were analyzed. The results were consistent with the RefSeq gene model (NM_001008781.2) with two exceptions. Just upstream of the 3’ terminal exon, evidence was found for two alternative exons which were either included or excluded together in various RNA-Seq reads. The two exons are also annotated by the GENCODE project website (version 24). In addition, a previously uncharacterized exon located 125 kb upstream of the first annotated exon was identified, supported by multiple individual reads splicing this sequence to the second (but first protein-coding) exon. This novel exon lies in a hypomethylated CpG island, a feature that is characteristic of active promoters. Because the 5’-most exon annotated by GENCODE (exon 2 of the gene model) begins precisely at the splice acceptor junction, it was suspected that the GENCODE raw data probably included exon 1 in some junction reads, which were not aligned to the genome across exon 1 due to the very long first intron. FAT3 expression in several tissues was profiled using the GTEx Transcriptome Portal, and a strong enrichment in brain and artery tissues was observed.
The tissues in which FAT3 expression was measured are the following (presented in descending order of FAT3 expression level): Brain-Nucleus Accumbens (basal ganglia); Brain-Caudate (basal ganglia); Brain-Putamen (basal ganglia); Artery-Coronary; Brain-Frontal Cortex (BA9); Brain-Cortex; Brain-Anterior cingulate cortex (BA24); Artery-Tibial; Artery-Aorta; Brain-Hypothalamus; Brain-Amygdala; Brain-Cerebellum; Esophagus-Gastroesophageal junction; Brain-Hippocampus; Bladder; Brain-Substantia nigra; Prostate; Testis; Lung; Esophagus-Muscularis; Vagina; Brain-Cerebellar hemisphere; Cervix-Ectocervix; Fallopian Tube; Cervix-Endocervix; Brain-Spinal cord (cervical c-1); Uterus; Adipose-Subcutaneous; Cells-Transformed Fibroblasts; Ovary; Nerve-Tibial; Pituitary; Adipose-Visceral (Omentum); Colon-Sigmoid; Heart-Atrial Appendage; Breast-Mammary Tissue; Skin-Sun Exposed (lower leg); Stomach; Kidney-Cortex; Colon-Transverse; Skin-Not Sun Exposed (Suprapubic); Minor Salivary gland; Small Intestine-Terminal ileum; Esophagus-Mucosa; Muscle-Skeletal; Thyroid; Spleen; Liver; Pancreas; Heart-Left ventricle; Adrenal gland; Cells-EBV Transformed lymphocytes; Whole Blood.
EXAMPLE 6: SANGER SEQUENCING OF EXONS 25 AND 26 OF FAT3
The alternative exons 25 and 26 were not captured in the replication capture sequencing because they are not annotated by RefSeq. Hence, direct Sanger sequencing was performed for these two exons in 72 cases of the replication cohort (DNA was not available for the rest of the cases). No rare potentially protein-altering variants were observed among the sequenced cases for either of these two (very small) exons.
EXAMPLE 7: CONSEQUENCES OF THE FAT3 RARE VARIANTS
Twenty-six non-synonymous SNVs (18 previously reported in public databases and 8 novel) in the FAT3 gene were identified in IS patients (Table 4 and FIGs. 2A-J). Prediction of the functional consequences of the non-synonymous SNVs was performed using three different algorithms: SIFT, PolyPhen-2 and MutationTaster2. Of note, two variants were predicted as likely pathogenic by all three software tools, 13 variants as likely pathogenic by two of the three algorithms, and one variant is a frameshift mutation (Table 4). To test whether these rare variants affect the expression of FAT3, a qPCR expression analysis was performed using RNA extracted from primary osteoblasts obtained from seven scoliotic patients who had rare variants in FAT3 from the discovery and replication cohorts, and seven controls (trauma patients without scoliosis from whom osteoblasts could be extracted). No statistically significant difference in averaged FAT3 mRNA expression was observed between the two groups. This suggests that the mutations affect the biological activity of FAT3 rather than its expression level.
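By way of illustration only, the tally of variants by the number of prediction tools calling them likely pathogenic may be sketched as follows. The variant identifiers and the individual calls in the example are hypothetical placeholders and do not reproduce Table 4.

```python
def count_damaging_calls(calls):
    """Number of tools that call a variant likely pathogenic.

    calls: dict mapping tool name -> True (likely pathogenic) or False.
    """
    return sum(1 for damaging in calls.values() if damaging)


def tally_by_consensus(calls_by_variant):
    """Map 'number of tools in agreement' -> number of variants."""
    tally = {}
    for calls in calls_by_variant.values():
        n_tools = count_damaging_calls(calls)
        tally[n_tools] = tally.get(n_tools, 0) + 1
    return tally


# Hypothetical calls for three placeholder variants.
example_calls = {
    "variant_1": {"SIFT": True, "PolyPhen-2": True, "MutationTaster2": True},
    "variant_2": {"SIFT": True, "PolyPhen-2": False, "MutationTaster2": True},
    "variant_3": {"SIFT": False, "PolyPhen-2": False, "MutationTaster2": False},
}
print(tally_by_consensus(example_calls))
```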
The scope of the claims should not be limited by the preferred embodiments set forth in the examples but should be given the broadest interpretation consistent with the description as a whole.
Figure imgf000025_0001
Figure imgf000026_0001
Figure imgf000027_0001
REFERENCES
1. Cheng JC, Castelein RM, Chu WC, et al. PRIMER. NATURE 2015.
2. Asher MA, Burton DC. Adolescent idiopathic scoliosis: natural history and long term treatment effects. Scoliosis and Spinal Disorders 2006;1(1):2.
3. Wise CA, Barnes R, Gillum J, et al. Localization of susceptibility to familial idiopathic scoliosis. Spine 2000;25(18):2372-80.
4. Wynne-Davies R. Familial (idiopathic) scoliosis. J Bone Joint Surg [Br] 1968;50:24-30.
5. Kesling KL, Reinker KA. Scoliosis in Twins: A Meta-analysis of the Literature and Report of Six Cases. Spine 1997;22(17):2009-14.
6. Ward K, Ogilvie J, Argyle V, et al. Polygenic inheritance of adolescent idiopathic scoliosis: a study of extended families in Utah. American journal of medical genetics Part A 2010;152(5):1178-88.
7. Gorman KF, Julien C, Oliazadeh N, et al. Genetics of Idiopathic Scoliosis. eLS 2014.
8. Edery P, Margaritte-Jeannin P, Biot B, et al. New disease gene location and high genetic heterogeneity in idiopathic scoliosis. European Journal of Human Genetics 2011;19(8):865-69.
9. Ocaka L, Zhao C, Reed JA, et al. Assignment of two loci for autosomal dominant adolescent idiopathic scoliosis (AIS) to chromosomes 9q31.2-q34.2 and 17q25.3-qtel. Journal of medical genetics 2007.
10. Alden KJ, Marosy B, Nzegwu N, et al. Idiopathic scoliosis: identification of candidate regions on chromosome 19p13. Spine 2006;31(16):1815-19.
11. Gorman KF, Julien C, Moreau A. The genetic epidemiology of idiopathic scoliosis. European Spine Journal 2012;21(10):1905-19.
12. Sharma S, Londono D, Eckalbar WL, et al. A PAX1 enhancer locus is associated with susceptibility to idiopathic scoliosis in females. Nature communications 2015;6.
13. Sharma S, Gao X, Londono D, et al. Genome-wide association studies of adolescent idiopathic scoliosis suggest candidate susceptibility genes. Human molecular genetics 2011;20(7):1456-66.
14. Takahashi Y, Kou I, Takahashi A, et al. A genome-wide association study identifies common variants near LBX1 associated with adolescent idiopathic scoliosis. Nature genetics 2011;43(12):1237-40.
15. Kou I, Takahashi Y, Johnson TA, et al. Genetic variants in GPR126 are associated with adolescent idiopathic scoliosis. Nature genetics 2013;45(6):676-79.
16. Ogura Y, Kou I, Miura S, et al. A functional SNP in BNC2 is associated with adolescent idiopathic scoliosis. The American Journal of Human Genetics 2015;97(2):337-42.
17. Zuk O, Hechter E, Sunyaev SR, et al. The mystery of missing heritability: Genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences 2012;109(4):1193-98.
18. Asimit J, Zeggini E. Testing for rare variant associations in complex diseases. Genome medicine 2011;3(4):1.
19. Marian AJ. Elements of 'missing heritability'. Current opinion in cardiology 2012;27(3):197-201.
20. Buchan JG, Alvarado DM, Haller GE, et al. Rare variants in FBN1 and FBN2 are associated with severe adolescent idiopathic scoliosis. Human molecular genetics 2014;23(19):5271-82.
21. Baschal EE, Wethey CI, Swindle K, et al. Exome sequencing identifies a rare HSPG2 variant associated with familial idiopathic scoliosis. G3: Genes|Genomes|Genetics 2015;5(2):167-74.
22. Patten SA, Margaritte-Jeannin P, Bernard J-C, et al. Functional variants of POC5 identified in patients with idiopathic scoliosis. The Journal of clinical investigation 2015;125(3):1124-28. doi: 10.1172/JCI77262.
23. Li W, Li Y, Zhang L, et al. AKAP2 identified as a novel gene mutated in a Chinese family with adolescent idiopathic scoliosis. Journal of medical genetics 2016:jmedgenet-2015-103684.
24. Haller G, Alvarado D, Mccall K, et al. A polygenic burden of rare variants across extracellular matrix genes among individuals with adolescent idiopathic scoliosis. Human molecular genetics 2016;25(1):202-09.
25. Godard B, Marshall J, Laberge C. Community engagement in genetic research: results of the first public consultation for the Quebec CARTaGENE project. Public Health Genomics 2007;10(3):147-58.
26. Awadalla P, Boileau C, Payette Y, et al. Cohort profile of the CARTaGENE study: Quebec’s population-based biobank for public health and personalized genomics. International journal of epidemiology 2012:dys160.
27. Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic acids research 2003;31(13):3812-14.
28. Adzhubei IA, Schmidt S, Peshkin L, et al. A method and server for predicting damaging missense mutations. Nature methods 2010;7(4):248-49.
29. Schwarz JM, Cooper DN, Schuelke M, et al. MutationTaster2: mutation prediction for the deep-sequencing age. Nature methods 2014;11(4):361-62.
30. Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. The American Journal of Human Genetics 2008;83(3):311-21.
31. Yang P, Liu H, Lin J, et al. The Association of rs4753426 Polymorphism in the Melatonin Receptor 1B (MTNR1B) Gene and Susceptibility to Adolescent Idiopathic Scoliosis: A Systematic Review and Meta-analysis. Pain physician 2015;18(5):419-31.
32. Lin D-C, Hao J-J, Nagata Y, et al. Genomic and molecular characterization of esophageal squamous cell carcinoma. Nature genetics 2014;46(5):467-73.
33. Katoh Y, Katoh M. Comparative integromics on FAT1, FAT2, FAT3 and FAT4. International journal of molecular medicine 2006;18(3):523.
34. Sadeqzadeh E, Bock CE, Thorne RF. Sleeping giants: emerging roles for the fat cadherins in health and disease. Medicinal research reviews 2014;34(1):190-221.
35. Neumann M, Heesch S, Schlee C, et al. Whole-exome sequencing in adult ETP-ALL reveals a high rate of DNMT3A mutations. Blood 2013;121(23):4749-52.
36. Network CGAR. Integrated genomic analyses of ovarian carcinoma. Nature 2011;474(7353):609-15.
37. Furukawa T, Sakamoto H, Takeuchi S, et al. Whole exome sequencing reveals recurrent mutations in BRCA2 and FAT genes in acinar cell carcinomas of the pancreas. Scientific reports 2015;5.
38. Luzon-Toro B, Gui H, Ruiz-Ferrer M, et al. Exome sequencing reveals a high genetic heterogeneity on familial Hirschsprung disease. Scientific reports 2015;5.
39. McDonald-McGinn DM, Emanuel BS, Zackai EH. 22q11.2 Deletion Syndrome. 2013.
40. Reish O, Gorlin RJ, Hordinsky M, et al. Brain anomalies, retardation of mentality and growth, ectodermal dysplasia, skeletal malformations, Hirschsprung disease, ear deformity and deafness, eye hypoplasia, cleft palate, cryptorchidism, and kidney dysplasia/hypoplasia BRESEK/BRESHECK: New X-linked syndrome? American journal of medical genetics 1997;68(4):386-90.
41. Brooks AS, Breuning MH, Osinga J, et al. A consanguineous family with Hirschsprung disease, microcephaly, and mental retardation (Goldberg-Shprintzen syndrome). Journal of medical genetics 1999;36(6):485-89.
42. Saburi S, Hester I, Goodrich L, et al. Functional interactions between Fat family cadherins in tissue morphogenesis and planar polarity. Development 2012;139(10):1806-20.
43. Hayes M, Gao X, Yu LX, et al. ptk7 mutant zebrafish models of congenital and idiopathic scoliosis implicate dysregulated Wnt signalling in disease. Nat Commun 2014;5:4777. doi: 10.1038/ncomms5777.
44. Le Pabic P, Ng C, Schilling TF. Fat-Dachsous signaling coordinates cartilage differentiation and polarity during craniofacial development. PLoS Genet 2014;10(10):e1004726.
45. Deans MR, Krol A, Abraira VE, et al. Control of neuronal morphology by the atypical cadherin Fat3. Neuron 2011;71(5):820-32.
46. Rock R, Schrauth S, Gessler M. Expression of mouse dchs1, fjx1, and fat-j suggests conservation of the planar cell polarity pathway identified in drosophila. Developmental dynamics 2005;234(3):747-55.
47. Cartegni L, Chew SL, Krainer AR. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nature Reviews Genetics 2002;3(4):285-98.
48. Nackley AG, Shabalina S, Tchivileva IE, et al. Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science 2006;314(5807):1930-33.
49. Kimchi-Sarfaty C, Oh JM, Kim I-W, et al. A "silent" polymorphism in the MDR1 gene changes substrate specificity. Science 2007;315(5811):525-28.
50. Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754-1760.
51. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. The sequence alignment/map format and SAMtools. Bioinformatics 25(16):2078-2079.
52. Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic acids research 38(16):e164-e164.

Claims

CLAIMS:
1. A method of determining whether a subject is at risk of developing Idiopathic scoliosis (IS) comprising detecting the presence or absence of at least one variant in at least one allele of the FAT3 gene or a marker in linkage disequilibrium therewith, in a biological sample from the subject, wherein the detection of at least one risk variant is indicative that the subject is at risk of developing IS.
2. The method of claim 1, wherein the risk of developing a scoliosis is a risk of developing a severe scoliosis.
3. The method of claim 2, wherein the risk of developing a scoliosis is a risk of severe scoliosis progression.
4. A method of genotyping a subject having IS or at risk of developing IS comprising determining the genotype of the subject for at least one variant of the FAT3 gene.
5. The method of any one of claims 1 to 4, wherein the at least one variant introduces a mutation selected from an insertion, a deletion, and a substitution.
6. The method of any one of the claims 1 to 5, wherein the at least one variant is within an exon of the FAT3 gene.
7. The method of claim 6, wherein the variant introduces a frameshift mutation in the FAT3 gene.
8. The method of claim 6, wherein the variant introduces a non-conservative mutation in the FAT3 polypeptide.
9. The method of claim 6 or 8, wherein the at least one variant is located within a codon encoding amino acid L517, N800, R894, L958, N1945, I2158, H2359, R2409, Q2459, R2460, I2839, P3233, R3408, A3418, S3485, A3607, Q3646, R3686, S3709, R4070, T4190, A4199, N4333, G4398, C4521, or L4544 of SEQ ID NO:2.
10. The method of any one of claims 1 to 9, wherein the variant is located at a variant nucleotide position identified in the second column of Table 5:
Table 5
Figure imgf000032_0001
Figure imgf000033_0001
11. The method of any one of claims 1 to 10, wherein the variant introduces a mutation in the FAT3 polypeptide selected from the mutations set forth in Table 5.
12. The method of any one of claims 1 to 11, wherein the at least one variant comprises rs139595720; rs188857169; rs80293525; rs76869520; rs80046666; rs118056487; rs200944979; rs200241295; rs200404766; rs201449521; rs200032318; rs138237129; rs75081660; rs201379307; rs186899262; rs201053443; rs142403035; rs187159256; or a combination thereof.
13. The method of any one of claims 1 to 12, wherein the at least one variant comprises a variant which introduces a mutation at amino acid L517 and/or L4544.
14. The method of claim 13, wherein the at least one variant comprises a variant which introduces a L517S and/or L4544F substitution in the FAT3 polypeptide.
15. The method of any one of claims 1 to 14, which comprises detecting the presence or absence of at least two variants.
16. The method of claim 15, wherein the at least two variants comprise a variant which introduces a mutation at amino acid L517 and/or L4544.
17. The method of any one of claims 1 to 16, wherein the at least one variant comprises rs139595720 and rs187159256.
18. A method of determining the risk of future parents of having a child suffering from IS comprising (i) determining the presence or absence of at least two risk variants in at least one allele of the FAT3 gene in a first biological sample from the first future parent; (ii) determining the presence or absence of the at least two risk variants in at least one allele of the FAT3 gene in a second biological sample from the second future parent; and (iii) determining the risk of the future parents of having a child suffering from IS based on the presence or absence of the at least two risk variants in the first and second biological samples.
19. A kit or composition for (i) determining whether a subject is at risk of developing Idiopathic scoliosis (IS), (ii) genotyping a subject, (iii) determining the risk of future parents of having a child suffering from IS, or (iv) detecting at least one variant in at least one allele of the FAT3 gene, the kit comprising one or more oligonucleotide probes or primers, one or more restriction enzymes and/or one or more antibodies specific for detection of at least one FAT3 gene variant.
20. An oligonucleotide probe or primer for (i) determining whether a subject is at risk of developing Idiopathic scoliosis (IS), (ii) genotyping a subject, (iii) determining the risk of future parents of having a child suffering from IS, or (iv) detecting at least one variant in at least one allele of the FAT3 gene.
21. An oligonucleotide probe or primer specific for the detection of a FAT3 variant set forth in Table 5 of claim 10.
22. A DNA chip comprising the oligonucleotide probe or primer defined in claim 20 and 21.
23. A method of treating or preventing IS in a subject comprising increasing the level of FAT3 protein in the subject.
24. The method of claim 23, comprising administering an effective amount of (i) a FAT3 polypeptide, (ii) a nucleic acid encoding the FAT3 polypeptide of (i), or (iii) a cell expressing the FAT3 polypeptide of (i) or the nucleic acid of (ii), to the subject.
25. The method of claim 24, wherein said FAT3 polypeptide comprises an amino acid sequence having at least 70% identity with the sequence set forth in SEQ ID NO: 2.
26. The method of claim 25, wherein said FAT3 polypeptide comprises an amino acid sequence having at least 90% identity with the sequence set forth in SEQ ID NO: 2.
27. The method of claim 26, wherein said FAT3 polypeptide comprises the amino acid sequence set forth in SEQ ID NO: 2.
PCT/CA2019/050055 2018-01-16 2019-01-16 Method of use of fat3 in scoliosis WO2019140518A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862617811P 2018-01-16 2018-01-16
US62/617,811 2018-01-16

Publications (1)

Publication Number Publication Date
WO2019140518A1 (en) 2019-07-25

Family

ID=67301654

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2019/050055 WO2019140518A1 (en) 2018-01-16 2019-01-16 Method of use of fat3 in scoliosis

Country Status (1)

Country Link
WO (1) WO2019140518A1 (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
EINARSDOTTIR, E. ET AL.: "CELSR2 is a candidate susceptibility gene in idiopathic scoliosis", PLOS ONE, vol. 12, no. 12, 14 December 2017 (2017-12-14), pages 1 - 14, XP055625415, ISSN: 1932-6203 *
GAO, X. ET AL.: "CHD7 Gene Polymorphisms are associated with susceptibility to Idiopathic Scoliosis", AMERICAN JOURNAL OF HUMAN GENETICS, vol. 80, no. 5, 1 May 2007 (2007-05-01), pages 957 - 965, XP055625416, ISSN: 0002-9297 *
ZHU, Z. ET AL.: "Genome-wide association study identifies new susceptibility loci for adolescent idiopathic scoliosis in Chinese girls", NATURE COMMUNICATIONS, vol. 6, no. 8355, 22 September 2015 (2015-09-22), pages 1 - 6, XP055625417, ISSN: 2041-1723 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19741247

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19741247

Country of ref document: EP

Kind code of ref document: A1