WO2024033422A1 - Novel pore monomers and pores - Google Patents

Novel pore monomers and pores

Publication number
WO2024033422A1
Authority
WO
WIPO (PCT)
Prior art keywords
pore
monomer
csgg
complex
seq
Prior art date
Application number
PCT/EP2023/072068
Other languages
French (fr)
Inventor
Alistair James SCOTT
Ranga Prabhath MALAVIARACHCHIGE RABEL
Aaron Luke ACTON
Rhys Connor GRIFFITHS
Elizabeth Jayne Wallace
Lakmal Nishantha JAYASINGHE
Original Assignee
Oxford Nanopore Technologies Plc
Priority date
Filing date
Publication date
Application filed by Oxford Nanopore Technologies Plc
Publication of WO2024033422A1

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6872 Intracellular protein regulatory factors and their receptors, e.g. including ion channels
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C07K14/24 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Enterobacteriaceae (F), e.g. Citrobacter, Serratia, Proteus, Providencia, Morganella, Yersinia
    • C07K14/245 Escherichia (G)
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B82 NANOTECHNOLOGY
    • B82Y SPECIFIC USES OR APPLICATIONS OF NANOSTRUCTURES; MEASUREMENT OR ANALYSIS OF NANOSTRUCTURES; MANUFACTURE OR TREATMENT OF NANOSTRUCTURES
    • B82Y15/00 Nanotechnology for interacting, sensing or actuating, e.g. quantum dots as markers in protein assays or molecular motors
    • B82Y5/00 Nanobiotechnology or nanomedicine, e.g. protein engineering or drug delivery

Definitions

  • the present invention relates to novel pore monomer conjugates, pore complexes formed from the conjugates and their uses in analyte detection and characterisation.
  • Nanopore sensing is an approach to analyte detection and characterization that relies on the observation of individual binding or interaction events between the analyte molecules and an ion conducting channel.
  • Two of the essential components of analyte characterization using nanopore sensing are (1) the control of analyte movement through the pore and (2) the discrimination of the composing building blocks as the analyte is moved through the pore.
  • the narrowest part of the pore forms the most discriminating part of the nanopore with respect to the current signatures as a function of the passing analyte.
  • CsgG was identified as an ungated, non-selective protein secretion channel from Escherichia coli (Goyal et al., 2014) and has been used as a nanopore for detecting and characterising analytes. Mutations to the wild-type CsgG pore that improve the properties of the pore in this context have also been disclosed (WO 2016/034591, WO 2017/149316, WO 2017/149317, WO 2017/149318, WO 2018/211241, and WO 2019/002893, all incorporated by reference herein in their entirety).
  • nucleotide discrimination is achieved by measuring the current as the polynucleotide passes through the pore. Multiple nucleotides contribute to the observed current, so the height of the channel constriction and extent of the interaction with the polynucleotide affect the relationship between observed current and polynucleotide sequence. While the current range and signal-to-noise ratio for nucleotide discrimination have been improved through mutation of the CsgG pore, a sequencing system would have higher performance if the current differences between nucleotides could be improved further. Accordingly, there is a need to identify novel ways to improve nanopore sensing features.
  • pore complexes formed from pore monomer conjugates in which a CsgG pore monomer is attached to a CsgF peptide at two or more positions display an increased current range and/or increased signal-to-noise ratio (SNR) during analyte characterisation compared with conjugates with attachment at only one position.
  • the invention therefore provides a pore monomer conjugate comprising a CsgG pore monomer attached to a CsgF peptide, wherein the CsgF peptide is attached to the CsgG pore monomer at two or more positions.
  • the invention also provides:
  • a construct comprising two or more covalently attached pore monomer conjugates of the invention; a pore complex comprising at least one pore monomer conjugate of the invention or at least one construct of the invention, wherein the CsgF peptide(s) form(s) a constriction in the pore complex; a pore multimer comprising two or more pores, wherein at least one of the pores is a pore complex of the invention; a membrane comprising a pore complex of the invention or a pore multimer of the invention; a method for producing a pore monomer conjugate of the invention comprising attaching the CsgF peptide to the CsgG pore at two or more positions; a method for producing a pore complex of the invention or a pore multimer of the invention, the method comprising expressing at least one pore monomer conjugate of the invention or a construct of the invention and sufficient pore monomers or constructs to form the pore complex or the pore multimer in a host cell and allowing
  • a kit for characterising a target analyte comprising (a) a pore complex of the invention or a pore multimer of the invention and (b) the components of a membrane;
  • a kit for characterising a target polynucleotide or a target polypeptide comprising (a) a pore complex of the invention or a pore multimer of the invention and (b) a polynucleotide binding protein;
  • an apparatus for characterising a target polynucleotide or a target polypeptide in a sample comprising (a) a plurality of pore complexes of the invention or a plurality of pore multimers of the invention and (b) a plurality of polynucleotide binding proteins; an array comprising a plurality of membranes of the invention; a system comprising (a) a membrane of the invention or an array of the invention, (b) means for applying a potential across the membrane(s) and (c) means for detecting electrical or optical signals across the membrane(s); an apparatus comprising a pore complex of the invention or a pore multimer of the invention inserted into an in vitro membrane; and
  • an apparatus produced by a method comprising (i) obtaining a pore complex of the invention or a pore multimer of the invention and (ii) contacting the pore complex or a pore multimer with an in vitro membrane such that the pore complex or the pore multimer is inserted in the in vitro membrane.
  • FIG. 1: The structure and size of the wild-type CsgG pore from Escherichia coli strain K12 (the databank accession code for this structure is 4UV3). The distances shown are measured from backbone to backbone of the amino acids forming the pore structure.
  • the CsgG pore is a tightly interconnected symmetrical nonameric pore that resembles a crown.
  • the overall height is 98 Å, and the largest outer diameter is 120 Å. It defines a central channel and consists of three parts: (A) the cap region, (B) the constriction region and (C) the transmembrane beta barrel region.
  • the cap has an axial length, or height, of 39 Å, an inner diameter of 43 Å and a 66 Å mouth.
  • the beta barrel has 36 strands, an axial length of 39 Å and an inner diameter of 55 Å. The transition between the pore cap and the beta barrel is sharp, with the constriction located between them, at the level of the predicted lipid-aqueous interface.
  • the constriction is approximately 18.5 Å in diameter and has a length of 20 Å along the axis of the channel.
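  • As a quick consistency check on the dimensions quoted for FIG. 1, the sketch below sums the axial lengths of the three regions and compares the result with the quoted overall height. It assumes that the cap, constriction and beta barrel stack end to end along the channel axis without overlap, which the text suggests but does not state outright. The names `REGION_LENGTHS` and `stacked_height` are illustrative only.

```python
# Axial lengths quoted for the wild-type CsgG pore (FIG. 1), in angstroms.
OVERALL_HEIGHT = 98.0
REGION_LENGTHS = {
    "cap": 39.0,
    "constriction": 20.0,
    "transmembrane beta barrel": 39.0,
}


def stacked_height(region_lengths):
    """Sum the axial lengths of regions assumed to stack end to end."""
    return sum(region_lengths.values())


total = stacked_height(REGION_LENGTHS)
print(f"sum of region lengths: {total} angstroms")
print(f"quoted overall height: {OVERALL_HEIGHT} angstroms")
```

  • Under that stacking assumption, the three quoted lengths account for the full quoted height of the pore.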
  • SEQ ID NO: 1 shows the polynucleotide sequence of wild-type E. coli CsgG from strain K12, including signal sequence (Gene ID: 945619).
  • SEQ ID NO: 2 shows the amino acid sequence of wild-type E. coli CsgG including signal sequence (Uniprot accession number P0AEA2).
  • SEQ ID NO: 3 shows the amino acid sequence of wild-type E. coli CsgG as a mature protein (Uniprot accession number P0AEA2).
  • SEQ ID NO: 4 shows the polynucleotide sequence of wild-type E. coli CsgF from strain K12, including signal sequence (Gene ID: 945622).
  • SEQ ID NO: 5 shows the amino acid sequence of wild-type E. coli CsgF including signal sequence (Uniprot accession number P0AE98).
  • SEQ ID NO: 6 shows the amino acid sequence of wild-type E. coli CsgF as a mature protein (Uniprot accession number P0AE98).
  • reference to “a polynucleotide” includes two or more polynucleotides
  • reference to “a polynucleotide binding protein” includes two or more such proteins
  • reference to “a helicase” includes two or more helicases
  • reference to “a monomer” includes two or more monomers
  • reference to “a pore” includes two or more pores, and the like.
  • Standard substitution notation is also used, i.e., Q42R means that Q at position 42 is replaced with R.
  • the / symbol means "or".
  • Q87R/K means Q87R or Q87K.
  • the / symbol can also mean "and", such that Y51/N55 is Y51 and N55.
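  • The substitution notation can be illustrated with a small parser. The sketch below is only an illustration of the convention described above: it handles single substitutions such as Q42R and the "or" form such as Q87R/K, and it deliberately rejects the "and" form (Y51/N55), which lists positions rather than substitutions. The function names are hypothetical.

```python
import re

# One-letter original residue, a position, then one or more replacement
# residues separated by "/" (the "or" form, e.g. Q87R/K).
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z](?:/[A-Z])*)$")


def parse_substitution(notation):
    """Return (original residue, position, list of replacement residues)."""
    match = _SUBSTITUTION.match(notation)
    if match is None:
        raise ValueError(f"not a substitution: {notation!r}")
    original, position, replacements = match.groups()
    return original, int(position), replacements.split("/")


def expand_alternatives(notation):
    """Expand the "or" form into one plain substitution per alternative."""
    original, position, replacements = parse_substitution(notation)
    return [f"{original}{position}{new}" for new in replacements]


print(parse_substitution("Q42R"))
print(expand_alternatives("Q87R/K"))
```

  • Whether a given "/" means "or" or "and" depends on what follows it: a bare replacement residue (R/K) in the first case, a second residue-and-position pair (Y51/N55) in the second.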
  • the invention provides pore monomer conjugates comprising a CsgG pore monomer attached to a CsgF peptide.
  • the CsgF peptide is attached to the CsgG pore monomer at two or more positions, such as 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more or 10 or more positions.
  • the CsgF peptide is preferably covalently attached to the CsgG pore monomer at two or more positions, such as 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more or 10 or more positions.
  • attachment at two or more positions means that two or more pairs of residues in the CsgF peptide and CsgG pore monomer are attached, preferably covalently attached, to each other.
  • SEQ ID NO: 3 shows the amino acid sequence of wild-type E. coli CsgG as a mature protein.
  • the two or more positions in the CsgG pore monomer are preferably selected from residues 47-54, 57, 59, 60, 130-134, 136, 137, 138, 140, 142-145, 147, 149, 151, 153, 155, 181, 183, 185, 187, 189, 191, 193, 195-199, 201, 203, 205, 207, 209 and 211-212 in the CsgG pore monomer.
  • the two or more positions in the CsgG pore monomer are preferably selected from residues corresponding to positions 47-54, 57, 59, 60, 130-134, 136, 137, 138, 140, 142-145, 147, 149, 151, 153, 155, 181, 183, 185, 187, 189, 191, 193, 195-199, 201, 203, 205, 207, 209 and 211-212 in SEQ ID NO: 3.
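  • The list of preferred CsgG positions mixes single residues and ranges. As a convenience, the sketch below expands such a list into an explicit set of residue numbers so that membership can be tested directly. The helper name `expand_positions` is hypothetical, and the input string is copied from the list above.

```python
def expand_positions(spec):
    """Expand a list such as "47-54, 57 and 211-212" into a set of integers."""
    positions = set()
    for item in spec.replace(" and ", ", ").split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            # Ranges are inclusive at both ends.
            start, end = (int(part) for part in item.split("-"))
            positions.update(range(start, end + 1))
        else:
            positions.add(int(item))
    return positions


CSGG_POSITIONS = expand_positions(
    "47-54, 57, 59, 60, 130-134, 136, 137, 138, 140, 142-145, 147, 149, "
    "151, 153, 155, 181, 183, 185, 187, 189, 191, 193, 195-199, 201, 203, "
    "205, 207, 209 and 211-212"
)

print(sorted(CSGG_POSITIONS))
print(153 in CSGG_POSITIONS, 135 in CSGG_POSITIONS)
```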
  • SEQ ID NO: 6 shows the amino acid sequence of wild-type E. coli CsgF as a mature protein.
  • the two or more positions in the CsgF peptide are preferably selected from the N terminus and residues 1-35 in the CsgF peptide.
  • the two or more positions in the CsgF peptide are preferably selected from the N terminus and residues corresponding to positions 1-35 in SEQ ID NO: 6.
  • the N terminus is the amino group of the first residue in the CsgF peptide (i.e., residue 1).
  • residue 1 refers to the side chain of residue 1 or the residue corresponding to position 1 in SEQ ID NO: 6.
  • the two or more positions are preferably the following positions/residues in the CsgF peptide or the positions/residues in the CsgF peptide which correspond to the following positions in SEQ ID NO: 6: the N terminus and any one of 1-35, 1 and any of the N terminus and 2-35, 2 and any of the N terminus, 1 and 3-35, 3 and any of the N terminus, 1-2 and 4-35, 4 and any of the N terminus, 1-3 and 5-35, 5 and any of the N terminus, 1-4 and 6-35, 6 and any of the N terminus, 1-5 and 7-35, 7 and any of the N terminus, 1-6 and 8-35, 8 and any of the N terminus, 1-7 and 9-35, 9 and any of the N terminus, 1-8 and 10-35, 10 and any of the N terminus, 1-9 and 11-35, 11 and any of the N terminus, 1-10 and 12-35, 12 and any of the
  • Each column shows positions/residues in the CsgF peptide and CsgG pore monomer or positions in SEQ ID NO: 6 and SEQ ID NO: 3 to which the positions/residues in the CsgF peptide and CsgG pore monomer correspond.
  • the two or more positions may be any two or more of the rows in the table.
  • the position/residue in the CsgF peptide or the position/residue corresponding to the position in SEQ ID NO: 6 may be attached, preferably covalently attached, to any of the listed positions/residues in the CsgG pore monomer or to any position/residue corresponding to listed positions in SEQ ID NO: 3.
  • 6 is preferably attached, more preferably covalently attached, to any of positions 47-54, 57, 59, 60, 130-134, 136, 151, 153, 155, 181, 183, 185, 207, 209, 211-212 in the CsgG pore monomer or a residue corresponding to any of positions 47-54, 57, 59, 60, 130-134, 136, 151, 153, 155, 181, 183, 185, 207, 209, 211-212 in SEQ ID NO: 3.
  • One of the two or more attachments preferably comprises the N terminus of the CsgF peptide attached, preferably covalently attached, to a cysteine residue at position 153 of the CsgG pore monomer.
  • One of the two or more attachments preferably comprises the N terminus of the CsgF peptide attached, preferably covalently attached, to a cysteine residue in the CsgG pore monomer corresponding to position 153 in SEQ ID NO: 3.
  • One of the two or more attachments preferably comprises position 4 in the CsgF peptide attached, preferably covalently attached, to a cysteine residue at position 133 in the CsgG pore monomer.
  • One of the two or more attachments preferably comprises the position in the CsgF peptide corresponding to position 4 of SEQ ID NO: 6 attached, preferably covalently attached, to a cysteine residue in the CsgG pore monomer corresponding to position 133 in SEQ ID NO: 3.
  • One of the two or more attachments preferably comprises position 4 in the CsgF peptide attached, preferably covalently attached, to a cysteine residue at position 153 of the CsgG pore monomer.
  • One of the two or more attachments preferably comprises the position in the CsgF peptide corresponding to position 4 of SEQ ID NO: 6 attached, preferably covalently attached, to a cysteine residue in the CsgG pore monomer corresponding to position 153 in SEQ ID NO: 3.
  • One of the two or more attachments preferably comprises any one of positions 30, 31, 32 and 33 in the CsgF peptide attached, preferably covalently attached, to any one of positions 193, 195, 196 and 197 in the CsgG pore monomer.
  • One of the two or more attachments preferably comprises the residue in the CsgF peptide corresponding to any one of positions 30, 31, 32 and 33 in SEQ ID NO: 6 attached, preferably covalently attached, to the position in the CsgG pore monomer corresponding to any one of positions 193, 195, 196 and 197 in SEQ ID NO: 3.
  • Corresponding positions may be determined by standard techniques in the art. For example, the PILEUP and BLAST algorithms mentioned below can be used to align the sequence of a CsgG pore monomer with SEQ ID NO: 3 and hence to identify corresponding residues.
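  • Once a pairwise alignment is available, corresponding residues can be read off column by column. The sketch below assumes the alignment has already been produced by a tool of the reader's choice and is supplied as two gapped strings of equal length. The function name and the short toy sequences are hypothetical and are not taken from SEQ ID NO: 3.

```python
def map_reference_positions(aligned_reference, aligned_query, gap="-"):
    """Map 1-based reference positions to 1-based query positions.

    Both strings must be rows of the same pairwise alignment. A reference
    residue aligned to a gap in the query maps to None.
    """
    if len(aligned_reference) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    ref_pos = 0
    query_pos = 0
    for ref_char, query_char in zip(aligned_reference, aligned_query):
        if query_char != gap:
            query_pos += 1
        if ref_char != gap:
            ref_pos += 1
            mapping[ref_pos] = query_pos if query_char != gap else None
    return mapping


# Toy alignment (hypothetical sequences, for illustration only).
aligned_reference = "MKT-AYC"
aligned_query = "MKTGA-C"
mapping = map_reference_positions(aligned_reference, aligned_query)
print(mapping)
```

  • In this toy case, reference position 4 corresponds to query position 5 because the query carries an insertion, and reference position 5 has no corresponding query residue.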
  • the attachment at two or more positions preferably comprises one or more reactive groups which react with lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine in the CsgG pore monomer.
  • the attachment at two or more positions preferably comprises a reaction between a position, residue, or linker in the CsgF peptide with lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine in the CsgG pore monomer.
  • the attachment at all of the two or more positions preferably comprises one or more reactive groups which react with lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine in the CsgG pore monomer.
  • the attachment at all of the two or more positions preferably comprises a reaction between a position, residue, or linker in the CsgF peptide with lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine in the CsgG pore monomer.
  • the lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine may be native to the CsgG pore monomer.
  • the lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine may be introduced into the CsgG pore monomer, preferably by substitution or addition.
  • Reactive groups which react with lysine include, but are not limited to, maleimide, activated esters, anhydrides, carbonates, isocyanates, isothiocyanates, a range of other acylating and alkylating agents, oxidative coupling O-aminophenols, aldehydes, activated carbodiimides, ketenes, sulfonyl halides, fluorosulfates, and sulfonyl triazoles.
  • Positions, residues, or linkers may also be attached to lysine using periodate oxidation, reductive amination, transamination, aniline/arylamine conjugation via oxidative coupling, azaelectrocyclization, iminoboronate formation, or conjugation of arene diazonium salts.
  • Reactive groups which react with cysteine include, but are not limited to, haloacetamides and other alpha-halocarbonyls, maleimides, acrylates, vinyl sulfones, vinylpyridines, epoxides, oxanorbornadienes, methylsulfonyl-functionalised heteroaromatics, allenes, allyl selenosulfate salts, perfluoroaromatics, thiol-ene and thiol-yne click chemistry, pyridyl dithiol, sulfonyl halides, fluorosulfates, and sulfonyl triazoles.
  • Positions, residues, or linkers may also be attached to cysteine using strain-release alkylation, nickel(II)-catalyzed oxidative coupling, oxidative coupling with aminophenols, conjugation with allenes (in the presence of gold catalyst, or allyl selenosulfate salts), native chemical ligation, Pd-catalysed arylation/alkynylation, or allylation followed by cross-metathesis.
  • Reactive groups which react with tyrosine include, but are not limited to, sulfonyl halides, fluorosulfates, and sulfonyl triazoles. Positions, residues, or linkers may also be attached to tyrosine using oxidative conjugation of tyrosines including O-alkylation, hydrazone and oxime condensations, addition reactions with electron deficient alkynes such as alkynones, alkynoate, amide or esters, cyclic diazodicarboxamides, Pd-catalysed alkylation, diazonium salts, Mannich reaction with imines formed from aldehydes, or modification with rhodium carbenoids.
  • Reactive groups which react with serine or threonine include, but are not limited to, sulfonyl halides, fluorosulfates, and sulfonyl triazoles. Positions, residues, or linkers may also be attached to serine or threonine using periodate oxidation and subsequent transimination reactions of ketones/aldehydes with hydrazides/alkoxyamines. Resultant aldehydes/ketones may also be modified through aldol ligation.
  • Positions, residues, or linkers may be attached to proline using oxidative coupling with O-aminophenols at the N-terminus.
  • Reactive groups which react with tryptophan include, but are not limited to, aldehydes, ketones, and tetrazoles. Positions, residues, or linkers may be attached to tryptophan using a condensation reaction, modification with Rhodium carbenoids, conjugation with N/O centred radicals, and N-terminal Trp modification using Pictet-Spengler reaction.
  • Positions, residues, or linkers may be attached to arginine using condensation with α,β-dicarbonyl compounds.
  • Reactive groups which react with histidine include, but are not limited to, vinylsulfones, sulfonyl halides, fluorosulfates, and sulfonyl triazoles. Positions, residues, or linkers may be attached to histidine using C2 alkylation and N3 alkylation/thiophosphorylation.
  • Positions, residues, or linkers may be attached to methionine using S-alkylation/imidation.
  • Positions, residues, or linkers may be attached to phenylalanine using modification with Rhodium carbenoids.
  • the attachment at two or more positions preferably comprises one or more reactive groups which react with any amino acid in the CsgG pore monomer.
  • the attachment at all of the two or more positions preferably comprises one or more reactive groups which react with any amino acid in the CsgG pore monomer.
  • Reactive groups which react with any amino acid include, but are not limited to, activated esters, anhydrides, carbonates, isocyanates, isothiocyanates, and a range of other acylating and alkylating agents, oxidative coupling O-aminophenols, aldehydes, activated carbodiimides, ketenes, transamination, and vinylboronic acids.
  • the attachment at two or more positions preferably comprises reacting a position, residue, or linker in the CsgF peptide with any amino acid in the CsgG pore monomer.
  • the attachment at all of the two or more positions preferably comprises reacting a position, residue, or linker in the CsgF peptide with any amino acid in the CsgG pore monomer.
  • Positions, residues, or linkers may be attached to any amino acid using periodate oxidation, or reductive amination.
  • the attachment at two or more positions preferably comprises one or more reactive groups which undergo click chemistry.
  • the attachment at all of the two or more positions preferably comprises one or more reactive groups which undergo click chemistry.
  • Suitable click chemistries include, but are not limited to, CuAAC azide/alkyne cycloaddition, Staudinger ligation, strain-promoted azide-alkyne cycloaddition, and the inverse-electron demand Diels-Alder reaction between 1,2,4,5-tetrazines and strained alkenes.
  • the attachment at two or more positions preferably comprises two or more versions of the same or similar reactive groups, such as maleimide.
  • the attachment at two or more positions preferably comprises two or more versions of the same or similar reaction.
  • the attachment at two or more positions preferably comprises two or more maleimide-containing linkers.
  • the attachment at two or more positions preferably comprises two or more maleimide reactions. Any of the maleimide groups and linkers discussed above may be used.
  • the attachment at two or more positions preferably comprises two or more different reactive groups.
  • the attachment at two or more positions preferably comprises two or more different reactions.
  • the two or more reactive groups or reactions may be any of those discussed above in relation to the CsgF peptide and/or the CsgG pore monomer.
  • the CsgF peptide is preferably attached to the CsgG pore monomer using two or more linkers, such as 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more or 10 or more linkers.
  • the two or more linkers may be the same.
  • the two or more linkers may be different.
  • the skilled person is capable of designing two or more linkers for use in the invention.
  • the two or more linkers may be any of the linkers discussed below with reference to the constructs of the invention.
  • the two or more linkers preferably comprise or consist of a linear carbon chain of 2, 3, 4, 5, 6 or more carbon atoms and/or saturated or unsaturated cyclic groups containing 3, 5 or 6 carbon atoms.
  • One or more of, such as all of the, two or more linkers are preferably a maleimide-containing linker.
  • the maleimide group may be used to react with cysteine in the CsgF peptide and/or the CsgG pore monomer.
  • the maleimide-containing linker preferably comprises or consists of a maleimide group and a linear carbon chain of 2, 3, 4, 5, 6 or more carbon atoms.
  • the linear carbon chain is typically attached to the nitrogen atom in the maleimide group.
  • the linear carbon chain also preferably comprises a terminal carboxyl group. This carboxyl group is capable of forming an amide bond with an amino acid in the CsgF peptide.
  • the maleimide-containing linker is preferably maleimidoacetic acid, maleimidopropionic acid, maleimidobutyric acid, maleimidopentanoic acid or maleimidohexanoic acid. Any combination of these linkers may be used in the two or more linkers.
  • the maleimide-containing linker is most preferably maleimidopropionic acid.
  • the distance between the CsgG pore monomer and the CsgF peptide in the pore monomer conjugate and/or the length of the linker is preferably less than about 3.00 nm, such as less than about 2.90 nm, less than about 2.80 nm, less than about 2.70 nm, less than about 2.60 nm, less than about 2.50 nm, less than about 2.40 nm, less than about 2.30 nm, less than about 2.20 nm, less than about 2.10 nm, less than about 2.00 nm, less than about 1.90 nm, less than about 1.80 nm, less than about 1.70 nm, less than about 1.60 nm, less than about 1.50 nm, less than about 1.40 nm, less than about 1.30 nm, less than about 1.20 nm, less than about 1.10 nm, less than about 1.00 nm, less than about 0.90 nm, less than about 0.80 nm, less than about 0.70 nm
  • This distance/length can be achieved using any of the specific maleimide-containing linkers discussed above, including maleimidoacetic acid, maleimidopropionic acid, maleimidobutyric acid, maleimidopentanoic acid or maleimidohexanoic acid.
  • the linker is most preferably maleimidopropionic acid.
  • the distance between the CsgG pore monomer and the CsgF peptide in the pore monomer conjugate and/or the length of the linker is preferably less than about 1.20 nm. This distance/length can be achieved using maleimidohexanoic acid as discussed in more detail above.
  • the distance between the CsgG pore monomer and the CsgF peptide in the pore monomer conjugate and/or the length of the linker is preferably less than about 0.8 nm. This distance/length can be achieved using maleimidopropionic acid as discussed above.
  • the distance between the CsgG pore monomer and the CsgF peptide in the pore monomer conjugate and/or the length of the linker is preferably from about 0.40 nm to about 3.00 nm, such as about 0.45 nm to about 2.80 nm, from about 0.50 nm to about 2.50 nm, from about 0.55 nm to about 2.20 nm, from about 0.60 nm to about 2.00 nm, from about 0.65 nm to about 1.50 nm, from about 0.70 nm to about 1.40 nm, from about 0.75 nm to about 1.30 nm, from about 0.80 nm to about 1.20 nm, from about 0.85 nm to about 1.10 nm and from about 0.90 nm to about 1.00 nm.
  • the distance between the CsgG pore monomer and the CsgF peptide in the pore monomer conjugate and/or the length of the linker is preferably from about 0.50 nm to about 1.50 nm.
  • the distance between the CsgG pore monomer and the CsgF peptide in the pore monomer conjugate and/or the length of the linker is preferably from about 0.60 nm to about 1.20 nm. This distance/length can be achieved using any of the specific maleimide-containing linkers discussed above, including maleimidoacetic acid, maleimidopropionic acid, maleimidobutyric acid, maleimidopentanoic acid or maleimidohexanoic acid.
  • the linker is most preferably maleimidopropionic acid.
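  • The nested "from about X nm to about Y nm" ranges above can be tabulated and queried. The sketch below is illustrative only: it treats each "about" bound as an exact inclusive limit, which is a simplifying assumption, and the function name and the example lengths are hypothetical rather than measured values for any particular linker.

```python
# Preferred distance/linker-length ranges listed in the text, in nanometres,
# ordered from the widest to the narrowest.
PREFERRED_RANGES_NM = [
    (0.40, 3.00), (0.45, 2.80), (0.50, 2.50), (0.55, 2.20), (0.60, 2.00),
    (0.65, 1.50), (0.70, 1.40), (0.75, 1.30), (0.80, 1.20), (0.85, 1.10),
    (0.90, 1.00),
]


def narrowest_matching_range(length_nm, ranges=PREFERRED_RANGES_NM):
    """Return the narrowest listed range containing length_nm, or None."""
    matches = [(low, high) for low, high in ranges if low <= length_nm <= high]
    if not matches:
        return None
    return min(matches, key=lambda bounds: bounds[1] - bounds[0])


for example_length in (0.95, 0.62, 3.50):  # hypothetical lengths in nm
    print(example_length, narrowest_matching_range(example_length))
```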
  • the pore monomer conjugates of the invention are capable of forming a pore or a pore complex. This can be measured using routine methods, including any of those described in WO 2016/034591, WO 2017/149316, WO 2017/149317, WO 2017/149318, WO 2018/211241 and WO 2019/002893 (all incorporated by reference herein in their entirety) and in the Example.
  • a CsgG pore monomer is a monomer that is capable of forming a CsgG pore. Such monomers are known in the art, especially from WO 2019/002893 (incorporated by reference herein in its entirety).
  • the CsgG pore preferably comprises one or more of (a) a cap region, (b) a constriction region, and (c) a transmembrane beta barrel region, such as (a), (b), (c), (a) and (b), (a) and (c), (b) and (c), or (a), (b) and (c).
  • the CsgG pore monomer preferably comprises one or more of (a) a cap forming region, (b) a constriction forming region, and (c) a transmembrane beta barrel forming region, such as (a), (b), (c), (a) and (b), (a) and (c), (b) and (c), or (a), (b) and (c).
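  • The phrase "one or more of (a), (b) and (c)" covers every non-empty combination of the three regions, which is the list spelled out above. A minimal sketch enumerating those combinations follows; the tuple `REGIONS` and the function name are illustrative labels only.

```python
from itertools import combinations

REGIONS = ("(a) cap", "(b) constriction", "(c) transmembrane beta barrel")


def region_combinations(regions=REGIONS):
    """List every non-empty combination of regions, smallest first."""
    return [
        combo
        for size in range(1, len(regions) + 1)
        for combo in combinations(regions, size)
    ]


for combo in region_combinations():
    print(" and ".join(combo))
```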
  • the residues of SEQ ID NO: 3 which form these regions are defined below.
  • the CsgG pore formed by the monomer may have any structure but preferably has or comprises the structure of the wild-type CsgG pore (Figure 1).
  • the protein structure of CsgG defines a channel or hole that allows the translocation of molecules and ions from one side of the membrane to the other.
  • constriction refers to an aperture defined by a luminal surface of a pore or pore complex, which acts to allow the passage of ions and target molecules (e.g., but not limited to polynucleotides or individual nucleotides) but not other non-target molecules through the pore or pore complex channel.
  • the constriction(s) are typically the narrowest aperture(s) within a pore or pore complex or within the channel defined by the pore or pore complex. The constriction(s) may serve to limit the passage of molecules through the pore.
  • the size of the constriction is typically a key factor in determining suitability of a pore or pore complex for analyte characterisation. If the constriction is too small, the molecule to be characterised will not be able to pass through. However, to achieve a maximal effect on ion flow through the channel, the constriction should not be too large. For example, the constriction should not be wider than the solvent-accessible transverse diameter of a target analyte. Ideally, any constriction should be as close as possible in diameter to the transverse diameter of the analyte passing through.
  • the CsgF peptide and the CsgG pore monomer typically each provide at least one constriction such that the pore complex of the invention comprises two or more constrictions.
  • the CsgG pore may be any size but preferably has the dimensions of the wild-type CsgG pore ( Figure 1).
  • the CsgG pore preferably has an external diameter of from about 100 to about 150 Å at its widest point, such as from about 110 to about 140 Å or from about 115 to about 125 Å at its widest point.
  • the CsgG pore preferably has an external diameter of about 120 Å at its widest point.
  • the CsgG pore preferably has a total length of from about 80 to about 120 Å, such as from about 90 to about 110 Å or from about 95 to about 105 Å.
  • the CsgG pore preferably has a total length of about 98 Å. References to "total length" and "length" relate to the length of the pore or pore region when viewed from the side (see, e.g., the side view in Figure 1).
  • the cap region preferably has a length of from about 20 to about 60 Å, such as from about 30 to about 50 Å or from about 35 to about 45 Å.
  • the cap region preferably has a length of about 39 Å.
  • the channel defined by the cap region preferably has an opening of from about 45 to about 85 Å in diameter, such as from about 55 to about 75 Å or from about 60 to about 70 Å in diameter.
  • the channel defined by the cap region preferably has an opening of about 66 Å in diameter.
  • the channel defined by the cap region is preferably from about 30 to about 70 Å in diameter at its narrowest point, such as from about 35 to about 60 Å or from about 40 to about 50 Å in diameter at its narrowest point.
  • the channel defined by the cap region is preferably about 43 Å in diameter at its narrowest point.
  • the constriction region preferably has a length of from about 5 to about 40 Å, such as from about 10 to about 30 Å or from about 15 to about 25 Å.
  • the constriction region preferably has a length of about 20 Å.
  • the channel defined by the constriction region is preferably from about 2 to about 40 Å in diameter at its narrowest point, such as from about 5 to about 35 Å, from about 8 to about 25 Å or from about 10 to about 20 Å in diameter at its narrowest point.
  • the channel defined by the constriction region is preferably about 9 Å or 12 Å in diameter.
  • the channel defined by the constriction region is preferably about 18.5 Å in diameter.
  • the constriction is preferably from about 2 to about 40 Å in diameter, such as from about 5 to about 35 Å, from about 8 to about 25 Å or from about 10 to about 20 Å in diameter.
  • the constriction is preferably about 9 Å or 12 Å in diameter.
  • the constriction is preferably about 12 Å in diameter.
  • the transmembrane beta barrel region preferably has a length of from about 20 to about 60 Å, such as from about 30 to about 50 Å or from about 35 to about 45 Å.
  • the transmembrane beta barrel preferably has a length of about 39 Å.
  • the channel defined by the transmembrane beta barrel region is preferably from about 35 to about 75 Å in diameter at its narrowest point, such as from about 45 to about 65 Å or from about 50 to about 60 Å in diameter at its narrowest point.
  • the channel defined by the transmembrane beta barrel region is preferably about 55 Å in diameter at its narrowest point.
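The preferred lengths listed above can be cross-checked with a short calculation. The following is a minimal sketch, not part of the specification: it records the single-value preferred lengths of the cap, constriction and transmembrane beta barrel regions (in angstroms), tests each against its broadest stated range, and sums them for comparison with the preferred total length of about 98 angstroms. All names in the code are hypothetical.

```python
# Preferred single-value lengths (in angstroms) taken from the text above.
# The dictionary and function names are illustrative only.
PREFERRED_LENGTHS = {
    "cap": 39.0,
    "constriction": 20.0,
    "transmembrane_beta_barrel": 39.0,
}
PREFERRED_TOTAL_LENGTH = 98.0

# Broadest preferred length range (in angstroms) stated for each region.
LENGTH_RANGES = {
    "cap": (20.0, 60.0),
    "constriction": (5.0, 40.0),
    "transmembrane_beta_barrel": (20.0, 60.0),
}


def within_range(region: str, length: float) -> bool:
    """Return True if `length` lies inside the broadest range for `region`."""
    low, high = LENGTH_RANGES[region]
    return low <= length <= high


def summed_length() -> float:
    """Sum the preferred lengths of the three regions."""
    return sum(PREFERRED_LENGTHS.values())
```

Under these values the three region lengths add up to the preferred total length, which is consistent with the regions being stacked along the pore axis in the side view.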
  • SEQ ID NO: 3 shows the sequence of wild-type E. coli CsgG as a mature protein. Residues 1 to 41, 64 to 131, 156 to 180 and 212 to 262 of SEQ ID NO: 3 form the cap region. Residues 42 to 63 of SEQ ID NO: 3 form the constriction region. Residues 132 to 155 and 181 to 211 of SEQ ID NO: 3 form the transmembrane beta barrel region.
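The residue ranges given above partition SEQ ID NO: 3 into three regions. As an illustrative sketch (the names `REGIONS` and `region_of` are hypothetical), the mapping from a residue position to its region can be written as a simple lookup:

```python
# Region boundaries for SEQ ID NO: 3 as stated in the text (1-based, inclusive).
REGIONS = {
    "cap": [(1, 41), (64, 131), (156, 180), (212, 262)],
    "constriction": [(42, 63)],
    "transmembrane_beta_barrel": [(132, 155), (181, 211)],
}


def region_of(position: int) -> str:
    """Return the name of the region containing the given residue position."""
    for name, spans in REGIONS.items():
        if any(start <= position <= end for start, end in spans):
            return name
    raise ValueError(f"position {position} is outside residues 1 to 262")
```

For example, `region_of(51)` returns `"constriction"` and `region_of(193)` returns `"transmembrane_beta_barrel"`.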
  • the CsgG pore monomer is preferably a variant of SEQ ID NO: 3.
  • the variant CsgG monomer may also be referred to as a modified CsgG pore monomer or a mutant CsgG pore monomer.
  • the modifications, or mutations, in the variant include but are not limited to any one or more of the modifications disclosed herein, or combinations of said modifications.
  • the CsgG pore monomer may be a CsgG homologue monomer.
  • a CsgG homologue monomer is a polypeptide that has at least 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95% or 99% complete sequence identity to wild-type E. coli CsgG as shown in SEQ ID NO: 3.
  • a CsgG homologue is also referred to as a polypeptide that contains the PFAM domain PF03783, which is characteristic for CsgG-like proteins.
  • a list of presently known CsgG homologues and CsgG architectures can be found at
  • Over the entire length of the amino acid sequence of SEQ ID NO: 3, a variant will preferably be at least 40% homologous to that sequence based on amino acid identity. More preferably, the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to the amino acid sequence of SEQ ID NO: 3 over the entire sequence. Over the entire length of the amino acid sequence of SEQ ID NO: 3, a variant will preferably be at least 40% identical to that sequence.
  • the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to SEQ ID NO: 3 over the entire sequence.
  • Sequence identity can also relate to a fragment or portion of the CsgG pore monomer.
  • a sequence may have less than 40% overall sequence homology/identity with SEQ ID NO: 3, but the sequence of a particular region, domain or subunit could share at least 80%, 90%, or as much as 99% sequence homology/identity with the corresponding region of SEQ ID NO: 3.
  • the CsgG pore monomer is preferably a variant of SEQ ID NO: 3 comprising a sequence that is at least 40% homologous to the cap region of SEQ ID NO: 3 (residues 1 to 41, 64 to 131, 156 to 180 and 212 to 262).
  • the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to residues 1 to 41, 64 to 131, 156 to 180 and 212 to 262 of SEQ ID NO: 3.
  • the variant preferably comprises a sequence that is at least 40% identical to residues 1 to 41, 64 to 131, 156 to 180 and 212 to 262 of SEQ ID NO: 3.
  • the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to residues 1 to 41, 64 to 131, 156 to 180 and 212 to 262 of SEQ ID NO: 3. Homology and/or identity is typically measured over the entire length of the cap region.
  • the CsgG pore monomer is preferably a variant of SEQ ID NO: 3 comprising a sequence that is at least 40% homologous to the constriction region of SEQ ID NO: 3 (residues 42 to 63). More preferably, the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to residues 42 to 63 of SEQ ID NO: 3. The variant preferably comprises a sequence that is at least 40% identical to residues 42 to 63 of SEQ ID NO: 3.
  • the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to residues 42 to 63 of SEQ ID NO: 3. Homology and/or identity is typically measured over the entire length of the constriction region.
  • the CsgG pore monomer is preferably a variant of SEQ ID NO: 3 comprising a sequence that is at least 40% homologous to the transmembrane beta barrel region of SEQ ID NO: 3 (residues 132 to 155 and 181 to 211). More preferably, the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to residues 132 to 155 and 181 to 211 of SEQ ID NO: 3.
  • the variant preferably comprises a sequence that is at least 40% identical to residues 132 to 155 and 181 to 211 of SEQ ID NO: 3. More preferably, the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to residues 132 to 155 and 181 to 211 of SEQ ID NO: 3. Homology and/or identity is typically measured over the entire length of the transmembrane beta barrel region.
  • CsgG pore monomers are highly conserved (as can be readily appreciated from Figures 45 to 47 of WO 2017/149317). Furthermore, from knowledge of the mutations in relation to SEQ ID NO: 3 it is possible to determine the equivalent positions for mutations of CsgG pore monomers other than that of SEQ ID NO: 3.
  • reference to a mutant CsgG pore monomer comprising a variant of the sequence as shown in SEQ ID NO: 3 and specific amino-acid mutations thereof, as set out in the claims and elsewhere in the specification, also encompasses a mutant CsgG pore monomer comprising a variant of any of the sequences shown in SEQ ID NOs: 68 to 88 of WO 2019/002893 (incorporated by reference herein in its entirety) and corresponding amino-acid mutations thereof.
  • the CsgG pore monomer may also be any of the sequences shown in CN 113773373 A, CN 113896776 A, CN 113912683 A, and CN 113754743 A or a variant thereof. It will further be appreciated that the invention extends to other variant CsgG pore monomers not expressly identified in the specification that show highly conserved regions.
  • Standard methods in the art may be used to determine homology.
  • the UWGCG Package provides the BESTFIT program which can be used to calculate homology, for example used on its default settings (Devereux et al (1984) Nucleic Acids Research 12, p387-395).
  • the PILEUP and BLAST algorithms can be used to calculate homology or line up sequences (such as identifying equivalent residues or corresponding sequences), typically on their default settings, for example as described in Altschul S. F. (1993) J Mol Evol 36:290-300; Altschul, S. F. et al (1990) J Mol Biol 215:403-10.
  • Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/).
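To illustrate what a percentage identity threshold means in practice, the following minimal sketch computes identity between two sequences that have already been aligned. It is not BESTFIT, PILEUP or BLAST and does not perform the alignment itself; it assumes gaps are written as `-` and counts only columns in which neither sequence has a gap. Alignment programs differ in how they choose this denominator, so values may not match theirs exactly. The function name and the short test sequences are hypothetical and are not taken from SEQ ID NO: 3.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 40.0) -> bool:
    """Return True if identity is at least `threshold` percent."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

To assess a single region, such as the constriction region, the same function can be applied to the aligned columns corresponding to that region only.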
  • SEQ ID NO: 3 is the wild-type CsgG pore monomer from Escherichia coli Str. K-12 substr. MC4100.
  • a variant of SEQ ID NO: 3 may comprise any of the substitutions present in another CsgG homologue.
  • Preferred CsgG homologues are shown in SEQ ID NOs: 68 to 88 of WO 2019/002893 (incorporated by reference herein in its entirety).
  • the variant may comprise combinations of one or more of the substitutions present in SEQ ID NOs: 68 to 88 of WO 2019/002893 (incorporated by reference herein in its entirety) compared with SEQ ID NO: 3, including one or more substitutions, one or more conservative mutations, one or more deletions or one or more insertion mutations, such as deletion or insertion of 1 to 10 amino acids, such as of 2 to 8 or 3 to 6 amino acids.
  • the CsgG pore monomer in the pore monomer conjugate of the invention typically retains the ability to form the same 3D structure as the wild-type CsgG pore monomer, such as the same 3D structure as a CsgG pore monomer having the sequence of SEQ ID NO: 3.
  • the 3D structure of CsgG is known in the art and is disclosed, for example, in Goyal et al (2014) Nature 516(7530):250-3. Any number of mutations may be made in the wild-type CsgG sequence in addition to the mutations described herein provided that the CsgG pore monomer retains the improved properties imparted on it by the mutations of the present invention.
  • the CsgG pore monomer will retain the ability to form a structure comprising five alpha-helices and five beta-strands. Therefore, it is envisaged that further mutations may be made in any of these regions in any CsgG pore monomer without affecting the ability of the monomer to form a pore that can translocate polynucleotides. It is also expected that deletions of one or more amino acids can be made in any of the loop regions linking the alpha helices and beta-strands and/or in the N-terminal and/or C-terminal regions of the CsgG pore monomer without affecting the ability of the monomer to form a pore that can translocate polynucleotides.
  • Amino acid substitutions may be made to the amino acid sequence of SEQ ID NO: 3 in addition to those discussed above, for example up to 1, 2, 3, 4, 5, 10, 20 or 30 substitutions.
  • Conservative substitutions replace amino acids with other amino acids of similar chemical structure, similar chemical properties, or similar side-chain volume.
  • the amino acids introduced may have similar polarity, hydrophilicity, hydrophobicity, basicity, acidity, neutrality or charge to the amino acids they replace.
  • the conservative substitution may introduce another amino acid that is aromatic or aliphatic in the place of a pre-existing aromatic or aliphatic amino acid.
  • Conservative amino acid changes are well-known in the art.
  • the CsgG pore monomer may be modified to introduce one or more cysteines, one or more hydrophobic amino acids, one or more charged amino acids, one or more non-native amino acids, one or more polar amino acids, or one or more photoreactive amino acids. Any number and combination of such introductions may be made. The introduction is preferably by substitution or addition.
  • One or more amino acid residues of the amino acid sequence of SEQ ID NO: 3 may additionally be deleted from the polypeptides described above. Up to 1, 2, 3, 4, 5, 10, 20 or 30 or more residues may be deleted. Variants may include fragments of SEQ ID NO: 3. Such fragments retain pore forming activity. Fragments may be at least 50, at least 100, at least 150, at least 200 or at least 250 amino acids in length. Such fragments may be used to produce the pores.
  • a fragment preferably comprises the transmembrane beta barrel region of SEQ ID NO: 3, namely residues 132 to 155 and 181 to 211, or a variant thereof as discussed above.
  • One or more amino acids may be alternatively or additionally added to the polypeptides described above.
  • An extension may be provided at the amino terminal or carboxy terminal of the amino acid sequence of SEQ ID NO: 3 or polypeptide variant or fragment thereof.
  • the extension may be quite short, for example from 1 to 10 amino acids in length. Alternatively, the extension may be longer, for example up to 50 or 100 amino acids.
  • a carrier protein may be fused to an amino acid sequence according to the invention. Other fusion proteins are discussed in more detail below.
  • a variant of SEQ ID NO: 3 is a polypeptide that has an amino acid sequence which varies from that of SEQ ID NO: 3 and which retains its ability to form a pore.
  • a variant typically contains the regions of SEQ ID NO: 3 that are responsible for pore formation. The pore forming ability of CsgG, which contains a β-barrel, is provided by β-strands in the transmembrane beta barrel region of each monomer.
  • a variant of SEQ ID NO: 3 typically comprises the region in SEQ ID NO: 3 that forms β-strands, namely residues 132 to 155 and 181 to 211, or a variant thereof as discussed above. One or more modifications can be made to the region of SEQ ID NO: 3 that forms β-strands as long as the resulting variant retains its ability to form a pore.
  • the one or more modifications in the CsgG pore monomer preferably improve the ability of a pore complex comprising the pore monomer to characterise an analyte.
  • modifications/mutations/substitutions are contemplated to alter the number, size, shape, placement or orientation of the constriction within a channel from the pore monomer conjugate of the invention.
  • the CsgG pore monomer or the variant of SEQ ID NO: 3 may have any of the particular modifications or substitutions disclosed in WO 2016/034591, WO 2017/149316, WO 2017/149317, WO 2017/149318, WO 2018/211241, and WO 2019/002893 (all incorporated by reference herein in their entirety).
  • Preferred modifications or substitutions in SEQ ID NO: 3 include, but are not limited to, one or more of, such as 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more or all of:
  • a substitution at position Y51, such as Y51I, Y51L, Y51A, Y51V, Y51T, Y51S, Y51Q or Y51N;
  • a substitution at position N55, such as N55I, N55L, N55A, N55V, N55T, N55S or N55Q;
  • a substitution at position F56, such as F56I, F56L, F56A, F56V, F56T, F56S, F56Q or F56N;
  • a substitution at position N91, such as N91D, N91E, N91R or N91K;
  • a substitution at position C215, such as C215T, C215S, C215I, C215L, C215A, C215V, or C215G.
  • the variant of SEQ ID NO: 3 may further comprise a deletion of one or more positions, such as a deletion of T104-N109, a deletion of F193-L199 or a deletion of F195-L199.
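Substitution codes such as Y51A name the wild-type residue, its 1-based position and the replacement residue. The following sketch shows one way such a code could be parsed and applied to a sequence string, with a check that the residue at the stated position matches the wild-type letter. The function name is hypothetical, and the five-residue test sequence is a toy example, not SEQ ID NO: 3.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter replacement.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(sequence: str, code: str) -> str:
    """Apply a single substitution such as 'Y51A' to `sequence`."""
    match = _SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    wild_type = match.group(1)
    position = int(match.group(2))
    replacement = match.group(3)
    index = position - 1  # convert 1-based position to 0-based index
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + replacement + sequence[index + 1:]


# Toy sequence for illustration only.
toy = "MKYTN"
mutated = apply_substitution(toy, "Y3A")
```

Checking the wild-type letter before substituting guards against applying a mutation defined relative to SEQ ID NO: 3 to a homologue whose numbering differs.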
  • any number of the CsgG pore monomers in the pore or pore complex of the invention may be a variant of SEQ ID NO: 3. All six to ten monomers in the pore or pore complex are preferably variants of SEQ ID NO: 3.
  • the variants in the pore complex may be the same or different.
  • the variants are preferably identical in each pore monomer conjugate in the pore complex of the invention.
  • CsgF peptide preferably defines a CsgF peptide that has been truncated from its C-terminal end (i.e., is an N-terminal fragment).
  • the CsgF peptide may be a fragment of wild-type E. coli CsgF (SEQ ID NO: 5 or SEQ ID NO: 6), or of a wild-type homologue of E. coli CsgF, such as for example, a peptide comprising any one of the amino acid sequences shown in WO 2019/002893 (incorporated by reference herein in its entirety).
  • a CsgF homologue is referred to as a polypeptide that has at least 40%, 50%, 60%, 70%, 80%, 90%, 95% or 99% complete sequence identity to wild-type E. coli CsgF as shown in SEQ ID NO: 6.
  • a CsgF homologue may also be referred to as a polypeptide that contains the PFAM domain PF10614, which is characteristic for CsgF-like proteins.
  • a list of presently known CsgF homologues and CsgF architectures can be found at
  • Mature CsgF (shown in SEQ ID NO: 6) can be divided into three main regions: a "CsgF constriction peptide" (FCP), a "neck" region and a "head" region.
  • the "head" region of the CsgF peptide is distinct from a constriction of a pore as described herein.
  • the "head" region of the CsgF peptide may also be referred to as the "C-terminal head domain".
  • the structure of CsgF is discussed in detail in WO 2019/002893 (incorporated by reference herein in its entirety).
  • the CsgF peptide used in the pore monomer conjugate of the invention is preferably a truncated CsgF peptide lacking the C-terminal head; lacking the C-terminal head and a part of the neck domain of CsgF (e.g., the truncated CsgF peptide may comprise only a portion of the neck domain of CsgF); or lacking the C-terminal head and neck domains of CsgF.
  • the CsgF peptide may lack part of the CsgF neck domain, e.g. the CsgF peptide may comprise a portion of the neck domain, such as, for example, from amino acid residue 36 at the N-terminal end of the neck domain (see SEQ ID NO: 6) (e.g. residues 36-40, 36-41, 36-42, 36-43, 36-45, 36-46 up to residues 36-50 or 36-60 of SEQ ID NO: 6).
  • the CsgF peptide preferably comprises a CsgG-binding region and a region that forms a constriction in the pore.
  • the CsgG-binding region typically comprises residues 1 to 11 and/or 29 to 32 of the CsgF protein (SEQ ID NO: 6 or a homologue from another species) and may include one or more modifications.
  • the region that forms a constriction in the pore typically comprises residues 9 to 28 of the CsgF protein (SEQ ID NO: 6 or a homologue from another species) and may include one or more modifications.
  • Residues 9 to 17 comprise the conserved motif N9PXFGGXXX17 and form a turn region.
  • Residues 9 to 28 form an alpha-helix.
  • X17 (N17 in SEQ ID NO: 6) forms the apex of the constriction region, corresponding to the narrowest part of the CsgF constriction in the pore.
  • the CsgF constriction region also makes stabilising contacts with the CsgG beta-barrel, primarily at residues 98, 9, 10, 11, 12, 18, 21, 22, 29 and 30 of SEQ ID NO: 6.
  • the CsgF peptide typically has a length of from 28 to 50 amino acids, such as 29 to 49, 30 to 45 or 32 to 40 amino acids. Preferably the CsgF peptide comprises from 29 to 35 amino acids, or 29 to 45 amino acids.
  • the CsgF peptide comprises all or part of the FCP, which corresponds to residues 1 to 35 of SEQ ID NO: 6. Where the CsgF peptide is shorter than the FCP, the truncation is preferably made at the C-terminal end.
  • the CsgF peptide may have a length of 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54 or 55 amino acids.
  • the CsgF peptide may comprise the amino acid sequence of SEQ ID NO: 6 from residue 1 up to any one of residues 25 to 60, such as 27 to 50, for example, 28 to 45 of SEQ ID NO: 6, or the corresponding residues from a homologue of SEQ ID NO: 6, or variant of either thereof. More specifically, the CsgF peptide may comprise residues 1 to 29 of SEQ ID NO: 6, or a homologue or variant thereof.
  • the CsgF peptide is preferably a truncated CsgF peptide lacking one or more amino acids from CsgF shown in SEQ ID NO: 6.
  • the CsgF peptide is preferably a truncated CsgF peptide lacking a stretch of amino acids starting at any one of positions 15-35 and finishing at position 119 of SEQ ID NO: 6.
  • the CsgF peptide is preferably a truncated CsgF peptide lacking amino acids 15-119, 16-119, 17-119, 18-119, 19-119, 20-119, 21-119, 22-119, 23- 119, 24-119, 25-119, 26-119, 27-119, 28-119, 29-119, 30-119, 31-119, 32-119, 33-119, 34-119, or 35-119 from SEQ ID NO: 6.
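The truncations listed above each remove a stretch running from a chosen start position to position 119 of SEQ ID NO: 6. The following sketch illustrates the arithmetic: removing residues k to 119 leaves residues 1 to k-1. The function name is hypothetical, and a placeholder string of 119 identical letters stands in for the real sequence, which is not reproduced here.

```python
LAST_DELETED = 119  # final position of the deleted stretch, as stated in the text


def truncate_csgf(sequence: str, first_deleted: int) -> str:
    """Remove residues `first_deleted`..119 (1-based, inclusive) from `sequence`."""
    if not 15 <= first_deleted <= 35:
        raise ValueError("the text lists start positions from 15 to 35 only")
    if len(sequence) < LAST_DELETED:
        raise ValueError("sequence is shorter than 119 residues")
    # Keep everything before the deleted stretch and anything after position 119.
    return sequence[: first_deleted - 1] + sequence[LAST_DELETED:]


# Placeholder sequence of the right length; not the real SEQ ID NO: 6.
placeholder = "A" * 119
peptide = truncate_csgf(placeholder, 30)
```

With a 119-residue input, deleting residues 30 to 119 leaves a 29-residue N-terminal fragment, which matches the case of a peptide comprising residues 1 to 29 of SEQ ID NO: 6.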
  • CsgF peptides comprise, consist essentially of, or consist of residues 1 to 34 of SEQ ID NO: 6, residues 1 to 30 of SEQ ID NO: 6, residues 1 to 45 of SEQ ID NO: 6, or residues 1 to 35 of SEQ ID NO: 6 and homologues or variants of any thereof.
  • the CsgF peptide may comprise a modification at a position corresponding to one or more of the following positions in SEQ ID NO: 6: G1, T4, F5, R8, N9, N11, F12, N17, A20, N24, A26, Q27 and Q29.
  • the CsgF peptide may be modified to introduce one or more cysteines, one or more hydrophobic amino acids, one or more charged amino acids, one or more non-native amino acids, one or more polar amino acids, or one or more photoreactive amino acids, for example at a position corresponding to one or more of the following positions in SEQ ID NO: 6: G1, T4, F5, R8, N9, N11, F12, A26 and Q29. Any number and combination of such introductions may be made. The introduction is preferably by substitution or addition.
  • the CsgF peptide may comprise a modification at a position corresponding to one or more of the following positions in SEQ ID NO: 6: N15, N17, A20, N24 and A28.
  • the CsgF peptide may comprise a modification at a position corresponding to D34 to stabilise the CsgG-CsgF complex.
  • the CsgF peptide may comprise one or more of the substitutions: N15S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E, N17S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E, A20S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E, N24S/T/Q/A/G/L/V/I/F/Y/W/R/K/D/C/E, A28S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E and D34F/Y/W/R/K/N/Q/C/E.
  • the CsgF peptide may, for example, comprise one or more of the following substitutions: G1C, T4C, N17S, and D34Y or D34N.
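The slash-separated shorthand used above packs several alternative substitutions at one position into a single token. As a minimal sketch (the function name is hypothetical), such a token can be expanded into its individual substitutions as follows:

```python
def expand_shorthand(shorthand: str) -> list[str]:
    """Expand e.g. 'N17S/A/T' into ['N17S', 'N17A', 'N17T']."""
    head, *alternatives = shorthand.strip().split("/")
    # The head carries the wild-type residue, the position and the first option.
    prefix, first = head[:-1], head[-1]
    return [prefix + residue for residue in [first, *alternatives]]


n15_options = expand_shorthand("N15S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E")
d34_options = expand_shorthand("D34F/Y/W/R/K/N/Q/C/E")
```

Here the N15 token expands to 16 individual substitutions and the D34 token to 9.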
  • the CsgF peptide may be produced by cleavage of a longer protein, such as full-length CsgF using an enzyme. Cleavage at a particular site may be directed by modifying the longer protein, such as full-length CsgF, to include an enzyme cleavage site at an appropriate position. Examples of CsgF amino acid sequences that have been modified to include such enzyme cleavage sites are shown in SEQ ID NOs: 56 to 67 of WO 2019/002893 (incorporated by reference herein in its entirety). Following cleavage all or part of the added enzyme cleavage site may be present in the CsgF peptide that associates with CsgG to form a pore. Thus, the CsgF peptide may further comprise all or part of an enzyme cleavage site at its C-terminal end.
  • CsgF peptides are shown in Table 3 of WO 2019/002893 (incorporated by reference herein in its entirety).
  • the CsgF peptide is preferably a variant of any of the CsgF sequences discussed above, including SEQ ID NO: 6, comprising one or more modifications compared with the comparative sequence. Over the entire length of the amino acid sequence of SEQ ID NO: 6, a variant will preferably be at least 40% homologous to that sequence based on amino acid identity.
  • the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to the amino acid sequence of SEQ ID NO: 6 over the entire sequence. Over the entire length of the amino acid sequence of SEQ ID NO: 6, a variant will preferably be at least 40% identical to that sequence.
  • the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to SEQ ID NO: 6 over the entire sequence.
  • any number of the CsgF peptides in the pore or pore complex of the invention may contain one or more substitutions compared with SEQ ID NO: 6. All six to ten monomers in the pore or pore complex preferably contain one or more substitutions compared with SEQ ID NO: 6.
  • the CsgF peptides in the pore complex may be the same or different.
  • the CsgF peptides are preferably identical in each pore monomer conjugate in the pore complex of the invention.
  • the interaction between the CsgF peptide and the CsgG pore may, for example, be stabilised by hydrophobic interactions and/or electrostatic interactions. These may be interactions between one or more of the following pairs of positions of SEQ ID NO: 6 and SEQ ID NO: 3, respectively: 1 and 153, 4 and 133, 5 and 136, 8 and 187, 8 and 203, 9 and 203, 11 and 142, 11 and 201, 12 and 149, 12 and 203, 26 and 191, and 29 and 144.
  • the residues in the CsgF peptide and/or the CsgG pore monomer at one or more of the positions listed above may be modified in order to enhance the interaction between CsgG and CsgF in the pore complex.
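The position pairs listed above can be held in a small lookup structure when deciding which residues to modify together. The sketch below records each pair as (CsgF position in SEQ ID NO: 6, CsgG position in SEQ ID NO: 3) exactly as given in the text; the names used are hypothetical.

```python
# (CsgF position in SEQ ID NO: 6, CsgG position in SEQ ID NO: 3), from the text.
INTERACTION_PAIRS = [
    (1, 153), (4, 133), (5, 136), (8, 187), (8, 203), (9, 203),
    (11, 142), (11, 201), (12, 149), (12, 203), (26, 191), (29, 144),
]


def csgg_partners(csgf_position: int) -> list[int]:
    """CsgG positions paired with the given CsgF position."""
    return sorted(g for f, g in INTERACTION_PAIRS if f == csgf_position)


def csgf_partners(csgg_position: int) -> list[int]:
    """CsgF positions paired with the given CsgG position."""
    return sorted(f for f, g in INTERACTION_PAIRS if g == csgg_position)
```

For example, `csgg_partners(8)` returns `[187, 203]`, and `csgf_partners(203)` returns `[8, 9, 12]`.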
  • although the CsgG:CsgF complex is very stable, when CsgF is truncated the stability of the CsgG:CsgF complex decreases compared to a complex comprising full-length CsgF. Therefore, disulfide bonds can be made between CsgG and CsgF to make the complex more stable, for example following introduction of cysteine residues at the positions identified herein.
  • the pore complex can be made by any of the previously mentioned methods, and disulfide bond formation can be induced by using oxidising agents (e.g., copper-orthophenanthroline).
  • Other interactions (e.g., hydrophobic interactions, charge-charge interactions/electrostatic interactions) can also be used in those positions instead of cysteine interactions.
  • Unnatural amino acids can also be incorporated in those positions. Covalent bonds may be formed via click chemistry.
  • unnatural amino acids with azide or alkyne or with a dibenzocyclooctyne (DBCO) group and/or a bicyclo[6.1.0]nonyne (BCN) group may be introduced at one or more of these positions.
  • Such stabilising mutations can be combined with any other modifications to CsgG and/or CsgF, for example the modifications disclosed herein.
  • one or more non-native or photoreactive amino acids may be included/substituted in the CsgG pore monomer at one or more positions corresponding to one or more of positions 132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155, 183, 185, 187, 189, 191, 201, 203, 205, 207 and 209 of SEQ ID NO: 3.
  • one or more non-native reactive or photoreactive amino acids may be included/substituted at one or more positions corresponding to one or more of positions 1, 4, 5, 8, 9, 11, 12, 26 or 29 of SEQ ID NO: 6.
  • Preferred exemplary CsgF peptides comprise the following mutations relative to SEQ ID NO: 6: N15X₁/N17X₂/A20X₃/N24X₄/A28X₅/D34X₆, wherein X₁ is N/S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E, X₂ is N/S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E, X₃ is A/S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E, X₄ is N/S/T/Q/A/G/L/V/I/F/Y/W/R/K/D/C/E, X₅ is A/S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E
  • the invention also provides a construct comprising two or more covalently attached pore monomer conjugates of the invention.
  • the construct may comprise 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more or 10 or more pore monomer conjugates of the invention.
  • the construct may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9 or at least 10 pore monomer conjugates of the invention.
  • the two or more pore monomer conjugates may be the same or different.
  • the two or more pore monomer conjugates may differ based on one or more of (a) the sequence of the CsgG pore monomer, (b) the sequence of the CsgF peptide, (c) the linker, (d) the attachment position on the CsgG pore monomer, and (e) the attachment position on the CsgF peptide.
  • the pore monomer conjugates may differ based on (a); (b); (c); (d); (e); (a) and (b); (a) and (c); (a) and (d); (a) and (e); (b) and (c); (b) and (d); (b) and (e); (c) and (d); (c) and (e); (d) and (e); (a), (b) and (c); (a), (b) and (d); or any other combination of three or more of (a) to (e).
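The possible ways in which two conjugates may differ, taken over the five features (a) to (e), are the non-empty subsets of those features. The following sketch, with hypothetical names, enumerates them using the standard library:

```python
from itertools import combinations

FEATURES = ("a", "b", "c", "d", "e")


def all_combinations(features=FEATURES):
    """Return every non-empty combination of the features, smallest first."""
    return [
        combo
        for size in range(1, len(features) + 1)
        for combo in combinations(features, size)
    ]


combos = all_combinations()
```

This gives 31 combinations in total: 5 single features, 10 pairs, 10 triples, 5 sets of four and 1 set of all five.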
  • the two or more pore monomer conjugates are preferably the same (i.e., identical).
  • the construct preferably comprises two pore monomer conjugates.
  • the pore monomer conjugates may be genetically fused, optionally via a linker, or chemically fused, for instance via a chemical crosslinker.
  • Methods for covalently attaching monomers are disclosed in WO 2017/149316, WO 2017/149317, and WO 2017/149318 (incorporated herein by reference in their entirety).
  • the linker is preferably an amino acid sequence and/or a chemical crosslinker.
  • Suitable amino acid linkers, such as peptide linkers, are known in the art.
  • the length, flexibility and hydrophilicity of the amino acid or peptide linker are typically designed such that the CsgF peptide forms a constriction in the pore complex of the invention.
  • Preferred flexible peptide linkers are stretches of 2 to 20, such as 4, 6, 8, 10 or 16, serine and/or glycine amino acids.
  • More preferred flexible linkers include (SG)₁, (SG)₂, (SG)₃, (SG)₄, (SG)₅, (SG)₈, (SG)₁₀, (SG)₁₅ or (SG)₂₀ wherein S is serine and G is glycine.
  • Preferred rigid linkers are stretches of 2 to 30, such as 4, 6, 8, 16 or 24, proline amino acids. More preferred rigid linkers include (P)₁₂ wherein P is proline.
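The repeat notation used for these linkers maps directly onto one-letter amino acid strings: (SG) repeated n times for flexible linkers and proline repeated n times for rigid linkers. A minimal sketch with hypothetical function names:

```python
def flexible_linker(repeats: int) -> str:
    """Return the one-letter sequence of an (SG) linker with `repeats` repeats."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    return "SG" * repeats


def rigid_linker(length: int) -> str:
    """Return the one-letter sequence of a polyproline linker of `length` residues."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return "P" * length


sg3 = flexible_linker(3)
p12 = rigid_linker(12)
```

Note that the number of residues in an (SG) linker is twice the repeat count, so a linker with 8 repeats is 16 amino acids long.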
  • Suitable chemical crosslinkers are well-known in the art. Suitable chemical crosslinkers include, but are not limited to, those including the following functional groups: maleimide, active esters, succinimide, azide, alkyne (such as dibenzocyclooctynol (DIBO or DBCO), difluoro cycloalkynes and linear alkynes), phosphine (such as those used in traceless and non-traceless Staudinger ligations), haloacetyl (such as iodoacetamide), phosgene type reagents, sulfonyl chloride reagents, isothiocyanates, acyl halides, hydrazines, disulfides, vinyl sulfones, aziridines and photoreactive reagents (such as aryl azides, diaziridines).
  • Reactions between amino acids and functional groups may be spontaneous, such as cysteine/maleimide, or may require external reagents, such as Cu(I) for linking azide and linear alkynes.
  • Linkers can comprise any molecule that stretches across the distance required. Linkers can vary in length from one carbon (phosgene-type linkers) to many Angstroms. Examples of linker molecules include, but are not limited to, polyethyleneglycols (PEGs), polypeptides, polysaccharides, deoxyribonucleic acid (DNA), peptide nucleic acid (PNA), threose nucleic acid (TNA), glycerol nucleic acid (GNA), saturated and unsaturated hydrocarbons, and polyamides. These linkers may be inert or reactive; in particular, they may be chemically cleavable at a defined position, or may themselves be modified with a fluorophore or ligand. The linker is preferably resistant to reducing agents, such as dithiothreitol (DTT), following the covalent attachment of the CsgF peptide to the CsgG pore monomer.
  • Preferred crosslinkers include 2,5-dioxopyrrolidin-1-yl 3-(pyridin-2-yldisulfanyl)propanoate, 2,5-dioxopyrrolidin-1-yl 4-(pyridin-2-yldisulfanyl)butanoate and 2,5-dioxopyrrolidin-1-yl 8-(pyridin-2-yldisulfanyl)octanoate, di-maleimide PEG 1k, di-maleimide PEG 3.4k, di-maleimide PEG 5k, di-maleimide PEG 10k, bis(maleimido)ethane (BMOE), bis-maleimidohexane (BMH), 1,4-bis-maleimidobutane (BMB), 1,4-bis-maleimidyl-2,3-dihydroxybutane (BMDB), BM[PEO]2 (1,8-bis-maleimidodiethylenegly
  • the linker is preferably resistant to dithiothreitol (DTT).
  • Suitable linkers include, but are not limited to, iodoacetamide-based and maleimide-based linkers.
  • the pore monomer conjugates may be connected using two or more linkers each comprising a hybridizable region and a group capable of forming a covalent bond.
  • the hybridizable regions in the linkers hybridize and link the CsgG pore monomer and CsgF peptide.
  • the linked CsgG pore monomer and CsgF peptide are then coupled via the formation of covalent bonds between the groups.
  • Any of the specific linkers disclosed in WO 2010/086602 (incorporated herein by reference in its entirety) may be used in accordance with the invention.
  • the linkers may be labelled. Suitable labels include, but are not limited to, fluorescent molecules (such as Cy3 or AlexaFluor®555), radioisotopes, e.g. ¹²⁵I, ³⁵S, ³²P, enzymes, antibodies, antigens, polynucleotides and ligands such as biotin. Such labels allow the amount of linker to be quantified.
  • the label could also be a cleavable purification tag, such as biotin, or a specific sequence to show up in an identification method, such as a peptide that is not present in the protein itself, but that is released by trypsin digestion.
  • a preferred method of connecting the pore monomer conjugates is via cysteine linkage. This can be mediated by a bi-functional chemical crosslinker or by an amino acid linker with a terminal presented cysteine residue.
  • Another preferred method of attachment via 4-azidophenylalanine or Faz linkage can be mediated by a bi-functional chemical linker or by a polypeptide linker with a terminal presented 4-azidophenylalanine or Faz residue. Additional suitable linkers are discussed in more detail below.
  • pore complex refers to an oligomeric pore complex comprising at least one pore monomer conjugate of the invention (including, e.g., one or more pore monomer conjugates such as two or more pore monomer conjugates, three or more pore monomer conjugates etc.).
  • the pore complex of the invention has the features of a biological pore, i.e., it has a typical protein structure and defines a channel. When the pore complex is provided in an environment having membrane components, membranes, cells, or an insulating layer, the pore complex will insert in the membrane or the insulating layer and form a "transmembrane pore complex".
  • the CsgG part of the pore complex of the invention i.e., the part formed from the at least one CsgG pore monomer in the at least one conjugate of the invention
  • the CsgG constriction in the pore complex of the invention preferably has or comprises any of the constriction diameters described above.
  • the at least one CsgF peptide (in the at least one pore monomer conjugate or construct) preferably forms a constriction in the pore complex.
  • the at least one CsgF peptide is preferably inserted into the lumen of the pore complex.
  • the invention relates to CsgG pores complexed with a CsgF peptide that introduces an additional channel constriction in the pore complex and surprisingly results in an increased current range and increased signal-to-noise ratio (SNR).
  • Pores comprising the pore monomer conjugates of the invention can improve the characterisation of analytes, such as polynucleotides, by providing a more discriminating, direct relationship between the observed current and the polynucleotide as it moves through the pore.
  • the pore complex may facilitate characterization of polynucleotides that contain at least one homopolymeric stretch, e.g., several consecutive copies of the same nucleotide that otherwise exceed the interaction length of the single CsgG constriction.
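To make the notion of a homopolymeric stretch concrete, the following Python sketch locates runs of identical nucleotides in a sequence string. The function name `homopolymer_runs` and the `min_length` threshold are assumptions chosen for illustration; the text does not specify a numerical interaction length for the constriction.

```python
from itertools import groupby


def homopolymer_runs(sequence: str, min_length: int):
    """Return (start, base, length) for each run of identical bases
    that is at least min_length nucleotides long."""
    runs = []
    position = 0
    for base, group in groupby(sequence):
        length = len(list(group))
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


# Hypothetical example sequence containing one run of six T nucleotides.
example_runs = homopolymer_runs("ACGTTTTTTGAAAC", min_length=5)
```

Here `example_runs` contains a single entry describing the run of six T nucleotides starting at index 3; the shorter run of three A nucleotides falls below the chosen threshold.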
  • small molecule analytes including organic or inorganic drugs and pollutants passing through the pore complex will consecutively pass the two constrictions.
  • the chemical nature of either constriction can be independently modified, each giving unique interaction properties with the analyte, thus providing additional discriminating power during analyte detection.
  • the CsgF constriction formed in the pore complex preferably has a diameter in the range of from about 5 to about 20 Å, such as from about 7 to about 18 Å, from about 10 to about 15 Å or from about 11 to about 12 Å.
  • the additional CsgF peptide constriction may be about 10 nm or less, such as about 5 nm or less, such as about 1, 2, 3, 4, 5, 6, 7, 8 or 9 nm, from the constriction of the CsgG pore. Distances between the CsgF peptide and CsgG pore monomer are also discussed above with reference to the pore monomer conjugates of the invention.
  • the pore complex or transmembrane pore complex of the invention includes a pore complex with two constrictions, i.e., two channel constrictions positioned in such a way that one constriction does not interfere in the accuracy of the other constriction.
  • Said pore complexes may include any of the mutations, CsgG pore monomers or CsgF peptides described in WO 2016/034591, WO 2017/149316, WO 2017/149317, WO 2017/149318, WO 2018/211241 and WO 2019/002893 (all incorporated herein by reference in their entirety).
  • the pore complex or transmembrane pore complex of the invention includes a pore complex with one constriction.
  • the constriction may be removed from the CsgG pore monomer in the conjugate of the invention such that the pore complex of the invention only contains one constriction provided by the CsgF peptide.
  • the invention provides a pore complex comprising at least one pore monomer conjugate of the invention.
  • the pore complex typically comprises at least 6, 7, 8, 9 or 10 pore monomer conjugates of the invention.
  • the pore complex preferably comprises 8 or 9 pore monomer conjugates of the invention.
  • the pore monomer conjugates are typically the same (i.e., identical).
  • the pore complex is preferably a homooligomer comprising 6 to 10, such as 6, 7, 8, 9 or 10, pore monomer conjugates of the invention.
  • the pore monomer conjugates are typically identical.
  • the pore complex preferably comprises 8 or 9 identical pore monomer conjugates of the invention.
  • the pore monomer conjugates may be any of those discussed above.
  • the invention provides a pore complex comprising at least one construct of the invention.
  • the pore complex typically comprises at least 1, 2, 3, 4 or 5 constructs of the invention.
  • the pore complex comprises sufficient CsgG pore monomers to form a pore.
  • an octameric pore may comprise (a) four constructs each comprising two pore monomer conjugates, (b) two constructs each comprising four pore monomer conjugates, (c) one construct comprising two pore monomer conjugates and six pore monomer conjugates that do not form part of a construct, (d) three constructs each comprising two pore monomer conjugates and two pore monomer conjugates that do not form part of a construct, or (e) combinations thereof.
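The arithmetic behind options (a) to (d) can be checked with a short Python sketch. The helper `total_conjugates` and the dictionary layout are hypothetical and serve only to show that each listed combination supplies eight pore monomer conjugates in total.

```python
OCTAMER = 8  # number of pore monomer conjugates in an octameric pore


def total_conjugates(constructs, free_conjugates=0):
    """Sum the conjugates contributed by constructs and by free conjugates.

    constructs: list giving the number of conjugates in each construct.
    free_conjugates: conjugates that do not form part of a construct.
    """
    return sum(constructs) + free_conjugates


options = {
    "a": total_conjugates([2, 2, 2, 2]),                 # four constructs of two
    "b": total_conjugates([4, 4]),                       # two constructs of four
    "c": total_conjugates([2], free_conjugates=6),       # one construct of two + six free
    "d": total_conjugates([2, 2, 2], free_conjugates=2), # three constructs of two + two free
}

all_octameric = all(count == OCTAMER for count in options.values())
```

Each option evaluates to eight, so `all_octameric` is true for the combinations listed.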
  • One or more constructs of the invention may be used to form a pore complex for characterising, such as sequencing, polynucleotides.
  • the pore complex preferably comprises 4 constructs of the invention each of which comprises two pore monomer conjugates.
  • the constructs are typically the same (i.e., identical).
  • the pore complex is preferably a homooligomer comprising 1-5, such as 1, 2, 3, 4, 5, constructs of the invention.
  • the constructs are typically the same (i.e., identical).
  • the pore complex preferably comprises 4 identical constructs of the invention each of which comprises two pore monomer conjugates.
  • the constructs may be any of those discussed above.
  • the CsgG pore monomers in the CsgG pore are preferably all approximately the same length or are the same length.
  • the barrels of the CsgG pore monomers of the invention in the pore are preferably approximately the same length or are the same length. Length may be measured in number of amino acids and/or units of length.
  • the pore complex of the invention may be isolated, substantially isolated, purified or substantially purified.
  • a pore complex of the invention is isolated or purified if it is completely free of any other components, such as lipids or other pores.
  • a pore complex is substantially isolated if it is mixed with carriers or diluents which will not interfere with its intended use.
  • a pore complex is substantially isolated or substantially purified if it is present in a form that comprises less than 10%, less than 5%, less than 2% or less than 1% of other components, such as block copolymers, lipids or other pores.
  • a pore complex of the invention may be present in a membrane. Suitable membranes are discussed below.
  • a pore complex of the invention may be present as an individual or single pore complex.
  • a pore complex of the invention may be present in a homologous or heterologous population of two or more pore complexes or pores.
  • Other formats involving the pore complexes of the invention are discussed in more detail below.
  • the invention also provides a pore multimer comprising two or more pores, wherein at least one of the pores is a pore complex of the invention.
  • the multimer may comprise any number of pores, such as 3, 4, 5, 6, 7 or 8 or more pores. Any number of the pores in the multimer, including all of them, may be a pore complex of the invention.
  • the pore multimer may be a double pore complex comprising a first pore complex of the invention and a second pore or complex.
  • the second pore or complex is typically derived from CsgG.
  • the second pore complex may be a complex of the invention. Both the first pore complex and the second pore complex are preferably pore complexes of the invention.
  • the first pore complex may be attached to the second pore (complex) by hydrophobic interactions and/or by one or more disulfide bonds.
  • One or more, such as 2, 3, 4, 5, 6, 8, 9, for example all, of the monomers in the first pore complex and/or the second pore (complex) may be modified to enhance such interactions. This may be achieved in any suitable way.
  • Particular methods of forming double pores from CsgG-derived pores are described in WO 2019/002893 (incorporated by reference herein in its entirety).
  • the pore multimer of the invention may be isolated, substantially isolated, purified or substantially purified. Such terms are defined above with reference to the pore complexes of the invention.
  • the invention also provides a pore complex of the invention or a pore multimer of the invention which is comprised in a membrane.
  • the invention also provides a membrane comprising a pore complex of the invention or a pore multimer of the invention.
  • proteins may be modified to assist their identification or purification, for example by the addition of a streptavidin tag or by the addition of a signal sequence to promote their secretion from a cell where the monomer does not naturally contain such a sequence.
  • the proteins may also be produced using D-amino acids or a mixture of L-amino acids and D-amino acids. This is conventional in the art for producing such proteins or peptides.
  • the CsgG pore monomer, the CsgF peptide, the pore monomer conjugate, the construct, the pore complex, or the pore multimer may be chemically modified.
  • the protein can be chemically modified in any way and at any site.
  • the protein may be chemically modified by attachment of a molecule to one or more cysteines (cysteine linkage), attachment of a molecule to one or more lysines, attachment of a molecule to one or more non-natural amino acids, enzyme modification of an epitope or modification of a terminus. Suitable methods for carrying out such modifications are well-known in the art.
  • the protein may be chemically modified by the attachment of any molecule, such as a dye or a fluorophore.
  • the protein may be chemically modified with a molecular adaptor that facilitates the interaction between a pore comprising the monomer and a target nucleotide or target polynucleotide sequence.
  • Suitable adaptors, including a cyclic molecule, a cyclodextrin, a species that is capable of hybridization, a DNA binder or intercalator, a peptide or peptide analogue, a synthetic polymer, an aromatic planar molecule, a small positively charged molecule or a small molecule capable of hydrogen-bonding, are described in WO 2019/002893 (incorporated by reference herein in its entirety).
  • the molecular adaptor may be attached using any of the methods and linkers discussed above.
  • the protein may be attached to a polynucleotide binding protein.
  • Polynucleotide binding proteins are discussed below.
  • the protein can be covalently attached to the monomer using any method known in the art.
  • the monomer and protein may be chemically fused or genetically fused. Genetic fusion of a monomer to a polynucleotide binding protein is discussed in WO 2010/004265 (incorporated herein by reference in its entirety).
  • the polynucleotide binding protein may be attached via cysteine linkage using any method described above.
  • the polynucleotide binding protein may be attached directly to the protein or via one or more linkers.
  • the molecule may be attached to the CsgG pore monomer using the hybridization linkers described in WO 2010/086602 (incorporated herein by reference in its entirety).
  • peptide linkers may be used. Suitable peptide linkers are discussed above.
  • any of the proteins may be modified to assist their identification or purification, for example by the addition of histidine residues (a his tag), aspartic acid residues (an asp tag), a streptavidin tag, a flag tag, a SUMO tag, a GST tag or a MBP tag, or by the addition of a signal sequence to promote their secretion from a cell where the polypeptide does not naturally contain such a sequence.
  • An alternative to introducing a genetic tag is to chemically react a tag onto a native or engineered position on the protein. An example of this would be to react a gel-shift reagent to a cysteine engineered on the outside of the protein. This has been demonstrated as a method for separating hemolysin heterooligomers (Chem Biol. 1997 Jul;4(7):497-505).
  • any of the proteins may be labelled with a revealing label.
  • the revealing label may be any suitable label which allows the protein to be detected. Suitable labels include, but are not limited to, fluorescent molecules, radioisotopes, e.g., ¹²⁵I, ³⁵S, enzymes, antibodies, antigens, polynucleotides, and ligands such as biotin.
  • the protein may also contain other non-specific modifications as long as they do not interfere with the function of the protein. A number of non-specific side chain modifications are known in the art and may be made to the side chains of the protein(s). Such modifications include, for example, reductive alkylation of amino acids by reaction with an aldehyde followed by reduction with NaBH4, amidation with methylacetimidate or acylation with acetic anhydride.
  • any of the proteins can be produced using standard methods known in the art.
  • Polynucleotide sequences encoding a protein may be derived and replicated using standard methods in the art.
  • Polynucleotide sequences encoding a protein may be expressed in a bacterial host cell using standard techniques in the art.
  • the protein may be produced in a cell by in situ expression of the polypeptide from a recombinant expression vector.
  • the expression vector optionally carries an inducible promoter to control the expression of the polypeptide.
  • Proteins may be produced in large scale following purification by any protein liquid chromatography system from protein producing organisms or after recombinant expression.
  • Typical protein liquid chromatography systems include FPLC, ÄKTA systems, the Bio-Cad system, the Bio-Rad BioLogic system and the Gilson HPLC system.
  • the invention provides methods for producing a pore monomer conjugate of the invention.
  • the method comprises attaching, preferably covalently attaching, the CsgF peptide to the CsgG pore monomer at two or more positions.
  • the method typically comprises modifying the CsgF peptide at two or more positions to include two or more reactive groups capable of attaching to two or more positions in the CsgG pore monomer.
  • the two or more reactive groups may be the same.
  • the two or more reactive groups may be different.
  • the methods preferably comprise contacting the CsgF peptide and the CsgG pore monomer with two or more linkers.
  • the components may be contacted with the two or more linkers in any order, such as CsgF peptide first and then the CsgG pore monomer, the CsgG pore monomer first and then the CsgF peptide or both components at the same time.
  • One or more linkers may be attached to the CsgF peptide and one or more linkers may be attached to the CsgG pore monomer before the two proteins are attached at two or more positions.
  • the two or more linkers are preferably attached to the CsgF peptide or the CsgG pore monomer first and then attached to the other component of the conjugate.
  • the method preferably comprises covalently attaching the two or more linkers to the CsgF peptide and then contacting the linkers and CsgF peptide with the CsgG pore monomer under conditions which attach, preferably covalently attach, the CsgF peptide to the CsgG pore monomer at two or more positions. Such conditions are well known to a person skilled in the art and are discussed in the Example.
  • the method is typically carried out in vitro as defined below.
  • the invention also provides methods for producing a pore complex of the invention or a pore multimer of the invention.
  • the method may involve expressing the pore complex in a host cell.
  • the method may comprise expressing at least one pore monomer conjugate of the invention or a construct of the invention and sufficient pore monomers or constructs to form the pore complex or the pore multimer in a host cell and allowing the pore complex or pore multimer to form in the host cell.
  • the sufficient pore monomers or constructs are preferably sufficient pore monomer conjugates of the invention or sufficient constructs of the invention.
  • the numbers of CsgG pore monomers, pore monomer conjugates or constructs needed to form the pore complexes of the invention or pore multimers of the invention are discussed above. Suitable host cells and expression systems are known in the art and are discussed in the Example.
  • the method may involve forming the pore complex in a non-cellular or in vitro context.
  • the method may comprise contacting at least one pore monomer conjugate of the invention or a construct of the invention with sufficient pore monomers or constructs in vitro and allowing the formation of the pore complex or pore multimer.
  • the pore monomer conjugate or the construct may be produced separately by in vitro translation and transcription (IVTT) and then incubated with the sufficient pore monomers or constructs.
  • the sufficient pore monomers or constructs are preferably sufficient pore monomer conjugates of the invention or sufficient constructs of the invention.
  • the numbers of CsgG pore monomers, pore monomer conjugates or constructs needed to form the pore complexes of the invention or pore multimers of the invention are discussed above.
  • the method may be conducted in an "in vitro system", which refers to a system comprising at least the necessary components and environment to execute said method, and makes use of biological molecules, organisms, a cell (or part of a cell) outside of their normal naturally occurring environment, permitting a more detailed, more convenient, or more efficient analysis than can be done with whole organisms.
  • An in vitro system may also comprise a suitable buffer composition provided in a test tube, wherein said protein components to form the complex have been added. A person skilled in the art is aware of the options to provide said system.
  • Some or all of the components of the pore complex or pore multimer may be tagged to facilitate purification. Purification can also be performed when the components are untagged. Methods known in the art (e.g., ion exchange, gel filtration, hydrophobic interaction column chromatography etc.) can be used alone or in different combinations to purify the components of the pore.
  • the pore complex or pore multimer can be made prior to insertion into a membrane or after insertion of the components into a membrane.
  • the invention provides a method of determining the presence, absence or one or more characteristics of a target analyte.
  • the method involves contacting the target analyte with a pore complex of the invention or pore multimer of the invention such that the target analyte moves with respect to, such as into or through, the pore complex or pore multimer and taking one or more measurements as the analyte moves with respect to the pore complex or pore multimer and thereby determining the presence, absence or one or more characteristics of the analyte.
  • the target analyte may also be called the template analyte or the analyte of interest.
  • the pore complex of the invention or the pore multimer of the invention may be any of those discussed above.
  • the method is for determining the presence, absence or one or more characteristics of a target analyte.
  • the method may be for determining the presence, absence or one or more characteristics of at least one analyte.
  • the method may concern determining the presence, absence or one or more characteristics of two or more analytes.
  • the method may comprise determining the presence, absence or one or more characteristics of any number of analytes, such as 2, 5, 10, 15, 20, 30, 40, 50, 100 or more analytes. Any number of characteristics of the one or more analytes may be determined, such as 1, 2, 3, 4, 5, 10 or more characteristics.
  • the degree of reduction in ion flow is related to the size of the obstruction within, or in the vicinity of, the pore. Binding of a molecule of interest, also referred to as an "analyte", in or near the pore therefore provides a detectable and measurable event, thereby forming the basis of a "biological sensor".
  • Suitable molecules for nanopore sensing include nucleic acids; proteins; peptides; polysaccharides and small molecules (here, a low molecular weight (e.g., <900 Da or <500 Da) organic or inorganic compound) such as pharmaceuticals, toxins, cytokines, and pollutants. Detecting the presence of biological molecules finds application in personalised drug development, medicine, diagnostics, life science research, environmental monitoring and in the security and/or the defence industry.
  • the pore complex or pore multimer may serve as a molecular or biological sensor.
  • the analyte molecule that is to be detected may bind to either face of the channel, or within the lumen of the channel itself. The position of binding may be determined by the size of the molecule to be sensed.
  • the target analyte is preferably a metal ion, an inorganic salt, a polymer, an amino acid, a peptide, a polypeptide, a protein, a nucleotide, an oligonucleotide, a polynucleotide, a monosaccharide, an oligosaccharide, a polysaccharide, a dye, a bleach, a pharmaceutical, a diagnostic agent, a recreational drug, an explosive, a toxic compound, or an environmental pollutant.
  • the analyte may comprise two or more different molecules, such as a peptide and a polypeptide.
  • the method may concern determining the presence, absence or one or more characteristics of two or more analytes of the same type, such as two or more proteins, two or more nucleotides or two or more pharmaceuticals.
  • the method may concern determining the presence, absence or one or more characteristics of two or more analytes of different types, such as one or more proteins, one or more nucleotides and one or more pharmaceuticals.
  • the target analyte can be secreted from cells.
  • the target analyte can be an analyte that is present inside cells such that the analyte must be extracted from the cells before the method can be carried out.
  • the pore complex or pore multimer may be modified via recombinant or chemical methods to increase the strength of binding, the position of binding, or the specificity of binding of the molecule to be sensed. Typical modifications include addition of a specific binding moiety complementary to the structure of the molecule to be sensed.
  • this binding moiety may comprise a cyclodextrin or an oligonucleotide; for small molecules this may be a known complementary binding region, for example the antigen binding portion of an antibody or of a non-antibody molecule, including a single chain variable fragment (scFv) region or an antigen recognition domain from a T-cell receptor (TCR); or for proteins, it may be a known ligand of the target protein.
  • the pore complex or pore multimer may be rendered capable of acting as a molecular sensor for detecting presence in a sample of suitable antigens (including epitopes) that may include cell surface antigens, including receptors, markers of solid tumours or haematologic cancer cells (e.g. lymphoma or leukaemia), viral antigens, bacterial antigens, protozoal antigens, allergens, allergy related molecules, albumin (e.g. human, rodent, or bovine), fluorescent molecules (including fluorescein), blood group antigens, small molecules, drugs, enzymes, catalytic sites of enzymes or enzyme substrates, and transition state analogues of enzyme substrates.
  • modifications may be achieved using known genetic engineering and recombinant DNA techniques.
  • the positioning of any adaptation would be dependent on the nature of the molecule to be sensed, for example, the size, three-dimensional structure, and its biochemical nature.
  • the choice of adapted structure may make use of computational structural design. Determination and optimization of protein-protein interactions or protein-small molecule interactions can be investigated using technologies such as a BIAcore® which detects molecular interactions using surface plasmon resonance (BIAcore, Inc., Piscataway, NJ; see also www.biacore.com).
  • the analyte is preferably an amino acid, a peptide, a polypeptide or a protein.
  • the amino acid, peptide, polypeptide or protein can be naturally occurring or non-naturally occurring.
  • the polypeptide or protein can include synthetic or modified amino acids. Several different types of modification to amino acids are known in the art. Suitable amino acids and modifications thereof are discussed above. It is to be understood that the target analyte can be modified by any method available in the art.
  • the analyte is preferably a polynucleotide, such as a nucleic acid, which is defined as a macromolecule comprising two or more nucleotides.
  • Nucleic acids are particularly suitable for nanopore sequencing.
  • the naturally occurring nucleic acid bases in DNA and RNA may be distinguished by their physical size.
  • the variation in ion flow may be recorded. Suitable electrical measurement techniques for recording ion flow variations are discussed above. Through suitable calibration, the characteristic reduction in ion flow can be used to identify the particular nucleotide and associated base traversing the channel in real time.
  • the open-channel ion flow is reduced as the individual nucleotides of the nucleic sequence of interest sequentially pass through the channel of the nanopore due to the partial blockage of the channel by the nucleotide. It is this reduction in ion flow that is measured using the suitable recording techniques described above.
  • the reduction in ion flow may be calibrated to the reduction in measured ion flow for known nucleotides through the channel resulting in a means for determining which nucleotide is passing through the channel, and therefore, when done sequentially, a way of determining the nucleotide sequence of the nucleic acid passing through the nanopore.
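The calibration idea described above can be sketched in Python as a nearest-level lookup. This is a deliberately simplified illustration under stated assumptions: the function names `calibrate` and `identify` are hypothetical, the numerical values are invented arbitrary units rather than measured data, and real nanopore signals are considerably more complex than one level per nucleotide.

```python
def calibrate(known_reductions):
    """Average the measured ion-flow reductions recorded for known nucleotides.

    known_reductions: dict mapping a nucleotide label to a list of
    measured reductions (arbitrary units).
    """
    return {
        nucleotide: sum(values) / len(values)
        for nucleotide, values in known_reductions.items()
    }


def identify(reduction, calibration):
    """Return the nucleotide whose calibrated reduction is closest to the measurement."""
    return min(calibration, key=lambda nt: abs(calibration[nt] - reduction))


# Invented calibration data for illustration only (arbitrary units).
calibration = calibrate({"A": [10.0, 12.0], "C": [20.0, 22.0]})

# Applying the lookup to successive measurements yields a sequence of calls.
calls = [identify(r, calibration) for r in (12.5, 19.0, 10.5)]
```

Applying `identify` to each successive measurement is the sequential step that, in this toy model, turns a series of ion-flow reductions into a nucleotide sequence.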
  • sequencing may be performed upon an intact nucleic acid polymer that is 'threaded' through the pore via the action of an associated polymerase, for example.
  • sequences may be determined by passage of nucleotide triphosphate bases that have been sequentially removed from a target nucleic acid in proximity to the pore (see for example WO 2014/187924 incorporated herein by reference in its entirety).
  • the polynucleotide or nucleic acid may comprise any combination of any nucleotides.
  • the nucleotides can be naturally occurring or artificial.
  • One or more nucleotides in the polynucleotide can be oxidized or methylated.
  • One or more nucleotides in the polynucleotide may be damaged.
  • the polynucleotide may comprise a pyrimidine dimer. Such dimers are typically associated with damage by ultraviolet light and are the primary cause of skin melanomas.
  • One or more nucleotides in the polynucleotide may be modified, for instance with a label or a tag, for which suitable examples are known by a skilled person.
  • the polynucleotide may comprise one or more spacers.
  • a nucleotide typically contains a nucleobase, a sugar and at least one phosphate group.
  • the nucleobase and sugar form a nucleoside.
  • the nucleobase is typically heterocyclic.
  • Nucleobases include, but are not limited to, purines and pyrimidines and more specifically adenine (A), guanine (G), thymine (T), uracil (U) and cytosine (C).
  • the sugar is typically a pentose sugar.
  • Nucleotide sugars include, but are not limited to, ribose and deoxyribose. The sugar is preferably a deoxyribose.
  • the polynucleotide preferably comprises the following nucleosides: deoxyadenosine (dA), deoxyuridine (dU) and/or thymidine (dT), deoxyguanosine (dG) and deoxycytidine (dC).
  • the nucleotide is typically a ribonucleotide or deoxyribonucleotide.
  • the nucleotide typically contains a monophosphate, diphosphate, or triphosphate.
  • the nucleotide may comprise more than three phosphates, such as 4 or 5 phosphates. Phosphates may be attached on the 5' or 3' side of a nucleotide.
  • the nucleotides in the polynucleotide may be attached to each other in any manner.
  • the nucleotides are typically attached by their sugar and phosphate groups as in nucleic acids.
  • the nucleotides may be connected via their nucleobases as in pyrimidine dimers.
  • the polynucleotide may be single stranded or double stranded. At least a portion of the polynucleotide is preferably double stranded.
  • the polynucleotide is most preferably ribonucleic acid (RNA) or deoxyribonucleic acid (DNA).
  • said method using a polynucleotide as an analyte alternatively comprises determining one or more characteristics selected from (i) the length of the polynucleotide, (ii) the identity of the polynucleotide, (iii) the sequence of the polynucleotide, (iv) the secondary structure of the polynucleotide and (v) whether or not the polynucleotide is modified.
  • the polynucleotide can be any length (i).
  • the polynucleotide can be at least 10, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400 or at least 500 nucleotides or nucleotide pairs in length.
  • the polynucleotide can be 1000 or more nucleotides or nucleotide pairs, 5000 or more nucleotides or nucleotide pairs in length or 100000 or more nucleotides or nucleotide pairs in length. Any number of polynucleotides can be investigated. For instance, the method may concern characterising 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 100 or more polynucleotides.
  • polynucleotides may be different polynucleotides or two instances of the same polynucleotide.
  • the polynucleotide can be naturally occurring or artificial.
  • the method may be used to verify the sequence of a manufactured oligonucleotide. The method is typically carried out in vitro.
  • Nucleotides can have any identity (ii), and include, but are not limited to, adenosine monophosphate (AMP), guanosine monophosphate (GMP), thymidine monophosphate (TMP), uridine monophosphate (UMP), 5-methylcytidine monophosphate, 5-hydroxymethylcytidine monophosphate, cytidine monophosphate (CMP), cyclic adenosine monophosphate (cAMP), cyclic guanosine monophosphate (cGMP), deoxyadenosine monophosphate (dAMP), deoxyguanosine monophosphate (dGMP), deoxythymidine monophosphate (dTMP), deoxyuridine monophosphate (dUMP), deoxycytidine monophosphate (dCMP) and deoxymethylcytidine monophosphate.
  • the nucleotides are preferably selected from AMP, TMP, GMP, CMP, UMP, dAMP, dTMP, dGMP, dCMP and dUMP.
  • a nucleotide may be abasic (i.e., lack a nucleobase).
  • a nucleotide may also lack a nucleobase and a sugar (i.e., is a C3 spacer).
  • the sequence of the nucleotides (iii) is determined by the consecutive identity of the nucleotides attached to each other throughout the polynucleotide strand, in the 5' to 3' direction of the strand.
  • the pore complexes and pore multimers of the invention are particularly useful in analysing homopolymers. For example, they may be used to determine the sequence of a polynucleotide comprising two or more, such as at least 3, 4, 5, 6, 7, 8, 9 or 10, consecutive nucleotides that are identical. For example, they may be used to sequence a polynucleotide comprising a polyA, polyT, polyG and/or polyC region.
  • the CsgG pore constriction is made of the residues at the 51, 55 and 56 positions of SEQ ID NO: 3.
  • the constriction of CsgG and its constriction mutants are generally sharp.
  • interactions of approximately 5 bases of DNA with the constriction of the pore at any given time dominate the current signal.
  • these sharper constrictions are very good at reading mixed sequence regions of DNA (when A, T, G and C are mixed).
  • the signal becomes flat and lacks information when there is a homopolymeric region within the DNA (e.g., polyT, polyG, polyA, polyC).
  • because approximately 5 bases dominate the signal of CsgG and its constriction mutants, it is difficult to discriminate homopolymers longer than 5 bases without using additional dwell time information.
  • when DNA passes through a second constriction formed by the CsgF peptide, more DNA bases interact with the combined constrictions, increasing the length of the homopolymers that can be discriminated.
  • the movement of the polynucleotide with respect to the pore, such as through the pore, is preferably controlled using a polynucleotide binding protein. Suitable proteins are discussed in more detail below.
  • the invention provides a method for determining the presence, absence or one or more characteristics of a target polynucleotide, comprising the steps of:
  • the one or more characteristics of the target analyte are preferably measured by electrical measurement and/or optical measurement.
  • the electrical measurement is a current measurement, an impedance measurement, a tunnelling measurement, or a field effect transistor (FET) measurement.
  • the method preferably comprises measuring the current flowing through the pore complex or the pore multimer as the analyte moves with respect to, such as through, the pore.
  • the invention also provides a polynucleotide which encodes a pore monomer conjugate of the invention or a construct of the invention.
  • the polynucleotide may be any of those discussed above.
  • the invention also provides an expression vector comprising a polynucleotide of the invention.
  • the invention also provides a host cell comprising a polynucleotide of the invention or an expression vector of the invention. Suitable vectors and host cells are known in the art.
  • a kit for characterising a target analyte comprises (a) a pore complex of the invention or a pore multimer of the invention and (b) the components of a membrane. Suitable membranes and components are discussed below.
  • the kit comprises (a) a pore complex of the invention or a pore multimer of the invention and (b) a polynucleotide binding protein.
  • the kit preferably further comprises the components of a membrane.
  • the kit may comprise components of any type of membranes, such as an amphiphilic layer or a triblock copolymer membrane.
  • Preferred polynucleotide binding proteins are polymerases, exonucleases, helicases and topoisomerases, such as gyrases. Suitable enzymes include, but are not limited to, exonuclease I from E. coli, exonuclease III enzyme from E. coli, RecJ from T. thermophilus and bacteriophage lambda exonuclease, TatD exonuclease and variants thereof. Three subunits comprising the RecJ sequence from T. thermophilus or a variant thereof interact to form a trimer exonuclease.
  • the polymerase may be PyroPhage® 3173 DNA Polymerase (which is commercially available from Lucigen® Corporation), SD Polymerase (commercially available from Bioron®) or variants thereof.
  • the enzyme may be Phi29 DNA polymerase or a variant thereof.
  • the topoisomerase is preferably a member of any of the Enzyme Classification (EC) groups 5.99.1.2 and 5.99.1.3.
  • the enzyme is most preferably derived from a helicase, such as Hel308 Mbu, Hel308 Csy, Hel308 Tga, Hel308 Mhu, TraI Eco, XPD Mbu or a variant thereof.
  • Any helicase may be used in the invention.
  • the helicase may be or be derived from a Hel308 helicase, a RecD helicase, such as TraI helicase or a TrwC helicase, an XPD helicase or a Dda helicase.
  • the helicase may be any of the helicases, modified helicases or helicase constructs disclosed in WO 2013/057495; WO 2013/098562; WO 2013/098561; WO 2014/013260; WO 2014/013259; WO 2014/013262 and WO 2015/055981. All of these are incorporated by reference in their entirety.
  • the kit may further comprise one or more anchors, such as cholesterol, for coupling the target analyte to the membrane.
  • the kit may further comprise one or more polynucleotide adaptors that can be attached to a target polynucleotide to facilitate characterisation of the polynucleotide.
  • the anchor, such as cholesterol, is preferably attached to the polynucleotide adaptor.
  • the kit may additionally comprise one or more other reagents or instruments which enable any of the embodiments mentioned above to be carried out.
  • reagents or instruments include one or more of the following: suitable buffer(s) (aqueous solutions), means to obtain a sample from a subject (such as a vessel or an instrument comprising a needle), means to amplify and/or express polynucleotides or voltage or patch clamp apparatus.
  • Reagents may be present in the kit in a dry state such that a fluid sample resuspends the reagents.
  • the kit may also, optionally, comprise instructions to enable the kit to be used in the method of the invention or details regarding for which organism the method may be used.
  • the kit may also comprise additional components useful in analyte characterization.
  • the invention also provides an apparatus for characterising target analytes in a sample, comprising (a) a plurality of pore complexes of the invention or a plurality of pore multimers of the invention and (b) a plurality of polynucleotide binding proteins.
  • the plurality of pore complexes or plurality of pore multimers may be any of those discussed above.
  • the invention also provides an apparatus comprising a pore complex of the invention or a pore multimer of the invention inserted into an in vitro membrane.
  • the invention also provides an apparatus produced by a method comprising: (i) obtaining a pore complex of the invention or a pore multimer of the invention and (ii) contacting the pore complex or pore multimer with an in vitro membrane such that the pore complex or pore multimer is inserted in the in vitro membrane.
  • the invention also provides an array comprising a plurality of membranes of the invention. Any of the embodiments discussed above with respect to the membranes of the invention equally apply to the array of the invention.
  • the array may be set up to perform any of the methods described below.
  • each membrane in the array comprises one pore complex or pore multimer. Due to the manner in which the array is formed, for example, the array may comprise one or more membranes that do not comprise a pore complex or pore multimer, and/or one or more membranes that comprise two or more pore complexes or pore multimers.
  • the array may comprise from about 2 to about 1000, such as from about 10 to about 800, from about 20 to about 600 or from about 30 to about 500 membranes.
  • the invention provides a system comprising (a) a membrane of the invention or an array of the invention, (b) means for applying a potential across the membrane(s) and (c) means for detecting electrical or optical signals across the membrane(s).
  • the pores and membranes may be any as described above and below.
  • the system further comprises a first chamber and a second chamber, wherein the first and second chambers are separated by the membrane(s).
  • the system may further comprise a target analyte, wherein the target analyte is transiently located within the continuous channel and wherein one end of the target analyte is located in the first chamber and one end of the target analyte is located in the second chamber.
  • the target analyte is preferably a target polypeptide or a target polynucleotide.
  • the system further comprises an electrically conductive solution in contact with the pore(s), electrodes providing a voltage potential across the membrane(s), and a measurement system for measuring the current through the pore(s).
  • the voltage applied across the membranes and pore is preferably from +5 V to -5 V, such as -600 mV to +600 mV or -400 mV to +400 mV.
  • the voltage used is preferably in the range 100 mV to 240 mV and more preferably in the range of 120 mV to 220 mV. It is possible to increase discrimination between different amino acids or nucleotides by a pore by using an increased applied potential. Any suitable electrically conductive solution may be used.
  • the solution may comprise charge carriers, such as metal salts, for example alkali metal salt, halide salts, for example chloride salts, such as alkali metal chloride salt.
  • Charge carriers may include ionic liquids or organic salts, for example tetramethyl ammonium chloride, trimethylphenyl ammonium chloride, phenyltrimethyl ammonium chloride, or 1-ethyl-3-methyl imidazolium chloride.
  • salt is present in the aqueous solution in the chamber. Potassium chloride (KCl), sodium chloride (NaCl), caesium chloride (CsCl) or a mixture of potassium ferrocyanide and potassium ferricyanide is typically used.
  • KCl, NaCl and a mixture of potassium ferrocyanide and potassium ferricyanide are preferred.
  • the charge carriers may be asymmetric across the membrane. For instance, the type and/or concentration of the charge carriers may be different on each side of the membrane, e.g., in each chamber.
  • the salt concentration may be at saturation.
  • the salt concentration may be 3 M or lower and is typically from 0.1 to 2.5 M, from 0.3 to 1.9 M, from 0.5 to 1.8 M, from 0.7 to 1.7 M, from 0.9 to 1.6 M or from 1 M to 1.4 M.
  • the salt concentration is preferably from 150 mM to 1 M.
  • the method is preferably carried out using a salt concentration of at least 0.3 M, such as at least 0.4 M, at least 0.5 M, at least 0.6 M, at least 0.8 M, at least 1.0 M, at least 1.5 M, at least 2.0 M, at least 2.5 M or at least 3.0 M.
  • High salt concentrations provide a high signal to noise ratio and allow for currents indicative of the presence of an amino acid or nucleotide to be identified against the background of normal current fluctuations.
  • a buffer may be present in the electrically conductive solution.
  • the buffer is phosphate buffer.
  • Other suitable buffers are HEPES and Tris-HCl buffer.
  • the pH of the electrically conductive solution may be from 4.0 to 12.0, from 4.5 to 10.0, from 5.0 to 9.0, from 5.5 to 8.8, from 6.0 to 8.7 or from 7.0 to 8.8 or 7.5 to 8.5.
  • the pH used is preferably about 7.5.
  • the system may be comprised in an apparatus.
  • the apparatus may be any conventional apparatus for analyte analysis, such as an array or a chip.
  • the apparatus is preferably set up to carry out the disclosed method.
  • the apparatus may comprise a chamber comprising an aqueous solution and a barrier that separates the chamber into two sections.
  • the barrier typically has an aperture in which the membrane(s) containing the pore(s) are formed.
  • the barrier forms the membrane in which the pore is present.
  • the apparatus may also comprise an electrical circuit capable of applying a potential and measuring an electrical signal across the membrane and pore.
  • the apparatus may be any of those described in WO 2008/102120, WO 2009/077734, WO 2010/122293, WO 2011/067559, or WO 00/28312 (all incorporated herein by reference in their entirety).
  • the membrane is preferably an amphiphilic layer.
  • An amphiphilic layer is a layer formed from amphiphilic molecules, such as phospholipids, which have both hydrophilic and lipophilic properties.
  • the amphiphilic molecules may be synthetic or naturally occurring.
  • Non-naturally occurring amphiphiles and amphiphiles which form a monolayer are known in the art and include, for example, block copolymers (Gonzalez-Perez et al., Langmuir, 2009, 25, 10447-10450).
  • Block copolymers are polymeric materials in which two or more monomer sub-units are polymerized together to create a single polymer chain.
  • Block copolymers typically have properties that are contributed by each monomer sub-unit. However, a block copolymer may have unique properties that polymers formed from the individual sub-units do not possess. Block copolymers can be engineered such that one of the monomer sub-units is hydrophobic (i.e., lipophilic), whilst the other sub-unit(s) are hydrophilic whilst in aqueous media. In this case, the block copolymer may possess amphiphilic properties and may form a structure that mimics a biological membrane.
  • the block copolymer may be a diblock (consisting of two monomer sub-units) but may also be constructed from more than two monomer sub-units to form more complex arrangements that behave as amphiphiles.
  • the copolymer may be a triblock, tetrablock or pentablock copolymer.
  • the membrane is preferably a triblock copolymer membrane.
  • the membrane may comprise one of the membranes disclosed in International Application No. WO 2014/064443 or WO 2014/064444.
  • the amphiphilic molecules may be chemically modified or functionalised to facilitate coupling of the polynucleotide.
  • the amphiphilic layer may be a monolayer or a bilayer.
  • the amphiphilic layer is typically planar.
  • the amphiphilic layer may be curved.
  • the amphiphilic layer may be supported.
  • Amphiphilic membranes are typically naturally mobile, essentially acting as two-dimensional fluids with lipid diffusion rates of approximately 10⁻⁸ cm s⁻¹. This means that the pore and coupled polynucleotide can typically move within an amphiphilic membrane.
  • the membrane may be a lipid bilayer.
  • Lipid bilayers are models of cell membranes and serve as excellent platforms for a range of experimental studies.
  • lipid bilayers can be used for in vitro investigation of membrane proteins by single-channel recording.
  • lipid bilayers can be used as biosensors to detect the presence of a range of substances.
  • the lipid bilayer may be any lipid bilayer. Suitable lipid bilayers include, but are not limited to, a planar lipid bilayer, a supported bilayer, or a liposome.
  • the lipid bilayer is preferably a planar lipid bilayer. Suitable lipid bilayers are disclosed in WO 2008/102121, WO 2009/077734, and WO 2006/100484 (all incorporated herein by reference in their entirety).
  • the membrane may comprise a solid-state layer.
  • Solid state layers can be formed from both organic and inorganic materials including, but not limited to, microelectronic materials, insulating materials such as Si₃N₄, Al₂O₃, and SiO, organic and inorganic polymers such as polyamide, plastics such as Teflon® or elastomers such as two-component addition-cure silicone rubber, and glasses.
  • the solid-state layer may be formed from graphene. Suitable graphene layers are disclosed in WO 2009/035647 (incorporated herein by reference in its entirety).
  • the pore is typically present in an amphiphilic membrane or layer contained within the solid-state layer, for instance within a hole, well, gap, channel, trench or slit within the solid-state layer.
  • suitable solid state/amphiphilic hybrid systems are disclosed in WO 2009/020682 and WO 2012/005857 (both incorporated herein by reference in their entirety). Any of the amphiphilic membranes or layers discussed above may be used.
  • the method is typically carried out using (i) an artificial amphiphilic layer comprising a pore, (ii) an isolated, naturally occurring lipid bilayer comprising a pore, or (iii) a cell having a pore inserted therein.
  • the method is typically carried out using an artificial amphiphilic layer, such as a di- or tri-block copolymer layer.
  • the layer may comprise other transmembrane and/or intramembrane proteins as well as other molecules in addition to the pore. Suitable apparatus and conditions are discussed below.
  • the method of the invention is typically carried out in vitro.
  • SEQ ID NO:2 (>P0AEA2 (1 :277); WT Pro-CsgG from E. coli K12)
  • SEQ ID NO:4 (>P0AE98; coding sequence for WT CsgF from E. coli K12) ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCATGAC TTTCCAGTTCCGTAATCCAAACTTTGGTGGTAACCCAAATAATGGCGCTTTTTTATTAAATAGCGCTC AGGCCCAAAACTCTTATAAAGATCCGAGCTATAACGATGACTTTGGTATTGAAACACCCTCAGCGTTA GATAACTTTACTCAGGCCATCCAGTCACAAATTTTAGGTGGGCTACTGTCGAATATTAATACCGGTAA ACCGGGCCGCATGGTGACCAACGATTATATTGTCGATATTGCCAACCGCGATGGTCAATTGCAGTTG AACGTGACAGATCGTAAAACCGGACAAACCTCGACCATCCAGGTTTCGGGTTTACAAAATAACTCAA CCGATTTT
  • SEQ ID NO:5 (>P0AE98 (1 : 138); WT Pro-CsgF from E. coli K12) MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPSAL DNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNSTD F
  • SEQ ID NO:6 (>P0AE98 (20: 138); WT mature CsgF from E. coli K12) GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPSALDNFTQAIQSQILGGLLSNIN TGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNSTDF
  • Recombinant expression vectors encoding the CsgG variant nanopores with a C-terminal Strep affinity tag and ampicillin resistance gene are transformed into chemically competent E. coli cells.
  • the cells are plated onto an LB Agar plate containing appropriate antibiotics for selection.
  • a single colony from the agar plate is inoculated in LB Media with antibiotics and grown overnight.
  • the culture is diluted into autoinduction media plus necessary antibiotics and incubated at 18°C for 68 hours.
  • the cells are harvested through centrifugation before being lysed and extracted into 1x Bugbuster extraction reagent (Merck 70921) and 0.1% DDM.
  • the pore is purified from the supernatant using affinity chromatography, heat treatment and then size exclusion chromatography, selecting for oligomeric nanopores as judged by SDS-PAGE.
  • CsgG-CsgF complexes are prepared from nanopores purified as above and chemically synthesised CsgF peptides with or without one or two linkers capable of attaching to CsgG.
  • Nanopores are buffer exchanged into a pH 7.0 buffer with reducing agents removed and incubated in an 8x molar excess of peptide to CsgG monomer for Jackpot at 25°C. Reactions are stopped by heating at 60°C for 15 mins followed by centrifugation to remove any precipitate; DTT is added to 5 mM to prevent any further reaction.
  • a Y-adapter is prepared by annealing DNA oligonucleotides as described previously (WO 2016/034591, which is incorporated herein in its entirety). A DNA motor is loaded and closed on the adapter. The subsequent material is HPLC purified. The Y-adapter contains a 30 C3 leader section for easier capture by the nanopore and a side arm for tethering to the membrane.
  • the analyte being used to assess the DNA squiggle is a 3.6-kilobase DNA section from the 3' end of the lambda genome.
  • Preparation of the analyte, ligating the analyte to the Y-adapter, SPRI-bead clean-up of the ligated analyte and addition to a MinION flow cell is carried out using the Oxford Nanopore Technologies Q-SQK-LSK109 protocol.
  • the pore complex comprises CsgF that is functionalized with a single linker or two linkers capable of attaching to CsgG
  • CsgG/CsgF complexes pores with open pore currents between 70 pA and 140 pA. Both classifications also have open pore noise ⁇ 18 pA.
  • SNR is the signal to noise ratio which is the range of the ionic current divided by the noise as single stranded DNA is translocating through the pore.
  • the SNR and/or range increase in the presence of a single attachment of CsgF to CsgG.
  • the SNR and/or range of double attached CsgG/CsgF complexes is also increased compared with the single attached complexes.
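The SNR definition given above (range of the ionic current divided by the noise) can be written out as a minimal sketch. The function names and the current values are hypothetical and purely illustrative; they are not measurements from this document, and taking the range as the highest minus the lowest observed level is an assumption.

```python
def current_range(levels_pa):
    """Range of the ionic current: highest minus lowest observed level (pA).

    Assumption: the range is taken over the supplied current levels.
    """
    return max(levels_pa) - min(levels_pa)


def signal_to_noise(levels_pa, noise_pa):
    """SNR as described in the text: current range divided by noise."""
    if noise_pa <= 0:
        raise ValueError("noise must be positive")
    return current_range(levels_pa) / noise_pa


# Hypothetical current levels (pA) recorded while a strand translocates.
levels_a = [62.0, 75.0, 81.0, 90.0]
levels_b = [55.0, 72.0, 88.0, 99.0]
noise = 4.0  # hypothetical noise (pA), identical for both traces

snr_a = signal_to_noise(levels_a, noise)  # (90 - 62) / 4
snr_b = signal_to_noise(levels_b, noise)  # (99 - 55) / 4
```

With equal noise, the trace with the wider current range gives the higher SNR, which is why range and SNR are reported together.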

Abstract

The present invention relates to novel pore monomer conjugates, pore complexes formed from the conjugates and their uses in analyte detection and characterisation.

Description

NOVEL PORE MONOMERS AND PORES
TECHNICAL FIELD
The present invention relates to novel pore monomer conjugates, pore complexes formed from the conjugates and their uses in analyte detection and characterisation.
BACKGROUND
Nanopore sensing is an approach to analyte detection and characterization that relies on the observation of individual binding or interaction events between the analyte molecules and an ion conducting channel. Two of the essential components of analyte characterization using nanopore sensing are (1) the control of analyte movement through the pore and (2) the discrimination of the composing building blocks as the analyte is moved through the pore. During nanopore sensing, the narrowest part of the pore forms the most discriminating part of the nanopore with respect to the current signatures as a function of the passing analyte. CsgG was identified as an ungated, non-selective protein secretion channel from Escherichia coli (Goyal et al., 2014) and has been used as a nanopore for detecting and characterising analytes. Mutations to the wild-type CsgG pore that improve the properties of the pore in this context have also been disclosed (WO 2016/034591, WO 2017/149316, WO 2017/149317, WO 2017/149318, WO 2018/211241, and WO 2019/002893, all incorporated by reference herein in their entirety).
For polynucleotide analytes, nucleotide discrimination is achieved by measuring the current as the polynucleotide passes through the pore. Multiple nucleotides contribute to the observed current, so the height of the channel constriction and extent of the interaction with the polynucleotide affect the relationship between observed current and polynucleotide sequence. While the current range and signal-to-noise ratio for nucleotide discrimination have been improved through mutation of the CsgG pore, a sequencing system would have higher performance if the current differences between nucleotides could be improved further. Accordingly, there is a need to identify novel ways to improve nanopore sensing features.
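Why several nucleotides contributing to the current matters can be illustrated with a deliberately simplified sketch. It assumes the observed current depends only on the k bases sitting in the constriction at a given moment, with k = 5 chosen because this document states elsewhere that approximately 5 bases dominate the CsgG signal. The function names and sequences are hypothetical.

```python
def kmers(sequence, k=5):
    """Return the successive k-base windows seen as the strand advances one base at a time."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def informative_steps(sequence, k=5):
    """Count the single-base steps at which the window content changes.

    Under the simplifying assumption above, a step that leaves the window
    unchanged produces no change in current.
    """
    windows = kmers(sequence, k)
    return sum(1 for a, b in zip(windows, windows[1:]) if a != b)


mixed = "ACGTTGCAAC"        # hypothetical mixed-sequence region
homopolymer = "AAAAAAAAAA"  # hypothetical polyA region of the same length

steps_mixed = informative_steps(mixed)              # every step changes the window
steps_homopolymer = informative_steps(homopolymer)  # no step changes the window
```

In this toy model a homopolymer longer than the window gives a flat signal, so its length can only be inferred from dwell time; a taller effective reading region (larger k) extends the homopolymer length that still produces distinguishable windows at its boundaries.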
SUMMARY OF THE INVENTION
The inventors have surprisingly shown that pore complexes formed from pore monomer conjugates in which a CsgG pore monomer is attached to a CsgF peptide at two or more positions display an increased current range and/or increased signal-to-noise ratio (SNR) during analyte characterisation compared with conjugates with attachment at only one position. Increased current range and increased SNR both improve the ability to discriminate analytes as they pass through the pore. Neither the improvement in range nor the improvement in SNR could be predicted from previous experiments using CsgG and CsgF. The invention therefore provides a pore monomer conjugate comprising a CsgG pore monomer attached to a CsgF peptide, wherein the CsgF peptide is attached to the CsgG pore monomer at two or more positions.
The invention also provides:
- a construct comprising two or more covalently attached pore monomer conjugates of the invention;
- a pore complex comprising at least one pore monomer conjugate of the invention or at least one construct of the invention, wherein the CsgF peptide(s) form(s) a constriction in the pore complex;
- a pore multimer comprising two or more pores, wherein at least one of the pores is a pore complex of the invention;
- a membrane comprising a pore complex of the invention or a pore multimer of the invention;
- a method for producing a pore monomer conjugate of the invention comprising attaching the CsgF peptide to the CsgG pore at two or more positions;
- a method for producing a pore complex of the invention or a pore multimer of the invention, the method comprising expressing at least one pore monomer conjugate of the invention or a construct of the invention and sufficient pore monomers or constructs to form the pore complex or the pore multimer in a host cell and allowing the pore complex or the pore multimer to form in the host cell;
- a method for producing a pore complex of the invention or a pore multimer of the invention, the method comprising contacting at least one pore monomer conjugate of the invention or a construct of the invention with sufficient pore monomers or constructs in vitro and allowing the formation of the pore complex or the pore multimer;
- a method for determining the presence, absence or one or more characteristics of a target analyte, comprising the steps of:
(i) contacting the target analyte with a pore complex of the invention or a pore multimer of the invention, such that the target analyte moves with respect to the pore complex or the pore multimer; and
(ii) taking one or more measurements as the analyte moves with respect to the pore complex or the pore multimer and thereby determining the presence, absence or one or more characteristics of the analyte;
- use of a pore complex of the invention or a pore multimer of the invention to determine the presence, absence or one or more characteristics of a target analyte;
- a polynucleotide which encodes a pore monomer conjugate of the invention or a construct of the invention;
- a kit for characterising a target analyte comprising (a) a pore complex of the invention or a pore multimer of the invention and (b) the components of a membrane;
- a kit for characterising a target polynucleotide or a target polypeptide comprising (a) a pore complex of the invention or a pore multimer of the invention and (b) a polynucleotide binding protein;
- an apparatus for characterising a target polynucleotide or a target polypeptide in a sample, comprising (a) a plurality of pore complexes of the invention or a plurality of pore multimers of the invention and (b) a plurality of polynucleotide binding proteins;
- an array comprising a plurality of membranes of the invention;
- a system comprising (a) a membrane of the invention or an array of the invention, (b) means for applying a potential across the membrane(s) and (c) means for detecting electrical or optical signals across the membrane(s);
- an apparatus comprising a pore complex of the invention or a pore multimer of the invention inserted into an in vitro membrane; and
- an apparatus produced by a method comprising (i) obtaining a pore complex of the invention or a pore multimer of the invention and (ii) contacting the pore complex or a pore multimer with an in vitro membrane such that the pore complex or the pore multimer is inserted in the in vitro membrane.
DESCRIPTION OF THE FIGURES
Figure 1: The structure and size of the wild-type CsgG pore from Escherichia coli strain K12 (the databank accession code for this structure is 4UV3). The distances shown are measured from backbone to backbone of the amino acids forming the pore structure. The CsgG pore is a tightly interconnected symmetrical nonameric pore that resembles a crown. The overall height is 98 Å, and the largest outer diameter is 120 Å. It defines a central channel and consists of three parts: (A) the cap region, (B) the constriction region and (C) the transmembrane beta barrel region. The cap axial length, or height, is 39 Å. It has an inner diameter of 43 Å and a 66 Å mouth. The beta barrel has 36 strands, an axial length of 39 Å and an inner diameter of 55 Å. The transition between the pore cap and the beta barrel is sharp, with the constriction located between them, at the level of the predicted lipid-aqueous interface. The constriction is approximately 18.5 Å in diameter and has a length of 20 Å along the axis of the channel.
DESCRIPTION OF THE SEQUENCE LISTING
SEQ ID NO: 1 shows the polynucleotide sequence of wild-type E. coli CsgG from strain K12, including signal sequence (Gene ID: 945619).
SEQ ID NO: 2 shows the amino acid sequence of wild-type E. coli CsgG including signal sequence (Uniprot accession number P0AEA2).
SEQ ID NO: 3 shows the amino acid sequence of wild-type E. coli CsgG as a mature protein (Uniprot accession number P0AEA2).
SEQ ID NO: 4 shows the polynucleotide sequence of wild-type E. coli CsgF from strain K12, including signal sequence (Gene ID: 945622).
SEQ ID NO: 5 shows the amino acid sequence of wild-type E. coli CsgF including signal sequence (Uniprot accession number P0AE98).
SEQ ID NO: 6 shows the amino acid sequence of wild-type E. coli CsgF as a mature protein (Uniprot accession number P0AE98).
DETAILED DESCRIPTION
All publications, patents and patent applications cited herein, whether supra or infra, are hereby incorporated by reference in their entirety. All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the invention contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
The present invention will be described with respect to particular embodiments and with reference to certain drawings but the invention is not limited thereto but only by the claims. Any reference signs in the claims shall not be construed as limiting the scope. Of course, it is to be understood that not necessarily all aspects or advantages may be achieved in accordance with any particular embodiment of the invention. Thus, for example those skilled in the art will recognize that the invention may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other aspects or advantages as may be taught or suggested herein. In addition, as used in this specification and the appended claims, the singular forms "a", "an", and "the" include plural referents unless the content clearly dictates otherwise. Thus, for example, reference to "a polynucleotide" includes two or more polynucleotides, reference to "a polynucleotide binding protein" includes two or more such proteins, reference to "a helicase" includes two or more helicases, reference to "a monomer" includes two or more monomers, reference to "a pore" includes two or more pores and the like.
In all of the discussion herein, the standard one letter codes for amino acids are used. These are as follows: alanine (A), arginine (R), asparagine (N), aspartic acid (D), cysteine (C), glutamic acid (E), glutamine (Q), glycine (G), histidine (H), isoleucine (I), leucine (L), lysine (K), methionine (M), phenylalanine (F), proline (P), serine (S), threonine (T), tryptophan (W), tyrosine (Y) and valine (V). Standard substitution notation is also used, i.e., Q42R means that Q at position 42 is replaced with R.
In the paragraphs herein where different amino acids at a specific position are separated by the / symbol, the / symbol means "or". For instance, Q87R/K means Q87R or Q87K. In the paragraphs herein where different positions are separated by the / symbol, the / symbol means "and" such that Y51/N55 is Y51 and N55.
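The substitution notation described above can be sketched in code. The following is a minimal illustration only; the helper names expand_substitution and split_positions are hypothetical and do not appear in the specification, and the separator is taken to be the "/" character as written in the examples Q87R/K and Y51/N55.

```python
import re

# The twenty standard one-letter amino acid codes listed above.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical pattern: original residue, position, then one or more
# replacement residues separated by "/" (meaning "or").
_SUBSTITUTION = re.compile(
    rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}](?:/[{AMINO_ACIDS}])*)$"
)


def expand_substitution(code):
    """Return [(original, position, replacement), ...] for e.g. 'Q87R/K'."""
    match = _SUBSTITUTION.match(code)
    if match is None:
        raise ValueError(f"not a substitution code: {code!r}")
    original, position, replacements = match.groups()
    return [(original, int(position), r) for r in replacements.split("/")]


def split_positions(code):
    """Return the positions joined by '/' (meaning 'and'), e.g. 'Y51/N55'."""
    return code.split("/")
```

For example, expand_substitution("Q87R/K") yields the two alternatives Q87R and Q87K, whereas split_positions("Y51/N55") yields the two positions Y51 and N55.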
The general definitions in WO 2019/002893 are incorporated by reference herein in their entirety.
Pore monomer conjugates
The invention provides pore monomer conjugates comprising a CsgG pore monomer attached to a CsgF peptide. The CsgF peptide is attached to the CsgG pore monomer at two or more positions, such as 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more or 10 or more positions. The CsgF peptide is preferably covalently attached to the CsgG pore monomer at two or more positions, such as 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more or 10 or more positions.
In the context of the invention, attachment at two or more positions means that two or more pairs of residues in the CsgF peptide and CsgG pore monomer are attached, preferably covalently attached, to each other. For instance, residue 1 in the CsgF peptide may be attached to residue 153 in the CsgG pore monomer (one pair of residues = 1 and 153) and residue 30 in the CsgF peptide may be attached to residue 196 in the CsgG pore monomer (a second pair of residues = 30 and 196).
SEQ ID NO: 3 shows the amino acid sequence of wild-type E. coli CsgG as a mature protein. The two or more positions in the CsgG pore monomer are preferably selected from residues 47-54, 57, 59, 60, 130-134, 136, 137, 138, 140, 142-145, 147, 149, 151, 153, 155, 181, 183, 185, 187, 189, 191, 193, 195-199, 201, 203, 205, 207, 209 and 211-212 in the CsgG pore monomer. The two or more positions in the CsgG pore monomer are preferably selected from residues corresponding to positions 47-54, 57, 59, 60, 130-134, 136, 137, 138, 140, 142-145, 147, 149, 151, 153, 155, 181, 183, 185, 187, 189, 191, 193, 195-199, 201, 203, 205, 207, 209 and 211-212 in SEQ ID NO: 3.
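The list of preferred CsgG positions above mixes single residues and ranges. As a rough sketch, it can be expanded into an explicit set and used to check a proposed set of attachment pairs. The function names below are hypothetical, and the check reflects only the stated preference for the CsgG side of each pair, not a requirement of the invention.

```python
def expand_positions(spec):
    """Expand a list such as '47-54, 57 and 59' into a sorted list of ints."""
    positions = set()
    for item in spec.replace(" and ", ", ").split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            start, end = (int(x) for x in item.split("-"))
            positions.update(range(start, end + 1))
        else:
            positions.add(int(item))
    return sorted(positions)


# Preferred CsgG positions (numbering of SEQ ID NO: 3), copied from the text.
CSGG_SPEC = (
    "47-54, 57, 59, 60, 130-134, 136, 137, 138, 140, 142-145, 147, "
    "149, 151, 153, 155, 181, 183, 185, 187, 189, 191, 193, 195-199, "
    "201, 203, 205, 207, 209 and 211-212"
)
CSGG_POSITIONS = expand_positions(CSGG_SPEC)


def uses_preferred_csgg_positions(pairs):
    """pairs: iterable of (CsgF position, CsgG position) tuples.

    True when there are at least two pairs and every CsgG position
    is in the preferred list.
    """
    pairs = list(pairs)
    return len(pairs) >= 2 and all(g in CSGG_POSITIONS for _, g in pairs)
```

With the example pairs given above, uses_preferred_csgg_positions([(1, 153), (30, 196)]) evaluates to True.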
SEQ ID NO: 6 shows the amino acid sequence of wild-type E. coli CsgF as a mature protein. The two or more positions in the CsgF peptide are preferably selected from the N terminus and residues 1-35 in the CsgF peptide. The two or more positions in the CsgF peptide are preferably selected from the N terminus and residues corresponding to positions 1-35 in SEQ ID NO: 6. The N terminus is the amino group of the first residue in the CsgF peptide (i.e., residue 1). In this context, residue 1 refers to the side chain of residue 1 or the residue corresponding to position 1 in SEQ ID NO: 6. The two or more positions are preferably the following positions/residues in the CsgF peptide or the positions/residues in the CsgF peptide which correspond to the following positions in SEQ ID NO: 6: the N terminus and any one of 1-35, 1 and any of the N terminus and 2-35, 2 and any of the N terminus, 1 and 3-35, 3 and any of the N terminus, 1-2 and 4-35, 4 and any of the N terminus, 1-3 and 5-35, 5 and any of the N terminus, 1-4 and 6-35, 6 and any of the N terminus, 1-5 and 7-35, 7 and any of the N terminus, 1-6 and 8-35, 8 and any of the N terminus, 1-7 and 9-35, 9 and any of the N terminus, 1-8 and 10-35, 10 and any of the N terminus, 1-9 and 11-35, 11 and any of the N terminus, 1-10 and 12-35, 12 and any of the
N terminus, 1-11 and 13-35, 13 and any of the N terminus, 1-12 and 14-35, 14 and any of the N terminus, 1-13 and 15-35, 15 and any of the N terminus, 1-14 and 16-35, 16 and any of the N terminus, 1-15 and 17-35, 17 and any of the N terminus, 1-16 and 18-35, 18 and any of the N terminus, 1-17 and 19-35, 19 and any of the N terminus, 1-18 and 20-35, 20 and any of the N terminus, 1-19 and 21-35, 21 and any of the N terminus, 1-20 and 22-35, 22 and any of the N terminus, 1-21 and 23-35, 23 and any of the N terminus, 1-22 and 24-35, 24 and any of the N terminus, 1-23 and 25-35, 25 and any of the N terminus, 1-24 and 26-35, 26 and any of the N terminus, 1-25 and 27-35, 27 and any of the N terminus, 1-26 and 28-35, 28 and any of the N terminus, 1-27 and 29-35, 29 and any of the N terminus, 1-28 and 30-35, 30 and any of the N terminus, 1-29 and 31-35, 31 and any of the N terminus, 1-30 and 32-35, 32 and any of the N terminus, 1-31 and 33-35, 33 and any of the N terminus, 1-32 and 34-35, 34 and any of the N terminus, 1-33 and 35, and 35 and any of the N terminus and 1-34.
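As read here, the enumeration above amounts to every unordered pair of distinct sites drawn from the N terminus and residues 1-35 of the CsgF peptide. The short sketch below generates those pairs; the label "N-terminus" and the variable names are illustrative choices, not terms defined in the specification.

```python
from itertools import combinations

# The 36 candidate attachment sites in the CsgF peptide: the N terminus
# (amino group of residue 1) plus the side chains of residues 1 to 35.
CSGF_SITES = ["N-terminus"] + list(range(1, 36))

# Every unordered pair of distinct sites, in the order the sites are listed.
CSGF_SITE_PAIRS = list(combinations(CSGF_SITES, 2))
```

Attachments at three or more positions could be enumerated the same way by passing a larger group size to combinations.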
Preferred combinations of two or more positions are shown in the table below. Each column shows positions/residues in the CsgF peptide and CsgG pore monomer or positions in SEQ ID NO: 6 and SEQ ID NO: 3 to which the positions/residues in the CsgF peptide and CsgG pore monomer correspond. The two or more positions may be any two or more of the rows in the table. In each row, the position/residue in the CsgF peptide or the position/residue corresponding to the position in SEQ ID NO: 6 may be attached, preferably covalently attached, to any of the listed positions/residues in the CsgG pore monomer or to any position/residue corresponding to listed positions in SEQ ID NO: 3. For instance, in the third row, position 2 of the CsgF peptide or the residue corresponding to position 2 in SEQ ID NO: 6 is preferably attached, more preferably covalently attached, to any of positions 47-54, 57, 59, 60, 130-134, 136, 151, 153, 155, 181, 183, 185, 207, 209, 211-212 in the CsgG pore monomer or a residue corresponding to any of positions 47-54, 57, 59, 60, 130-134, 136, 151, 153, 155, 181, 183, 185, 207, 209, 211-212 in SEQ ID NO: 3.
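[Table: preferred combinations of positions/residues in the CsgF peptide and the CsgG pore monomer. Provided as images (imgf000009_0001 and imgf000010_0001) in the source and not reproduced here.]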
One of the two or more attachments preferably comprises the N terminus of the CsgF peptide attached, preferably covalently attached, to a cysteine residue at position 153 of the CsgG pore monomer. One of the two or more attachments preferably comprises the N terminus of the CsgF peptide attached, preferably covalently attached, to a cysteine residue in the CsgG pore monomer corresponding to position 153 in SEQ ID NO: 3.
One of the two or more attachments preferably comprises position 4 in the CsgF peptide attached, preferably covalently attached, to a cysteine residue at position 133 in the CsgG pore monomer. One of the two or more attachments preferably comprises the position in the CsgF peptide corresponding to position 4 of SEQ ID NO: 6 attached, preferably covalently attached, to a cysteine residue in the CsgG pore monomer corresponding to position 133 in SEQ ID NO: 3.
One of the two or more attachments preferably comprises position 4 in the CsgF peptide attached, preferably covalently attached, to a cysteine residue at position 153 of the CsgG pore monomer. One of the two or more attachments preferably comprises the position in the CsgF peptide corresponding to position 4 of SEQ ID NO: 6 attached, preferably covalently attached, to a cysteine residue in the CsgG pore monomer corresponding to position 153 in SEQ ID NO: 3.
One of the two or more attachments preferably comprises any one of positions 30, 31, 32 and 33 in the CsgF peptide attached, preferably covalently attached, to any one of positions 193, 195, 196 and 197 in the CsgG pore monomer. One of the two or more attachments preferably comprises the residue in the CsgF peptide corresponding to any one of positions 30, 31, 32 and 33 in SEQ ID NO: 6 attached, preferably covalently attached, to the position in the CsgG pore monomer corresponding to any one of positions 193, 195, 196 and 197 in SEQ ID NO: 3.
Corresponding positions may be determined by standard techniques in the art. For example, the PILEUP and BLAST algorithms mentioned below can be used to align the sequence of a CsgG pore monomer with SEQ ID NO: 3 and hence to identify corresponding residues. The attachment at two or more positions preferably comprises one or more reactive groups which react with lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine in the CsgG pore monomer. The attachment at two or more positions preferably comprises a reaction between a position, residue, or linker in the CsgF peptide with lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine in the CsgG pore monomer. The attachment at all of the two or more positions preferably comprises one or more reactive groups which react with lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine in the CsgG pore monomer. The attachment at all of the two or more positions preferably comprises a reaction between a position, residue, or linker in the CsgF peptide with lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine in the CsgG pore monomer.
The lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine may be native to the CsgG pore monomer. The lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine may be introduced into the CsgG pore monomer, preferably by substitution or addition.
Reactive groups which react with lysine include, but are not limited to, maleimide, activated esters, anhydrides, carbonates, isocyanates, isothiocyanates, a range of other acylating and alkylating agents, oxidative coupling O-aminophenols, aldehydes, activated carbodiimides, ketenes, sulfonyl halides, fluorosulfates, and sulfonyl triazoles. Positions, residues, or linkers may also be attached to lysine using periodate oxidation, reductive amination, transamination, aniline/arylamine conjugation via oxidative coupling, azaelectrocyclization, iminoboronate formation, or conjugation of arene diazonium salts.
Reactive groups which react with cysteine include, but are not limited to, haloacetamides and other alpha-halocarbonyls, maleimides, acrylates, vinyl sulfones, vinylpyridines, epoxides, oxanorbornadienes, methylsulfonyl functionalised heteroaromatics, allenes, allyl selenosulfate salts, perfluoroaromatics, thiol-ene and thiol-yne click chemistry, pyridyl dithiol, sulfonyl halides, fluorosulfates, and sulfonyl triazoles. Positions, residues, or linkers may also be attached to cysteine using strain-release alkylation, nickel(II)-catalyzed oxidative coupling, oxidative coupling with aminophenols, conjugation with allenes (in the presence of a gold catalyst) or allyl selenosulfate salts, native chemical ligation, Pd-catalysed arylation/alkynylation, or allylation followed by cross-metathesis.
Reactive groups which react with tyrosine include, but are not limited to, sulfonyl halides, fluorosulfates, and sulfonyl triazoles. Positions, residues, or linkers may also be attached to tyrosine using oxidative conjugation of tyrosines including O-alkylation, hydrazone and oxime condensations, addition reactions with electron deficient alkynes such as alkynones, alkynoate amides or esters, cyclic diazodicarboxamides, Pd-catalysed alkylation, diazonium salts, Mannich reaction with imines formed from aldehydes, or modification with rhodium carbenoids.
Reactive groups which react with serine or threonine include, but are not limited to, sulfonyl halides, fluorosulfates, and sulfonyl triazoles. Positions, residues, or linkers may also be attached to serine or threonine using periodate oxidation and subsequent transimination reactions of ketones/aldehydes with hydrazides/alkoxyamines. Resultant aldehydes/ketones may also be modified through aldol ligation.
Positions, residues, or linkers may be attached to proline using oxidative coupling with O-aminophenols at the N-terminus.
Reactive groups which react with tryptophan include, but are not limited to, aldehydes, ketones, and tetrazoles. Positions, residues, or linkers may be attached to tryptophan using a condensation reaction, modification with Rhodium carbenoids, conjugation with N/O centred radicals, and N-terminal Trp modification using Pictet-Spengler reaction.
Positions, residues, or linkers may be attached to arginine using condensation with α,β-dicarbonyl compounds.
Reactive groups which react with histidine include, but are not limited to, vinylsulfones, sulfonyl halides, fluorosulfates, and sulfonyl triazoles. Positions, residues, or linkers may be attached to histidine using C2 alkylation and N3 alkylation/thiophosphorylation.
Positions, residues, or linkers may be attached to methionine using S-alkylation/imidation.
Positions, residues, or linkers may be attached to phenylalanine using modification with Rhodium carbenoids.
The attachment at two or more positions preferably comprises one or more reactive groups which react with any amino acid in the CsgG pore monomer. The attachment at all of the two or more positions preferably comprises one or more reactive groups which react with any amino acid in the CsgG pore monomer. Reactive groups which react with any amino acid include, but are not limited to, activated esters, anhydrides, carbonates, isocyanates, isothiocyanates, and a range of other acylating and alkylating agents, oxidative coupling O-aminophenols, aldehydes, activated carbodiimides, ketenes, transamination, and vinylboronic acids. The attachment at two or more positions preferably comprises reacting a position, residue, or linker in the CsgF peptide with any amino acid in the CsgG pore monomer. The attachment at all of the two or more positions preferably comprises reacting a position, residue, or linker in the CsgF peptide with any amino acid in the CsgG pore monomer. Positions, residues, or linkers may be attached to any amino acid using periodate oxidation, or reductive amination.
The attachment at two or more positions preferably comprises one or more reactive groups which undergo click chemistry. The attachment at all of the two or more positions preferably comprises one or more reactive groups which undergo click chemistry. Suitable click chemistries include, but are not limited to, CuAAC azide/alkyne cycloaddition, Staudinger ligation, strain-promoted azide-alkyne cycloaddition, and inverse-electron demand Diels-Alder reaction between 1,2,4,5-tetrazines and strained alkenes.
All of the discussion above with reference to reactive groups and reactions for attaching the CsgF peptide to residues/amino acids in the CsgG pore monomer equally applies to attaching the CsgG pore monomer to the CsgF peptide. Any of the reactive groups or reactions may be used for attachment in the CsgF peptide. Specific residues in the CsgF peptide may be native to the protein. Specific residues may also be introduced into the CsgF peptide, preferably by substitution or addition. The skilled person is capable of attaching, preferably covalently attaching, two proteins at two or more positions.
The attachment at two or more positions preferably comprises two or more versions of the same or similar reactive groups, such as maleimide. The attachment at two or more positions preferably comprises two or more versions of the same or similar reaction. The attachment at two or more positions preferably comprises two or more maleimide-containing linkers. The attachment at two or more positions preferably comprises two or more maleimide reactions. Any of the maleimide groups and linkers discussed above may be used.
The attachment at two or more positions preferably comprises two or more different reactive groups. The attachment at two or more positions preferably comprises two or more different reactions. The two or more reactive groups or reactions may be any of those discussed above in relation to the CsgF peptide and/or the CsgG pore monomer.
The CsgF peptide is preferably attached to the CsgG pore monomer using two or more linkers, such as 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more or 10 or more linkers. The two or more linkers may be the same. The two or more linkers may be different. The skilled person is capable of designing two or more linkers for use in the invention. The two or more linkers may be any of the linkers discussed below with reference to the constructs of the invention. The two or more linkers preferably comprise or consist of a linear carbon chain of 2, 3, 4, 5, 6 or more carbon atoms and/or saturated or unsaturated cyclic groups containing 3, 5 or 6 carbon atoms. One or more, such as all, of the two or more linkers are preferably maleimide-containing linkers. The maleimide group may be used to react with cysteine in the CsgF peptide and/or the CsgG pore monomer. The maleimide-containing linker preferably comprises or consists of a maleimide group and a linear carbon chain of 2, 3, 4, 5, 6 or more carbon atoms. The linear carbon chain is typically attached to the nitrogen atom in the maleimide group. The linear carbon chain also preferably comprises a terminal carboxyl group. This carboxyl group is capable of forming an amide bond with an amino acid in the CsgF peptide. The maleimide-containing linker is preferably maleimidoacetic acid, maleimidopropionic acid, maleimidobutyric acid, maleimidopentanoic acid or maleimidohexanoic acid. Any combination of these linkers may be used in the two or more linkers. The maleimide-containing linker is most preferably maleimidopropionic acid.
The distance between the CsgG pore monomer and the CsgF peptide in the pore monomer conjugate and/or the length of the linker is preferably less than about 3.00 nm, such as less than about 2.90 nm, less than about 2.80 nm, less than about 2.70 nm, less than about 2.60 nm, less than about 2.50 nm, less than about 2.40 nm, less than about 2.30 nm, less than about 2.20 nm, less than about 2.10 nm, less than about 2.00 nm, less than about 1.90 nm, less than about 1.80 nm, less than about 1.70 nm, less than about 1.60 nm, less than about 1.50 nm, less than about 1.40 nm, less than about 1.30 nm, less than about 1.20 nm, less than about 1.10 nm, less than about 1.00 nm, less than about 0.90 nm, less than about 0.80 nm, less than about 0.70 nm, less than about 0.60 nm, less than about 0.50 nm, or less than about 0.40 nm. This distance/length can be achieved using any of the specific maleimide-containing linkers discussed above, including maleimidoacetic acid, maleimidopropionic acid, maleimidobutyric acid, maleimidopentanoic acid or maleimidohexanoic acid. The linker is most preferably maleimidopropionic acid.
The distance between the CsgG pore monomer and the CsgF peptide in the pore monomer conjugate and/or the length of the linker is preferably less than about 1.20 nm. This distance/length can be achieved using maleimidohexanoic acid as discussed in more detail above. The distance between the CsgG pore monomer and the CsgF peptide in the pore monomer conjugate and/or the length of the linker is preferably less than about 0.80 nm. This distance/length can be achieved using maleimidopropionic acid as discussed above.
The distance between the CsgG pore monomer and the CsgF peptide in the pore monomer conjugate and/or the length of the linker is preferably from about 0.40 nm to about 3.00 nm, such as from about 0.45 nm to about 2.80 nm, from about 0.50 nm to about 2.50 nm, from about 0.55 nm to about 2.20 nm, from about 0.60 nm to about 2.00 nm, from about 0.65 nm to about 1.50 nm, from about 0.70 nm to about 1.40 nm, from about 0.75 nm to about 1.30 nm, from about 0.80 nm to about 1.20 nm, from about 0.85 nm to about 1.10 nm or from about 0.90 nm to about 1.00 nm. The distance between the CsgG pore monomer and the CsgF peptide in the pore monomer conjugate and/or the length of the linker is preferably from about 0.50 nm to about 1.50 nm. The distance between the CsgG pore monomer and the CsgF peptide in the pore monomer conjugate and/or the length of the linker is preferably from about 0.60 nm to about 1.20 nm. This distance/length can be achieved using any of the specific maleimide-containing linkers discussed above, including maleimidoacetic acid, maleimidopropionic acid, maleimidobutyric acid, maleimidopentanoic acid or maleimidohexanoic acid. The linker is most preferably maleimidopropionic acid.
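The nested distance ranges recited above can be tabulated for quick lookup. The sketch below is purely illustrative: the function name narrowest_range is hypothetical, and the bounds are treated as exact numbers although the text qualifies each of them with "about".

```python
# Preferred distance/linker-length ranges in nanometres, copied from the
# paragraph above, from broadest to narrowest.
PREFERRED_RANGES_NM = [
    (0.40, 3.00), (0.45, 2.80), (0.50, 2.50), (0.55, 2.20),
    (0.60, 2.00), (0.65, 1.50), (0.70, 1.40), (0.75, 1.30),
    (0.80, 1.20), (0.85, 1.10), (0.90, 1.00),
]


def narrowest_range(distance_nm):
    """Return the narrowest listed range containing distance_nm, or None."""
    containing = [
        (low, high)
        for low, high in PREFERRED_RANGES_NM
        if low <= distance_nm <= high
    ]
    if not containing:
        return None
    return min(containing, key=lambda bounds: bounds[1] - bounds[0])
```

A measured or modelled distance can then be reported against the tightest recited range it satisfies.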
The pore monomer conjugates of the invention are capable of forming a pore or a pore complex. This can be measured using routine methods, including any of those described in WO 2016/034591, WO 2017/149316, WO 2017/149317, WO 2017/149318, WO 2018/211241 and WO 2019/002893 (all incorporated by reference herein in their entirety) and in the Example.
CsgG pore monomer
A CsgG pore monomer is a monomer that is capable of forming a CsgG pore. Such monomers are known in the art, especially from WO 2019/002893 (incorporated by reference herein in its entirety). The CsgG pore preferably comprises one or more of (a) a cap region, (b) a constriction region, and (c) a transmembrane beta barrel region, such as (a), (b), (c), (a) and (b), (a) and (c), (b) and (c), or (a), (b) and (c). The CsgG pore monomer preferably comprises one or more of (a) a cap forming region, (b) a constriction forming region, and (c) a transmembrane beta barrel forming region, such as (a), (b), (c), (a) and (b), (a) and (c), (b) and (c), or (a), (b) and (c). The residues of SEQ ID NO: 3 which form these regions are defined below. The CsgG pore formed by the monomer may have any structure but preferably has or comprises the structure of the wild-type CsgG pore (Figure 1). The protein structure of CsgG defines a channel or hole that allows the translocation of molecules and ions from one side of the membrane to the other.
The "constriction", "orifice", "constriction region", "channel constriction", or "constriction site", as used interchangeably herein, refers to an aperture defined by a luminal surface of a pore or pore complex, which acts to allow the passage of ions and target molecules (e.g., but not limited to polynucleotides or individual nucleotides) but not other non-target molecules through the pore or pore complex channel. The constriction(s) are typically the narrowest aperture(s) within a pore or pore complex or within the channel defined by the pore or pore complex. The constriction(s) may serve to limit the passage of molecules through the pore. The size of the constriction is typically a key factor in determining suitability of a pore or pore complex for analyte characterisation. If the constriction is too small, the molecule to be characterised will not be able to pass through. However, to achieve a maximal effect on ion flow through the channel, the constriction should not be too large. For example, the constriction should not be wider than the solvent-accessible transverse diameter of a target analyte. Ideally, any constriction should be as close as possible in diameter to the transverse diameter of the analyte passing through. The CsgF peptide and the CsgG pore monomer typically each provide at least one constriction such that the pore complex of the invention comprises two or more constrictions.
The CsgG pore may be any size but preferably has the dimensions of the wild-type CsgG pore (Figure 1). The CsgG pore preferably has an external diameter of from about 100 to about 150 Å at its widest point, such as from about 110 to about 140 Å or from about 115 to about 125 Å at its widest point. The CsgG pore preferably has an external diameter of about 120 Å at its widest point. The CsgG pore preferably has a total length of from about 80 to about 120 Å, such as from about 90 to about 110 Å or from about 95 to about 105 Å. The CsgG pore preferably has a total length of about 98 Å. References to "total length" and "length" relate to the length of the pore or pore region when viewed from the side (see, e.g., the side view in Figure 1).
The cap region preferably has a length of from about 20 to about 60 Å, such as from about 30 to about 50 Å or from about 35 to about 45 Å. The cap region preferably has a length of about 39 Å. The channel defined by the cap region preferably has an opening of from about 45 to about 85 Å in diameter, such as from about 55 to about 75 Å or from about 60 to about 70 Å in diameter. The channel defined by the cap region preferably has an opening of about 66 Å in diameter. The channel defined by the cap region is preferably from about 30 to about 70 Å in diameter at its narrowest point, such as from about 35 to about 60 Å or from about 40 to about 50 Å in diameter at its narrowest point. The channel defined by the cap region is preferably about 43 Å in diameter at its narrowest point.
The constriction region preferably has a length of from about 5 to about 40 Å, such as from about 10 to about 30 Å or from about 15 to about 25 Å. The constriction region preferably has a length of about 20 Å. The channel defined by the constriction region is preferably from about 2 to about 40 Å in diameter at its narrowest point, such as from about 5 to about 35 Å, from about 8 to about 25 Å or from about 10 to about 20 Å in diameter at its narrowest point. The channel defined by the constriction region is preferably about 9 Å or 12 Å in diameter. The channel defined by the constriction region is preferably about 18.5 Å in diameter. The constriction is preferably from about 2 to about 40 Å in diameter, such as from about 5 to about 35 Å, from about 8 to about 25 Å or from about 10 to about 20 Å in diameter. The constriction is preferably about 9 Å or 12 Å in diameter. The constriction is preferably about 12 Å in diameter.
The transmembrane beta barrel region preferably has a length of from about 20 to about 60 Å, such as from about 30 to about 50 Å or from about 35 to about 45 Å. The transmembrane beta barrel preferably has a length of about 39 Å. The channel defined by the transmembrane beta barrel region is preferably from about 35 to about 75 Å in diameter at its narrowest point, such as from about 45 to about 65 Å or from about 50 to about 60 Å in diameter at its narrowest point. The channel defined by the transmembrane beta barrel region is preferably about 55 Å in diameter at its narrowest point.
All of the measurements above are based on measuring from backbone to backbone of the amino acids forming the different regions (as shown in Figure 1).
SEQ ID NO: 3 shows the sequence of wild-type E. coli CsgG as a mature protein. Residues 1 to 41, 64 to 131, 156 to 180 and 212 to 262 of SEQ ID NO: 3 form the cap region. Residues 42 to 63 of SEQ ID NO: 3 form the constriction region. Residues 132 to 155 and 181 to 211 of SEQ ID NO: 3 form the transmembrane beta barrel region.
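The region boundaries just given can be captured in a small lookup table. The following sketch assumes the residue numbering of SEQ ID NO: 3; the names REGIONS and region_of are illustrative only.

```python
# Residue ranges (inclusive, numbering of SEQ ID NO: 3) for each region,
# copied from the paragraph above.
REGIONS = {
    "cap": [(1, 41), (64, 131), (156, 180), (212, 262)],
    "constriction": [(42, 63)],
    "transmembrane beta barrel": [(132, 155), (181, 211)],
}


def region_of(position):
    """Return the region name for a residue position, or None if outside."""
    for name, ranges in REGIONS.items():
        if any(start <= position <= end for start, end in ranges):
            return name
    return None
```

For instance, position 153 falls in the transmembrane beta barrel region and position 51 in the constriction region. The three regions together account for each of residues 1 to 262 exactly once.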
The CsgG pore monomer is preferably a variant of SEQ ID NO: 3. The variant CsgG monomer may also be referred to as a modified CsgG pore monomer or a mutant CsgG pore monomer. The modifications, or mutations, in the variant include but are not limited to any one or more of the modifications disclosed herein, or combinations of said modifications. The CsgG pore monomer may be a CsgG homologue monomer. A CsgG homologue monomer is a polypeptide that has at least 40%, 50%, 60%, 70%, 80%, 85%, 90%, 95% or 99% complete sequence identity to wild-type E. coli CsgG as shown in SEQ ID NO: 3. A CsgG homologue is also referred to as a polypeptide that contains the PFAM domain PF03783, which is characteristic for CsgG-like proteins. A list of presently known CsgG homologues and CsgG architectures can be found at the address given in the source as an image (imgf000017_0001), which is not reproduced here.
Over the entire length of the amino acid sequence of SEQ ID NO: 3, a variant will preferably be at least 40% homologous to that sequence based on amino acid identity. More preferably, the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to the amino acid sequence of SEQ ID NO: 3 over the entire sequence. Over the entire length of the amino acid sequence of SEQ ID NO: 3, a variant will preferably be at least 40% identical to that sequence. More preferably, the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to SEQ ID NO: 3 over the entire sequence. Sequence identity can also relate to a fragment or portion of the CsgG pore monomer. Hence, a sequence may have less than 40% overall sequence homology/identity with SEQ ID NO: 3, but the sequence of a particular region, domain or subunit could share at least 80%, 90%, or as much as 99% sequence homology/identity with the corresponding region of SEQ ID NO: 3. There may be at least 80%, for example at least 85%, 90% or 95%, amino acid identity over a stretch of 100 or more, for example 125, 150, 175 or 200 or more, contiguous amino acids ("hard homology"). The CsgG pore monomer is preferably a variant of SEQ ID NO: 3 comprising a sequence that is at least 40% homologous to the cap region of SEQ ID NO: 3 (residues 1 to 41, 64 to 131, 156 to 180 and 212 to 262). 
More preferably, the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to residues 1 to 41, 64 to 131, 156 to 180 and 212 to 262 of SEQ ID NO: 3. The variant preferably comprises a sequence that is at least 40% identical to residues 1 to 41, 64 to 131, 156 to 180 and 212 to 262 of SEQ ID NO: 3. More preferably, the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to residues 1 to 41, 64 to 131, 156 to 180 and 212 to 262 of SEQ ID NO: 3. Homology and/or identity is typically measured over the entire length of the cap region.
The CsgG pore monomer is preferably a variant of SEQ ID NO: 3 comprising a sequence that is at least 40% homologous to the constriction region of SEQ ID NO: 3 (residues 42 to 63). More preferably, the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to residues 42 to 63 of SEQ ID NO: 3. The variant preferably comprises a sequence that is at least 40% identical to residues 42 to 63 of SEQ ID NO: 3. More preferably, the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to residues 42 to 63 of SEQ ID NO: 3. Homology and/or identity is typically measured over the entire length of the constriction region.
The CsgG pore monomer is preferably a variant of SEQ ID NO: 3 comprising a sequence that is at least 40% homologous to the transmembrane beta barrel region of SEQ ID NO: 3 (residues 132 to 155 and 181 to 211). More preferably, the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to residues 132 to 155 and 181 to 211 of SEQ ID NO: 3. The variant preferably comprises a sequence that is at least 40% identical to residues 132 to 155 and 181 to 211 of SEQ ID NO: 3. More preferably, the variant may comprise a sequence that is at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to residues 132 to 155 and 181 to 211 of SEQ ID NO: 3. Homology and/or identity is typically measured over the entire length of the transmembrane beta barrel region.
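Percent identity over a region, measured over the entire length of that region, might be computed along the following lines once a pairwise alignment to SEQ ID NO: 3 is available. This is a minimal sketch under stated assumptions: the two inputs are the gapped rows of an existing alignment (gaps written as "-"), the region is given as inclusive residue ranges in reference numbering, and the function name region_identity is hypothetical. The specification itself refers to standard tools for calculating homology.

```python
def region_identity(aligned_ref, aligned_query, ranges):
    """Percent identity over the reference residues covered by `ranges`.

    aligned_ref / aligned_query: rows of one pairwise alignment ('-' = gap).
    ranges: inclusive (start, end) tuples in reference numbering.
    A query gap opposite a region residue counts as a mismatch.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    wanted = {p for start, end in ranges for p in range(start, end + 1)}
    ref_pos = 0   # 1-based position in the ungapped reference
    counted = 0   # region residues seen so far
    matches = 0   # region residues identical in the query
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char == "-":
            continue  # insertion in the query; no reference position
        ref_pos += 1
        if ref_pos in wanted:
            counted += 1
            if ref_char == query_char:
                matches += 1
    if counted == 0:
        raise ValueError("region does not overlap the reference sequence")
    return 100.0 * matches / counted
```

With the real sequences, the ranges for the cap, constriction and transmembrane beta barrel regions recited above would be passed as the third argument.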
CsgG pore monomers are highly conserved (as can be readily appreciated from Figures 45 to 47 of WO 2017/149317). Furthermore, from knowledge of the mutations in relation to SEQ ID NO: 3 it is possible to determine the equivalent positions for mutations of CsgG pore monomers other than that of SEQ ID NO: 3.
Thus, reference to a mutant CsgG pore monomer comprising a variant of the sequence as shown in SEQ ID NO: 3 and specific amino acid mutations thereof as set out in the claims and elsewhere in the specification also encompasses a mutant CsgG pore monomer comprising a variant of any of the sequences shown in SEQ ID NOs: 68 to 88 of WO 2019/002893 (incorporated by reference herein in its entirety) and corresponding amino acid mutations thereof. The CsgG pore monomer may also be any of the sequences shown in CN 113773373 A, CN 113896776 A, CN 113912683 A, and CN 113754743 A or a variant thereof. It will further be appreciated that the invention extends to other variant CsgG pore monomers not expressly identified in the specification that show highly conserved regions.
Standard methods in the art may be used to determine homology. For example, the UWGCG Package provides the BESTFIT program, which can be used to calculate homology, for example on its default settings (Devereux et al (1984) Nucleic Acids Research 12, p387-395). The PILEUP and BLAST algorithms can be used to calculate homology or line up sequences (such as identifying equivalent residues or corresponding sequences), typically on their default settings, for example as described in Altschul S. F. (1993) J Mol Evol 36:290-300; Altschul, S. F. et al (1990) J Mol Biol 215:403-10. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/).
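By way of illustration only, the percentage identity over a defined region, such as the constriction region (residues 42 to 63 of SEQ ID NO: 3), may be computed from two rows of a pairwise alignment as sketched below. The function name, the gap character and the toy sequences are assumptions made for this example; the toy sequences are not taken from SEQ ID NO: 3, and the sketch is not a substitute for BESTFIT, PILEUP or BLAST.

```python
def percent_identity(ref_aligned: str, var_aligned: str, start: int, end: int) -> float:
    """Percent identity over reference residues start..end (1-based, inclusive).

    Both inputs must be rows of the same pairwise alignment, with '-' for gaps.
    Columns where the reference has a gap (insertions in the variant) are skipped.
    """
    if len(ref_aligned) != len(var_aligned):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = 0      # residue number in the ungapped reference
    compared = 0     # reference residues that fall inside the region
    identical = 0    # of those, how many are matched by the variant
    for r, v in zip(ref_aligned, var_aligned):
        if r == "-":
            continue
        ref_pos += 1
        if start <= ref_pos <= end:
            compared += 1
            if r == v:
                identical += 1
    if compared == 0:
        raise ValueError("region lies outside the reference sequence")
    return 100.0 * identical / compared


# Hypothetical 10-residue toy alignment (not SEQ ID NO: 3).
ref = "ACDEFGHIKL"
var = "ACDQFGH-KL"
region_identity = percent_identity(ref, var, 3, 8)
```

In this toy case four of the six reference residues in positions 3 to 8 are matched, so a threshold such as "at least 40% identical" would be met for that region.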
SEQ ID NO: 3 is the wild-type CsgG pore monomer from Escherichia coli Str. K-12 substr. MC4100. A variant of SEQ ID NO: 3 may comprise any of the substitutions present in another CsgG homologue. Preferred CsgG homologues are shown in SEQ ID NOs: 68 to 88 of WO 2019/002893 (incorporated by reference herein in its entirety). The variant may comprise combinations of one or more of the substitutions present in SEQ ID NOs: 68 to 88 of WO 2019/002893 (incorporated by reference herein in its entirety) compared with SEQ ID NO: 3, including one or more substitutions, one or more conservative mutations, one or more deletions or one or more insertion mutations, such as deletion or insertion of 1 to 10 amino acids, such as of 2 to 8 or 3 to 6 amino acids.
The CsgG pore monomer in the pore monomer conjugate of the invention typically retains the ability to form the same 3D structure as the wild-type CsgG pore monomer, such as the same 3D structure as a CsgG pore monomer having the sequence of SEQ ID NO: 3. The 3D structure of CsgG is known in the art and is disclosed, for example, in Goyal et al (2014) Nature 516(7530):250-3. Any number of mutations may be made in the wild-type CsgG sequence in addition to the mutations described herein provided that the CsgG pore monomer retains the improved properties imparted on it by the mutations of the present invention.
Typically, the CsgG pore monomer will retain the ability to form a structure comprising five alpha-helices and five beta-strands. Therefore, it is envisaged that further mutations may be made in any of these regions in any CsgG pore monomer without affecting the ability of the monomer to form a pore that can translocate polynucleotides. It is also expected that deletions of one or more amino acids can be made in any of the loop regions linking the alpha helices and beta-strands and/or in the N-terminal and/or C-terminal regions of the CsgG pore monomer without affecting the ability of the monomer to form a pore that can translocate polynucleotides.
Amino acid substitutions may be made to the amino acid sequence of SEQ ID NO: 3 in addition to those discussed above, for example up to 1, 2, 3, 4, 5, 10, 20 or 30 substitutions. Conservative substitutions replace amino acids with other amino acids of similar chemical structure, similar chemical properties, or similar side-chain volume. The amino acids introduced may have similar polarity, hydrophilicity, hydrophobicity, basicity, acidity, neutrality or charge to the amino acids they replace. Alternatively, the conservative substitution may introduce another amino acid that is aromatic or aliphatic in the place of a pre-existing aromatic or aliphatic amino acid. Conservative amino acid changes are well-known in the art.
The CsgG pore monomer may be modified to introduce one or more cysteines, one or more hydrophobic amino acids, one or more charged amino acids, one or more non-native amino acids, one or more polar amino acids, or one or more photoreactive amino acids. Any number and combination of such introductions may be made. The introduction is preferably by substitution or addition.
One or more amino acid residues of the amino acid sequence of SEQ ID NO: 3 may additionally be deleted from the polypeptides described above. Up to 1, 2, 3, 4, 5, 10, 20 or 30 or more residues may be deleted. Variants may include fragments of SEQ ID NO: 3. Such fragments retain pore forming activity. Fragments may be at least 50, at least 100, at least 150, at least 200 or at least 250 amino acids in length. Such fragments may be used to produce the pores. A fragment preferably comprises the transmembrane beta barrel region of SEQ ID NO: 3, namely residues 132 to 155 and 181 to 211, or a variant thereof as discussed above.
One or more amino acids may be alternatively or additionally added to the polypeptides described above. An extension may be provided at the amino terminal or carboxy terminal of the amino acid sequence of SEQ ID NO: 3 or polypeptide variant or fragment thereof. The extension may be quite short, for example from 1 to 10 amino acids in length. Alternatively, the extension may be longer, for example up to 50 or 100 amino acids. A carrier protein may be fused to an amino acid sequence according to the invention. Other fusion proteins are discussed in more detail below.
A variant of SEQ ID NO: 3 is a polypeptide that has an amino acid sequence which varies from that of SEQ ID NO: 3 and which retains its ability to form a pore. A variant typically contains the regions of SEQ ID NO: 3 that are responsible for pore formation. The pore forming ability of CsgG, which contains a β-barrel, is provided by β-strands in the transmembrane beta barrel region of each monomer. A variant of SEQ ID NO: 3 typically comprises the region in SEQ ID NO: 3 that forms β-strands, namely residues 132 to 155 and 181 to 211, or a variant thereof as discussed above. One or more modifications can be made to the region of SEQ ID NO: 3 that forms β-strands as long as the resulting variant retains its ability to form a pore.
The one or more modifications in the CsgG pore monomer preferably improve the ability of a pore complex comprising the pore monomer to characterise an analyte. For example, modifications/mutations/substitutions are contemplated to alter the number, size, shape, placement or orientation of the constriction within a channel from the pore monomer conjugate of the invention. The CsgG pore monomer or the variant of SEQ ID NO: 3 may have any of the particular modifications or substitutions disclosed in WO 2016/034591, WO 2017/149316, WO 2017/149317, WO 2017/149318, WO 2018/211241, and WO 2019/002893 (all incorporated by reference herein in their entirety).
Preferred modifications or substitutions in SEQ ID NO: 3 include, but are not limited to, one or more of, such as 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more or all of:
(a) a substitution at position Y51, such as Y51I, Y51L, Y51A, Y51V, Y51T, Y51S, Y51Q or Y51N;
(b) a substitution at position N55, such as N55I, N55L, N55A, N55V, N55T, N55S or N55Q;
(c) a substitution at position F56, such as F56I, F56L, F56A, F56V, F56T, F56S, F56Q or F56N;
(d) a substitution at position L90, such as L90N, L90D, L90E, L90R or L90K;
(e) a substitution at position N91, such as N91D, N91E, N91R or N91K;
(f) a substitution at position K94, such as K94R, K94F, K94Y, K94Q, K94W, K94L, K94S or K94N;
(g) a substitution at position R192, such as R192Q, R192F, R192S, R192D or R192T; and
(i) a substitution at position C215, such as C215T, C215S, C215I, C215L, C215A, C215V, or C215G.
The variant of SEQ ID NO: 3 may further comprise a deletion of one or more positions, such as a deletion of T104-N109, a deletion of F193-L199 or a deletion of F195-L199.
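The substitution and deletion notation used above (for example Y51A, or a deletion of F193-L199) can be read as: wild-type residue, 1-based position, replacement residue. The following is a minimal sketch of how such notation might be applied to a sequence string. The function names and the five-residue toy sequence are hypothetical and are not part of the specification; the toy sequence is not SEQ ID NO: 3.

```python
import re

# Pattern for a point substitution such as "Y51A".
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Apply substitutions, checking each stated wild-type residue first."""
    residues = list(sequence)
    for sub in substitutions:
        match = _SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"not a substitution: {sub!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"position {position} is outside the sequence")
        if sequence[position - 1] != wild_type:
            raise ValueError(f"{sub}: residue {position} is {sequence[position - 1]}")
        residues[position - 1] = new
    return "".join(residues)


def delete_range(sequence: str, first: int, last: int) -> str:
    """Delete residues first..last (1-based, inclusive)."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("deletion range is outside the sequence")
    return sequence[: first - 1] + sequence[last:]


# Hypothetical toy sequence (not SEQ ID NO: 3).
toy = "MKYNF"
substituted = apply_substitutions(toy, ["Y3A", "F5L"])
shortened = delete_range(toy, 2, 3)
```

Because a deletion shifts the numbering of all downstream residues, a sketch of this kind would apply substitutions against the reference numbering before any deletion is made.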
Any number of the CsgG pore monomers in the pore or pore complex of the invention, such as 6, 7, 8, 9 or 10, may be a variant of SEQ ID NO: 3. All six to ten monomers in the pore or pore complex are preferably variants of SEQ ID NO: 3. The variants in the pore complex may be the same or different. The variants are preferably identical in each pore monomer conjugate in the pore complex of the invention.
CsgF peptide
The term "CsgF peptide" preferably defines a CsgF peptide that has been truncated from its C-terminal end (i.e., is an N-terminal fragment). The CsgF peptide may be a fragment of wild-type E. coli CsgF (SEQ ID NO: 5 or SEQ ID NO: 6), or of a wild-type homologue of E. coli CsgF, such as for example, a peptide comprising any one of the amino acid sequences shown in WO 2019/002893 (incorporated by reference herein in its entirety). A CsgF homologue is referred to as a polypeptide that has at least 40%, 50%, 60%, 70%, 80%, 90%, 95% or 99% complete sequence identity to wild-type E. coli CsgF as shown in SEQ ID NO: 6. A CsgF homologue may also be referred to as a polypeptide that contains the PFAM domain PF10614, which is characteristic for CsgF-like proteins. A list of presently known CsgF homologues and CsgF architectures can be found at
[address reproduced as an image in the published application: Figure imgf000022_0001]
Mature CsgF (shown in SEQ ID NO: 6) can be divided into three main regions: a "CsgF constriction peptide" (FCP), a "neck" region and a "head" region. The "head" region of the CsgF peptide is distinct from a constriction of a pore as described herein. The "head" region of the CsgF peptide may also be referred to as the "C-terminal head domain". The structure of CsgF is discussed in detail in WO 2019/002893 (incorporated by reference herein in its entirety). The CsgF peptide used in the pore monomer conjugate of the invention is preferably a truncated CsgF peptide lacking the C-terminal head; lacking the C-terminal head and a part of the neck domain of CsgF (e.g., the truncated CsgF peptide may comprise only a portion of the neck domain of CsgF); or lacking the C-terminal head and neck domains of CsgF. The CsgF peptide may lack part of the CsgF neck domain, e.g. the CsgF peptide may comprise a portion of the neck domain, such as for example, from amino acid residue 36 at the N-terminal end of the neck domain (see SEQ ID NO: 6) (e.g. residues 36-40, 36-41, 36-42, 36-43, 36-45, 36-46 up to residues 36-50 or 36-60 of SEQ ID NO: 6). The CsgF peptide preferably comprises a CsgG-binding region and a region that forms a constriction in the pore. The CsgG-binding region typically comprises residues 1 to 11 and/or 29 to 32 of the CsgF protein (SEQ ID NO: 6 or a homologue from another species) and may include one or more modifications. The region that forms a constriction in the pore typically comprises residues 9 to 28 of the CsgF protein (SEQ ID NO: 6 or a homologue from another species) and may include one or more modifications. Residues 9 to 17 comprise the conserved motif N9PXFGGXXX17 and form a turn region. Residues 9 to 28 form an alpha-helix. X17 (N17 in SEQ ID NO: 6) forms the apex of the constriction region, corresponding to the narrowest part of the CsgF constriction in the pore.
The CsgF constriction region also makes stabilising contacts with the CsgG beta-barrel, primarily at residues 98, 9, 10, 11, 12, 18, 21, 22, 29 and 30 of SEQ ID NO: 6.
The CsgF peptide typically has a length of from 28 to 50 amino acids, such as 29 to 49, 30 to 45 or 32 to 40 amino acids. Preferably the CsgF peptide comprises from 29 to 35 amino acids, or 29 to 45 amino acids. The CsgF peptide comprises all or part of the FCP, which corresponds to residues 1 to 35 of SEQ ID NO: 6. Where the CsgF peptide is shorter than the FCP, the truncation is preferably made at the C-terminal end.
The CsgF peptide may have a length of 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54 or 55 amino acids.
The CsgF peptide may comprise the amino acid sequence of SEQ ID NO: 6 from residue 1 up to any one of residues 25 to 60, such as 27 to 50, for example, 28 to 45 of SEQ ID NO: 6, or the corresponding residues from a homologue of SEQ ID NO: 6, or variant of either thereof. More specifically, the CsgF peptide may comprise residues 1 to 29 of SEQ ID NO: 6, or a homologue or variant thereof.
The CsgF peptide is preferably a truncated CsgF peptide lacking one or more amino acids from CsgF shown in SEQ ID NO: 6. The CsgF peptide is preferably a truncated CsgF peptide lacking a stretch of amino acids starting at any one of positions 15-35 and finishing at position 119 of SEQ ID NO: 6. The CsgF peptide is preferably a truncated CsgF peptide lacking amino acids 15-119, 16-119, 17-119, 18-119, 19-119, 20-119, 21-119, 22-119, 23-119, 24-119, 25-119, 26-119, 27-119, 28-119, 29-119, 30-119, 31-119, 32-119, 33-119, 34-119, or 35-119 from SEQ ID NO: 6.
Examples of such CsgF peptides comprise, consist essentially of, or consist of residues 1 to 34 of SEQ ID NO: 6, residues 1 to 30 of SEQ ID NO: 6, residues 1 to 45 of SEQ ID NO: 6, or residues 1 to 35 of SEQ ID NO: 6 and homologues or variants of any thereof.
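A C-terminal truncation "lacking amino acids k-119" leaves the N-terminal fragment consisting of residues 1 to k-1. The sketch below illustrates this arithmetic only. It assumes, from the numbering used above, that the mature sequence runs to position 119, and it uses a placeholder string rather than the actual sequence of SEQ ID NO: 6; the function name is hypothetical.

```python
def truncate_from(mature: str, first_removed: int) -> str:
    """Return the N-terminal fragment left after removing residues
    first_removed..end (1-based), i.e. residues 1..first_removed-1."""
    if not 2 <= first_removed <= len(mature):
        raise ValueError("first removed position must lie within the sequence")
    return mature[: first_removed - 1]


# Placeholder standing in for a 119-residue mature sequence (assumption;
# this is not the sequence of SEQ ID NO: 6).
placeholder = "X" * 119

# Truncations lacking residues 15-119, 16-119, ..., 35-119.
fragments = {start: truncate_from(placeholder, start) for start in range(15, 36)}
lengths = sorted(len(fragment) for fragment in fragments.values())
```

Under these assumptions the 21 listed truncations correspond to N-terminal fragments of 14 to 34 residues.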
In the CsgF peptide, one or more residues may be modified. For example, the CsgF peptide may comprise a modification at a position corresponding to one or more of the following positions in SEQ ID NO: 6: G1, T4, F5, R8, N9, N11, F12, N17, A20, N24, A26, Q27 and Q29.
The CsgF peptide may be modified to introduce one or more cysteines, one or more hydrophobic amino acids, one or more charged amino acids, one or more non-native amino acids, one or more polar amino acids, or one or more photoreactive amino acids, for example at a position corresponding to one or more of the following positions in SEQ ID NO: 6: G1, T4, F5, R8, N9, N11, F12, A26 and Q29. Any number and combination of such introductions may be made. The introduction is preferably by substitution or addition.
For example, the CsgF peptide may comprise a modification at a position corresponding to one or more of the following positions in SEQ ID NO: 6: N15, N17, A20, N24 and A28. The CsgF peptide may comprise a modification at a position corresponding to D34 to stabilise the CsgG-CsgF complex. The CsgF peptide may comprise one or more of the substitutions: N15S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E, N17S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E, A20S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E, N24S/T/Q/A/G/L/V/I/F/Y/W/R/K/D/C/E, A28S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E and D34F/Y/W/R/K/N/Q/C/E. The CsgF peptide may, for example, comprise one or more of the following substitutions: G1C, T4C, N17S, and D34Y or D34N.
The CsgF peptide may be produced by cleavage of a longer protein, such as full-length CsgF using an enzyme. Cleavage at a particular site may be directed by modifying the longer protein, such as full-length CsgF, to include an enzyme cleavage site at an appropriate position. Examples of CsgF amino acid sequences that have been modified to include such enzyme cleavage sites are shown in SEQ ID NOs: 56 to 67 of WO 2019/002893 (incorporated by reference herein in its entirety). Following cleavage all or part of the added enzyme cleavage site may be present in the CsgF peptide that associates with CsgG to form a pore. Thus, the CsgF peptide may further comprise all or part of an enzyme cleavage site at its C-terminal end.
Some examples of suitable CsgF peptides are shown in Table 3 of WO 2019/002893 (incorporated by reference herein in its entirety). The CsgF peptide is preferably a variant of any of the CsgF sequences discussed above, including SEQ ID NO: 6, comprising one or more modifications compared with the comparative sequence. Over the entire length of the amino acid sequence of SEQ ID NO: 6, a variant will preferably be at least 40% homologous to that sequence based on amino acid identity. More preferably, the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% homologous based on amino acid identity to the amino acid sequence of SEQ ID NO: 6 over the entire sequence. Over the entire length of the amino acid sequence of SEQ ID NO: 6, a variant will preferably be at least 40% identical to that sequence. More preferably, the variant may be at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90% and more preferably at least 95%, 97% or 99% identical to SEQ ID NO: 6 over the entire sequence. There may be at least 80%, for example at least 85%, 90% or 95%, amino acid identity over a stretch of 100 or more, for example 125, 150, 175 or 200 or more, contiguous amino acids ("hard homology"). These levels of homology/identity equally apply to any of the other CsgF peptides described above.
Any number of the CsgF peptides in the pore or pore complex of the invention, such as 6, 7, 8, 9 or 10, may contain one or more substitutions compared with SEQ ID NO: 6. All six to ten monomers in the pore or pore complex preferably contain one or more substitutions compared with SEQ ID NO: 6. The CsgF peptides in the pore complex may be the same or different. The CsgF peptides are preferably identical in each pore monomer conjugate in the pore complex of the invention.
Stabilisation and other mutations
In the pore complex of the invention, the interaction between the CsgF peptide and the CsgG pore may, for example, be stabilised by hydrophobic interactions and/or electrostatic interactions. These may be interactions between one or more of the following pairs of positions of SEQ ID NO: 6 and SEQ ID NO: 3, respectively: 1 and 153, 4 and 133, 5 and 136, 8 and 187, 8 and 203, 9 and 203, 11 and 142, 11 and 201, 12 and 149, 12 and 203, 26 and 191, and 29 and 144.
The residues in the CsgF peptide and/or the CsgG pore monomer at one or more of the positions listed above may be modified in order to enhance the interaction between CsgG and CsgF in the pore complex. Although the CsgG:CsgF complex is very stable, when CsgF is truncated, the stability of the CsgG:CsgF complex decreases compared to a complex comprising full length CsgF. Therefore, disulfide bonds can be made between CsgG and CsgF to make the complex more stable, for example following introduction of cysteine residues at the positions identified herein. The pore complex can be made by any of the previously mentioned methods and disulfide bond formation can be induced by using oxidising agents (e.g., copper-orthophenanthroline). Other interactions (e.g., hydrophobic interactions, charge-charge interactions/electrostatic interactions) can also be used in those positions instead of cysteine interactions.
Unnatural amino acids can also be incorporated in those positions. Covalent bonds may be formed via click chemistry. For example, unnatural amino acids with azide or alkyne or with a dibenzocyclooctyne (DBCO) group and/or a bicyclo[6.1.0]nonyne (BCN) group may be introduced at one or more of these positions.
Such stabilising mutations can be combined with any other modifications to CsgG and/or CsgF, for example the modifications disclosed herein.
To facilitate such interactions, one or more non-native or photoreactive amino acids may be included/substituted in the CsgG pore monomer at one or more positions corresponding to one or more of positions 132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155, 183, 185, 187, 189, 191, 201, 203, 205, 207 and 209 of SEQ ID NO: 3.
To facilitate such interactions, one or more non-native reactive or photoreactive amino acids may be included/substituted at one or more positions corresponding to one or more of positions 1, 4, 5, 8, 9, 11, 12, 26 or 29 of SEQ ID NO: 6.
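The position pairs and candidate positions listed above lend themselves to a simple lookup structure. The sketch below records the CsgF/CsgG pairs as given in the text and derives, for each CsgF position, its CsgG partner positions. The variable names are hypothetical, and the code only restates the lists above; it does not predict which pairs are suitable for any particular modification.

```python
from collections import defaultdict

# (CsgF position in SEQ ID NO: 6, CsgG position in SEQ ID NO: 3), as listed above.
INTERACTION_PAIRS = [
    (1, 153), (4, 133), (5, 136), (8, 187), (8, 203), (9, 203),
    (11, 142), (11, 201), (12, 149), (12, 203), (26, 191), (29, 144),
]

# Candidate CsgG positions for non-native or photoreactive amino acids, as listed above.
CSGG_CANDIDATES = {
    132, 133, 136, 138, 140, 142, 144, 145, 147, 149, 151, 153, 155,
    183, 185, 187, 189, 191, 201, 203, 205, 207, 209,
}

# Map each CsgF position to the CsgG positions it is paired with.
partners = defaultdict(list)
for csgf_pos, csgg_pos in INTERACTION_PAIRS:
    partners[csgf_pos].append(csgg_pos)

csgf_positions = sorted(partners)
csgg_positions = sorted({g for _, g in INTERACTION_PAIRS})
```

Read this way, the nine CsgF positions in the pairs are the same nine positions named for CsgF above, and every paired CsgG position falls within the listed CsgG candidate positions.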
Preferred exemplary CsgF peptides comprise the following mutations relative to SEQ ID NO: 6: N15X1/N17X2/A20X3/N24X4/A28X5/D34X6, wherein X1 is N/S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E, X2 is N/S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E, X3 is A/S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E, X4 is N/S/T/Q/A/G/L/V/I/F/Y/W/R/K/D/C/E, X5 is A/S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E and X6 is D/F/Y/W/R/K/N/Q/C/E. The mutations at positions N15, N17, A20, N24 and A28 are constriction mutations and the mutation at position 34 affects the interaction of CsgF with the bottom of the CsgG pore monomer to stabilise the interaction.
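To give a sense of the size of the sequence space defined by this formula, the number of distinct combinations can be counted as the product of the number of options at each position. The sketch below is an arithmetic illustration only; the dictionary name is hypothetical, and the count includes the combination in which every position keeps its wild-type residue, since each option list as written begins with the wild-type residue.

```python
from math import prod

# Options at each position, transcribed from the formula above.
ALLOWED = {
    "N15": "N/S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E",
    "N17": "N/S/A/T/Q/G/L/V/I/F/Y/W/R/K/D/C/E",
    "A20": "A/S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E",
    "N24": "N/S/T/Q/A/G/L/V/I/F/Y/W/R/K/D/C/E",
    "A28": "A/S/T/Q/N/G/L/V/I/F/Y/W/R/K/D/C/E",
    "D34": "D/F/Y/W/R/K/N/Q/C/E",
}

options = {position: spec.split("/") for position, spec in ALLOWED.items()}

# Number of options per position and total number of combinations.
option_counts = {position: len(residues) for position, residues in options.items()}
total_combinations = prod(option_counts.values())
```

With 17 options at each of the five constriction positions and 10 options at position 34, the product is 17^5 x 10 = 14,198,570 combinations.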
Constructs
The invention also provides a construct comprising two or more covalently attached pore monomer conjugates of the invention. The construct may comprise 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more or 10 or more pore monomer conjugates of the invention. The construct may comprise at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9 or at least 10 pore monomer conjugates of the invention. The two or more pore monomer conjugates may be the same or different. The two or more pore monomer conjugates may differ based on one or more of (a) the sequence of the CsgG pore monomer, (b) the sequence of the CsgF peptide, (c) the linker, (d) the attachment position on the CsgG pore monomer, and (e) the attachment position on the CsgF peptide. The pore monomer conjugates may differ based on (a); (b); (c); (d); (e); (a) and (b); (a) and (c); (a) and (d); (a) and (e); (b) and (c); (b) and (d); (b) and (e); (c) and (d); (c) and (e); (d) and (e); (a), (b) and (c); (a), (b) and (d); (a), (b) and (e); (a), (c) and (d); (a), (c) and (e); (a), (d) and (e); (b), (c) and (d); (b), (c) and (e); (b), (d) and (e); (c), (d) and (e); (a), (b), (c) and (d); (a), (b), (c) and (e); (a), (b), (d) and (e); (a), (c), (d) and (e); (b), (c), (d) and (e); and (a), (b), (c), (d) and (e). The two or more pore monomer conjugates are preferably the same (i.e., identical).
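The combinations recited above are all of the non-empty subsets of the five features (a) to (e). As a brief illustration, they can be enumerated as follows; the short feature descriptions are paraphrases introduced for the example.

```python
from itertools import combinations

# Features by which two conjugates in a construct may differ (paraphrased).
FEATURES = {
    "a": "CsgG pore monomer sequence",
    "b": "CsgF peptide sequence",
    "c": "linker",
    "d": "attachment position on the CsgG pore monomer",
    "e": "attachment position on the CsgF peptide",
}

# Every non-empty subset of the five features, smallest subsets first.
subsets = [
    combo
    for size in range(1, len(FEATURES) + 1)
    for combo in combinations(sorted(FEATURES), size)
]
```

This yields 2^5 - 1 = 31 subsets, matching the 31 combinations listed in the text.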
The construct preferably comprises two pore monomer conjugates. The two or more pore monomer conjugates may be the same or different. The two or more pore monomer conjugates are preferably the same (i.e., identical).
The pore monomer conjugates may be genetically fused, optionally via a linker, or chemically fused, for instance via a chemical crosslinker. Methods for covalently attaching monomers are disclosed in WO 2017/149316, WO 2017/149317, and WO 2017/149318 (incorporated herein by reference in their entirety).
The linker is preferably an amino acid sequence and/or a chemical crosslinker. Suitable amino acid linkers, such as peptide linkers, are known in the art. The length, flexibility and hydrophilicity of the amino acid or peptide linker are typically designed such that the CsgF peptide forms a constriction in the pore complex of the invention. Preferred flexible peptide linkers are stretches of 2 to 20, such as 4, 6, 8, 10 or 16, serine and/or glycine amino acids. More preferred flexible linkers include (SG)1, (SG)2, (SG)3, (SG)4, (SG)5, (SG)8, (SG)10, (SG)15 or (SG)20 wherein S is serine and G is glycine. Preferred rigid linkers are stretches of 2 to 30, such as 4, 6, 8, 16 or 24, proline amino acids. More preferred rigid linkers include (P)12 wherein P is proline.
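The repeat notation used for the peptide linkers expands to plain one-letter sequences. A minimal sketch, with hypothetical helper names, is given below for the serine-glycine repeats and the twelve-proline linker mentioned above.

```python
def sg_linker(repeats: int) -> str:
    """Flexible linker made of `repeats` serine-glycine (SG) units."""
    if repeats < 1:
        raise ValueError("at least one repeat is required")
    return "SG" * repeats


def proline_linker(length: int) -> str:
    """Rigid linker made of `length` consecutive proline residues."""
    if length < 1:
        raise ValueError("at least one residue is required")
    return "P" * length


# Repeat counts named in the text for the flexible linkers.
flexible = {n: sg_linker(n) for n in (1, 2, 3, 4, 5, 8, 10, 15, 20)}
rigid = proline_linker(12)
```

For example, a linker of three SG units expands to SGSGSG, and the rigid example expands to twelve consecutive prolines.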
Suitable chemical crosslinkers are well-known in the art. Suitable chemical crosslinkers include, but are not limited to, those including the following functional groups: maleimide, active esters, succinimide, azide, alkyne (such as dibenzocyclooctynol (DIBO or DBCO), difluoro cycloalkynes and linear alkynes), phosphine (such as those used in traceless and non-traceless Staudinger ligations), haloacetyl (such as iodoacetamide), phosgene type reagents, sulfonyl chloride reagents, isothiocyanates, acyl halides, hydrazines, disulfides, vinyl sulfones, aziridines and photoreactive reagents (such as aryl azides, diaziridines).
Reactions between amino acids and functional groups may be spontaneous, such as cysteine/maleimide, or may require external reagents, such as Cu(I) for linking azide and linear alkynes.
Linkers can comprise any molecule that stretches across the distance required. Linkers can vary in length from one carbon (phosgene-type linkers) to many Angstroms. Examples of linker molecules include, but are not limited to, polyethyleneglycols (PEGs), polypeptides, polysaccharides, deoxyribonucleic acid (DNA), peptide nucleic acid (PNA), threose nucleic acid (TNA), glycerol nucleic acid (GNA), saturated and unsaturated hydrocarbons, and polyamides. These linkers may be inert or reactive, in particular they may be chemically cleavable at a defined position, or may be themselves modified with a fluorophore or ligand. The linker is preferably resistant to reducing agents, such as dithiothreitol (DTT), following the covalent attachment of the CsgF peptide to the CsgG pore monomer.
Preferred crosslinkers include 2,5-dioxopyrrolidin-1-yl 3-(pyridin-2-yldisulfanyl)propanoate, 2,5-dioxopyrrolidin-1-yl 4-(pyridin-2-yldisulfanyl)butanoate and 2,5-dioxopyrrolidin-1-yl 8-(pyridin-2-yldisulfanyl)octananoate, di-maleimide PEG 1k, di-maleimide PEG 3.4k, di-maleimide PEG 5k, di-maleimide PEG 10k, bis(maleimido)ethane (BMOE), bis-maleimidohexane (BMH), 1,4-bis-maleimidobutane (BMB), 1,4 bis-maleimidyl-2,3-dihydroxybutane (BMDB), BM[PEO]2 (1,8-bis-maleimidodiethyleneglycol), BM[PEO]3 (1,11-bis-maleimidotriethylene glycol), tris[2-maleimidoethyl]amine (TMEA), DTME dithiobismaleimidoethane, bis-maleimide PEG3, bis-maleimide PEG11, DBCO-maleimide, DBCO-PEG4-maleimide, DBCO-PEG4-NH2, DBCO-PEG4-NHS, DBCO-NHS, DBCO-PEG-DBCO 2.8kDa, DBCO-PEG-DBCO 4.0kDa, DBCO-15 atoms-DBCO, DBCO-26 atoms-DBCO, DBCO-35 atoms-DBCO, DBCO-PEG4-S-S-PEG3-biotin, DBCO-S-S-PEG3-biotin, DBCO-S-S-PEG11-biotin, (succinimidyl 3-(2-pyridyldithio)propionate (SPDP) and maleimide-PEG(2kDa)-maleimide (ALPHA, OMEGA-BIS-MALEIMIDO POLYETHYLENE GLYCOL)). The most preferred crosslinker is maleimide-propyl-SRDFWRS-(1,2-diaminoethane)-propyl-maleimide.
The linker is preferably resistant to dithiothreitol (DTT). Suitable linkers include, but are not limited to, iodoacetamide-based and maleimide-based linkers.
The pore monomer conjugates may be connected using two or more linkers each comprising a hybridizable region and a group capable of forming a covalent bond. The hybridizable regions in the linkers hybridize and link the CsgG pore monomer and CsgF peptide. The linked CsgG pore monomer and CsgF peptide are then coupled via the formation of covalent bonds between the groups. Any of the specific linkers disclosed in WO 2010/086602 (incorporated herein by reference in its entirety) may be used in accordance with the invention.
The linkers may be labelled. Suitable labels include, but are not limited to, fluorescent molecules (such as Cy3 or AlexaFluor®555), radioisotopes, e.g. 125I, 35S, 32P, enzymes, antibodies, antigens, polynucleotides and ligands such as biotin. Such labels allow the amount of linker to be quantified. The label could also be a cleavable purification tag, such as biotin, or a specific sequence to show up in an identification method, such as a peptide that is not present in the protein itself, but that is released by trypsin digestion. A preferred method of connecting the pore monomer conjugates is via cysteine linkage. This can be mediated by a bi-functional chemical crosslinker or by an amino acid linker with a terminal presented cysteine residue.
Another preferred method of attachment is via 4-azidophenylalanine or Faz linkage. This can be mediated by a bi-functional chemical linker or by a polypeptide linker with a terminal presented 4-azidophenylalanine or Faz residue. Additional suitable linkers are discussed in more detail below.
Pore complexes of the invention
The terms "pore complex" and "complex pore", as used interchangeably herein, refer to an oligomeric pore complex comprising at least one pore monomer conjugate of the invention (including, e.g., one or more pore monomer conjugates such as two or more pore monomer conjugates, three or more pore monomer conjugates etc.). The pore complex of the invention has the features of a biological pore, i.e., it has a typical protein structure and defines a channel. When the pore complex is provided in an environment having membrane components, membranes, cells, or an insulating layer, the pore complex will insert in the membrane or the insulating layer and form a "transmembrane pore complex".
The CsgG part of the pore complex of the invention (i.e., the part formed from the at least one CsgG pore monomer in the at least one conjugate of the invention) preferably has or comprises any of the structures and/or dimensions of the CsgG pores discussed above. The CsgG constriction in the pore complex of the invention preferably has or comprises any of the constriction diameters described above.
The at least one CsgF peptide (in the at least one pore monomer conjugate or construct) preferably forms a constriction in the pore complex. The at least one CsgF peptide is preferably inserted into the lumen of the pore complex. The invention relates to CsgG pores complexed with a CsgF peptide that introduces an additional channel constriction in the pore complex and surprisingly results in an increased current range and increased signal-to-noise ratio (SNR). The additional constriction introduced by complex formation with the CsgF peptides expands the contact surface with passing analytes and can act as a second constriction for analyte detection and characterization. Pores comprising the pore monomer conjugates of the invention can improve the characterisation of analytes, such as polynucleotides, providing a more discriminating direct relationship between the observed current as the polynucleotide moves through the pore. In particular, by having two stacked constrictions spaced at a defined distance, the pore complex may facilitate characterization of polynucleotides that contain at least one homopolymeric stretch, e.g., several consecutive copies of the same nucleotide that otherwise exceed the interaction length of the single CsgG constriction. Additionally, by having two stacked constrictions at a defined distance, small molecule analytes including organic or inorganic drugs and pollutants passing through the pore complex will consecutively pass the two constrictions. The chemical nature of either constriction can be independently modified, each giving unique interaction properties with the analyte, thus providing additional discriminating power during analyte detection.
The CsgF constriction formed in the pore complex preferably has a diameter in the range of from about 5 to about 20 Å, such as from about 7 to about 18 Å, from about 10 Å to about 15 Å or from about 11 to about 12 Å. The additional CsgF peptide constriction may be about 10 nm or less, such as about 5 nm or less, such as about 1, 2, 3, 4, 5, 6, 7, 8, 9 nm from the constriction of the CsgG pore. Distances between the CsgF peptide and CsgG pore monomer are also discussed above with reference to the pore monomer conjugates of the invention.
The pore complex or transmembrane pore complex of the invention includes a pore complex with two constrictions, i.e., two channel constrictions positioned in such a way that one constriction does not interfere with the accuracy of the other constriction. Said pore complexes may include any of the mutations, CsgG pore monomers or CsgF peptides described in WO 2016/034591, WO 2017/149316, WO 2017/149317, WO 2017/149318, WO 2018/211241, and WO 2019/002893 (herein all incorporated by reference in their entirety). The pore complex or transmembrane pore complex of the invention includes a pore complex with one constriction. For instance, the constriction may be removed from the CsgG pore monomer in the conjugate of the invention such that the pore complex of the invention only contains one constriction provided by the CsgF peptide. The invention provides a pore complex comprising at least one pore monomer conjugate of the invention. The pore complex typically comprises at least 6, 7, 8, 9 or 10 pore monomer conjugates of the invention. The pore complex preferably comprises 8 or 9 pore monomer conjugates of the invention. The pore monomer conjugates are typically the same (i.e., identical).
The pore complex is preferably a homooligomer comprising 6 to 10, such as 6, 7, 8, 9 or 10, pore monomer conjugates of the invention. The pore monomer conjugates are typically identical. The pore complex preferably comprises 8 or 9 identical pore monomer conjugates of the invention. The pore monomer conjugates may be any of those discussed above.
The invention provides a pore complex comprising at least one construct of the invention. The pore complex typically comprises at least 1, 2, 3, 4 or 5 constructs of the invention. The pore complex comprises sufficient CsgG pore monomers to form a pore. For instance, an octameric pore may comprise (a) four constructs each comprising two pore monomer conjugates, (b) two constructs each comprising four pore monomer conjugates, (c) one construct comprising two pore monomer conjugates and six pore monomer conjugates that do not form part of a construct, (d) three constructs each comprising two pore monomer conjugates and two pore monomer conjugates that do not form part of a construct, and (e) combinations thereof. The same and additional possibilities exist for a nonameric pore, for instance. Other combinations of constructs and monomers can be envisaged by the skilled person. One or more constructs of the invention may be used to form a pore complex for characterising, such as sequencing, polynucleotides. The pore complex preferably comprises 4 constructs of the invention each of which comprises two pore monomer conjugates. The constructs are typically the same (i.e., identical).
The pore complex is preferably a homooligomer comprising 1-5, such as 1, 2, 3, 4 or 5, constructs of the invention. The constructs are typically the same (i.e., identical). The pore complex preferably comprises 4 identical constructs of the invention each of which comprises two pore monomer conjugates. The constructs may be any of those discussed above.
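By way of non-limiting illustration only, the combinations of constructs and free pore monomer conjugates that make up a pore of a given size can be enumerated programmatically. The following Python sketch assumes, purely for simplicity, that constructs contain either two or four pore monomer conjugates; the function name `pore_compositions` and its parameters are hypothetical and are not part of the disclosure.

```python
# Illustrative sketch only; the function name and the restriction to
# constructs of size 2 or 4 are assumptions made for this example.
from itertools import product


def pore_compositions(total_monomers, construct_sizes=(2, 4)):
    """Enumerate pores built from at least one construct plus free conjugates.

    Each result records how many constructs of each size are used and how many
    pore monomer conjugates do not form part of a construct.
    """
    results = []
    max_counts = [total_monomers // size for size in construct_sizes]
    for counts in product(*(range(m + 1) for m in max_counts)):
        used = sum(n * size for n, size in zip(counts, construct_sizes))
        # Skip combinations that exceed the pore size or contain no construct.
        if used > total_monomers or sum(counts) == 0:
            continue
        results.append(
            {
                "constructs": dict(zip(construct_sizes, counts)),
                "free_conjugates": total_monomers - used,
            }
        )
    return results


octamer_options = pore_compositions(8)
```

Under these assumptions, `octamer_options` includes options (a) to (d) listed above for an octameric pore, and `pore_compositions(9)` gives the corresponding possibilities for a nonameric pore.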
The CsgG pore monomers in the CsgG pore are preferably all approximately the same length or are the same length. The barrels of the CsgG pore monomers of the invention in the pore are preferably approximately the same length or are the same length. Length may be measured in number of amino acids and/or units of length.
The pore complex of the invention may be isolated, substantially isolated, purified or substantially purified. A pore complex of the invention is isolated or purified if it is completely free of any other components, such as lipids or other pores. A pore complex is substantially isolated if it is mixed with carriers or diluents which will not interfere with its intended use. For instance, a pore complex is substantially isolated or substantially purified if it is present in a form that comprises less than 10%, less than 5%, less than 2% or less than 1% of other components, such as block copolymers, lipids or other pores.
Alternatively, a pore complex of the invention may be present in a membrane. Suitable membranes are discussed below.
A pore complex of the invention may be present as an individual or single pore complex. Alternatively, a pore complex of the invention may be present in a homologous or heterologous population of two or more pore complexes or pores. Other formats involving the pore complexes of the invention are discussed in more detail below.
Multimeric pore complexes
The invention also provides a pore multimer comprising two or more pores, wherein at least one of the pores is a pore complex of the invention. The multimer may comprise any number of pores, such as 3, 4, 5, 6, 7 or 8 or more pores. Any number of the pores in the multimer, including all of them, may be a pore complex of the invention. The pore multimer may be a double pore complex comprising a first pore complex of the invention and a second pore or complex. The second pore or complex is typically derived from CsgG. The second pore complex may be a complex of the invention. Both the first pore complex and the second pore complex are preferably pore complexes of the invention. In the double pore complex, the first pore complex may be attached to the second pore (complex) by hydrophobic interactions and/or by one or more disulfide bonds. One or more, such as 2, 3, 4, 5, 6, 8, 9, for example all, of the monomers in the first pore complex and/or the second pore (complex) may be modified to enhance such interactions. This may be achieved in any suitable way. Particular methods of forming double pores from CsgG-derived pores are described in WO 2019/002893 (incorporated by reference herein in its entirety).
The pore multimer of the invention may be isolated, substantially isolated, purified or substantially purified. Such terms are defined above with reference to the pore complexes of the invention.
Membrane embodiments
The invention also provides a pore complex of the invention or a pore multimer of the invention which is comprised in a membrane. The invention also provides a membrane comprising a pore complex of the invention or a pore multimer of the invention. These products are directly applicable for use in molecular sensing, such as analyte characterisation and polynucleotide sequencing. Suitable membranes are discussed in more detail below.
Method for making modified proteins
Methods for introducing or substituting non-naturally occurring amino acids in CsgG pore monomers and CsgF peptides are also well known in the art and described in WO 2019/002893 (incorporated by reference herein in its entirety). The proteins may be modified to assist their identification or purification, for example by the addition of a streptavidin tag or by the addition of a signal sequence to promote their secretion from a cell where the monomer does not naturally contain such a sequence. The proteins may also be produced using D-amino acids or a mixture of L-amino acids and D-amino acids. This is conventional in the art for producing such proteins or peptides.
The CsgG pore monomer, the CsgF peptide, the pore monomer conjugate, the construct, the pore complex, or the pore multimer (i.e., any protein of the invention) may be chemically modified. The protein can be chemically modified in any way and at any site. The protein may be chemically modified by attachment of a molecule to one or more cysteines (cysteine linkage), attachment of a molecule to one or more lysines, attachment of a molecule to one or more non-natural amino acids, enzyme modification of an epitope or modification of a terminus. Suitable methods for carrying out such modifications are well-known in the art. The protein may be chemically modified by the attachment of any molecule, such as a dye or a fluorophore.
The protein may be chemically modified with a molecular adaptor that facilitates the interaction between a pore comprising the monomer and a target nucleotide or target polynucleotide sequence. Suitable adaptors, including a cyclic molecule, a cyclodextrin, a species that is capable of hybridization, a DNA binder or intercalator, a peptide or peptide analogue, a synthetic polymer, an aromatic planar molecule, a small positively charged molecule or a small molecule capable of hydrogen-bonding, are described in WO 2019/002893 (incorporated by reference herein in its entirety). The molecular adaptor may be attached using any of the methods and linkers discussed above.
The protein may be attached to a polynucleotide binding protein. This forms a modular sequencing system that may be used in the methods of sequencing of the invention. Polynucleotide binding proteins are discussed below. The protein can be covalently attached to the monomer using any method known in the art. The monomer and protein may be chemically fused or genetically fused. Genetic fusion of a monomer to a polynucleotide binding protein is discussed in WO 2010/004265 (incorporated herein by reference in its entirety). The polynucleotide binding protein may be attached via cysteine linkage using any method described above.
The polynucleotide binding protein may be attached directly to the protein or via one or more linkers. The molecule may be attached to the CsgG pore monomer using the hybridization linkers described in WO 2010/086602 (incorporated herein by reference in its entirety). Alternatively, peptide linkers may be used. Suitable peptide linkers are discussed above.
Any of the proteins may be modified to assist their identification or purification, for example by the addition of histidine residues (a his tag), aspartic acid residues (an asp tag), a streptavidin tag, a flag tag, a SUMO tag, a GST tag or a MBP tag, or by the addition of a signal sequence to promote their secretion from a cell where the polypeptide does not naturally contain such a sequence. An alternative to introducing a genetic tag is to chemically react a tag onto a native or engineered position on the protein. An example of this would be to react a gel-shift reagent to a cysteine engineered on the outside of the protein. This has been demonstrated as a method for separating hemolysin heterooligomers (Chem Biol. 1997 Jul;4(7):497-505).
Any of the proteins may be labelled with a revealing label. The revealing label may be any suitable label which allows the protein to be detected. Suitable labels include, but are not limited to, fluorescent molecules, radioisotopes, e.g., 125I, 35S, enzymes, antibodies, antigens, polynucleotides, and ligands such as biotin. The protein may also contain other non-specific modifications as long as they do not interfere with the function of the protein. A number of non-specific side chain modifications are known in the art and may be made to the side chains of the protein(s). Such modifications include, for example, reductive alkylation of amino acids by reaction with an aldehyde followed by reduction with NaBH4, amidation with methylacetimidate or acylation with acetic anhydride.
Any of the proteins can be produced using standard methods known in the art. Polynucleotide sequences encoding a protein may be derived and replicated using standard methods in the art. Polynucleotide sequences encoding a protein may be expressed in a bacterial host cell using standard techniques in the art. The protein may be produced in a cell by in situ expression of the polypeptide from a recombinant expression vector. The expression vector optionally carries an inducible promoter to control the expression of the polypeptide. These methods are described in Sambrook, J. and Russell, D. (2001). Molecular Cloning: A Laboratory Manual, 3rd Edition. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
Proteins may be produced in large scale following purification by any protein liquid chromatography system from protein producing organisms or after recombinant expression. Typical protein liquid chromatography systems include FPLC, AKTA systems, the Bio-Cad system, the Bio-Rad BioLogic system and the Gilson HPLC system.
Method for producing pore monomer conjugates
The invention provides methods for producing a pore monomer conjugate of the invention. The method comprises attaching, preferably covalently attaching, the CsgF peptide to the CsgG pore monomer at two or more positions.
The method typically comprises modifying the CsgF peptide at two or more positions to include two or more reactive groups capable of attaching to two or more positions in the CsgG pore monomer. The two or more reactive groups may be the same. The two or more reactive groups may be different.
The methods preferably comprise contacting the CsgF peptide and the CsgG pore monomer with two or more linkers. The components may be contacted with the two or more linkers in any order, such as CsgF peptide first and then the CsgG pore monomer, the CsgG pore monomer first and then the CsgF peptide or both components at the same time. One or more linkers may be attached to the CsgF peptide and one or more linkers may be attached to the CsgG pore monomer before the two proteins are attached at two or more positions.
The two or more linkers are preferably attached to the CsgF peptide or the CsgG pore monomer first and then attached to the other component of the conjugate. The method preferably comprises covalently attaching the two or more linkers to the CsgF peptide and then contacting the linkers and CsgF peptide with the CsgG pore monomer under conditions which attach, preferably covalently attach, the CsgF peptide to the CsgG pore monomer at two or more positions. Such conditions are well known to a person skilled in the art and are discussed in the Example. The method is typically carried out in vitro as defined below.
Any of the embodiments discussed above with reference to the pore monomer conjugates of the invention equally applies to these methods.
Method of producing pores
The invention also provides methods for producing a pore complex of the invention or a pore multimer of the invention.
The method may involve expressing the pore complex in a host cell. In particular, the method may comprise expressing at least one pore monomer conjugate of the invention or a construct of the invention and sufficient pore monomers or constructs to form the pore complex or the pore multimer in a host cell and allowing the pore complex or pore multimer to form in the host cell. The sufficient pore monomers or constructs are preferably sufficient pore monomer conjugates of the invention or sufficient constructs of the invention. The numbers of CsgG pore monomers, pore monomer conjugates or constructs needed to form the pore complexes of the invention or pore multimers of the invention are discussed above. Suitable host cells and expression systems are known in the art and are discussed in the Example.
The method may involve forming the pore complex in a non-cellular or in vitro context. In particular, the method may comprise contacting at least one pore monomer conjugate of the invention or a construct of the invention with sufficient pore monomers or constructs in vitro and allowing the formation of the pore complex or pore multimer. The pore monomer conjugate or the construct may be produced separately by in vitro translation and transcription (IVTT) and then incubated with the sufficient pore monomers or constructs. The sufficient pore monomers or constructs are preferably sufficient pore monomer conjugates of the invention or sufficient constructs of the invention. The numbers of CsgG pore monomers, pore monomer conjugates or constructs needed to form the pore complexes of the invention or pore multimers of the invention are discussed above. The method may be conducted in an "in vitro system", which refers to a system comprising at least the necessary components and environment to execute said method, and makes use of biological molecules, organisms, a cell (or part of a cell) outside of their normal naturally occurring environment, permitting a more detailed, more convenient, or more efficient analysis than can be done with whole organisms. An in vitro system may also comprise a suitable buffer composition provided in a test tube, wherein said protein components to form the complex have been added. A person skilled in the art is aware of the options to provide said system.
Some or all of the components of the pore complex or pore multimer may be tagged to facilitate purification. Purification can also be performed when the components are untagged. Methods known in the art (e.g., ion exchange, gel filtration, hydrophobic interaction column chromatography etc.) can be used alone or in different combinations to purify the components of the pore.
The pore complex or pore multimer can be made prior to insertion into a membrane or after insertion of the components into a membrane.
Methods for making the pores and complexes of the invention and ways of tagging them are disclosed in WO 2016/034591, WO 2017/149316, WO 2017/149317, WO 2017/149318, WO 2018/211241, and WO 2019/002893 (all incorporated by reference herein in their entirety).
Methods of characterising an analyte
The invention provides a method of determining the presence, absence or one or more characteristics of a target analyte. The method involves contacting the target analyte with a pore complex of the invention or pore multimer of the invention such that the target analyte moves with respect to, such as into or through, the pore complex or pore multimer and taking one or more measurements as the analyte moves with respect to the pore complex or pore multimer and thereby determining the presence, absence or one or more characteristics of the analyte. The target analyte may also be called the template analyte or the analyte of interest.
The pore complex of the invention or the pore multimer of the invention may be any of those discussed above.
The method is for determining the presence, absence or one or more characteristics of a target analyte. The method may be for determining the presence, absence or one or more characteristics of at least one analyte. The method may concern determining the presence, absence or one or more characteristics of two or more analytes. The method may comprise determining the presence, absence or one or more characteristics of any number of analytes, such as 2, 5, 10, 15, 20, 30, 40, 50, 100 or more analytes. Any number of characteristics of the one or more analytes may be determined, such as 1, 2, 3, 4, 5, 10 or more characteristics.
The binding of a molecule in the channel of the pore complex or pore multimer, or in the vicinity of either opening of the channel will have an effect on the open-channel ion flow through the pore complex or pore multimer, which is the essence of "molecular sensing". In a similar manner to the nucleic acid sequencing application, variation in the open-channel ion flow can be measured using suitable measurement techniques by the change in electrical current (for example, WO 2000/28312 and D. Stoddart et al., Proc. Natl. Acad. Sci., 2010, 106, 7702-7 or WO 2009/077734; all incorporated herein by reference in their entirety). The degree of reduction in ion flow, as measured by the reduction in electrical current, is related to the size of the obstruction within, or in the vicinity of, the pore. Binding of a molecule of interest, also referred to as an "analyte", in or near the pore therefore provides a detectable and measurable event, thereby forming the basis of a "biological sensor". Suitable molecules for nanopore sensing include nucleic acids; proteins; peptides; polysaccharides and small molecules (refers here to a low molecular weight (e.g., < 900Da or < 500Da) organic or inorganic compound) such as pharmaceuticals, toxins, cytokines, and pollutants. Detecting the presence of biological molecules finds application in personalised drug development, medicine, diagnostics, life science research, environmental monitoring and in the security and/or the defence industry.
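By way of non-limiting illustration only, the relationship between an obstruction in the channel and the reduction in measured current can be sketched as follows. The Python functions `fractional_blockade` and `detect_events`, the threshold value and the units are hypothetical choices made for this example and do not represent a measurement protocol of the invention.

```python
# Illustrative sketch only; function names, the 20% threshold and the use of
# picoamperes are assumptions made for this example.


def fractional_blockade(open_current_pa, blocked_current_pa):
    """Return the fraction of the open-channel current lost to a blockade."""
    if open_current_pa <= 0:
        raise ValueError("open-channel current must be positive")
    return (open_current_pa - blocked_current_pa) / open_current_pa


def detect_events(trace_pa, open_current_pa, min_blockade=0.2):
    """Return (start, end) sample index pairs where the current is reduced.

    A sample belongs to an event when its fractional blockade is at least
    `min_blockade`; `end` is exclusive.
    """
    events = []
    start = None
    for index, current in enumerate(trace_pa):
        blocked = fractional_blockade(open_current_pa, current) >= min_blockade
        if blocked and start is None:
            start = index  # an event begins
        elif not blocked and start is not None:
            events.append((start, index))  # the event has ended
            start = None
    if start is not None:
        events.append((start, len(trace_pa)))  # event runs to end of trace
    return events
```

In such a sketch a larger obstruction would be expected to give a larger fractional blockade, and each detected event corresponds to a candidate binding or translocation of an analyte.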
The pore complex or pore multimer may serve as a molecular or biological sensor. The analyte molecule that is to be detected may bind to either face of the channel, or within the lumen of the channel itself. The position of binding may be determined by the size of the molecule to be sensed.
The target analyte is preferably a metal ion, an inorganic salt, a polymer, an amino acid, a peptide, a polypeptide, a protein, a nucleotide, an oligonucleotide, a polynucleotide, a monosaccharide, an oligosaccharide, a polysaccharide, a dye, a bleach, a pharmaceutical, a diagnostic agent, a recreational drug, an explosive, a toxic compound, or an environmental pollutant. The analyte may comprise two or more different molecules, such as a peptide and a polypeptide. The method may concern determining the presence, absence or one or more characteristics of two or more analytes of the same type, such as two or more proteins, two or more nucleotides or two or more pharmaceuticals. Alternatively, the method may concern determining the presence, absence or one or more characteristics of two or more analytes of different types, such as one or more proteins, one or more nucleotides and one or more pharmaceuticals.
The target analyte can be secreted from cells. Alternatively, the target analyte can be an analyte that is present inside cells such that the analyte must be extracted from the cells before the method can be carried out.
The pore complex or pore multimer may be modified via recombinant or chemical methods to increase the strength of binding, the position of binding, or the specificity of binding of the molecule to be sensed. Typical modifications include addition of a specific binding moiety complementary to the structure of the molecule to be sensed. Where the analyte molecule comprises a nucleic acid, this binding moiety may comprise a cyclodextrin or an oligonucleotide; for small molecules this may be a known complementary binding region, for example the antigen binding portion of an antibody or of a non-antibody molecule, including a single chain variable fragment (scFv) region or an antigen recognition domain from a T-cell receptor (TCR); or for proteins, it may be a known ligand of the target protein. In this way the pore complex or pore multimer may be rendered capable of acting as a molecular sensor for detecting the presence in a sample of suitable antigens (including epitopes) that may include cell surface antigens, including receptors, markers of solid tumours or haematologic cancer cells (e.g. lymphoma or leukaemia), viral antigens, bacterial antigens, protozoal antigens, allergens, allergy related molecules, albumin (e.g. human, rodent, or bovine), fluorescent molecules (including fluorescein), blood group antigens, small molecules, drugs, enzymes, catalytic sites of enzymes or enzyme substrates, and transition state analogues of enzyme substrates. As described above, modifications may be achieved using known genetic engineering and recombinant DNA techniques. The positioning of any adaptation would be dependent on the nature of the molecule to be sensed, for example, the size, three-dimensional structure, and its biochemical nature. The choice of adapted structure may make use of computational structural design.
Determination and optimization of protein-protein interactions or protein-small molecule interactions can be investigated using technologies such as a BIAcore® which detects molecular interactions using surface plasmon resonance (BIAcore, Inc., Piscataway, NJ; see also www.biacore.com).
The analyte is preferably an amino acid, a peptide, a polypeptide or a protein. The amino acid, peptide, polypeptide or protein can be naturally occurring or non-naturally occurring. The polypeptide or protein can include within them synthetic or modified amino acids. Several different types of modification to amino acids are known in the art. Suitable amino acids and modifications thereof are discussed above. It is to be understood that the target analyte can be modified by any method available in the art.
The analyte is preferably a polynucleotide, such as a nucleic acid, which is defined as a macromolecule comprising two or more nucleotides. Nucleic acids are particularly suitable for nanopore sequencing. The naturally occurring nucleic acid bases in DNA and RNA may be distinguished by their physical size. As a nucleic acid molecule, or individual base, passes through the channel of a nanopore, the size differential between the bases causes a directly correlated reduction in the ion flow through the channel. The variation in ion flow may be recorded. Suitable electrical measurement techniques for recording ion flow variations are discussed above. Through suitable calibration, the characteristic reduction in ion flow can be used to identify the particular nucleotide and associated base traversing the channel in real time. In typical nanopore nucleic acid sequencing, the open-channel ion flow is reduced as the individual nucleotides of the nucleic acid sequence of interest sequentially pass through the channel of the nanopore due to the partial blockage of the channel by the nucleotide. It is this reduction in ion flow that is measured using the suitable recording techniques described above. The reduction in ion flow may be calibrated to the reduction in measured ion flow for known nucleotides through the channel, resulting in a means for determining which nucleotide is passing through the channel and therefore, when done sequentially, a way of determining the nucleotide sequence of the nucleic acid passing through the nanopore. For the accurate determination of individual nucleotides, it has typically been required for the reduction in ion flow through the channel to be directly correlated to the size of the individual nucleotide passing through the constriction. It will be appreciated that sequencing may be performed upon an intact nucleic acid polymer that is 'threaded' through the pore via the action of an associated polymerase, for example.
Alternatively, sequences may be determined by passage of nucleotide triphosphate bases that have been sequentially removed from a target nucleic acid in proximity to the pore (see for example WO 2014/187924 incorporated herein by reference in its entirety).
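By way of non-limiting illustration only, the calibration of ion-flow reduction against known nucleotides described above can be sketched as a nearest-level lookup. In the following Python example the function names `calibrate` and `identify` are hypothetical, and the numerical values are invented placeholders chosen only to make the example run; they are not measured data.

```python
# Illustrative sketch only; names are hypothetical and all numbers below are
# invented placeholders, not measurements.


def calibrate(known_reductions):
    """Average the fractional current reductions recorded for each known nucleotide."""
    return {
        nucleotide: sum(values) / len(values)
        for nucleotide, values in known_reductions.items()
    }


def identify(reduction, calibration):
    """Return the calibrated nucleotide whose mean reduction is closest."""
    return min(calibration, key=lambda nt: abs(calibration[nt] - reduction))


# Placeholder calibration recordings (fraction of open-channel current lost).
placeholder_recordings = {
    "dAMP": [0.50, 0.52],
    "dCMP": [0.40, 0.42],
    "dGMP": [0.60, 0.62],
    "dTMP": [0.45, 0.47],
}
calibration_table = calibrate(placeholder_recordings)

# Identify a series of unknown reductions one after another.
called = [identify(r, calibration_table) for r in (0.44, 0.60, 0.52)]
```

Applying `identify` sequentially to successive reductions, as in the last line, mirrors the sequential determination of a nucleotide sequence; a practical system would be expected to use a far richer signal model than this single-value lookup.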
The polynucleotide or nucleic acid may comprise any combination of any nucleotides. The nucleotides can be naturally occurring or artificial. One or more nucleotides in the polynucleotide can be oxidized or methylated. One or more nucleotides in the polynucleotide may be damaged. For instance, the polynucleotide may comprise a pyrimidine dimer. Such dimers are typically associated with damage by ultraviolet light and are the primary cause of skin melanomas. One or more nucleotides in the polynucleotide may be modified, for instance with a label or a tag, for which suitable examples are known by a skilled person. The polynucleotide may comprise one or more spacers. A nucleotide typically contains a nucleobase, a sugar and at least one phosphate group. The nucleobase and sugar form a nucleoside. The nucleobase is typically heterocyclic. Nucleobases include, but are not limited to, purines and pyrimidines and more specifically adenine (A), guanine (G), thymine (T), uracil (U) and cytosine (C). The sugar is typically a pentose sugar. Nucleotide sugars include, but are not limited to, ribose and deoxyribose. The sugar is preferably a deoxyribose. The polynucleotide preferably comprises the following nucleosides: deoxyadenosine (dA), deoxyuridine (dU) and/or thymidine (dT), deoxyguanosine (dG) and deoxycytidine (dC). The nucleotide is typically a ribonucleotide or deoxyribonucleotide. The nucleotide typically contains a monophosphate, diphosphate, or triphosphate. The nucleotide may comprise more than three phosphates, such as 4 or 5 phosphates. Phosphates may be attached on the 5' or 3' side of a nucleotide. The nucleotides in the polynucleotide may be attached to each other in any manner. The nucleotides are typically attached by their sugar and phosphate groups as in nucleic acids. The nucleotides may be connected via their nucleobases as in pyrimidine dimers. The polynucleotide may be single stranded or double stranded. 
At least a portion of the polynucleotide is preferably double stranded. The polynucleotide is most preferably ribonucleic acid (RNA) or deoxyribonucleic acid (DNA). In particular, said method using a polynucleotide as an analyte alternatively comprises determining one or more characteristics selected from (i) the length of the polynucleotide, (ii) the identity of the polynucleotide, (iii) the sequence of the polynucleotide, (iv) the secondary structure of the polynucleotide and (v) whether or not the polynucleotide is modified.
The polynucleotide can be any length (i). For example, the polynucleotide can be at least 10, at least 50, at least 100, at least 150, at least 200, at least 250, at least 300, at least 400 or at least 500 nucleotides or nucleotide pairs in length. The polynucleotide can be 1000 or more nucleotides or nucleotide pairs, 5000 or more nucleotides or nucleotide pairs in length or 100000 or more nucleotides or nucleotide pairs in length. Any number of polynucleotides can be investigated. For instance, the method may concern characterising 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 50, 100 or more polynucleotides. If two or more polynucleotides are characterised, they may be different polynucleotides or two instances of the same polynucleotide. The polynucleotide can be naturally occurring or artificial. For instance, the method may be used to verify the sequence of a manufactured oligonucleotide. The method is typically carried out in vitro.
Nucleotides can have any identity (ii), and include, but are not limited to, adenosine monophosphate (AMP), guanosine monophosphate (GMP), thymidine monophosphate (TMP), uridine monophosphate (UMP), 5-methylcytidine monophosphate, 5-hydroxymethylcytidine monophosphate, cytidine monophosphate (CMP), cyclic adenosine monophosphate (cAMP), cyclic guanosine monophosphate (cGMP), deoxyadenosine monophosphate (dAMP), deoxyguanosine monophosphate (dGMP), deoxythymidine monophosphate (dTMP), deoxyuridine monophosphate (dUMP), deoxycytidine monophosphate (dCMP) and deoxymethylcytidine monophosphate. The nucleotides are preferably selected from AMP, TMP, GMP, CMP, UMP, dAMP, dTMP, dGMP, dCMP and dUMP. A nucleotide may be abasic (i.e., lack a nucleobase). A nucleotide may also lack a nucleobase and a sugar (i.e., is a C3 spacer). The sequence of the nucleotides (iii) is determined by the identities of the consecutive nucleotides attached to each other throughout the polynucleotide strand, in the 5' to 3' direction of the strand.
The pore complexes and pore multimers of the invention are particularly useful in analysing homopolymers. For example, they may be used to determine the sequence of a polynucleotide comprising two or more, such as at least 3, 4, 5, 6, 7, 8, 9 or 10, consecutive nucleotides that are identical. For example, they may be used to sequence a polynucleotide comprising a polyA, polyT, polyG and/or polyC region.
The CsgG pore constriction is made of the residues at the 51, 55 and 56 positions of SEQ ID NO: 3. The constriction of CsgG and its constriction mutants are generally sharp. When DNA is passing through the constriction, interactions of approximately 5 bases of DNA with the constriction of the pore at any given time dominate the current signal. Although these sharper constrictions are very good at reading mixed sequence regions of DNA (when A, T, G and C are mixed), the signal becomes flat and lacks information when there is a homopolymeric region within the DNA (e.g., polyT, polyG, polyA, polyC). Because 5 bases dominate the signal of the CsgG and its constriction mutants, it is difficult to discriminate homopolymers longer than 5 without using additional dwell time information. However, if DNA is passing through a second constriction formed by the CsgF peptide, more DNA bases will interact with the combined constrictions, increasing the length of the homopolymers that can be discriminated.
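By way of non-limiting illustration only, the loss of information in homopolymeric regions can be shown with a toy model in which the current level depends solely on the k bases in contact with the constriction(s). The following Python sketch is a deliberate simplification: the function names are hypothetical, the flanking sequences are arbitrary, and representing the combined CsgG and CsgF constrictions as a single longer window (here 9 bases) is an assumption made for this example rather than a statement of the actual read-head geometry.

```python
# Illustrative toy model only; names, flanking sequences and the 9-base
# "combined" window are assumptions made for this example.
from itertools import groupby


def kmers(sequence, k):
    """Return every window of k consecutive bases, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def collapsed_levels(sequence, k):
    """Merge consecutive identical k-mers, discarding dwell time information."""
    return [kmer for kmer, _ in groupby(kmers(sequence, k))]


def with_homopolymer(length, base="T", left="ACGCA", right="GCACG"):
    """Embed a homopolymer of the given length between fixed flanks."""
    return left + base * length + right


# With a 5-base window, 6 and 8 consecutive T's give the same level sequence.
same_with_5 = collapsed_levels(with_homopolymer(6), 5) == collapsed_levels(
    with_homopolymer(8), 5
)  # True

# With a longer effective window the two level sequences differ.
differ_with_9 = collapsed_levels(with_homopolymer(6), 9) != collapsed_levels(
    with_homopolymer(8), 9
)  # True
```

In this toy model the two homopolymers become indistinguishable once their length reaches the window size, which is consistent with the observation above that a greater number of interacting bases increases the length of homopolymer that can be discriminated.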
The movement of the polynucleotide with respect to the pore, such as through the pore, is preferably controlled using a polynucleotide binding protein. Suitable proteins are discussed in more detail below. The invention provides a method for determining the presence, absence or one or more characteristics of a target polynucleotide, comprising the steps of:
(i) contacting the target polynucleotide with a pore complex of the invention or a pore multimer of the invention and a polynucleotide binding protein, such that the polynucleotide binding protein controls the movement of the target polynucleotide with respect to, such as through, the pore complex or the pore multimer; and
(ii) taking one or more measurements as the polynucleotide moves with respect to, such as through, the pore complex or the pore multimer and thereby determining the presence, absence or one or more characteristics of the polynucleotide.
In any of the methods, the one or more characteristics of the target analyte are preferably measured by electrical measurement and/or optical measurement. The electrical measurement is a current measurement, an impedance measurement, a tunnelling measurement, or a field effect transistor (FET) measurement. The method preferably comprises measuring the current flowing through the pore complex or the pore multimer as the analyte moves with respect to, such as through, the pore.
General conditions for conducting the methods of the invention are discussed in more detail below with reference to the kits and systems of the invention.
Polynucleotides of the invention
The invention also provides a polynucleotide which encodes a pore monomer conjugate of the invention or a construct of the invention. The polynucleotide may be any of those discussed above. The invention also provides an expression vector comprising a polynucleotide of the invention. The invention also provides a host cell comprising a polynucleotide of the invention or an expression vector of the invention. Suitable vectors and host cells are known in the art.
Kits
The invention also provides kits for characterising a target analyte. In one embodiment, the kit comprises (a) a pore complex of the invention or a pore multimer of the invention and (b) the components of a membrane. Suitable membranes and components are discussed below.
In another embodiment, the kit comprises (a) a pore complex of the invention or a pore multimer of the invention and (b) a polynucleotide binding protein. The kit preferably further comprises the components of a membrane. The kit may comprise components of any type of membranes, such as an amphiphilic layer or a triblock copolymer membrane. Preferred polynucleotide binding proteins are polymerases, exonucleases, helicases and topoisomerases, such as gyrases. Suitable enzymes include, but are not limited to, exonuclease I from E. coli, exonuclease III enzyme from E. coli, RecJ from T. thermophilus and bacteriophage lambda exonuclease, TatD exonuclease and variants thereof. Three subunits comprising the RecJ sequence from T. thermophilus or a variant thereof interact to form a trimer exonuclease. The polymerase may be PyroPhage® 3173 DNA Polymerase (which is commercially available from Lucigen® Corporation), SD Polymerase (commercially available from Bioron®) or variants thereof. The enzyme may be Phi29 DNA polymerase or a variant thereof. The topoisomerase is preferably a member of any of the Enzyme Classification (EC) groups 5.99.1.2 and 5.99.1.3.
The enzyme is most preferably derived from a helicase, such as Hel308 Mbu, Hel308 Csy, Hel308 Tga, Hel308 Mhu, TraI Eco, XPD Mbu or a variant thereof. Any helicase may be used in the invention. The helicase may be or be derived from a Hel308 helicase, a RecD helicase, such as TraI helicase or a TrwC helicase, a XPD helicase or a Dda helicase. The helicase may be any of the helicases, modified helicases or helicase constructs disclosed in WO 2013/057495; WO 2013/098562; WO 2013/098561; WO 2014/013260; WO 2014/013259; WO 2014/013262 and WO 2015/055981. All of these are incorporated by reference in their entirety.
The kit may further comprise one or more anchors, such as cholesterol, for coupling the target analyte to the membrane. The kit may further comprise one or more polynucleotide adaptors that can be attached to a target polynucleotide to facilitate characterisation of the polynucleotide. The anchor, such as cholesterol, is preferably attached to the polynucleotide adaptor.
The kit may additionally comprise one or more other reagents or instruments which enable any of the embodiments mentioned above to be carried out. Such reagents or instruments include one or more of the following: suitable buffer(s) (aqueous solutions), means to obtain a sample from a subject (such as a vessel or an instrument comprising a needle), means to amplify and/or express polynucleotides or voltage or patch clamp apparatus. Reagents may be present in the kit in a dry state such that a fluid sample resuspends the reagents. The kit may also, optionally, comprise instructions to enable the kit to be used in the method of the invention or details regarding for which organism the method may be used. Finally, the kit may also comprise additional components useful in analyte characterization.
Apparatus
The invention also provides an apparatus for characterising target analytes in a sample, comprising (a) a plurality of pore complexes of the invention or a plurality of pore multimers of the invention and (b) a plurality of polynucleotide binding proteins. The plurality of pore complexes or plurality of pore multimers may be any of those discussed above.
The invention also provides an apparatus comprising a pore complex of the invention or a pore multimer of the invention inserted into an in vitro membrane.
The invention also provides an apparatus produced by a method comprising: (i) obtaining a pore complex of the invention or a pore multimer of the invention and (ii) contacting the pore complex or pore multimer with an in vitro membrane such that the pore complex or pore multimer is inserted in the in vitro membrane.
Any of the specific embodiments discussed above are equally applicable to the apparatuses of the invention.
Arrays
The invention also provides an array comprising a plurality of membranes of the invention. Any of the embodiments discussed above with respect to the membranes of the invention equally apply to the array of the invention. The array may be set up to perform any of the methods described below.
In a preferred embodiment, each membrane in the array comprises one pore complex or pore multimer. Due to the manner in which the array is formed, for example, the array may comprise one or more membranes that do not comprise a pore complex or pore multimer, and/or one or more membranes that comprise two or more pore complexes or multimers. The array may comprise from about 2 to about 1000, such as from about 10 to about 800, from about 20 to about 600 or from about 30 to about 500 membranes.
System
The invention provides a system comprising (a) a membrane of the invention or an array of the invention, (b) means for applying a potential across the membrane(s) and (c) means for detecting electrical or optical signals across the membrane(s).
The pores and membranes may be any as described above and below.
In one embodiment, the system further comprises a first chamber and a second chamber, wherein the first and second chambers are separated by the membrane(s). When used to characterise a target analyte, the system may further comprise a target analyte, wherein the target analyte is transiently located within the continuous channel and wherein one end of the target analyte is located in the first chamber and one end of the target analyte is located in the second chamber. The target analyte is preferably a target polypeptide or a target polynucleotide.
In one embodiment, the system further comprises an electrically conductive solution in contact with the pore(s), electrodes providing a voltage potential across the membrane(s), and a measurement system for measuring the current through the pore(s). The voltage applied across the membranes and pore is preferably from +5 V to -5 V, such as -600 mV to +600 mV or -400 mV to +400 mV. The voltage used is preferably in the range 100 mV to 240 mV and more preferably in the range of 120 mV to 220 mV. It is possible to increase discrimination between different amino acids or nucleotides by a pore by using an increased applied potential. Any suitable electrically conductive solution may be used. For example, the solution may comprise charge carriers, such as metal salts, for example alkali metal salt, halide salts, for example chloride salts, such as alkali metal chloride salt. Charge carriers may include ionic liquids or organic salts, for example tetramethyl ammonium chloride, trimethylphenyl ammonium chloride, phenyltrimethyl ammonium chloride, or 1-ethyl-3-methyl imidazolium chloride. In an exemplary system, salt is present in the aqueous solution in the chamber. Potassium chloride (KCl), sodium chloride (NaCl), caesium chloride (CsCl) or a mixture of potassium ferrocyanide and potassium ferricyanide is typically used. KCl, NaCl and a mixture of potassium ferrocyanide and potassium ferricyanide are preferred. The charge carriers may be asymmetric across the membrane. For instance, the type and/or concentration of the charge carriers may be different on each side of the membrane, e.g., in each chamber.
The salt concentration may be at saturation. The salt concentration may be 3 M or lower and is typically from 0.1 to 2.5 M, from 0.3 to 1.9 M, from 0.5 to 1.8 M, from 0.7 to 1.7 M, from 0.9 to 1.6 M or from 1 M to 1.4 M. The salt concentration is preferably from 150 mM to 1 M. The method is preferably carried out using a salt concentration of at least 0.3 M, such as at least 0.4 M, at least 0.5 M, at least 0.6 M, at least 0.8 M, at least 1.0 M, at least 1.5 M, at least 2.0 M, at least 2.5 M or at least 3.0 M. High salt concentrations provide a high signal to noise ratio and allow for currents indicative of the presence of an amino acid or nucleotide to be identified against the background of normal current fluctuations.
A buffer may be present in the electrically conductive solution. Typically, the buffer is phosphate buffer. Other suitable buffers are HEPES and Tris-HCl buffer. The pH of the electrically conductive solution may be from 4.0 to 12.0, from 4.5 to 10.0, from 5.0 to 9.0, from 5.5 to 8.8, from 6.0 to 8.7 or from 7.0 to 8.8 or 7.5 to 8.5. The pH used is preferably about 7.5.
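Purely as an illustrative sketch, the preferred voltage, salt and pH ranges set out above may be collected into a simple check of proposed run conditions. The class and function names are hypothetical. Treating the bounds as inclusive and comparing the magnitude of the applied voltage (so that, for example, -180 mV is treated as 180 mV) are assumptions of this sketch rather than statements from the description.

```python
from dataclasses import dataclass

# Ranges quoted in the description above. Inclusive bounds and the use of the
# voltage magnitude are assumptions made for this sketch.
PREFERRED_VOLTAGE_MV = (120.0, 220.0)  # "more preferably in the range of 120 mV to 220 mV"
PREFERRED_SALT_M = (0.15, 1.0)         # "preferably from 150 mM to 1 M"
ALLOWED_PH = (4.0, 12.0)               # broadest pH range given


@dataclass(frozen=True)
class RunConditions:
    """Hypothetical container for one set of measurement conditions."""
    voltage_mV: float
    salt_M: float
    pH: float


def _within(value: float, bounds: tuple[float, float]) -> bool:
    low, high = bounds
    return low <= value <= high


def check_conditions(cond: RunConditions) -> dict[str, bool]:
    """Report, per parameter, whether the value falls inside the quoted range."""
    return {
        "voltage": _within(abs(cond.voltage_mV), PREFERRED_VOLTAGE_MV),
        "salt": _within(cond.salt_M, PREFERRED_SALT_M),
        "pH": _within(cond.pH, ALLOWED_PH),
    }


print(check_conditions(RunConditions(voltage_mV=-180.0, salt_M=0.5, pH=7.5)))
```

The example values passed to `check_conditions` are arbitrary and serve only to show the call.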
The system may be comprised in an apparatus. The apparatus may be any conventional apparatus for analyte analysis, such as an array or a chip. The apparatus is preferably set up to carry out the disclosed method. For example, the apparatus may comprise a chamber comprising an aqueous solution and a barrier that separates the chamber into two sections. The barrier typically has an aperture in which the membrane(s) containing the pore(s) are formed. Alternatively, the barrier forms the membrane in which the pore is present.
The apparatus may also comprise an electrical circuit capable of applying a potential and measuring an electrical signal across the membrane and pore.
The apparatus may be any of those described in WO 2008/102120, WO 2009/077734, WO 2010/122293, WO 2011/067559, or WO 00/28312 (all incorporated herein by reference in their entirety).
Membrane
Any suitable membrane may be used in the system. The membrane is preferably an amphiphilic layer. An amphiphilic layer is a layer formed from amphiphilic molecules, such as phospholipids, which have both hydrophilic and lipophilic properties. The amphiphilic molecules may be synthetic or naturally occurring. Non-naturally occurring amphiphiles and amphiphiles which form a monolayer are known in the art and include, for example, block copolymers (Gonzalez-Perez et al., Langmuir, 2009, 25, 10447-10450). Block copolymers are polymeric materials in which two or more monomer sub-units are polymerized together to create a single polymer chain. Block copolymers typically have properties that are contributed by each monomer sub-unit. However, a block copolymer may have unique properties that polymers formed from the individual sub-units do not possess. Block copolymers can be engineered such that one of the monomer sub-units is hydrophobic (i.e., lipophilic), whilst the other sub-unit(s) are hydrophilic whilst in aqueous media. In this case, the block copolymer may possess amphiphilic properties and may form a structure that mimics a biological membrane. The block copolymer may be a diblock (consisting of two monomer sub-units) but may also be constructed from more than two monomer sub-units to form more complex arrangements that behave as amphiphiles. The copolymer may be a triblock, tetrablock or pentablock copolymer. The membrane is preferably a triblock copolymer membrane.
The membrane may comprise one of the membranes disclosed in International Application No. WO 2014/064443 or WO 2014/064444.
The amphiphilic molecules may be chemically modified or functionalised to facilitate coupling of the polynucleotide. The amphiphilic layer may be a monolayer or a bilayer. The amphiphilic layer is typically planar. The amphiphilic layer may be curved. The amphiphilic layer may be supported.
Amphiphilic membranes are typically naturally mobile, essentially acting as two-dimensional fluids with lipid diffusion rates of approximately 10⁻⁸ cm s⁻¹. This means that the pore and coupled polynucleotide can typically move within an amphiphilic membrane.
The membrane may be a lipid bilayer. Lipid bilayers are models of cell membranes and serve as excellent platforms for a range of experimental studies. For example, lipid bilayers can be used for in vitro investigation of membrane proteins by single-channel recording. Alternatively, lipid bilayers can be used as biosensors to detect the presence of a range of substances. The lipid bilayer may be any lipid bilayer. Suitable lipid bilayers include, but are not limited to, a planar lipid bilayer, a supported bilayer, or a liposome. The lipid bilayer is preferably a planar lipid bilayer. Suitable lipid bilayers are disclosed in WO 2008/102121, WO 2009/077734, and WO 2006/100484 (all incorporated herein by reference in their entirety).
The membrane may comprise a solid-state layer. Solid state layers can be formed from both organic and inorganic materials including, but not limited to, microelectronic materials, insulating materials such as Si3N4, Al2O3, and SiO, organic and inorganic polymers such as polyamide, plastics such as Teflon® or elastomers such as two-component addition-cure silicone rubber, and glasses. The solid-state layer may be formed from graphene. Suitable graphene layers are disclosed in WO 2009/035647 (incorporated herein by reference in its entirety). If the membrane comprises a solid-state layer, the pore is typically present in an amphiphilic membrane or layer contained within the solid-state layer, for instance within a hole, well, gap, channel, trench or slit within the solid-state layer. The skilled person can prepare suitable solid state/amphiphilic hybrid systems. Suitable systems are disclosed in WO 2009/020682 and WO 2012/005857 (both incorporated herein by reference in their entirety). Any of the amphiphilic membranes or layers discussed above may be used.
The method is typically carried out using (i) an artificial amphiphilic layer comprising a pore, (ii) an isolated, naturally occurring lipid bilayer comprising a pore, or (iii) a cell having a pore inserted therein. The method is typically carried out using an artificial amphiphilic layer, such as a di- or tri-block copolymer layer. The layer may comprise other transmembrane and/or intramembrane proteins as well as other molecules in addition to the pore. Suitable apparatus and conditions are discussed below. The method of the invention is typically carried out in vitro.
SEQUENCE LISTING
SEQ ID NO: 1 (>P0AEA2; coding sequence for WT CsgG from E. coli K12) ATGCAGCGCTTATTTCTTTTGGTTGCCGTCATGTTACTGAGCGGATGCTTAACCGCCCCGCCTAAAG AAGCCGCCAGACCGACATTAATGCCTCGTGCTCAGAGCTACAAAGATTTGACCCATCTGCCAGCGCC GACGGGTAAAATCTTTGTTTCGGTATACAACATTCAGGACGAAACCGGGCAATTTAAACCCTACCCG GCAAGTAACTTCTCCACTGCTGTTCCGCAAAGCGCCACGGCAATGCTGGTCACGGCACTGAAAGATT CTCGCTGGTTTATACCGCTGGAGCGCCAGGGCTTACAAAACCTGCTTAACGAGCGCAAGATTATTCG TGCGGCACAAGAAAACGGCACGGTTGCCATTAATAACCGAATCCCGCTGCAATCTTTAACGGCGGCA AATATCATGGTTGAAGGTTCGATTATCGGTTATGAAAGCAACGTCAAATCTGGCGGGGTTGGGGCAA GATATTTTGGCATCGGTGCCGACACGCAATACCAGCTCGATCAGATTGCCGTGAACCTGCGCGTCGT CAATGTGAGTACCGGCGAGATCCTTTCTTCGGTGAACACCAGTAAGACGATACTTTCCTATGAAGTT CAGGCCGGGGTTTTCCGCTTTATTGACTACCAGCGCTTGCTTGAAGGGGAAGTGGGTTACACCTCGA ACGAACCTGTTATGCTGTGCCTGATGTCGGCTATCGAAACAGGGGTCATTTTCCTGATTAATGATGG TATCGACCGTGGTCTGTGGGATTTGCAAAATAAAGCAGAACGGCAGAATGACATTCTGGTGAAATAC CGCCATATGTCGGTTCCACCGGAATCCTGA
SEQ ID NO:2 (>P0AEA2 (1:277); WT Pro-CsgG from E. coli K12)
MQRLFLLVAVMLLSGCLTAPPKEAARPTLMPRAQSYKDLTHLPAPTGKIFVSVYNIQDETGQFKPYPASNF STAVPQSATAMLVTALKDSRWFIPLERQGLQNLLNERKIIRAAQENGTVAINNRIPLQSLTAANIMVEGSI IGYESNVKSGGVGARYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVFRFIDYQ RLLEGEVGYTSNEPVMLCLMSAIETGVIFLINDGIDRGLWDLQNKAERQNDILVKYRHMSVPPES
SEQ ID NO:3 (>P0AEA2 (16:277); mature CsgG from E. coli K12)
CLTAPPKEAARPTLMPRAQSYKDLTHLPAPTGKIFVSVYNIQDETGQFKPYPASNFSTAVPQSATAMLVT ALKDSRWFIPLERQGLQNLLNERKIIRAAQENGTVAINNRIPLQSLTAANIMVEGSIIGYESNVKSGGVG ARYFGIGADTQYQLDQIAVNLRVVNVSTGEILSSVNTSKTILSYEVQAGVFRFIDYQRLLEGEVGYTSNEP VMLCLMSAIETGVIFLINDGIDRGLWDLQNKAERQNDILVKYRHMSVPPES
SEQ ID NO:4 (>P0AE98; coding sequence for WT CsgF from E. coli K12) ATGCGTGTCAAACATGCAGTAGTTCTACTCATGCTTATTTCGCCATTAAGTTGGGCTGGAACCATGAC TTTCCAGTTCCGTAATCCAAACTTTGGTGGTAACCCAAATAATGGCGCTTTTTTATTAAATAGCGCTC AGGCCCAAAACTCTTATAAAGATCCGAGCTATAACGATGACTTTGGTATTGAAACACCCTCAGCGTTA GATAACTTTACTCAGGCCATCCAGTCACAAATTTTAGGTGGGCTACTGTCGAATATTAATACCGGTAA ACCGGGCCGCATGGTGACCAACGATTATATTGTCGATATTGCCAACCGCGATGGTCAATTGCAGTTG AACGTGACAGATCGTAAAACCGGACAAACCTCGACCATCCAGGTTTCGGGTTTACAAAATAACTCAA CCGATTTT
SEQ ID NO:5 (>P0AE98 (1:138); WT Pro-CsgF from E. coli K12) MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPSAL DNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNSTD F
SEQ ID NO:6 (>P0AE98 (20:138); WT mature CsgF from E. coli K12) GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPSALDNFTQAIQSQILGGLLSNIN TGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNSTDF
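As a minimal sketch of how the residue ranges in the sequence listing relate to one another, the mature CsgF sequence (SEQ ID NO: 6) can be recovered from the pro-CsgF sequence (SEQ ID NO: 5) by taking residues 20 to 138 using 1-based, inclusive numbering. The helper names below are hypothetical, and the sequences are copied from the listing with the line-wrap spaces removed.

```python
# SEQ ID NO: 5 (pro-CsgF) and SEQ ID NO: 6 (mature CsgF), copied from the
# sequence listing above with line-wrap spaces removed.
PRO_CSGF = (
    "MRVKHAVVLLMLISPLSWAGTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPSAL"
    "DNFTQAIQSQILGGLLSNINTGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNSTD"
    "F"
)
MATURE_CSGF = (
    "GTMTFQFRNPNFGGNPNNGAFLLNSAQAQNSYKDPSYNDDFGIETPSALDNFTQAIQSQILGGLLSNIN"
    "TGKPGRMVTNDYIVDIANRDGQLQLNVTDRKTGQTSTIQVSGLQNNSTDF"
)


def subsequence(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq using 1-based, inclusive positions."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("positions must satisfy 1 <= start <= end <= len(seq)")
    return seq[start - 1:end]


def residue_at(seq: str, position: int) -> str:
    """Return the residue at a 1-based position, as used in the claims."""
    return subsequence(seq, position, position)


# The (20:138) annotation of SEQ ID NO: 6 corresponds to this slice of SEQ ID NO: 5.
print(subsequence(PRO_CSGF, 20, 138) == MATURE_CSGF)
```

The same 1-based convention applies when reading positions such as "position 4 of SEQ ID NO: 6" in the claims.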
The following Example illustrates the invention. It is to be understood that although particular embodiments, specific configurations as well as materials and/or molecules, have been discussed herein for engineered cells and methods according to the present invention, various changes or modifications in form and detail may be made without departing from the scope and spirit of this invention. The following example is provided to better illustrate particular embodiments, and it should not be considered as limiting the application. The application is limited only by the claims.
EXAMPLE
Detailed methods for making and testing mutant CsgG pores and mutant CsgG/CsgF complexes are described in WO 2016/034591, WO 2017/149316, WO 2017/149317, WO 2017/149318, WO 2018/211241, and WO 2019/002893 (all incorporated by reference herein in their entirety).
E. coli CsgG pore production
Recombinant expression vectors encoding the CsgG variant nanopores with a C-terminal Strep affinity tag and ampicillin resistance gene are transformed into chemically competent E. coli cells. The cells are plated onto an LB Agar plate containing appropriate antibiotics for selection. A single colony from the agar plate is inoculated in LB Media with antibiotics and grown overnight. The culture is diluted into autoinduction media plus necessary antibiotics and incubated at 18°C for 68 hours. The cells are harvested through centrifugation before being lysed and extracted into 1x BugBuster extraction reagent (Merck 70921) and 0.1% DDM. The pore is purified from the supernatant using affinity chromatography, heat treatment and then size exclusion chromatography, selecting for oligomeric nanopores as judged by SDS-PAGE.
CsgG/CsgF complex formation protocol
CsgG-CsgF complexes are prepared from nanopores purified as above and chemically synthesised CsgF peptides with or without one or two linkers capable of attaching to CsgG. Nanopores are buffer exchanged into a pH 7.0 buffer with reducing agents removed and incubated in an 8x molar excess of peptide to CsgG monomer for 1 hr at 25°C. Reactions are stopped by heating at 60°C for 15 mins followed by centrifugation to remove any precipitate. DTT is added to 5 mM to prevent any further reaction.
SDS PAGE analysis - with heating
300 ng of complex and CsgG-only pore control are added to individual 0.5 mL Protein LoBind Eppendorf tubes (Fisher, 10316752) and made to 10 µL volume with Reaction Buffer. This is made to a final volume of 20 µL by the addition of 10 µL of 2x Laemmli buffer. Each sample is loaded in its entirety onto a 4-20% TGX gel (BioRad, 5671093) running with 1x TGS buffer (Sigma, T7777). This is run for 21 minutes at 300 V. To image the gel, SYPRO Ruby (Merck, S4942) stain is used as per the manufacturer's instructions. This is then imaged on a GE Typhoon gel imager using a 450 nm laser.
SDS-PAGE gel analysis of the CsgG-only pore controls and CsgG/CsgF complexes when broken down to their constituent monomer components upon boiling in the presence of DTT. Lanes in which CsgG is attached to CsgF at one or two positions show a band shift compared with the CsgG-only control, indicating a covalent linkage between CsgG and CsgF.
DNA squiggle (i.e., DNA translocation current trace)
Electrical measurements are acquired from CsgG-only, single attached CsgG/CsgF complexes and double attached CsgG/CsgF complexes that are inserted into MinION flow cells. After a single pore has inserted into the block co-polymer membrane, 1 mL of a buffer comprising 25 mM Potassium Phosphate, 150 mM Potassium Ferrocyanide (II), 150 mM Potassium Ferricyanide (III), pH 8.0 is flowed through the system to remove any excess nanopores.
A Y-adapter is prepared by annealing DNA oligonucleotides as described previously (WO 2016/034591, which is incorporated herein in its entirety). A DNA motor is loaded and closed on the adapter. The subsequent material is HPLC purified. The Y-adapter contains a 30 C3 leader section for easier capture by the nanopore and a side arm for tethering to the membrane.
The analyte being used to assess the DNA squiggle is a 3.6-kilobase DNA section from the 3' end of the lambda genome. Preparation of the analyte, ligating the analyte to the Y-adapter, SPRI-bead clean-up of the ligated analyte and addition to a MinION flow cell is carried out using the Oxford Nanopore Technologies Q-SQK-LSK109 protocol.
Electrical measurements are acquired using a MinION Mk1B from Oxford Nanopore Technologies. A standard sequencing script at -180 mV is run for 2-6 hours, with static flicks every 5 minutes to remove extended nanopore blocks. Raw data is collected in a bulk FAST5 file using MinKNOW software (Oxford Nanopore Technologies). A minimum of 150 pores per flow cell are tested per pore type. In the absence of the CsgF peptide, the majority of pores are CsgG-only pores. Note that a small number of pores get misclassified as CsgG/CsgF complexes. However, when the pore complex comprises CsgF that is functionalized with a single linker or two linkers capable of attaching to CsgG, a high proportion of CsgG/CsgF complexes are observed. The thresholds used to classify the inserted pore types are as follows: CsgG-only pores = pores with open pore current between 160 pA and 200 pA; CsgG/CsgF complexes = pores with open pore currents between 70 pA and 140 pA. Both classifications also have open pore noise < 18 pA.
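The classification thresholds stated above may be expressed, as a minimal sketch, in the following form. The function name and the "unclassified" label are hypothetical. Treating the current bounds as inclusive and treating the open pore current as a magnitude in pA are assumptions, since the text states only that the current lies "between" the quoted values.

```python
# Thresholds taken from the description above; inclusive bounds are an assumption.
CSGG_ONLY_PA = (160.0, 200.0)
CSGG_CSGF_PA = (70.0, 140.0)
MAX_NOISE_PA = 18.0  # both classes require open pore noise below this value


def classify_pore(open_pore_current_pA: float, open_pore_noise_pA: float) -> str:
    """Assign a pore type from its open pore current magnitude and noise."""
    if open_pore_noise_pA >= MAX_NOISE_PA:
        return "unclassified"
    if CSGG_ONLY_PA[0] <= open_pore_current_pA <= CSGG_ONLY_PA[1]:
        return "CsgG-only"
    if CSGG_CSGF_PA[0] <= open_pore_current_pA <= CSGG_CSGF_PA[1]:
        return "CsgG/CsgF complex"
    return "unclassified"


# Hypothetical example values, chosen only to exercise each branch.
for current, noise in [(180.0, 5.0), (100.0, 5.0), (150.0, 5.0), (100.0, 25.0)]:
    print(current, noise, classify_pore(current, noise))
```

Channels falling outside both windows, such as empty or blocked channels, are returned as "unclassified" in this sketch.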
Current traces show the ionic current (pA) versus time (s) as single stranded DNA translocates through CsgG attached to CsgF at one or two positions. Each individual graph corresponds to a single pore inserted into a MinION flow cell. The open pore current observed for the CsgG/CsgF complexes is approximately 100 pA under the applied voltage of -180 mV. All other channels shown are either CsgG-only or empty/blocked channels.
Box plots show the signal metrics of CsgG-based pores. SNR is the signal to noise ratio which is the range of the ionic current divided by the noise as single stranded DNA is translocating through the pore. The SNR and/or range increase in the presence of a single attachment of CsgF to CsgG. The SNR and/or range of double attached CsgG/CsgF complexes is also increased compared with the single attached complexes.
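The SNR definition given above (range of the ionic current divided by the noise) may be sketched as follows. How the range and the noise are each estimated is not specified in the text. In this sketch the range is taken, as an assumption, to be the difference between the highest and lowest observed current levels, and the noise is supplied directly by the caller. The function names and example values are hypothetical.

```python
from typing import Sequence


def signal_range(levels_pA: Sequence[float]) -> float:
    """Range of the current levels, taken here as max minus min (an assumption)."""
    if not levels_pA:
        raise ValueError("at least one current level is required")
    return max(levels_pA) - min(levels_pA)


def signal_to_noise(range_pA: float, noise_pA: float) -> float:
    """SNR as defined in the text: range of the ionic current divided by the noise."""
    if noise_pA <= 0:
        raise ValueError("noise must be positive")
    return range_pA / noise_pA


# Hypothetical current levels (pA) observed while DNA translocates through a pore.
levels = [80.0, 95.0, 110.0, 120.0]
print(signal_to_noise(signal_range(levels), noise_pA=2.0))
```

With this definition, either a wider range or a lower noise increases the SNR, which is consistent with the comparison between pore types described above.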

Claims

1. A pore monomer conjugate comprising a CsgG pore monomer attached to a CsgF peptide, wherein the CsgF peptide is attached to the CsgG pore monomer at two or more positions.
2. A pore monomer conjugate according to claim 1, wherein the two or more positions in the CsgG pore monomer are selected from (a) residues 47-54, 57, 59, 60, 130-134, 136, 137, 138, 140, 142-145, 147, 149, 151, 153, 155, 181, 183, 185, 187, 189, 191, 193, 195-199, 201, 203, 205, 207, 209 and 211-212 in the CsgG pore monomer or (b) residues corresponding to positions 47-54, 57, 59, 60, 130-134, 136, 137, 138, 140, 142-145, 147, 149, 151, 153, 155, 181, 183, 185, 187, 189, 191, 193, 195-199, 201, 203, 205, 207, 209 and 211-212 in SEQ ID NO: 3.
3. A pore monomer conjugate according to claim 1 or 2, wherein the two or more positions in the CsgF peptide are selected from the N terminus and residues 1-35 in the CsgF peptide or from the N terminus and residues corresponding to positions 1-35 in SEQ ID NO: 6.
4. A pore monomer conjugate according to any one of the preceding claims, wherein one of the two or more attachments comprises (a) the N terminus of the CsgF peptide attached to a cysteine residue in the CsgG pore monomer corresponding to position 153 in SEQ ID NO: 3, (b) the position in the CsgF peptide corresponding to position 4 of SEQ ID NO: 6 attached to a cysteine residue in the CsgG pore monomer corresponding to position 133 in SEQ ID NO: 3, or (c) the position in the CsgF peptide corresponding to position 4 of SEQ ID NO: 6 attached to a cysteine residue in the CsgG pore monomer corresponding to position 153 in SEQ ID NO: 3.
5. A pore monomer conjugate according to any one of preceding claims, wherein one of the two or more attachments comprises the position in the CsgF peptide corresponding to any one of positions 30, 31, 32 and 33 in SEQ ID NO: 6 attached to the position in the CsgG pore monomer corresponding to any one of positions 193, 195, 196 and 197 in SEQ ID NO: 3.
6. A pore monomer conjugate according to any one of the preceding claims, wherein the attachment at two or more positions comprises one or more reactive groups which (a) react with lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine in the CsgG pore monomer, (b) react with any amino acid in the CsgG pore monomer, and/or (c) undergo click chemistry.
7. A pore monomer conjugate according to claim 6, wherein the lysine, cysteine, tyrosine, serine, threonine, proline, tryptophan, arginine, histidine, methionine, or phenylalanine is native to the CsgG pore monomer or is introduced into the CsgG pore monomer, optionally by substitution or addition.
8. A pore monomer conjugate according to any one of preceding claims, wherein the attachment at two or more positions comprises two or more different reactive groups.
9. A pore monomer conjugate according to any one of preceding claims, wherein the CsgF peptide is attached to the CsgG pore monomer using two or more linkers.
10. A pore monomer conjugate according to any one of preceding claims, wherein the CsgF peptide is covalently attached to the CsgG pore monomer at two or more positions.
11. A pore monomer conjugate according to any one of the preceding claims, wherein the CsgG pore monomer is a variant of SEQ ID NO: 3 and/or the CsgF peptide is a variant of SEQ ID NO: 6.
12. A construct comprising two or more covalently attached pore monomer conjugates according to any one of claims 1-11.
13. A construct according to claim 12, wherein the pore monomer conjugates are genetically fused and/or are attached via a linker.
14. A pore complex comprising at least one pore monomer conjugate according to any one of the preceding claims or at least one construct according to claim 12 or 13, wherein the CsgF peptide(s) form(s) a constriction in the pore complex.
15. A pore complex according to claim 14, wherein the pore complex is a homooligomer comprising 6 to 10 pore monomer conjugates according to any one of claims 1-11 or 1-5 constructs according to claim 12 or 13.
16. A pore complex according to claim 14 or 15, wherein the CsgF peptide(s) is/are inserted into the lumen of the pore complex.
17. A pore multimer comprising two or more pores, wherein at least one of the pores is a pore complex according to any one of claims 14-16.
18. A pore complex according to any one of claims 14-16 or a pore multimer according to claim 17, which is comprised in a membrane.
19. A membrane comprising a pore complex according to any one of claims 14-16 or a pore multimer according to claim 17.
20. A method for producing a pore monomer conjugate according to any one of claims 1-11 comprising attaching the CsgF peptide to the CsgG pore monomer at two or more positions.
21. A method for producing a pore complex according to any one of claims 14-16 or a pore multimer according to claim 17, the method comprising expressing at least one pore monomer conjugate according to any one of claims 1-11 or a construct according to claim 12 or 13 and sufficient pore monomers or constructs to form the pore complex or the pore multimer in a host cell and allowing the pore complex or the pore multimer to form in the host cell.
22. A method for producing a pore complex according to any one of claims 14-16 or a pore multimer according to claim 17, the method comprising contacting at least one pore monomer conjugate according to any one of claims 1-11 or a construct according to claim 12 or 13 with sufficient pore monomers or constructs in vitro and allowing the formation of the pore complex or the pore multimer.
23. A method for determining the presence, absence or one or more characteristics of a target analyte, comprising the steps of:
(i) contacting the target analyte with a pore complex according to any one of claims 14-16 or a pore multimer according to claim 17, such that the target analyte moves with respect to the pore complex or the pore multimer; and
(ii) taking one or more measurements as the analyte moves with respect to the pore complex or the pore multimer and thereby determining the presence, absence or one or more characteristics of the analyte.
24. A method according to claim 23, wherein the analyte is a peptide, a polypeptide, a monosaccharide, an oligosaccharide, a polysaccharide, a small organic or inorganic compound, such as pharmacologically active compounds, toxic compounds, and pollutants.
25. A method according to claim 24, wherein the analyte is a polynucleotide.
26. A method according to claim 25, wherein the polynucleotide comprises at least one homopolymeric region.
27. A method according to claim 25 or 26, comprising determining one or more characteristics selected from (i) the length of the polynucleotide, (ii) the identity of the polynucleotide, (iii) the sequence of the polynucleotide, (iv) the secondary structure of the polynucleotide and (v) whether or not the polynucleotide is modified.
28. A method of characterising a polynucleotide, a peptide or a polypeptide using a pore complex according to any one of claims 14-16 or a pore multimer according to claim 17.
29. Use of a pore complex according to any one of claims 14-16 or a pore multimer according to claim 17 to determine the presence, absence or one or more characteristics of a target analyte.
30. A polynucleotide which encodes a pore monomer conjugate according to any one of claims 1-11 or a construct according to claim 12 or 13.
31. A kit for characterising a target analyte comprising (a) a pore complex according to any one of claims 14-16 or a pore multimer according to claim 17 and (b) the components of a membrane.
32. A kit for characterising a target polynucleotide or a target polypeptide comprising (a) a pore complex according to any one of claims 14-16 or a pore multimer according to claim 17 and (b) a polynucleotide binding protein or a polypeptide handling enzyme.
33. An apparatus for characterising a target polynucleotide or a target polypeptide in a sample, comprising (a) a plurality of pore complexes according to any one of claims 14-16 or a plurality of pore multimers according to claim 17 and (b) a plurality of polynucleotide binding proteins or a plurality of polypeptide handling enzymes.
34. An array comprising a plurality of membranes according to claim 19.
35. A system comprising (a) a membrane according to claim 19 or an array according to claim 34, (b) means for applying a potential across the membrane(s) and (c) means for detecting electrical or optical signals across the membrane(s).
36. An apparatus comprising a pore complex according to any one of claims 14-16 or a pore multimer according to claim 17 inserted into an in vitro membrane.
37. An apparatus produced by a method comprising (i) obtaining a pore complex according to any one of claims 14-16 or a pore multimer according to claim 17 and (ii) contacting the pore complex or a pore multimer with an in vitro membrane such that the pore complex or the pore multimer is inserted in the in vitro membrane.
PCT/EP2023/072068 2022-08-09 2023-08-09 Novel pore monomers and pores WO2024033422A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB2211607.3A GB202211607D0 (en) 2022-08-09 2022-08-09 Novel pore monomers and pores
GB2211607.3 2022-08-09

Publications (1)

Publication Number Publication Date
WO2024033422A1 (en) 2024-02-15

Family

Family ID: 84546331

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/072068 WO2024033422A1 (en) 2022-08-09 2023-08-09 Novel pore monomers and pores

Country Status (2)

Country Link
GB (1) GB202211607D0 (en)
WO (1) WO2024033422A1 (en)

Citations (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000028312A1 (en) 1998-11-06 2000-05-18 The Regents Of The University Of California A miniature support for thin films containing single channels or nanopores and methods for using same
WO2006100484A2 (en) 2005-03-23 2006-09-28 Isis Innovation Limited Delivery of molecules to a lipid bilayer
WO2008102121A1 (en) 2007-02-20 2008-08-28 Oxford Nanopore Technologies Limited Formation of lipid bilayers
WO2009020682A2 (en) 2007-05-08 2009-02-12 The Trustees Of Boston University Chemical functionalization of solid-state nanopores and nanopore arrays and applications thereof
WO2009035647A1 (en) 2007-09-12 2009-03-19 President And Fellows Of Harvard College High-resolution molecular graphene sensor comprising an aperture in the graphene layer
WO2009077734A2 (en) 2007-12-19 2009-06-25 Oxford Nanopore Technologies Limited Formation of layers of amphiphilic molecules
WO2010004265A1 (en) 2008-07-07 2010-01-14 Oxford Nanopore Technologies Limited Enzyme-pore constructs
WO2010086602A1 (en) 2009-01-30 2010-08-05 Oxford Nanopore Technologies Limited Hybridization linkers
WO2010122293A1 (en) 2009-04-20 2010-10-28 Oxford Nanopore Technologies Limited Lipid bilayer sensor array
WO2011067559A1 (en) 2009-12-01 2011-06-09 Oxford Nanopore Technologies Limited Biochemical analysis instrument
WO2012005857A1 (en) 2010-06-08 2012-01-12 President And Fellows Of Harvard College Nanopore device with graphene supported artificial lipid membrane
WO2013057495A2 (en) 2011-10-21 2013-04-25 Oxford Nanopore Technologies Limited Enzyme method
WO2013098562A2 (en) 2011-12-29 2013-07-04 Oxford Nanopore Technologies Limited Enzyme method
WO2013098561A1 (en) 2011-12-29 2013-07-04 Oxford Nanopore Technologies Limited Method for characterising a polynucelotide by using a xpd helicase
WO2014013260A1 (en) 2012-07-19 2014-01-23 Oxford Nanopore Technologies Limited Modified helicases
WO2014013262A1 (en) 2012-07-19 2014-01-23 Oxford Nanopore Technologies Limited Enzyme construct
WO2014013259A1 (en) 2012-07-19 2014-01-23 Oxford Nanopore Technologies Limited Ssb method
WO2014064444A1 (en) 2012-10-26 2014-05-01 Oxford Nanopore Technologies Limited Droplet interfaces
WO2014064443A2 (en) 2012-10-26 2014-05-01 Oxford Nanopore Technologies Limited Formation of array of membranes and apparatus therefor
WO2014187924A1 (en) 2013-05-24 2014-11-27 Illumina Cambridge Limited Pyrophosphorolytic sequencing
WO2015055981A2 (en) 2013-10-18 2015-04-23 Oxford Nanopore Technologies Limited Modified enzymes
WO2016034591A2 (en) 2014-09-01 2016-03-10 Vib Vzw Mutant pores
WO2017149318A1 (en) 2016-03-02 2017-09-08 Oxford Nanopore Technologies Limited Mutant pores
WO2018211241A1 (en) 2017-05-04 2018-11-22 Oxford Nanopore Technologies Limited Transmembrane pore consisting of two csgg pores
WO2019002893A1 (en) 2017-06-30 2019-01-03 Vib Vzw Novel protein pores
CN113754743A (en) 2021-10-12 2021-12-07 成都齐碳科技有限公司 Mutant of porin monomer, protein pore and application thereof
CN113773373A (en) 2021-10-12 2021-12-10 成都齐碳科技有限公司 Mutant of porin monomer, protein pore and application thereof
CN113896776A (en) 2021-10-12 2022-01-07 成都齐碳科技有限公司 Mutant of porin monomer, protein pore and application thereof
CN113912683A (en) 2021-10-12 2022-01-11 成都齐碳科技有限公司 Mutant of porin monomer, protein pore and application thereof
US20220056517A1 (en) * 2018-11-08 2022-02-24 Oxford Nanopore Technologies Limited Pore

Patent Citations (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000028312A1 (en) 1998-11-06 2000-05-18 The Regents Of The University Of California A miniature support for thin films containing single channels or nanopores and methods for using same
WO2006100484A2 (en) 2005-03-23 2006-09-28 Isis Innovation Limited Delivery of molecules to a lipid bilayer
WO2008102121A1 (en) 2007-02-20 2008-08-28 Oxford Nanopore Technologies Limited Formation of lipid bilayers
WO2008102120A1 (en) 2007-02-20 2008-08-28 Oxford Nanopore Technologies Limited Lipid bilayer sensor system
WO2009020682A2 (en) 2007-05-08 2009-02-12 The Trustees Of Boston University Chemical functionalization of solid-state nanopores and nanopore arrays and applications thereof
WO2009035647A1 (en) 2007-09-12 2009-03-19 President And Fellows Of Harvard College High-resolution molecular graphene sensor comprising an aperture in the graphene layer
WO2009077734A2 (en) 2007-12-19 2009-06-25 Oxford Nanopore Technologies Limited Formation of layers of amphiphilic molecules
WO2010004265A1 (en) 2008-07-07 2010-01-14 Oxford Nanopore Technologies Limited Enzyme-pore constructs
WO2010086602A1 (en) 2009-01-30 2010-08-05 Oxford Nanopore Technologies Limited Hybridization linkers
WO2010122293A1 (en) 2009-04-20 2010-10-28 Oxford Nanopore Technologies Limited Lipid bilayer sensor array
WO2011067559A1 (en) 2009-12-01 2011-06-09 Oxford Nanopore Technologies Limited Biochemical analysis instrument
WO2012005857A1 (en) 2010-06-08 2012-01-12 President And Fellows Of Harvard College Nanopore device with graphene supported artificial lipid membrane
WO2013057495A2 (en) 2011-10-21 2013-04-25 Oxford Nanopore Technologies Limited Enzyme method
WO2013098562A2 (en) 2011-12-29 2013-07-04 Oxford Nanopore Technologies Limited Enzyme method
WO2013098561A1 (en) 2011-12-29 2013-07-04 Oxford Nanopore Technologies Limited Method for characterising a polynucelotide by using a xpd helicase
WO2014013262A1 (en) 2012-07-19 2014-01-23 Oxford Nanopore Technologies Limited Enzyme construct
WO2014013260A1 (en) 2012-07-19 2014-01-23 Oxford Nanopore Technologies Limited Modified helicases
WO2014013259A1 (en) 2012-07-19 2014-01-23 Oxford Nanopore Technologies Limited Ssb method
WO2014064444A1 (en) 2012-10-26 2014-05-01 Oxford Nanopore Technologies Limited Droplet interfaces
WO2014064443A2 (en) 2012-10-26 2014-05-01 Oxford Nanopore Technologies Limited Formation of array of membranes and apparatus therefor
WO2014187924A1 (en) 2013-05-24 2014-11-27 Illumina Cambridge Limited Pyrophosphorolytic sequencing
WO2015055981A2 (en) 2013-10-18 2015-04-23 Oxford Nanopore Technologies Limited Modified enzymes
WO2016034591A2 (en) 2014-09-01 2016-03-10 Vib Vzw Mutant pores
WO2017149317A1 (en) 2016-03-02 2017-09-08 Oxford Nanopore Technologies Limited Mutant pore
WO2017149318A1 (en) 2016-03-02 2017-09-08 Oxford Nanopore Technologies Limited Mutant pores
WO2017149316A1 (en) 2016-03-02 2017-09-08 Oxford Nanopore Technologies Limited Mutant pore
WO2018211241A1 (en) 2017-05-04 2018-11-22 Oxford Nanopore Technologies Limited Transmembrane pore consisting of two csgg pores
WO2019002893A1 (en) 2017-06-30 2019-01-03 Vib Vzw Novel protein pores
US20220056517A1 (en) * 2018-11-08 2022-02-24 Oxford Nanopore Technologies Limited Pore
CN113754743A (en) 2021-10-12 2021-12-07 成都齐碳科技有限公司 Mutant of porin monomer, protein pore and application thereof
CN113773373A (en) 2021-10-12 2021-12-10 成都齐碳科技有限公司 Mutant of porin monomer, protein pore and application thereof
CN113896776A (en) 2021-10-12 2022-01-07 成都齐碳科技有限公司 Mutant of porin monomer, protein pore and application thereof
CN113912683A (en) 2021-10-12 2022-01-11 成都齐碳科技有限公司 Mutant of porin monomer, protein pore and application thereof

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
"Uniprot", Database accession no. P0AE98
ALTSCHUL S. F, J MOL EVOL, vol. 36, 1993, pages 290 - 300
ALTSCHUL, S.F ET AL., J MOL BIOL, vol. 215, 1990, pages 403 - 10
CHEM BIOL., vol. 4, no. 7, July 1997 (1997-07-01), pages 497 - 505
D. STODDART ET AL., PROC. NATL. ACAD. SCI., vol. 106, 2010, pages 7702 - 7
DEVEREUX ET AL., NUCLEIC ACIDS RESEARCH, vol. 12, 1984, pages 387 - 395
GONZALEZ-PEREZ ET AL., LANGMUIR, vol. 25, 2009, pages 10447 - 10450
GOYAL ET AL., NATURE, vol. 516, no. 7530, 2014, pages 250 - 3
SAMBROOK, J.; RUSSELL, D.: "Molecular Cloning: A Laboratory Manual", 2001, COLD SPRING HARBOR LABORATORY PRESS
VAN DER VERREN SANDER E ET AL: "A dual-constriction biological nanopore resolves homonucleotide sequences with high fidelity", NATURE BIOTECHNOLOGY, vol. 38, no. 12, 1 December 2020 (2020-12-01), pages 1415 - 1420, XP037311062, ISSN: 1087-0156, DOI: 10.1038/S41587-020-0570-8 *
ZHANG MANFENG ET AL: "Cryo-EM structure of the nonameric CsgG-CsgF complex and its implications for controlling curli biogenesis in Enterobacteriaceae", PLOS BIOLOGY, vol. 18, no. 6, 19 June 2020 (2020-06-19), pages e3000748, XP055821231, DOI: 10.1371/journal.pbio.3000748 *

Also Published As

Publication number Publication date
GB202211607D0 (en) 2022-09-21

Similar Documents

Publication Publication Date Title
US11845780B2 (en) Mutant lysenin pores
AU2018294660B2 (en) Novel protein pores
US10167503B2 (en) Mutant pores
US10266885B2 (en) Mutant pores
US10472673B2 (en) Hetero-pores
JP6169976B2 (en) Mutant pore
US20220024994A9 (en) Transmembrane pore consisting of two csgg pores
EP3440098B1 (en) Mutant pore
WO2024033422A1 (en) Novel pore monomers and pores
WO2024033421A2 (en) Novel pore monomers and pores
WO2024033443A1 (en) Novel pore monomers and pores
WO2024089270A2 (en) Pore monomers and pores
WO2024100270A1 (en) Novel pore monomers and pores
WO2024033447A1 (en) De novo pores
WO2023198911A2 (en) Novel modified protein pores and enzymes
WO2023118404A1 (en) Pore

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 23758525; Country of ref document: EP; Kind code of ref document: A1)