WO2014099057A1 - Lipid bilayers for dna molecule organization and uses thereof - Google Patents

Lipid bilayers for DNA molecule organization and uses thereof

Info

Publication number
WO2014099057A1
Authority
WO
WIPO (PCT)
Prior art keywords
dna
mutsa
diffusion
protein
bound
Prior art date
Application number
PCT/US2013/058641
Other languages
French (fr)
Inventor
Eric C. Greene
Original Assignee
The Trustees Of Columbia University In The City Of New York
Priority date
Filing date
Publication date
Application filed by The Trustees Of Columbia University In The City Of New York
Publication of WO2014099057A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6834 Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837 Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B01 PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
    • B01J CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
    • B01J2219/00 Chemical, physical or physico-chemical processes in general; Their relevant apparatus
    • B01J2219/00274 Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
    • B01J2219/00583 Features relative to the processes being carried out
    • B01J2219/00596 Solid-phase processes
    • B01J2219/00603 Making arrays on substantially continuous surfaces
    • B01J2219/00605 Making arrays on substantially continuous surfaces the compounds being directly bound or immobilised to solid supports
    • B01J2219/00608 DNA chips
    • B01J2219/00623 Immobilisation or binding
    • B01J2219/00626 Covalent
    • B01J2219/0063 Other, e.g. van der Waals forces, hydrogen bonding
    • B01J2219/00632 Introduction of reactive groups to the surface
    • B01J2219/00637 Introduction of reactive groups to the surface by coating it with another layer
    • B01J2219/00657 One-dimensional arrays
    • B01J2219/00659 Two-dimensional arrays
    • B01J2219/0068 Means for controlling the apparatus of the process
    • B01J2219/00702 Processes involving means for analysing and characterising the products
    • B01J2219/00718 Type of compounds synthesised
    • B01J2219/0072 Organic compounds
    • B01J2219/00722 Nucleotides
    • B01J2219/00734 Lipids
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B20/00 Methods specially adapted for identifying library members
    • C40B20/04 Identifying library members by means of a tag, label, or other readable or detectable entity associated with the library members, e.g. decoding processes
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/14 Solid phase synthesis, i.e. wherein one or more library building blocks are bound to a solid support during library creation; Particular methods of cleavage from the solid support
    • C40B50/18 Solid phase synthesis, i.e. wherein one or more library building blocks are bound to a solid support during library creation; Particular methods of cleavage from the solid support using a particular method of attachment to the solid support

Definitions

  • The invention is based, in part, on the discovery that nucleic acid molecules can be disposed on a substrate and positionally aligned to allow analysis of individual nucleic acid molecules. Accordingly, in one aspect, the invention features an array that includes a substrate and single-stranded nucleic acid molecules attached to the substrate.
  • The single-stranded nucleic acid molecules can be attached to the substrate by means of a linkage, e.g., a linkage between cognate binding proteins, e.g., neutravidin and biotin, or an antibody and antigen (e.g., anti-digoxigenin antibody and digoxigenin); or a crosslinking linkage, e.g., disulfide linkage or coupling between primary amines using glutaraldehyde.
  • FIG. 1 shows a single-molecule DNA curtain assay for promoter-specific binding by RNA polymerase
  • FIG. 1A Double-tethered DNA curtain assay for organizing substrates on surfaces of a microfluidic device.
  • FIG. 1B Two-color images of YOYO1-stained DNA (green) bound by QD-RNAP (magenta).
  • FIG. 1C Schematic of the λ-phage genome (48.5 kb), including relative locations and orientations of promoters aligned with images of QD-RNAP on single DNA molecules (Table 2). As shown in (FIG.
  • FIG. 2 shows visualization of single molecules of RNA polymerase as they search for and engage promoters
  • FIG. 2A Kymograms of RNAP binding to λ-DNA showing kinetically distinct intermediates. DNA is unlabeled, and RNAP is magenta.
  • FIG. 2B Representative example of RNAP binding and initiating transcription from λPR; for this assay RNAP was premixed with all four rNTPs immediately prior to injection into the
  • FIG. 3 shows single-molecule kinetics revealing that the promoter search is dominated by 3D diffusion
  • FIG. 3A Influence of protein orientation on target association.
  • the angle ⁇ 0 defines the effective DNA-binding surface of QD-RNAP
  • defines the orientation of the effective binding surface relative to the promoter.
  • FIG. 3B Illustration of linear target size (a), for example where a = 2 bp: a 1-bp offset (in either direction) results in target recognition, but a 2-bp offset does not result in target recognition.
  • FIG. 3C Relationship between ⁇ , a and ⁇ , and their influence on promoter recognition.
  • FIG. 3D Observed promoter association rates (ka).
  • the boundary between the shaded and unshaded regions of the graph represents the facilitation threshold (Cthr; as indicated).
  • FIG. 3E Effective target size ( ⁇ ) versus RNAP concentration.
  • the dashed black line highlights the limiting value of ⁇ .
  • the difference between the experimental values and k ( ⁇ ) a (t) reflects facilitated diffusion, and the orange shaded region represents the maximum possible acceleration due to 1D-sliding and/or hopping.
  • error bars represent S.E.M. (n > 50 for each data point).
  • FIG. 4 shows protein concentration exerts a dominant influence on target searches even for proteins capable of sliding on DNA
  • FIG. 4A DNA schematic showing the location of the 5x lac operator.
  • FIG. 4B Two-color image of YOYO1-stained DNA (green) bound by QD-lac repressor (magenta).
  • FIG. 4C Kymogram showing an example of lac repressor binding to nonspecific DNA and then diffusing in 1D to the operator; data were collected at 33 pM lac repressor. The distance between the initial binding site and the operator is indicated as ⁇ .
  • FIG. 4D Kymogram showing an example of direct operator binding in the absence of any detectable 1D sliding; data were collected at 800 pM lac repressor.
  • FIG. 4E Graph showing the mean value of ⁇ as a function of protein concentration for proteins that successfully engage the operator. Inset, percentage of total operator binding events that are attributable to FD (magenta) and 3D (green) at each protein concentration. Error bars represent S.D. of the data (n > 54 for each data point).
  • FIG. 4F Graph of ⁇ for all observed proteins. Blue data points correspond to proteins that fail to bind the operator, magenta data points are proteins that bind the operator after undergoing FD, and green data points correspond to 3D binding to the operator.
  • FIG. 5 is a schematic showing that facilitated diffusion (FD) will be favored at concentrations below the facilitation threshold because the initial encounter with the DNA will most often occur at nonspecific sites, so the probability (P) of target engagement through FD exceeds the probability of engagement through 3D (PFD > P3D). Concentrations equal to or exceeding the facilitation threshold will favor 3D because the relative increase in protein abundance increases the probability of a direct collision with the target site (PFD ≤ P3D). FD-related processes such as sliding/hopping can still occur at high protein concentrations, but those proteins undergoing FD are less likely to reach the target site before those that collide directly with the target.
  • Although the facilitation threshold will vary for different proteins and different conditions, higher protein concentrations will still favor 3D collisions irrespective of the local environment (e.g. the presence of recruitment factors, DNA-bound obstacles, macromolecular crowding, local DNA folding) or global DNA architecture.
  • FIG. 6 is a schematic showing the Target Search Problem.
  • Diffusion-based models for how proteins might search for binding targets: random collision through 3D-diffusion (i.e. jumping); 1D-hopping, involving a series of microscopic dissociation and rebinding events; 1D-sliding, wherein the protein moves without dissociating from the DNA; and intersegmental transfer, involving movement from one distal location to another via a looped intermediate.
  • FIG. 7 shows lifetime analysis of ⁇ » and ⁇ events.
  • FIG. 7A Histogram of lifetimes for QDs only in the absence of RNAP, and the red line is a single exponential fit to the histogram.
  • FIG. 7B Shows the same QD-only data, but the y-axis is on a logarithmic scale.
  • FIG. 7C-D Histogram of lifetimes for QD-RNAP and corresponding double exponential fit. The first time constant obtained from the double exponential fit (5.6 msec) is the same as is obtained from the single exponential fit to the QD-only data set.
  • FIG. 7E-F Histogram of lifetimes for QD-RNAP and corresponding exponential fit for data collected in the absence of DNA.
  • FIG. 7G This binding distribution uses the data points presented in Fig. 2c, but the data were restricted to only those events that had a lifetime of >40 msec. Based upon the two exponential components obtained from the lifetime measurements, this ensures that most of the events (>93%) plotted in this binding distribution histogram are ⁇ events (i.e. nonspecifically bound RNAP).
  • FIG. 7G Semi-log plot of the lifetime distributions for the 3 ⁇ 4 events, corresponding to the inset shown in the lower panel FIG. 2C.
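The lifetime cutoff described for FIG. 7G can be illustrated with a short calculation. The sketch below assumes that lifetimes follow a two-component exponential mixture and computes what fraction of the events surviving past a cutoff belong to the slower component. Only the 5.6 msec fast time constant and the 40 msec cutoff come from the text; the slow time constant and the component fractions are hypothetical placeholders, so the result is not expected to reproduce the >93% figure.

```python
import math


def slow_fraction_after_cutoff(tau_fast_ms, tau_slow_ms, frac_fast, cutoff_ms):
    """Fraction of events with lifetime > cutoff that belong to the slow component.

    Assumes a mixture of two exponential lifetime distributions, where
    frac_fast is the fraction of all events in the fast component.
    """
    if not 0.0 <= frac_fast <= 1.0:
        raise ValueError("frac_fast must lie between 0 and 1")
    frac_slow = 1.0 - frac_fast
    # Survival probability of an exponential lifetime past the cutoff is exp(-t/tau).
    surviving_fast = frac_fast * math.exp(-cutoff_ms / tau_fast_ms)
    surviving_slow = frac_slow * math.exp(-cutoff_ms / tau_slow_ms)
    total = surviving_fast + surviving_slow
    if total == 0.0:
        raise ValueError("no events survive past the cutoff")
    return surviving_slow / total


# 5.6 msec and 40 msec are taken from the figure description;
# tau_slow_ms=30.0 and frac_fast=0.9 are hypothetical values for illustration.
purity = slow_fraction_after_cutoff(
    tau_fast_ms=5.6, tau_slow_ms=30.0, frac_fast=0.9, cutoff_ms=40.0
)
print(f"fraction of slow-component events above cutoff: {purity:.3f}")
```

Because the fast component decays over roughly seven time constants before the 40 msec cutoff, even a large excess of fast events contributes little to the filtered distribution under these assumptions.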
  • FIG. 8 shows promoter binding by QD-RNAP.
  • FIG. 8A Schematic of substrate with a ligated promoter.
  • IDT 100-bp synthetic DNA fragment
  • FIG. 8B Binding site distribution.
  • the ligated DNA was used to assess QD-RNAP binding distributions in single-molecule DNA curtain assays. As shown here, the presence of the new promoter fragment resulted in a new peak of QD-RNAP in the binding distribution at the expected location.
  • FIG. 9 shows transcription by QD-RNAP.
  • Data on RNAP movement along the DNA in the presence of all four rNTPs were collected at room temperature.
  • RNAP and rNTPs were premixed prior to injection into the sample chamber.
  • the trajectories are color coded for each corresponding promoter, and the relative orientation of each promoter is indicated on the left.
  • FIG. 10 shows diffusion of lac repressor and T7 RNAP.
  • FIG. 10A E. coli RNAP compared to T7 RNAP; buffer conditions: 40 mM Tris-HCl (pH 8.0), 0.2 mg ml-1, 5 mM DTT for T7 RNAP and 1 mM DTT for E. coli RNAP.
  • FIG. 10B E. coli RNAP compared to lac repressor; buffer conditions: 10 mM Tris-HCl (pH 8.0), 1 mM MgCl2, 1 mM DTT, 1 mg ml-1 BSA.
  • the DNA used in the experiments with lac repressor contained a single, 21-bp symmetric lac operator, as indicated by the arrow.4
  • FIG. 10C E. coli compared to T7 RNAP; buffer conditions: 2.0 mM Tris-HCl (pH 8.0), 25 mM KCl, 1 mM MgCl2, 1 mM DTT, 0.2 mg ml-1 BSA.
  • FIG. 11 shows RNAP Bead-Aggregates Exhibit 1D Movement.
  • FIG. 11A Particle-tracking trajectory showing 1D diffusive motion of an RNAP-saturated bead (1.0 µm) bound to a DNA molecule in the absence of buffer flow.
  • FIG. 11B Trajectory of an RNAP-saturated bead (1.0 µm) when buffer flow (0.4 ml min-1) was applied in the direction indicated by the arrowhead.
  • FIG. 11C A typical trajectory of QD-tagged RNAP bound to DNA is shown for comparison.
  • FIG. 12 shows RNAP and dig-QD Diffusion Coefficient Data.
  • FIG. 12A Shows a comparison of the single molecule and ensemble diffusion coefficients obtained for QD-tagged RNAP and an immobilized dig-QD (this study), along with reported values for lac repressor 2, p53 5, and Mlh1-Pms1 6.
  • FIG. 12B Magnified view of the RNAP and dig-QD data sets. Red circles represent diffusion coefficients obtained from all individual particle-tracking trajectories for RNAP, and blue circles represent diffusion coefficients from dig-QD trajectories collected and analyzed under identical conditions. Squares represent ensemble values for the diffusion coefficients obtained from the cumulative tracking data along with corresponding error bars.
  • FIG. 13 shows Diffusion Coefficients and DNA Fluctuations.
  • FIG. 13A Cartoon illustration of DNA motion giving rise to the apparent diffusion coefficients for the stationary dig-QDs. The underlying fluctuations of the DNA were analyzed by linking a single QD to a fixed digoxigenin tag covalently attached to the double-tethered DNA molecules.7
  • FIG. 13B Distributions of single-frame displacements for data collected at either 5 or 10 Hz (as indicated) for the entire dig-QD data set. The distributions have been normalized, and the overlay is a Gaussian fit generated using the mean and standard error of the distribution. The number of individual displacements is indicated.
  • FIG. 13C Reference graphs showing the mean squared displacement analysis of the stationary dig-QD particles.
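The mean squared displacement (MSD) analysis mentioned for FIG. 13C can be sketched as follows. This is a minimal illustration, not the analysis used in the source: it assumes a 1D trajectory sampled at a fixed frame interval, uses the standard 1D relation MSD(t) = 2Dt, and fits a line through the origin, which ignores any constant offset from localization noise. The function names and the example trajectory are hypothetical.

```python
def mean_squared_displacement(positions, max_lag):
    """Time-averaged MSD of a 1D trajectory for lags 1..max_lag (in frames)."""
    if max_lag < 1 or max_lag >= len(positions):
        raise ValueError("max_lag must be between 1 and len(positions) - 1")
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [
            (positions[i + lag] - positions[i]) ** 2
            for i in range(len(positions) - lag)
        ]
        msd.append(sum(squared) / len(squared))
    return msd


def diffusion_coefficient_1d(msd, frame_interval_s):
    """Estimate D from a least-squares line through the origin, MSD = 2 * D * t."""
    times = [(k + 1) * frame_interval_s for k in range(len(msd))]
    slope = sum(t * m for t, m in zip(times, msd)) / sum(t * t for t in times)
    return slope / 2.0


# Hypothetical positions (in micrometers) sampled at 10 Hz, i.e. 0.1 s per frame.
trajectory_um = [0.00, 0.02, 0.01, 0.04, 0.03, 0.05, 0.02, 0.06]
msd_um2 = mean_squared_displacement(trajectory_um, max_lag=3)
d_um2_per_s = diffusion_coefficient_1d(msd_um2, frame_interval_s=0.1)
print(msd_um2, d_um2_per_s)
```

Applied to a stationary tag such as the dig-QD, a calculation of this kind yields an apparent diffusion coefficient that reflects fluctuations of the DNA rather than motion along it.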
  • FIG. 14 shows activity of RNAP Under Dilute Conditions.
  • FIG. 14A Gel shift assay showing RNAP and promoter association. RNAP was titrated into reactions containing 0.4 nM Cy3-labeled promoter DNA fragment and then challenged with heparin to disrupt protein-DNA complexes that had not formed open complexes. The right two lanes are examples of negative controls containing different amounts of input DNA, which was used to calibrate band intensity.
  • FIG. 14B Quantitation showing the fraction of bound promoter DNA fragment as a function of the RNAP to DNA ratio. (c) Stability of RNAP under dilute conditions. RNAP was diluted to 0.6 nM and incubated at room temperature in the absence of DNA.
  • the samples were assayed for DNA binding activity using the Cy3-labeled E promoter DNA fragment. Bound and unbound DNA fractions were separated by native gel electrophoresis and quantitated based on the fluorescence intensity of the bands.
  • FIG. 15 shows Parallel Array of Double-tethered Isolated (PARDI) Molecules.
  • FIG. 15A Schematic diagram of the new PARDI DNA curtain design used for the promoter association rate measurements.
  • FIG. 15B Optical image highlighting nanofabricated PARDI pattern design.
  • FIG. 15C Image of a typical PARDI field-of-view, showing the double-tethered, YOYO1-stained DNA molecules.
  • FIG. 15D Histogram showing the measured distances between neighboring DNA molecules anchored to the PARDI patterned surface.
  • FIG. 16 shows promoter Association Rate Analysis.
  • FIG. 16A Illustration of a PARDI curtain, where each DNA is numbered. Overlapping DNA and DNA molecules closer than 7- ⁇ are excluded from analysis,
  • FIG. 16B A kymogram is made for each DNA in the field-of-view, and the time required for detection of the first promoter bound protein on each individual DNA is extracted from the kymograms (e.g.: Ll ⁇ ⁇ 2 : for DNA #1 & #2, respectively). Importantly, we are not measuring the rate of either closed complex (cc;
  • FIG. 17 is a schematic of a DNA molecule tethered to lipid-coated flow cell.
  • FIG. 18 is a schematic of the procedures for making ssDNA curtains.
  • ssDNA is generated by rolling circle replication.
  • FIG. 18B Agarose gel showing the products of rolling circle replication; note that the ssDNA generated in these assays is too long to verify its length by electrophoresis.
  • FIG. 18C For single-tethered curtains, biotinylated ssDNA is anchored to a single lipid within the bilayer, and the DNA is then aligned at barriers through the application of hydrodynamic force. RPA-GFP is then introduced into the flow cell to label the DNA and remove secondary structure.
  • FIG. 18D For double-tethered curtains, the RPA-ssDNA is nonspecifically adsorbed to exposed anchor points downstream from the linear diffusion barriers.
  • FIG. 19 shows single-tethered ssDNA curtains.
  • FIG. 19A Kymogram showing RPA-dependent extension of an ssDNA substrate; ScRPA-eGFP was injected at time zero, the eGFP signal is in green, and the location of the linear barrier is indicated as "b".
  • FIG. 19B Transient pause of flow confirms that the scRPA-eGFP-ssDNA is not stuck to the sample chamber surface.
  • FIG. 19C Full-field view of ssDNA molecules labeled with scRPA-eGFP. The six linear barriers are marked b1-b6. Image was collected while buffer was flowing through the sample chamber.
  • FIG. 19D scRPA-eGFP remains bound to the ssDNA for long periods of time.
  • FIG. 19E The scRPA-eGFP-ssDNA complex is resistant to buffers containing denaturant (3.5 M urea); note that the background increases while urea is flushed through the sample chamber, likely due to protein being stripped off the microfluidics upstream of the observation area.
  • FIG. 20 shows double-tethered ssDNA curtains.
  • FIG. 20A Full-field view of extended scRPA-eGFP labeled ssDNA anchored by both ends to the sample chamber surface, as illustrated in Figure 3 SB; the linear barrier and anchor are indicated as “b” and "a”, respectively. Image was collected in the absence of flow.
  • FIG. 20B Kymogram of a double-tethered ssDNA in the presence and absence of buffer flow, as indicated, confirming the molecule remains confined within the evanescent field even in the absence of an externally applied hydrodynamic force.
  • FIG. 20C Kymogram showing QD-tagged Sgs1 bound to an ssDNA molecule.
  • the ssDNA is in green (scRPA-eGFP; upper panel), the QD-tagged Sgs1 is shown in magenta (middle panel), and an overlay of the ScRPA-eGFP and QD-Sgs1 is also shown (bottom panel).
  • FIG. 20D Kymogram shows an example where the ssDNA spontaneously breaks during observation. Both the ssDNA and the Sgs1 immediately diffuse out of view, confirming they are not nonspecifically adsorbed to the surface.
  • FIG. 21 Mismatch recognition by MutSa
  • FIG. 21A Schematic of single-tethered DNA curtains. DNA substrates are anchored to the bilayer and aligned along nanofabricated barriers.
  • FIG. 21B Images of a three-tiered DNA curtain with flow on (Left), during a transient pause in flow (Center), and after flow has been resumed (Right). Flow is from top to bottom; DNA is green, and proteins are magenta. The location of the three tandem G/T mismatches (MM) is indicated.
  • FIG. 21C Kymogram generated from a single DNA molecule subjected to transient pauses in buffer flow (light blue arrowheads) followed by quickly resuming flow (green arrowheads).
  • FIG. 21D Distribution of MutSa bound to mismatch-containing DNA. Error bars in this and subsequent figures represent the SD from N bootstrap samples (B44).
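The bootstrap error bars mentioned for FIG. 21D can be illustrated with a generic resampling sketch. The source does not describe the procedure in detail, so the code below is an assumption: it resamples the observed binding positions with replacement, rebuilds the histogram each time, and reports the standard deviation of the counts in each bin. The function names, bin edges, and example positions are hypothetical.

```python
import random
import statistics


def histogram(positions, bin_edges):
    """Count positions falling in each half-open bin [edge_i, edge_i+1)."""
    counts = [0] * (len(bin_edges) - 1)
    for x in positions:
        for b in range(len(counts)):
            if bin_edges[b] <= x < bin_edges[b + 1]:
                counts[b] += 1
                break
    return counts


def bootstrap_bin_sd(positions, bin_edges, n_boot=1000, seed=0):
    """Per-bin standard deviation of histogram counts over bootstrap resamples."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    per_bin = [[] for _ in range(len(bin_edges) - 1)]
    for _ in range(n_boot):
        # Resample the same number of positions, with replacement.
        resample = rng.choices(positions, k=len(positions))
        for b, count in enumerate(histogram(resample, bin_edges)):
            per_bin[b].append(count)
    return [statistics.pstdev(counts) for counts in per_bin]


# Hypothetical binding positions (kb along the DNA) and 10-kb bins.
observed_kb = [3.1, 12.4, 24.8, 25.2, 25.9, 26.3, 31.0, 44.7]
edges_kb = [0, 10, 20, 30, 40, 50]
print(histogram(observed_kb, edges_kb))
print(bootstrap_bin_sd(observed_kb, edges_kb, n_boot=200))
```

Resampling the positions rather than the binned counts keeps the total number of events fixed in every bootstrap sample, which is one common design choice among several.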
  • FIG. 22 Mechanisms of mismatch targeting by MutSa
  • FIG. 22A Schematic of the double-tethered DNA curtains. DNA substrates are anchored by one end to the lipid bilayer, are aligned along the nanofabricated barriers, and then are anchored at their downstream ends through a digoxigenin-antibody linkage.
  • FIG. 22B Example of MutSa undergoing 1D diffusion until encountering the lesion. MutSa is magenta, the DNA is not labeled, and gaps in the trajectories reflect QD blinking. The lower panels highlight the first few seconds of the trajectory.
  • FIG. 22C Example of MutSa capturing the mismatch through direct 3D diffusion. Experiments in B and C were conducted with double-tethered curtains, and flow was terminated after MutSa entered the sample chamber.
  • FIG. 23 ATP binding provokes 1D diffusion of mismatch-bound MutSa.
  • FIG. 23A Models showing how MutSa might search for strand-discrimination signals (SS).
  • FIG. 23B Kymogram and tracking showing the response of mismatch-bound MutSa upon injection of 1 mM ATP. Experiments were conducted with double-tethered curtains, MutSa was prebound to the mismatch, ATP was injected at 0.1 ml min-1, and flow was terminated after ATP entered the sample chamber. The DNA was not labeled. "Flow on" indicates when ATP injection was initiated, and "ATP arrival" indicates when ATP entered the sample chamber.
  • FIG. 23C Response of mismatch-bound MutSa upon injection of 1 mM ATPγS.
  • FIG. 23D Example of spontaneous, ATP-independent release of MutSa, followed by ATP-dependent release
  • FIG. 24 Colocalization of MutLa with mismatch-bound MutSa.
  • FIG. 24A MutLa binding to mismatch-bound MutSa on single-tethered DNA curtains. MutSa was bound to the mismatch, followed by injection of MutLa into the sample chamber. MutLa and MutSa were labeled with different colored QDs. (Top) MutSa. (Middle) MutLa. (Bottom) Overlay with MutSa (magenta) and MutLa (green). The DNA was not labeled. Imperfect correspondence between all individual QD green/magenta pairs reflects the presence of "dark" proteins. (FIG.
  • FIG. 24B Kymogram generated from a single DNA molecule showing that MutLa remains stationary and colocalized with mismatch-bound MutSa over time; the green (MutLa) and magenta (MutSa) signals appear white in the overlay. Blue and green arrowheads indicate transient pauses in buffer flow, and the coincident disappearance of the QD signals verifies that neither protein was stuck to the sample chamber surface.
  • FIG. 24C Distribution of QD-tagged MutLa in the presence of QD-tagged MutSa.
  • FIG. 24D Distribution of QD-tagged MutLa with unlabeled MutSa. Insets in C and D show kymograms illustrating that MutSa/MutLa remains at the mismatch.
  • FIG. 25 Target-search mechanisms of MutLa and the MutSa/MutLa complex.
  • FIG. 25A Kymograms and tracking data showing examples of QD-tagged MutLa (green) engaging mismatch-bound MutSa (MM-MutSa; unlabeled) after undergoing a 1D or 3D target search.
  • FIG. 25B Kymogram and tracking showing that MutLa does not stop at MutSa in the absence of a mismatch.
  • FIG. 25C Kymogram showing that MutSa and MutLa do not establish stable interactions with one another on homoduplex DNA.
  • FIG. 25D Kymogram and tracking showing ATP-triggered release of lesion-bound MutSa/MutLa and subsequent 1D diffusion along the flanking DNA. In the kymogram, the MutSa (magenta) and MutLa (green) signals appear white in the overlay. In the graph, MutSa (magenta) and MutLa (green) were tracked independently, and the tracking data were superimposed.
  • FIG. 26 Intersite transfer of MMR proteins between juxtaposed DNA molecules
  • FIG. 26A Schematic of a crisscrossed DNA curtain.
  • FIG. 26B and FIG. 26C Optical images showing pattern elements and TIRFM images showing examples of crisscrossed DNA molecules. Insets illustrate positions of each DNA molecule.
  • FIG. 26E-G Behavior of MutLa (magenta) upon encountering a crisscrossed DNA junction.
  • FIG. 26E Integrated trajectory.
  • FIG. 26F Tracking data
  • FIG. 26G Tracking data superimposed on the DNA axes.
  • DNA molecules are shown as blue lines; the green circle identifies the DNA junction within a 90% confidence interval.
  • Tracking data are color-coded according to location relative to the crisscross. The color-coded bar shows the relative location of protein over time.
  • FIG. 26H MutSa before encountering a lesion.
  • FIG. 26I MutSa after ATP-triggered release from a mismatch.
  • FIG. 26J MutSa/MutLa after ATP-triggered mismatch release; MutLa was QD-tagged, and MutS was untagged.
  • In I and J, the zero time points correspond to the location of the lesions, and the longer time trajectories for these datasets reflect the longer DNA-binding lifetimes of MutSa and MutSa/MutLa after ATP-triggered release from mismatches.
  • FIG. 27 Schematic for early stages of MMR.
  • FIG. 27A Model summarizing how MutSa finds lesions, how MutLa locates lesion-bound MutSa, and how the MutSa/MutLa scans the flanking DNA by 1D diffusion after ATP-triggered release from a lesion.
  • FIG. 27B MutSa structural changes predicted upon ATP-triggered release from a lesion.
  • The structures represent front and side views of mismatch-bound human MutSa (Protein Data Bank ID code 2O8B) in which the protein complex (gray) is wrapped around the DNA (green) with domain I of Msh6 (magenta) engaged with the mismatched base (B33). (Right) Hypothetical structures were obtained by rigid-body rotation of Msh6 domain I out of the major groove to illustrate how retraction of Msh6 domain I out of the DNA major groove might allow the release of MutSa from the mismatch and still allow the protein to remain tightly wrapped around the DNA while enabling 1D diffusion in the absence of an obligatory rotational component.
  • FIG. 28 Construction and characterization of mismatch substrate.
  • FIG. 28a Overview of λI3-DNA construction. Highlights sites used for inserting the new DNA fragments, as well as the number and arrangement of the Nt.BspQI nickase sites, and restriction sites for NcoI and SwaI.
  • FIG. 28b Schematic of oligonucleotide insertion strategy. The λI3-DNA is green, and the nicking sites are indicated with arrowheads. After treatment with Nt.BspQI, the λ-DNA is mixed with an excess of the appropriate
  • FIG. 28c Restriction analysis of the λI3-DNA substrates.
  • The λI3-DNA substrate has three unique restriction fragments not present in the wt phage: digestion with SwaI yields a 13,828 bp fragment (purple asterisk) and digestion with NcoI liberates two fragments 9,959 bp (red asterisk) and 5,671 bp (blue asterisk) in length. Insertion of the mismatch eliminates the two fragments produced by digestion with NcoI (because the mismatch disrupts the NcoI site), but does not affect the SwaI fragment.
  • FIG. 29 Half-life of MutSa bound at mismatches and MutLa bound to the MutSa-mismatch complex.
  • FIG. 29a QD-tagged MutSa was bound to the 3x mismatches on a single-tethered DNA curtain as shown in Fig. 21 of the main text. The lesion-bound proteins were then chased with buffer that lacked additional free protein and contained 150 mM NaCl and 1 mM ADP (along with 20 mM Tris [pH 7.8], 1 mM MgCl2, 1 mM DTT, and 4 mg ml-1 BSA).
  • The number of QD-tagged MutLa proteins that remained bound to the DNA was measured at defined time intervals, and the resulting data were fit with single exponential curves to determine the half-lives of the lesion-bound proteins, yielding a value of 7.8 ± 0.4 minutes.
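  • The single-exponential analysis described for FIG. 29 can be illustrated with a minimal sketch. It assumes the bound-protein count follows N(t) = N0·exp(-kt), so that the half-life is ln(2)/k, and it substitutes a log-linear least-squares fit for whatever curve-fitting routine was actually used. The function name and the survival counts are hypothetical; the counts are synthetic and generated only to exercise the code.

```python
import math

def fit_half_life(times_min, counts):
    """Fit ln(count) vs. time by least squares; return (rate constant, half-life)."""
    logs = [math.log(c) for c in counts]
    n = len(times_min)
    t_mean = sum(times_min) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, logs))
    k = -sxy / sxx                      # decay rate constant, per minute
    return k, math.log(2) / k           # half-life in minutes

# Synthetic, noise-free counts built from an assumed 7.8 min half-life.
k_true = math.log(2) / 7.8
times = [0, 2, 4, 6, 8, 10, 12]
counts = [100 * math.exp(-k_true * t) for t in times]

k_fit, t_half = fit_half_life(times, counts)
print(f"half-life = {t_half:.2f} min")
```

  • With real data, noise at low counts would argue for a weighted or nonlinear fit; the log-linear form is used here only because it needs no external library.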
  • FIG. 30 Mismatch targeting by MutSa.
  • FIG. 30a Shows 10 representative examples of tracking data (magenta) for molecules of MutSa that engaged the mismatches through a 1D search. The initial binding positions of the proteins are indicated with blue arrowheads, the location of the mismatches is indicated as MM and a green line, and lesion engagement is indicated with black arrowheads.
  • FIG. 30b Shows a map of the initial binding sites for all observed molecules of MutSa that bound to the lesions.
  • Gray arrowheads correspond to proteins that bound directly to the mismatches through an apparent 3D mechanism (within our optical resolution limits) and blue arrowheads correspond to proteins that bound to nonspecific DNA sites and slid in 1D along the DNA to engage the lesions.
  • FIG. 30c Shown are five representative examples of MSD plots generated from the tracking data of MutSa as it searches for lesions.
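  • As a hedged illustration of how an MSD plot can be generated from tracking data, the sketch below computes the mean squared displacement at several lag times and estimates a 1D diffusion coefficient from MSD = 2Dt. The trajectory is a simulated random walk; the frame interval, the diffusion coefficient, and all function names are assumptions for the example, not values from the figures.

```python
import random

def msd(positions, lag):
    """Mean squared displacement at a given lag (in frames)."""
    disps = [positions[i + lag] - positions[i] for i in range(len(positions) - lag)]
    return sum(d * d for d in disps) / len(disps)

def diffusion_coefficient_1d(positions, dt, max_lag=10):
    """Slope of MSD vs. time (line through the origin) divided by 2."""
    lags = range(1, max_lag + 1)
    ts = [lag * dt for lag in lags]
    msds = [msd(positions, lag) for lag in lags]
    slope = sum(t * m for t, m in zip(ts, msds)) / sum(t * t for t in ts)
    return slope / 2.0

# Simulated trajectory: hypothetical D in um^2/s and frame interval in s.
random.seed(0)
D_true, dt = 0.04, 0.1
sigma = (2 * D_true * dt) ** 0.5        # per-frame step standard deviation
x = [0.0]
for _ in range(20000):
    x.append(x[-1] + random.gauss(0.0, sigma))

D_est = diffusion_coefficient_1d(x, dt)
print(f"estimated D = {D_est:.4f} um^2/s")
```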
  • FIG. 31 1D diffusion of mismatch-bound MutSa following ATP or ATPγS chase.
  • FIG. 31a Shows 5 representative examples of tracking data illustrating the behavior of mismatch-bound MutSa after the injection of 1 mM ATP.
  • FIG. 31b Shows 5 representative examples of tracking data illustrating the behavior of mismatch-bound MutSa after the injection of 1 mM ATPγS. Gaps in the tracking data correspond to portions of the trajectories that could not be accurately tracked due to QD blinking or changes in background intensity, and the end-points in the tracking data correspond to dissociation of the proteins from the DNA. In both FIG. 31a and FIG. 31b, black arrowheads indicate when ATP or ATPγS enters the microfluidic sample chamber, and blue arrowheads indicate when the protein dissociates from the DNA. For traces lacking blue arrowheads, the proteins remained bound to the DNA and continued diffusing beyond the data collection window.
  • FIG. 31c Shown are five representative examples of MSD plots generated from the tracking data of MutSa after being released from lesions upon chasing with ATP.
  • FIG. 33 Redundant nature of 1D diffusion.
  • FIG. 33a-b Results of simulations of a 1D random walk, which were used to reveal the average number of steps (N) necessary to move a given distance along a 1D lattice (FIG. 33a), and the difference between the number of steps (N) taken and the average number of origin (or mismatch) crossings (FIG. 33b).
  • FIG. 33c Theoretical number of origin crossings versus the number of steps taken.
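  • The kind of simulation summarized in FIG. 33 can be sketched as follows. This is a minimal illustration and not the simulation used for the figures: it assumes an unbiased walk with unit steps on an integer lattice, counts returns to the starting site as a stand-in for origin (or mismatch) crossings, and uses hypothetical function names, distances, and trial counts.

```python
import random

def steps_to_reach(distance, rng):
    """Steps taken until the walker is first `distance` sites from its start."""
    pos, steps = 0, 0
    while abs(pos) < distance:
        pos += 1 if rng.random() < 0.5 else -1
        steps += 1
    return steps

def origin_returns(n_steps, rng):
    """Number of times an n-step walk revisits its starting site."""
    pos, visits = 0, 0
    for _ in range(n_steps):
        pos += 1 if rng.random() < 0.5 else -1
        if pos == 0:
            visits += 1
    return visits

rng = random.Random(1)
mean_steps = sum(steps_to_reach(10, rng) for _ in range(2000)) / 2000
mean_returns = sum(origin_returns(1000, rng) for _ in range(1000)) / 1000

# Moving 10 sites takes on the order of 10**2 steps, and the walker revisits
# its origin many times along the way, which is the redundancy of 1D diffusion.
print(f"mean steps to move 10 sites: {mean_steps:.1f}")
print(f"mean origin returns in 1000 steps: {mean_returns:.1f}")
```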
  • FIG. 34 Spontaneous mismatch escape and return by MutSa. Data were collected by monitoring lesion-bound MutSa in buffer containing 1 mM ADP and 150 mM NaCl (along with 20 mM Tris [pH 7.8], 1 mM MgCl2, 1 mM DTT, and 4 mg ml-1 BSA). The proteins were continuously observed for ten minutes at 10 Hz.
  • the tracking data (FIG. 34a) highlight examples of different MutSa molecules that spontaneously escaped from the mismatched bases. The MutSa trajectories are shown in magenta, and the location of the mismatches (MM) is indicated by the green line. (FIG.
  • FIG. 35 Mismatch-MutSa targeting by MutLa.
  • FIG. 35a Shows 10 representative examples of tracking data for molecules of MutLa that engaged mismatch-bound MutSa through a 1D search. The initial binding positions of the proteins are indicated with blue arrowheads, the locations of the MutSa-mismatches are indicated, and lesion engagement is indicated with black arrowheads.
  • FIG. 35b Shows a map of the initial binding sites for all observed molecules of MutLa that bound to mismatch-bound MutSa.
  • Gray arrowheads correspond to MutLa proteins that bound directly by apparent 3D collisions to the mismatch-bound MutSa (within optical resolution limits) and blue arrowheads correspond to MutLa proteins that bound to nonspecific DNA sites and slid in 1D along the DNA to engage the mismatch-bound MutSa.
  • FIG. 35c Five representative examples of MSD plots generated from the tracking data of MutLa as it searches for lesion-bound MutSa.
  • FIG. 36 ATP-triggered release of the mismatch-bound MutSa/MutLa complex.
  • FIG. 36a Ten representative examples of tracking data illustrating the behavior of mismatch-bound MutSa after injection of ATP, including complexes that exhibited blinking of QD-MutLa (top eight traces) as well as nonblinking QD-MutLa (bottom two traces; QD-MutS exhibited blinking in all observed cases).
  • The black arrowheads indicate when ATP entered the flowcells.
  • Gaps in the tracking data correspond to portions of the trajectories that could not be accurately tracked due to QD blinking or changes in background intensity.
  • FIG. 36b Five representative MSD plots for the MutS/MutL complex after ATP-triggered lesion release.
  • FIG. 36c The distribution of lifetimes measured for MutSa/MutLa after being released from the mismatches upon ATP injection (N = 18). These values yield a lower bound on the lifetime of the diffusing
  • FIG. 37 Barrier patterns for crisscrossed DNA curtains.
  • FIG. 37a Schematic of the two-channel flowcell.
  • FIG. 37b Low magnification (10x) optical image of the chromium (Cr) patterned surface.
  • FIG. 37c and d High-resolution SEM (scanning electron microscope) images of the Cr patterns.
  • FIG. 37e High magnification (100x) optical image of a single pattern. Important elements of the pattern design are highlighted.
  • FIG. 37f AFM (atomic force microscope) image illustrating barrier height.
  • the present invention is based in part on the discovery that single-stranded nucleic acid molecules can be disposed on a substrate and positionally aligned to allow analysis of individual single-stranded nucleic acid molecules.
  • the methods and compositions described herein include a substrate, coating material, e.g., a lipid bilayer, and single-stranded nucleic acid molecules attached directly to the substrate, attached to the substrate via a linkage, or attached to the lipid layer via a linkage.
  • the single-stranded nucleic acids are capable of interacting with their specific targets while attached to the substrate, and by appropriate labeling of the nucleic acid molecules and the targets, the sites of the interactions between the targets and the nucleic acid molecules may be derived.
  • the sites of the interactions will define the specificity of each interaction.
  • a map of the patterns of interactions with single-stranded nucleic acid molecules on the substrate is convertible into information on specific interactions between single-stranded nucleic acid molecules and targets.
  • any conceivable substrate may be employed in the compositions and methods described herein.
  • the substrate may be biological, nonbiological, organic, inorganic, or a combination of any of these, existing, e.g., as particles, strands, precipitates, gels, sheets, tubing, spheres, containers, capillaries, pads, slices, films, plates, or slides.
  • the substrate may have any convenient shape, such as, e.g., a disc, square, sphere or circle.
  • the substrate and its surface can form a rigid support on which to carry out the reactions described herein.
  • the substrate can be, e.g., a polymerized Langmuir-Blodgett film, functionalized glass, Si, Ge, GaAs, GaP, SiO2, SiN4, modified silicon, or any one of a wide variety of gels or polymers such as (poly)tetrafluoroethylene, (poly)vinylidenedifluoride, polystyrene, polycarbonate, or combinations thereof.
  • Other substrate materials will be readily apparent to those of skill in the art upon review of this disclosure.
  • the substrate is made of SiO2 and is flat.
  • the substrate is coated with a linker to which the nucleic acid molecules attach.
  • linkers can be, e.g., chemical or protein linkers.
  • the substrate can be coated with a protein such as neutravidin or an antibody.
  • the substrate includes a diffusion barrier, e.g., a mechanical, chemical or protein barrier.
  • Diffusion barriers can be prepared by applying barrier materials onto the substrate prior to deposition of the lipid bilayer; the bilayer then forms around the barriers.
  • a mechanical barrier can be, e.g., a scratch or etch on the substrate, which physically prevents lipid diffusion.
  • barrier materials can be made that are similar to the thickness of the bilayer itself (e.g., 6-8 nm), or thinner than the bilayer.
  • Protein barriers can be deposited onto substrates, e.g., SiO2 substrates, by a variety of methods. For example, protein barriers can be deposited in well-defined patterns by a process called microcontact printing. Microcontact printing uses a PDMS
  • PDMS stamps can transfer proteins to a SiO2 substrate in patterns with features as small as 1 μm, and thicknesses on the order of 5-10 nm.
  • the PDMS stamps used for microcontact printing can be made, e.g., by soft-lithography as described previously.
  • the PDMS can be incubated with a solution of protein, dried, and then placed into contact with the substrate, e.g., SiO2, resulting in transfer of the protein "ink" from the PDMS stamp to the substrate and yielding a pattern defined by the stamp design.
  • protein barriers can be made from fibronectin.
  • the material is one that renders the substrate inert.
  • the material can be lipids, forming, e.g., a lipid bilayer.
  • the layer is made of zwitterionic lipids. A lipid bilayer can be deposited onto the substrate by applying liposomes to the substrate.
  • Liposomes can be produced by known methods from, e.g., 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC) or 0.5% biotin-phosphatidylethanolamine (biotin-PE) plus 99.5% DOPC (Avanti Polar Lipids, Alabaster, AL).
  • the lipid bilayer can include polyethylene glycol (PEG).
  • PEG can be included in the lipid bilayer.
  • PEG can also be included to make the surface of the bilayer inert to reagents added to the array.
  • the nucleic acid molecules can be attached to the substrate, to the lipid biiayer, or to the non-linear, geometric diffusion barrier, to form an array.
  • the nucleic acid molecules can be attached by a linkage either at one end of the nucleic acid molecule or at both ends.
  • the nucleic acid molecule can be linked to a cognate protein that binds to the protein coated on the substrate.
  • the substrate is coated with neutravidin and the nucleic acid molecule linker is biotin.
  • Linkers can be added to the nucleic acid molecules using standard molecular biology techniques known to those of ordinary skill in the art.
  • the nucleic acid molecule can be linked to the lipid bilayer.
  • the lipid bilayer is deposited onto the substrate and a protein, e.g., neutravidin, is linked to the lipid head groups. Biotinylated nucleic acid molecules are then introduced, linking the nucleic acid molecules to the lipid bilayer.
  • the nucleic acid molecules can be linked to the nonlinear, geometric diffusion barriers.
  • the diffusion barrier is a protein, e.g., biotinylated bovine serum albumin (BSA), deposited on the substrate. Neutravidin is then bound directly to the biotinylated BSA protein barriers, and biotinylated nucleic acid molecules are linked to the biotinylated BSA protein barriers.
  • Other known protein-cognate protein pairs can be used in the methods described herein.
  • Antibodies, e.g., anti-digoxigenin antibodies, can be used as protein barriers and the cognate antigen, e.g., digoxigenin, linked to the nucleic acid molecule.
  • one end of the nucleic acid molecule is attached by a linkage, for example to the substrate or to a non-linear, geometric diffusion barrier.
  • both ends of the nucleic acid molecule are attached by linkages, for example, to the substrate, to a non-linear, geometric diffusion barrier, or to a combination of the two surfaces.
  • Double-tethered DNA substrates can be used for visualizing 1D diffusion.
  • DNA molecules can be biotinylated at both ends. While a constant, moderate hydrodynamic flow force is applied, DNA is suspended above an inert lipid bilayer. The only interaction between the DNA and the surface is through the biotinylated ends of the molecule. For example, 80% extension of the DNA molecule corresponds to ~0.5 pN of force (e.g., where the DNA is not distorted).
  • attaching both ends of a nucleic acid molecule to the barriers, inert substrate, or a combination thereof, can generate a "rack."
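  • The relation between fractional extension and stretching force quoted above can be checked with a short sketch. It assumes the worm-like chain interpolation formula of Marko and Siggia, F = (kT/P)[1/(4(1 - x)^2) - 1/4 + x], with a persistence length P of 50 nm and kT of 4.11 pN·nm; neither the model nor these parameter values is stated in the text, so they are assumptions made for illustration, and the function name is hypothetical.

```python
def wlc_force_pN(relative_extension, persistence_length_nm=50.0, kT_pN_nm=4.11):
    """Marko-Siggia worm-like chain interpolation: force at a fractional extension."""
    x = relative_extension
    return (kT_pN_nm / persistence_length_nm) * (0.25 / (1.0 - x) ** 2 - 0.25 + x)

force = wlc_force_pN(0.8)   # DNA stretched to 80% of its contour length
print(f"force at 80% extension = {force:.2f} pN")
```

  • Under these assumptions the result is about 0.56 pN, in line with the roughly 0.5 pN figure given for 80% extension.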
  • the "rack" can be generated by reversibly anchoring the entire contour length of the nucleic acid molecule (e.g., a DNA molecule) to the lipid bilayer of an array described herein by exposing the nucleic acid molecules to an effective calcium concentration.
  • the calcium concentration is at least about 0.5 mM, at least about 1 mM, at least about 1.5 mM, at least about 2 mM, at least about 2.5 mM, at least about 3 mM, at least about 3.5 mM, at least about 4 mM, at least about 4.5 mM, at least about 5 mM, at least about 5.5 mM, at least about 6 mM, at least about 6.5 mM, at least about 7 mM, at least about 7.5 mM, at least about 8 mM, at least about 8.5 mM, at least about 9 mM, at least about 9.5 mM, at least about 10 mM, or at least about 10.5 mM.
  • single-stranded nucleic acid can be generated by in vitro rolling circle replication.
  • the DNA polymerase φ29 can be used to generate single-stranded nucleic acids by rolling circle replication using a circular single-stranded nucleic acid template.
  • rolling circle replication can occur on the solid support.
  • the attached nucleic acid molecules and/or the interacting nucleic acid molecules or polypeptides are visualized by detecting one or more labels attached to the nucleic acid molecules or polypeptides.
  • the labels may be incorporated by any of a number of means well known to those of skill in the art.
  • the nucleic acid molecules on the array can be coupled to a nonspecific label, e.g., a dye, e.g., a fluorescent dye, e.g., YOYO1 (Molecular Probes, Eugene, OR), TOTO1, TO-PRO, acridine orange, DAPI and ethidium bromide, that labels the entire length of the nucleic acid molecule.
  • the nucleic acid molecules can also be labeled with Quantum dots, as described herein.
  • the nucleic acid molecules e.g., the nucleic acid molecules on the array or target nucleic acid molecules
  • the label can be incorporated during an amplification step in the preparation of the sample nucleic acids.
  • Polymerase chain reaction (PCR)
  • the nucleic acid molecule is amplified in the presence of labeled deoxynucleotide triphosphates (dNTPs).
  • a label may be added directly to the nucleic acid molecule or to an amplification product after an amplification is completed.
  • Means of attaching labels to nucleic acids include, for example, nick translation or end-labeling (e.g. with a labeled RNA) by kinasing of the nucleic acid and subsequent attachment (ligation) of a nucleic acid linker joining the sample nucleic acid to a label (e.g., a fluorophore).
  • Detectable labels suitable for use in the methods and compositions described herein include any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means.
  • Useful labels include biotin for staining with labeled streptavidin conjugate, magnetic beads (e.g., Dynabeads™), fluorescent dyes (e.g., fluorescein, Texas red, rhodamine, green fluorescent protein, and the like, see, e.g., Molecular Probes, Eugene, OR), radiolabels (e.g., 3H, 125I, 35S, 14C, or 32P), enzymes (e.g., horse radish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold (e.g., gold particles in the 40-80 nm diameter size range scatter green light with high efficiency) or colored glass or plastic (e.g., polystyrene, poly
  • fluorescent labels are used.
  • the nucleic acid molecules can all be labeled with a single label, e.g., a single fluorescent label.
  • different nucleic acid molecules have different labels.
  • one nucleic acid molecule can have a green fluorescent label and a second nucleic acid molecule can have a red fluorescent label.
  • Suitable chromogens which can be employed include those molecules and compounds that absorb light in a distinctive range of wavelengths so that a color can be observed or, alternatively, which emit light when irradiated with radiation of a particular wavelength or wavelength range, e.g., fluorescers.
  • Suitable dyes are available, being primarily chosen to provide an intense color with minimal absorption by their surroundings.
  • Illustrative dye types include quinoline dyes, triarylmethane dyes, acridine dyes, alizarine dyes, phthaleins, insect dyes, azo dyes, anthraquinoid dyes, cyanine dyes, phenazathionium dyes, and phenazoxonium dyes.
  • fluorescers can be employed either alone or, alternatively, in conjunction with quencher molecules. Fluorescers of interest fall into a variety of categories having certain primary functionalities. These primary functionalities include 1- and 2-aminonaphthalene, p,p'-diaminostilbenes, pyrenes, quaternary phenanthridine salts, 9-aminoacridines, p,p'-diaminobenzophenone imines, anthracenes, oxacarbocyanine, merocyanine, 3-aminoequilenin, perylene, bisbenzoxazole, bis-p-oxazolyl benzene, 1,2-benzophenazin, retinol, bis-3-aminopyridinium salts, hellebrigenin, tetracycline, sterophenol, benzimidazolylphenylamine, 2-oxo-3-chromen, indo
  • Individual fluorescent compounds that have functionalities for linking or that can be modified to incorporate such functionalities include, e.g., dansyl chloride; fluoresceins such as 3,6-dihydroxy-9-phenylxanthhydrol; rhodamineisothiocyanate; N-phenyl 1-amino-8-sulfonatonaphthalene; N-phenyl 2-amino-6-sulfonatonaphthalene; 4-acetamido-4-isothiocyanato-stilbene-2,2'-disulfonic acid; pyrene-3-sulfonic acid; 2-toluidinonaphthalene-6-sulfonate; N-phenyl, N-methyl 2-aminonaphthalene-6-sulfonate; ethidium bromide; stebrine; auromine-0,2-(9'-anthroyl)palmitate; dansyl phosphatidylethanolamine; ⁇ , ⁇ '-d
  • the label may be a "direct label", i.e., a detectable label that is directly attached to or incorporated into the nucleic acid molecule.
  • the label may be an "indirect label", i.e., a label joined to the nucleic acid molecule after attachment to the substrate.
  • the indirect label can be attached to a binding moiety that has been attached to the nucleic acid molecule prior to attachment to the substrate.
  • Polypeptides can be visualized by coupling them to, e.g., fluorescent labels described herein, using known methods.
  • Fluorescent labels include, e.g., fluorescent dyes.
  • other labels such as Quantum dots (Invitrogen) can be used, as described herein.
  • a fluorescent label is an embodiment of the invention.
  • Standard procedures are used to determine the positions of the nucleic acid molecules and/or a target, e.g., a second nucleic acid molecule or a polypeptide.
  • the position of a nucleic acid molecule on an array described herein can be detected by the signal emitted by the label.
  • the locations of both the nucleic acid molecules on the array and the target will exhibit significant signal.
  • In addition to using a label, other methods may be used to scan the matrix to determine where an interaction, e.g., between a nucleic acid molecule on an array described herein and a target, takes place.
  • the spectrum of interactions can, of course, be determined in a temporal manner by repeated scans of interactions that occur at each of a multiplicity of conditions.
  • a multiplicity of interactions can be simultaneously determined on an array, e.g., an array described herein.
  • the array is excited with a light source at the excitation wavelength of the particular fluorescent label and the resulting fluorescence at the emission wavelength is detected.
  • the excitation light source is a laser appropriate for the excitation of the fluorescent label.
  • Detection of the fluorescence signal can utilize a microscope, e.g., a fluorescent microscope.
  • the microscope may be equipped with a phototransducer (e.g., a photomultiplier, a solid state array, or a CCD camera) attached to an automated data acquisition system to automatically record the fluorescence signal produced by the nucleic acid molecules and/or targets on the array.
  • automated systems are known in the art.
  • Use of laser illumination in conjunction with automated confocal microscopy for signal detection permits detection at a resolution of better than about 100 μm, better than about 50 μm, and better than about 25 μm.
  • the detection method can also incorporate some signal processing to determine whether the signal at a particular position on the array is a true positive or may be a spurious signal. For example, a signal from a region that has actual positive signal may tend to spread over and provide a positive signal in an adjacent region that actually should not have one. This may occur, e.g., where the scanning system is not properly discriminating with sufficiently high resolution in its pixel density to separate the two regions. Thus, the signal over the spatial region may be evaluated pixel by pixel to determine the locations and the actual extent of positive signal. A true positive signal should, in theory , show a uniform signal at each pixel location. Thus, processing by plotting number of pixels with actual signal intensity should have a clearly uniform signal intensity. Regions where the signal intensities show a fairly wide dispersion, may be particularly suspect and the scanning system may be programmed to more carefully scan those positions.
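  • One way to picture the pixel-by-pixel evaluation described above is the sketch below, which flags a region as suspect when its pixel intensities are widely dispersed. The coefficient-of-variation test, the 0.25 threshold, the function name, and the intensity values are all assumptions chosen for illustration; the text does not prescribe a particular statistic.

```python
import statistics

def is_suspect(pixel_intensities, max_cv=0.25):
    """Flag a region whose pixel intensities show a wide dispersion."""
    mean = statistics.mean(pixel_intensities)
    if mean == 0:
        return True                      # no signal at all: nothing to trust
    cv = statistics.pstdev(pixel_intensities) / mean
    return cv > max_cv

# Hypothetical intensity readings (arbitrary units).
uniform_region = [980, 1010, 995, 1005, 1000, 990]   # true positive: near-uniform
bleed_region = [1000, 640, 310, 120, 60, 40]         # signal spreading from a neighbor

for name, region in (("uniform", uniform_region), ("bleed", bleed_region)):
    print(name, "-> rescan" if is_suspect(region) else "-> accept")
```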
  • Total internal reflection fluorescence microscopy (TIRFM)
  • a laser beam is directed through a microscope slide and reflected off the interface between the slide and a buffer containing the fluorescent sample. If the angle of incidence is greater than the critical angle [$\theta_c = \sin^{-1}(n_2/n_1)$; where $n_1$ and $n_2$ are the refractive indexes of the slide and aqueous samples, respectively], then all of the incident light is reflected away from the interface. However, an illuminated area is present on the sample side of the slide. This is called the evanescent wave, and its intensity decays exponentially away from the surface.
  • the evanescent wave penetrates approximately 100 nm into the aqueous medium.
  • This geometry reduces the background signal by several orders of magnitude compared to conventional fluorescence microscopy and readily allows the detection of single fluorescent molecules, because contaminants and bulk molecules in solution are not illuminated and do not contribute to the detected signal.
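  • The critical angle and the depth of the evanescent field can be estimated with a short calculation. The sketch assumes refractive indexes of about 1.46 for a fused silica slide and 1.33 for aqueous buffer, a 488 nm excitation wavelength, and a 70° angle of incidence; it uses the standard expression d = λ / (4π·sqrt(n1²·sin²θ - n2²)) for the intensity decay depth. The index values, the incidence angle, and the function names are assumptions, not values from the text.

```python
import math

def critical_angle_deg(n1, n2):
    """Critical angle for total internal reflection going from n1 into n2 (n1 > n2)."""
    return math.degrees(math.asin(n2 / n1))

def penetration_depth_nm(wavelength_nm, n1, n2, incidence_deg):
    """1/e intensity decay depth of the evanescent wave beyond the interface."""
    s = n1 * math.sin(math.radians(incidence_deg))
    return wavelength_nm / (4 * math.pi * math.sqrt(s * s - n2 * n2))

n_silica, n_water = 1.46, 1.33          # assumed refractive indexes
theta_c = critical_angle_deg(n_silica, n_water)
depth = penetration_depth_nm(488, n_silica, n_water, 70.0)

print(f"critical angle = {theta_c:.1f} degrees")
print(f"penetration depth at 70 degrees = {depth:.0f} nm")
```

  • With these inputs the critical angle is a little under 66° and the decay depth is on the order of 100 nm, consistent with the penetration distance stated above.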
  • Using total internal reflection fluorescence microscopy to visualize the arrays described herein, it is possible to simultaneously monitor hundreds of aligned DNA molecules within a single field-of-view.
  • microfluidic flowcells composed of substrates that are rendered inert by deposition of a lipid bilayer as described herein.
  • the attached nucleic acid molecules are aligned in a desired orientation that is optimal for detection by, e.g., TIRFM.
  • a microfluidic flowcell that can be used in the methods described herein.
  • a substrate described herein is overlaid with a coverslip, e.g., a glass coverslip, to form a sample chamber, and the substrate contains an inlet port and an outlet port, through which a hydrodynamic force is applied.
  • the hydrodynamic force can be mediated by, e.g., a buffer solution that flows over the lipid bilayer described herein.
  • An exemplary microfluidic flowcell can be constructed from 76.2 x 25.4 x 1 mm (L x W x H) fused silica slides (ESCO Products, Oak Ridge, NJ).
  • Inlet and outlet holes can be drilled through the slides using, e.g., a diamond-coated bit (1.4 mm O.D.; Eurotool, Grandview, MO).
  • a sample chamber can be prepared from a borosilicate glass coverslip (Fisher Scientific, USA) and, e.g., double-sided tape (~25 μm thick, 3M, USA) or a polyethylene gasket.
  • Inlet and outlet ports can be attached using preformed adhesive rings (Upchurch Scientific, Oak Harbor, WA), and cured at 120°C under vacuum for 2 hours.
  • the dimensions of the exemplary sample chamber are 3.5 x 0.45 x 0.0025 cm (L x W x H).
  • the total volume of the exemplary flowcell is ~4 μl.
  • a syringe pump (Kd Scientific, Holliston, MA) is used to control buffer delivery to the sample chamber. This exemplary apparatus is not meant to be limiting, and one of skill in the art would appreciate modifications that could be made.
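  • The stated flowcell volume follows directly from the chamber dimensions, as the short check below shows. The function name is hypothetical, and the length is read as 3.5 cm.

```python
def chamber_volume_ul(length_cm, width_cm, height_cm):
    """Volume of a rectangular chamber in microliters (1 cm^3 = 1000 ul)."""
    return length_cm * width_cm * height_cm * 1000.0

volume = chamber_volume_ul(3.5, 0.45, 0.0025)
print(f"chamber volume = {volume:.2f} ul")   # roughly 4 ul
```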
  • An exemplary total internal reflection fluorescence microscope is a modified Nikon TE2000U inverted microscope.
  • a 488 nm laser (Coherent Inc., Santa Clara, CA) and a 532 nm laser (CrystaLaser, Reno, NV) were focused through a pinhole (10 μm) using an achromatic objective lens (25x; Melles Griot, Marlow Heights, MD), then collimated with another achromatic lens (f = 200 mm).
  • the beam was directed to a focusing lens (f = 500 mm) and passed through a custom-made fused silica prism (J.R. Cumberland, Inc.) placed on top of the flowcell.
  • Fluorescence images were collected through an objective lens (100x Plan Apo, NA 1.4, Nikon), passed through a notch filter (Semrock, Rochester, NY), and captured with a back-thinned EMCCD (Cascade 512B, Photometrics, Arlington, AZ). Image acquisition and data analysis were performed with Metamorph software (Universal Imaging Corp., Downingtown, PA). All DNA length measurements were performed by calculating the difference in y-coordinates from the beginning to the end of the fluorescent molecules.
  • D = MSD/4t; where MSD (the mean square displacement) is the square of the average step size measured over time interval t (0.124 sec).
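  • The diffusion coefficient relation above can be written out as a minimal sketch. It uses the conventional definition of MSD as the mean of the squared frame-to-frame displacements in two dimensions, and it takes the frame interval as 0.124 s, which is an assumed reading of the interval given in the text. The track coordinates and the function name are hypothetical.

```python
def diffusion_coefficient_2d(track_xy, dt_s):
    """D = MSD / (4 t) from consecutive (x, y) positions sampled every dt_s seconds."""
    squared_steps = [
        (x1 - x0) ** 2 + (y1 - y0) ** 2
        for (x0, y0), (x1, y1) in zip(track_xy, track_xy[1:])
    ]
    msd = sum(squared_steps) / len(squared_steps)
    return msd / (4.0 * dt_s)

# Hypothetical positions in micrometers; every step here has squared length 0.25.
track = [(0.0, 0.0), (0.3, 0.4), (0.6, 0.0), (0.6, 0.5)]
D = diffusion_coefficient_2d(track, 0.124)
print(f"D = {D:.3f} um^2/s")
```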
  • the arrays described herein can be used to detect individual nucleic acid molecules, e.g., nucleic acid molecules coupled to a label.
  • an array can be constructed as part of a microfluidic flowcell described herein.
  • the nucleic acid molecules, e.g., labeled nucleic acid molecules can be attached to a substrate, to a lipid bilayer, or to a diffusion barrier, as described herein.
  • hydrodynamic force e.g., introduction of a buffer as described herein
  • the nucleic acid molecules are aligned in direction of the hydrodynamic force, with the nonattached ends of the nucleic acid molecules extending in the direction of the flow of the hydrodynamic force.
  • Individual nucleic acid molecules on the array can be visualized before and/or after the application of the hydrodynamic force using, e.g., TIRFM as described herein.
  • the interactions of nucleic acid molecules on the arrays with target polypeptides are determined.
  • the nucleic acid molecules can be visualized before and/or after the application of a hydrodynamic force, as described herein.
  • the polypeptides can be coupled to a label and introduced into the array, e.g., a microfluidic cell including the array, as a component of the buffer that mediates the hydrodynamic force.
  • Individual nucleic acid molecules and individual target polypeptides can be visualized, e.g., by TIRFM as described herein, and interactions can be determined by colocalization of the signals from the nucleic acid molecules and the polypeptides. Such interactions can be further analyzed by collecting signals over a period of time.
  • Such methods can be used to visualize, e.g., the movement of polypeptides along the length of individual nucleic acid molecules, as described herein.
  • compositions described herein can be used to screen for compounds, e.g., drug compounds, that affect, e.g., disrupt, the interactions between nucleic acid molecules and polypeptides.
  • an array can be constructed as part of a microfluidic flowcell described herein.
  • the nucleic acid molecules e.g., labeled nucleic acid molecules
  • the polypeptides can be coupled to a label and introduced into the array, e.g., a microfluidic cell including the array, as a component of the buffer that mediates the hydrodynamic force.
  • the polypeptides are known to interact with the nucleic acid molecules, and the interactions are visualized as described herein.
  • the polypeptides can be proteins involved in DNA replication, recombination and/or repair.
  • Candidate compounds can then be added to the array, e.g., as a component of the buffer that mediates the hydrodynamic force, and the effect of the compound on the interactions between individual nucleic acid molecules and the polypeptides can be visualized. Compounds that disrupt the interactions can be visually identified. Such methods can be automated.
  • the methods described herein can be used to screen for therapeutic compounds to treat cancer, e.g., cancer of the breast, prostate, lung, bronchus, colon, rectum, urinary bladder, kidney, pancreas, oral cavity, pharynx, ovary, skin, thyroid, stomach, brain, esophagus, liver, cervix, larynx, soft tissue, testis, small intestine, anus, anal canal, anorectum, vulva, gallbladder, bones, joints, hypopharynx, eye, nose, nasal cavity, ureter, gastrointestinal tract; non-Hodgkin lymphoma, Multiple Myeloma, Acute Myeloid Leukemia, Chronic Lymphocytic Leukemia, Hodgkin Lymphoma, Chronic Myeloid Leukemia and Acute Lymphocytic Leukemia.
  • the methods and compositions described herein can be used to sequence nucleic acid molecules.
  • the arrays described herein can be constructed with identical nucleic acid molecules, e.g., single stranded DNA molecules, or with different nucleic acid molecules, e.g., single stranded DNA molecules.
  • an oligonucleotide primer is annealed to the DNA molecules.
  • Polymerase is then added along with the fluorescent dNTP mix.
  • Fluorescent nucleotide analogs that do not terminate extension of the DNA strand are used.
  • the DNA molecules are then attached to the substrate and the array is visualized as described herein.
  • the color of the nucleotide incorporated into the growing chain reveals the sequence of the DNA molecules. If all of the DNA molecules within the array are identical, then the incorporation of the first nucleotide during polymerization will yield a fluorescent line extending horizontally across the array. Subsequent nucleotide addition will also yield horizontal lines, and the color of each line will correspond to the DNA sequence. When sequencing different DNA molecules, the differences in DNA sequences are revealed as the incorporation of different fluorescent nucleotides across the array, rather than the lines of identical color seen when sequencing identical DNA molecules. In some embodiments, these methods are automated.
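  • The color-to-sequence readout described above can be sketched as follows. The dye-to-base assignment and the function names are hypothetical (no particular colors are specified herein), and the decoded string is the newly synthesized strand, i.e., the complement of the template.

```python
# Hypothetical dye-to-base assignment; the source does not specify colors.
COLOR_TO_BASE = {"green": "A", "red": "C", "blue": "G", "yellow": "T"}

def read_sequences(cycles):
    """cycles[k][m] is the color seen on molecule m at incorporation step k."""
    n_molecules = len(cycles[0])
    return ["".join(COLOR_TO_BASE[cycle[m]] for cycle in cycles)
            for m in range(n_molecules)]

def is_uniform_line(cycle):
    """True when every molecule in the array shows the same color at this step."""
    return len(set(cycle)) == 1

# Identical molecules: every step is a single-color horizontal line.
identical = [["green"] * 3, ["red"] * 3, ["yellow"] * 3]
# Different molecules: the first step shows more than one color across the array.
mixed = [["green", "green", "blue"], ["red", "red", "red"]]

identical_reads = read_sequences(identical)
mixed_reads = read_sequences(mixed)
```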
  • RNAP RNAP in real time as it searches for promoters, and we develop a theoretical framework for analyzing target searches at the submicroscopic scale based upon single-molecule target association rates. Contrary to long-held assumptions, we demonstrate that the promoter search is dominated by three-dimensional diffusion at both the microscopic and submicroscopic scales in vitro, which has direct implications for understanding how promoters are located within physiological settings.
  • RNA polymerase the protein machinery directly responsible for RNA synthesis.
  • Escherichia coli has ~3,000 promoters, each containing a core sequence ~35 base pairs in length with hexameric consensus sites at the -35 (TTGACA) and -10 (TATAAT) regions.4-9 Prior to synthesizing a transcript, RNAP must find appropriate promoter sequences. Like all DNA-binding proteins, RNAP is expected to employ some form of diffusion to locate its targets (Figure 6).1 There are four potential diffusion-based mechanisms that might contribute to the promoter search: (i) one-dimensional (1D)
  • RNAP can move long distances along DNA by 1D sliding, and as a consequence it is also now widely assumed that RNAP locates promoters through facilitated diffusion involving a 1D search.
  • E. coli RNAP is among the best-characterized enzymes at the single molecule level, yet no study has conclusively established how RNAP locates promoters.
  • QD-RNAP quantum dot-tagged RNAP
  • Promoter association assays reveal known intermediates.
  • RNAP was injected into the flowcell (±rNTPs), flow was terminated, and data were collected at 5, 10, or 100 frames per second (Hz; Figure 2a-b).
  • The $D_{1,obs}$ values for RNAP were all several orders of magnitude lower than values reported for lac repressor, p53, and Mlh1-Pms1 (Figure 2e, Figure 12 & Table 4), further arguing against extensive 1D diffusion contributing to the promoter search.
  • the $D_{1,obs}$ values for RNAP were:
  • Intersegmental transfer is not essential for the promoter search.
  • the promoter search mechanism of E. coli RNAP appeared to be dominated by 3D random collisions, with no evidence for facilitated diffusion involving 1D sliding over distances along the DNA greater than our current spatial resolution limits.
  • Facilitated searches can also potentially occur through intersegmental transfer, which would involve RNAP movement from one distal site to another via a looped DNA intermediate ( Figure 6).
  • the DNA molecules used in our experiments were maintained in a stretched configuration, and we anticipate that they would not support intersegmental transfer because the DNA cannot form the looped intermediates necessary for this mode of facilitated diffusion.
  • coli RNAP was direct binding from solution, which occurs at a rate of: $k_a^{(\theta)}(t) = \frac{8 \theta D_3 C_0}{\pi} \int_0^{\infty} e^{-D_3 u^2 t} \left[ u \left( J_0^2(u\rho) + Y_0^2(u\rho) \right) \right]^{-1} du$, where $C_0$ is the initial protein concentration, $D_3$ is the three-dimensional diffusion coefficient of QD-RNAP, $\theta$ is the effective target size, $\rho$ is the reaction radius, and $J_0$ and $Y_0$ are Bessel functions of the first and second kind, respectively.
  • the effective target size is a geometric constraint describing the binding surface that transiently samples DN during the promoter search, and is a function of protein orientation ( ⁇ ) and linear target size (a) (Fig.
  • Comparison of the experimental data to values calculated from $k_a^{(\theta)}(t)$ provides a direct assessment of the promoter search mechanism, which allows us to determine whether submicroscopic facilitated diffusion contributes to promoter association: if the experimentally observed association rates exceed $k_a^{(\theta)}(t)$, then submicroscopic facilitated diffusion must be contributing to the search mechanism; in contrast, if the experimentally observed association rates are equal to $k_a^{(\theta)}(t)$, then the search mechanism can be attributed to 3D collisions with no underlying contribution of submicroscopic facilitated diffusion.
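  • The comparison described above reduces to a ratio test between the observed association rate and the rate calculated for 3D collisions alone. The sketch below assumes both rates are already available in the same units; the function names and the 10% tolerance used to decide whether two rates are "equal" are hypothetical and would in practice be set from the experimental error.

```python
def acceleration(k_observed, k_3d):
    """Fold acceleration of the observed rate over the calculated 3D-collision rate."""
    return k_observed / k_3d

def classify_search(k_observed, k_3d, tolerance=0.1):
    """Apply the decision rule; tolerance is an assumed margin for 'equal' rates."""
    if acceleration(k_observed, k_3d) > 1.0 + tolerance:
        return "facilitated diffusion contributes"
    # Rates at (or, within error, below) the calculated value need no facilitation.
    return "consistent with 3D collisions alone"

# Illustrative numbers only: a 3-fold excess versus matching rates
low_concentration = classify_search(3.0, 1.0)
high_concentration = classify_search(1.0, 1.0)
```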
  • association rates exceeded $k_a^{(\theta)}(t)$ below 500 pM QD-RNAP, revealing that submicroscopic facilitated diffusion accelerated the promoter search at low protein concentrations, with 3-fold acceleration observed at 50 pM RNAP (Figure 3d).
  • association times converged to $k_a^{(\theta)}(t)$, indicating that submicroscopic facilitated diffusion did not contribute to the promoter search at higher concentrations (Figure 3d).
  • QD-RNAP no longer benefits from facilitated diffusion at concentrations >500 pM, one must recognize that V will vary for different proteins and/or different reaction conditions.
  • the physical behavior of RNAP with respect to the search process will not change regardless of whether the concentration is above or below Cthr; the only thing that changes is the probability of engaging a target through a direct collision (Pw) versus the probability of engaging the target after undergoing facilitated diffusion along the DNA (PFD).
  • for RNAP ($\theta$ = 2.23 nm) the "antenna" was just ~1.48 nm (corresponding to ~6 bp in our system); the very small size of the "antenna" indicated the limited contribution that facilitated diffusion (sliding and/or hopping) made to the promoter search even at the lowest RNAP concentrations tested (Figure 3e-f).
  • An in vivo protein concentration of 1 nM corresponds to just 1 protein molecule in a volume the size of an E. coli cell,52 therefore an in vivo concentration of 50 pM would be equivalent to an average of just 1/20th of a molecule of RNAP per bacterium, which would not seem physiologically relevant.
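  • The arithmetic behind this estimate can be checked with a short calculation. The sketch assumes a cell volume of about 1 femtoliter (a commonly quoted order of magnitude for E. coli, not a value given herein); the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def molecules_per_cell(concentration_molar, cell_volume_liters):
    """Average number of molecules in one cell volume at the given concentration."""
    return concentration_molar * cell_volume_liters * AVOGADRO

def relative_copy_number(concentration_molar, reference_molar=1e-9):
    """Copies per cell under the rule of thumb that 1 nM is one molecule per cell."""
    return concentration_molar / reference_molar

# Assumed cell volume of 1 fL = 1e-15 L
per_cell_at_1nM = molecules_per_cell(1e-9, 1e-15)   # about 0.6, i.e. on the order of 1
copies_at_50pM = relative_copy_number(50e-12)       # about 0.05, i.e. 1/20 of a molecule
```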
  • proteins present at low concentrations in living cells may be more apt to locate targets through facilitated diffusion, whereas those present at higher concentrations (e.g. RNAP, ~2,000-3,000 molecules per cell) may be more likely to engage their target sites through 3D diffusion.
  • Transcriptional activators such as catabolite activator protein (CAP) are commonly involved in the regulation of gene expression, and can exert their effects either by facilitating recruitment of RNAP or by stimulating steps after recruitment (e.g. open complex formation, promoter escape, etc.). In scenarios involving factor-assisted recruitment, additional protein-protein contacts stabilize interactions between RNAP and the promoter. However, the presence of a transcriptional activator near a promoter should not fundamentally alter the search process by causing RNAP to start sliding and/or hopping along the DNA while executing the search; rather, it would just make the target appear "larger" to RNAP (i.e. promoter plus factor, instead of just the promoter), which would in turn reduce the facilitation threshold.
  • Factors that stimulate steps after recruitment would not influence the search because they exert their effects only after the promoter search is complete.
  • RNAP lac repressor
  • lac repressor, which is thought to employ facilitated diffusion in vivo during its target search, may need to do so to compensate for its much lower intracellular abundance (~10 molecules per cell) and the corresponding scarcity of its targets (3 lac operators per genome).
  • other proteins e.g. Fis, HIT, IHF, H-NS, etc.
  • other steps are rate-limiting during gene expression (e.g.
  • RNAP was then diluted into biotin-supplemented transcription buffer (±rNTP, 250 µM each, as indicated) to a final concentration of 30-200 pM, and then a 50 µl sample was injected into the flowcell at a rate of 0.1 ml min-1, and buffer flow was terminated 120 s after beginning the injection.
  • RNA Polymerase Purification and Characterization. Cells expressing a chromosomal copy of RNA polymerase that harbors a biotinylation peptide tag on the C-terminus of the β' subunit were generously provided by Dr. Robert Landick (University of Wisconsin-Madison) (A19), and RNAP holoenzyme was expressed, purified, and characterized as previously described (A4).
  • RNAP remained functional under the dilute conditions necessary for single molecule measurements.
  • A Cy3-labeled 249-bp DNA fragment containing promoter PR was made by PCR using λ phage DNA as a template with the following primers: 5'-Cy3-GGC CTT GTT GAT CGC GCT TT-3' and 5'-CGT GCG TCC TCA AGC TGC TCT T-3' (IDT).
  • Varying amounts of purified RNAP (0.1-1.6 nM) were incubated with 0.4 nM of the Cy3-labeled PR DNA fragment in buffer (20 mM Tris [pH 8.0], 25 mM KCl, 1 mM MgCl2, 1 mM DTT, and 0.2 mg ml-1 BSA) at room temperature for 40 minutes. Heparin (10 µg ml-1) was then added to disrupt non-specifically bound RNAP and closed complexes, and the reactions were resolved on native 5% polyacrylamide gels to separate the free and bound DNA.
  • RNAP was diluted to a final concentration of 0.6 nM in buffer (20 mM Tris [pH 8.0], 25 mM KCl, 1 mM MgCl2, 1 mM DTT, and 0.2 mg ml-1 BSA) and the diluted samples were then incubated at room temperature for the indicated time intervals.
  • the activity of the diluted RNAP was measured using a gel shift assay as described above. As shown in Figure 14, the activity of diluted RNAP did not change significantly over a 40-minute period.
  • Our single molecule measurements are typically completed within ~15 minutes of diluting the RNAP stock solutions.
  • RNA polymerase molecules assigned as τ3 events (i.e. promoter-bound open complexes)
  • the buffer flow was terminated, ensuring that the concentration of QD-RNAP in the sample chamber remained constant after this time point; the selection of the 30 sec stopping point was based on the concentration profiles of our flowcells under this sample injection regime.
  • images were continually collected at either 5 or 10 Hz for a period of up to 15 minutes.
  • the primary advantage of this analysis is that it eliminates the need to definitively establish a zero time point prior to an initial binding event, so long as the concentration of QD-RNAP in the flowcell remains isotropic, because all calculations are based on residual waiting times between initial binding events on the different DNA molecules in the sample chamber.
  • QD excursions within the evanescent field was determined for 100 Hz data sets.
  • the EMCCD was set to frame transfer mode and an AOI (63 x 1 pixels), and one DNA molecule was imaged for 10,000 frames at an acquisition rate of 100 Hz in the absence of any QDs.
  • a histogram of the resulting signal intensities corresponded to background noise, which dropped dramatically above an intensity of -2000 (A.U.). From the histogram we calculate that the probability of camera background noise beyond this threshold is ⁇ 9.5 10°, therefore we selected a threshold value of 2040 (A.U.).
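  • One simple way to estimate the probability that background noise exceeds a chosen threshold is to take the fraction of QD-free frames above it, as in the sketch below. The intensity values are invented for illustration (a real control would use all 10,000 frames), and the function names are hypothetical; only the threshold of 2040 A.U. comes from the text.

```python
def exceedance_probability(intensities, threshold):
    """Fraction of background frames whose intensity lies above the threshold."""
    above = sum(1 for value in intensities if value > threshold)
    return above / len(intensities)

def detect_excursions(trace, threshold):
    """Indices of frames in a trace whose intensity lies above the threshold."""
    return [i for i, value in enumerate(trace) if value > threshold]

# Invented background intensities (A.U.) from a QD-free control
background = [1900, 1950, 2000, 2050, 1980, 1990, 1970, 2010, 1960, 1940]
p_false = exceedance_probability(background, threshold=2040)

# Invented trace recorded with QDs present
excursion_frames = detect_excursions([1900, 2500, 2600, 1950], threshold=2040)
```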
  • QDs 150 pM
  • data were collected as described above using the same EMCCD settings.
  • the first time constant τ0 is the same as the QD-only control (Figure 7a-b), and was also found for control measurements made in the absence of DNA (Figure 7e-f), and therefore does not arise from a polymerase-specific interaction with the DNA.
  • τ1 corresponds to the lifetime of a nonspecific interaction between DNA and RNAP.
  • DNA can then be estimated as follows below.
  • the actual versus injected concentration of QD-RNAP was determined by injecting a fixed volume (50 µl) of QDs (200 pM) into the sample chamber at a defined flow rate (0.05 ml min-1) while continuously monitoring the bulk fluorescence signal through the microscope objective.
  • the resulting signal versus time curve was normalized to define the QD concentration profile as a function of these defined injection parameters. Therefore the number of non-specific interaction events between DNA and RNA polymerase is given by: N S
  • Step two: Using the diffusion coefficient from step one, determine the optimal point at which to perform a regression, and recalculate the diffusion coefficient. Step three: Repeat step two until convergence.
  • the mean squared displacement is known to carry significant statistical error from two sources: (i) the mean squared displacement time average from a single trajectory only equals the ensemble average in the limit that the trajectory is infinite (see below), so only when a trajectory is infinitely long is the apparent diffusion coefficient obtained from that trajectory precisely correct; and (ii) there is also a correlated error, which comes about from overlaps in the calculation of displacements. Readers are directed to Qian et al. and X. Michalet for in-depth discussion (A22-23).
  • Multiplication by $(x - x')^2$ and integration reveals that, for a stationary particle ($D = 0$), the MSD should be a straight line at $2\sigma^2$. Furthermore, detailed calculations reveal that measured MSD values are gamma distributed about their true means (A6). That is to say, when $D_{1,obs}$ is small, it is common to attain MSD values in the range $[0, 2\sigma^2)$. Since the diffusion coefficient is determined through a linear fit to MSD values, it is not irregular to obtain a negative measured value for $D_{1,obs}$, such as when early time points produce values in the range $[2\sigma^2, \infty)$ and later time points yield values on $[0, 2\sigma^2)$. This, of course, is also the source of positive values of $D_{1,obs}$ for stationary particles (Figure 12).
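  • The following minimal sketch computes a time-averaged MSD from a single one-dimensional trajectory using all overlapping displacements, then fits a line to obtain an apparent diffusion coefficient. It assumes the 1D relation MSD = 2Dt + offset; the function names and the toy trajectory are hypothetical. The toy particle only jitters about a fixed position, yet the fitted slope is negative, illustrating how a stationary particle can return a nonzero (here negative) apparent value.

```python
def mean_squared_displacement(positions, max_lag):
    """Time-averaged MSD using all overlapping displacement pairs for each lag."""
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [(positions[i + lag] - positions[i]) ** 2
                   for i in range(len(positions) - lag)]
        msd.append(sum(squared) / len(squared))
    return msd

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def diffusion_coefficient_1d(positions, frame_interval, max_lag):
    """Apparent D and fit offset, assuming MSD = 2*D*t + offset in one dimension."""
    msd = mean_squared_displacement(positions, max_lag)
    lags = [frame_interval * k for k in range(1, max_lag + 1)]
    slope, intercept = fit_line(lags, msd)
    return slope / 2.0, intercept

# Toy stationary particle that jitters between two apparent positions
jitter = [0, 1, 0, 1, 0, 1]
d_apparent, offset = diffusion_coefficient_1d(jitter, frame_interval=1.0, max_lag=2)
```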
  • the MSD plots should be independent of time, and while the resultant curves for the QD-labeled DNA do exhibit time independence at long time separation, at short times there is clearly some motion, as revealed in the rise of the MSD plots at early time points ( Figure 13). Errors resulting from the camera and fitting functions are not correlated in time, and won't induce this kind of time dependent behavior in the MSD. However, the motion of the DNA will produce time dependent effects if the fluctuations occur on a time scale comparable to the QD motion ( Figure 13).
  • Δx < 100 nm (three standard deviations of noise) events are scored as direct collisions. Furthermore, Δx > 100 nm corresponds to facilitated transport to the operator. The initial binding locations of nonspecific events that did not result in target capture (failed searches) were also collected, provided the binding event occurred prior to target engagement by the successful protein (i.e. the target was still unoccupied). For the failed searches, Δx was calculated by measuring the initial binding location of the failed searcher relative to the location of the operator.
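  • The scoring rule above can be written as a small classifier. The function and constant names are hypothetical, positions are assumed to be in nanometers along the DNA, and the handling of a displacement of exactly 100 nm (not specified in the text) is an arbitrary choice made here.

```python
NOISE_LIMIT_NM = 100.0  # three standard deviations of noise, per the text

def classify_binding_event(initial_position_nm, target_position_nm):
    """Score an event from the distance between initial binding site and target."""
    delta_x = abs(initial_position_nm - target_position_nm)
    if delta_x < NOISE_LIMIT_NM:
        return "direct collision"
    # Exactly 100 nm is not addressed in the text; it is grouped here by assumption.
    return "facilitated transport"

# Invented positions (nm) relative to a target at 1000 nm
near_event = classify_binding_event(1040.0, 1000.0)
far_event = classify_binding_event(1350.0, 1000.0)
```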
  • T7 RNAP experiments. The gene for T7 RNAP was fused to a C-terminal
  • the cells were thawed at room temperature, lysed by sonication, and the lysate was clarified by centrifugation.
  • the clarified lysate was loaded onto a 10-ml chitin bead column (NEB), and washed extensively with buffer containing 20 mM Tris-HCl, pH 8.0, 1 M NaCl, 1 mM EDTA, following the manufacturer's protocol.
  • the column bed was then quickly flushed with 20 mM Tris-HCl, pH 8.0, 1 M NaCl, 1 mM EDTA, plus 50 mM DTT, and incubated at 4°C for ~20 hours.
  • the protein was then eluted and dialyzed into T7 RNAP storage buffer (50 mM Tris-HCl [pH 7.9], 100 mM NaCl, 20 mM β-mercaptoethanol, 1 mM EDTA, 50% glycerol, 0.1% Triton X-100) at 4°C overnight. Protein activity was tested by in vitro run-off transcription assays. Single-molecule experiments using double-tethered DNA curtains were conducted exactly as described for E. coli RNAP, under the indicated buffer conditions (Figure 10).
  • a facilitated association process of a protein to its cognate sequence consists of three states: (i) a free state, (ii) a non-specifically bound state, wherein the protein is bound to non-target DNA, and (iii) a specifically bound state, where the protein has located and bound target DNA.
  • the search process that a protein undergoes consists of cycling through the non- specifically bound and free states until eventually locating the target. When the concentration of available nonspecific states outnumbers specific states, this process is slow. The "facilitation" occurs due to two factors.
  • the affinity of proteins for non-target stretches of DNA localizes the protein to the DNA for extended periods of time, allowing for many successive rebinding events.
  • when the protein is able to translocate along the DNA during its time in the bound state, it may interrogate multiple sites during a single binding event.
  • reaction radius. The motion of the protein beyond this distance is expected to be free thermal diffusion in three dimensions, while within ρ the protein's motion is constrained to only allow movement along the dimension of the DNA (Figure 3a-d). This motion is expected to be Brownian as well; however, the diffusion coefficient must now include the average effect of the potential along the DNA as well as the viscous forces from solution.
  • the reaction radius is then dependent on the size of the protein and the DNA, as well as the ionic strength of the solution, as it describes the point at which one can disregard the gradient of the radial portion of the electrostatic potential of DNA.
  • ρ is chosen to be the sum of the radii of the searching protein and the DNA plus the Debye screening length under our reaction conditions.
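  • A rough numerical sketch of this choice of reaction radius is given below. It uses the standard approximation for the Debye length of a monovalent salt in water at 25 °C, 0.304 nm divided by the square root of the ionic strength in mol/L. The protein radius, the DNA radius of about 1 nm, the 0.1 M ionic strength, and the function names are illustrative assumptions, not values taken from the experiments described here.

```python
import math

def debye_length_nm(ionic_strength_molar):
    """Debye length (nm) for a 1:1 electrolyte in water at 25 C (approximation)."""
    return 0.304 / math.sqrt(ionic_strength_molar)

def reaction_radius_nm(protein_radius_nm, dna_radius_nm, ionic_strength_molar):
    """Reaction radius as protein radius + DNA radius + Debye screening length."""
    return protein_radius_nm + dna_radius_nm + debye_length_nm(ionic_strength_molar)

# Assumed inputs: 5 nm protein radius, 1 nm DNA radius, 0.1 M monovalent salt
screening = debye_length_nm(0.1)             # about 0.96 nm
rho = reaction_radius_nm(5.0, 1.0, 0.1)      # about 6.96 nm
```

Because the screening length shrinks as ionic strength rises, the reaction radius computed this way decreases at higher salt.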
  • Effective Target Size. The hallmark of facilitated diffusion is that the overall association rate can be greatly accelerated by the mechanisms of sliding and hopping we have described. Notably, the magnitude of this effect depends on the concentration of reactants. That is, the acceleration, which may be present at lower protein concentrations, vanishes as the concentration increases. To see this, consider the flux to the operator to be comprised of three terms: the first is described above, the second is the hopping rate into the promoter, and the third is the sliding rate into the promoter.
  • Harada et al. also reported that 10 out of 381 RNAP molecules (2.6%) underwent 1D diffusive motion detectable above instrument resolution (0.2 µm). Notably, the experiments of Harada et al. were conducted in 50% sucrose,10 and the high viscosity of this buffer may have artificially prolonged the lifetime of the
  • ® is the bend angle of the DNA molecule induced by the bound protein
  • K is the DNA stretch modulus (~1,200 pN)
  • 20 ⁇ is the length of the bound site times the length of an unperturbed base pair (20bp ⁇ 0.34 nm/bp)(A37).
  • RNAP recognizes and binds to the promoters on the extended DNA substrates
  • the lifetime we obtained for open complex formation closely match literature values
  • RNAP moves along the DNA when provided with all four rNTPs.
  • RNA polymerase can transcribe against applied forces of up to ~14-25 pN (A38-39), which again suggests that the relatively low tension used in our assays should have little or no impact upon the protein's ability to bind promoters.
  • Browning D, Busby S. The regulation of bacterial transcription initiation. Nat Rev Microbiol. 2004;2:57-65.
  • RNA polymerase active center: the molecular engine of transcription. Annu Rev Biochem. 2009;78:335-61.
  • Protein-nucleic acid interactions contribute to all aspects of gene expression, genome maintenance, and DNA replication, and defects in protein-nucleic acid interactions are often the underlying causes of genetic diseases and cancer.
  • DNA curtains are assembled by tethering one end of a biotinylated DNA molecule to a lipid bilayer, which coats the surface of a microfluidic sample chamber.
  • the bilayer provides an inert environment compatible with a range of biological macromolecules.
  • DNA is tethered to the bilayer via a biotin-streptavidin linkage, permitting the DNA to diffuse in two dimensions.
  • Hydrodynamic force is used to organize the DNA along nanofabricated barriers that disrupt the continuity of the bilayer. Lipids cannot traverse these barriers; therefore, the molecules align along the barriers and extend parallel to the sample chamber surface, allowing them to be visualized by total internal reflection fluorescence microscopy (TIRFM).
  • DNA curtains enable direct visualization of hundreds or even thousands of individual DNA molecules along with any proteins bound to the DNA by real-time fluorescence microscopy, and the molecules themselves are confined within a "bio- friendly" microenvironment that minimizes nonspecific interactions with the sample chamber surface.
  • Single-stranded DNA is a key intermediate in nearly all biochemical reactions related to the maintenance of genome integrity (e.g., DNA replication, homologous DNA recombination, nucleotide excision repair, mismatch repair), but the lack of methodologies for readily visualizing long ssDNA molecules has been noted in the literature as a crucial limitation of existing single-molecule technologies.10 Several challenges have prevented use of ssDNA in single-molecule curtain experiments.
  • RPA replication protein A
  • RPA-ssDNA filaments are stiffer than naked ssDNA, allowing the RPA-bound ssDNA to be stretched out by laminar flow and visualized by real-time optical microscopy. This approach will provide access to a wide range of problems related to protein-ssDNA interactions, in particular those related to the repair of damaged DNA.
  • φ29 DNA Polymerase. The gene encoding φ29 DNA polymerase was purchased from Genscript and subcloned into a modified pTXB3 vector containing an N-terminal hexahistidine tag (6xHis) upstream of a 3× Flag epitope tag. The protein was expressed in E. coli strain BL21 with overnight induction at 18 °C with 0.3 mM isopropyl-β-D-thiogalactopyranoside (IPTG).
  • the cells were collected by centrifugation and resuspended in lysis buffer (25 mM Tris-HCl [pH 7.4], 500 mM NaCl, 5% glycerol, 5 mM imidazole), along with protease inhibitors (0.5 mM 4-(2-aminoethyl) benzenesulfonyl fluoride (AEBSF; Fisher), 10 mM E-64 (Sigma), 2 mM benzamidine), and then lysed by sonication. The lysate was clarified by centrifugation, and the supernatant was applied to Ni-NTA resin (Qiagen).
  • the resin was washed with Ni-wash buffer (25 mM Tris-HCl, pH 7.4, 500 mM NaCl, 5% glycerol, 5 mM imidazole).
  • the protein was eluted in 25 mL of Ni-elution buffer (25 mM Tris-HCl, pH 7.4, 500 mM NaCl, 5% glycerol, 300 mM imidazole) and applied directly to a chitin column (NEB).
  • the chitin column was washed with chitin-wash buffer (25 mM Tris, pH 7.4, 500 mM NaCl, 0.1 mM ethylenediaminetetraacetic acid, EDTA), and the protein was eluted by incubating the resin in chitin-wash buffer containing 50 mM dithiothreitol (DTT) overnight at 4 °C.
  • the eluate was dialyzed into storage buffer (10 mM Tris, pH 7.4, 100 mM KCl, 1 mM DTT, 0.1 mM EDTA, 50% glycerol) and stored at -80°C. Protein concentration was determined using ε280nm = 1.2 × 10^5 M^-1 cm^-1 to yield a final concentration of 10 µM (~0.75 mg/mL).
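  • The concentration determination can be reproduced with the Beer-Lambert law, A = ε·c·l. The sketch below reads the extinction coefficient as 1.2 × 10^5 M^-1 cm^-1 and assumes a 1 cm path length; the absorbance reading of 1.2 and the molecular weight of 75,000 Da are back-calculated from the stated 10 µM and ~0.75 mg/mL and are illustrative assumptions, as are the function names.

```python
EXTINCTION_280 = 1.2e5  # M^-1 cm^-1, as read from the text

def concentration_molar(absorbance, extinction_coefficient, path_length_cm=1.0):
    """Beer-Lambert law solved for concentration: c = A / (epsilon * l)."""
    return absorbance / (extinction_coefficient * path_length_cm)

def mass_concentration_mg_per_ml(conc_molar, molecular_weight_da):
    """mol/L times g/mol gives g/L, which is numerically equal to mg/mL."""
    return conc_molar * molecular_weight_da

# Assumed absorbance reading and molecular weight (see lead-in)
conc = concentration_molar(1.2, EXTINCTION_280)          # about 1e-5 M = 10 uM
mass = mass_concentration_mg_per_ml(conc, 75000.0)       # about 0.75 mg/mL
```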
  • scRPA cerevisiae subunits of replication protein A
  • eGFP enhanced green fluorescent protein
  • ScRPA-eGFP was expressed in E. coli strain BL21 with an overnight induction at 18°C with 0.3 mM IPTG. The cells were collected by centrifugation, resuspended in lysis buffer (50 mM NaKPO4, 250 mM NaCl, 10 mM imidazole [pH 7.9]), and lysed by sonication.
  • the lysate was clarified by centrifugation and bound to Ni-resin (Qiagen) in batch for 30 min at 4°C.
  • the beads were washed with Ni-wash buffer (50 mM NaKPO4, 250 mM NaCl, 20 mM imidazole).
  • the protein was eluted with 2 × 5 mL of Ni-elute buffer (50 mM NaKPO4, 250 mM NaCl, 250 mM imidazole) and dialyzed against 2 L of buffer (30 mM Hepes [pH 7.9], 1 mM DTT, 0.25 mM EDTA, 0.01% NP40, 80 mM NaCl).
  • the protein was then purified by Hi-trap Q sepharose (GE Healthcare) with a gradient from 0 to 70% B (30 mM Hepes [pH 7.9], 1 mM DTT, 0.25 mM EDTA, 0.01% NP40; A, 80 mM NaCl; B, 1 M NaCl) over 150 mL.
  • ScRPA-eGFP was dialyzed overnight against 1 L of buffer (30 mM Hepes [pH 7.9], 150 mM NaCl, 1 mM DTT, 0.01% NP40, 0.25 mM EDTA).
  • the protein was then concentrated with polyethylene glycol (PEG; Thermofisher) and then dialyzed against storage buffer containing 50% glycerol.
  • the protein was aliquoted, frozen in liquid N2, and stored at -80 °C.
  • Sgs1 Purification and Labeling. Sgs1 contains N-terminal Flag and C-terminal 3× HA tags and was expressed in Sf9 cells and purified over an anti-Flag column, as described. Sgs1 was labeled by incubating with anti-HA quantum dots (QDs) for 2 h on ice prior to imaging.
  • Single-Stranded DNA Substrates. Single-stranded M13mp18 (NEB) was annealed to a biotinylated primer (5'-BioTEG-dTTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT TTT GTA AAA CGA CGG CCA GT). The annealed product was then passed through a size exclusion spin column (Centrispin 40; Princeton Separations) to remove excess primer. The final volume was 200 µL with an approximate concentration of 15 nM annealed M13mp18.
  • Rolling circle replication reactions (100 µL) contained 50 mM Tris [pH 7.4], 2 mM DTT, 10 mM MgCl2, 10 mM ammonium sulfate, 0.15 nM primed M13mp18 DNA, and 200 µM deoxyribonucleoside triphosphates, dNTPs. Replication was initiated by addition of φ29 DNA polymerase to a final concentration of 100 nM and incubated for 30 min at 30 °C. Reactions were quenched by addition of EDTA to a final concentration of 75 mM.
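  • The reaction setup implies simple dilution arithmetic (C1·V1 = C2·V2), sketched below. The 15 nM primed template stock and the 0.15 nM, 100 µL reaction come from the text; the 0.5 M EDTA stock used for the quench and the function names are hypothetical and chosen only to illustrate the calculation.

```python
def stock_volume_needed(stock_conc, final_conc, final_volume):
    """Volume of stock to include in a reaction of fixed final volume (C1*V1 = C2*V2)."""
    return final_conc * final_volume / stock_conc

def volume_to_add(stock_conc, final_conc, starting_volume):
    """Volume of stock to add on top of an existing volume to reach final_conc."""
    return final_conc * starting_volume / (stock_conc - final_conc)

# Template: 15 nM stock diluted to 0.15 nM in a 100 uL reaction
template_ul = stock_volume_needed(15.0, 0.15, 100.0)    # about 1 uL

# Quench: hypothetical 0.5 M EDTA stock added to the 100 uL reaction to reach 75 mM
edta_ul = volume_to_add(0.5, 0.075, 100.0)              # about 17.6 uL
```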
  • Electron-Beam Lithography. Barriers were fabricated by electron-beam lithography, as described. In brief, fused silica slides were cleaned in NanoStrip (CyanTek Corp) for 20 min, rinsed with acetone and isopropanol, and dried with N2. Slides were spin-coated with two layers of polymethylmethacrylate (PMMA; 25K and 495K; MicroChem), followed by a layer of Aquasave (Mitsubishi Rayon). Patterns were written with a FEI Sirion scanning electron microscope (J. C. Nabity, Inc.).
  • lipid vesicles composed of DOPC (1,2-dioleoyl-sn-glycerophosphocholine), 0.5% biotinylated-DPPE (1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-(cap biotinyl)), and 8% mPEG 550-DOPE (1,2-dioleoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-550]) were diluted in buffer containing 10 mM Tris-HCl (pH 7.4) and 100 mM NaCl and incubated within the sample chamber for 30 min.
  • the surface was further passivated with Buffer A [40 mM Tris-HCl (pH 7.4), 1 mM DTT, 1 mM MgCl2, 0.2 mg mL-1 BSA].
  • the DNA was coupled to the bilayer and aligned at the barriers.
  • the flow cells were attached to a syringe pump system (KD Scientific) and flushed with Buffer A.
  • φ29 DNA polymerase is highly processive and can generate ssDNA molecules
  • scRPA binds tightly to ssDNA (Ka ≈ 10^9-10^11 M^-1),13 so ssDNA binding is expected to occur at low protein concentrations amenable to single-molecule imaging.
  • RPA eliminates secondary structure in ssDNA, protects ssDNA from damage, and increases the persistence length of ssDNA;13 these features should ensure that ssDNA bound by RPA could be readily stretched by buffer flow (Figure 18C,D).
  • scRPA retains biological function in vivo when labeled with eGFP on the C-terminus of the 32 kDa subunit,18 ensuring that the labeled protein would retain all relevant activities related to its biological functions.
  • scRPA-eGFP remained bound to the ssDNA with little or no dissociation or exchange with free RPA in solution, even after observations over times ranging up to >60 min.
  • Sgs1 is the S. cerevisiae RecQ helicase that participates in a number of reactions involving ssDNA.
  • QD quantum dot
  • scRPA-eGFP-ssDNA complex is of great benefit because it eliminated the need to maintain a pool of free RPA, which would contribute to background signal.
  • RPA is a ubiquitous protein involved in all biological reactions that have an ssDNA intermediate (e.g., homologous DNA recombination, nucleotide excision repair, post-replicative mismatch repair, DNA replication, etc.), so the experiments shown will permit in-depth biological studies involving a broader complement of proteins involved in the various reactions.
  • ssDNA is unlikely to exist in vivo because it becomes rapidly coated with RPA (or SSB in prokaryotes);13 therefore, development of methods for observing RPA-bound ssDNA provides a biologically relevant context for experimentally accessing a range of other proteins that act on ssDNA (such as the homologous recombination proteins Rad51, Srs2, Rad52, etc.).
  • MutLa undergoes intersite transfer between juxtaposed DNA segments while searching for lesion-bound MutSa, but this activity is suppressed upon association with MutSa, ensuring that MutS/MutL remains associated with the damage-bearing strand while scanning the flanking DNA.
  • Our findings highlight a hierarchy of lesion- and ATP-dependent transitions involving both MutSa and MutLa, and help establish how different modes of diffusion can be used during recognition and repair of damaged DNA.
  • MMR Postreplicative mismatch repair
  • Saccharomyces cerevisiae should incur only approximately two mismatches per cell cycle (B6). MutSa must find these rare lesions, MutLa must search for lesion-bound MutSa, and the lesion-bound MutSa/MutLa complex must search the flanking DNA for signals that distinguish the parental and daughter strands (B1-3).
  • Models describing how DNA-binding proteins search for specific targets include 3D diffusion (i.e., jumping), 1D hopping, 1D sliding, and intersegmental transfer; the latter three are categorized as facilitated diffusion because they allow target association rates exceeding limits imposed by 3D diffusion (B7-10). New single-molecule and NMR techniques have led to resurgent interest in
  • MutSa/MutLa are released upon binding ATP and scan the flanking DNA for strand-discrimination signals by 1D diffusion. While searching for lesions, the movement of MutSa is consistent with a model wherein the protein rotates to maintain constant register with the helical contour of the DNA (B14). However, once released from a mismatch, MutSa is altered so that mismatches no longer are recognized as targets, and the protein slides much more rapidly, suggesting its motion no longer is coupled to rotation around the DNA. Finally, we demonstrate that the mismatch-bound MutSa/MutLa complex undergoes an ATP-dependent functional transition rendering it resistant to dissociation from damaged DNA. These data provide a detailed view of how diffusion can contribute to the early stages of MMR.
  • Each model makes unique predictions as to how MutSa should behave in the DNA curtain assay: Translocation predicts that MutSa should undergo ATP hydrolysis-dependent unidirectional motion; the molecular-switch model predicts that MutSa should exhibit ATP-binding-dependent 1D diffusion; and static transactivation predicts that MutSa should remain at the mismatch while awaiting looping-mediated interactions with flanking DNA.
  • The highly redundant nature of diffusion poses a conceptually important problem: Once MutSa is released from a mismatch and starts scanning the flanking DNA by 1D diffusion, it must not reengage the mismatch; otherwise it could become nonproductively trapped while undergoing reiterative cycles of mismatch binding and release. This problem can be illustrated by considering that when MutSa takes a single diffusive step away from the mismatch, it has a 50% probability of re-encountering the mismatch on the very next step, and the average number of times MutSa would re-encounter the mismatch is equal to N-1, where N is the distance in 1-bp diffusion steps between the mismatch and the nearest strand discrimination signal (Fig. 33). These considerations suggest that MutSa must be functionally distinct after ATP-triggered release from a mismatch to avoid redundant lesion recognition.
  • each microscopically observed bypass reflects ~1,000 submicroscopic encounters with the lesions; these encounters are undetectable as independent events given current resolution limits.
  • the diffusion coefficient of MutSa is consistent with 1D sliding wherein lateral motion of the protein is coupled to obligatory rotation as it tracks the helical pitch of the DNA (B14).
  • MutLa can locate mismatch-bound MutSa through 1D hopping or 3D diffusion.
  • When MutSa and MutLa collided while diffusing at sites other than a mismatch, they showed no evidence of establishing stable interactions (n > 2,000) (Fig. 25C).
  • This outcome is remarkable given that the local concentration of two proteins that encounter one another while undergoing a 1D search on the same DNA molecule is infinitely high.
  • the conformational context of MutSa is critical for controlling protein-protein interactions with MutLa and that the two complexes do not interact stably with one another while undergoing 1D diffusion in the absence of a mismatch despite being forced into close physical proximity through association with the same DNA molecule,
  • mismatch-bound MutSa/MutLa must undergo a structural change upon binding ATP, rendering the complex resistant to dissociation from the lesion-bearing DNA without altering its ability to scan the flanking duplex by 1D diffusion.
  • MutSa is physically associated with replication factories and that 10-15% of mismatch repair can be attributed to replication fork-associated MutSa (B40).
  • these results suggest the possibility that the replisome might clear DNA of any potential obstacles that otherwise could impair lesion targeting, perhaps enabling MutSa to slide along the newly synthesized naked DNA while surveying for lesions at the rear of the progressing fork.
  • MutSa also can be targeted to lesions through a 3D mechanism (or submicroscopic 1D sliding over distances less than 30 nm) might explain how lesions are located for the 85-90% of repair events that do not involve direct association of MutSa with the replisomes (B40).
  • MutLa can search for lesion-bound MutSa through a combination of 1D hopping, 3D diffusion, and intersite transfer (Fig. 27A), and we anticipate that this search could occur on chromatin because MutLa can diffuse readily past nucleosomes (B15). After assembling at a lesion, the MutSa/MutLa complex is released upon binding ATP and scans the flanking DNA by 1D diffusion. During this search, MutSa/MutLa is rendered incapable of intersite transfer and becomes highly resistant to dissociation, which could ensure that the MutSa/MutLa complex remained confined to the damaged DNA.
  • MutLa form oligomers comprised of ~11 ± 5 proteins at sites of repair in vivo, as evidenced by the presence of Pms1-4GFP foci (40).
  • QD quantum dot
  • the predominance of single MutLa molecules in our study can be attributed to the fact that we were probing the early stages of MMR involving initial lesion recognition and assembly of the first MutSa/MutLa complex.
  • MutLa foci observed in vivo reflect later stages of the reaction (B40).
  • MutLa oligomerization on MutSa occurs only after the first MutSa/MutLa complex is released from the lesion.
  • This hypothesis also is supported by the observation that the msh6-Gl 14D mutant of MutSa, which is capable of forming a ternary complex with MutLa at mismatches but is defective for ATP-triggered release, does not support formation of detectable Pms1-4GFP foci in vivo. Therefore, ATP-triggered release of the initial MutSa/MutLa complex from the lesions may represent an intermediate step preceding the assembly of higher-order MutLa oligomers.
  • MutSa alone or within the context of the MutSa/MutLa complex displays dramatically altered diffusive characteristics before and after lesion recognition, likely reflecting distinct functional and structural states necessary to accommodate the different stages of MMR.
  • MutSa diffuses through a mechanism consistent with 1D sliding while tracking the helical pitch of the DNA (B14, B15), but after ATP-triggered release from the mismatch, MutSa diffuses much more rapidly and no longer recognizes mismatches as binding targets. Inspection of available MutS and MutSa structures provides a potential explanation for these differences (Fig. 27B) (B24, B31, B33).
  • This configuration of Msh6 domain I would impose steric constraints requiring MutSa to track the helical pitch of the DNA during any 1D diffusion (i.e., just as a bolt tracks the helical threads of a screw).
  • domain I of Msh6 is inserted into the major groove before lesion recognition (as necessary to engage a mismatch and consistent with a rotation-coupled 1D diffusion) and remains within the major groove upon binding the lesion (as shown in the crystal structures) but then is retracted from the major groove after ATP-triggered release from the mismatch (consistent with more rapid 1D diffusion observed after lesion recognition). Retraction of Msh6 domain I from the major groove also would explain how MutSa and MutSa/MutLa are released from the mismatch upon binding ATP and how they avoid rebinding the mismatch while searching for strand-discrimination signals.
  • MutSa was affinity purified after being labeled with QDs, thus eliminating any QDs not bound by active MutSa before injection of the sample for single-molecule imaging.
  • reactions were performed as previously described (B14, B15), except that all buffers contained either 100 or 150 mM NaCl.
  • all buffers contained 20 mM Tris (pH 7.8), 1 mM MgCl2, 1 mM DTT, and 4 mg/mL BSA, along with the indicated concentration of NaCl.
  • standard reaction conditions for looking at lesion binding all contained 1 mM ADP.
  • MutSa and MutLa were purified and labeled as described (1, 2). MutSa labeling was performed at a 6:1 QD:protein ratio (300 nM Qdot : 50 nM protein) in PBS containing 0.2 mg ml-1 BSA and incubated for 20 minutes at 4°C. The protein-QD conjugates were then purified to remove unconjugated QDs. For this, biotinylated λ-DNA (300 pM) was incubated with streptavidin magnetic beads (5 mg; Roche) for 20 min at 20°C.
  • the MutSa-QD conjugation reaction was added to the beads, the PBS solution was diluted to one-fifth concentration with 10 mM Tris (pH 7.8) solution, and the reaction was incubated for 10 min at 4°C. Beads were washed twice with 10 mM Tris (pH 7.8), 20 mM NaCl, 1 mM MgCl2, 1 mM DTT, and 0.2 mg ml-1 BSA. QD-MutSa was eluted with 10 mM Tris (pH 7.8), 300 mM NaCl, 1 mM MgCl2, 1 mM DTT, and 0.2 mg ml-1 BSA. [0187] 2. DNA substrates and cloning.
  • a 151 bp DNA fragment containing unique restriction and nickase sites was ligated between the NheI and XhoI sites (Fig. 28). Insert-containing DNA was packaged using MaxPlax λ packaging extracts (Epicenter), according to the manufacturer's instructions. Phage stocks were prepared by standard plate lysis, and used to infect 1 ml of E. coli LE392MP cells (OD 0.1) at 37°C for 20 minutes. Infected cells were used to inoculate a 200 ml liquid culture in LB and 10 mM MgSO4, which was grown overnight at 39°C. 10 ml of chloroform was added and the culture was shaken for 10 minutes.
  • the lysed culture was incubated with DNase I and RNase (1 μg ml-1 each) at 20°C for 1 hour. SDS (0.5%), EDTA (50 mM) and proteinase K (5 mg) were added to the lysed culture, and incubated at 20°C for 1 hour, followed by phenol chloroform extraction and isopropanol precipitation. Purified DNA was resuspended in TE, and end-labeled with oligonucleotides, as described (2).
  • the end-labeled DNA was treated with Nt.BspQI (NEB), mixed with a 1000-fold molar excess of an oligonucleotide complementary to the region encompassed by the nickase sites, and then heated and cooled. Successful insertion was assessed by comparing restriction digests with either NcoI or SwaI, and alkaline gel electrophoresis verified the nicks were sealed by T4 DNA ligase.
  • NEBspQI Nt.BspQI
  • MutLa does not remain preferentially bound to any positions on the DNA (2)
  • the distribution histogram of MutLa represents the instantaneous positions for all molecules in the observed population and the flat distribution reflects the absence of preferred binding sites. Sampling error was determined by the Bootstrap method (4), and the 70% confidence intervals are presented.
  • MutSa and MutLa target search experiments were conducted with double-tethered curtains in 40 mM Tris (pH 7.8), 1 mM DTT, 150 mM NaCl, 1 mM MgCl2, 1 mM ADP, and 0.2 mg ml-1 BSA.
  • QD-MutSa (1-5 nM) was injected at a flow rate of 5-20 μl/min, and flow was terminated upon visual confirmation that the proteins had begun entering the sample chamber.
  • MutSa-mismatch search experiments MutSa was pre-bound to the mismatch in buffer containing 1 mM ADP, and free proteins were flushed from the sample chamber.
  • QD-MutLa (5-20 nM) was injected at a flow rate of 5-20 μl min-1, and flow was terminated upon visual confirmation that the proteins had entered the sample chamber. A protein was categorized as having undergone a 1D search only if there were at least two frames at the beginning of the diffusion trajectory that were at least three standard deviations away from the location of the mismatch. If the proteins initially appeared within this resolution limit, then they were categorized as having undergone an apparent 3D binding event.
  • QD-tagged proteins (MutSa or the MutSa/MutLa complex, as indicated) were first bound to mismatch-bearing DNA molecules in buffer containing 20 mM Tris [pH 7.8], 150 mM NaCl, 1 mM ADP, 1 mM MgCl2, 1 mM DTT, and 4 mg ml-1 BSA, and the reactions were chased with the same buffer but with the 1 mM ADP replaced with 1 mM ATP (or 1 mM ATPγS, as indicated). Videos were continually recorded at 5- or 10-Hz, and the data manually segregated into populations that either remained stationary, directly dissociated from the DNA, or began diffusing along the DNA.
  • Diffusion coefficients represent the mean ± standard deviation of >25 particle tracking measurements and were calculated from MSD plots as described (1, 2). All diffusion coefficients were based on measurements of protein complexes that exhibited QD blinking (see below); these measurements are confined to blinking QDs to help ensure that the reported diffusion coefficients reflect a homogeneous population of molecules all with the same hydrodynamic radii, and to minimize variance in the reported values due to heterogeneity in the oligomeric states of the complexes being measured (1, 2). The spatial resolution of our tracking data is limited by Brownian fluctuations of the DNA.
  • the QD-MutSa signals were then tracked, and escape from the lesions was defined as three contiguous frames outside 3 standard deviations of the tracking noise; the probability of falsely identifying an escape event is on the order of ~10^-8.
  • 53 remained bound to the lesions and did not escape the lesions (within experimental resolution as defined by 3 standard deviations from the tracking noise).
  • Fig. 33 shows the results from Monte Carlo simulations of a freely diffusing molecule with equally spaced absorbing boundaries (e.g.: nicks flanking either side of a mismatch). Boundary distances ranged from 2 to 190 steps away from the origin in the simulations, and 100,000 traces were generated for each boundary distance by selecting forward and backward steps with equal probability. The number of times the origin was encountered before the boundaries was recorded as well as the average number of steps necessary to encounter a boundary. As expected with a freely diffusing molecule, the average number of steps needed to travel a distance of N steps away was N^2 (Fig. 33a). Notably, the simulated traces also reveal that a molecule with equally spaced boundaries N steps away will on average cross the origin N-1 times (Fig. 33b).
  • !,! x is the average time necessary for a walker starting at position x to reach a site located n away before reaching an opposing site m away
  • the final metric for redundancy in random walks is the number of crossing events, CI , at a particular location m, given a predetermined number of steps, L.
  • CI L is then related to the sum of the probability, P m,n, of the walker occupying the mth site at each step up to the nth step.
  • P m, n is given by the binomial distribution (9):
  • microscopically bound (MB) to the mismatches.
  • a protein was considered to have released the mismatch when three consecutive position measurements fell outside of the MB region. The probability of this observation in the event that the protein remained at the lesion is (9 10! ! .
  • P l ⁇ the probability that a protein initially bound at site xl on the DNA finds a target (located at the origin) before dissociation.
  • Kadyrov FA, Dzantiev L, Constantin N, Modrich P (2006) Endonucleolytic function of MutLalpha in human mismatch repair. Cell 126:297-308. B5. Kadyrov FA, et al. (2007) Saccharomyces cerevisiae MutLalpha is a mismatch repair endonuclease. J Biol Chem 282:37181-37190.
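
The random-walk argument made above for Fig. 33 (a freely diffusing molecule between equally spaced absorbing boundaries) can be checked with a short simulation. The following is a minimal sketch only and is not the code used to produce Fig. 33; the function names `simulate_trace` and `summarize`, the trace count, and the seed are assumptions introduced for illustration. It counts the steps needed to reach a boundary N steps away and the number of times the origin is re-encountered first, which the text reports as N^2 and N-1 on average.

```python
import random


def simulate_trace(n_boundary, rng):
    """Walk from the origin until |position| reaches n_boundary.

    Returns (steps_taken, origin_returns). Forward and backward
    steps are chosen with equal probability, as described in the text.
    """
    position = 0
    steps = 0
    origin_returns = 0
    while abs(position) < n_boundary:
        position += rng.choice((-1, 1))
        steps += 1
        if position == 0:
            origin_returns += 1
    return steps, origin_returns


def summarize(n_boundary, n_traces=2000, seed=0):
    """Average step count and origin re-encounters over many traces.

    n_traces and seed are illustrative defaults (assumptions), far
    smaller than the 100,000 traces mentioned in the text.
    """
    rng = random.Random(seed)
    total_steps = 0
    total_returns = 0
    for _ in range(n_traces):
        steps, returns = simulate_trace(n_boundary, rng)
        total_steps += steps
        total_returns += returns
    return total_steps / n_traces, total_returns / n_traces


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        mean_steps, mean_returns = summarize(n)
        print(n, round(mean_steps, 1), round(mean_returns, 2))
```

For each boundary distance the printed averages should lie close to N^2 and N-1, with scatter that shrinks as the number of traces grows.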

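The escape criterion described above (three contiguous frames lying more than 3 standard deviations from the tracking noise) implies a very small false-positive probability. The sketch below is a rough arithmetic check, not a calculation taken from the source; it assumes Gaussian tracking noise, a two-sided cutoff, and statistically independent frames, and the function names are hypothetical.

```python
import math


def prob_outside(n_sigma):
    """Two-sided Gaussian tail probability beyond n_sigma standard deviations.

    Assumes normally distributed tracking noise (an assumption).
    """
    return math.erfc(n_sigma / math.sqrt(2.0))


def false_escape_probability(n_sigma=3.0, n_frames=3):
    """Chance that a stationary particle lies outside the cutoff in
    n_frames consecutive frames, treating frames as independent."""
    return prob_outside(n_sigma) ** n_frames


if __name__ == "__main__":
    print(f"single frame: {prob_outside(3.0):.4f}")
    print(f"three frames: {false_escape_probability():.2e}")
```

Under these assumptions the three-frame probability comes out at roughly 2 x 10^-8, consistent with the order of magnitude quoted for falsely identifying an escape event.
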
Abstract

The invention is related to nucleic acid arrays involving fluid lipid bilayers disposed on the support and methods of using the nucleic acid arrays.

Description

LIPID BILAYERS FOR DNA MOLECULE ORGANIZATION AND USES THEREOF
[0001] This application claims the benefit of and priority to U.S. Provisional Patent Application No. 61/745,149, filed December 21, 2012, the contents of which are hereby incorporated by reference in their entirety.
[0002] All patents, patent applications and publications cited herein are hereby incorporated by reference in their entirety. The disclosures of these publications in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art as known to those skilled therein as of the date of the invention described and claimed herein.
[0003] This patent disclosure contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the U.S. Patent and Trademark Office patent file or records, but otherwise reserves any and all copyright rights.
GOVERNMENT SUPPORT
[0004] The work described herein was supported in whole, or in part, by National Institutes of Health Grant Nos. GM074739 and GM082848. The United States Government has certain rights to the invention.
EQUIVALENTS
[0005] Those skilled in the art will recognize, or be able to ascertain, using no more than routine experimentation, numerous equivalents to the specific substances and procedures described herein. Such equivalents are considered to be within the scope of this invention.
BACKGROUND
[0006] Recent years have witnessed a dramatic increase in the use of technologies that allow the detailed interrogation of individual biological macromolecules in aqueous environments under near-native conditions. This increase can be attributed to the development and availability of highly sensitive experimental tools, such as atomic force microscopy (AFM), laser and magnetic tweezers, and fluorescence-based optical detection, all of which have been used to study biological phenomena such as protein folding and unfolding, DNA dynamics, and protein-nucleic acid interactions.
SUMMARY OF THE INVENTION
[0007] The invention is based, in part, on the discovery that nucleic acid molecules can be disposed on a substrate and positionally aligned to allow analysis of individual nucleic acid molecules. Accordingly, in one aspect, the invention features an array that includes a substrate and single-stranded nucleic acid molecules attached to the substrate. The single-stranded nucleic acid molecules can be attached to the substrate by means of a linkage, e.g., a linkage between cognate binding proteins, e.g., neutravidin and biotin, or an antibody and antigen (e.g., anti-digoxigenin antibody and digoxigenin); or a crosslinking linkage, e.g., disulfide linkage or coupling between primary amines using glutaraldehyde. In some embodiments, the single-stranded nucleic acid molecules are attached at one end. In some embodiments, the nucleic acid molecules are attached at both ends.
BRIEF DESCRIPTION OF THE DRAWINGS
[0008] FIG. 1 shows a single-molecule DNA curtain assay for promoter-specific binding by RNA polymerase. (FIG. 1A) Double-tethered DNA curtain assay for organizing substrates on surfaces of a microfluidic device. (FIG. 1B) Two-color images of YOYO1-stained DNA (green) bound by QD-RNAP (magenta). (FIG. 1C) Schematic of the λ-phage genome (48.5-kb), including relative locations and orientations of promoters aligned with images of QD-RNAP on single DNA molecules (Table 2). As shown in (FIG. 1B-C) most RNAP is bound to the promoters, and the left half of the λ-DNA that lacks promoters is essentially devoid of bound proteins. The finding that RNAP can locate promoters on stretched DNA molecules eliminates intersegmental transfer as an obligatory component of the promoter search (FIG. 6).
[0009] FIG. 2 shows visualization of single molecules of RNA polymerase as they search for and engage promoters. (FIG. 2A) Kymograms of RNAP binding to λ-DNA showing kinetically distinct intermediates. DNA is unlabeled, and RNAP is magenta. (FIG. 2B) Representative example of RNAP binding and initiating transcription from λPR; for this assay RNAP was premixed with all four rNTPs immediately prior to injection into the sample chamber (also see Supplementary Fig. 4). Initial binding (t = 0 s) is indicated as purple dot, and magenta bars highlight the first 3-9 seconds of the reaction trajectory. (FIG. 2C) Binding distributions of kinetically distinct intermediates, and corresponding lifetime measurements (insets; also see FIG. 7); a schematic showing the relative promoter location is included. Error bars indicate 70% confidence intervals obtained through bootstrap analysis.37 (FIG. 2D) Kinetic scheme reflecting observed intermediates. NSP, CC, and OC refer to nonspecifically bound, closed complex, and open complex, respectively; note that CC could also represent another intermediate preceding the open complex.6 Kinetic parameters are not segregated for individual promoters; rather, they are considered collectively, and therefore reported values should be considered an average of all λ promoters. (FIG. 2E) Upper bound of observed diffusion coefficients for promoter-bound RNAP, compared to immobilized dig-QDs and other proteins known to undergo 1D-diffusion (FIG. 12-13 & Table 4).36,45,47 Diffusion coefficients are gamma distributed, therefore we report the magnitude of the square root of the variance as error bars (n > 50 for all data sets).
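
The 70% bootstrap confidence intervals mentioned for the binding distributions can be illustrated with a percentile bootstrap over histogram bins. This is a minimal sketch under stated assumptions, not the procedure of the cited reference: the percentile method, the nearest-rank percentile rule, the resample count, and all function names are choices made here for illustration.

```python
import random


def histogram(positions, bin_edges):
    """Count positions falling in each half-open bin [edge_i, edge_i+1)."""
    counts = [0] * (len(bin_edges) - 1)
    for x in positions:
        for i in range(len(counts)):
            if bin_edges[i] <= x < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


def percentile(sorted_values, q):
    """Nearest-rank percentile (q in [0, 100]) of an already sorted list."""
    index = int(round(q / 100.0 * (len(sorted_values) - 1)))
    return sorted_values[index]


def bootstrap_bin_intervals(positions, bin_edges, n_resamples=1000,
                            confidence=0.70, seed=0):
    """Percentile-bootstrap interval for the count in each histogram bin.

    Resamples the observed positions with replacement, rebuilds the
    histogram each time, and returns one (lower, upper) pair per bin.
    """
    rng = random.Random(seed)
    n = len(positions)
    per_bin = [[] for _ in range(len(bin_edges) - 1)]
    for _ in range(n_resamples):
        resample = [positions[rng.randrange(n)] for _ in range(n)]
        for i, count in enumerate(histogram(resample, bin_edges)):
            per_bin[i].append(count)
    lower_q = 100.0 * (1.0 - confidence) / 2.0
    upper_q = 100.0 - lower_q
    intervals = []
    for samples in per_bin:
        samples.sort()
        intervals.append((percentile(samples, lower_q),
                          percentile(samples, upper_q)))
    return intervals
```

Calling `bootstrap_bin_intervals(positions, bin_edges)` with a list of observed binding positions returns the lower and upper error-bar limits for each bin; empty bins yield (0, 0).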
[0010] FIG. 3 shows single-molecule kinetics reveal the promoter search is dominated by 3D-diffusion. (FIG. 3A) Influence of protein orientation on target association. The angle Θ0 defines the effective DNA-binding surface of QD-RNAP, and Θ defines the orientation of the effective binding surface relative to the promoter. (FIG. 3B) Illustration of linear target size (a), for example where a 2-bp: a 1-bp offset (in either direction) results in target recognition, but a 2-bp offset does not result in target recognition. (FIG. 3C) Relationship between Θ, a and ψ, and their influence on promoter recognition. (FIG. 3D) Observed promoter association rates (ka). Dashed magenta line corresponds to k (ψ) a (t) in the absence of facilitated diffusion (for ψ = 0.75-nm), and experimental values above this line reflect rate enhancement due to facilitated diffusion. The boundary between the shaded and unshaded regions of the graph represents the facilitation threshold (Cthr; as indicated). (FIG. 3E) Effective target size (ψ) versus RNAP concentration. The dashed black line highlights the limiting value of ψ. (FIG. 3F) Rate acceleration (ka/CO) versus RNAP concentration. The difference between the experimental values and k (ψ) a (t) reflects facilitated diffusion, and the orange shaded region represents the maximum possible acceleration due to 1D-sliding and/or hopping. In (FIG. 3D-F) error bars represent S.E.M. (n > 50 for each data point).
[0011] FIG. 4 shows protein concentration exerts a dominant influence on target searches even for proteins capable of sliding on DNA. (FIG. 4A) DNA schematic showing the location of the 5x lac operator. (FIG. 4B) Two-color image of YOYO1-stained DNA (green) bound by QD-lac repressor (magenta). (FIG. 4C) Kymogram showing an example of lac repressor binding to nonspecific DNA and then diffusing in 1D to the operator; data were collected at 33 pM lac repressor. The distance between the initial binding site and the operator is indicated as Δχ. (FIG. 4D) Kymogram showing an example of direct operator binding in the absence of any detectable 1D sliding; data were collected at 800 pM lac repressor. The successful search through 3D binding is highlighted, as are examples of molecules that searched through FD but failed to locate the operator. (FIG. 4E) Graph showing the mean value of Δχ as a function of protein concentration for proteins that successfully engage the operator. Inset, percentage of total operator binding events that are attributable to FD (magenta) and 3D (green) at each protein concentration. Error bars represent S.D. of the data (n > 54 for each data point). (FIG. 4F) Graph of Δχ for all observed proteins. Blue data points correspond to proteins that fail to bind the operator, magenta data points are proteins that bind the operator after undergoing FD, and green data points correspond to 3D binding to the operator. All green data points within each column overlap at zero, but their fractional contribution to operator binding is shown as green bars in the inset of panel (FIG. 4E). These experiments were all conducted in buffer containing 10 mM Tris-HCl (pH 8.0), 1 mM MgCl2, 1 mM DTT, and 1 mg ml-1 BSA. Increasingly complex environments encountered during in vivo searches
[0012] FIG. 5 is a schematic showing that facilitated diffusion (FD) will be favored at concentrations below the facilitation threshold because the initial encounter with the DNA will most often occur at nonspecific sites, so the probability (P) of target engagement through FD exceeds the probability of engagement through 3D (PFD > P3D). Concentrations equal to or exceeding the facilitation threshold will favor 3D because the relative increase in protein abundance increases the probability of a direct collision with the target site (P3D > PFD). FD-related processes such as sliding/hopping can still occur at high protein concentrations, but those proteins undergoing FD are less likely to reach the target site before those that collide directly with the target. Although the facilitation threshold will vary for different proteins and different conditions, higher protein concentrations will still favor 3D collisions irrespective of the local environment (e.g. the presence of recruitment factors, DNA-bound obstacles, macromolecular crowding, local DNA folding) or global DNA architecture.
[0013] FIG. 6 is a schematic showing the Target Search Problem. Diffusion-based models for how proteins might search for binding targets: random collision through 3D-diffusion (i.e. jumping); 1D-hopping, involving a series of microscopic dissociation and rebinding events; 1D-sliding, wherein the protein moves without dissociating from the DNA; and intersegmental transfer, involving movement from one distal location to another via a looped intermediate. These mechanisms are not mutually exclusive, and the latter three are categorized as facilitated diffusion because by reducing dimensionality they allow target association rates exceeding limits imposed by 3D-diffusion. DNA is green, the target site (promoter) is blue, and RNAP is magenta.
[0014] FIG. 7 shows lifetime analysis of τ» and Έϊ events. (FIG. 7A) Histogram of lifetimes for QDs only in the absence of RNAP, and the red line is a single exponential fit to the histogram. (FIG. 7B) Shows the same QD-only data, but the y-axis is on a logarithmic scale. (FIG. 7C-D) Histogram of lifetimes for QD-RNAP and corresponding double exponential fit. The first time constant obtained from the double exponential fit (5.6 msec) is the same as is obtained from the single exponential fit to the QD-only data set. (FIG. 7E-F) Histogram of lifetimes for QD-RNAP and corresponding exponential fit for data collected in the absence of DNA. (FIG. 7G) This binding distribution uses the data points presented in Fig. 2c, but the data were restricted to only those events that had a lifetime of >40-msec. Based upon the two exponential components obtained from the lifetime measurements, this ensures that most of the events (>93%) plotted in this binding distribution histogram are τ1 events (i.e. nonspecifically bound RNAP). (FIG. 71) Semi-log plot of the lifetime distributions for the ¾ events, corresponding to the inset shown in the lower panel FIG. 2C.
[0015] FIG. 8 shows promoter binding by QD-RNAP. (FIG. 8A) Schematic of substrate with a ligated promoter. As a further verification that QD-RNAP was binding to promoters in the phage DNA, and not simply associating with the AT-rich right half of the molecule, a 100-bp synthetic DNA fragment (IDT) spanning positions -67 to +24 of the promoter was ligated into the ApaI site on the left half of the phage genome. Successful insertion of the promoter fragment destroys the ApaI site. The presence of the insert was confirmed by PCR, and products lacking the insert were then selected against by further digestion with ApaI prior to assembly of the DNA curtains (molecules that get cut with ApaI cannot be assembled into double-tethered curtains). The ligation mixtures contain a heterogeneous mixture of substrates with the promoter fragment inserted in either orientation, as depicted. (FIG. 8B) Binding site distribution. The ligated DNA was used to assess QD-RNAP binding distributions in single-molecule DNA curtain assays. As shown here, the presence of the new promoter fragment resulted in a new peak of QD-RNAP in the binding distribution at the expected location.
[0016] FIG. 9 shows transcription by QD-RNAP. Examples of RNAP movement along the DNA in the presence of all four rNTPs; data were collected at room temperature. RNAP and rNTPs were premixed prior to injection into the sample chamber. The trajectories are color coded for each corresponding promoter, and the relative orientation of each promoter is indicated on the left.
[0017] FIG. 10 shows diffusion of lac repressor and T7 RNAP. Kymograms comparing E. coli RNAP to QD-tagged T7 RNAP and lac repressor, both of which can diffuse along DNA under low ionic strength conditions.1-3 (FIG. 10A) E. coli RNAP compared to T7 RNAP; buffer conditions: 40 mM Tris-HCl (pH 8.0), 0.2 mg ml-1 BSA, 5 mM DTT for T7 RNAP and 1 mM DTT for E. coli RNAP. (FIG. 10B) E. coli RNAP compared to lac repressor; buffer conditions: 10 mM Tris-HCl (pH 8.0), 1 mM MgCl2, 1 mM DTT, 1 mg ml-1 BSA. The DNA used in the experiments with lac repressor contained a single, 21-bp symmetric lac operator, as indicated by the arrow.4 (FIG. 10C) E. coli RNAP compared to T7 RNAP; buffer conditions: 20 mM Tris-HCl (pH 8.0), 25 mM KCl, 1 mM MgCl2, 1 mM DTT, 0.2 mg ml-1 BSA.
[0018] FIG. 11 shows RNAP Bead-Aggregates Exhibit 1D Movement. (FIG. 11A) Particle-tracking trajectory showing 1D diffusive motion of an RNAP-saturated bead (1.0 μm) bound to a DNA molecule in the absence of buffer flow. (FIG. 11B) Trajectory of an RNAP-saturated bead (1.0 μm) when buffer flow (0.4 ml min-1) was applied in the direction indicated by the arrowhead. (FIG. 11C) A typical trajectory of QD-tagged RNAP bound to DNA is shown for comparison.
[0019] FIG. 12 shows RNAP and dig-QD Diffusion Coefficient Data. (FIG. 12A) Shows a comparison of the single molecule and ensemble diffusion coefficients obtained for QD-tagged RNAP and an immobilized dig-QD (this study), along with reported values for lac repressor 2, p53 5, and Mlh1-Pms1 6. (FIG. 12B) Magnified view of the RNAP and dig-QD data sets. Red circles represent diffusion coefficients obtained from all individual particle-tracking trajectories for RNAP, and blue circles represent diffusion coefficients from dig-QD trajectories collected and analyzed under identical conditions. Squares represent ensemble values for the diffusion coefficients obtained from the cumulative tracking data along with corresponding error bars. Diffusion coefficients are gamma distributed, therefore we report the magnitude of the square root of the variance (error bars).
[0020] FIG. 13 shows Diffusion Coefficients and DNA Fluctuations. (FIG. 13A) Cartoon illustration of DNA motion giving rise to the apparent diffusion coefficients for the stationary dig-QDs. The underlying fluctuations of the DNA were analyzed by linking a single QD to a fixed digoxigenin tag covalently attached to the double-tethered DNA molecules.7 (FIG. 13B) Distributions of single-frame displacements for data collected at either 5 or 10 Hz (as indicated) for the entire dig-QD data set. The distributions have been normalized, and the overlay is a Gaussian fit generated using the mean and standard error of the distribution. The number of individual displacements is indicated. (FIG. 13C) Reference graphs showing the mean squared displacement analysis of the stationary dig-QD particles.
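
The mean squared displacement (MSD) analysis referred to for FIG. 12-13 can be illustrated for a single 1D trajectory. The sketch below is not the authors' analysis pipeline; it assumes the standard 1D relation MSD(t) = 2Dt plus a constant offset from localization noise, fits a straight line to the first few lag times, and uses hypothetical function names, units, and parameter values throughout.

```python
import random


def simulate_trajectory(diffusion_coefficient, frame_interval, n_frames, seed=0):
    """Generate a synthetic 1D Brownian trajectory (for demonstration only)."""
    rng = random.Random(seed)
    step_sd = (2.0 * diffusion_coefficient * frame_interval) ** 0.5
    positions = [0.0]
    for _ in range(n_frames - 1):
        positions.append(positions[-1] + rng.gauss(0.0, step_sd))
    return positions


def mean_squared_displacement(positions, max_lag):
    """MSD for lags of 1..max_lag frames, averaged over all start frames."""
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [(positions[i + lag] - positions[i]) ** 2
                   for i in range(len(positions) - lag)]
        msd.append(sum(squared) / len(squared))
    return msd


def diffusion_coefficient_1d(positions, frame_interval, max_lag=10):
    """Estimate D from the slope of MSD versus lag time (slope = 2D in 1D).

    A least-squares line with a free intercept is used so that a constant
    offset from tracking noise does not bias the slope.
    """
    msd = mean_squared_displacement(positions, max_lag)
    lags = [frame_interval * k for k in range(1, max_lag + 1)]
    mean_t = sum(lags) / len(lags)
    mean_m = sum(msd) / len(msd)
    numerator = sum((t - mean_t) * (m - mean_m) for t, m in zip(lags, msd))
    denominator = sum((t - mean_t) ** 2 for t in lags)
    return (numerator / denominator) / 2.0


if __name__ == "__main__":
    # Hypothetical values: D = 0.05 um^2/s sampled at 10 Hz.
    trajectory = simulate_trajectory(0.05, 0.1, 20000, seed=3)
    print(diffusion_coefficient_1d(trajectory, 0.1))
```

For a stationary particle on a fluctuating DNA, the same fit would return a small apparent diffusion coefficient, which is the kind of baseline the dig-QD controls are described as providing.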
[0021] FIG. 14 shows activity of RNAP Under Dilute Conditions. (FIG. 14A) Gel shift assay showing RN AP and promoter association. RNAP was titrated into reactions containing 0.4 nM Cy-3 labeled promoter DNA fragment and then challenged with heparin to disrupt protein-DNA complexes that had not formed open complexes. The right two lanes are examples of negative controls containing different amounts of input DNA, which was used to calibrate band intensity. (FIG. 14B) Quantitation showing the fraction of bound promoter DNA fragment as a function of the RNAP to DNA ratio, c, Stability of RNAP under dilute conditions. RNAP was diluted to 0.6 nM and incubated at room temperature in the absence of DNA. After the indicated time intervals, the samples were assayed for DNA binding activity using the Cy3- labeled E promoter DNA fragment. Bound and unbound D A fractions were separated by native gel electrophoresis and quantitated based on the fluorescence intensity of the bands.
[0022] FIG. 15 shows Parallel Array of Double-tethered Isolated (PARDI) molecules. (FIG. 15A) Schematic diagram of the new PARDI DNA curtain design used for the promoter association rate measurements. (FIG. 15B) Optical image highlighting nanofabricated PARDI pattern design. (FIG. 15C) Image of a typical PARDI field-of-view, showing the double-tethered, YOYO1-stained DNA molecules. (FIG. 15D) Histogram showing the measured distances between neighboring DNA molecules anchored to the PARDI patterned surface.
[0023] FIG. 16 shows promoter association rate analysis. (FIG. 16A) Illustration of a PARDI curtain, where each DNA is numbered. Overlapping DNA and DNA molecules closer than 7 μm are excluded from analysis. (FIG. 16B) A kymogram is made for each DNA in the field-of-view, and the time required for detection of the first promoter-bound protein on each individual DNA is extracted from the kymograms (e.g., t1 and t2 for DNA #1 and #2, respectively). Importantly, we are not measuring the rate of either closed complex (cc; schematically represented as red lines) formation or open complex (oc; schematically represented as magenta lines) formation, but rather we are measuring the exact instant (with 100 msec resolution) at which a single molecule of RNAP is detected at a promoter for the molecules of RNAP that subsequently make a successful transition to open complexes. Once one promoter is occupied, all subsequent binding events on that same DNA molecule are excluded from further analysis (e.g., t_excluded for DNA molecules #2 and #34); therefore, in this example, we could obtain a maximum of 34 data points. These restrictions ensure we only record binding events that occur when all promoters on a given DNA are initially accessible for binding by RNAP, in accordance with calculation parameters.
[0024] FIG. 17 is a schematic of a DNA molecule tethered to lipid-coated flow cell.
[0025] FIG. 18 is a schematic of the procedures for making ssDNA curtains. (FIG. 18A) ssDNA is generated by rolling circle replication. (FIG. 18B) Agarose gel showing the products of rolling circle replication; note that the ssDNA generated in these assays is too long to verify its length by electrophoresis. (FIG. 18C) For single-tethered curtains, biotinylated ssDNA is anchored to a single lipid within the bilayer, and the DNA is then aligned at barriers through the application of hydrodynamic force. RPA-GFP is then introduced into the flow cell to label the DNA and remove secondary structure. (FIG. 18D) For double-tethered curtains, the RPA-ssDNA is nonspecifically adsorbed to exposed anchor points downstream from the linear diffusion barriers.
[0026] FIG. 19 shows single-tethered ssDNA curtains. (FIG. 19A) Kymogram showing RPA-dependent extension of an ssDNA substrate; ScRPA-eGFP was injected at time zero, the eGFP signal is in green, and the location of the linear barrier is indicated as "b". (FIG. 19B) Transient pause of flow confirms that the scRPA-eGFP-ssDNA is not stuck to the sample chamber surface. (FIG. 19C) Full-field view of ssDNA molecules labeled with scRPA-eGFP. The six linear barriers are marked b1-b6. Image was collected while buffer was flowing through the sample chamber. (FIG. 19D) scRPA-eGFP remains bound to the ssDNA for long periods of time. (FIG. 19E) The scRPA-eGFP-ssDNA complex is resistant to buffers containing denaturant (3.5 M urea); note that the background increases while urea is flushed through the sample chamber, likely due to protein being stripped off the microfluidics upstream of the observation area.
[0027] FIG. 20 shows double-tethered ssDNA curtains. (FIG. 20A) Full-field view of extended scRPA-eGFP labeled ssDNA anchored by both ends to the sample chamber surface, as illustrated in Figure 3 SB; the linear barrier and anchor are indicated as "b" and "a", respectively. Image was collected in the absence of flow. (FIG. 20B) Kymogram of a double-tethered ssDNA in the presence and absence of buffer flow, as indicated, confirming the molecule remains confined within the evanescent field even in the absence of an externally applied hydrodynamic force. (FIG. 20C) Kymogram showing QD-tagged Sgs1 bound to an ssDNA molecule. The ssDNA is in green (scRPA-eGFP; upper panel), the QD-tagged Sgs1 is shown in magenta (middle panel), and an overlay of the ScRPA-eGFP and QD-Sgs1 is also shown (bottom panel). (FIG. 20D) Kymogram shows an example where the ssDNA spontaneously breaks during observation. Both the ssDNA and the Sgs1 immediately diffuse out of view, confirming they are not nonspecifically adsorbed to the surface.
[0028] FIG. 21. Mismatch recognition by MutSα. (FIG. 21A) Schematic of single-tethered DNA curtains. DNA substrates are anchored to the bilayer and aligned along nanofabricated barriers. (FIG. 21B) Images of a three-tiered DNA curtain with flow on (Left), during a transient pause in flow (Center), and after flow has been resumed (Right). Flow is from top to bottom; DNA is green, and proteins are magenta. The location of the three tandem G/T mismatches (MM) is indicated. (FIG. 21C) Kymogram generated from a single DNA molecule subjected to transient pauses in buffer flow (light blue arrowheads) followed by quickly resuming flow (green arrowheads). (FIG. 21D) Distribution of MutSα bound to mismatch-containing DNA. Error bars in this and subsequent figures represent the SD from N bootstrap samples (B44).
[0029] FIG. 22. Mechanisms of mismatch targeting by MutSα. (FIG. 22A) Schematic of the double-tethered DNA curtains. DNA substrates are anchored by one end to the lipid bilayer, are aligned along the nanofabricated barriers, and then are anchored at their downstream ends through a digoxigenin-antibody linkage. (FIG. 22B) Example of MutSα undergoing 1D diffusion until encountering the lesion. MutSα is magenta, the DNA is not labeled, and gaps in the trajectories reflect QD blinking. The lower panels highlight the first few seconds of the trajectory. (FIG. 22C) Example of MutSα capturing the mismatch through direct 3D diffusion. Experiments in B and C were conducted with double-tethered curtains, and flow was terminated after MutSα entered the sample chamber.
[0030] FIG. 23. ATP binding provokes 1D diffusion of mismatch-bound MutSα. (FIG. 23A) Models showing how MutSα might search for strand-discrimination signals (SS). (FIG. 23B) Kymogram and tracking showing the response of mismatch-bound MutSα upon injection of 1 mM ATP. Experiments were conducted with double-tethered curtains, MutSα was prebound to the mismatch, ATP was injected at 0.1 ml min-1, and flow was terminated after ATP entered the sample chamber. The DNA was not labeled. "Flow on" indicates when ATP injection was initiated, and "ATP arrival" indicates when ATP entered the sample chamber. The difference between these time points corresponds to the dead volume of the microfluidics. (FIG. 23C) Response of mismatch-bound MutSα upon injection of 1 mM ATPγS. (FIG. 23D) Example of spontaneous, ATP-independent release of MutSα, followed by ATP-dependent release.
[0031] FIG. 24. Colocalization of MutLα with mismatch-bound MutSα. (FIG. 24A) MutLα binding to mismatch-bound MutSα on single-tethered DNA curtains. MutSα was bound to the mismatch, followed by injection of MutLα into the sample chamber. MutLα and MutSα were labeled with different colored QDs. (Top) MutSα. (Middle) MutLα. (Bottom) Overlay with MutSα (magenta) and MutLα (green). The DNA was not labeled. Imperfect correspondence between all individual QD green/magenta pairs reflects the presence of "dark" proteins. (FIG. 24B) Kymogram generated from a single DNA molecule showing that MutLα remains stationary and colocalized with mismatch-bound MutSα over time; the green (MutLα) and magenta (MutSα) signals appear white in the overlay. Blue and green arrowheads indicate transient pauses in buffer flow, and the coincident disappearance of the QD signals verifies that neither protein was stuck to the sample chamber surface. (FIG. 24C) Distribution of QD-tagged MutLα in the presence of QD-tagged MutSα. (FIG. 24D) Distribution of QD-tagged MutLα with unlabeled MutSα. Insets in C and D show kymograms illustrating that MutSα/MutLα remains at the mismatch. (FIG. 24E) Distribution of QD-tagged MutLα on a single-tethered curtain in the absence of MutSα. MutLα diffuses on DNA continually in the absence of MutSα (Inset), so the distribution histogram in E represents the instantaneous distribution of mobile MutLα molecules, whereas the distribution peaks observed in C and D represent proteins that are stably bound to the lesions and are not moving along the DNA.
[0032] FIG. 25. Target-search mechanisms of MutLα and the MutSα/MutLα complex. (FIG. 25A) Kymograms and tracking data showing examples of QD-tagged MutLα (green) engaging mismatch-bound MutSα (MM-MutSα; unlabeled) after undergoing a 1D or 3D target search. The DNA was not labeled. (FIG. 25B) Kymogram and tracking showing that MutLα does not stop at MutSα in the absence of a mismatch. (FIG. 25C) Kymogram showing that MutSα and MutLα do not establish stable interactions with one another on homoduplex DNA. (FIG. 25D) Kymogram and tracking showing ATP-triggered release of lesion-bound MutSα/MutLα and subsequent 1D diffusion along the flanking DNA. In the kymogram, the MutSα (magenta) and MutLα (green) signals appear white in the overlay. In the graph, MutSα (magenta) and MutLα (green) were tracked independently, and the tracking data were superimposed.
[0033] FIG. 26. Intersite transfer of MMR proteins between juxtaposed DNA molecules. (FIG. 26A) Schematic of a crisscrossed DNA curtain. (FIG. 26B and FIG. 26C) Optical images showing pattern elements and TIRFM images showing examples of crisscrossed DNA molecules. Insets illustrate positions of each DNA molecule. (FIG. 26E-G) Behavior of MutLα (magenta) upon encountering a crisscrossed DNA junction. (FIG. 26E) Integrated trajectory. (FIG. 26F) Tracking data. (FIG. 26G) Tracking data superimposed on the DNA axes. In G, DNA molecules are shown as blue lines; the green circle identifies the DNA junction within a 90% confidence interval. Tracking data are color-coded according to location relative to the crisscross. The color-coded bar shows the relative location of protein over time. (FIG. 26H) MutSα before encountering a lesion. (FIG. 26I) MutSα after ATP-triggered release from a mismatch. (FIG. 26J) MutSα/MutLα after ATP-triggered mismatch release; MutLα was QD-tagged, and MutS was untagged. In I and J the zero time points correspond to the location of the lesions, and the longer time trajectories for these datasets reflect the longer DNA-binding lifetimes of MutSα and MutSα/MutLα after ATP-triggered release from mismatches.
[0034] FIG. 27. Schematic for early stages of MMR. (FIG. 27A) Model summarizing how MutSα finds lesions, how MutLα locates lesion-bound MutSα, and how the MutSα/MutLα complex scans the flanking DNA by 1D diffusion after ATP-triggered release from a lesion. (FIG. 27B) MutSα structural changes predicted upon ATP-triggered release from a lesion. (Left) The structures represent front and side views of mismatch-bound human MutSα (Protein Data Bank ID code 2O8B) in which the protein complex (gray) is wrapped around the DNA (green) with domain I of Msh6 (magenta) engaged with the mismatched base (B33). (Right) Hypothetical structures were obtained by rigid-body rotation of Msh6 domain I out of the major groove to illustrate how retraction of Msh6 domain I out of the DNA major groove might allow the release of MutSα from the mismatch and still allow the protein to remain tightly wrapped around the DNA while enabling 1D diffusion in the absence of an obligatory rotational component.
[0035] FIG. 28. Construction and characterization of mismatch substrate. (FIG. 28a) Overview of λI3-DNA construction. Highlights sites used for inserting the new DNA fragments, as well as the number and arrangement of the Nt.BspQI nickase sites, and restriction sites for NcoI and SwaI. (FIG. 28b) Schematic of oligonucleotide insertion strategy. The λI3-DNA is green, and the nicking sites are indicated with arrowheads. After treatment with Nt.BspQI, the λ-DNA is mixed with an excess of the appropriate oligonucleotide (magenta), and then briefly heated and cooled to replace the nicked fragments with the new oligonucleotide. (FIG. 28c) Restriction analysis of the λI3-DNA substrates. The λI3-DNA substrate has three unique restriction fragments not present in the wt phage: digestion with SwaI yields a 13,828 bp fragment (purple asterisk), and digestion with NcoI liberates two fragments 9,959 bp (red asterisk) and 5,671 bp (blue asterisk) in length. Insertion of the mismatch eliminates the two fragments produced by digestion with NcoI (because the mismatch disrupts the NcoI site), but does not affect the SwaI fragment.
[0036] FIG. 29. Half-life of MutSα bound at mismatches and MutLα bound to the MutSα-mismatch complex. (FIG. 29a) QD-tagged MutSα was bound to the 3x mismatches on a single-tethered DNA curtain as shown in Fig. 21 of the main text. The lesion-bound proteins were then chased with buffer that lacked additional free protein and contained 150 mM NaCl and 1 mM ADP (along with 20 mM Tris [pH 7.8], 1 mM MgCl2, 1 mM DTT, and 4 mg ml-1 BSA). The number of proteins that remained bound to the mismatch was measured at defined time intervals, and the resulting data were fit with single exponential curves to determine the half-lives of the lesion-bound proteins, yielding a value of 9.6±1.5 minutes for lesion-bound MutSα. (FIG. 29b) QD-tagged MutLα was bound to the mismatch-MutSα (untagged) complex as shown in Fig. 24d of the main text. The lesion-bound MMR proteins were then chased with buffer that lacked additional free protein and contained 150 mM NaCl and 1 mM ADP (along with 20 mM Tris [pH 7.8], 1 mM MgCl2, 1 mM DTT, and 4 mg ml-1 BSA). The number of QD-tagged MutLα proteins that remained bound to the DNA was measured at defined time intervals, and the resulting data were fit with single exponential curves to determine the half-lives of the lesion-bound proteins, yielding a value of 7.8±0.4 minutes.
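By way of non-limiting illustration, the half-life determination described for FIG. 29 can be sketched as follows. The sketch assumes a single-exponential survival model N(t) = N0·exp(-k·t), for which the half-life is ln(2)/k, and it substitutes a simple linear least-squares fit on ln N(t) for whatever curve-fitting routine was actually used. The function name `fit_half_life` and the data points are hypothetical; the counts are synthetic values generated from an assumed 9.6-minute half-life and are not experimental data.

```python
import math

def fit_half_life(times_min, counts):
    """Fit N(t) = N0 * exp(-k * t) by least squares on ln N(t).

    Returns (k, half_life) with half_life = ln(2) / k.
    This log-linear fit is an illustrative stand-in for a nonlinear fit.
    """
    pts = [(t, math.log(c)) for t, c in zip(times_min, counts) if c > 0]
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    k = -sxy / sxx  # decay rate is the negative slope of ln N versus t
    return k, math.log(2) / k

# Synthetic, noise-free counts generated from an assumed half-life (minutes).
assumed_half_life = 9.6
k_true = math.log(2) / assumed_half_life
times = [0, 2, 4, 6, 8, 10, 15, 20]
counts = [100 * math.exp(-k_true * t) for t in times]

k_fit, t_half = fit_half_life(times, counts)
print(f"fitted half-life: {t_half:.2f} min")
```

With real, noisy count data a weighted or nonlinear fit may be preferable, because taking logarithms gives undue weight to late time points with few remaining molecules.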
[0037] FIG. 30. Mismatch targeting by MutSα. (FIG. 30a) Shows 10 representative examples of tracking data (magenta) for molecules of MutSα that engaged the mismatches through a 1D search. The initial binding positions of the proteins are indicated with blue arrowheads, the location of the mismatches is indicated as MM and a green line, and lesion engagement is indicated with black arrowheads. (FIG. 30b) Shows a map of the initial binding sites for all observed molecules of MutSα that bound to the lesions. Gray arrowheads correspond to proteins that bound directly to the mismatches through an apparent 3D mechanism (within our optical resolution limits), and blue arrowheads correspond to proteins that bound to nonspecific DNA sites and slid in 1D along the DNA to engage the lesions. (FIG. 30c) Shown are five representative examples of MSD plots generated from the tracking data of MutSα as it searches for lesions.
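By way of non-limiting illustration, a mean squared displacement (MSD) analysis of the kind plotted in FIG. 30c can be sketched as follows. The sketch assumes one-dimensional positions sampled at a fixed frame interval and uses the standard relation MSD(t) = 2·D·t for free 1D diffusion. The function names, the 10 Hz frame time, and the simulated unit-step trajectory are assumptions made for illustration and are not the tracking data or analysis code of the study.

```python
import random

def mean_squared_displacement(positions, max_lag):
    """Return [MSD(1), ..., MSD(max_lag)] for an evenly sampled 1D trajectory."""
    msd = []
    for lag in range(1, max_lag + 1):
        disp2 = [(positions[i + lag] - positions[i]) ** 2
                 for i in range(len(positions) - lag)]
        msd.append(sum(disp2) / len(disp2))
    return msd

def diffusion_coefficient_1d(msd, frame_time):
    """Least-squares slope of MSD versus lag time, divided by 2 (MSD = 2*D*t)."""
    lags = [frame_time * (k + 1) for k in range(len(msd))]
    mean_t = sum(lags) / len(lags)
    mean_m = sum(msd) / len(msd)
    slope = (sum((t - mean_t) * (m - mean_m) for t, m in zip(lags, msd))
             / sum((t - mean_t) ** 2 for t in lags))
    return slope / 2.0

# Simulated trajectory: unit steps left or right, one step per frame.
rng = random.Random(1)
frame_time = 0.1  # seconds per frame (assumed 10 Hz acquisition)
positions = [0.0]
for _ in range(20000):
    positions.append(positions[-1] + rng.choice((-1.0, 1.0)))

msd = mean_squared_displacement(positions, max_lag=5)
d_est = diffusion_coefficient_1d(msd, frame_time)
print(f"estimated D: {d_est:.2f} step^2/s (expected about 5)")
```

Only the first few lag times are used for the fit because MSD values at long lags are averaged over fewer independent displacements and are correspondingly noisier.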
[0038] FIG. 31. 1D diffusion of mismatch-bound MutSα following ATP or ATPγS chase. (FIG. 31a) Shows 5 representative examples of tracking data illustrating the behavior of mismatch-bound MutSα after the injection of 1 mM ATP, and (FIG. 31b) shows 5 representative examples of tracking data illustrating the behavior of mismatch-bound MutSα after the injection of 1 mM ATPγS. Gaps in the tracking data correspond to portions of the trajectories that could not be accurately tracked due to QD blinking or changes in background intensity, and the end-points in the tracking data correspond to dissociation of the proteins from the DNA. In both (FIG. 31a) and (FIG. 31b), black arrowheads indicate when ATP or ATPγS enters the microfluidic sample chamber, blue arrowheads indicate when the protein dissociates from the DNA, and for traces lacking blue arrowheads the proteins remained bound to the DNA and continued diffusing beyond the data collection window. (FIG. 31c) Shown are five representative examples of MSD plots generated from the tracking data of MutSα after being released from lesions upon chasing with ATP. (FIG. 31d) Distribution of lifetimes for MutSα/MutLα after being released from the mismatches upon ATP injection (N=32). These values yield a lower bound on the lifetime of the diffusing MutSα/MutLα complex after ATP-triggered release from the lesions of t1/2 > 198±23.4 seconds. Note that this value is a lower bound because ~40% of the observed molecules did not dissociate from the DNA during the observation windows; rather, they remained bound to the DNA and kept diffusing.
[0039] FIG. 32. ATP chase of mismatch-bound MutSα in low salt buffer. Data were collected in 1 mM ADP and 50 mM NaCl (along with 20 mM Tris [pH 7.8], 1 mM MgCl2, 1 mM DTT, and 4 mg ml-1 BSA), and the bound proteins were then chased with the same buffer, except that 1 mM ADP was replaced with 1 mM ATP, as indicated by the dashed line. Upon chasing with ATP at 50 mM NaCl, 18% of the proteins began diffusing along the DNA, 4% directly dissociated, and 78% remained stationary at the lesions (N=78). The tracking data above highlight ten representative examples of different MutSα molecules, including 8 that remained stationary and 2 that exhibited 1D excursions away from the mismatches. Notably, one of the two molecules that escaped from the mismatch also rebound to the mismatch following brief 1D excursions (second trace from the bottom). This is the only example of lesion rebinding by MutSα that we have seen in the presence of ATP, and could suggest that the complexes remain in the ADP-bound state under these low salt conditions; however, we are hesitant to draw conclusions given the rarity of this observation.
[0040] FIG. 33. Redundant nature of 1D diffusion. (FIG. 33a-b) Results of simulations of a 1D random walk, which were used to reveal the average number of steps (N) necessary to move a given distance along a 1D lattice (FIG. 33a), and the difference between the number of steps (N) taken and the average number of origin (or mismatch) crossings (FIG. 33b). (FIG. 33c) Theoretical number of origin crossings versus the number of steps taken.
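By way of non-limiting illustration, a 1D random walk simulation of the general kind described for FIG. 33 can be sketched as follows. The sketch assumes an unbiased walk with unit steps on an integer lattice and counts an "origin crossing" as any step that lands back on the starting site; both choices, along with the function names, trial counts, and parameter values, are assumptions and may differ from the simulations actually performed. For such a walk the mean number of steps needed to first reach a distance d from the origin is d squared, and the mean number of returns to the origin grows roughly as the square root of the number of steps, which is the sense in which 1D diffusion is redundant.

```python
import random

def steps_to_reach(distance, rng):
    """Count unit steps until the walker is first `distance` sites from the origin."""
    pos, steps = 0, 0
    while abs(pos) < distance:
        pos += rng.choice((-1, 1))
        steps += 1
    return steps

def origin_returns(n_steps, rng):
    """Count how many of n_steps unit steps land back on the origin."""
    pos, returns = 0, 0
    for _ in range(n_steps):
        pos += rng.choice((-1, 1))
        if pos == 0:
            returns += 1
    return returns

rng = random.Random(0)
trials = 2000

# Average number of steps needed to move 10 lattice sites (theory: 10**2 = 100).
mean_steps = sum(steps_to_reach(10, rng) for _ in range(trials)) / trials

# Average number of origin returns in 1000 steps (theory: roughly 24).
mean_returns = sum(origin_returns(1000, rng) for _ in range(trials)) / trials

print(f"mean steps to reach 10 sites: {mean_steps:.1f}")
print(f"mean origin returns in 1000 steps: {mean_returns:.1f}")
```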
[0041] FIG. 34. Spontaneous mismatch escape and return by MutSα. Data were collected by monitoring lesion-bound MutSα in buffer containing 1 mM ADP and 150 mM NaCl (along with 20 mM Tris [pH 7.8], 1 mM MgCl2, 1 mM DTT, and 4 mg ml-1 BSA). The proteins were continuously observed for at ten minutes at 10 Hz. The tracking data (FIG. 34a) highlight examples of different MutSα molecules that spontaneously escaped from the mismatched bases. The MutSα trajectories are shown in magenta, and the location of the mismatches (MM) is indicated by the green line. (FIG. 34b) Inset, theoretical calculation for the sliding distance after spontaneous mismatch release based on 1D random walk, revealing a mean excursion distance of„! ! !"# = 22.5-bp and a mean return time of t! ! !"#= 1.5 milliseconds. Results are segregated into microscopically observable (magenta) and submicroscopic regimes (black), and the blue line represents a fit to the calculations. Larger graph shows the theoretical expectation based on simulations of 10,000 random walkers, and the resulting data are displayed at the resolution of our experimental data (see below), yielding mean observable excursion distances and times of Z! ! !"#,! "#=2,014-bp and £! ! !"#, !"#=! 1.5-seconds; the submicroscopic regime is omitted from this plot because it would not be experimentally observable. (FIG. 34c) Experimental observations. The experimentally observed diffusion trajectories are segregated into proteins that freely diffused on the DNA (cyan), and those whose diffusion distance was limited by collisions with either other proteins on the DNA (yellow) or the chromium barriers (purple). The experimental data yielded observed excursion distances and times of Z!" H3, 134-bp and t!"#=30.7-seconds, which are in good agreement with the theoretical expectations. (FIG. 34d) Inset, theoretical calculation for the return time after spontaneous mismatch release based on 1D random walk. The results are segregated into the microscopically observable regime (magenta) and the submicroscopic regime (black), and the blue line represents a fit to the calculations and is the same in all three graphs. Larger graph shows the theoretical expectation, and the resulting data are displayed at the resolution of our experimental data; the submicroscopic regime is omitted because it would not be experimentally observable. (FIG. 34e) Experimentally observed return times.
[0042] FIG. 35. Mismatch-MutSα targeting by MutLα. (FIG. 35a) Shows 10 representative examples of tracking data for molecules of MutLα that engaged mismatch-bound MutSα through a 1D search. The initial binding positions of the proteins are indicated with blue arrowheads, the locations of the MutSα-mismatches are indicated, and lesion engagement is indicated with black arrowheads. (FIG. 35b) Shows a map of the initial binding sites for all observed molecules of MutLα that bound to mismatch-bound MutSα. Gray arrowheads correspond to MutLα proteins that bound directly by apparent 3D collisions to the mismatch-bound MutSα (within optical resolution limits), and blue arrowheads correspond to MutLα proteins that bound to nonspecific DNA sites and slid in 1D along the DNA to engage the mismatch-bound MutSα. (FIG. 35c) Five representative examples of MSD plots generated from the tracking data of MutLα as it searches for lesion-bound MutSα.
[0043] FIG. 36. ATP-triggered release of the mismatch-bound MutSα/MutLα complex. (FIG. 36a) Ten representative examples of tracking data illustrating the behavior of mismatch-bound MutSα after injection of ATP, including complexes that exhibited blinking of QD-MutLα (top eight traces) as well as nonblinking QD-MutLα (bottom two traces; QD-MutS exhibit blinking in all observed cases). The black arrowheads indicate when ATP entered the flowcells. Gaps in the tracking data correspond to portions of the trajectories that could not be accurately tracked due to QD blinking or changes in background intensity. The ends of the traces correspond to the end of the data collection, and do not correspond to protein dissociation from the DNA. (FIG. 36b) Five representative MSD plots for the MutS/MutL complex after ATP-triggered lesion release. (FIG. 36c) The distribution of lifetimes measured for MutSα/MutLα after being released from the mismatches upon ATP injection (N=18). These values yield a lower bound on the lifetime of the diffusing MutSα/MutLα complex after ATP-triggered release from the lesions of t1/2 > 267.6±62.1 seconds. Note that this value is a lower bound because ~70% of the observed complexes did not dissociate from the DNA during the observation windows; rather, they remained bound to the DNA and kept diffusing, and therefore they are not included in the histogram.
[0044] FIG. 37. Barrier patterns for crisscrossed DNA curtains. (FIG. 37a) Schematic of the two-channel flowcell. (FIG. 37b) Low magnification (10x) optical image of the chromium (Cr) patterned surface. (FIG. 37c and d) High-resolution SEM (scanning electron microscope) images of the Cr patterns. (FIG. 37e) High magnification (100x) optical image of a single pattern. Important elements of the pattern design are highlighted. (FIG. 37f) AFM (atomic force microscope) image illustrating barrier height. (FIG. 37g) This graph shows the calculated distance between two crisscrossed DNA molecules suspended 20 nm above a solid surface (represented in red) as determined from a 20-minute simulation with 200-nsec time steps. The position for the lower DNA is shown in blue, and the upper DNA is shown in green. The average positions pi of the DNA molecules relative to the bilayer surface, and dispersion pl l - pi about these positions, are indicated in (FIG. 37a). The average positions of the DNA relative to one another, and the corresponding dispersion SI - S ! about these positions, are shown in (FIG. 37h).
DETAILED DESCRIPTION OF THE INVENTION
[0045] The present invention is based in part on the discovery that single-stranded nucleic acid molecules can be disposed on a substrate and positionally aligned to allow analysis of individual single-stranded nucleic acid molecules. In particular, the methods and compositions described herein include a substrate, coating material, e.g., a lipid bilayer, and single-stranded nucleic acid molecules attached directly to the substrate, attached to the substrate via a linkage, or attached to the lipid layer via a linkage. The single-stranded nucleic acids are capable of interacting with their specific targets while attached to the substrate, and by appropriate labeling of the nucleic acid molecules and the targets, the sites of the interactions between the targets and the nucleic acid molecules may be derived.
Because the single-stranded nucleic acid molecules are positionally defined, the sites of the interactions will define the specificity of each interaction. As a result, a map of the patterns of interactions with single-stranded nucleic acid molecules on the substrate is convertible into information on specific interactions between single-stranded nucleic acid molecules and targets.
Preparation of Substrate
[0046] Essentially, any conceivable substrate may be employed in the compositions and methods described herein. The substrate may be biological, nonbiological, organic, inorganic, or a combination of any of these, existing, e.g., as particles, strands, precipitates, gels, sheets, tubing, spheres, containers, capillaries, pads, slices, films, plates, or slides. The substrate may have any convenient shape, such as, e.g., a disc, square, sphere or circle. The substrate and its surface can form a rigid support on which to carry out the reactions described herein. The substrate can be, e.g., a polymerized Langmuir Blodgett film, functionalized glass, Si, Ge, GaAs, GaP, SiO2, SiN4, modified silicon, or any one of a wide variety of gels or polymers such as (poly)tetrafluoroethylene, (poly)vinylidenedifluoride, polystyrene, polycarbonate, or combinations thereof. Other substrate materials will be readily apparent to those of skill in the art upon review of this disclosure. In some embodiments, the substrate is made of SiO2 and is flat.
[0047] In some embodiments, the substrate is coated with a linker to which the nucleic acid molecules attach. Such linkers can be, e.g., chemical or protein linkers. For example, the substrate can be coated with a protein such as neutravidin or an antibody.
[0048] In some embodiments, the substrate includes a diffusion barrier, e.g., a mechanical, chemical or protein barrier. Diffusion barriers can be prepared by applying barrier materials onto the substrate prior to deposition of the lipid bilayer; the bilayer then forms around the barriers. A mechanical barrier can be, e.g., a scratch or etch on the substrate, which physically prevents lipid diffusion.
[0049] In the case of a chemical barrier, the chemical nature of the barrier, and not its surface topography, is the primary factor in preventing lipid diffusion. Barrier materials can be made that are similar to the thickness of the bilayer itself (e.g., 6-8 nm), or thinner than the bilayer. Protein barriers can be deposited onto substrates, e.g., SiO2 substrates, by a variety of methods. For example, protein barriers can be deposited in well-defined patterns by a process called microcontact printing. Microcontact printing uses a PDMS (poly[dimethylsiloxane]) template as a stamp for generating specific patterns on substrates. PDMS stamps can transfer proteins to a SiO2 substrate in patterns with features as small as 1 μm, and thicknesses on the order of 5-10 nm. The PDMS stamps used for microcontact printing can be made, e.g., by soft-lithography as described previously. Once made, the PDMS can be incubated with a solution of protein, dried, and then placed into contact with the substrate, e.g., SiO2, resulting in transfer of the protein "ink" from the PDMS stamp to the substrate and yielding a pattern defined by the stamp design. For example, protein barriers can be made from fibronectin.
[0050] A layer of a material is then attached to the substrate. In one embodiment, the material is one that renders the substrate inert. For example, the material can be lipids, forming, e.g., a lipid bilayer. In another embodiment, the layer is made of zwitterionic lipids. A lipid bilayer can be deposited onto the substrate by applying liposomes to the substrate. Liposomes can be produced by known methods from, e.g., 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC) or 0.5% biotin-phosphatidylethanolamine (biotin-PE) plus 99.5% DOPC (Avanti Polar Lipids, Alabaster, AL). In some embodiments, the lipid bilayer can include polyethylene glycol (PEG). For example, in embodiments where quantum dots are used to label nucleic acid molecules and/or polypeptides, PEG can be included in the lipid bilayer. PEG can also be included to make the surface of the bilayer inert to reagents added to the array.
Tethering Nucleic Acid Molecules
[0051] As described herein, the nucleic acid molecules can be attached to the substrate, to the lipid bilayer, or to the non-linear, geometric diffusion barrier, to form an array. The nucleic acid molecules can be attached by a linkage either at one end of the nucleic acid molecule or at both ends. For example, when a protein is coated on the substrate prior to the deposition of the lipid bilayer, the nucleic acid molecule can be linked to a cognate protein that binds to the protein coated on the substrate. In one embodiment, the substrate is coated with neutravidin and the nucleic acid molecule linker is biotin. Linkers can be added to the nucleic acid molecules using standard molecular biology techniques known to those of ordinary skill in the art.
[0052] Alternatively, the nucleic acid molecule can be linked to the lipid bilayer. In one embodiment, the lipid bilayer is deposited onto the substrate and a protein, e.g., neutravidin, is linked to the lipid head groups. Biotinylated nucleic acid molecules are then introduced, linking the nucleic acid molecules to the lipid bilayer.
[0053] In other embodiments, the nucleic acid molecules can be linked to the nonlinear, geometric diffusion barriers. In one embodiment, the diffusion barrier is a protein, e.g., biotinylated bovine serum albumin (BSA), deposited on the substrate. Neutravidin is then bound directly to the biotinylated BSA protein barriers, and biotinylated nucleic acid molecules are linked to the biotinylated BSA protein barriers. Other known protein-cognate protein pairs can be used in the methods described herein. For example, antibodies, e.g., anti-digoxigenin antibodies, can be used as protein barriers and the cognate antigen, e.g., digoxigenin, linked to the nucleic acid molecule. In another embodiment, one end of the nucleic acid molecule is attached by a linkage, for example to the substrate or to a non-linear, geometric diffusion barrier. In a further embodiment, both ends of the nucleic acid molecule are attached by linkages, for example, to the substrate, to a non-linear, geometric diffusion barrier, or to a combination of the two surfaces. Double-tethered DNA substrates can be used for visualizing 1D diffusion. For example, DNA molecules can be biotinylated at both ends. While a constant, moderate hydrodynamic flow force is applied, DNA is suspended above an inert lipid bilayer. The only interaction between the DNA and the surface is through the biotinylated ends of the molecule. For example, 80% extension of the DNA molecule corresponds to ~0.5 pN of force (e.g., where the DNA is not distorted).
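By way of non-limiting illustration, the correspondence between 80% extension and a tension of roughly 0.5 pN can be checked with the worm-like chain interpolation formula of Marko and Siggia, F = (kBT/P)·[1/(4(1 - x)^2) - 1/4 + x], where x is the fractional extension. This model is not recited in the present disclosure and is offered only as a plausibility check; the persistence length of 50 nm and thermal energy of 4.11 pN·nm (approximately room temperature) are assumed values typical for double-stranded DNA, and the function name is hypothetical.

```python
def wlc_force_pn(fractional_extension, persistence_nm=50.0, kbt_pn_nm=4.11):
    """Marko-Siggia worm-like chain interpolation: tension (pN) at a given
    fractional extension (end-to-end distance divided by contour length).

    persistence_nm and kbt_pn_nm are assumed typical values for dsDNA at
    room temperature, not parameters taken from the disclosure.
    """
    x = fractional_extension
    if not 0.0 <= x < 1.0:
        raise ValueError("fractional extension must be in [0, 1)")
    return (kbt_pn_nm / persistence_nm) * (1.0 / (4.0 * (1.0 - x) ** 2) - 0.25 + x)

force_at_80_percent = wlc_force_pn(0.8)
print(f"tension at 80% extension: {force_at_80_percent:.2f} pN")
```

Under these assumptions the estimate comes out near 0.56 pN, which is in line with the approximate value stated above.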
[0054] In some embodiments, attaching both ends of a nucleic acid molecule to the barriers, inert substrate, or a combination thereof, can generate a "rack". In some embodiments, the "rack" can be generated by reversibly anchoring the entire contour length of the nucleic acid molecule (e.g., a DNA molecule) to the lipid bilayer of an array described herein by exposing the nucleic acid molecules to an effective calcium concentration. In one embodiment, the calcium concentration is at least about 0.5 mM, at least about 1 mM, at least about 1.5 mM, at least about 2 mM, at least about 2.5 mM, at least about 3 mM, at least about 3.5 mM, at least about 4 mM, at least about 4.5 mM, at least about 5 mM, at least about 5.5 mM, at least about 6 mM, at least about 6.5 mM, at least about 7 mM, at least about 7.5 mM, at least about 8 mM, at least about 8.5 mM, at least about 9 mM, at least about 9.5 mM, at least about 10 mM, or at least about 10.5 mM.
Generation of Single-Stranded Nucleic Acid Molecules
[0055] As described herein, single-stranded nucleic acid can be generated by in vitro rolling circle replication. In one embodiment, the DNA polymerase φ29 can be used to generate single-stranded nucleic acids by rolling circle replication using a circular single-stranded nucleic acid template, for example, but not limited to, the M13mp18 template. In some embodiments, rolling circle replication can occur on the solid support.
Labeling Nucleic Acid Molecules and Polypeptides
[0056] In another embodiment, the attached nucleic acid molecules and/or the interacting nucleic acid molecules or polypeptides are visualized by detecting one or more labels attached to the nucleic acid molecules or polypeptides. The labels may be incorporated by any of a number of means well known to those of skill in the art. The nucleic acid molecules on the array can be coupled to a nonspecific label, e.g., a dye, e.g., a fluorescent dye, e.g., YOYO1 (Molecular Probes, Eugene, OR), TOTO1, TO-PRO, acridine orange, DAPI and ethidium bromide, that labels the entire length of the nucleic acid molecule. The nucleic acid molecules can also be labeled with Quantum dots, as described herein.
[0057] In another embodiment, the nucleic acid molecules, e.g., the nucleic acid molecules on the array or target nucleic acid molecules, can be coupled to a label at defined locations using known methods. The label can be incorporated during an amplification step in the preparation of the sample nucleic acids. For example, polymerase chain reaction (PCR) with labeled primers or labeled nucleotides will provide a labeled amplification product. The nucleic acid molecule is amplified in the presence of labeled deoxynucleotide triphosphates (dNTPs).
[0058] Alternatively, a label may be added directly to the nucleic acid molecule or to an amplification product after an amplification is completed. Means of attaching labels to nucleic acids include, for example, nick translation or end-labeling (e.g. with a labeled RNA) by kinasing of the nucleic acid and subsequent attachment (ligation) of a nucleic acid linker joining the sample nucleic acid to a label (e.g., a fluorophore).
[0059] Detectable labels suitable for use in the methods and compositions described herein include any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means. Useful labels include biotin for staining with labeled streptavidin conjugate, magnetic beads (e.g., Dynabeads™), fluorescent dyes (e.g., fluorescein, Texas red, rhodamine, green fluorescent protein, and the like, see, e.g., Molecular Probes, Eugene, OR), radiolabels (e.g., 3H, 125I, 35S, 14C, or 32P), enzymes (e.g., horseradish peroxidase, alkaline phosphatase and others commonly used in an ELISA), and colorimetric labels such as colloidal gold (e.g., gold particles in the 40-80 nm diameter size range scatter green light with high efficiency) or colored glass or plastic (e.g., polystyrene, polypropylene, latex, etc.) beads. Patents teaching the use of such labels include U.S. Pat. Nos. 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and 4,366,241.
[0060] In some embodiments, fluorescent labels are used. The nucleic acid molecules can all be labeled with a single label, e.g., a single fluorescent label. Alternatively, different nucleic acid molecules have different labels. For example, one nucleic acid molecule can have a green fluorescent label and a second nucleic acid molecule can have a red fluorescent label.
[0061] Suitable chromogens which can be employed include those molecules and compounds that absorb light in a distinctive range of wavelengths so that a color can be observed or, alternatively, which emit light when irradiated with radiation of a particular wavelength or wavelength range, e.g., fluorescers.
[0062] A wide variety of suitable dyes are available, being primarily chosen to provide an intense color with minimal absorption by their surroundings. Illustrative dye types include quinoline dyes, triarylmethane dyes, acridine dyes, alizarine dyes, phthaleins, insect dyes, azo dyes, anthraquinoid dyes, cyanine dyes, phenazathionium dyes, and phenazoxonium dyes.
[0063] A wide variety of fluorescers can be employed either alone or, alternatively, in conjunction with quencher molecules. Fluorescers of interest fall into a variety of categories having certain primary functionalities. These primary functionalities include 1- and 2-aminonaphthalene, p,p'-diaminostilbenes, pyrenes, quaternary phenanthridine salts, 9-aminoacridines, p,p'-diaminobenzophenone imines, anthracenes, oxacarbocyanine, merocyanine, 3-aminoequilenin, perylene, bisbenzoxazole, bis-p-oxazolyl benzene, 1,2-benzophenazin, retinol, bis-3-aminopyridinium salts, hellebrigenin, tetracycline, sterophenol, benzimidazolylphenylamine, 2-oxo-3-chromen, indole, xanthen, 7-hydroxycoumarin, phenoxazine, salicylate, strophanthidin, porphyrins, triarylmethanes and flavin. Individual fluorescent compounds that have functionalities for linking or that can be modified to incorporate such functionalities include, e.g., dansyl chloride; fluoresceins such as 3,6-dihydroxy-9-phenylxanthhydrol; rhodamine isothiocyanate; N-phenyl 1-amino-8-sulfonatonaphthalene; N-phenyl 2-amino-6-sulfonatonaphthalene; 4-acetamido-4-isothiocyanato-stilbene-2,2'-disulfonic acid; pyrene-3-sulfonic acid; 2-toluidinonaphthalene-6-sulfonate; N-phenyl, N-methyl 2-aminonaphthalene-6-sulfonate; ethidium bromide; stebrine; auromine-0,2-(9'-anthroyl)palmitate; dansyl phosphatidylethanolamine; N,N'-dioctadecyl oxacarbocyanine; N,N'-dihexyl oxacarbocyanine; merocyanine, 4(3'pyrenyl)butyrate; d-3-aminodesoxy-equilenin; 12-(9'anthroyl)stearate; 2-methylanthracene; 9-vinylanthracene; 2,2'(vinylene-p-phenylene)bisbenzoxazole; p-bis[2-(4-methyl-5-phenyl-oxazolyl)]benzene; 6-dimethylamino-1,2-benzophenazin; retinol; bis(3'-aminopyridinium) 1,10-decandiyl diiodide; sulfonaphthylhydrazone of hellibrienin; chlorotetracycline; N-(7-dimethylamino-4-methyl-2-oxo-3-chromenyl)maleimide; N-[p-(2-benzimidazolyl)-phenyl]maleimide; N-(4-fluoranthyl)maleimide; bis(homovanillic acid); resazarin; 4-chloro-7-nitro-2,1,3-benzooxadiazole; merocyanine 540; resorufin; rose bengal; and 2,4-diphenyl-3(2H)-furanone.
[0064] The label may be a "direct label", i.e., a detectable label that is directly attached to or incorporated into the nucleic acid molecule. Alternatively, the label may be an "indirect label", i.e., a label joined to the nucleic acid molecule after attachment to the substrate. The indirect label can be attached to a binding moiety that has been attached to the nucleic acid molecule prior to attachment to the substrate. For a detailed review of methods of labeling nucleic acids and detecting labeled hybridized nucleic acids, see Laboratory Techniques in Biochemistry and Molecular Biology, Vol. 24: Hybridization With Nucleic Acid Probes, P. Tijssen, ed. Elsevier, N.Y., (1993).
[0065] Polypeptides can be visualized by coupling them to, e.g., fluorescent labels described herein, using known methods. Alternatively, other labels, such as Quantum dots (Invitrogen) can be used, as described herein.
Detecting Nucleic Acid Molecules and Polypeptides
[0066] As discussed above, the use of a fluorescent label is an embodiment of the invention. Standard procedures are used to determine the positions of the nucleic acid molecules and/or a target, e.g., a second nucleic acid molecule or a polypeptide. For example, the position of a nucleic acid molecule on an array described herein can be detected by the signal emitted by the label. In other examples, when a nucleic acid molecule on the array and a target nucleic acid molecule or polypeptide are labeled, the locations of both the nucleic acid molecules on the array and the target will exhibit significant signal. In addition to using a label, other methods may be used to scan the matrix to determine where an interaction, e.g., between a nucleic acid molecule on an array described herein and a target, takes place. The spectrum of interactions can, of course, be determined in a temporal manner by repeated scans of interactions that occur at each of a multiplicity of conditions. However, instead of testing each individual interaction separately, a multiplicity of interactions can be simultaneously determined on an array, e.g., an array described herein.
[0067] In certain embodiments, the array is excited with a light source at the excitation wavelength of the particular fluorescent label and the resulting fluorescence at the emission wavelength is detected. In certain embodiments, the excitation light source is a laser appropriate for the excitation of the fluorescent label.
[0068] Detection of the fluorescence signal can utilize a microscope, e.g., a fluorescent microscope. The microscope may be equipped with a phototransducer (e.g., a photomultiplier, a solid state array, or a CCD camera) attached to an automated data acquisition system to automatically record the fluorescence signal produced by the nucleic acid molecules and/or targets on the array. Such automated systems are known in the art. Use of laser illumination in conjunction with automated confocal microscopy for signal detection permits detection at a resolution of better than about 100 μm, better than about 50 μm, and better than about 25 μm.
[0069] The detection method can also incorporate some signal processing to determine whether the signal at a particular position on the array is a true positive or may be a spurious signal. For example, a signal from a region that has actual positive signal may tend to spread over and provide a positive signal in an adjacent region that actually should not have one. This may occur, e.g., where the scanning system is not properly discriminating with sufficiently high resolution in its pixel density to separate the two regions. Thus, the signal over the spatial region may be evaluated pixel by pixel to determine the locations and the actual extent of positive signal. A true positive signal should, in theory, show a uniform signal at each pixel location. Thus, processing by plotting number of pixels with actual signal intensity should have a clearly uniform signal intensity. Regions where the signal intensities show a fairly wide dispersion may be particularly suspect, and the scanning system may be programmed to more carefully scan those positions.
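The pixel-by-pixel evaluation described above can be illustrated with a minimal sketch. The dispersion measure (coefficient of variation), the threshold value, the function names, and the pixel intensities below are all assumptions chosen for illustration; the text does not specify a particular statistic or cutoff.

```python
from statistics import mean, pstdev


def intensity_dispersion(pixels):
    """Coefficient of variation (standard deviation / mean) of pixel intensities."""
    average = mean(pixels)
    if average == 0:
        return 0.0
    return pstdev(pixels) / average


def flag_suspect_regions(regions, max_cv=0.2):
    """Return names of regions whose intensities are too widely dispersed.

    max_cv is a hypothetical threshold; a real system would calibrate it.
    """
    return [name for name, pixels in regions.items()
            if intensity_dispersion(pixels) > max_cv]


# Hypothetical intensities: spot_a is nearly uniform, spot_b is widely dispersed.
regions = {
    "spot_a": [100, 102, 98, 101],
    "spot_b": [100, 20, 60, 5],
}
suspect = flag_suspect_regions(regions)
print(suspect)
```

Regions returned by such a check would be candidates for the more careful rescanning mentioned above.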
Total Internal Reflection Fluorescence Microscopy
[0070] Total internal reflection fluorescence microscopy (TIRFM) is used to detect the nucleic acid molecules and polypeptides described herein. For TIRFM, a laser beam is directed through a microscope slide and reflected off the interface between the slide and a buffer containing the fluorescent sample. If the angle of incidence is greater than the critical angle [θc = sin⁻¹(n₂/n₁); where n₁ and n₂ are the refractive indexes of the slide and aqueous samples, respectively], then all of the incident light is reflected away from the interface. However, an illuminated area is present on the sample side of the slide. This is called the evanescent wave, and its intensity decays exponentially away from the surface. For most applications the evanescent wave penetrates approximately 100 nm into the aqueous medium. This geometry reduces the background signal by several orders of magnitude compared to conventional fluorescence microscopy and readily allows the detection of single fluorescent molecules, because contaminants and bulk molecules in solution are not illuminated and do not contribute to the detected signal. By using total internal reflection fluorescence microscopy to visualize the arrays described herein, it is possible to simultaneously monitor hundreds of aligned DNA molecules within a single field-of-view.
[0071] The methods described herein use microfluidic flowcells composed of substrates that are rendered inert by deposition of a lipid bilayer as described herein. By applying a hydrodynamic force to the arrays described herein, the attached nucleic acid molecules are aligned in a desired orientation that is optimal for detection by, e.g., TIRFM.
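The critical angle and the depth of the evanescent wave can be estimated numerically. The sketch below assumes refractive indexes of 1.46 for a fused silica slide and 1.33 for the aqueous sample, a 488 nm excitation wavelength, and a 68° angle of incidence; these values and the function names are illustrative assumptions, and the penetration depth uses the standard expression for the intensity decay length of an evanescent field.

```python
import math


def critical_angle_deg(n_slide, n_sample):
    """Critical angle (degrees) from theta_c = asin(n_sample / n_slide)."""
    if n_sample >= n_slide:
        raise ValueError("total internal reflection requires n_slide > n_sample")
    return math.degrees(math.asin(n_sample / n_slide))


def penetration_depth_nm(wavelength_nm, incidence_deg, n_slide, n_sample):
    """Intensity decay length d = lambda / (4*pi*sqrt(n1^2 sin^2(theta) - n2^2))."""
    s = n_slide * math.sin(math.radians(incidence_deg))
    term = s * s - n_sample ** 2
    if term <= 0:
        raise ValueError("angle of incidence must exceed the critical angle")
    return wavelength_nm / (4.0 * math.pi * math.sqrt(term))


if __name__ == "__main__":
    # Assumed values: fused silica (1.46), water (1.33), 488 nm laser, 68 degrees.
    print(f"critical angle: {critical_angle_deg(1.46, 1.33):.1f} degrees")
    print(f"penetration depth: {penetration_depth_nm(488, 68, 1.46, 1.33):.0f} nm")
```

With these assumed inputs the decay length is on the order of 100 to 200 nm, consistent with the approximate penetration stated above; the exact value depends strongly on how far the incidence angle lies beyond the critical angle.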
[0072] A microfluidic flowcell can be used in the methods described herein. Generally, a substrate described herein is overlaid with a coverslip, e.g., a glass coverslip, to form a sample chamber, and the substrate contains an inlet port and an outlet port, through which a hydrodynamic force is applied. The hydrodynamic force can be mediated by, e.g., a buffer solution that flows over the lipid bilayer described herein. An exemplary microfluidic flowcell can be constructed from 76.2 x 25.4 x 1 mm (L x W x H) fused silica slides (ESCO Products, Oak Ridge, NJ). Inlet and outlet holes can be drilled through the slides using, e.g., a diamond-coated bit (1.4 mm O.D.; Eurotool, Grandview, MO). A sample chamber can be prepared from a borosilicate glass coverslip (Fisher Scientific, USA) and, e.g., double-sided tape (~25 μm thick, 3M, USA) or a polyethylene gasket. Inlet and outlet ports can be attached using preformed adhesive rings (Upchurch Scientific, Oak Harbor, WA), and cured at 120°C under vacuum for 2 hours. The dimensions of the exemplary sample chamber are 3.5 x 0.45 x 0.0025 cm (L x W x H). The total volume of the exemplary flowcell is ~4 μl. A syringe pump (Kd Scientific, Holliston, MA) is used to control buffer delivery to the sample chamber. This exemplary apparatus is not meant to be limiting, and one of skill in the art would appreciate modifications that could be made.
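The stated flowcell volume follows directly from the chamber dimensions, as the short check below shows. The function name is hypothetical, and the calculation assumes a simple rectangular chamber.

```python
def chamber_volume_ul(length_cm, width_cm, height_cm):
    """Volume of a rectangular chamber in microliters (1 cm^3 = 1000 microliters)."""
    return length_cm * width_cm * height_cm * 1000.0


# Dimensions of the exemplary sample chamber: 3.5 x 0.45 x 0.0025 cm (L x W x H).
volume = chamber_volume_ul(3.5, 0.45, 0.0025)
print(f"{volume:.2f} microliters")
```

The result, just under 4 μl, agrees with the approximate total volume given for the exemplary flowcell.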
[0073] An exemplary total internal reflection fluorescence microscope is a modified Nikon TE2000U inverted microscope. A 488 nm laser (Coherent Inc., Santa Clara, CA) and a 532 nm laser (CrystaLaser, Reno, NV) were focused through a pinhole (10 μm) using an achromatic objective lens (25x; Melles Griot, Marlow Heights, MD), then collimated with another achromatic lens (f = 200 mm). The beam was directed to a focusing lens (f = 500 mm) and passed through a custom-made fused silica prism (J.R. Cumberland, Inc.) placed on top of the flowcell. Fluorescence images were collected through an objective lens (100x Plan Apo, NA 1.4, Nikon), passed through a notch filter (Semrock, Rochester, NY), and captured with a back-thinned EMCCD (Cascade 512B, Photometrics, Tucson, AZ). Image acquisition and data analysis were performed with Metamorph software (Universal Imaging Corp., Downingtown, PA). All DNA length measurements were performed by calculating the difference in y-coordinates from the beginning to the end of the fluorescent molecules. Diffusion estimates for the lipid-tethered DNA substrates were performed by manually tracking the tethered ends of four different molecules, and diffusion coefficients were calculated using: D = MSD/4t; where MSD (the mean square displacement) is the square of the average step size measured over time interval t (0.124 sec).
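The diffusion estimate D = MSD/4t can be sketched as follows. This is an illustration under stated assumptions, not the analysis actually performed: the track coordinates are invented, the function name is hypothetical, the MSD is computed in the conventional way as the mean of the squared frame-to-frame displacements, and the time interval is taken to be 0.124 s.

```python
def diffusion_coefficient_2d(track, dt):
    """Estimate D = MSD / (4 * dt) from consecutive (x, y) positions.

    track: list of (x, y) positions in micrometers, one per frame.
    dt: time between frames in seconds.
    Returns D in square micrometers per second.
    """
    if len(track) < 2:
        raise ValueError("at least two positions are required")
    squared_steps = [
        (x1 - x0) ** 2 + (y1 - y0) ** 2
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]
    msd = sum(squared_steps) / len(squared_steps)
    return msd / (4.0 * dt)


# Hypothetical manually tracked positions of one tethered DNA end (micrometers).
track = [(0.0, 0.0), (0.3, 0.4), (0.3, 0.9), (0.7, 1.2)]
d_estimate = diffusion_coefficient_2d(track, dt=0.124)
print(f"D = {d_estimate:.3f} um^2/s")
```

The factor of 4 reflects diffusion in two dimensions, which is appropriate for a DNA end anchored to a lipid that moves within the plane of the bilayer.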
Methods for Visualizing Nucleic Acid Molecules and Polypeptides
[0074] The arrays described herein can be used to detect individual nucleic acid molecules, e.g., nucleic acid molecules coupled to a label. For example, an array can be constructed as part of a microfluidic flowcell described herein. The nucleic acid molecules, e.g., labeled nucleic acid molecules, can be attached to a substrate, to a lipid bilayer, or to a diffusion barrier, as described herein. Upon the application of hydrodynamic force, e.g., introduction of a buffer as described herein, the nucleic acid molecules are aligned in the direction of the hydrodynamic force, with the nonattached ends of the nucleic acid molecules extending in the direction of the flow of the hydrodynamic force. Individual nucleic acid molecules on the array can be visualized before and/or after the application of the hydrodynamic force using, e.g., TIRFM as described herein.
[0075] In some embodiments, the interactions of nucleic acid molecules on the arrays with target polypeptides are determined. The nucleic acid molecules can be visualized before and/or after the application of a hydrodynamic force, as described herein. To visualize the interactions with target polypeptides, the polypeptides can be coupled to a label and introduced into the array, e.g., a microfluidic cell including the array, as a component of the buffer that mediates the hydrodynamic force. Individual nucleic acid molecules and individual target polypeptides can be visualized, e.g., by TIRFM as described herein, and interactions can be determined by colocalization of the signals from the nucleic acid molecules and the polypeptides. Such interactions can be further analyzed by collecting signals over a period of time. Such methods can be used to visualize, e.g., the movement of polypeptides along the length of individual nucleic acid molecules, as described herein.
Methods for High-throughput Screening of Compounds
[0076] The methods and compositions described herein can be used to screen for compounds, e.g., drug compounds, that affect, e.g., disrupt, the interactions between nucleic acid molecules and polypeptides. For example, an array can be constructed as part of a microfluidic flowcell described herein. The nucleic acid molecules, e.g., labeled nucleic acid molecules, can be attached to a substrate, to a lipid bilayer, or to a diffusion barrier, as described herein. To visualize the interactions with target polypeptides, the polypeptides can be coupled to a label and introduced into the array, e.g., a microfluidic cell including the array, as a component of the buffer that mediates the hydrodynamic force. In some embodiments, the polypeptides are known to interact with the nucleic acid molecules, and the interactions are visualized as described herein. For example, the polypeptides can be proteins involved in DNA replication, recombination and/or repair. Candidate compounds can then be added to the array, e.g., as a component of the buffer that mediates the hydrodynamic force, and the effect of the compound on the interactions between individual nucleic acid molecules and the polypeptides can be visualized. Compounds that disrupt the interactions can be visually identified. Such methods can be automated.
[0077] For example, the methods described herein can be used to screen for therapeutic compounds to treat cancer, e.g., cancer of the breast, prostate, lung, bronchus, colon, rectum, urinary bladder, kidney, pancreas, oral cavity, pharynx, ovary, skin, thyroid, stomach, brain, esophagus, liver, cervix, larynx, soft tissue, testis, small intestine, anus, anal canal, anorectum, vulva, gallbladder, bones, joints, hypopharynx, eye, nose, nasal cavity, ureter, gastrointestinal tract; non-Hodgkin lymphoma, Multiple Myeloma, Acute Myeloid Leukemia, Chronic Lymphocytic Leukemia, Hodgkin Lymphoma, Chronic Myeloid Leukemia and Acute Lymphocytic Leukemia.
Methods for High-throughput Sequencing of Nucleic Acid Molecules
[0078] The methods and compositions described herein can be used to sequence nucleic acid molecules. The arrays described herein can be constructed with identical nucleic acid molecules, e.g., single stranded DNA molecules, or with different nucleic acid molecules, e.g., single stranded DNA molecules. Before attaching the DNA molecules to the substrate, an oligonucleotide primer is annealed to the DNA molecules. Polymerase is then added along with the fluorescent dNTP mix. Such methods are known in the art. Fluorescent nucleotide analogs that do not terminate extension of the DNA strand are used. The DNA molecules are then attached to the substrate and the array is visualized as described herein. The color of the nucleotide incorporated into the growing chain reveals the sequence of the DNA molecules. If all of the DNA molecules within the array are identical, then the incorporation of the first nucleotide during polymerization will yield a fluorescent line extending horizontally across the array. Subsequent nucleotide addition will also yield horizontal lines and the color of each line will correspond to the DNA sequence. When sequencing different DNA molecules, the differences in DNA sequences are revealed as the incorporation of different fluorescent nucleotides across the array, rather than the lines of identical color seen when sequencing identical DNA molecules. In some embodiments, these methods are automated.
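The color-based readout described above can be illustrated with a minimal sketch. The color-to-base assignments, the function names, and the example observations are hypothetical; the text does not specify which fluorophore corresponds to which nucleotide.

```python
# Hypothetical mapping of detected emission color to the incorporated base.
COLOR_TO_BASE = {"green": "A", "red": "C", "blue": "G", "yellow": "T"}


def read_sequence(colors):
    """Translate the ordered colors seen along one DNA molecule into bases.

    The result is the sequence of the newly synthesized strand.
    """
    return "".join(COLOR_TO_BASE[color] for color in colors)


def is_uniform_line(colors_across_array):
    """True if every molecule shows the same color at a given position,

    as expected when all molecules on the array are identical.
    """
    return len(set(colors_across_array)) == 1


# One molecule, read position by position (hypothetical observation).
print(read_sequence(["green", "red", "red", "yellow"]))

# One position, read across four molecules on the array (hypothetical).
print(is_uniform_line(["green", "green", "green", "green"]))
print(is_uniform_line(["green", "red", "green", "blue"]))
```

A uniform line corresponds to the horizontal line of a single color expected for identical molecules, whereas a mixed line indicates that the molecules on the array differ at that position.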
EXAMPLES
EXAMPLE 1 - The Promoter-Search Mechanism Of Escherichia coli RNA Polymerase Is Dominated By Three-Dimensional Diffusion
[0079] Gene expression, DNA replication, and genome maintenance are all initiated by proteins that must recognize specific targets from among a vast excess of nonspecific DNA.1 For example, to initiate transcription, E. coli RNA polymerase must locate promoter sequences, which comprise <2% of the bacterial genome. This search problem remains one of the least understood aspects of gene expression, largely due to the transient nature of search intermediates. Here we visualize RNAP in real time as it searches for promoters, and we develop a theoretical framework for analyzing target searches at the submicroscopic scale based upon single-molecule target association rates. Contrary to long-held assumptions, we demonstrate that the promoter search is dominated by three-dimensional diffusion at both the microscopic and submicroscopic scales in vitro, which has direct implications for understanding how promoters are located within physiological settings.
[0080] Transcription is the key step of gene expression and regulation in which the information encoded in genomic DNA is transcribed into RNA.4-6 A complex network of regulatory features allows precise control over the expression of any given gene. This regulation is achieved through the interplay of promoter DNA sequences that dictate the sites of transcript initiation, along with the effects of a multitude of transcription factors and other regulatory elements that can influence the efficiency of transcript initiation, elongation, and/or termination.'*"0 At the heart of this regulatory network is RNA polymerase (RNAP): the protein machinery directly responsible for RNA synthesis.4" '
[0081] Escherichia coli has ~3,000 promoters, each containing a core sequence ~35 base pairs in length with hexameric consensus sites at the -35 (TTGACA) and -10 (TATAAT) regions.4-9 Prior to synthesizing a transcript, RNAP must find appropriate promoter sequences. Like all DNA-binding proteins, RNAP is expected to employ some form of diffusion to locate its targets (Figure 6).1 There are four potential diffusion-based mechanisms that might contribute to the promoter search: (i) one-dimensional (1D) "hopping", where the protein moves along the same molecule of DNA via a correlated series of submicroscopic dissociation and rebinding events before re-equilibrating back into free solution; (ii) 1D-sliding, where the protein executes a random walk along the DNA without dissociation; (iii) intersegmental transfer, where the protein moves from one site to another via a looped intermediate; and (iv) three-dimensional diffusion (or "jumping"), where the protein starts out fully equilibrated with free solution (i.e., it has no memory of whether it has previously visited a DNA site) and then finds its targets through direct 3D-collisions from solution (Figure 6).1 These mechanisms are not mutually exclusive, and different combinations can in principle contribute to site-specific targeting for a given DNA-binding protein. Search mechanisms that employ 1D-hopping, sliding, or intersegmental transfer are collectively referred to as facilitated diffusion,10-14 because the reduction in dimensionality brought about through use of these mechanisms presents the potential for target site association rates that exceed the limits imposed by pure 3D diffusion. 1,1 >l5
[0082] The seminal work of Riggs et al. established that, under certain conditions, lac repressor binds its target faster than the three-dimensional (3D) diffusion limit.16 Subsequent theoretical and experimental work verified that target association rates can be accelerated through facilitated diffusion, and these results are also often used to argue that facilitated diffusion therefore must contribute to target searches.10-14 However, as noted in the literature,13,17 there is little evidence to support this generalization based on the findings with lac repressor, and lac repressor itself may be atypical in terms of its DNA-binding and target search properties. In addition, prior theoretical models of target searches demonstrating that the fastest possible searches occur through a combination of 3D diffusion and 1D sliding over short distances consider that a single protein is conducting the search.13,15,18-21 This assumption is reasonable for low-abundance proteins, such as lac repressor (<10 molecules cell⁻¹), but is less appropriate when considering proteins present at higher concentrations. Indeed, it has more recently been recognized that facilitated diffusion can in fact slow down target searches by causing proteins to waste too much time surveying nonspecific DNA,18,19,21-24 leading to the suggestion that this outcome might be avoided in the case of some proteins through a combination of low affinity for nonspecific DNA and increased protein copy number.18 Nevertheless, based on the work with lac repressor, facilitated diffusion is still commonly assumed to play a role in the majority of cellular target search processes. Accordingly, a number of studies have reported that RNAP can move long distances along DNA by 1D-sliding,¾"29 and as a consequence it is also now widely assumed that RNAP locates promoters through facilitated diffusion involving a 1D search.31' Despite this, no promoter association rate exceeding the 3D-diffusion limit (~10⁸-10⁹ M⁻¹ s⁻¹) has ever been reported,31'"'2 and the potential contribution of facilitated diffusion to the promoter search process has been challenged in the literature.33
[0083] In an effort to help resolve the mechanism of the promoter search, here we used single molecule optical imaging of nanofabricated DNA curtains to visualize molecules of E. coli RNAP as they searched for the native promoters within the phage λ genome. Using this approach we could identify intermediates consistent with nonspecifically bound proteins, promoter-associated closed complexes, open complexes, and actively transcribing RNAP. We also present a theoretical framework for analyzing search mechanisms in the submicroscopic regime using experimentally measured kinetic parameters obtained from single molecule observations. Our experimental results and theoretical calculations argue that facilitated diffusion does not contribute to promoter targeting by E. coli RNAP at physiologically relevant protein concentrations. We also show that protein concentration has a dominating effect in dictating how proteins find specific target sites, and the potential rate-accelerating benefits of facilitated diffusion can be overcome through increased protein abundance. The concepts derived from our theoretical treatment of the target search problem are entirely general, and can in principle be applied to any site- or structure-specific nucleic acid-binding protein.
Results
[0084] Visualizing the promoter search by E. coli RNAP. E. coli RNAP is among the best-characterized enzymes at the single molecule level, yet no study has conclusively established how RNAP locates promoters/* To distinguish among potential search mechanisms (Figure 6 & Table 1)1"''1 '1 '14, we used double-tethered DNA curtains to visualize quantum dot-tagged RNAP (QD-RNAP) bound to native promoters within λ-DNA (Figure 1 & Table 2). For these assays we functionalized the λ-DNA (48,502 base pairs) at one end with biotin, and at the other end with digoxigenin (DIG). We anchored the biotinylated DNA end to a lipid bilayer through a biotin-streptavidin linkage, and the molecules were then aligned along the leading edges of nanofabricated barriers by application of a hydrodynamic force. The DIG-tagged DNA ends were then anchored to anti-DIG antibody-coated pentagons positioned downstream from the linear barriers. This strategy yielded DNA molecules that are all anchored in the same orientation and can be viewed by total internal reflection fluorescence microscopy (TIRFM) in the absence of hydrodynamic flow. '5"3 '' Using similar assays, we have previously demonstrated that promoter binding by E. coli RNAP is σ⁷⁰-dependent, occurs preferentially at physiological ionic strength, and that QD-RNAP is active for transcription.3 " We conclude that QD-RNAP faithfully located promoters within the context of our experimental platform.
[0085] Promoter association assays reveal known intermediates. We next visualized single molecules of RNAP in real time as they searched for native promoters within the λ-DNA. To observe the promoter search in real time, QD-RNAP was injected into the flowcell (±rNTPs), flow was terminated, and data were collected at 5, 10, or 100 frames per second (Hz; Figure 2a-b). These experiments revealed four potential intermediates, for brevity referred to as τ0, τ1, τ2, and τ3 events (Figure 2a). τ0 events were highly transient [τ0 = 5.58 (5.46-5.74) milliseconds (ms; range indicates 95% confidence interval); R² = 0.99], and were also observed with either QDs alone or in the absence of DNA (Figure 7); therefore we ascribed these events to random diffusion through the detection volume in the absence of any interaction with the DNA and they were not considered further. τ1 events were RNAP-dependent, displayed short lifetimes [τ1 = 29.23 (24.53-36.18) ms], occurred randomly along the DNA (within our spatial resolution limits of ±39 nm), were not observed in the absence of DNA, and were uncorrelated with promoter-bound open complexes (r=0.25, P=0A0, Pearson correlation analysis; Figure 2c, Figure 7). RNAP corresponding to τ2 dissociated more slowly from the DNA [τ2 = 3.53 (2.77-4.8) seconds (s); R² = 0.93], and were strongly correlated with distributions of promoter-bound open complexes (r=0.87, P<7×10⁻¹⁴; Figure 2c).Jd Molecules classified as τ3 exhibited even slower dissociation [τ3 = 5,736 (5,150-6,467) s; R² = 0.99], coincided with known promoters (Figure 2c, Figure 8), were resistant to challenge with heparin (a hallmark of open complex formation), and could initiate transcription (Figure 2b, Figure 9), as previously described.6 These results are consistent with a reaction scheme where τ1 corresponds to nonspecifically bound RNAP, τ2 to closed complexes, and τ3 to open complexes (Figure 2d).
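The lifetimes reported above can be converted into apparent dissociation rates for comparison across intermediates. The sketch below assumes simple single-exponential decay, so that the rate is the reciprocal of the lifetime; the function names are hypothetical, and the three lifetimes are the values quoted above for the nonspecific, closed-complex, and open-complex intermediates (29.23 ms, 3.53 s, and 5,736 s).

```python
import math


def dissociation_rate_per_s(lifetime_s):
    """Apparent first-order rate (1/s), assuming k = 1 / lifetime."""
    if lifetime_s <= 0:
        raise ValueError("lifetime must be positive")
    return 1.0 / lifetime_s


def fraction_still_bound(time_s, lifetime_s):
    """Surviving fraction exp(-t / lifetime) under single-exponential decay."""
    return math.exp(-time_s / lifetime_s)


# Lifetimes quoted in the text, converted to seconds.
lifetimes_s = {
    "nonspecific": 29.23e-3,
    "closed complex": 3.53,
    "open complex": 5736.0,
}

rates = {name: dissociation_rate_per_s(tau) for name, tau in lifetimes_s.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.3g} per second")
```

Under this assumption the three intermediates span roughly five orders of magnitude in dissociation rate, which is what allows them to be distinguished by lifetime alone.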
[0086] While values for τ i (i.e.1/k ' - -l ) and ¾ {ie. 1/k -i) have not been previously reported in the literature, our value for ¾ was consistent with previously reported equilibrium constants measured in bulk for closed complex formation {k\!k-\) for several different promoters, assuming a diffusion limited association rate.39"*" The value we obtained for τ¾ {ie. Vkoff) was consistent with bulk biochemical data for the lifetime of the open complex,'*1" 3 providing additional support for our assignment of these events within the reaction scheme. Moreover, within the reaction scheme defined in Figure 2d, the ratio of tyvi events is equal to k-i/k , yielding a value for ½ of 0.056 s ', which is also in good agreement with literature values.41'42 We concluded that the promoter-bound intermediates observed in our assay- reflected properties consistent with the literature, and that the DNA curtain assay could be used to probe the early stages of RNAP association with promoter DN A that precede the initiation of transcription. [0087] No microscopically detectable ID-diffusion prior to promoter binding. Our results demonstrated that QD-tagged RNAP was correctly targeted to promoters in the DNA curtain assay, and that the experimental observables obtained from these assays recapitulated known reaction schemes and kinetic parameters for promoter association and dissociation. Surprisingly, real-time observations revealed no evidence for microscopic ID-diffusion by RNAP (« >6,000; Figure. 2a, Table 2, see below); only <0.5% of proteins exhibited ID- motion, and these rare events did not precede promoter engagement. In addition, the same experiments were conducted ov er a range of ionic strengths (e.g. 0-200 mM KCL 0-10 mM MgCl2), including all buffer conditions under which R P sliding has been previously reported (Table 3). Under no conditions did we find evidence of microscopically detectable I D diffusion of RNAP along the λ-DNA prior to engaging the promoters. 
Control experiments confirmed that T7 RNAP and lac repressor were capable of extensive 1D diffusion in our assays (Figure 10),46 and we have previously shown that the DNA repair proteins Msh2-Msh6 and Mlh1-Pms1 exhibit 1D diffusion, indicating that neither the DNA curtains nor the QD tags prevent protein diffusion along DNA. Finally, we readily observed 1D movement for 1.0-μm beads coated with RNAP, suggesting that prior reports of extensive 1D diffusion by E. coli RNAP may have been confounded by multivalent aggregates (Figure 11). In summary, we found no direct experimental evidence supporting an extensive contribution of 1D diffusion during the promoter search, suggesting that the promoter search by QD-tagged RNAP within the context of our DNA curtain assays was dominated by 3D diffusion.
[0088] As a further test of the hypothesis that RNAP did not undergo extensive 1D diffusion during the promoter search, we next sought to determine the upper bounds for the observed 1D diffusion coefficients (D1,obs) for QD-RNAP at defined points along the reaction trajectory (Figure 2e & Figure 12). Given their transient nature (τ1 < 30 ms), we could not determine D1,obs values for the nonspecifically bound RNAP (see below); however, we did calculate D1,obs for intermediates categorized as either closed complexes (τ2 events) or open complexes (τ3 events), as well as for the first 3-9 seconds after initial DNA binding for molecules of RNAP that subsequently initiated transcription in the presence of rNTPs (Figure 2b). We then compared the resulting D1,obs values to published values for several well-characterized proteins known to undergo 1D diffusion, including lac repressor, p53, and Mlh1-Pms1. We also compared the data to immobile QDs coupled to the DNA through a covalent digoxigenin tag (dig-QD; Figure 2e, Figure 13 & Table 4). The dig-QD measurements provided an indication of the extent to which DNA fluctuations contribute to the diffusion coefficient measurements (Figure 13). The D1,obs values for RNAP were all several orders of magnitude lower than values reported for lac repressor, p53, and Mlh1-Pms1 (Figure 2e, Figure 12 & Table 4), further arguing against extensive 1D diffusion contributing to the promoter search. The D1,obs values for RNAP (~15-100 nm² s⁻¹; Table 4) were indistinguishable from values obtained for stationary dig-QDs. It is important to recognize that the small D1,obs values obtained for RNAP cannot be interpreted as protein movement along the DNA, but rather arise from the underlying diffusive fluctuations of the DNA itself (Figure 13). We concluded that promoter binding by E. coli RNAP is not preceded by microscopically detectable 1D diffusion.
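An observed 1D diffusion coefficient of the kind discussed above is commonly estimated from the mean squared displacement (MSD) of a particle-tracking trajectory, using MSD = 2·D·t for one-dimensional diffusion. The sketch below illustrates that general approach under stated assumptions; it is not the authors' analysis code, and the function names, the number of lags, and the synthetic trajectory are hypothetical.

```python
import numpy as np


def mean_squared_displacement(x_nm, max_lag):
    """MSD (nm^2) of a 1D trajectory for lags 1..max_lag (in frames)."""
    x = np.asarray(x_nm, dtype=float)
    return np.array([np.mean((x[lag:] - x[:-lag]) ** 2)
                     for lag in range(1, max_lag + 1)])


def estimate_d1(x_nm, frame_interval_s, max_lag=5):
    """Estimate a 1D diffusion coefficient (nm^2/s) from MSD = 2*D*t + offset.

    The fitted offset absorbs constant contributions such as localization
    error; only the slope is used for D.
    """
    msd = mean_squared_displacement(x_nm, max_lag)
    lag_times = frame_interval_s * np.arange(1, max_lag + 1)
    slope, _offset = np.polyfit(lag_times, msd, 1)
    return slope / 2.0


# Synthetic check (not experimental data): a random walk with an assumed
# diffusion coefficient of 1000 nm^2/s sampled at 10 Hz.
rng = np.random.default_rng(1)
true_d, dt = 1000.0, 0.1
steps = rng.normal(0.0, np.sqrt(2 * true_d * dt), size=20000)
track = np.cumsum(steps)
print(estimate_d1(track, dt))
```

Applied to a stationary particle on a fluctuating DNA molecule, an estimator of this kind still returns a small nonzero value, which is why the text treats the dig-QD measurements as the baseline rather than zero.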
[0089] Intersegmental transfer is not essential for the promoter search. As indicated above, the promoter search mechanism of E. coli RNAP appeared to be dominated by 3D random collisions, with no evidence for facilitated diffusion involving 1D sliding over distances along the DNA greater than our current spatial resolution limits. Facilitated searches can also potentially occur through intersegmental transfer, which would involve RNAP movement from one distal site to another via a looped DNA intermediate (Figure 6). The DNA molecules used in our experiments were maintained in a stretched configuration, and we anticipate that they would not support intersegmental transfer because the DNA cannot form the looped intermediates necessary for this mode of facilitated diffusion. Given that RNAP readily bound promoters in the stretched λ-DNA, we concluded that intersegmental transfer cannot be an obligatory component of the promoter search process. However, we emphasize that we could not rule out the possibility that E. coli RNAP might utilize mechanisms involving intersegmental transfer while searching for promoters on the bacterial chromosome in vivo.
[0090] Submicroscopic framework for the promoter search problem. Given the transient nature of encounters between RNAP and nonspecific DNA, we were unable to experimentally determine D1,obs for the protein while bound to nonspecific sites. Therefore, we next sought to establish a theoretical framework to investigate the promoter search at the submicroscopic scale. Here "submicroscopic" is defined as any event occurring below existing spatial and temporal resolution limits. Full treatment of the theory is presented below; for brevity we highlight key features and results. We began by recognizing that the flux of RNAP onto the promoters is the result of three components: (i) direct binding to promoters from a fully equilibrated solution (i.e. 3D diffusion); (ii) promoter binding from solution after dissociation from another region of DNA (i.e. hopping); and (iii) promoter binding after undergoing 1D diffusion along the DNA (i.e. sliding). As revealed below, the most important of these terms with respect to the promoter search by E. coli RNAP was direct binding from solution, which occurs at a rate of:
$$k_a^{(\psi)}(t) = \frac{8}{\pi}\,\psi\,D_3\,C_0 \int_0^{\infty} e^{-D_3 u^2 t}\,\left[u\left(J_0^2(u\rho) + Y_0^2(u\rho)\right)\right]^{-1} du,$$
where $C_0$ is the initial protein concentration, $D_3$ is the 3-dimensional diffusion coefficient of QD-RNAP, $\psi$ is the effective target size, $\rho$ is the reaction radius, and $J_0$ and $Y_0$ are Bessel functions of the first and second kind, respectively. The effective target size is a geometric constraint describing the binding surface that transiently samples DNA during the promoter search, and is a function of protein orientation (θ) and linear target size (a) (Fig. 3a-c). Linear target size should not be confused with promoter length; rather, it describes the range over which a bound protein can be out of register, yet still recognize its target (Fig. 3b).
An important prediction arising from this formalism is that target association rates for any protein can become dominated by $k_a^{(\psi)}(t)$ as $C_0$ increases, implying that increased protein abundance can obviate any potentially accelerating contributions from facilitated diffusion, regardless of whether the protein in question is capable of hopping and/or sliding along DNA. In simpler terms, the probability of directly colliding with a target site increases at higher protein concentration, and any form of search facilitation can be rendered effectively irrelevant by increasing protein abundance because the proteins that reach the target first will do so through 3D diffusion. The question then becomes: at what threshold concentration does 3D diffusion begin to dominate the search? For brevity, we will refer to the concentration at which 3D target binding becomes favored as the facilitation threshold (Cthr): 3D target binding will be favored when the protein concentration equals or exceeds Cthr, whereas facilitated diffusion will be favored when the concentration is below Cthr. In addition, once facilitated diffusion is removed from the search through increased protein abundance, $k_a^{(\psi)}(t)$ can be used to recover the effective target size (ψ), which in turn provides an estimate of linear target size (a). These parameters reflect dynamic physical properties of highly transient encounter complexes, and cannot be accessed through traditional biochemical analysis of stable or metastable reaction intermediates (closed complexes, open complexes, etc.), nor can they be revealed through structural studies of static protein-nucleic acid complexes. To our knowledge, neither ψ, a, nor Cthr has been experimentally determined for any protein-nucleic acid interaction.
[0091] Single molecule promoter search kinetics. By definition, it is not possible to directly visualize submicroscopic events that contribute to target searches. However, we can obtain promoter association rates from real-time single-molecule measurements (see below). These experimental values can then be compared to the theoretical calculations, allowing us to extract critical features of search mechanisms that are otherwise obscured at the microscopic scale. This provides an independent assessment of the search process that is unhindered by existing spatial or temporal instrument resolution limits. Comparison of the experimental data to values calculated from $k_a^{(\psi)}(t)$ provides a direct assessment of the promoter search mechanism, which allows us to determine whether submicroscopic facilitated diffusion contributes to promoter association: if the experimentally observed association rates exceed $k_a^{(\psi)}(t)$, then submicroscopic facilitated diffusion must be contributing to the search mechanism; in contrast, if the experimentally observed association rates are equal to $k_a^{(\psi)}(t)$, then the search mechanism can be attributed to 3D collisions with no underlying contribution of submicroscopic facilitated diffusion.
[0092] We evaluated the potential contribution of submicroscopic facilitated diffusion to the search process by directly measuring promoter association times over a range of RNAP concentrations within the context of a new DNA curtain assay designed to separate neighboring molecules by 7-μm, thereby eliminating any potential for variation in association kinetics due to differences in local DNA concentration (Figure 3d & Figures 14-16). Importantly, with this assay we were not measuring closed or open complex formation; rather, we were measuring the instantaneous time at which single molecules of RNAP are initially detected at a promoter, conditioned upon their subsequent conversion to closed and then open complexes (Figure 16). These measurements were conducted at 100-ms temporal resolution, which is appropriate given that the slow downstream isomerization steps involved in promoter binding (e.g. closed and open complex formation) occur on the order of seconds; therefore any error in determining the initial binding events does not propagate into the measurements of the initial association rates. This experimental format allowed us to precisely define all of the experimental boundary conditions and parameters involved in calculating the predicted promoter association rates from $k_a^{(\psi)}(t)$ (e.g. DNA geometry, DNA length, DNA density, number of accessible promoters, protein concentration, solution viscosity, temperature, ionic strength, etc.).
[0093] Interestingly, association rates exceeded $k_a^{(\psi)}(t)$ below 500 pM QD-RNAP, revealing that submicroscopic facilitated diffusion accelerated the promoter search at low protein concentrations, with 3-fold acceleration observed at 50 pM RNAP (Figure 3d). However, at >500 pM RNAP, association times converged to $k_a^{(\psi)}(t)$, indicating that submicroscopic facilitated diffusion did not contribute to the promoter search at higher concentrations (Figure 3d). Although our results showed that QD-RNAP no longer benefits from facilitated diffusion at concentrations >500 pM, one must recognize that Cthr will vary for different proteins and/or different reaction conditions. For example, unlabeled RNAP (hydrodynamic radius, r ≈ 7.4-nm) will diffuse more rapidly through solution than QD-RNAP (r ≈ 13.4-nm),50 so we anticipate that promoter association with unlabeled protein should converge to $k_a^{(\psi)}(t)$ at an even lower protein concentration, which would be reflected as a reduction in Cthr. Importantly, the physical behavior of RNAP with respect to the search process will not change regardless of whether the concentration is above or below Cthr; the only thing that changes is the probability of engaging a target through a direct collision (P3D) versus the probability of engaging the target after undergoing facilitated diffusion along the DNA (PFD).
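The statement that the smaller, unlabeled polymerase diffuses faster than QD-RNAP follows from the Stokes-Einstein relation, D = kB·T / (6·π·η·r), in which the 3D diffusion coefficient is inversely proportional to hydrodynamic radius. The sketch below is a rough illustration only: the temperature (298 K) and the viscosity of water (0.89 mPa·s) are assumptions rather than values taken from the text, the unlabeled radius of about 7.4 nm is an assumed reading of the value given above, and the function name is hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23


def stokes_einstein_d3(radius_nm, temperature_k=298.0, viscosity_pa_s=0.89e-3):
    """3D diffusion coefficient (um^2/s) of a sphere from its hydrodynamic radius.

    Default temperature and viscosity are assumed values for water near 25 C.
    """
    radius_m = radius_nm * 1e-9
    d_m2_per_s = BOLTZMANN_J_PER_K * temperature_k / (
        6 * math.pi * viscosity_pa_s * radius_m)
    return d_m2_per_s * 1e12  # convert m^2/s to um^2/s


d_qd_rnap = stokes_einstein_d3(13.4)  # QD-tagged RNAP
d_rnap = stokes_einstein_d3(7.4)      # unlabeled RNAP (assumed radius)
print(f"QD-RNAP: {d_qd_rnap:.1f} um^2/s, unlabeled RNAP: {d_rnap:.1f} um^2/s")
print(f"ratio: {d_rnap / d_qd_rnap:.2f}")
```

Because temperature and viscosity cancel in the ratio, the relative speed-up depends only on the two radii, so the comparison holds even if the assumed buffer properties are off.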
[0094] Furthermore, the exact solution of $k_a^{(\psi)}(t)$ yielded an effective target size ψ of 0.75-nm and an estimated linear target size a of ~6-bp (Figure 3e),51 indicating that promoters would not be recognized if RNAP is more than ±3-bp out of register. The apparent increase in ψ at low RNAP concentration reflected what is historically referred to as the "antenna" effect. At 50 pM RNAP (ψ = 2.23-nm) the "antenna" was just ~1.48-nm (corresponding to ~6-bp in our system); the very small size of the "antenna" indicated the limited contribution that facilitated diffusion (sliding and/or hopping) made to the promoter search even at the lowest RNAP concentrations tested (Figure 3e-f). An in vivo protein concentration of 1 nM corresponds to just 1 protein molecule in a volume the size of an E. coli cell,52 therefore an in vivo concentration of 50 pM would be equivalent to an average of just 1/20th of a molecule of RNAP per bacterium, which would not seem physiologically relevant. Taken together, our results demonstrated that although submicroscopic facilitated diffusion can moderately accelerate the promoter search, this acceleration only occurs at exceedingly low RNAP concentrations, whereas at physiologically relevant protein concentrations the overall promoter search process should be dominated by 3D diffusion.
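The arithmetic behind two of the figures in this paragraph can be checked directly. The sketch below takes the stated rule of thumb (1 nM corresponds to about one molecule per E. coli cell) as its only input for the cell volume; the volume it derives is therefore a consequence of that rule, not a measured quantity, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_per_cell(concentration_molar, cell_volume_l):
    """Average number of molecules in a cell at the given molar concentration."""
    return concentration_molar * cell_volume_l * AVOGADRO


# Cell volume implied by "1 nM is one molecule per cell" (an assumption taken
# from the rule of thumb in the text; roughly 1.7 femtoliters).
cell_volume_l = 1.0 / (1e-9 * AVOGADRO)

# 50 pM then corresponds to one-twentieth of a molecule per cell on average.
print(molecules_per_cell(50e-12, cell_volume_l))

# "Antenna" size: effective target size at 50 pM minus the limiting value.
antenna_nm = 2.23 - 0.75
print(round(antenna_nm, 2))
```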
[0095] Increased protein abundance disfavors facilitated searches. One conclusion arising from our mathematical treatment of target searches is that increased protein abundance will diminish the contribution of facilitated diffusion. This concept is not unique to E. coli RNAP, and will even apply to proteins that can diffuse long distances along DNA because the probability of direct collisions (P3D) with the target always increases with increasing protein abundance and will eventually exceed the probability of target engagement through facilitated diffusion (PFD). As a simple illustration of this point, we used the DNA curtain assay and a λ-DNA bearing 5 tandem 21-bp symmetric lac operators to qualitatively assess target binding by QD-tagged lac repressor (Figure 4a-b). These experiments were intentionally conducted at low ionic strength, such that nonspecific binding and 1D diffusion were greatly favored, as described.44,45 At low concentrations many proteins initially bound to random, nonspecific sites and then diffused thousands of base pairs along the DNA before eventually binding the target; these events were categorized as having occurred through facilitated diffusion (FD; Figure 4c). Operator binding in the absence of microscopically detectable 1D diffusion was also observed; these events were categorized as 3D (Figure 4d). For the proteins that successfully engaged the operator, the contribution of facilitated diffusion to the search process is reflected in the distance between the initial binding site and the operator (Δx) and in the change in the ratio of FD to 3D events. As protein concentration increased, the mean value of Δx decreased for the proteins that bound to the operator (Figure 4e-f), and there was a corresponding increase in the fraction of events categorized as 3D (Figure 4e, inset). At the highest concentration of lac repressor tested (800 pM), ~71% of the total operator binding events were attributed to 3D diffusion (Figure 4e, inset).
Technical limitations prevented titration to higher protein concentrations due to the accompanying increase in background fluorescence, but we anticipate that if the concentration were raised further, eventually all of the operator binding events would occur through 3D diffusion. This conclusion will even extend into the submicroscopic regime. An in-depth analysis of the facilitation threshold and effective target size for lac repressor (as provided above for RNAP) was beyond the scope of this work; however, the trend in these data clearly illustrated that the contribution of facilitated diffusion diminishes with increased protein abundance, even though lac repressor is capable of sliding great distances on DNA under low ionic strength conditions. Importantly, at all concentrations tested many molecules of lac repressor bound to random, nonspecific sites all along the length of the λ-DNA, and these proteins still exhibited 1D diffusion even when the concentration was raised (Figure 4d,f); however, as concentration increased this 1D diffusion could be considered nonproductive with respect to target association because most of the proteins that bound the operator first did so through 3D collisions (Figure 4e, inset & Figure 4d,f).
Discussion
[0096] Our results argue against facilitated diffusion at either the microscopic or submicroscopic scales as a significant contributing component of the E. coli RNAP promoter search, and we also show that in general any potential contributions of facilitated diffusion can be overcome through increased protein abundance, even for proteins that can slide long distances on DNA. Facilitated diffusion and 3D collisions can be conceptually considered as two distinct, competing pathways, either of which has the potential to result in target binding, and 3D diffusion will always be favored at protein concentrations equal to or exceeding the facilitation threshold simply because the relative increase in protein abundance increases the probability of a direct collision with the target site (Figure 5). In other words, just because a protein is physically capable of hopping and/or sliding over long distances along DNA does not mean that these processes will accelerate target binding, because protein concentration can always have a dominating effect on the overall search process. A broader implication of this conclusion is that proteins present at low concentrations in living cells (e.g. lac repressor, <10 molecules cell⁻¹) may be more apt to locate targets through facilitated diffusion, whereas those present at higher concentrations (e.g. RNAP, ~2,000-3,000 molecules cell⁻¹) may be more likely to engage their target sites through 3D diffusion.
[0097] Our experimental setting differs substantially from much more complex physiological environments where the promoter search might be influenced by the presence of factors that can assist in the recruitment of RNAP to promoters, or by local DNA folding, higher-order chromatin architecture, and macromolecular crowding (Figure 5). While we cannot yet quantitatively assess the influence of these parameters, we can consider how they might qualitatively affect the promoter search.
[0098] Transcriptional activators, such as catabolite activator protein (CAP), are commonly involved in the regulation of gene expression, and can exert their effects either by facilitating recruitment of RNAP or by stimulating steps after recruitment (e.g. open complex formation, promoter escape, etc.). In scenarios involving factor-assisted recruitment, additional protein-protein contacts stabilize interactions between RNAP and the promoter. However, the presence of a transcriptional activator near a promoter should not fundamentally alter the search process by causing RNAP to start sliding and/or hopping along the DNA while executing the search; rather, it would just make the target appear "larger" to RNAP (i.e. promoter plus factor, instead of just the promoter), which would in turn reduce the facilitation threshold. Factors that stimulate steps after recruitment would not influence the search because they exert their effects only after the promoter search is complete.
[0099] Higher-order organization of DNA in vivo has the potential to promote 3D collisions or "jumps", but is not expected to favor 1D sliding and/or hopping, both of which can be considered as local events that are not influenced by global DNA architecture.18,20 In contrast, naked DNA stretched out at low dilution presents the most favorable possible conditions for 1D sliding and/or hopping. The fact that we do not detect facilitated diffusion contributing to the promoter search by RNAP under conditions that should otherwise greatly favor hopping and/or sliding suggests these processes are unlikely to occur in vivo simply due to the more complex 3D DNA environment.
[0100] Molecular crowding, either in solution or on the DNA, is a nontrivial issue that can have both positive and negative impacts on DNA binding. Increased nonspecific binding can arise from macromolecular crowding in solution due to excluded volume effects, and any increase in nonspecific binding has the potential to promote facilitated diffusion. In the case of E. coli RNAP, however, increased nonspecific binding brought about through use of low ionic strength conditions still does not lead to microscopically detectable 1D diffusion, suggesting that any increased nonspecific affinity caused by excluded volume effects is unlikely to cause RNAP to start diffusing along DNA. The effects of macromolecular crowding on DNA arise from the presence of other nonspecific DNA-binding proteins, which can reduce nonspecific DNA-binding affinities through competitive inhibition, and can also impede 1D diffusion along DNA through steric hindrance.18,36 The net result of the seemingly opposed influences of macromolecular crowding in solution versus molecular crowding on the DNA has yet to be quantitatively explored, although one might anticipate that highly abundant proteins such as Fis and HU (each of which can be present at concentrations of up to ~30-50 μM in E. coli) would disfavor facilitated searches by restricting access to nonspecific sites.
[0101] In summary, there are at least four reasons why promoter searches in E. coli would not benefit from facilitated diffusion. First, there are on the order of ~2,000-3,000 molecules of RNAP in E. coli, corresponding to an in vivo concentration of ~2-3 μM.55 Based on our findings, if even a small fraction of the total RNAP present in a cell were free, then it should still locate promoters through 3D collisions rather than facilitated diffusion. Estimates have suggested that there are on the order of ~550 molecules (~0.5 μM) of free σ70-containing RNAP holoenzyme in living bacteria; if these estimates are correct, then the facilitation threshold would have to somehow increase by roughly three orders of magnitude in order for hopping and/or sliding to accelerate the promoter search in vivo.
In contrast to RNAP, lac repressor, which is thought to employ facilitated diffusion in vivo during its target search,56 may need to do so to compensate for its much lower intracellular abundance (<10 molecules cell⁻¹) and the corresponding scarcity of its targets (3 lac operators per genome). Second, long nonspecific lifetimes will lead to slower searches, so RNAP appears to be optimized to avoid wasting time scanning nonspecific DNA. Third, other proteins (e.g. Fis, HU, IHF, H-NS, etc.) may obstruct 1D diffusion, but such obstacles could be avoided through 3D searches. Fourth, other steps are rate-limiting during gene expression (e.g. promoter accessibility, promoter escape, elongation, etc.),59 suggesting there is simply no need for RNAP to locate promoters faster than the 3D-diffusion limit. Finally, despite the much more complicated environments present in physiological settings, our general conclusion regarding the effects of protein abundance on target searches should remain qualitatively true because higher protein concentrations will increase the probability of direct target binding through 3D collisions.
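The "three orders of magnitude" argument can be made explicit by comparing the estimated free RNAP concentration in vivo (~0.5 μM) with the facilitation threshold observed for QD-RNAP in vitro (500 pM). The sketch below is only a numerical restatement of that comparison; it assumes the in vitro threshold is used as the reference point, and the function names are hypothetical.

```python
import math


def orders_of_magnitude_above(concentration_molar, threshold_molar):
    """log10 of the ratio between a concentration and a threshold."""
    return math.log10(concentration_molar / threshold_molar)


def favored_pathway(concentration_molar, threshold_molar):
    """3D binding is favored at or above the threshold, facilitated diffusion below."""
    return "3D" if concentration_molar >= threshold_molar else "facilitated"


free_rnap_molar = 0.5e-6      # ~550 free holoenzyme molecules per cell
threshold_molar = 500e-12     # facilitation threshold measured for QD-RNAP

print(orders_of_magnitude_above(free_rnap_molar, threshold_molar))
print(favored_pathway(free_rnap_molar, threshold_molar))
print(favored_pathway(50e-12, threshold_molar))
```

The threshold itself depends on the protein and the reaction conditions, so the in vivo value may differ; the point of the comparison is the size of the gap that would have to be closed.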
Methods
[0102] Single-molecule experiments were conducted on a custom-built total internal reflection microscope and utilized double-tethered DNA curtains.35 RNAP was expressed with a biotinylation peptide fused to the C-terminus of β', and purified as described. Biotinylated RNAP holoenzyme was labeled with streptavidin-QDs (Qdot® 705, r = 12.6-nm; Invitrogen). Prior to use, flowcells were flushed with transcription buffer (40 mM Tris [pH 8.0], 100 mM KCl, 10 mM MgCl2, 1 mM DTT and 2 mg ml⁻¹ BSA) supplemented with 250 pM YOYO1 and 9 μM free biotin to block the surface. QD-tagged RNAP was then diluted into biotin-supplemented transcription buffer (±rNTP, 250 μM each, as indicated) to a final concentration of 30-200 pM, a 50-μl sample was injected into the flowcell at a rate of 0.1 ml min⁻¹, and buffer flow was terminated 120 s after beginning the injection. For initial promoter binding measurements, experiments were conducted in buffer containing 20 mM Tris-HCl [pH 8.0], 25 mM KCl, 1 mM MgCl2, 1 mM DTT, and 2 mg ml⁻¹ BSA, and images were acquired at 5-100 Hz, as indicated, using NIS-Elements software (Nikon). For the measurements of promoter association rates, the fraction of active protein in each preparation was first determined from gel shift assays utilizing a Cy3-labeled 249-bp DNA fragment containing the PR promoter. Gel shifts were conducted under dilute protein conditions equivalent to those used in the single-molecule assays to ensure that ensemble protein activity reflected that in the single-molecule assays. All single-molecule kinetic measurements to determine promoter association rates utilized a new type of DNA curtain designed to avoid local DNA concentration effects (Figure 15).
QD-RNAP (700-μl) was then injected into the flowcell at 0.5 ml min⁻¹ in buffer containing 20 mM Tris-HCl [pH 8.0], 25 mM KCl, 1 mM MgCl2, 1 mM DTT, and 2 mg ml⁻¹ BSA, using a standardized sample injection procedure that eliminated variability in observed association rates due to microfluidic heterogeneities and variations in protein concentration profiles. Images were collected at 10 Hz, and initial promoter association rates were then obtained by measuring dwell times between successive promoter binding events for all different DNA molecules within the field-of-view (Figure 16).
[0103] Table 1: Summary of Prior Promoter Search Studies
Figure imgf000042_0001
Guthold et al., 1999 (A12): AFM; ~450 nm (~440 bp); ~600 seconds; did not address how promoters were located.
[0104] Table 2: List of λ-Phage Promoters
Figure imgf000043_0001
a There are ten promoters in the λ phage genome; however, binding to λPRM and λPR is mutually exclusive (A16), therefore a maximum of nine promoters can be occupied in our assays.
[0105] Table 3: Buffer Conditions Tested in this Study
Figure imgf000043_0002
Figure imgf000044_0001
The concentration of Tris-HCl was not reported in Kabata et al., and we selected 20 mM for our assay conditions.
[0106] Table 4: Diffusion coefficient values
Figure imgf000044_0002
a The diffusion coefficients we report for RNAP cannot be interpreted as diffusion of the proteins along the DNA. This is because diffusion coefficients in the small value range are dominated by DNA fluctuations rather than protein movement, and small value diffusion coefficients are also subject to large sources of error (Figures 12-13). b The dig-QD diffusion coefficients were obtained from particle tracking data collected for either 3 seconds or >9 seconds, as indicated. The difference between the diffusion coefficient values obtained for dig-QD (3 sec) and dig-QD (>9 sec) does not reflect differences in the behavior of the dig-QDs; rather, it reflects the greater precision of the diffusion coefficient measurements obtained for particle tracking data collected over longer time intervals. For the same reason, the diffusion coefficients decrease as the data collection windows increase from 3 seconds to >9 seconds. c Note that the τ2 molecules exhibited an exponentially distributed lifetime. Therefore these diffusion coefficients were obtained from the analysis of particle tracking data sets comprised of trajectories of differing time lengths, whereas all other data sets were built from tracking data spanning the indicated time intervals.
[0107] RNA Polymerase Purification and Characterization. Cells for expressing a chromosomal copy of RNA polymerase that harbors a biotinylation peptide tag on the C-terminus of the β' subunit were generously provided by Dr. Robert Landick (University of Wisconsin-Madison) (A19), and RNAP holoenzyme was expressed, purified, and characterized as previously described (A4).
[0108] We sought confirmation that RNAP remained functional under the dilute conditions necessary for single molecule measurements. For this, we adapted a gel shift assay for quantifying promoter binding activity under dilute protein conditions. The active RNAP concentration under the single molecule experiment conditions was determined using a gel shift assay by titrating RNAP into reactions containing a fixed amount of promoter DNA, as previously described (Figure 14) (A20). To enhance detection, these assays utilized a Cy3-labeled 249-bp DNA fragment containing promoter PR, which was made by PCR using λ phage DNA as a template with the following primers (IDT): 5'-Cy3-GGC CTT GTT GAT CGC GCT TT-3' and 5'-CGT GCG TCC TCA AGC TGC TCT T-3'. Varying amounts of purified RNAP (0.1-1.6 nM) were incubated with 0.4 nM of the Cy3-labeled PR DNA fragment in buffer (20 mM Tris [pH 8.0], 25 mM KCl, 1 mM MgCl2, 1 mM DTT, and 0.2 mg ml⁻¹ BSA) at room temperature for 40 minutes. Heparin (10 μg ml⁻¹) was then added to disrupt nonspecifically bound RNAP and closed complexes, and the reactions were resolved on native 5% polyacrylamide gels to separate the free and bound DNA. The gels were then scanned for Cy3 fluorescence using a Typhoon FLA 9000 (GE Healthcare), and the fractions of bound and free DNA were quantified using ImageQuantTL software (GE Healthcare). Under these reaction conditions, the formation of open complex is essentially irreversible, and the fractional activity of RNAP capable of forming stable open complexes is revealed as the inverse of the saturation point in the titration curve (Figure 14) (A20).
[0109] To verify that RNA polymerase remained active over time after dilution, we measured a time series of RNAP promoter-binding activity using the gel shift procedure described above.
Briefly, RNAP was diluted to a final concentration of 0.6 nM in buffer (20 mM Tris [pH 8.0], 25 mM KCl, 1 mM MgCl2, 1 mM DTT, and 0.2 mg ml⁻¹ BSA) and the diluted samples were then incubated at room temperature for the indicated time intervals. The activity of the diluted RNAP was measured using a gel shift assay as described above. As shown in Figure 14, the activity of diluted RNAP did not change significantly over a 40-minute period. Our single molecule measurements are typically completed within <15 minutes of diluting the RNAP stock solutions.
[0110] Promoter Dissociation Kinetics. A key feature of the τ2 and τ3 events was that they coincided with the locations of the known phage promoters (Figure 1 & Figure 2c). The assignment of τ2 and τ3 events was made by inspection of reaction trajectories, and this assignment was facilitated by the drastic difference in the observed lifetimes for these two types of events. Molecules assigned as τ2 events dissociated from the DNA during collection of real-time videos, whereas those assigned as τ3 events did not dissociate from the DNA during the typical 200-second observation windows (Figure 2). To determine the lifetime of the τ2 events, we measured the times over which the individual proteins remained bound to the DNA before dissociating, and the resulting data were fit to a single exponential decay. RNA polymerase molecules assigned as τ3 events (i.e. promoter-bound open complexes) exhibited lifetimes that greatly exceeded the typical lengths of the videos that were used to monitor DNA binding in real time. Therefore dissociation of promoter-bound open complexes (i.e. τ3 events) was measured in separate experiments by injecting 1 nM QD-RNAP into the flowcell to bind the promoters in buffer (20 mM Tris [pH 8.0], 25 mM KCl, 1 mM MgCl2, 1 mM DTT, and 0.2 mg ml⁻¹ BSA). After an incubation of 3 minutes, free QD-RNAP was removed from solution by flushing the sample chamber with buffer lacking RNAP. The number of promoter-bound QD-RNAP molecules was then monitored versus time over a 2-hour period with data collected at 5-minute intervals. The resulting data were then corrected for broken DNA molecules, and fit to a single exponential decay (Figure 2c).
[0111] Promoter Association Kinetics. For association rate measurements, we developed a new type of sparse DNA curtain that ensures experimental measurements are made under conditions where the DNA is effectively at the infinite dilution limit, such that the binding of RNA polymerase molecules on individual DNA molecules can be regarded as independent processes. The patterns were made by electron-beam lithography, as described (A21), but the pattern geometry was altered so that the surface was much more sparsely populated with double-tethered DNA molecules (Figure 15).
[0112] We next determined the concentration profile in the flowcell using an injection rate of 0.5 ml min-1. A 700 μl sample of 1 nM QDs was loaded into the sample loop and the syringe pump was then pre-run for 1 minute at 0.5 ml min-1. The sample was then injected without stopping the syringe pump. Images were recorded continuously and the QD signal intensity in the sample chamber of the flowcell was measured over time. The concentration of QDs in the sample chamber relative to the concentration of the injected sample (1 nM) was then calculated based on the volume of the sample loop and the flow rate. This concentration profile was then used in the kinetic measurements of promoter association rates to calculate the actual concentration of QD-RNAP present in the sample chambers.
[0113] To begin a measurement, 700 μl of diluted QD-RNAP was loaded into the sample loop. To avoid pH changes due to the oxygen scavenger system, the GLOXY components were added to the buffer immediately prior to use; imaging buffer stored in a closed syringe will maintain a constant pH for more than 1 hour after the addition of the oxygen scavenging system. To avoid any dead response time from the syringe pump, the pump was pre-run at a flow rate of 0.5 ml min-1 for 1 min. The QD-RNAP sample was then injected while maintaining a constant flow rate of 0.5 ml min-1, and data acquisition was initiated 10 sec after sample injection. Exactly 30 sec after the injection, the buffer flow was terminated, ensuring that the concentration of QD-RNAP in the sample chamber remained constant after this time point; the selection of the 30 sec stopping point was based on the concentration profiles of our flowcells under this sample injection regime. During this procedure, images were continually collected at either 5 or 10 Hz for a period of up to 15 minutes.
[0114] The resulting kinetic data were analyzed to determine the association rates based upon the initial binding observed within the field-of-view after the concentration of QD-RNAP in the sample chamber has plateaued. For each individual DNA molecule, we first determine the time to initial binding (ti), which we define as the time when the first promoter-bound QD-RNAP is detected, conditioned upon its subsequently forming an open complex, and for the purpose of this analysis we defined the open complex as anything with a lifetime of more than 40 sec. Binding events prior to the 30 sec time point after initial sample injection were excluded to ensure that any measured association events occurred only after the concentration of QD-RNAP in the flowcell had plateaued. Once the first promoter was occupied by an open complex, any subsequent events occurring on the same DNA molecule were excluded from analysis such that the resulting data set describes only the initial binding events that occurred under conditions where all of the λ-phage promoters were unoccupied and accessible for binding, in accordance with the theoretical calculations. For all DNA molecules we then build a data set of the promoter initial association times {t1, t2, ..., tN}, where each of these values represents the time it took for the first promoter to be occupied on each DNA within the field-of-view. The data were then sorted into ascending order {t(1), t(2), ..., t(N)} such that t(1) ≤ t(2) ≤ ... ≤ t(N). The residual waiting times between binding events were then calculated from the differences between successive sorted times, and a corrected time for each measured event was determined from these residual waiting times (Figure 16). We then determined the average association time over the entire data set at each tested concentration of RNA polymerase (Figure 3d).
The primary advantage of this analysis is that it eliminates the need to definitively establish a zero time point prior to an initial binding event, so long as the concentration of QD-RNAP in the flowcell remains isotropic, because all calculations are based on residual waiting times between initial binding events on the different DNA molecules in the sample chamber.
Therefore, in the absence of a mechanical stop-flow device, this analysis of residual waiting times is more reliable than more typical kinetic measurements of bimolecular processes that are based upon cumulative occupation probability fitting to get the reaction time.
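The residual waiting time analysis described above can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used for the reported data: the function name is hypothetical, and the weighting of each gap by the number of DNA molecules that are still unoccupied is an assumption, borrowed from the standard result that such weighted gaps between sorted, independent exponential waiting times all share the single-molecule mean. The exact correction applied in the original analysis is not recoverable from the text.

```python
def mean_association_time(first_binding_times):
    """Estimate the mean single-DNA association time from residual waiting times.

    first_binding_times: time of the first open-complex event on each DNA
    molecule (one value per DNA), measured from any common clock origin.
    """
    t = sorted(first_binding_times)
    n = len(t)
    if n < 2:
        raise ValueError("need first-binding times from at least two DNA molecules")
    corrected = []
    for i in range(1, n):
        gap = t[i] - t[i - 1]      # residual waiting time between successive events
        still_empty = n - i        # DNA molecules not yet occupied during this gap
        # ASSUMPTION: weight each gap by the number of unoccupied DNA molecules
        corrected.append(still_empty * gap)
    return sum(corrected) / len(corrected)


# The estimate depends only on gaps, so shifting the clock origin changes nothing.
tau_a = mean_association_time([1, 2, 4])
tau_b = mean_association_time([31, 32, 34])
```

Because only differences between sorted times enter the calculation, the result does not depend on where the zero time point is placed, which is the property the text emphasizes.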
[0115] Collision frequencies and nonspecific DNA binding events. In experiments conducted at either 5 or 10 Hz, we readily detected the binding of QD-RNAP to the promoters within λ-phage DNA, but we could not detect any appreciable binding to nonspecific sites on the same DNA molecules (see Figure 3a, upper panel). This observation indicated we were missing collisions between the QD-RNAP and the nonspecific DNA because these collisions should have occurred at equal frequency everywhere along the DNA, and also indicated the lifetimes of the nonspecific complexes must have been substantially shorter than the 100-200 msec integration time for data collected at 10 or 5 Hz, respectively. To detect nonspecific binding, we increased the data acquisition rate to 100 Hz. We also sought to measure the collision frequencies for the QDs and QD-RNAP complexes at 100 Hz, and then compare these measured rates to calculated values for the collision frequency, in order to determine whether we were capturing all potential collisions with the DNA in the single molecule experiments.
[0116] First, the intensity threshold necessary to distinguish camera noise from actual QD excursions within the evanescent field was determined for 100 Hz data sets. The EMCCD was set to frame transfer mode and an AOI (63 x 1 pixels), and one DNA molecule was imaged for 10,000 frames at an acquisition rate of 100 Hz in the absence of any QDs. A histogram of the resulting signal intensities corresponded to background noise, which dropped dramatically above an intensity of ~2000 (A.U.). From the histogram we calculate that the probability of camera background noise beyond this threshold is ~9.5 × 10^[illegible]; therefore we selected a threshold value of 2040 (A.U.). Next, QDs (150 pM) were injected into the flow cell, and data were collected as described above using the same EMCCD settings.
Comparison of the two histograms revealed signal intensities that exceeded the threshold values for the camera noise, and these values were scored as QD excursions into the detection volume that is defined by the penetration depth of the evanescent field (350 nm).
[0117] Next a control experiment was conducted to determine how frequently QDs alone (in the absence of RNAP) entered the detection volume, and how much time they spent within this volume before diffusing back out into bulk solution. To count the number of events (see below), a kymograph was made from data collected at 100 Hz, as described above, and a QD detection event was defined as any signal within the kymograph that exceeded the defined threshold for EMCCD noise. The lifetime of these collision events was well described by a single exponential function with τ0 = 5.68 msec (95% confidence interval [5.56, 5.78]). This lifetime corresponds to the time QDs spend within the detection volume before diffusing back into free solution.
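As a rough illustration of how detection events can be scored against the noise threshold, the sketch below counts runs of consecutive frames above a threshold in a single intensity trace and converts each run into a duration. The function name, the one-dimensional trace, and the intensity values are all hypothetical; the real analysis was performed on kymographs and the lifetimes were obtained by exponential fitting, which is not reproduced here.

```python
def excursion_durations(intensities, threshold, frame_time_s):
    """Return the duration (s) of each run of consecutive frames above threshold."""
    durations = []
    run = 0
    for value in intensities:
        if value > threshold:
            run += 1                       # still inside an excursion
        elif run:
            durations.append(run * frame_time_s)
            run = 0
    if run:                                # excursion still open at end of trace
        durations.append(run * frame_time_s)
    return durations


# Made-up intensities (A.U.); 2040 is the threshold quoted in the text,
# and 0.01 s is the frame time at an acquisition rate of 100 Hz.
trace = [1900, 2100, 2300, 1950, 1800, 2050, 1990]
events = excursion_durations(trace, threshold=2040, frame_time_s=0.01)
```

With a 10 msec frame time, any excursion shorter than one frame is reported as a single frame, so a sketch like this can only bound short lifetimes from above.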
[0118] The same experiment was then performed using QD-RNAP in order to measure the lifetime of interactions between RNAP and DNA. QD-RNAP (150 pM) was injected into the flow cell and data were collected at 100 Hz, exactly as described above. In 100 seconds, 125 events exceeding the noise threshold were recorded along the length of the DNA. The resulting data were best described by the sum of two exponential functions with τ0 = 5.58 msec (5.46, 5.74) and τ1 = 29.23 msec (24.53, 36.18) (Figure 3 & Figure 7b-c). The first time constant τ0 is the same as the QD-only control (Figure 7a-b), and was also found for control measurements made in the absence of DNA (Figure 7e-f), and therefore does not arise from a polymerase-specific interaction with the DNA. We conclude that τ1 corresponds to the lifetime of a nonspecific interaction between DNA and RNAP.
[0119] The number of expected non-specific interactions between QD-RNAP and DNA can then be estimated as follows. The ratio of τ0 and τ1 events is given by:
r = ∫ A0 exp(−k0 x) dx / ∫ A1 exp(−k1 x) dx = 13.4
Figure imgf000050_0001
where A0, A1 and k0, k1 are the amplitudes and rate constants of the two components of observed events with QD-RNAP; note that k0 = 1/τ0 and k1 = 1/τ1. The protein samples used for single molecule imaging are subject to dilution while passing through the microfluidics and additional loss can also occur through nonspecific adsorption to the sample tubes, injection needles, tubing components, etc. Therefore the actual concentration of protein within the sample chamber may not be the same as the concentration that was injected. To correct for the difference due to dilution in the microfluidics, the actual versus injected concentration of QD-RNAP was determined by injecting a fixed volume (50 μl) of QDs (200 pM) into the sample chamber at a defined flow rate (0.05 ml min-1) while continuously monitoring the bulk fluorescence signal through the microscope objective. The resulting signal versus time curve was normalized to define the QD concentration profile as a function of these defined injection parameters. Therefore the number of non-specific interaction events between DNA and RNA polymerase, NS, is given by the number of τ1 events divided by the active protein concentration in the sample chamber:
NS = [numerator illegible] / [(1 + r) × 0.15 nM × 0.35 × 55%], expressed per 100 sec per nM,
where 0.35 is the ratio of actual concentration versus injection concentration, and 55% is the percentage of active protein. This corresponds to an observed non-specific interaction between RNAP-QD and DNA at a frequency of ~10.03 sec-1 (at 200 pM RNA polymerase).
[0120] To verify whether we were capturing all of the potential collision events from the data collected at 100 Hz, we next calculated the expected collision frequency for comparison to the experimentally measured values. For this purpose we define two volumes, VR′ and VR, where VR is defined by the region R > r > 0, and VR′ is defined by the region R′ > r > 0, where R bounds the experimentally detectable volume surrounding the DNA and R′ is chosen such that the total volume contains a single particle, i.e. it satisfies the following equation, where L is the length of DNA and C is the concentration of protein:
[0121] We then have an equation that relates the rate of QD-RNAP diffusing out of each volume to the radial extent of each region through the law of mass action:
Figure imgf000051_0001
[0122] Given an experimentally observed lifetime of QDs within the volume defined by VR, we then calculated an expected value for the collision frequency. For an experimentally measured lifetime of τ0 = 5.68 msec (see above), at a QD-RNAP concentration of 150 pM, with an excitation radius of 178 nm, we find an expected collision frequency of 13.8 sec-1, which is in excellent agreement with the experimentally observed value of ~10.03 sec-1. The good agreement between the experimentally observed collision frequency and the calculated collision frequency indicates that we are capturing most, if not all, potential collision events at a data acquisition rate of 100 Hz.
[0123] Diffusion Coefficients and DNA Fluctuations. In single molecule experiments, the diffusion coefficient is commonly determined by performing a linear regression of the mean square displacement (Ψ) against time. In Michalet (A22) it was noted that for a given length of trajectory, diffusion coefficient (D), frame interval Δt, and level of measurement noise, σ, there exists an optimal point to perform the regression, which minimizes the error in the estimation of D1,obs. We applied this method to our 1D system and calculated 1D-diffusion coefficients for τ2 and τ3 events. A full derivation can be found for the 2-dimensional case in Michalet (A22) and here we present only the necessary equations to extend the calculations of the variance and covariance of Ψ to a 1-dimensional system:
[Equation for the variance of Ψn; not legible in the extracted text]
and
Figure imgf000052_0001
For convenience we also present the equations for the mean and dispersion of a least squares regression fit to P points, respectively. and
Figure imgf000052_0003
[0124] Using these equations, we performed a check of predicted values versus values calculated from simulation experiments for both of these quantities:
Figure imgf000052_0004
For each set of parameters, 1000 simulation experiments were performed and the average values obtained. The appearance of a minimum in the dispersion of the fitted diffusion coefficient leads to the method by which we should calculate D1,obs, as outlined:
[0125] Algorithm to calculate best estimate of D1,obs:
1. Perform a LSF (least squares fit) to n% of Ψ, and calculate the diffusion coefficient.
2. Using the diffusion coefficient from step one, determine the optimal point at which to perform a regression, and recalculate the diffusion coefficient.
3. Repeat step two until convergence.
4. Repeat entire process again from additional starting points to ensure global convergence.
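The four steps above can be sketched as a short fixed-point loop. This is a minimal sketch under stated assumptions rather than the analysis code used for the reported data: all function names are hypothetical, the fit is an unweighted least squares fit, the localization error is taken from the intercept of the fit (1D form, MSD = 2Dt + 2σ²), and `optimal_points` is only a placeholder rule of the general shape discussed by Michalet (A22), not the expression derived in the text.

```python
import random


def msd_1d(x, max_lag):
    """Time-averaged 1D mean squared displacement for lags 1..max_lag (frames)."""
    out = []
    for n in range(1, max_lag + 1):
        sq = [(x[i + n] - x[i]) ** 2 for i in range(len(x) - n)]
        out.append(sum(sq) / len(sq))
    return out


def fit_line(xs, ys):
    """Unweighted least-squares slope and intercept."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def optimal_points(d, sigma2, dt, n_max):
    """PLACEHOLDER rule for the number of MSD points to fit (an assumption)."""
    if d <= 0:
        return n_max
    x = sigma2 / (d * dt)                       # reduced localization error
    return max(2, min(n_max, int(2 + 2.7 * x ** 0.5)))


def estimate_d(x, dt, start_points, max_iter=50):
    """Steps 1-3: fit, update the number of fitted points, repeat to convergence."""
    n_max = len(x) // 2
    msd = msd_1d(x, n_max)
    lags = [dt * (k + 1) for k in range(n_max)]
    p = max(2, min(start_points, n_max))
    seen = set()
    for _ in range(max_iter):
        slope, intercept = fit_line(lags[:p], msd[:p])
        d = slope / 2.0                          # 1D: MSD = 2*D*t + 2*sigma^2
        sigma2 = max(intercept, 0.0) / 2.0
        p_new = optimal_points(d, sigma2, dt, n_max)
        if p_new == p or p_new in seen:          # converged, or stuck in a cycle
            return d, p
        seen.add(p)
        p = p_new
    return d, p


# Synthetic trajectory: random walk plus localization noise (made-up values).
random.seed(1)
pos, traj = 0.0, []
for _ in range(400):
    pos += random.gauss(0.0, 1.0)
    traj.append(pos + random.gauss(0.0, 0.5))

# Step 4: repeat from several starting points and compare the outcomes.
estimates = [estimate_d(traj, dt=0.1, start_points=s) for s in (3, 10, 50)]
```

Comparing the entries of `estimates` shows whether different starting points settle on the same number of fitted points, which mirrors the single-optimum and multiple-optimum cases described in the following paragraph.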
[0126] Applying the above analysis to the tracking data for individual RNAP molecules resulted in three populations. The first group, ~84 percent of trajectories in this study, converged to a single optimum fit. The second group, ~11 percent, exhibited two or three 'optimums' in close proximity to one another. In this case, choosing between the values was arbitrary. In the final group multiple 'optimums' were measured which were sizably different. For these cases we based our decisions on the output of the convergence loop, and a holistic view of the particular data set.
[0127] The ensemble mean squared displacement is calculated from a number of individual trajectories, and for notation purposes we introduce two variables: Q as an index over trajectories, and Tn, which is defined below:
Figure imgf000053_0001
Then the ensemble mean squared displacement may be calculated as:
Figure imgf000053_0002
The variance and covariance of the ensemble mean squared displacement are calculated as in the single trajectory case, just appropriately weighted by the number of terms that originate from each set:
Figure imgf000053_0003
and
Figure imgf000053_0004
[0128] We can then determine the dispersion of slopes resulting from a LSF analysis of the ensemble mean squared displacement, which is identical to the single trajectory case aside from the use of the ensemble quantity in place of Ψn, and these relationships were also verified with synthetic data.
[0129] The above analysis presents a rigorous means of obtaining diffusion coefficients for experimental particle tracking data. However, one must still recognize that when diffusion coefficients are very small (on the order of the noise and error), or when the trajectories are very short (N<10 frames), negative diffusion coefficients can begin to emerge (Figure 12). Diffusion coefficients are obtained from correlation functions, so that any error in calculation of the positions will propagate through to the final value. First, there is a localization error, which encompasses DNA fluctuations (see below), as well as camera noise (including frame averaging), fluorescence fluctuations, and Gaussian fitting errors. This error is systematic at short time separation due to the DNA fluctuations. Secondly, the mean squared displacement is known to carry significant statistical error from two sources: (i) the mean squared displacement time average from a single trajectory only equals the ensemble average in the limit that the trajectory is infinite (see below), so only when a trajectory is infinitely long is the apparent diffusion coefficient obtained from that trajectory precisely correct; and (ii) there is also a correlated error, which comes about from overlaps in calculation of displacements. Readers are directed to Qian et al. and Michalet for in-depth discussion (A22-23).
[0130] The sources of error described above can be considered negligible when the diffusion coefficients are relatively large, as is the case for proteins that can diffuse long distances on DNA. However, when the diffusion coefficients are in the small-valued range the DNA fluctuations contribute substantially to the obtained values. Detailed analysis of such small-value diffusion coefficients would require the development of equations that implicitly include the DNA fluctuations in the density functions used to calculate the diffusion coefficients, which is beyond the scope of our current work. Instead, we analyzed the motion of quantum dots that were covalently linked to the DNA via a directed antibody interaction (A7) and use these results to qualitatively assess the contribution of DNA fluctuations to the D1,obs obtained from promoter-bound closed and open complexes (dig-QD; Figure 3e, Figure 13, Table 4).
[0131] We begin by recognizing that the probability density function for 1-D Brownian motion in the presence of Gaussian localization error can be shown to be the following (A6):
Figure imgf000055_0001
Multiplication by (x − x′)² and integration reveals that, for a stationary particle, D = 0, the MSD should be a straight line at 2σ². Furthermore, detailed calculations reveal that measured MSD values are gamma distributed about their true means (A6). That is to say, when D1,obs is small, it is common to attain MSD values in the range [0, 2σ²). Since the diffusion coefficient is determined through a linear fit to MSD values, it is not irregular to obtain a negative measured value for D1,obs, such as when early time points produce values in the range [2σ², ∞), and later time points yield values on [0, 2σ²). This, of course, is also the source of positive values of D1,obs for stationary particles (Figure 12).
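The statement that a stationary particle yields a flat MSD at twice the variance of the localization error can be checked numerically. The sketch below is illustrative only: the function name and the 20 nm localization error are made-up values, and the noise is assumed to be Gaussian and uncorrelated between frames, so it deliberately leaves out the DNA fluctuations discussed in the text.

```python
import random


def stationary_msd(sigma, n_frames, lag, rng):
    """MSD at one lag for a particle that never moves, observed with Gaussian noise."""
    x = [rng.gauss(0.0, sigma) for _ in range(n_frames)]
    sq = [(x[i + lag] - x[i]) ** 2 for i in range(n_frames - lag)]
    return sum(sq) / len(sq)


rng = random.Random(0)          # fixed seed so the run is repeatable
sigma = 20.0                    # nm, hypothetical localization error
msd_short = stationary_msd(sigma, n_frames=20000, lag=1, rng=rng)
msd_long = stationary_msd(sigma, n_frames=20000, lag=50, rng=rng)
expected = 2 * sigma ** 2       # plateau predicted for D = 0
```

Both lags scatter around the same plateau, so a straight-line fit through a handful of such points can return a small positive or negative slope purely from sampling noise, especially for short trajectories.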
[0132] For a completely stationary particle, the MSD plots should be independent of time, and while the resultant curves for the QD-labeled DNA do exhibit time independence at long time separation, at short times there is clearly some motion, as revealed in the rise of the MSD plots at early time points (Figure 13). Errors resulting from the camera and fitting functions are not correlated in time, and will not induce this kind of time dependent behavior in the MSD. However, the motion of the DNA will produce time dependent effects if the fluctuations occur on a time scale comparable to the QD motion (Figure 13). The MSD plots yield a straight line corresponding to 2σ², yielding values for σ of ~39 nm2 sec-1, which we ascribe to the underlying DNA fluctuations (Figure 13). Importantly, this dig-QD data set reflects an ensemble of approximately 34,000 - 58,000 individual diffusive steps (as indicated, Figure 13), and thus reflects what can be considered the average noise of the system arising from the movement of the DNA collected under ideal conditions. As indicated above, in the case of small value diffusion coefficients obtained from smaller data sets (such as those arising from shorter time trajectories for QD-RNAP), one can expect a large variation in D1,obs even for stationary particles. Based on these results for stationary dig-QDs, we conclude that our results are upwardly biased by the DNA chain motions, and that the diffusion coefficients obtained for the promoter-bound RNAP molecules assigned as closed or open complexes do not likely reflect motion of the proteins along the DNA, but rather reflect motion of the DNA itself.
[0133] Lac repressor experiments. FLAG-tagged lac repressor protein was expressed, purified, and labeled with anti-FLAG quantum dots, as previously described (A4). Double-tethered DNA curtains were made using a version of λ-DNA containing either a single 21-bp symmetrical operator sequence (Figure 4) (A4, A24), or a 5x tandem repeat of the same operator sequence (Figure 10). We have previously shown that QD-tagged lac repressor co-localizes with the operator site, and dissociates rapidly in the presence of IPTG, as expected (A4). Target search experiments (Figure 4 & Figure 10) were conducted in low ionic strength buffer containing 10 mM Tris-HCl (pH 8.0), 1 mM MgCl2, 1 mM DTT, 1 mg ml-1 BSA, using the indicated amounts of QD-tagged lac repressor (0.1-1 mM). Proteins were injected into the sample chamber and data were collected at 10 Hz. For each target-binding event, two position measurements were collected. First, the position of the protein was recorded in the first frame in which it appeared. Subsequently, the position of the particle was measured after it had bound stably at the target for a time greater than or equal to 1 min. The difference between these two points, Δx, was then determined. For Δx < 100 nm (three standard deviations of noise) events are scored as direct collisions. Furthermore, Δx > 100 nm corresponds to facilitated transport to the operator. The initial binding locations of nonspecific events that did not result in target capture (failed searches) were also collected, provided the binding event occurred prior to target engagement by the successful protein (i.e. the target was still unoccupied). For the failed searches, Δx was calculated by measuring the initial binding location of the failed searcher relative to the location of the operator.
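The scoring rule for target-binding events can be written as a small helper. This is a sketch with hypothetical names and made-up positions; the text does not say how an event with a displacement of exactly 100 nm is scored, so assigning it to facilitated transport here is an assumption.

```python
def classify_event(x_first_nm, x_bound_nm, threshold_nm=100.0):
    """Score a target-binding event from its first and stably bound positions.

    x_first_nm: position in the first frame in which the protein appeared.
    x_bound_nm: position after stable binding at the target (>= 1 min).
    """
    dx = abs(x_bound_nm - x_first_nm)
    # ASSUMPTION: a displacement exactly equal to the threshold counts as facilitated.
    if dx < threshold_nm:
        return "direct collision"
    return "facilitated transport"


# Made-up positions along the DNA, in nm.
label_a = classify_event(5000.0, 5040.0)   # 40 nm displacement
label_b = classify_event(4200.0, 5000.0)   # 800 nm displacement
```

The same helper can be applied to failed searches by passing the operator position as `x_bound_nm`, which matches how the displacement is defined for those events in the text.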
[0134] T7 RNAP experiments. The gene for T7 RNAP was fused to a C-terminal AviTag (GGGLNDIFEAQKIEWHE) and cloned into the vector pTBX3 (NEB). The vector was transformed into BL21 cells, which were then grown in LB (1 L) containing carbenicillin, and induced at an OD600 of ~0.8 with 0.8 mM IPTG. After 4 hours of induction, cells were harvested by centrifugation and resuspended in buffer containing 20 mM Tris-HCl, pH 8.0, 1 M NaCl, 1 mM EDTA, plus a Halt protease inhibitor cocktail (Pierce). The cell paste was then flash frozen on liquid nitrogen and stored at -80°C until lysis. For lysis, the cells were thawed at room temperature, lysed by sonication, and the lysate was clarified by centrifugation. The clarified lysate was loaded onto a 10-ml Chitin bead column (NEB), and washed extensively with buffer containing 20 mM Tris-HCl, pH 8.0, 1 M NaCl, 1 mM EDTA, following the manufacturer's protocol. The column bed was then quickly flushed with 20 mM Tris-HCl, pH 8.0, 1 M NaCl, 1 mM EDTA, plus 50 mM DTT, and incubated at 4°C for ~20 hours. The protein was then eluted and dialyzed into T7 RNAP storage buffer (50 mM Tris-HCl [pH 7.9], 100 mM NaCl, 20 mM β-mercaptoethanol, 1 mM EDTA, 50% glycerol, 0.1% Triton X-100) at 4°C overnight. Protein activity was tested by in vitro run-off transcription assays. Single-molecule experiments using double-tethered DNA curtains were conducted exactly as described for E. coli RNAP, under the indicated buffer conditions (Figure 10).
[0135] Mechanisms of Diffusion-Controlled Reactions. To address the rate at which diffusion-controlled processes occur, we begin with the descriptions of colloidal aggregation developed by Smoluchowski (A25), where the rate, ksmo, of the reaction A + B → AB can be shown to be proportional to the sum of the Stokes-Einstein diffusion coefficients, Di, of the two reactants.
Here, p is the reaction radius, which will be defined below. This relation is commonly cited as the upper limit on the speed at which a diffusion-controlled reaction may occur (A26-27). However, early measurements of the rate at which the lac repressor associates to its operon sequence exceeded this limit by two orders of magnitude (A28). This paradox of faster-than-diffusion association was resolved for site-specific association of proteins by including mechanisms of lower dimensionality: hopping, sliding, and intersegmental transfer, in what has been termed the facilitated diffusion model (Figure 6) (A26). With facilitated diffusion, the association rate has the form of ksmo multiplied by a factor (1 + κ), where κ depends on the dissociation rate of the protein from non-specific DNA, the diffusion coefficient of the protein on the surface of DNA, and the protein concentration.
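For orientation, the magnitude of the diffusion limit can be estimated numerically. The sketch below uses the textbook Smoluchowski form, 4π(DA + DB) times the reaction radius, which is assumed here because the equation itself is not reproduced in the text; the function name and the example values for the diffusion coefficient and radius are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def smoluchowski_rate(d_a, d_b, radius_m):
    """Diffusion-limited rate constant in M^-1 s^-1 (textbook form, assumed).

    d_a, d_b: diffusion coefficients of the two reactants in m^2/s.
    radius_m: reaction radius (p in the text) in metres.
    """
    k_m3_per_s = 4.0 * math.pi * (d_a + d_b) * radius_m   # per molecule pair
    return k_m3_per_s * AVOGADRO * 1000.0                 # m^3 -> litres, per mole


# Hypothetical example: a protein with D = 1e-10 m^2/s meeting an immobile
# target with a 1 nm reaction radius.
k_limit = smoluchowski_rate(1.0e-10, 0.0, 1.0e-9)
```

With these made-up inputs the limit comes out near 10^8 to 10^9 M^-1 s^-1, which is the scale against which the faster-than-diffusion lac repressor measurements are compared.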
[0136] In the simplest terms, a facilitated association process of a protein to its cognate sequence consists of three states: (i) a free state, (ii) a non-specifically bound state, wherein the protein is bound to non-target DNA, and (iii) a specifically bound state, where the protein has located and bound target DNA. In general, the search process that a protein undergoes consists of cycling through the non-specifically bound and free states until eventually locating the target. When the concentration of available nonspecific states outnumbers specific states, this process is slow. The "facilitation" occurs due to two factors. First, the affinity of proteins for non-target stretches of DNA localizes the protein to the DNA for extended periods of time, allowing for many successive rebinding events. Second, when the protein is able to translocate along the DNA during its time in the bound state, it may interrogate multiple sites during a single binding event.
[0137] Reaction Radius & Target Size. The theoretical framework for facilitated diffusion separates the bound and free states at a distance p, which is termed the reaction radius. The motion of the protein beyond this distance is expected to be free thermal diffusion in three dimensions, while within p the protein's motion is constrained to only allow movement along the dimension of the DNA (Figure 3a-d). This motion is expected to be Brownian as well; however, the diffusion coefficient must now include the average effect of the potential along the DNA as well as the viscous forces from solution. The reaction radius is then dependent on the size of the protein and the DNA, as well as the ionic strength of the solution, as it describes the point at which one can disregard the gradient of the radial portion of the electrostatic potential of DNA. Here, p is chosen to be the sum of the radii of the searching protein and the DNA plus the Debye screening length, r, under our reaction conditions.
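The definition of the reaction radius as a sum of three lengths can be illustrated with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, the Debye length uses the common rule of thumb for a monovalent salt in water at 25 °C (about 0.304 nm divided by the square root of the ionic strength in mol/L), and the protein radius, DNA radius, and ionic strength in the example are made-up inputs rather than values taken from the text.

```python
import math


def debye_length_nm(ionic_strength_molar):
    """Approximate Debye screening length in water at 25 C (rule of thumb, assumed)."""
    return 0.304 / math.sqrt(ionic_strength_molar)


def reaction_radius_nm(protein_radius_nm, dna_radius_nm, ionic_strength_molar):
    """Sum of the protein radius, the DNA radius, and the Debye length."""
    return (protein_radius_nm + dna_radius_nm
            + debye_length_nm(ionic_strength_molar))


# Hypothetical inputs: 10 nm protein-QD radius, 1 nm DNA radius, 25 mM ionic strength.
p_nm = reaction_radius_nm(protein_radius_nm=10.0, dna_radius_nm=1.0,
                          ionic_strength_molar=0.025)
```

Lowering the ionic strength lengthens the screening length and therefore enlarges the reaction radius, which is the dependence on solution conditions that the paragraph describes.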
[0138] Investigations of the sequence-specific binding of proteins commonly consist of footprinting and mutation methods, revealing both the extent of the protein-DNA interface and minimal consensus sequences. However, a fundamental question remains: over what range can the protein be out of register with the target sequence and still recognize it (Figure 3b)? If we move the protein 1 bp to the left or right of a perfectly centered target, is the protein heavily biased toward registered binding or does the protein act as if it has been placed on a random sequence for which it has no preference? If the answer is the former, then this raises the question of exactly how far out of register it is necessary to move the protein until the latter is true. This is the concept of the linear target size, which we will term a. If the position z0 defines when the protein is perfectly in register with the target sequence, then whenever the protein's position, z, satisfies |z − z0| ≤ a, it recognizes the target sequence and rapidly moves to z0.
[0139] Above we considered the effect of lateral displacement on target recognition; we must also consider the effect of the protein's angular orientation with respect to the target (Figure 3a). The importance of orientation arises due to the fact that the entire surface of the protein does not carry out the function of sequence recognition. For example, consider a protein as a Janus particle, where half of the surface recognizes DNA sequences, and the other half does not. If we consider the search process to consist of only proteins colliding with the DNA from solution, it should be clear that half of the particles which collide with the target sequence will recognize it, as the other half would have encountered the DNA in an unproductive orientation. While it is difficult to consider the motion of a protein about its own axis in calculation, the effect of orientation can easily be accounted for by altering the target size. For the example above, an effective target size equal to half of the usual size, a/2, would account for the non-binding surface, while allowing every encounter to be productive. In the calculations that follow, we will treat the protein-quantum dot complex as a sphere. The portion of this sphere corresponding to the reactive portion of the protein is then defined by the half-angle, θ, subtended by the area of an equivalent circle of the sphere. Then, the above condition for recognition becomes the same inequality with the effective target size in place of a.
[0140] While the mapping of the orientation of the protein onto the target size accounts for the probability of successful collisions between the protein and target DNA when the protein originates in bulk solution, it underestimates the probability of locating the target via sliding mechanisms. If the one-dimensional diffusion coefficient in the reduced system is known, it is then straightforward to recover the usual one-dimensional diffusion coefficient.
[0141] Association Rate to the DNA from Solution. We consider the DNA to be initially void of bound protein and immersed in an isotropic distribution of RNAP molecules at concentration C0. Then, the initial (first encounter) association rate of proteins to the DNA is identical to the flux of proteins across an absorbing cylinder of radius p and length L, where L = 48,502 bp. This flux can be found from the solution to the radial diffusion equation, subject to the following boundary conditions:
C(r, 0) = C0 for r > p, with an absorbing boundary at r = p
The Laplace transformed solution satisfying these boundary conditions is given by:
Figure imgf000060_0001
where the Laplace transform is defined as f̃(s) = ∫ f(t) exp(−st) dt, integrated over t from 0 to ∞, and K0 is the modified Bessel function of the second kind. The solution to the above in the time domain can be written as (A29):
Figure imgf000060_0002
To determine the rate of association per unit length, we find the flux of proteins across the boundary at p, and then integrate this flux over the entire surface of the DNA. From the above we also find the corresponding rate (expression not legible in the extracted text). In general, we will make use of the asymptotic solutions by considering the small argument (long time) and the large argument (short time) expansions of K0(z). Note, here we have scaled time, which leads to (A29):
Figure imgf000060_0003
(Eq. 6)
[0142] Effect of Protein Concentration on Association Rates & Calculation of Effective Target Size. The hallmark of facilitated diffusion is that the overall association rate can be greatly accelerated by the mechanisms of sliding and hopping we have described. Notably, the magnitude of this effect is proportional to the concentration of reactants. That is, the acceleration, which may be present at lower protein concentrations, vanishes as the concentration increases. To see this, consider the flux to the operator to be comprised of three terms: the first, the association rate from solution, is described above; the second is the hopping rate into the promoter; and the third is the sliding rate into the promoter. Notably, while the first term is proportional to the initial concentration, the hopping and sliding terms are also both proportional to the concentration but they are additionally scaled by the nonspecific lifetime, and in the case of the sliding term, by the 1D diffusion coefficient. Conceptually, the domination of the first term over the association rate is then easily inferred from the limiting cases of the non-specific lifetime and 1D diffusion coefficient. For example, when the dissociation rate is zero (i.e. the DNA is infinitely sticky), hopping cannot exist. Then the concentration at which hopping is effectively eliminated from the association rate corresponds to establishing a condition (not legible in the extracted text) that involves n, the average number of hops to the target. To estimate the limiting rate of association, recall that the average number of initial binding events per unit length up to a time t is given by:
Figure imgf000061_0001
[0143] Now, if there are N promoter sites, each of an effective length on the DNA, then the probability of randomly choosing one of these sites is the total effective length of the N sites divided by the length of the DNA. That is to say, on average, it takes the reciprocal of this probability in random collision events until a promoter is found. We then ask for the time t at which this number of collisions is reached. With time again scaled as above, for each value of the scaled time there is a value such that the following equality is true.
Figure imgf000061_0002
When the concentration reaches a value necessary for the first flux term to be the predominant contribution to the association rate, the above calculation yields the effective target size. Furthermore, at any concentration higher than this value, the above continues to give the same result for the effective target size. However, at lower concentrations, this calculation will overestimate the effective target size, due to the combined influence of hopping and sliding. Traditionally, this would be referred to as the "antenna" effect (A8, A27).
[0144] Comparison to Previous Single-molecule Promoter Search Studies. The work of Kabata et al. is often cited as evidence for long-distance 1D diffusion of E. coli RNA polymerase along DNA (A11). However, inspection of the data presented in Kabata et al. shows that the reported single-molecule trajectories were not diffraction-limited fluorescent spots, as would be expected for single molecules of RNAP. We surmised that these data might have reflected the behavior of large aggregates of RNA polymerase. An RNAP aggregate would have numerous DNA binding sites, and the collective effect of these binding sites could cause an aggregate to appear to slide on DNA. To test this hypothesis, we saturated large (1.0-μm dia.) streptavidin-coated beads (Chemicell GmbH, Cat. No. 2205-1) with biotinylated RNAP, and asked whether these artificial mimics of an RNAP aggregate could slide on DNA. As shown in Figure 11, the RNAP-coated 1.0-μm beads were observed moving along the DNA by 1D diffusion and could also be pushed in 1D along the DNA when flow was applied, confirming that an aggregate of RNA polymerase might display apparent sliding behavior. Alternatively, Kabata et al. could not define the number of DNA molecules that gave rise to each observed sliding event, and the DNA belts they were using were 2-3 μm thick (10-300 mg ml⁻¹), so the apparent sliding they observed may have arisen as the cumulative outcome of multiple nonspecific binding events involving numerous DNA molecules.
[0145] The work of Guthold et al. also presented evidence for 1D diffusion of RNA polymerase along nonspecific DNA. In this study, the authors used AFM to image RNAP bound to nonspecific DNA adsorbed onto a mica surface, and reported a 1D diffusion coefficient of 1.1 × 10¹ nm² s⁻¹ and a nonspecific lifetime of 600 seconds (A12). The authors of this study concluded that the extraordinarily long 600-second lifetime of RNA polymerase bound to nonspecific sites was likely a consequence of both the DNA and the protein being adsorbed to the mica surface, as was necessary for the AFM measurements. In this scenario, the exceedingly small diffusion constant that Guthold et al. report for nonspecifically bound RNA polymerase is in full agreement with our data, and we infer that they detected 1D diffusion because of the extraordinarily long lifetime of the nonspecifically bound complexes adsorbed to the mica surface.
[0146] Harada et al. studied DNA binding using Cy3-tagged E. coli RNAP and λ-phage DNA held suspended above a surface by a dual optical trap, and concluded that 1D diffusion may contribute to the promoter search (A10). In this study, the authors reported different lifetimes for RNAP bound to either the AT-rich or GC-rich halves of the λ-DNA, which correspond to the side of λ containing all of the promoters and the side lacking promoters, respectively. For the AT-rich half, they reported lifetimes of 330 msec and 1.5 sec, and for the GC-rich half they reported a lifetime of 120 msec. However, they did not detect a population of proteins consistent with open complexes, and they suggested that the inability to detect open complexes was due to the relatively high 5 pN of tension on the DNA. They also demonstrated a drastic increase in binding at lower DNA tensions; it therefore seems plausible that the application of 5 pN of tension may have also altered the lifetimes of the other binding intermediates, thus yielding different values than reported in our study. Moreover, the lifetime of 120 msec reported by Harada et al. for the nonspecifically bound intermediate was 4-fold higher than the 30-msec upper bound we have placed on this lifetime. Harada et al. also reported that 10 out of 381 RNAP molecules (2.6%) underwent 1D diffusive motion detectable above instrument resolution (0.2 μm). Notably, the experiments of Harada et al. were conducted in 50% sucrose (A10), and the high viscosity of this buffer [viscosity value illegible in source] may have artificially prolonged the lifetime of the nonspecifically bound intermediates, either by reducing the 3D diffusion coefficient of RNAP and restricting its ability to diffuse away from the DNA upon dissociation, or through the increased osmotic stress, which is commonly reflected as an increase in the affinity for nonspecific DNA relative to specific DNA sites (A30-35); either effect would have led to an overestimate of 1D sliding. More importantly, our results show that even if the protein were able to slide, 3D diffusion will still dominate the promoter search mechanism at physiologically relevant protein concentration regimes.
[0147] Influence of DNA Tension. DNA wrapping or bending by RNA polymerase should be antagonized under tension, and as such might be expected to perturb open complex formation. However, this effect should occur at much higher DNA tensions than are used in our study. The DNA in our experiments is typically stretched to ~75% of its contour length, and the tension on the DNA can be estimated based upon relative mean extension using the Worm-Like Chain model:
Figure imgf000064_0001
where l_p is the DNA persistence length (53 nm), k_BT is thermal energy, ⟨x⟩ is the mean observed extension, L is the full contour length of the DNA (16.49 μm for λ-DNA), and F is the calculated tension force (A36). Based on this calculation, the DNA in our experiments experiences a tension force of ~0.35 pN. The total free energy change required for site-specific binding under tension for a protein that bends DNA can then be estimated as:
Figure imgf000064_0002
where θ is the bend angle of the DNA molecule induced by the bound protein, K is the DNA stretch modulus (~1,200 pN), and the length of the bound site is taken as the number of base pairs in the site times the length of an unperturbed base pair (20 bp × 0.34 nm/bp) (A37). This equation predicts a simple, linear relationship between the energetic cost of bending the DNA and the applied tension: for a protein with a bend angle of 60° and a binding site of 20 bp, on DNA under 0.35 pN of tension, the energetic cost is [value illegible in source] k_BT. Based on these rough calculations, the low tension experienced by the stretched DNA in our experiments should have little or no impact on promoter binding by RNAP, and much higher tension forces than those that are accessible by simple flow-stretched DNA experiments would be required to substantially perturb the binding of proteins that bend the DNA. Note that this value changes from ~0.004 k_BT to ~0.4 k_BT for binding site sizes ranging from 1 bp to 100 bp, and bend angles of 45° and 90° (for a 20-bp binding site) yield values of [values illegible in source], respectively; so our conclusion that DNA bending will not be impacted at low tension holds true for a range of site sizes and bend angles. The conclusion that the low DNA tensions experienced in our curtain assays should not drastically impact binding is reflected by our data in that RNAP recognizes and binds to the promoters on the extended DNA substrates, the lifetimes we obtained for open complex formation closely match literature values, and RNAP moves along the DNA when provided with all four rNTPs. In addition, RNA polymerase can transcribe against applied forces of up to ~14-25 pN (A38-39), which again suggests that the relatively low tension used in our assays should have little or no impact upon the protein's ability to bind promoters.
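The tension estimate in paragraph [0147] can be checked numerically. The sketch below assumes that the Worm-Like Chain expression shown only as an image is the standard Marko-Siggia interpolation formula, F = (k_BT / l_p) [1 / (4 (1 - x/L)^2) - 1/4 + x/L], and that the temperature is 298 K; neither assumption is stated in the text, and the function name is illustrative.

```python
# Sketch: WLC tension from relative extension (Marko-Siggia interpolation).
# Assumptions (not from the source): the interpolation form below and T = 298 K.

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of the Boltzmann constant


def wlc_force_pN(relative_extension, persistence_length_nm=53.0, temperature_K=298.0):
    """Return the tension (pN) needed to hold a WLC at x/L = relative_extension."""
    if not 0.0 <= relative_extension < 1.0:
        raise ValueError("relative extension must satisfy 0 <= x/L < 1")
    # 1 J = 1e21 pN*nm, so k_BT comes out in pN*nm.
    kbt_pN_nm = BOLTZMANN_J_PER_K * temperature_K * 1e21
    x = relative_extension
    return (kbt_pN_nm / persistence_length_nm) * (0.25 / (1.0 - x) ** 2 - 0.25 + x)


if __name__ == "__main__":
    # DNA stretched to ~75% of its contour length, persistence length 53 nm.
    print(f"{wlc_force_pN(0.75):.2f} pN")
```

With these inputs the result is close to the ~0.35 pN quoted in the text, which suggests the assumed form matches the one used. The steep 1/(1 - x/L)^2 term is why reaching much higher extensions requires disproportionately more force.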
References
1. von Hippel P, Berg O. Facilitated target location in biological systems. J Biol Chem. 1989;264:675-8.
2. Gorman J, Greene EC. Visualizing one-dimensional diffusion of proteins along DNA. Nat Struct Mol Biol. 2008;15:768-74.
3. Herbert K, Greenleaf W, Block S. Single-molecule studies of RNA polymerase: Motoring along. Ann. Rev. Biochem. 2008;77:149-176.
4. Haugen S, Ross W, Gourse R. Advances in bacterial promoter recognition and its control by factors that do not bind DNA. Nat Rev Microbiol. 2008;6:507-19.
5. Browning D, Busby S. The regulation of bacterial transcription initiation. Nat Rev Microbiol. 2004;2:57-65.
6. Saecker R, Record M, Dehaseth P. Mechanism of bacterial transcription initiation: RNA polymerase - promoter binding, isomerization to initiation-competent open complexes, and initiation of RNA synthesis. J Mol Biol. 2011;412:754-71.
7. Nudler E. RNA polymerase active center: the molecular engine of transcription. Annu Rev Biochem. 2009;78:335-61.
8. Mendoza-Vargas A, et al. Genome-wide identification of transcription start sites, promoters and transcription factor binding sites in E. coli. PLoS One. 2009;4:e7526.
9. Cho B, et al. The transcription unit architecture of the Escherichia coli genome. Nat Biotechnol. 2009;27:1043-9.
10. Berg OG, Blomberg C. Association kinetics with coupled diffusional flows. Special application to the lac repressor-operator system. Biophys Chem. 1976;4:367-81.
11. von Hippel PH, Berg OG. Facilitated target location in biological systems. J Biol Chem. 1989;264:675-8.
12. Mirny L, et al. How a protein searches for its site on DNA: the mechanism of facilitated diffusion. Journal of Physics A: Mathematical and Theoretical. 2009;42.
13. Halford S, Marko J. How do site-specific DNA-binding proteins find their targets? Nucleic Acids Res. 2004;32:3040-52.
14. Berg O, Winter R, von Hippel P. Diffusion-driven mechanisms of protein translocation on nucleic acids. 1. Models and theory. Biochemistry. 1981;20:6929-48.
15. Berg O, Blomberg C. Association kinetics with coupled diffusional flows. Special application to the lac repressor-operator system. Biophys Chem. 1976;4:367-81.
16. Riggs AD, Bourgeois S, Cohn M. The lac repressor-operator interaction. 3. Kinetic studies. J Mol Biol. 1970;53:401-17.
17. Halford S. An end to 40 years of mistakes in DNA-protein association kinetics? Biochem Soc Trans. 2009;37:343-8.
18. Mirny L, et al. How a protein searches for its site on DNA: the mechanism of facilitated diffusion. J Physics A. 2009;42:434013.
19. Li G-W, Berg O, Elf J. Effects of macromolecular crowding and DNA looping on gene regulation kinetics. Nature Physics. 2009;5:294-297.
20. Hu T, Grosberg A, Shklovskii B. How proteins search for their specific sites on DNA: the role of DNA conformation. Biophysical Journal. 2006;90:2731-2744.
21. Bauer M, Metzler R. Generalized facilitated diffusion model for DNA-binding proteins with search and recognition states. Biophysical Journal. 2012;102:2321-2330.
22. Kolesov G, Wunderlich Z, Laikova O, Gelfand M, Mirny L. How gene order is influenced by the biophysics of transcription regulation. Proc Natl Acad Sci U S A. 2007;104:13948-13953.
23. Wunderlich Z, Mirny L. Spatial effects on the speed and reliability of protein-DNA search. Nucleic Acids Res. 2008;36:3570-8.
24. Das R, Kolomeisky A. Facilitated search of proteins on DNA: correlations are important. Phys Chem Chem Phys. 2010;12:2999-3004.
25. Singer P, Wu CW. Promoter search by Escherichia coli RNA polymerase on a circular DNA template. J Biol Chem. 1987;262:14178-89.
26. Ricchetti M, Metzger W, Heumann H. One-dimensional diffusion of Escherichia coli DNA-dependent RNA polymerase: a mechanism to facilitate promoter location. Proc Natl Acad Sci U S A. 1988;85:4610-4.
27. Kabata H, et al. Visualization of single molecules of RNA polymerase sliding along DNA. Science. 1993;262:1561-3.
28. Guthold M, et al. Direct observation of one-dimensional diffusion and transcription by Escherichia coli RNA polymerase. Biophys J. 1999;77:2284-94.
29. Harada Y, et al. Single-molecule imaging of RNA polymerase-DNA interactions in real time. Biophys J. 1999;76:709-15.
30. Berg J, Tymoczko J, Stryer L. Biochemistry. W.H. Freeman and Company; New York: 2007.
31. Roe JH, Burgess RR, Record MT Jr. Kinetics and mechanism of the interaction of Escherichia coli RNA polymerase with the lambda PR promoter. J Mol Biol. 1984;176:495-522.
32. Friedman L, Gelles J. Mechanism of transcription initiation at an activator-dependent promoter defined by single-molecule observation. Cell. 2012;148:679-689.
33. DeHaseth PL, Zupancic M, Record MT Jr. RNA polymerase-promoter interactions: the comings and goings of RNA polymerase. J Bacteriol. 1998;180:3019-3025.
34. Herbert K, Greenleaf W, Block S. Single-molecule studies of RNA polymerase: motoring along. Annu Rev Biochem. 2008;77:149-76.
35. Gorman J, Fazio T, Wang F, Wind S, Greene EC. Nanofabricated racks of aligned and anchored DNA substrates for single-molecule imaging. Langmuir. 2010;26:1372-9.
36. Gorman J, Plys A, Visnapuu M, Alani E, Greene EC. Visualizing one-dimensional diffusion of eukaryotic DNA repair factors along a chromatin lattice. Nat Struct Mol Biol. 2010;17:932-8.
37. Gorman J, et al. Single-molecule imaging reveals target search mechanisms during mismatch repair. Proc Natl Acad Sci U S A. 2012;109:E3074-E3083.
38. Finkelstein I, Visnapuu M, Greene EC. Single-molecule imaging reveals mechanisms of protein disruption by a DNA translocase. Nature. 2010;468:983-7.
39. Simons R, Hoopes B, McClure W, Kleckner N. Three promoters near the termini of IS10: pIN, pOUT, and pIII. Cell. 1983;34:673-82.
40. McClure W. Rate-limiting steps in RNA chain initiation. Proc Natl Acad Sci U S A. 1980;77:5634-8.
41. Hawley D, McClure W. In vitro comparison of initiation properties of bacteriophage lambda wild-type PR and x3 mutant promoters. Proc Natl Acad Sci U S A. 1980;77:6381-5.
42. Dayton C, Prosen D, Parker K, Cech C. Kinetic measurements of Escherichia coli RNA polymerase association with bacteriophage T7 early promoters. J Biol Chem. 1984;259:1616-21.
43. Brunner M, Bujard H. Promoter recognition and promoter strength in the Escherichia coli system. EMBO J. 1987;6:3139-44.
44. Wang Y, Austin R, Cox E. Single molecule measurements of repressor protein 1D diffusion on DNA. Phys Rev Lett. 2006;97:048302.
45. Elf J, Li G, Xie X. Probing transcription factor dynamics at the single-molecule level in a living cell. Science. 2007;316:1191-4.
46. Kim JH, Larson RG. Single-molecule analysis of 1D diffusion and transcription elongation of T7 RNA polymerase along individual stretched DNA molecules. Nucleic Acids Res. 2007;35:3848-58.
47. Tafvizi A, Huang F, Fersht A, Mirny L, van Oijen A. A single-molecule characterization of p53 search on DNA. Proc Natl Acad Sci U S A. 2011;108:563-8.
48. Berg OG. Orientation constraints in diffusion-limited macromolecular association. The role of surface diffusion as a rate-enhancing mechanism. Biophys J. 1985;47:1-14.
49. Austin R, Karohl J, Jovin T. Rotational diffusion of Escherichia coli RNA polymerase free and bound to deoxyribonucleic acid in nonspecific complexes. Biochemistry. 1983;22:3082-90.
50. Gorman J, et al. Dynamic basis for one-dimensional DNA scanning by the mismatch repair complex Msh2-Msh6. Mol Cell. 2007;28:359-70.
51. Berg OG, von Hippel PH. Diffusion-controlled macromolecular interactions. Annu Rev Biophys Biophys Chem. 1985;14:131-60.
52. Moran U, Phillips R, Milo R. Snapshot: key numbers in biology. Cell. 2010;141:1262.
53. Minton A. The influence of macromolecular crowding and macromolecular confinement on biochemical reactions in physiological media. J Biol Chem. 2001;276:10577-10580.
54. Graham J, Johnson R, Marko J. Concentration-dependent exchange accelerates turnover of proteins bound to double-stranded DNA. Nucleic Acids Res. 2011;39:2249-2259.
55. Ishihama A. Functional modulation of Escherichia coli RNA polymerase. Annu Rev Microbiol. 2000;54:499-518.
56. Hammar P, et al. The lac repressor displays facilitated diffusion in living cells. Science. 2012;336:1595-1598.
57. McClure W. Mechanism and control of transcription initiation in prokaryotes. Annu Rev Biochem. 1985;54:171-204.
58. So L, et al. General properties of transcriptional time series in Escherichia coli. Nat Genet. 43:554-60.
59. Reppas N, Wade J, Church G, Struhl K. The transition between transcriptional initiation and elongation in E. coli is highly variable and often rate limiting. Mol Cell. 2006;24:747-
60. Shaevitz J, Abbondanzieri E, Landick R, Block S. Backtracking by single RNA polymerase molecules observed at near-base-pair resolution. Nature. 2003;426:684-7.
Supplementary References
A1. Wang, Y., Austin, R. & Cox, E. Single molecule measurements of repressor protein 1D diffusion on DNA. Phys Rev Lett 97, 048302 (2006).
A2. Elf, J., Li, G. & Xie, X. Probing transcription factor dynamics at the single-molecule level in a living cell. Science 316, 1191-4 (2007).
A3. Kim, J.H. & Larson, R.G. Single-molecule analysis of 1D diffusion and transcription elongation of T7 RNA polymerase along individual stretched DNA molecules. Nucleic Acids Res 35, 3848-58 (2007).
A4. Finkelstein, I., Visnapuu, M. & Greene, E. Single-molecule imaging reveals mechanisms of protein disruption by a DNA translocase. Nature 468, 983-7 (2010).
A5. Tafvizi, A., Huang, F., Fersht, A., Mirny, L. & van Oijen, A. A single-molecule characterization of p53 search on DNA. Proc Natl Acad Sci U S A 108, 563-8 (2011).
A6. Gorman, J., Plys, A., Visnapuu, M., Alani, E. & Greene, E. Visualizing one-dimensional diffusion of eukaryotic DNA repair factors along a chromatin lattice. Nat Struct Mol Biol 17, 932-8 (2010).
A7. Visnapuu, M.-L. & Greene, E. Single-molecule imaging of DNA curtains reveals intrinsic energy landscapes for nucleosome deposition. Nat Struct Mol Biol 16, 1056-1062 (2009).
A8. Ricchetti, M., Metzger, W. & Heumann, H. One-dimensional diffusion of Escherichia coli DNA-dependent RNA polymerase: a mechanism to facilitate promoter location. Proc Natl Acad Sci U S A 85, 4610-4 (1988).
A9. Singer, P. & Wu, C. Promoter search by Escherichia coli RNA polymerase on a circular DNA template. J Biol Chem 262, 14178-89 (1987).
A10. Harada, Y. et al. Single-molecule imaging of RNA polymerase-DNA interactions in real time. Biophys J 76, 709-15 (1999).
A11. Kabata, H. et al. Visualization of single molecules of RNA polymerase sliding along DNA. Science 262, 1561-3 (1993).
A12. Guthold, M. et al. Direct observation of one-dimensional diffusion and transcription by Escherichia coli RNA polymerase. Biophys J 77, 2284-94 (1999).
A13. Fong, R., Woody, S. & Gussin, G. Direct and indirect effects of mutations in lambda PRM on open complex formation at the divergent PR promoter. J Mol Biol 240, 119-26 (1994).
A14. Mita, B., Tang, Y. & deHaseth, P. Interference of PR-bound RNA polymerase with open complex formation at PRM is relieved by a 10-base pair deletion between the two promoters. J Biol Chem 270, 30428-33 (1995).
A15. Hoopes, B. & McClure, W. A cII-dependent promoter is located within the Q gene of bacteriophage lambda. Proc Natl Acad Sci U S A 82, 3134-8 (1985).
A16. Hershberger, P. & deHaseth, P. RNA polymerase bound to the PR promoter of bacteriophage lambda inhibits open complex formation at the divergently transcribed PRM promoter. Implications for an indirect mechanism of transcriptional activation by lambda repressor. J Mol Biol 222, 479-94 (1991).
A17. Singer, P. & Wu, C. Kinetics of promoter search by Escherichia coli RNA polymerase. Effects of monovalent and divalent cations and temperature. J Biol Chem 263, 4208-14 (1988).
A18. Tafvizi, A. et al. Tumor suppressor p53 slides on DNA with low friction and high stability. Biophys J 95, L01-3 (2008).
A19. Shaevitz, J., Abbondanzieri, E., Landick, R. & Block, S. Backtracking by single RNA polymerase molecules observed at near-base-pair resolution. Nature 426, 684-7 (2003).
A20. Roe, J.H., Burgess, R.R. & Record, M.T., Jr. Kinetics and mechanism of the interaction of Escherichia coli RNA polymerase with the lambda PR promoter. J Mol Biol 176, 495-522 (1984).
A21. Greene, E., Wind, S., Fazio, T., Gorman, J. & Visnapuu, M.-L. DNA curtains for high-throughput single-molecule optical imaging. Methods Enzymol 472, 293-315 (2010).
A22. Michalet, X. Mean square displacement analysis of single-particle trajectories with localization error: Brownian motion in an isotropic medium. Phys Rev E Stat Nonlin Soft Matter Phys 82, 041914 (2010).
A23. Qian, H., Sheetz, M. & Elson, E. Single particle tracking. Analysis of diffusion and flow in two-dimensional systems. Biophys J 60, 910-21 (1991).
A24. Sadler, J., Sasmor, H. & Betz, J. A perfectly symmetric lac operator binds the lac repressor very tightly. Proc Natl Acad Sci U S A 80, 6785-6789 (1983).
A25. Smoluchowski, M. Three lectures on diffusion, Brownian motion and the coagulation of colloids. Phys Z 17, 557-571, 585-599 (1916).
A26. von Hippel, P. & Berg, O. Facilitated target location in biological systems. J Biol Chem 264, 675-8 (1989).
A27. Mirny, L. et al. How a protein searches for its site on DNA: the mechanism of facilitated diffusion. J Physics A 42, 434013 (2009).
A28. Riggs, A., Bourgeois, S. & Cohn, M. The lac repressor-operator interaction. 3. Kinetic studies. J Mol Biol 53, 401-17 (1970).
A29. Carslaw, H. & Jaeger, J. Conduction of heat in solids (Oxford University Press, 1959).
A30. Lynch, T. & Sligar, S. Macromolecular hydration changes associated with BamHI binding and catalysis. J Biol Chem 275, 30561-5 (2000).
A31. Garner, M. & Rau, D. Water release associated with specific binding of gal repressor. EMBO J 14, 1257-63 (1995).
A32. Parsegian, V., Rand, R. & Rau, D. Macromolecules and water: probing with osmotic stress. Methods Enzymol 259, 43-94 (1995).
A33. Sidorova, N. & Rau, D. Differences in water release for the binding of EcoRI to specific and nonspecific DNA sequences. Proc Natl Acad Sci U S A 93, 12272-7 (1996).
A34. Robinson, C. & Sligar, S. Molecular recognition mediated by bound water. A mechanism for star activity of the restriction endonuclease EcoRI. J Mol Biol 234, 302-6 (1993).
A35. Robinson, C. & Sligar, S. Changes in solvation during DNA binding and cleavage are critical to altered specificity of the EcoRI endonuclease. Proc Natl Acad Sci U S A 95, 2186-91 (1998).
A36. Bustamante, C., Marko, J., Siggia, E. & Smith, S. Entropic elasticity of lambda-phage DNA. Science 265, 1599-1600 (1994).
A37. van den Broek, B., Noom, M. & Wuite, G. DNA-tension dependence of restriction enzyme activity reveals mechanochemical properties of the reaction pathway. Nucleic Acids Res 33, 2676-2684 (2005).
A38. Wang, M. et al. Force and velocity measured for single molecules of RNA polymerase. Science 282, 902-907 (1998).
A39. Yin, H. et al. Transcription against an applied force. Science 270, 1653-1657 (1995).
EXAMPLE 2 - Single-Stranded DNA Curtains for Real-Time Single-Molecule Visualization of Protein-Nucleic Acid Interactions
[0148] Single-molecule imaging of biological macromolecules has dramatically impacted our understanding of many types of biochemical reactions. To facilitate these studies, we have established new strategies for anchoring and organizing DNA molecules on the surfaces of microfluidic sample chambers that are otherwise coated with fluid lipid bilayers. This previous work was reliant upon the use of double-stranded DNA, precluding access to information on biological processes involving single-stranded nucleic acid substrates. Here, we present procedures for aligning and visualizing single-stranded DNA molecules along the leading edges of nanofabricated barriers to lipid diffusion, in both "single-tethered" and "double-tethered" experimental formats. This new single-molecule approach provides long-awaited access to critical biological reactions involving single-stranded DNA binding proteins.
[0149] Protein-nucleic acid interactions contribute to all aspects of gene expression, genome maintenance, and DNA replication, and defects in protein-nucleic acid interactions are often the underlying causes of genetic diseases and cancer. Single-molecule methodologies have begun providing remarkable new information regarding the molecular details of reactions involving proteins and either DNA or RNA. However, it is challenging to acquire statistically meaningful data from technically demanding experiments designed to look at individual biochemical reactions, and this problem is compounded for cases where the biological molecules under investigation are heterogeneous and/or the reaction trajectories contain transient intermediates. In addition, most single-molecule techniques require that the molecules under investigation be physically anchored to a solid support. Extensive controls are essential to verify that surface tethering does not interfere with biological function. To help overcome these problems, we have developed new experimental strategies for organizing thousands of individual DNA molecules into defined patterns on the surfaces of microfluidic sample chambers coated with lipid bilayers that mimic cell membranes. We refer to these methodologies as "DNA curtains", and they are assembled by tethering one end of a biotinylated DNA molecule to a lipid bilayer, which coats the surface of a microfluidic sample chamber.1-5 The bilayer provides an inert environment compatible with a range of biological macromolecules. DNA is tethered to the bilayer via a biotin-streptavidin linkage, permitting the DNA to diffuse in two dimensions. Hydrodynamic force is used to organize the DNA along nanofabricated barriers that disrupt the continuity of the bilayer. Lipids cannot traverse these barriers; therefore, the molecules align along the barriers and extend parallel to the sample chamber surface, allowing them to be visualized by total internal reflection fluorescence microscopy (TIRFM). The barriers are made by electron-beam lithography, and variations in barrier patterns allow precise control over the organization of the DNA. DNA curtains enable direct visualization of hundreds or even thousands of individual DNA molecules, along with any proteins bound to the DNA, by real-time fluorescence microscopy, and the molecules themselves are confined within a "bio-friendly" microenvironment that minimizes nonspecific interactions with the sample chamber surface.6-9
[0150] Single-stranded DNA (ssDNA) is a key intermediate in nearly all biochemical reactions related to the maintenance of genome integrity (e.g., DNA replication, homologous DNA recombination, nucleotide excision repair, mismatch repair), but the lack of methodologies for readily visualizing long ssDNA molecules has been noted in the literature as a crucial limitation of existing single-molecule technologies.10 Several challenges have prevented the use of ssDNA in single-molecule curtain experiments. Single-molecule experiments often rely upon intercalating dyes such as YOYO1 to view dsDNA, but YOYO1 causes extensive DNA nicking.11 This is not problematic for dsDNA, but even a single nick in the phosphate backbone will cause ssDNA to break away from its attachment to the surface. In addition, dsDNA is stiff and readily stretched by the application of buffer flow (~80% contour extension at ~1 pN of force).12 In contrast, ssDNA is much more flexible and also forms extensive secondary structure. As a consequence, ~50-60 pN of force is required to stretch ssDNA to ~80% of its full contour length.12 This higher force regime is inaccessible with the laminar flow typically used for single-molecule imaging.
[0151] Here, we generate ssDNA substrates using an in vitro rolling circle replication assay, and we align these long ssDNA molecules into DNA curtains along the leading edges of nanofabricated barriers to lipid diffusion. We then utilize a fluorescently tagged variant of replication protein A (RPA), which is a DNA-binding protein with high specificity for single-stranded DNA substrates,13 to both label the ssDNA and remove secondary structure. RPA-ssDNA filaments are stiffer than naked ssDNA, allowing the RPA-bound ssDNA to be stretched out by laminar flow and visualized by real-time optical microscopy. This approach will provide access to a wide range of problems related to protein-ssDNA interactions, in particular those related to the repair of damaged DNA.
Methods
[0152] φ29 DNA Polymerase. The gene encoding φ29 DNA polymerase was purchased from Genscript and subcloned into a modified pTXB3 vector containing an N-terminal hexahistidine tag (6xHis) upstream of a 3× Flag epitope tag. The protein was expressed in E. coli strain BL21 with overnight induction at 18 °C with 0.3 mM isopropyl-β-D-thiogalactopyranoside (IPTG). The cells were collected by centrifugation and resuspended in lysis buffer (25 mM Tris-HCl [pH 7.4], 500 mM NaCl, 5% glycerol, 5 mM imidazole), along with protease inhibitors (0.5 mM 4-(2-aminoethyl) benzenesulfonyl fluoride (AEBSF; Fisher), 10 mM E-64 (Sigma), 2 mM benzamidine), and then lysed by sonication. The lysate was clarified by centrifugation, and the supernatant was applied to Ni-NTA resin (Qiagen). The resin was washed with Ni-wash buffer (25 mM Tris-HCl, pH 7.4, 500 mM NaCl, 5% glycerol, 5 mM imidazole). The protein was eluted in 25 mL of Ni-elution buffer (25 mM Tris-HCl, pH 7.4, 500 mM NaCl, 5% glycerol, 300 mM imidazole) and applied directly to a chitin column (NEB). The chitin column was washed with chitin-wash buffer (25 mM Tris, pH 7.4, 500 mM NaCl, 0.1 mM ethylenediaminetetraacetic acid, EDTA), and the protein was eluted by incubating the resin in chitin-wash buffer containing 50 mM dithiothreitol (DTT) overnight at 4 °C. The eluate was dialyzed into storage buffer (10 mM Tris, pH 7.4, 100 mM KCl, 1 mM DTT, 0.1 mM EDTA, 50% glycerol) and stored at -80 °C. Protein concentration was determined using ε280 nm = 1.2 × 10⁵ M⁻¹ cm⁻¹ to yield a final concentration of 10 μM (~0.75 mg/mL).
[0153] scRPA-eGFP Expression and Purification. A plasmid encoding all three S. cerevisiae subunits of replication protein A (scRPA) was generously provided.13 An AvrII site was introduced into the 32 kDa subunit by site-directed mutagenesis. The gene for enhanced green fluorescent protein (eGFP) was cloned downstream of the 32 kDa subunit. ScRPA-eGFP was expressed in E. coli strain BL21 with an overnight induction at 18 °C with 0.3 mM IPTG. The cells were collected by centrifugation, resuspended in lysis buffer (50 mM NaKPO4, 250 mM NaCl, 10 mM imidazole [pH 7.9]), and lysed by sonication. The lysate was clarified by centrifugation and bound to Ni-resin (Qiagen) in batch for 30 min at 4 °C. The beads were washed with Ni-wash buffer (50 mM NaKPO4, 250 mM NaCl, 20 mM imidazole). The protein was eluted with 2 × 5 mL of Ni-elute buffer (50 mM NaKPO4, 250 mM NaCl, 250 mM imidazole) and dialyzed against 2 L of buffer (30 mM Hepes [pH 7.9], 1 mM DTT, 0.25 mM EDTA, 0.01% NP40, 80 mM NaCl). The protein was then purified by Hi-trap Q sepharose (GE Healthcare) with a gradient from 0 to 70% B (30 mM Hepes [pH 7.9], 1 mM DTT, 0.25 mM EDTA, 0.01% NP40; A, 80 mM NaCl; B, 1 M NaCl) over 150 mL. ScRPA-eGFP was dialyzed overnight against 1 L of buffer (30 mM Hepes [pH 7.9], 150 mM NaCl, 1 mM DTT, 0.01% NP40, 0.25 mM EDTA). The protein was then concentrated with polyethylene glycol (PEG; Thermofisher) and then dialyzed against storage buffer containing 50% glycerol. The protein was aliquoted, frozen in liquid N2, and stored at -80 °C. The final concentration was 8 μM (~1.1 mg/mL) as determined from the absorbance of the eGFP chromophore at 488 nm (ε488 nm = 55 000 cm⁻¹ M⁻¹).
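Both protein concentrations above follow from the Beer-Lambert law, A = ε·c·l. The sketch below shows the arithmetic under stated assumptions: a 1 cm path length and the absorbance readings used in the example are hypothetical (the text reports only the extinction coefficients and final concentrations), and the function names are illustrative.

```python
# Sketch: Beer-Lambert arithmetic behind the reported protein concentrations.
# Assumptions (not from the source): 1 cm path length; example absorbance readings.


def concentration_uM(absorbance, extinction_per_M_per_cm, path_cm=1.0):
    """Molar concentration (in micromolar) from A = epsilon * c * l."""
    if extinction_per_M_per_cm <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return absorbance / (extinction_per_M_per_cm * path_cm) * 1e6


def implied_mw_kDa(mass_conc_mg_per_mL, conc_uM):
    """Molecular weight implied by a paired mass and molar concentration.

    mg/mL equals g/L, so (g/L) / (mol/L) gives g/mol; dividing by 1e3 gives kDa.
    """
    return mass_conc_mg_per_mL / (conc_uM * 1e-6) / 1e3


if __name__ == "__main__":
    # Hypothetical readings chosen to reproduce the reported concentrations.
    print(concentration_uM(1.2, 1.2e5))    # phi29 polymerase at 280 nm, about 10 uM
    print(concentration_uM(0.44, 55_000))  # scRPA-eGFP at 488 nm, about 8 uM
    # Molecular weights implied by the reported mg/mL values.
    print(implied_mw_kDa(0.75, 10.0), implied_mw_kDa(1.1, 8.0))
```

The implied molecular weights are a quick consistency check on the reported pairs of values (μM and mg/mL); they are derived from the text's numbers, not independently known.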
[0154] Sgs1 Purification and Labeling. Sgs1 contains N-terminal Flag and C-terminal 3× HA tags and was expressed in Sf9 cells and purified over an anti-Flag column, as described.14 Sgs1 was labeled by incubating with anti-HA quantum dots (QDs) for 2 h on ice prior to imaging.
[0155] Single-Stranded DNA Substrates. Single-stranded M13mp18 (NEB) was annealed to a biotinylated primer (5'-BioTEG-dTTT TTT TTT TTT TTT TTT TTT TTT TTT TTT GTA AAA CGA CGG CCA GT). The annealed product was then passed through a size exclusion spin column (Centrispin 40; Princeton Separations) to remove excess primer. The final volume was 200 μL with an approximate concentration of 15 nM annealed M13mp18. Rolling circle replication reactions (100 μL) contained 50 mM Tris [pH 7.4], 2 mM DTT, 10 mM MgCl2, 10 mM ammonium sulfate, 0.15 nM primed M13mp18 DNA, and 200 μM deoxyribonucleoside triphosphates (dNTPs). Replication was initiated by addition of φ29 DNA polymerase to a final concentration of 100 nM and incubated for 30 min at 30 °C. Reactions were quenched by addition of EDTA to a final concentration of 75 mM.
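As a worked check of the reaction setup, the dilution relation C1·V1 = C2·V2 gives the stock volumes needed for a 100 μL rolling circle reaction. This is only a sketch: it assumes the ~15 nM annealed template and the 10 μM polymerase stock described in the Methods are used undiluted, and it ignores the small volume change from each addition; the function name is illustrative.

```python
# Sketch: stock volumes for a 100 uL rolling circle reaction via C1*V1 = C2*V2.
# Assumptions (not from the source): stocks are used undiluted; additions do not
# appreciably change the final volume.


def stock_volume_uL(stock_conc, final_conc, final_volume_uL):
    """Volume of stock to add; stock_conc and final_conc must share the same unit."""
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_conc * final_volume_uL / stock_conc


if __name__ == "__main__":
    # Primed M13mp18: ~15 nM stock to 0.15 nM final (a 100-fold dilution).
    print(stock_volume_uL(15.0, 0.15, 100.0))
    # phi29 DNA polymerase: 10 uM (10 000 nM) stock to 100 nM final.
    print(stock_volume_uL(10_000.0, 100.0, 100.0))
```

Both additions come to roughly 1 μL per 100 μL reaction under these assumptions.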
[0156] Electron-Beam Lithography. Barriers were fabricated by electron-beam lithography, as described. In brief, fused silica slides were cleaned in NanoStrip (CyanTek Corp) for 20 min, rinsed with acetone and isopropanol, and dried with N2. Slides were spin-coated with two layers of polymethylmethacrylate (PMMA; 25K and 495K; MicroChem), followed by a layer of Aquasave (Mitsubishi Rayon). Patterns were written with a FEI Sirion scanning electron microscope (J. C. Nabity, Inc.). Aquasave was removed with deionized water, and resist was developed using isopropanol/methyl isobutyl ketone (3:1) for 1 min with ultrasonic agitation at 5°C. The substrate was rinsed in isopropanol and dried with N2. Barriers were made with a 15-20 nm layer of chromium (Cr), and following liftoff, samples were rinsed with acetone and dried with N2.3
[0157] Flow cells. Flow cells and lipid bilayers were prepared as described. Briefly, lipid vesicles composed of DOPC (1,2-dioleoyl-sn-glycerophosphocholine), 0.5% biotinylated-DPPE (1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-(cap biotinyl)), and 8% mPEG 550-DOPE (1,2-dioleoyl-sn-glycero-3-phosphoethanolamine-N-[methoxy(polyethylene glycol)-550]) were diluted in buffer containing 10 mM Tris-HCl (pH 7.4) and 100 mM NaCl and incubated within the sample chamber for 30 min. The surface was further passivated with Buffer A [40 mM Tris-HCl (pH 7.4), 1 mM DTT, 1 mM MgCl2, 0.2 mg mL-1 BSA]. The DNA was coupled to the bilayer and aligned at the barriers. The flow cells were attached to a syringe pump system (KD Scientific) and flushed with Buffer A.
[0158] Microscopy. Experiments were performed with a custom-built prism-type total internal reflection fluorescence (TIRF) microscope equipped with a 200 mW diode-pumped solid-state laser (488 nm; Coherent), and the laser power at the face of the prism was ~5 mW, as described.8
Results and Discussion
[0159] φ29 DNA polymerase is highly processive and can generate ssDNA molecules (≥70 000 nucleotides (nt) in length)15,16 in rolling circle replication assays using a circular ssDNA template (M13mp18; 7249 nt) (Figure 18A,B). The ssDNA products harbor a single biotin at the 5' end, which can be linked to a lipid bilayer through a tetravalent streptavidin linkage (Figure 18C). Single-stranded DNA molecules cannot be stretched by the hydrodynamic forces accessible within our system (<1 pN), nor can they be labeled with fluorescent intercalating dyes. To overcome these issues, we chose scRPA-eGFP as an ssDNA labeling reagent based on several criteria. First, scRPA binds tightly to ssDNA (Ka ≈ 10^9 to 10^11 M-1),13 so ssDNA binding is expected to occur at low protein concentrations amenable to single-molecule imaging. Second, RPA eliminates secondary structure in ssDNA, protects ssDNA from damage, and increases the persistence length of ssDNA;13,17 these features should ensure that ssDNA bound by RPA could be readily stretched by buffer flow (Figure 18C,D). Third, scRPA retains biological function in vivo when labeled with eGFP on the C-terminus of the 32 kDa subunit,18 ensuring that the labeled protein would retain all relevant activities related to its biological functions.
[0160] To assemble single-tethered ssDNA curtains, the products of a rolling circle replication assay were anchored to the lipid bilayer, and scRPA-eGFP (0.2 nM) was then injected at a rate of 1.0 mL/min. Upon injection of the scRPA-eGFP, the ssDNA becomes visible and begins extending toward its full contour length (Figure 19A). When flow was paused, the ssDNA-scRPA-eGFP diffused away from the surface and out of the evanescent field, confirming that the molecules were not stuck to the bilayer (Figure 19B). Wide-field images revealed varying lengths of ssDNA, as expected, with molecules ranging from 1.8 to 212 μm, and an average length of ~20 μm (Figure 19C). Electron microscopy (EM) images of human RPA bound to ssDNA reveal that the protein-coated ssDNA had a contour length that was approximately 17% shorter than naked ssDNA, corresponding to a distance of ~0.40 nm between adjacent bases for RPA-bound ssDNA.17 Assuming S. cerevisiae and human RPA interact similarly with ssDNA and that the structure of RPA-ssDNA is similar in solution and on EM grids, then the substrates observed in our assays would be expected to range from 4500 to 530 000 nucleotides (nt) in length, with an average length of ~50 000 nt. Importantly, scRPA-eGFP remained bound to the ssDNA with little or no dissociation or exchange with free RPA in solution, even after observations over times ranging up to >60 min. The eGFP fluorophores do bleach over extended observation periods, but the ssDNA itself does not shorten, indicating that the photobleached scRPA-eGFP remained bound to the ssDNA and did not exchange with protein in solution (Figure 19D). In addition, scRPA-eGFP remained bound to the ssDNA when chased with buffers containing either 1 M NaCl or 3.5 M urea (Figure 19E), as expected on the basis of prior bulk biochemical experiments.
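The conversion from observed length to nucleotide count described above can be sketched in a few lines of Python. The spacing of 0.40 nm per nucleotide and the three lengths are taken from the text; the function name is hypothetical, and the sketch assumes, as the text does, that the molecules are extended to their full contour length.

```python
NM_PER_NT_RPA_SSDNA = 0.40  # nm per nucleotide for RPA-bound ssDNA, from the text


def extended_length_to_nt(length_um, nm_per_nt=NM_PER_NT_RPA_SSDNA):
    """Convert an extended length in micrometers to a length in nucleotides."""
    return length_um * 1000.0 / nm_per_nt


# Shortest, average, and longest molecules reported in the text.
for label, length_um in [("shortest", 1.8), ("average", 20.0), ("longest", 212.0)]:
    print(label, round(extended_length_to_nt(length_um)), "nt")
```

The three lengths map to 4500, 50 000, and 530 000 nt, matching the range and average stated above.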
[0161] Single-tethered DNA curtains require constant buffer flow through the sample chamber in order to visualize the DNA substrates. In contrast, double-tethered curtains can be visualized in the absence of flow, which is advantageous in experimental scenarios where reagents are limiting or when the application of buffer flow might perturb the biological reactions under investigation. To make double-tethered ssDNA curtains, we utilized nanofabricated patterns consisting of linear barriers for aligning the ssDNA and pentagon-shaped anchor points for tethering the downstream ends of the molecules (Figure 18D). The scRPA-eGFP-ssDNA adsorbed nonspecifically to the anchor points, allowing the molecules to be viewed even in the absence of buffer flow (Figure 20A). As a simple proof-of-principle, we next visualized the protein Sgs1 bound to the double-tethered ssDNA; Sgs1 is the S. cerevisiae RecQ helicase that participates in a number of reactions involving ssDNA.14,19 Sgs1 was tagged with a quantum dot (QD) and injected into a flow cell containing double-tethered ssDNA curtains labeled with scRPA-eGFP. Both the ssDNA and the bound Sgs1 were readily visible with two-color imaging (Figure 20B).
[0162] In summary, ssDNA is a key intermediate in nearly all reactions related to DNA metabolism and genome maintenance. However, the lack of approaches for studying long ssDNA molecules by real-time single-molecule imaging has greatly hindered progress on studies of a number of ssDNA binding proteins essential for DNA repair and metabolism. Here, we have presented a simple technique for preparing and visualizing ssDNA curtains bound by scRPA-eGFP. The remarkable stability of the scRPA-eGFP-ssDNA complex is of great benefit because it eliminates the need to maintain a pool of free RPA, which would contribute to background signal. Moreover, RPA is a ubiquitous protein involved in all biological reactions that have an ssDNA intermediate (e.g., homologous DNA recombination, nucleotide excision repair, post-replicative mismatch repair, DNA replication, etc.), so the experiments shown will permit in-depth biological studies involving a broader complement of proteins involved in the various reactions.1 Importantly, naked ssDNA is unlikely to exist in vivo because it becomes rapidly coated with RPA (or SSB in prokaryotes);13 therefore, development of methods for observing RPA-bound ssDNA provides a biologically relevant context for experimentally accessing a range of other proteins that act on ssDNA (such as the homologous recombination proteins Rad51, Srs2, Rad52, etc.).
References
(1) Fazio, T.; Visnapuu, M. L.; Wind, S.; Greene, E. C. Langmuir 2008, 24, 10524-10531.
(2) Gorman, J.; Fazio, T.; Wang, F.; Wind, S.; Greene, E. Langmuir 2010, 26, 1372-1379.
(3) Greene, E.; Wind, S.; Fazio, T.; Gorman, J.; Visnapuu, M. Methods Enzymol. 2010, 472, 293-315.
(4) Visnapuu, M.-L.; Fazio, T.; Wind, S.; Greene, E. C. Langmuir 2008, 24, 11293-11299.
(5) Graneli, A.; Yeykal, C.; Prasad, T.; Greene, E. Langmuir 2006, 22, 292-299.
(6) Visnapuu, M.-L.; Greene, E. Nat. Struct. Mol. Biol. 2009, 16, 1056-1062.
(7) Gorman, J.; Plys, A.; Visnapuu, M.; Alani, E.; Greene, E. Nat. Struct. Mol. Biol. 2010, 17, 932-938.
(8) Finkelstein, I.; Visnapuu, M.-L.; Greene, E. Nature 2010, 468, 983-987.
(9) Gorman, J.; et al. Mol. Cell 2007, 28, 359-370.
(10) Ha, T.; Kozlov, A.; Lohman, T. Annu. Rev. Biophys. 2012, 41, 295-319.
(11) Tycon, M.; Dial, C.; Faison, K.; Melvin, W.; Fecko, C. Anal. Biochem. 2012, 426, 13-21.
(12) Bustamante, C.; Bryant, Z.; Smith, S. Nature 2003, 421, 423-427.
(13) Wold, M. Annu. Rev. Biochem. 1997, 66, 61-92.
(14) Niu, H.; et al. Nature 2010, 467, 108-111.
(15) Blanco, L.; et al. J. Biol. Chem. 1989, 264, 8925-8940.
(16) Brockman, C.; Kim, S.; Latinwo, F.; Schroeder, C. Soft Matter 2011, 7, 8005-8012.
(17) Treuner, K.; Ramsperger, U.; Knippers, R. J. Mol. Biol. 1996, 259, 104-112.
(18) Lisby, M.; Barlow, J.; Burgess, R.; Rothstein, R. Cell 2004, 118, 699-713.
(19) Bernstein, K.; Gangloff, S.; Rothstein, R. Annu. Rev. Genet. 2010, 44, 393-417.
EXAMPLE 3 - Single-molecule imaging reveals target-search mechanisms during DNA mismatch repair
[0163] The ability of proteins to locate specific targets among a vast excess of nonspecific DNA is a fundamental theme in biology. Basic principles governing these search mechanisms remain poorly understood, and no study has provided direct visualization of single proteins searching for and engaging target sites. Here we use the postreplicative mismatch repair proteins MutSa and MutLa as model systems for understanding diffusion-based target searches. Using single-molecule microscopy, we directly visualize MutSa as it searches for DNA lesions, MutLa as it searches for lesion-bound MutSa, and the MutSa/MutLa complex as it scans the flanking DNA. We also show that MutLa undergoes intersite transfer between juxtaposed DNA segments while searching for lesion-bound MutSa, but this activity is suppressed upon association with MutSa, ensuring that MutSa/MutLa remains associated with the damage-bearing strand while scanning the flanking DNA. Our findings highlight a hierarchy of lesion- and ATP-dependent transitions involving both MutSa and MutLa, and help establish how different modes of diffusion can be used during recognition and repair of damaged DNA.
[0164] Postreplicative mismatch repair (MMR) corrects errors in DNA synthesis before they lead to genomic instability (B1-3). MMR increases the fidelity of DNA replication up to 1,000-fold, and MMR defects in humans cause hereditary nonpolyposis colorectal cancer and may influence the onset of other tumors (B1). MutSa and MutLa are conserved eukaryotic protein complexes necessary for MMR. MutSa is responsible for recognition of mismatches and small insertion/deletion loops (B1-3), whereas MutLa harbors an endonuclease activity necessary for cleavage of the lesion-bearing DNA strand (B4, B5).
[0165] The challenges faced during MMR can be illustrated by considering that Saccharomyces cerevisiae should incur only approximately two mismatches per cell cycle (B6). MutSa must find these rare lesions, MutLa must search for lesion-bound MutSa, and the lesion-bound MutSa/MutLa complex must search the flanking DNA for signals that distinguish the parental and daughter strands (B1-3). Models describing how DNA-binding proteins search for specific targets include 3D diffusion (i.e., jumping), 1D hopping, 1D sliding, and intersegmental transfer; the latter three are categorized as facilitated diffusion because they allow target association rates exceeding limits imposed by 3D diffusion (B7-10). New single-molecule and NMR techniques have led to resurgent interest in understanding how proteins locate targets (B11-13), and using single-molecule imaging we previously demonstrated that MutSa and MutLa can undergo facilitated diffusion on undamaged DNA through 1D sliding and 1D hopping, respectively (B14, B15). However, no single-molecule study has directly revealed proteins searching for and subsequently engaging a target site through 1D diffusion (i.e., 1D sliding or 1D hopping) (B7), and the inability to visualize target capture also prevents investigation of questions regarding downstream MMR events.
[0166] Here we used nanofabricated DNA curtains and total internal reflection fluorescence microscopy (TIRFM) to watch MutSa and MutLa as they interact with mismatch-containing substrates, and we asked how these proteins conduct their respective target searches throughout the early stages of MMR. We show that MutSa can be targeted to mismatched bases through either 1D sliding or 3D diffusion, that MutLa locates mismatch-bound MutSa through 1D hopping and 3D intersite transfer, and that mismatch-bound MutSa and MutSa/MutLa are released upon binding ATP and scan the flanking DNA for strand-discrimination signals by 1D diffusion. While searching for lesions, the movement of MutSa is consistent with a model wherein the protein rotates to maintain constant register with the helical contour of the DNA (B14). However, once released from a mismatch, MutSa is altered so that mismatches no longer are recognized as targets, and the protein slides much more rapidly, suggesting its motion no longer is coupled to rotation around the DNA. Finally, we demonstrate that the mismatch-bound MutSa/MutLa complex undergoes an ATP-dependent functional transition rendering it resistant to dissociation from damaged DNA. These data provide a detailed view of how diffusion can contribute to the early stages of MMR.
Results
[0167] Visualization of Mismatch Recognition by MutSa on DNA Curtains. We have used DNA curtains previously to investigate the behavior of MutSa and MutLa on undamaged DNA (B14, B15). Here we sought to determine how MutSa and MutLa behave on substrates with defined mismatches. For these experiments, we engineered a λ-DNA (47,467 bp) harboring three tandem G/T mismatches separated from one another by 38 bp (Fig. 28; three mismatches were used to enhance efficiency of the assay). To make single-tethered DNA curtains, the DNA was anchored to a lipid bilayer on the surface of a microfluidic sample chamber, and hydrodynamic force was used to push the DNA into nanofabricated barriers (Fig. 21A) (B16). The DNA was aligned along the barriers, enabling visualization of hundreds of molecules by TIRFM (Fig. 21B and 21C). At 150 mM NaCl and 1 mM ADP, MutSa showed preferential binding to the mismatches, as evidenced by the "lines" of QD-MutSa that spanned the DNA curtains at the mismatches (Fig. 21B) and as also was evident from histograms of the MutSa binding distributions (Fig. 21D). MutSa disappeared when flow was interrupted and reappeared when flow was resumed, verifying that the proteins were bound to the DNA and were not stuck to the surface of the sample chamber (Fig. 21B and 21C). MutSa exhibited a half-life of 9.6 ± 1.5 min while bound to the mismatches in the presence of 1 mM ADP (n = 60; Fig. 29).
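A reported half-life can be related to a dissociation rate constant and to the fraction of complexes expected to remain bound at a given time. The Python sketch below assumes simple single-exponential dissociation, which the text does not state explicitly; the 9.6 min half-life is the value reported above, and the 30 min time point and the function names are hypothetical.

```python
import math


def off_rate_from_half_life(t_half):
    """Rate constant k = ln(2) / t_half for single-exponential decay."""
    return math.log(2.0) / t_half


def fraction_bound(t, t_half):
    """Fraction of complexes still bound after time t (same units as t_half)."""
    return 0.5 ** (t / t_half)


T_HALF_MUTSA_ADP_MIN = 9.6  # min, half-life reported in the text

k_off = off_rate_from_half_life(T_HALF_MUTSA_ADP_MIN)
print(f"k_off = {k_off:.3f} per min")
# 30 min is a hypothetical observation time chosen for illustration.
print(f"fraction bound after 30 min = {fraction_bound(30.0, T_HALF_MUTSA_ADP_MIN):.2f}")
```

Under this assumption the half-life corresponds to a rate constant of roughly 0.07 per min.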
[0168] MutSa is Targeted to Mismatches Through a Combination of 1D Sliding and 3D Diffusion. Next, to determine how MutSa located the mismatches, we used double-tethered DNA curtains where the DNA was aligned and anchored by both ends, allowing the molecules to be viewed in the absence of buffer flow (Fig. 22A) (B17). MutSa was injected into the sample chamber, flow was terminated, and the proteins were observed in real time as they searched the DNA. At physiological ionic strength, MutSa located the mismatches either through 1D sliding (42.5% of observed events; n = 17/40) (Fig. 22B, Fig. 30), with sliding observed over distances up to 3.7 μm (~14.6 kbp), or through apparent 3D diffusion (57.5% of observed events; n = 23/40) (Fig. 22C). We defined target binding as MutSa being within three SDs of the target site for five consecutive frames; any submicroscopic 1D sliding events below this resolution were scored as apparent 3D diffusion. Therefore, the 42.5% of events attributed to 1D sliding represents the minimal fraction that can be described by this mechanism.
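The target-binding criterion stated above (within three SDs of the target site for five consecutive frames) can be expressed as a simple scan over a position trajectory. The following Python sketch is one possible reading of that criterion, not the analysis code used for the reported data. It assumes that "SD" refers to the standard deviation of the position measurement; the function name, the example trajectory, and the SD value are hypothetical.

```python
def first_binding_frame(positions, target, sd, n_sd=3.0, min_frames=5):
    """Return the index of the first frame of the first run of at least
    `min_frames` consecutive frames within `n_sd` * `sd` of `target`,
    or None if no such run exists."""
    run = 0
    for i, x in enumerate(positions):
        if abs(x - target) <= n_sd * sd:
            run += 1
            if run >= min_frames:
                return i - min_frames + 1
        else:
            run = 0  # the run is broken, start counting again
    return None


# Hypothetical trajectory (micrometers) approaching a target at 0.0.
trajectory = [1.0, 0.8, 0.5, 0.31, 0.05, 0.02, -0.03, 0.01, 0.04, 0.0]
print(first_binding_frame(trajectory, target=0.0, sd=0.03))
```

In this hypothetical trajectory the protein enters the three-SD window at frame index 4 and stays there, so the binding event is assigned to that frame.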
[0169] MutSa Scans DNA Flanking the Mismatch by 1D Diffusion. The mechanism by which MMR proteins search for strand-discrimination signals remains controversial (B1-3, B18). Three proposed models are (i) translocation, in which MutSa uses the free energy released by ATP hydrolysis to move along DNA (B19, B20); (ii) the molecular-switch model, in which ATP binding triggers a conformational change enabling MutSa to scan DNA by 1D diffusion (B21-23); and (iii) static transactivation, in which ATP binding allows stationary MutSa to search for distal strand-discrimination signals through DNA looping (Fig. 23A) (B24-26). Each model makes unique predictions as to how MutSa should behave in the DNA curtain assay: Translocation predicts that MutSa should undergo ATP hydrolysis-dependent unidirectional motion; the molecular-switch model predicts that MutSa should exhibit ATP-binding-dependent 1D diffusion; and static transactivation predicts that MutSa should remain at the mismatch while awaiting looping-mediated interactions with flanking DNA.
[0170] To distinguish among the models, we used double-tethered DNA curtains to investigate what happened when mismatch-bound MutSa was chased with ATP. When mismatch-bound MutSa was chased with ATP at physiological ionic strength, most proteins (85%; n = 60/71) were released from the mismatches after a brief delay (t1/2 = 14.6 s; n = 60), consistent with the 8.0 ± 2.7 s half-life reported for ATP-triggered release from G/T mismatches in biochemical studies (B23), and the remaining 15% (n = 11/71) remained stationary and did not respond to ATP. Of those that were released upon injection of ATP, 15% (n = 9/60) directly dissociated from the DNA with no evident sliding, whereas the remaining 85% (n = 51/60) were released from the mismatch and scanned the flanking DNA through 1D diffusion (Fig. 23B, Fig. 31). Analysis of the mean squared displacement revealed a mean 1D diffusion coefficient (D1D) of 0.057 ± 0.064 μm2 s-1 (n = 25) after ATP-triggered mismatch release. Experiments conducted at 50 mM NaCl revealed significantly less ATP-dependent release of MutSa from the lesions: 78% of the proteins remained stationary upon ATP injection, and the remaining proteins either diffused (18%) or directly dissociated (4%) from the lesions (n = 78; Fig. 32), indicating that ATP-triggered release and 1D diffusion were favored at physiological ionic strength. Our results also revealed changes in the lifetime of the complexes, as has been reported for Taq MutS (B27). As demonstrated above, MutSa can scan DNA for lesions by 1D diffusion, and we have shown previously that at 150 mM NaCl the lifetime of MutSa while scanning DNA before lesion recognition is 20 ± 4 s (B14). In contrast, quantitation of the MutSa diffusion trajectories after lesion release yielded a lower bound for the lifetime, t1/2 > 198 ± 23.4 s (Fig. 23; Fig. 31). MutSa also diffused along the DNA when chased with ATPγS (62% diffused, 23% dissociated, and 15% remained stationary; n = 26), indicating that nucleotide binding was sufficient to trigger mismatch release (Fig. 23C, Fig. 31). These findings support the molecular-switch model in which MutSa scans the flanking DNA by 1D diffusion (B21).
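For unbiased 1D diffusion the mean squared displacement (MSD) grows linearly with lag time, MSD(τ) = 2·D1D·τ, so a diffusion coefficient can be estimated from the slope of an MSD plot. The Python sketch below shows one generic way to do this on a simulated trajectory; it is not the analysis pipeline used for the reported values. The frame interval, trajectory length, number of lags, and all function names are assumptions made for illustration; only the value 0.057 μm2 s-1 used to generate the test trajectory is taken from the text.

```python
import math
import random


def mean_squared_displacement(x, max_lag):
    """Time-averaged MSD of a 1D trajectory for lags 1..max_lag (in frames)."""
    msd = []
    for lag in range(1, max_lag + 1):
        sq = [(x[i + lag] - x[i]) ** 2 for i in range(len(x) - lag)]
        msd.append(sum(sq) / len(sq))
    return msd


def diffusion_coefficient_1d(x, dt, max_lag=10):
    """Estimate D from MSD(tau) = 2 * D * tau by a least-squares line
    through the origin over the first `max_lag` lags."""
    msd = mean_squared_displacement(x, max_lag)
    lags = [dt * k for k in range(1, max_lag + 1)]
    slope = sum(t * m for t, m in zip(lags, msd)) / sum(t * t for t in lags)
    return slope / 2.0


def simulate_walk(d_true, dt, n_frames, seed=0):
    """Simulate free 1D Brownian motion with Gaussian steps of variance 2*D*dt."""
    rng = random.Random(seed)
    step_sd = math.sqrt(2.0 * d_true * dt)
    x = [0.0]
    for _ in range(n_frames - 1):
        x.append(x[-1] + rng.gauss(0.0, step_sd))
    return x


# 0.1 s per frame and 20000 frames are hypothetical acquisition settings.
track = simulate_walk(d_true=0.057, dt=0.1, n_frames=20000, seed=1)
print(f"estimated D = {diffusion_coefficient_1d(track, dt=0.1):.3f} um^2/s")
```

Real trajectories are far shorter than this simulated one and include localization error, so estimates from individual molecules scatter widely, which is consistent with the large standard deviations reported alongside the mean D1D values.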
[0171] MutSa Must, The highly redundant nature of diffusion poses a conceptually important problem: Once MutSa is released from a mismatch and starts scanning the flanking DNA by 1D diffusion, it must not reengage the mismatch; otherwise it could become nonproductively trapped while undergoing reiterative cycles of mismatch binding and release. This problem can be illustrated by considering that when MutSa takes a single diffusive step away from the mismatch, it has a 50% probability of re-encountering the mismatch on the very next step, and the average number of times MutSa would re-encounter the mismatch is equal to N-1, where N is the distance in 1-bp diffusion steps between the mismatch and the nearest strand discrimination signal (Fig. 33). These considerations suggest that MutSa must be functionally distinct after ATP-triggered release from a mismatch to avoid redundant lesion recognition.
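The N-1 estimate above can be reproduced with an unbiased random walk on a 1-bp lattice. The Python sketch below assumes that strand-discrimination signals lie N steps away on either side of the mismatch, so that the walk can be folded onto the distance from the mismatch; this symmetric geometry is an assumption made for illustration, as are the function names, the choice N = 10, and the number of trials. Under that assumption a protein one step from the mismatch returns to it before reaching a signal with probability (N-1)/N (the gambler's-ruin result), and the number of returns is geometrically distributed with mean N-1.

```python
import random
from fractions import Fraction


def expected_reencounters(n):
    """Exact mean number of returns to the mismatch (site 0) before first
    reaching a signal n steps away, starting one step from the mismatch."""
    p_return = Fraction(n - 1, n)        # gambler's-ruin return probability
    return p_return / (1 - p_return)     # mean of the geometric count: n - 1


def simulate_reencounters(n, trials, seed=0):
    """Monte Carlo estimate of the same quantity with unbiased 1-bp steps."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pos = 1                          # one diffusive step away from the mismatch
        returns = 0
        while pos < n:
            pos += 1 if rng.random() < 0.5 else -1
            if pos == 0:
                returns += 1
                pos = 1                  # by symmetry the next step leaves the mismatch
        total += returns
    return total / trials


print(expected_reencounters(10))
print(simulate_reencounters(10, trials=5000, seed=0))
```

The exact value for N = 10 is 9, and the simulated mean fluctuates around that value. For signals hundreds or thousands of base pairs away the same argument implies a correspondingly large number of re-encounters.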
[0172] To evaluate this hypothesis, we assessed the efficiency of lesion recognition by MutSa before and after ATP-triggered release from the mismatches. Of the MutSa molecules that recognized the lesions through a 1D search, none diffused past the lesions (n = 0/17) (Fig. 22B, Fig. 30), indicating that initial target recognition must be efficient. Moreover, when MutSa spontaneously escaped from the mismatches (i.e., ATP-independent release), the proteins typically diffused a short distance along the DNA and then quickly rebound to the lesions (n = 101 escapes, of which 97 resulted in rebinding to the lesions without bypass) (Fig. 23D, Fig. 34). Considered together, these data show that before the addition of ATP, MutSa stopped moving upon encountering the lesions during 1D searches in 97% of all observed cases (n = 114/118), with only 3% of the observed encounters leading to diffusion past the lesions. In contrast, after ATP- (or ATPγS)-triggered mismatch release, we observed a total of 325 independent, microscopically observed bypass events (n = 51 proteins, corresponding to an average of approximately six bypasses per protein), none of which led to detectable rebinding; these values represent the lower bounds for the number of potential bypass events, because the proteins often continued diffusing on the DNA beyond the duration of our observations. Notably, each microscopically observed bypass reflects ~1,000 submicroscopic encounters with the lesions; these encounters are undetectable as independent events given current resolution limits. These results indicate MutSa no longer recognizes mismatches as viable targets after ATP-triggered release.
[0173] MutSa Diffa^ The mean D1D of MutSa before lesion recognition was 0.009 ± 0.011 μm2 s-1 (at 150 mM NaCl; n = 25) (B14), but there was a 6.3-fold increase (Student t test, P = 1.5 × 10^-9) in this value to 0.057 ± 0.064 μm2 s-1 (n = 25) after ATP-mediated release from the mismatches. Before lesion recognition, the diffusion coefficient of MutSa is consistent with 1D sliding wherein lateral motion of the protein is coupled to obligatory rotation as it tracks the helical pitch of the DNA (B14). However, after lesion recognition, the mean diffusion coefficient of MutSa exceeded the theoretical threshold for rotation-coupled 1D diffusion (Drot,theor = 0.024 μm2 s-1) (B14) and was physically incompatible with motion involving an obligatory rotational component (B12, B28-30). Structures of MutS and MutSa reveal the proteins are in intimate contact with DNA along an interface that completely encircles the duplex (B24, B31-33). This configuration could accommodate 1D sliding or could allow MutSa to make very small hops on the DNA as a closed ring, provided there was sufficient space between the protein and DNA surfaces to allow transient penetration of ions that could screen the charged surfaces; we cannot yet distinguish between these two possibilities experimentally. However, we can conclude that the rapid movement of MutSa after mismatch release is most consistent with 1D diffusion (hopping or sliding) in the absence of an obligatory rotational component. A similar conclusion was obtained recently from single-molecule measurements of Taq MutS bound to mismatch-containing DNA (B34), suggesting that transitions from rotation-coupled to rotation-uncoupled diffusion upon lesion recognition and ATP binding may be a common feature of the MutS family of proteins.
[0174] Colocalization of MutLa with Mismatch-Bound MutSa. We next asked whether QD-tagged MutLa colocalized with mismatch-bound MutSa on the single-tethered DNA curtains (Fig. 24). We have shown previously that MutLa binds DNA, but rather than remaining stationary, most MutLa (>95%) diffuses rapidly along the DNA by a 1D hopping mechanism (B15). We detected no colocalization of MutLa and MutSa on DNA that lacked mismatches (n > 2,000; see below), and MutLa alone did not bind the G/T mismatches in the absence of MutSa but instead diffused past the lesions without stopping (Fig. 24E). However, when MutSa was bound to the mismatch, MutLa stopped diffusing at lesion-bound MutSa (Fig. 24A and 24B). In the absence of ATP, both proteins remained at the lesions, with MutLa exhibiting a half-life of 7.8 ± 0.4 min (n = 65) when colocalized with mismatch-bound MutSa (Fig. 29). Mismatch colocalization of MutLa was observed with both QD-tagged MutSa and untagged MutSa (Fig. 24C and 24D). We conclude that MutLa was targeted specifically to mismatch-bound MutSa.
[0175] MutLa Js Targ^ We next watched MutLa as it searched for mismatch-bound MutSa on double-tethered DNA curtains. MutLa could locate mismatch-bound MutSa by a 1D hopping mechanism (55% of observed events; n = 33/60) or by apparent 3D diffusion (45% of observed events; n = 27/60) (Fig. 25A, Fig. 35); the percentage of events attributed to 1D diffusion represents the minimal fraction that occurred through this mechanism, because the apparent 3D targeting events also could reflect submicroscopic 1D diffusion over distances less than our spatial resolution of ±30 nm. Control experiments verified that MutLa did not stop at mismatches in the absence of MutSa (n > 2,000) (Fig. 25B). We conclude that MutLa can locate mismatch-bound MutSa through 1D hopping or 3D diffusion. Notably, when MutSa and MutLa collided while diffusing at sites other than a mismatch, they showed no evidence of establishing stable interactions (n > 2,000) (Fig. 25C). This outcome is remarkable given that the local concentration of two proteins that encounter one another while undergoing a 1D search on the same DNA molecule is infinitely high. We conclude that the conformational context of MutSa is critical for controlling protein-protein interactions with MutLa and that the two complexes do not interact stably with one another while undergoing 1D diffusion in the absence of a mismatch despite being forced into close physical proximity through association with the same DNA molecule.
[0176] MutSa/MutLa Complex Scans DNA Flanking the Mismatch by 1D Sliding. We next asked whether the MutSa/MutLa complex also scanned the flanking DNA by 1D diffusion. As shown in Fig. 25D and Fig. 36, in assays with double-tethered DNA curtains, ATP provoked release of MutSa/MutLa from the mismatches at physiological salt concentrations (150 mM NaCl). Most complexes then scanned flanking DNA by 1D diffusion (63% of observed events; n = 22/35), and all of those that scanned DNA by 1D diffusion remained intact as MutSa/MutLa complexes (n = 22/22), demonstrating that MutLa and MutSa remain associated with one another as they scan the flanking DNA by 1D diffusion, even though they do not interact while bound to duplex DNA before lesion recognition by MutSa. Smaller populations dissociated from the DNA upon injection of ATP (23%; n = 8/35) or remained at the mismatches (14%; n = 5/35). Following ATP-triggered mismatch release, the MutSa/MutLa complexes that underwent 1D diffusion remained on the DNA for up to several hundred seconds with a lower bound of t1/2 > 267.6 ± 62.1 s (Fig. 36). The complexes also repeatedly bypass the mismatches, whereas they remain stably bound to the mismatches in reactions containing only ADP (Fig. 25D). We conclude that the behavior of the MutSa/MutLa complex is consistent with the molecular-switch model (B21). Furthermore, analysis of the postlesion diffusion trajectories revealed a mean D1D of 0.062 ± 0.095 μm2 s-1 (n = 22) for MutSa/MutLa, which was ~6.9-fold larger than observed for MutSa alone before lesion recognition (D1D,MutSa = 0.009 ± 0.011 μm2 s-1 before mismatch release, at 150 mM NaCl; Student t test, P < 1 × 10^-9), providing additional evidence that ATP-triggered release from lesions modifies the diffusive characteristics of the MMR proteins. Our results suggest that MutSa must be functionally distinct before and after lesion recognition and that these changes persist even after the proteins diffuse away from the mismatches.
[0177] Increased Stability of MutSa/MutLa After ATP-Triggered Release from Mismatches. We next tested the relative resistance of the different MMR protein complexes to challenge with high-salt buffers. In the DNA curtain assays, all the DNA-bound MutSa dissociated when chased with moderately high salt (300 mM NaCl) before (B14), during, or after lesion recognition (n > 2,000). As previously shown, MutLa is more salt resistant than MutSa (B15), but it also dissociated from DNA rapidly when challenged with higher salt (~100% dissociation at 0.7 M NaCl; n > 2,000). Mismatch-bound MutSa/MutLa also dissociated from DNA upon exposure to high salt, and in the presence of 1 mM ADP all the lesion-bound complexes (n = 40) dissociated from the DNA upon injection of 0.7 M NaCl. In contrast, after ATP-triggered release from the mismatch, MutSa/MutLa became resistant to increases in ionic strength, and 58% of the complexes (n = 18/31) remained bound to DNA and continued diffusing even after injection of buffer containing 0.7 M NaCl; the remaining 42% displayed a lifetime of 23.1 ± 8.3 s. We conclude that mismatch-bound MutSa/MutLa must undergo a structural change upon binding ATP, rendering the complex resistant to dissociation from the lesion-bearing DNA without altering its ability to scan the flanking duplex by 1D diffusion.
[0178] Intersite Transfer Between Juxtaposed DNA Molecules During MMR. It is widely hypothesized that DNA-binding proteins can use some forms of facilitated diffusion (e.g., jumping or intersegmental transfer) to undergo intersite transfer between juxtaposed DNA segments that otherwise are separated by long regions of linear sequence (B8, B9, B35). The potential for intersite transfer has profound implications for MMR. Before lesion recognition, MutSa and/or MutLa might undergo intersite transfer, which in principle could assist in their respective target searches. However, if the proteins were to undergo intersite transfer while scanning the flanking DNA after lesion recognition, then in a best-case scenario repair would fail because the MMR machinery would lose track of the damaged DNA. In a worst-case scenario, intersite transfer after lesion recognition might lead to inappropriate cleavage of undamaged DNA by the MutLa endonuclease.
[0179] To assess intersite transfer during MMR, we used nanofabricated chromium patterns situated at the convergence of two buffer channels to arrange molecules into crisscrosses, where intersections between molecules represented regions of locally high DNA concentration (Fig. 26A-D, Fig. 37). The time-averaged distance between the DNA substrates at the crisscross was ~106 nm, which was calculated by treating the DNA as two harmonic chains suspended above a surface at the height of the barriers (20 nm), and the probability that they approach within <20 nm of one another during a 100-ms window is near unity (Fig. 37). We reasoned that intersite transfer would be revealed as ~90° turns in the protein diffusion trajectories at the DNA intersections. Accordingly, the diffusion trajectories of MutLa were punctuated by abrupt turns at the DNA intersections (Fig. 26E-G). These results demonstrated that MutLa can undergo intersite transfer, with an observed probability of P = 0.188 (n = 32) for transferring from one DNA to another during each encounter with the intersections. This value represents a lower bound for the frequency of intersite transfer, because these events could be identified unambiguously only if the proteins diffused far enough away from the region encompassing the DNA intersection to verify whether they were bound to the first or second DNA molecule (Fig. 26G). This finding suggests that MutLa would be able to search for lesion-bound MutSa within the 3D volume of the eukaryotic nucleus through a combination of 1D hopping and intersite transfer. Our previous experiments suggest that MutLa travels while wrapped around DNA in a large ring-like configuration (B15). If so, then this ring would have to open transiently to allow intersite transfer. In contrast, Mlh1 homodimers do not appear to form rings (B15) and therefore would be expected to transfer more readily between two DNA molecules. In agreement with this hypothesis, Mlh1 alone also switched between DNA molecules and did so approximately twofold more efficiently (P = 0.333; n = 39) than MutLa. In contrast to MutLa, MutSa did not transfer between molecules readily before lesion binding (P = 0.067; n = 30) (Fig. 26H) or after ATP-triggered lesion release (P = 0.038; n = 130) (Fig. 26I). The MutSa/MutLa complex also remained confined to the same DNA after ATP-triggered lesion release (P = 0.052; n = 97) (Fig. 26J), indicating that the ability of MutLa to undergo intersite transfer was suppressed upon association with MutSa. These results, together with the finding that MutSa/MutLa is resistant to NaCl-induced dissociation after lesion release, indicate that MutLa is functionally altered within the context of the MutSa/MutLa complex, ensuring that the complex remains confined to the damaged DNA while scanning the flanking sequences.
Discussion
[0180] Here we provide direct visual observation of proteins searching for and subsequently engaging target sites through facilitated diffusion mechanisms on single molecules of DNA. Our work also illustrates how transitions between different modes of diffusion are regulated during the early stages of MMR through a combination of lesion recognition, protein-protein association, and nucleotide cofactors. This work also suggests how facilitated diffusion might contribute to mismatch repair in vivo and yields insights into the structural changes necessary to accommodate the distinct behaviors of MutSa, MutLa, and the MutSa/MutLa complex at different stages of MMR.
[0181] We have shown that MutSa can be targeted to mismatches in vitro by 1D sliding or through apparent 3D diffusion (Fig. 27A). Importantly, we previously demonstrated that sliding of MutSa is obstructed by nucleosomes (B15), consistent with the notion that 1D sliding would be problematic for searches in crowded environments (B8, B10, B36-38). We also anticipate that mismatch binding through 3D diffusion would be difficult if the mismatch were occluded by a nucleosome (B39). These observations imply that any DNA searched by MutSa must be kept free of obstructions. This requirement could be accomplished if the MMR proteins were coupled to the DNA replication machinery. In support of this model, recent work has demonstrated that MutSa is physically associated with replication factories and that 10-15% of mismatch repair can be attributed to replication fork-associated MutSa (B40). Together, these results suggest the possibility that the replisome might clear DNA of any potential obstacles that otherwise could impair lesion targeting, perhaps enabling MutSa to slide along the newly synthesized naked DNA while surveying for lesions at the rear of the progressing fork. Our finding that MutSa also can be targeted to lesions through a 3D mechanism (or submicroscopic 1D sliding over distances less than 30 nm) might explain how lesions are located for the 85-90% of repair events that do not involve direct association of MutSa with the replisomes (B40).
[0182] MutLa can search for lesion-bound MutSa through a combination of 1D hopping, 3D diffusion, and intersite transfer (Fig. 27A), and we anticipate that this search could occur on chromatin because MutLa can diffuse readily past nucleosomes (B15). After assembling at a lesion, the MutSa/MutLa complex is released upon binding ATP and scans the flanking DNA by 1D diffusion. During this search, MutSa/MutLa is rendered incapable of intersite transfer and becomes highly resistant to dissociation, which could ensure that the MutSa/MutLa complex remains confined to the damaged DNA. These properties are established through a sequence of events including lesion recognition by MutSa and establishment of mismatch-dependent protein-protein interactions between MutSa and MutLa, followed by ATP-triggered release of MutSa/MutLa from the lesion. This strict hierarchy would enforce tight regulatory control over the formation of higher-order MMR protein intermediates, thereby preventing inappropriate assembly of MutSa/MutLa complexes at sites other than DNA lesions. Bacterial MutL and eukaryotic MutLa both undergo ATP-driven conformational changes consistent with the formation of closed-ring architectures mediated through dimerization of the N-terminal domains (B41, B42). Therefore, we hypothesize that MutLa within the context of the MutSa/MutLa complex engages the DNA in a closed-ring configuration after ATP-triggered mismatch release, rendering the complex resistant to dissociation from damaged DNA (Fig. 27A). The marked resistance of the MutSa/MutLa complex to dissociation from the DNA after ATP-triggered release from the mismatches also is consistent with the recent finding that Pms1-4GFP foci do not turn over when the downstream stages of MMR are compromised (B40).
[0183] MutLa forms oligomers composed of ~11 ± 5 proteins at sites of repair in vivo, as evidenced by the presence of Pms1-4GFP foci (B40). In our assays, ~79% of all observed MutLa appeared consistent with single proteins based on quantum dot (QD) blinking. The predominance of single MutLa molecules in our study can be attributed to the fact that we were probing the early stages of MMR involving initial lesion recognition and assembly of the first MutSa/MutLa complex. In contrast, MutLa foci observed in vivo reflect later stages of the reaction (B40). Taken together, these results suggest that MutLa oligomerization on MutSa occurs only after the first MutSa/MutLa complex is released from the lesion. This hypothesis also is supported by the observation that the msh6-Gl 14D mutant of MutSa, which is capable of forming a ternary complex with MutLa at mismatches but is defective for ATP-triggered release, does not support formation of detectable Pms1-4GFP foci in vivo. Therefore, ATP-triggered release of the initial MutSa/MutLa complex from the lesions may represent an intermediate step preceding the assembly of higher-order MutLa oligomers.
[0184] MutSa alone or within the context of the MutSa/MutLa complex displays dramatically altered diffusive characteristics before and after lesion recognition, likely reflecting distinct functional and structural states necessary to accommodate the different stages of MMR. Before mismatch recognition, MutSa diffuses through a mechanism consistent with 1D sliding while tracking the helical pitch of the DNA (B14, B15), but after ATP-triggered release from the mismatch, MutSa diffuses much more rapidly and no longer recognizes mismatches as binding targets. Inspection of available MutS and MutSa structures provides a potential explanation for these differences (Fig. 27B) (B24, B31, B33). MutSa completely encircles DNA, and domain I of Msh6 lies within the major groove, allowing a conserved phenylalanine and glutamic acid to engage the mismatch; all remaining contacts with the DNA lie along the phosphate backbone (B24, B31, B33). This configuration of Msh6 domain I would impose steric constraints requiring MutSa to track the helical pitch of the DNA during any 1D diffusion (i.e., just as a bolt tracks the helical threads of a screw). Retraction of domain I from the major groove would be necessary and sufficient to allow MutSa to diffuse as a closed ring on DNA without obligatory rotation (Fig. 27B) and also is consistent with the recent observation that domain I of Taq MutS undergoes large structural changes upon being released from mismatches, based upon single-pair fluorescence resonance energy transfer measurements (B43). Therefore, we hypothesize that domain I of Msh6 is inserted into the major groove before lesion recognition (as necessary to engage a mismatch and consistent with a rotation-coupled 1D diffusion) and remains within the major groove upon binding the lesion (as shown in the crystal structures) but then is retracted from the major groove after ATP-triggered release from the mismatch (consistent with the more rapid 1D diffusion observed after lesion recognition). Retraction of Msh6 domain I from the major groove also would explain how MutSa and MutSa/MutLa are released from the mismatch upon binding ATP and how they avoid rebinding the mismatch while searching for strand-discrimination signals.
Materials and Methods
[0185] Experiments were performed with a custom-built TIRF microscope and nanofabricated DNA curtains, as previously described (B14-17). Images were acquired at 5-10 Hz using NIS-Elements software (Nikon) and were saved as uncompressed, 16-bit TIFF files. Experiments requiring two-color detection used a Dual-View image-splitting device (Optical Insights) equipped with a dichroic mirror (630 DCXR; Chroma Technologies). Image alignment of the two channels was performed during postprocessing [ImageJ software (National Institutes of Health) with the "Align RGB Planes" plug-in] using the dark signal from the nanofabricated DNA barriers as a reference, and aligned images were pseudocolored and digitally recombined in ImageJ. Before use, MutSa was affinity purified after being labeled with QDs, thus eliminating any QDs not bound by active MutSa before injection of the sample for single-molecule imaging. Unless otherwise stated, reactions were performed as previously described (B14, B15), except that all buffers contained either 100 or 150 mM NaCl. In brief, all buffers contained 20 mM Tris (pH 7.8), 1 mM MgCl2, 1 mM DTT, and 4 mg/mL BSA, along with the indicated concentration of NaCl. Unless otherwise stated, standard reaction conditions for looking at lesion binding all contained 1 mM ADP. In the nucleotide chase experiments, ADP was replaced by injecting 1 mM ATP or 1 mM ATPγS, as specified. Finally, YOYO1 was omitted in most reactions because its presence inhibited ATP-triggered release of MutSa from the mismatches.
[0186] 1. Protein purification and labeling. MutSa and MutLa were purified and labeled as described (1, 2). MutSa labeling was performed at a 6:1 QD:protein ratio (300 nM QD : 50 nM protein) in PBS containing 0.2 mg ml-1 BSA and incubated for 20 minutes at 4°C. The protein-QD conjugates were then purified to remove unconjugated QDs. For this, biotinylated λ-DNA (300 pM) was incubated with streptavidin magnetic beads (5 mg; Roche) for 20 min at 20°C. The MutSa-QD conjugation reaction was added to the beads, the PBS solution was diluted to one-fifth concentration with 10 mM Tris (pH 7.8) solution, and the reaction was incubated for 10 min at 4°C. Beads were washed twice with 10 mM Tris (pH 7.8), 20 mM NaCl, 1 mM MgCl2, 1 mM DTT, and 0.2 mg ml-1 BSA. QD-MutSa was eluted with 10 mM Tris (pH 7.8), 300 mM NaCl, 1 mM MgCl2, 1 mM DTT, and 0.2 mg ml-1 BSA.
[0187] 2. DNA substrates and cloning. To create λ-DNA with 3 tandem G/T mismatches, a 151-bp DNA fragment containing unique restriction and nickase sites was ligated between the NheI and XhoI sites (Fig. 28). Insert-containing DNA was packaged using MaxPlax λ packaging extracts (Epicenter), according to the manufacturer's instructions. Phage stocks were prepared by standard plate lysis and used to infect 1 ml of E. coli LE392MP cells (OD 0.1) at 37°C for 20 minutes. Infected cells were used to inoculate a 200 ml liquid culture in LB and 10 mM MgSO4, which was grown overnight at 39°C. Then 10 ml of chloroform was added and the culture was shaken for 10 minutes. The lysed culture was incubated with DNase I and RNase (1 μg ml-1 each) at 20°C for 1 hour. SDS (0.5%), EDTA (50 mM), and proteinase K (5 mg) were added to the lysed culture and incubated at 20°C for 1 hour, followed by phenol-chloroform extraction and isopropanol precipitation. Purified DNA was resuspended in TE and end-labeled with oligonucleotides, as described (2). To make mismatches, the end-labeled DNA was treated with Nt.BspQI (NEB), mixed with a 1000-fold molar excess of an oligonucleotide complementary to the region encompassed by the nickase sites, and then heated and cooled. Successful insertion was assessed by comparing restriction digests with either NcoI or SwaI, and alkaline gel electrophoresis verified that the nicks were sealed by T4 DNA ligase.
[0188] 3. Single molecule reaction conditions. Unless otherwise stated, reactions were performed as described (1, 2), with the exception that all buffers contained either 100 or 150 mM NaCl, unless otherwise specified, and the DNA substrates for both single- and double-tethered assays exhibited a mean extended contour length of ~0.75. In brief, buffers contained 20 mM Tris [pH 7.8], 1 mM MgCl2, 1 mM DTT, and 4 mg ml-1 BSA, along with the indicated concentrations of NaCl. Unless otherwise stated, standard reaction conditions for looking at lesion binding all contained 1 mM ADP in the buffers. In the nucleotide chase experiments, the 1 mM ADP was replaced by injecting 1 mM ATP or 1 mM ATPγS, as specified. Unless otherwise indicated, YOYO1 was omitted. Please note that all reported results, and conclusions derived from these results, are based upon at least three independent experimental measurements.
[0189] 4. Binding site distribution measurements on single-tethered DNA curtains. Binding distributions of MutSa and MutSa/MutLa complexes were made on single-tethered DNA curtains. Data were divided into bins based on the variation of QDs attached to DNA curtains at single fixed positions (3). The MutLa-only distributions were obtained through the same analysis procedure but used double-tethered DNA curtains (with no buffer flow), because MutLa diffuses rapidly along DNA and is quickly pushed off of single-tethered DNA curtains if flow is applied (2).
[0190] Because MutLa does not remain preferentially bound to any positions on the DNA (2), the distribution histogram of MutLa represents the instantaneous positions for all molecules in the observed population, and the flat distribution reflects the absence of preferred binding sites. Sampling error was determined by the bootstrap method (4), and the 70% confidence intervals are presented.
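As a minimal sketch of how a bootstrap estimate of sampling error on a binding histogram could be computed, the following Python resamples the observed positions with replacement and reports a 70% percentile interval for each bin. The function name, the bin edges, the number of resamples, and the choice of a percentile interval are assumptions made for illustration; the text states only that the bootstrap method (4) was used and that 70% confidence intervals are presented.

```python
import random


def bootstrap_bin_intervals(positions, bin_edges, n_resamples=1000, level=0.70, seed=0):
    """Return (observed_counts, lower, upper) per bin from a percentile bootstrap.

    positions : list of binding positions (any consistent unit, e.g. bp)
    bin_edges : ascending bin edges; bins are half-open [edge_b, edge_b+1)
    """
    rng = random.Random(seed)
    n_bins = len(bin_edges) - 1

    def histogram(sample):
        counts = [0] * n_bins
        for p in sample:
            for b in range(n_bins):
                if bin_edges[b] <= p < bin_edges[b + 1]:
                    counts[b] += 1
                    break
        return counts

    observed = histogram(positions)

    # Resample the data set with replacement and re-bin each replicate.
    replicates = []
    for _ in range(n_resamples):
        sample = [rng.choice(positions) for _ in positions]
        replicates.append(histogram(sample))

    # Percentile interval: cut (1 - level) / 2 from each tail of the replicates.
    tail = (1.0 - level) / 2.0
    lower, upper = [], []
    for b in range(n_bins):
        column = sorted(r[b] for r in replicates)
        lower.append(column[int(tail * (n_resamples - 1))])
        upper.append(column[int((1.0 - tail) * (n_resamples - 1))])
    return observed, lower, upper
```

The fixed seed only makes the sketch reproducible; any real analysis would choose the resample count and interval type to match reference (4).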
[0191] 5. MutSa and MutLa target search experiments. These experiments were conducted with double-tethered curtains in 40 mM Tris (pH 7.8), 1 mM DTT, 150 mM NaCl, 1 mM MgCl2, 1 mM ADP, and 0.2 mg ml-1 BSA. For mismatch search experiments, QD-MutSa (1-5 nM) was injected at a flow rate of 5-20 μl min-1, and flow was terminated upon visual confirmation that the proteins had begun entering the sample chamber. For the MutSa-mismatch search experiments, MutSa was pre-bound to the mismatch in buffer containing 1 mM ADP, and free proteins were flushed from the sample chamber. QD-MutLa (5-20 nM) was injected at a flow rate of 5-20 μl min-1, and flow was terminated upon visual confirmation that the proteins had entered the sample chamber. A protein was categorized as having undergone a 1D search only if there were at least two frames at the beginning of the diffusion trajectory that were at least three standard deviations away from the location of the mismatch. If the proteins initially appeared within this resolution limit, then they were categorized as having undergone an apparent 3D binding event.
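The categorization rule at the end of this paragraph can be sketched in a few lines of Python. The function and parameter names are hypothetical, positions and the standard deviation are assumed to share one unit (for example nm), and a trajectory shorter than two frames is assumed to count as an apparent 3D event; none of these details are specified in the text.

```python
def classify_search(track_positions, mismatch_position, sigma, min_frames=2, n_sigma=3.0):
    """Label a trajectory '1D' or '3D' (apparent) from its first frames.

    '1D' requires that each of the first `min_frames` positions lies at least
    n_sigma * sigma away from the mismatch; anything else is an apparent 3D event.
    """
    initial = track_positions[:min_frames]
    if len(initial) < min_frames:
        # Assumption: too few frames to demonstrate a 1D approach.
        return "3D"
    threshold = n_sigma * sigma
    far = all(abs(x - mismatch_position) >= threshold for x in initial)
    return "1D" if far else "3D"
```

For example, with sigma = 35 a trajectory starting at 500 and 480 from the mismatch would be labeled 1D, whereas one starting at 50 would be labeled an apparent 3D event.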
[0192] 6. Mismatch-bound MutSa and MutSa/MutLa lifetime measurements on single-tethered DNA curtains. QD-tagged proteins were bound to mismatch-bearing DNA in single-tethered curtains in buffer containing 20 mM Tris [pH 7.8], 50 mM NaCl, 1 mM ADP, 1 mM MgCl2, 1 mM DTT, and 4 mg ml-1 BSA. The NaCl concentration was raised to 150 mM, images were collected at defined intervals, and the total number of proteins remaining bound to the lesions was plotted as a function of time. Resulting data were fit to single exponential curves.
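A single-exponential fit of the number of proteins remaining bound versus time could look like the sketch below. It assumes a log-linear least-squares fit of counts(t) = A exp(-t/tau); the text does not say which fitting procedure was used, and the function name is illustrative only.

```python
import math


def fit_single_exponential(times, counts):
    """Fit counts(t) = A * exp(-t / tau) by linear regression on ln(counts).

    Returns (A, tau). Time points with zero counts are skipped because
    their logarithm is undefined.
    """
    points = [(t, math.log(c)) for t, c in zip(times, counts) if c > 0]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = sxy / sxx                     # equals -1 / tau
    intercept = mean_y - slope * mean_t   # equals ln(A)
    return math.exp(intercept), -1.0 / slope
```

A nonlinear least-squares fit would weight the early time points differently; the log-linear form is used here only because it needs no external libraries.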
[0193] 7. Nucleotide chase experiments on double-tethered DNA curtains. ATP chase experiments were performed with double-tethered DNA curtains, so buffer flow could be terminated after ATP injection or maintained at a very low constant rate (5-20 μl/min) such that the diffusive properties of the DNA-bound proteins were not perturbed. The delay time of the injection system was pre-calibrated by using the microscope to monitor the background fluorescence signal following injection of fluorescein. QD-tagged proteins (MutSa or the MutSa/MutLa complex, as indicated) were first bound to mismatch-bearing DNA molecules in buffer containing 20 mM Tris [pH 7.8], 150 mM NaCl, 1 mM ADP, 1 mM MgCl2, 1 mM DTT, and 4 mg ml-1 BSA, and the reactions were chased with the same buffer but with the 1 mM ADP replaced with 1 mM ATP (or 1 mM ATPγS, as indicated). Videos were continually recorded at 5 or 10 Hz, and the data were manually segregated into populations that either remained stationary, directly dissociated from the DNA, or began diffusing along the DNA.
[0194] 8. Protein tracking and diffusion coefficients. Diffusion coefficients represent the mean ± standard deviation of >25 particle tracking measurements and were calculated from MSD plots as described (1, 2). All diffusion coefficients were based on measurements of protein complexes that exhibited QD blinking (see below); these measurements are confined to blinking QDs to help ensure that the reported diffusion coefficients reflect a homogeneous population of molecules all with the same hydrodynamic radii, and to minimize variance in the reported values due to heterogeneity in the oligomeric states of the complexes being measured (1, 2). The spatial resolution of our tracking data is limited by Brownian fluctuations of the DNA. To determine the scale of this noise, QDs were attached to the DNA through a digoxigenin linker. The λ-DNA for these curtains was engineered to include an oligonucleotide insertion containing digoxigenin, the curtains were incubated with QDs labeled with anti-digoxigenin antibodies, and the standard deviation of the QD positions was calculated for 120 molecules over 4 different flowcells. The average of the 120 measured standard deviations, representing the average longitudinal fluctuations of the DNA molecules, was 30 nm (~120 bp). Note that observed diffusion coefficients displayed a log-normal distribution, as expected given that the energy landscape on the DNA experienced by the diffusing proteins is normally distributed (1, 2). A Student's t-test assumes normally distributed data; therefore, when comparing two different diffusion coefficients, the p-values were obtained from the natural logs (ln) of the corresponding diffusion coefficients, as described (2).
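The two calculations mentioned in this paragraph, a diffusion coefficient from an MSD plot and a t-test carried out on log-transformed diffusion coefficients, can be sketched as follows. This is an illustrative assumption rather than the procedure of references (1, 2): the sketch takes 1D trajectories, uses MSD = 2Dt, fits a line through the origin over the first few lags, and computes a pooled-variance Student's t statistic. All function names are hypothetical.

```python
import math


def mean_squared_displacement(x, max_lag):
    """MSD of a 1D trajectory x for lags 1..max_lag (in frames)."""
    return [
        sum((x[i + lag] - x[i]) ** 2 for i in range(len(x) - lag)) / (len(x) - lag)
        for lag in range(1, max_lag + 1)
    ]


def diffusion_coefficient(x, frame_time, max_lag=10):
    """Estimate D from the slope of MSD versus time, using MSD = 2 * D * t in 1D."""
    msd = mean_squared_displacement(x, max_lag)
    t = [lag * frame_time for lag in range(1, max_lag + 1)]
    # Least-squares slope of a line constrained to pass through the origin.
    slope = sum(ti * mi for ti, mi in zip(t, msd)) / sum(ti * ti for ti in t)
    return slope / 2.0


def log_t_statistic(d_a, d_b):
    """Pooled-variance Student's t statistic computed on ln(D) of two samples."""
    a = [math.log(d) for d in d_a]
    b = [math.log(d) for d in d_b]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss_a = sum((v - mean_a) ** 2 for v in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    pooled = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean_a - mean_b) / math.sqrt(pooled * (1.0 / len(a) + 1.0 / len(b)))
```

Taking logarithms first turns log-normally distributed diffusion coefficients into approximately normal values, which is the condition the t-test relies on; the p-value then follows from the t distribution with len(d_a) + len(d_b) - 2 degrees of freedom.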
[0195] Individual QDs blink, and this well-known phenomenon enables one to distinguish single vs. multiple QDs (2, 5-7). In our experiments a non-blinking QD signal could arise from either QD aggregation or protein aggregation/oligomerization; note that we cannot accurately determine the numbers of proteins that might be present in these larger complexes because as many as 50% of the QDs can be dark (i.e., non-fluorescent), and the emission intensity of individual QDs can also span a broad range (6). From our experiments, >95% of all observed molecules of QD-MutSa (either diffusing along the DNA or bound to the mismatches) exhibited QD blinking, as would be expected from single QD-tagged proteins, and as previously reported, ~74% of all observed QD-MutLa exhibited blinking before binding to MutSa (2). The remaining ~26% of QD-MutLa that did not blink exhibited signal intensities that could be consistent with larger oligomers of MutLa, and many of these (52%) also diffused on DNA, as previously reported (2). For the MutSa-only binding distribution measurements, we only measured the locations of blinking QDs, which again reflected >95% of the total population of molecules. For the MutLa-only binding distribution measurements, we only measured the locations of blinking QDs, which again reflected ~74% of the total population. For the MutLa binding distribution measurements made in the presence of MutSa (either QD-tagged MutSa or untagged MutSa), we did not segregate the MutLa data based on QD blinking behavior, and the resulting distribution histograms were representative of the entire population of observed proteins. The reason we did not segregate this MutLa binding site distribution data into blinking and nonblinking populations is that once the first MutSa/MutLa complex has formed at a lesion, other upstream MutLa molecules are pushed into this stationary complex due to the force exerted on the proteins by the flowing buffer (see also Gorman et al., 2010) (2), and these incoming proteins tend to stack up at the lesions because they cannot be pushed past the lesion-bound complex. We do not interpret this MutLa "stacking behavior" seen in the single-tethered curtain assays as oligomerization of MutLa at lesion-bound MutSa, because when the same experiments are conducted on double-tethered curtains (in the absence of buffer flow) additional diffusing molecules of MutLa do not appear to oligomerize after formation of the initial lesion-bound MutSa/MutLa complex, and under these conditions 90% of all observed lesion-bound complexes display blinking of the QD-MutLa (see below). For the MutSa target search, MutSa-only nucleotide chase experiments, and spontaneous mismatch escape experiments, >95% of all QD-MutSa molecules displayed blinking, and only these blinking molecules were used for analysis. For the MutLa target search experiments, ~74% of all QD-MutLa scanning the DNA also exhibited blinking behavior, and we confined our analysis to these proteins, although as previously reported the nonblinking fraction of QD-MutLa also diffused along the DNA by 1D diffusion, and we saw no obvious differences in the behavior of blinking versus nonblinking QD-MutLa complexes. For the MutSa/MutLa complex ATP chase experiments, 100% of the QD-MutSa (N = 39/39) and 90% of the QD-MutLa (N = 35/39) within the mismatch-bound MutSa/MutLa complexes exhibited blinking behavior consistent with single proteins. The remaining 10% of the MutSa/MutLa complexes displayed no blinking by QD-MutLa (N = 4/39); of these four nonblinking complexes, three were released from the mismatches upon the injection of ATP and began scanning the flanking DNA by 1D diffusion, and the fourth remained stationary at the lesion. Analysis of intersite transfer using the crisscrossed DNA curtains was confined only to those molecules of QD-MutSa, QD-MutLa, or MutSa/QD-MutLa complexes that exhibited QD blinking.
[0196] 9. Spontaneous mismatch release by MutSa. QD-tagged proteins were bound to mismatch-bearing DNA in double-tethered curtains in buffer containing 20 mM Tris [pH 7.8], 50 mM NaCl, 1 mM ADP, 1 mM MgCl2, 1 mM DTT, and 4 mg ml-1 BSA. The NaCl concentration was then raised to 150 mM to promote dissociation of nonspecifically bound proteins, and the remaining mismatch-bound molecules of MutSa were monitored continuously for a period of 10-15 minutes at an acquisition rate of 5 frames per second (200-msec integration). The QD-MutSa signals were then tracked, and escape from the lesions was defined as three contiguous frames outside 3 standard deviations of the tracking noise; the probability of falsely identifying an escape event is on the order of $10^{-8}$. Of 76 proteins observed under these conditions, 53 remained bound to the lesions and did not escape the lesions (within experimental resolution as defined by 3 standard deviations from the tracking noise). The remaining 23 proteins showed clear 1D excursions away from the lesions, and of these 13 were analyzed by particle tracking, yielding a total of 95 lesion escape/return events; the remaining 10 proteins were not tracked because they collided with other nonspecifically bound proteins on the DNA. Analysis of the 95 excursions yielded a mean observed excursion distance of 3,134 bp and a mean observed excursion time of 30.7 s, and these values were in good agreement with theoretical expectations.
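The escape criterion (three contiguous frames outside three standard deviations of the tracking noise) and the associated false-positive estimate can be sketched as below. The function names are hypothetical, the end of an excursion is assumed to be the first frame back inside the window, and the false-positive estimate assumes independent Gaussian tracking noise from frame to frame; the text states only the criterion and the order of magnitude.

```python
import math


def find_escape_frames(positions, lesion_position, sigma, n_sigma=3.0, run_length=3):
    """Return the first frame index of every run of >= run_length frames
    lying more than n_sigma * sigma away from the lesion."""
    escapes = []
    run = 0
    for i, x in enumerate(positions):
        if abs(x - lesion_position) > n_sigma * sigma:
            run += 1
            if run == run_length:
                # Report where the qualifying run started.
                escapes.append(i - run_length + 1)
        else:
            # Back inside the window: the excursion (if any) has ended.
            run = 0
    return escapes


def false_escape_probability(n_sigma=3.0, run_length=3):
    """Chance that a stationary protein yields run_length consecutive
    outliers, assuming independent Gaussian noise in each frame."""
    single_frame = math.erfc(n_sigma / math.sqrt(2.0))  # two-sided tail
    return single_frame ** run_length
```

With the default arguments the single-frame tail probability is about 0.0027, so three consecutive outliers occur by chance with a probability near 2 x 10^-8, consistent with the order of magnitude quoted in the text.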
[0197] 10. Protein dissociation with high salt chases. Proteins were bound to the DNA, and the number of proteins present was determined from the resulting images of the DNA. Video acquisition was then terminated, and 700 μl of reaction buffer containing either 300 mM NaCl (for MutSa) or 700 mM NaCl (for MutLa only or the MutSa/MutLa complex) was then flushed through the sample chamber at a flow rate of 0.2 ml min-1. Images were collected from the same field and used to determine the number of proteins that remained on the DNA after the high salt washes.
[0198] 11. Intersite transfer assays. Crisscrossed curtains were made as described above for double-tethered curtains (8), with the exception that the DNA was sequentially injected from the two separate inlet channels (Fig. 26 & Fig. 37). The distance between the two DNA molecules was estimated by treating them as harmonic chains suspended above a reflective surface at a height equivalent to that of the nanofabricated barriers (Fig. 37).
[0199] Proteins were tracked in the absence of buffer flow. The tracking data were used to define the axes of the crisscrossed DNA molecules and the position of the intersection, which were verified by staining with YOYO1. In cases where proteins showed obvious diffusion on the second DNA molecule, the tracking data were fit to two lines using a least squares algorithm. When there was no apparent intersite transfer onto the second DNA, the positions of the two DNA molecules were defined separately. The first DNA molecule was defined by fitting the protein tracking data to a line; the second DNA molecule was defined by fitting the trace of another protein molecule that diffused on the second DNA molecule. The position of the intersection was calculated from the location of the lines. Tracking data were then centered so that the intersection was at $r_0 = (0, 0)$. Each data point was assigned to the intersection or one of the four arms of the intersection as follows:
1. Calculate the uncertainty of the intersection, $\sigma_{r_0}$, based on the uncertainty of the two lines of DNA molecules.
2. Calculate the distance of each data point of the protein, $r_i$, to each arm: $d_1$, $d_2$, $d_3$, and $d_4$.
3. Given the position and uncertainty of the intersection ($r_0$, $\sigma_{r_0}$) and the data point ($r_i$, $\sigma_{r_i}$) (where $\sigma_{r_i}$ is the tracking precision of the protein), and a confidence level, judge if the data point is within the intersection. If so, assign the arbitrary designation '0' to this data point.
4. If the data point is outside the intersection area, assign it to one of the four DNA "arms" based on the minimal value among $d_1$, $d_2$, $d_3$, and $d_4$. Assign '1', '2', '3', or '4' accordingly to this data point.
[0200] Intersite transfer was quantified based on the above assignment. If the tracking and assignment yielded the numeric sequence (1111110022033330111), where 1-3 correspond to different "arms" of the crisscrossed DNA relative to the DNA intersection, which is assigned as 0, we know that the protein underwent two intersite transfer events: from 1 to 0 to 2, and from 2 to 0 to 3, where "arms" 1 and 3 are on one DNA molecule and "arms" 2 and 4 are on another DNA molecule. Fig. 26 G-J show color-coded examples of these assignments.
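The counting rule applied to the arm assignments can be sketched as follows, using the example sequence from the text. The dictionary and function names are hypothetical, and frames assigned to the intersection (label 0) are simply skipped, which is an assumption about how ambiguous frames are handled.

```python
# Arms 1 and 3 lie on one DNA molecule; arms 2 and 4 lie on the other.
ARM_TO_DNA = {1: "A", 3: "A", 2: "B", 4: "B"}


def count_intersite_transfers(assignments):
    """Count changes of DNA molecule in a sequence of arm labels (0 = intersection)."""
    transfers = 0
    current = None
    for label in assignments:
        if label == 0:
            continue  # inside the intersection the DNA molecule is ambiguous
        dna = ARM_TO_DNA[label]
        if current is not None and dna != current:
            transfers += 1
        current = dna
    return transfers


# Example sequence quoted in the text.
sequence = [int(c) for c in "1111110022033330111"]
print(count_intersite_transfers(sequence))  # prints 2
```

The final move from arm 3 through the intersection to arm 1 is not counted, because both arms belong to the same DNA molecule.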
[0201] 12. Origin crossings by a random walker. Fig. 33 shows the results from Monte Carlo simulations of a freely diffusing molecule with equally spaced absorbing boundaries (e.g., nicks flanking either side of a mismatch). Boundary distances ranged from 2 to 190 steps away from the origin in the simulations, and 100,000 traces were generated for each boundary distance by selecting forward and backward steps with equal probability. The number of times the origin was encountered before the boundaries was recorded, as well as the average number of steps necessary to encounter a boundary. As expected with a freely diffusing molecule, the average number of steps needed to travel a distance of $N$ steps away was $N^2$ (Fig. 33a). Notably, the simulated traces also reveal that a molecule with equally spaced boundaries $N$ steps away will on average cross the origin $N - 1$ times (Fig. 33b).
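A minimal Monte Carlo sketch of this kind of simulation is shown below. It is not the simulation code behind Fig. 33; the function name, the seed, and the much smaller number of traces in the usage note are illustrative assumptions chosen so that the example runs quickly.

```python
import random


def simulate_walks(boundary, n_traces, seed=0):
    """Simulate unbiased 1D walks from the origin with absorbing boundaries
    at +boundary and -boundary.

    Returns (mean number of steps to absorption, mean number of returns
    to the origin before absorption).
    """
    rng = random.Random(seed)
    total_steps = 0
    total_returns = 0
    for _ in range(n_traces):
        position = 0
        while abs(position) < boundary:
            # Forward and backward steps are chosen with equal probability.
            position += 1 if rng.random() < 0.5 else -1
            total_steps += 1
            if position == 0:
                total_returns += 1
    return total_steps / n_traces, total_returns / n_traces
```

Calling `simulate_walks(10, 20000)` should give a mean step count close to 100 (N squared) and a mean number of returns close to 9 (N - 1), within sampling error.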
[0202] This relationship can be further demonstrated by estimating the number of returns that occur in a given time (or number of steps), subject to particular boundary conditions. To estimate these values, we will use the conditional splitting probabilities, $\varepsilon_{n,m}$, the conditional mean first passage time (CMFPT), $\tau_{n,m}$, and a discrete crossing statistic, $C_m(L)$. $\varepsilon_{n,m}(x)$ is the probability that a walker starting at position $x$ reaches a site located $n$ away before reaching an opposing site $m$ away. The splitting probabilities satisfy Laplace's equation, $\nabla^2 \varepsilon_{n,m}(x) = 0$, subject to the boundary conditions $\varepsilon_{n,m}(n) = 1$, $\varepsilon_{n,m}(m) = 0$ (8), which when solved yields:

$$\varepsilon_{n,m}(x) = \frac{m - x}{m - n}$$

Similarly, $\tau_{n,m}(x)$ is the average time necessary for a walker starting at position $x$ to reach a site located $n$ away before reaching an opposing site $m$ away. $\tau_{n,m}(x)$ can be obtained from Poisson's equation, $D \nabla^2 \left[\varepsilon_{n,m}(x)\,\tau_{n,m}(x)\right] = -\varepsilon_{n,m}(x)$, subject to the boundary conditions $\varepsilon_{n,m}(m)\,\tau_{n,m}(m) = 0$, $\varepsilon_{n,m}(n)\,\tau_{n,m}(n) = 0$, which yields:

$$\tau_{n,m}(x) = \frac{1}{6D}\left[(2m - n - x)(x - n)\right]$$

[0203] As an example, consider a walker beginning at the origin in between two absorbing boundaries located a distance $N$ away. Once the walker takes a step away from the origin towards $\pm N$, it has a probability of $N^{-1}$ to reach $\pm N$, i.e., $m = 0$, $n = \pm N$, and $x = \pm 1$. That is to say, the walker will, on average, fail to reach $\pm N$ a total of $N - 1$ times before reaching it on the $N$th try, and thereby return to the origin $N - 1$ times. Furthermore, during each failed excursion, a time $(N + 1)/(3D)$ elapses, and on the final successful excursion, a time $(N^2 + 2)/(6D)$ elapses, for an overall time for this process of $\Delta t = N^2/(2D)$.
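The bookkeeping in this example can be checked with exact arithmetic. The sketch below evaluates the splitting probability $(m - x)/(m - n)$ and the CMFPT $(2m - n - x)(x - n)/(6D)$, and adds the mean time of the initial unit step away from the origin, taken as $1/(2D)$; treating that first step separately is an assumption made here so that the per-excursion times can be compared with the totals quoted in the text. Function names are illustrative.

```python
from fractions import Fraction


def splitting_probability(n, m, x):
    """Probability that a walker starting at x reaches n before m."""
    return Fraction(m - x, m - n)


def cmfpt(n, m, x, D):
    """Conditional mean first-passage time from x to n, given that n is
    reached before m."""
    return Fraction((2 * m - n - x) * (x - n)) / (6 * D)


def excursion_times(N, D):
    """Return (failed, successful, total) mean times for a walker released
    at the origin between absorbing boundaries at +N and -N."""
    step = Fraction(1) / (2 * D)                  # mean time of one unit step
    p_success = splitting_probability(N, 0, 1)    # equals 1 / N
    failed = step + cmfpt(0, N, 1, D)             # return to the origin
    successful = step + cmfpt(N, 0, 1, D)         # reach the boundary
    mean_failures = (1 - p_success) / p_success   # equals N - 1
    return failed, successful, mean_failures * failed + successful
```

For N = 10 and D = 1/2 (one step per unit time) this gives 22/3, 34, and 100, matching $(N + 1)/(3D)$, $(N^2 + 2)/(6D)$, and $N^2/(2D)$.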
[0204] The final metric for redundancy in random walks is the number of crossing events, $C_m(L)$, at a particular location $m$, given a predetermined number of steps, $L$. For this calculation we describe the motion of a walker in discrete space and time. $C_m(L)$ is then related to the sum of the probability, $P(m, n)$, of the walker occupying the $m$th site at each step up to the $n$th step. $P(m, n)$ is given by the binomial distribution (9):

$$P(m, n) = \frac{n!}{\left(\frac{n + m}{2}\right)!\,\left(\frac{n - m}{2}\right)!}\left(\frac{1}{2}\right)^{n}$$
[0205] If we are interested in the number of times a protein beginning at the origin will return to the origin in $L$ steps, we simply sum $P(0, n)$ from 0 to $L$, allowing $n$ to take on only even integers, because only paths of even length can return to the origin, as given by:

$$C_0(L) = \sum_{\substack{n = 0 \\ n\ \mathrm{even}}}^{L} P(0, n)$$
[0206] This relationship holds via simulation, confirming that a walker will cross its origin $C_0(L)$ times during a walk of $L$ steps (Fig. 27c). Furthermore, to count the number of crossings at a distal site $a$, $C_a(L)$, in $L$ steps, we again count only the even paths to account for the fact that $P(a, n)$ includes both $a$ and $-a$.
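The origin-crossing sum can be evaluated directly, assuming the standard binomial occupation probability for an unbiased walker taking unit steps. The function names are hypothetical, and including the n = 0 term (the starting position) follows the stated summation range of 0 to L; both choices are assumptions of this sketch.

```python
from math import comb


def occupation_probability(m, n):
    """Probability that an unbiased walker is at site m after n unit steps."""
    if abs(m) > n or (n + m) % 2:
        return 0.0  # unreachable: too far away, or wrong parity
    return comb(n, (n + m) // 2) / 2 ** n


def expected_origin_visits(L):
    """Sum of the occupation probability at the origin over even n from 0 to L."""
    return sum(occupation_probability(0, n) for n in range(0, L + 1, 2))
```

For instance, `expected_origin_visits(4)` adds 1 (n = 0), 1/2 (n = 2), and 3/8 (n = 4) to give 1.875.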
[0207] 13. Spontaneous lesion escape and return. In modeling the transport process of MutSa on DNA, we first assume that the protein makes single base pair steps along the DNA. One of the consequences of the probability density $P(m, n)$ is that the average displacement grows as the square root of the number of steps. That is to say, a protein takes on average 100 steps to reach an average distance of 10 bp from its starting position. To connect this assumption with the experimental measurements, we impose an average stepping time, $\tau_{step}$, such that after $n$ steps a time $t = n\tau_{step}$ has elapsed.
[0208] Next, we designate two types of quantities: microscopic (experimentally observable within our resolution limits) and sub-microscopic (inferred from microscopic quantities and statistical analysis). Consider the following example to highlight these two categories. During an experiment, two successive measurements of position, $x_1(t_1)$ and $x_2(t_2)$, would yield a microscopic displacement $d = x_2 - x_1$ and a microscopic time step $\Delta t = t_2 - t_1$. From the above, we would then infer that the protein made $d^2$ (or $\Delta t/\tau_{step}$) submicroscopic steps. Then, using Einstein's relation for the mean squared displacement, we can relate $\tau_{step}$ to the microscopically measured diffusion coefficient as $\tau_{step} = (2D)^{-1}$, when $D$ and $d^2$ are given in identical units.
[0209] We define proteins that remain in the local environment of the lesion as microscopically bound (MB) to the mismatches. The local environment is defined as extending three standard deviations on either side of the lesion (840 bp; determined by examination of proteins stably bound to DNA, $\sigma$ = 35 nm ≈ 140 bp). A protein was considered to have released the mismatch when three consecutive position measurements fell outside of the MB region. The probability of this observation in the event that the protein remained at the lesion is $\mathcal{O}(10^{-8})$.
[0210] We examined 95 events in which MutSα spontaneously dissociated from the mismatches and diffused a short distance along the DNA before rebinding. For each of these events we measured two microscopic values: the excursion length, $N_{ex}$, which is defined as the maximum distance that the protein travels away from the lesions during an excursion, and $\tau_{ex}$, which corresponds to the total time that the protein spends diffusing on the DNA before rebinding to the lesions (Fig. 34). Given these definitions, we can predict the probability of an excursion of length $N$ from the splitting probabilities above as $P(N_{ex} = N) = \varepsilon_{1 \to N}\,(1 - \varepsilon_{N \to N+1})$, where $\varepsilon_{x \to y}$ is the probability of reaching site $y$ from site $x$ before returning to the origin. This is the probability of reaching site $N$, but not $N + 1$, before returning to the origin. Inserting the solutions for the splitting probabilities from above gives:
$P(N_{ex} = N) = \frac{1}{N(N+1)}$
[0211] It is inaccurate to define an "average" theoretical length for this process because the distribution of excursions is divergent and can in principle extend to infinity. However, from the splitting probabilities we recognize that the probability of release from the mismatch and escaping the MB region without reencountering a lesion is only ~0.25%. Furthermore, of the proteins that escape the MB region, roughly 85% will return to the lesions before reaching a distance of 2.5 kb, and 90% of the total excursions are expected to be 10 bp or less (Fig. 34). Thus the majority of escape/return events should occur within the submicroscopic regime and would not be detectable in our microscopic observations.
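The percentages quoted above can be checked with a short calculation. The sketch assumes the standard splitting probabilities for an unbiased nearest-neighbour walk (a walker at site 1 reaches site $N$ before the origin with probability $1/N$), which gives an excursion-length distribution of $1/(N(N+1))$; the function names are hypothetical.

```python
from fractions import Fraction


def p_excursion(n):
    """P(N_ex = n): reach n from site 1 (1/n), then return to the
    origin before reaching n + 1 (1/(n + 1))."""
    return Fraction(1, n) * Fraction(1, n + 1)


def p_excursion_at_most(n):
    """P(N_ex <= n), the telescoping sum of p_excursion."""
    return 1 - Fraction(1, n + 1)


def p_escape(boundary):
    """Probability that an excursion reaches `boundary` before returning."""
    return Fraction(1, boundary)


def p_return_before(limit, boundary):
    """Given the walker has reached `boundary`, probability that it
    returns to the origin before reaching `limit`."""
    return 1 - Fraction(boundary, limit)


if __name__ == "__main__":
    print(float(p_escape(420)))               # escape from the 420 bp MB half-width
    print(float(p_return_before(2500, 420)))  # return before 2.5 kb
    print(float(p_excursion_at_most(10)))     # excursions of 10 bp or less
```

With these assumptions the three values come out near 0.24%, 83%, and 91%, in line with the approximate figures of ~0.25%, roughly 85%, and 90% given in the text.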
[0212] We can also calculate the mean time associated with an excursion of length $N$ by adding the CMFPT to reach $N$ from the origin to the CMFPT to reach the origin from $N$: $\tau_{ex}(N_{ex} = N) = \tau_{1 \to N} + \tau_{N \to 0}$, which gives:
$\tau_{ex}(N_{ex} = N) = \frac{2N(N+1) - 1}{6D}$
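The excursion-time expression can be sketched and partly checked numerically as below. The sketch assumes unit steps with a stepping time of $1/(2D)$ and the textbook conditional mean first-passage results for an unbiased walk: $(N^2 - 1)/3$ steps from site 1 to site $N$ without touching the origin, and $((N+1)^2 - 1)/3$ steps from $N$ back to the origin without touching $N + 1$. Their sum is $(2N(N+1) - 1)/3$ steps. The function names, seed, and trial count are hypothetical.

```python
import random


def excursion_steps(n):
    """Mean number of unit steps in an excursion whose maximum is n."""
    return (2 * n * (n + 1) - 1) / 3


def excursion_time(n, diffusion_coefficient):
    """Mean excursion time (2n(n+1) - 1)/(6D), with D in bp^2/s."""
    return (2 * n * (n + 1) - 1) / (6.0 * diffusion_coefficient)


def simulate_outbound_steps(n, n_trials, seed=0):
    """Monte Carlo estimate of the mean steps from site 1 to site n,
    keeping only the walks that reach n before returning to 0."""
    rng = random.Random(seed)
    total_steps = 0
    successes = 0
    for _ in range(n_trials):
        position, steps = 1, 0
        while 0 < position < n:
            position += 1 if rng.random() < 0.5 else -1
            steps += 1
        if position == n:
            total_steps += steps
            successes += 1
    return total_steps / successes


if __name__ == "__main__":
    n = 5
    print(simulate_outbound_steps(n, 50000), (n * n - 1) / 3)
    print(excursion_steps(10))
```

The simulated outbound leg should approach $(N^2 - 1)/3$, which supports the first term of the sum; the return leg follows by mirror symmetry.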
[0213] This value includes time spent within the MB region; therefore, we use a modified excursion time, $\tau_{ex}(N > 420)$, which is the time of the excursion spent outside of the MB region, given as:
Figure imgf000102_0001
[0214] From this, we can show that of the proteins that escape the MB region, ~50% will spend 1 second or less away from the MB region, and the same 85% from above will spend less than 13 seconds outside the MB region before rebinding the lesions. As expected, 90% of the total population will spend less than 250 µs away from the lesion and would fall outside the microscopically observable regime.
[0215] 14. Apparent versus direct 3D binding. A protein was categorized as having undergone a 1D search only if there were at least two frames at the beginning of the diffusion trajectory that were at least three standard deviations away from the location of the mismatch. If the proteins initially appeared within this resolution limit, then they were categorized as having undergone an apparent direct 3D binding event. Events categorized as apparent 3D binding could be attributed to 1D sliding on a submicroscopic scale, and this is likely true for many of these events. Therefore we sought to estimate what fraction of events ascribed to apparent 3D binding could be occurring through submicroscopic sliding and what fraction were likely to occur through direct 3D binding in the absence of submicroscopic 1D diffusion. To estimate the distribution of possible lengths, $P(l_{en})$, which result in target binding, we calculate the probability that a protein initially bound at site $x_0$ on the DNA finds a target (located at the origin) before dissociation. This probability, $P(x_0)$, is the product of two densities: the probability that a protein, which binds the DNA at time $t = 0$, is still bound at a later time $t$, $\exp(-k_{off} t)$, where $k_{off}$ defines the lifetime of the nonspecifically bound protein, and the probability that a protein starting at site $x_0$ at time zero encounters the mismatch for the first time at time $t$, $j(0, t \mid x_0, 0)$ (10).
[0216] Here $j(0, t \mid x_0, 0)$ is the concentration flux to the origin, provided the origin (mismatch) is an absorbing boundary, $C(0, t) = 0$, and the end of the DNA at $x = L$ acts as a reflecting boundary, $\partial C(x, t)/\partial x\,|_{x=L} = 0$ (11). We then formulate $P(x_0)$ as the integral of the conditional probability of the protein finding the mismatch before dissociation over all time, yielding:
Figure imgf000103_0001
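For orientation, the integral described above has a well-known closed form for this boundary-value problem. The sketch below assumes the standard Laplace-transform result for diffusion with an absorbing boundary at the origin and a reflecting boundary at $x = L$, namely $\cosh(q(L - x_0))/\cosh(qL)$ with $q = \sqrt{k/D}$, where $k$ is the dissociation rate. It is not taken from the equation image above and may differ from it in notation; the function and parameter names are hypothetical.

```python
import math


def target_finding_probability(x0, diffusion_coefficient, k_off, dna_length):
    """Probability of reaching the target at 0 before dissociating,
    for a protein that lands at x0 on DNA of length dna_length.

    Evaluates cosh(q (L - x0)) / cosh(q L) with q = sqrt(k_off / D).
    """
    if not 0 <= x0 <= dna_length:
        raise ValueError("x0 must lie on the DNA")
    q = math.sqrt(k_off / diffusion_coefficient)
    # Rewritten with decaying exponentials so long DNA does not overflow cosh.
    numerator = math.exp(-q * x0) * (1.0 + math.exp(-2.0 * q * (dna_length - x0)))
    denominator = 1.0 + math.exp(-2.0 * q * dna_length)
    return numerator / denominator


if __name__ == "__main__":
    # Illustrative parameters only: D in bp^2/s, k_off in 1/s, lengths in bp.
    for x0 in (0, 420, 2500, 10000):
        print(x0, round(target_finding_probability(x0, 1.5e5, 0.05, 40000), 4))
```

The probability equals 1 at the target and decays with landing distance on a length scale of $\sqrt{D/k}$.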
[0217] At low concentrations, there is likely only one protein searching the DNA at a time. This protein has an equal probability of starting its search at any site on the DNA. Therefore, we calculate the probability of observing a particular encounter length, $l_{en}$, as the product of the probability of landing at a particular site and $P(x_0)$, where the normalization expresses the fact that eventually (potentially after dissociation and rebinding) the target will be found.
Figure imgf000103_0002
[0218] We can then estimate the probabilities of apparent 3D binding events, $P(3D)$, and observable 1D binding events, $P(1D)$, as:
Figure imgf000104_0001
[0219] Based on current spatial resolution limits, and our categorization of target binding based on three standard deviations from the mismatch, these calculations predict that ~80% of the experimentally observed events would have been categorized as occurring through 1D sliding, and the remaining ~20% of observed events would have been categorized as 3D collisions; these results would have been the same for both MutSα and MutLα because both proteins have relatively long lifetimes (>20 sec) on nonspecific DNA. The experimental data do not reflect these predicted distributions; rather, 57.5% and 45% of the experimentally observed target binding events were categorized as direct 3D target binding for MutSα and MutLα, respectively. As a simple approximation, the discrepancy between the model and the experimental results suggests that ~38% of MutSα targeting events and ~25% of MutLα targeting events can be explained by direct 3D binding in the absence of any submicroscopic 1D sliding.
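The comparison between the predicted and observed fractions can be sketched as follows. The sketch assumes uniform landing along the DNA and a finding probability proportional to $\cosh(q(L - x))$ with $q = \sqrt{k/D}$ (the standard result for an absorbing target and a reflecting DNA end), so that the fraction of successful searches starting within the resolution limit $a$ is $1 - \sinh(q(L - a))/\sinh(qL)$. The diffusion coefficient, dissociation rate, and DNA length in the usage lines are illustrative assumptions, not values stated in this Example; only the 420 bp limit and the observed percentages come from the text.

```python
import math


def apparent_3d_fraction(resolution_bp, diffusion_coefficient, k_off, dna_length):
    """Fraction of target-finding events that start within resolution_bp
    of the target and would therefore be scored as apparent 3D binding."""
    q = math.sqrt(k_off / diffusion_coefficient)
    return 1.0 - math.sinh(q * (dna_length - resolution_bp)) / math.sinh(q * dna_length)


def direct_3d_fraction(observed_3d, predicted_apparent_3d):
    """Simple estimate of true direct 3D binding: the observed 3D
    fraction minus the part explained by submicroscopic sliding."""
    return observed_3d - predicted_apparent_3d


if __name__ == "__main__":
    # Assumed parameters: D = 1.5e5 bp^2/s, k_off = 0.05 1/s, L = 40000 bp.
    predicted = apparent_3d_fraction(420, 1.5e5, 0.05, 40000)
    print(f"predicted apparent 3D fraction: {predicted:.2f}")
    # Observed fractions from the text, with the ~20% prediction quoted there.
    print(f"MutS-alpha direct 3D: {direct_3d_fraction(0.575, 0.20):.3f}")
    print(f"MutL-alpha direct 3D: {direct_3d_fraction(0.45, 0.20):.3f}")
```

With these assumed parameters the predicted apparent 3D fraction is close to 20%, and the subtraction reproduces the ~38% and ~25% estimates given above.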
References for this Example
B1. Modrich P (2006) Mechanisms in eukaryotic mismatch repair. J Biol Chem 281:30305-30309.
B2. Jiricny J (2006) The multifaceted mismatch-repair system. Nat Rev Mol Cell Biol 7:335-346.
B3. Kunkel TA, Erie DA (2005) DNA mismatch repair. Annu Rev Biochem 74:681-710.
B4. Kadyrov FA, Dzantiev L, Constantin N, Modrich P (2006) Endonucleolytic function of MutLalpha in human mismatch repair. Cell 126:297-308.
B5. Kadyrov FA, et al. (2007) Saccharomyces cerevisiae MutLalpha is a mismatch repair endonuclease. J Biol Chem 282:37181-37190.
B6. Kunkel TA (2004) DNA replication fidelity. J Biol Chem 279:16895-16898.
B7. Halford SE (2009) An end to 40 years of mistakes in DNA-protein association kinetics? Biochem Soc Trans 37:343-348.
B8. von Hippel PH, Berg OG (1989) Facilitated target location in biological systems. J Biol Chem 264:675-678.
B9. Halford SE, Marko JF (2004) How do site-specific DNA-binding proteins find their targets? Nucleic Acids Res 32:3040-3052.
B10. Hager GL, McNally JG, Misteli T (2009) Transcription dynamics. Mol Cell 35:741-753.
B11. Tang C, Iwahara J, Clore GM (2006) Visualization of transient encounter complexes in protein-protein association. Nature 444:383-386.
B12. Blainey PC, van Oijen AM, Banerjee A, Verdine GL, Xie XS (2006) A base-excision DNA-repair protein finds intrahelical lesion bases by fast sliding in contact with DNA. Proc Natl Acad Sci USA 103:5752-5757.
B13. Gorman J, Greene EC (2008) Visualizing one-dimensional diffusion of proteins along DNA. Nat Struct Mol Biol 15:768-774.
B14. Gorman J, et al. (2007) Dynamic basis for one-dimensional DNA scanning by the mismatch repair complex Msh2-Msh6. Mol Cell 28:359-370.
B15. Gorman J, Plys AJ, Visnapuu ML, Alani E, Greene EC (2010) Visualizing one-dimensional diffusion of eukaryotic DNA repair factors along a chromatin lattice. Nat Struct Mol Biol 17:932-938.
B16. Fazio T, Visnapuu ML, Wind S, Greene EC (2008) DNA curtains and nanoscale curtain rods: High-throughput tools for single molecule imaging. Langmuir 24:10524-10531.
B17. Gorman J, Fazio T, Wang F, Wind S, Greene EC (2010) Nanofabricated racks of aligned and anchored DNA substrates for single-molecule imaging. Langmuir 26:1372-1379.
B18. Kolodner RD, Mendillo ML, Putnam CD (2007) Coupling distant sites in DNA during DNA mismatch repair. Proc Natl Acad Sci USA 104:12953-12954.
B19. Allen DJ, et al. (1997) MutS mediates heteroduplex loop formation by a translocation mechanism. EMBO J 16:4467-4476.
B20. Blackwell LJ, Martik D, Bjornson KP, Bjornson ES, Modrich P (1998) Nucleotide-promoted release of hMutSalpha from heteroduplex DNA is consistent with an ATP-dependent translocation mechanism. J Biol Chem 273:32055-32062.
B21. Gradia S, Acharya S, Fishel R (1997) The human mismatch recognition complex hMSH2-hMSH6 functions as a novel molecular switch. Cell 91:995-1005.
B22. Gradia S, et al. (1999) hMSH2-hMSH6 forms a hydrolysis-independent sliding clamp on mismatched DNA. Mol Cell 3:255-261.
B23. Mendillo ML, Mazur DJ, Kolodner RD (2005) Analysis of the interaction between the Saccharomyces cerevisiae MSH2-MSH6 and MLH1-PMS1 complexes with DNA using a reversible DNA end-blocking system. J Biol Chem 280:22245-22257.
B24. Obmolova G, Ban C, Hsieh P, Yang W (2000) Crystal structures of mismatch repair protein MutS and its complex with a substrate DNA. Nature 407:703-710.
B25. Junop MS, Obmolova G, Rausch K, Hsieh P, Yang W (2001) Composite active site of an ABC ATPase: MutS uses ATP to verify mismatch recognition and authorize DNA repair. Mol Cell 7:1-12.
B26. Wang H, et al. (2003) DNA bending and unbending by MutS govern mismatch recognition and specificity. Proc Natl Acad Sci USA 100:14822-14827.
B27. Jeong C, et al. (2011) MutS switches between two fundamentally distinct clamps during mismatch repair. Nat Struct Mol Biol 18:379-385.
B28. Bagchi B, Blainey PC, Xie XS (2008) Diffusion constant of a nonspecifically bound protein undergoing curvilinear motion along DNA. J Phys Chem B 112:6282-6284.
B29. Blainey PC, et al. (2009) Nonspecifically bound proteins spin while diffusing along DNA. Nat Struct Mol Biol 16:1224-1229.
B30. Schurr JM (1979) The one-dimensional diffusion coefficient of proteins absorbed on DNA. Hydrodynamic considerations. Biophys Chem 9:413-414.
B31. Lamers MH, et al. (2000) The crystal structure of DNA mismatch repair protein MutS binding to a G x T mismatch. Nature 407:711-717.
B32. Mendillo ML, et al. (2010) Probing DNA- and ATP-mediated conformational changes in the MutS family of mispair recognition proteins using deuterium exchange mass spectrometry. J Biol Chem 285:13170-13182.
B33. Warren JJ, et al. (2007) Structure of the human MutSalpha DNA lesion recognition complex. Mol Cell 26:579-592.
B34. Cho W-K, et al. (2012) ATP alters the diffusion mechanics of MutS on mismatched DNA. Structure 20:1264-1274.
B35. Vuzman D, Polonsky M, Levy Y (2010) Facilitated DNA search by multidomain transcription factors: Cross talk via a flexible linker. Biophys J 99:1202-1211.
B36. Mirny L, et al. (2009) How a protein searches for its site on DNA: The mechanism of facilitated diffusion. J Phys A 42:434013.
B37. Slutsky M, Mirny LA (2004) Kinetics of protein-DNA interaction: Facilitated target location in sequence-dependent potential. Biophys J 87:4021-4035.
B38. Gorski SA, Dundr M, Misteli T (2006) The road much traveled: Trafficking in the cell nucleus. Curr Opin Cell Biol 18:284-290.
B39. Li F, Tian L, Gu L, Li GM (2009) Evidence that nucleosomes inhibit mismatch repair in eukaryotic cells. J Biol Chem 284:33056-33061.
B40. Hombauer H, Campbell CS, Smith CE, Desai A, Kolodner RD (2011) Visualization of eukaryotic DNA mismatch repair reveals distinct recognition and repair intermediates. Cell 147:1040-1053.
B41. Ban C, Junop M, Yang W (1999) Transformation of MutL by ATP binding and hydrolysis: A switch in DNA mismatch repair. Cell 97:85-97.
B42. Sacho EJ, Kadyrov FA, Modrich P, Kunkel TA, Erie DA (2008) Direct visualization of asymmetric adenine-nucleotide-induced conformational changes in MutL alpha. Mol Cell 29:112-121.
B43. Qiu R, et al. (2012) Large conformational changes in MutS during DNA scanning, mismatch recognition and repair signalling. EMBO J 31:2528-2540.
B44. Efron B, Tibshirani R (1993) An Introduction to the Bootstrap (Chapman and Hall, Inc., New York).
Supplemental References for this Example
1. Gorman J, et al. (2007) Dynamic basis for one-dimensional DNA scanning by the mismatch repair complex Msh2-Msh6. Mol Cell 28(3):359-370.
2. Gorman J, Plys A, Visnapuu M, Alani E, & Greene E (2010) Visualizing one-dimensional diffusion of eukaryotic DNA repair factors along a chromatin lattice. Nat Struct Mol Biol 17(8):932-938.
3. Visnapuu M-L & Greene E (2009) Single-molecule imaging of DNA curtains reveals intrinsic energy landscapes for nucleosome deposition. Nat Struct Mol Biol 16:1056-1062.
4. Efron B & Tibshirani R (1993) An Introduction to the Bootstrap (Chapman and Hall, Inc., New York).
5. Dahan M, et al. (2003) Diffusion dynamics of glycine receptors revealed by single-quantum dot tracking. Science 302(5644):442-445.
6. Yao J, Larson D, Vishwasrao H, Zipfel W, & Webb W (2005) Blinking and nonradiant dark fraction of water-soluble quantum dots in aqueous solution. Proc Natl Acad Sci U S A 102(40):14284-14289.
7. Zhang Q, Li Y, & Tsien R (2009) The dynamic control of kiss-and-run and vesicular reuse probed with single nanoparticles. Science 323(5920):1448-1453.
8. Gorman J, Fazio T, Wang F, Wind S, & Greene E (2010) Nanofabricated racks of aligned and anchored DNA substrates for single-molecule imaging. Langmuir 26:1372-1379.
9. Feller W (1971) An introduction to probability theory and its applications (John Wiley & Sons, Inc.).
10. Van Kampen N (2007) Stochastic Processes in Physics and Chemistry (Elsevier Press).
11. Redner S (2001) A guide to first passage processes (Cambridge University Press).

Claims

What is claimed is:
1. An array comprising:
a) a solid support;
b) a fluid lipid bilayer disposed on the solid support;
c) at least one single-stranded nucleic acid molecule; and
d) a linkage for attaching the nucleic acid molecule to the solid support.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201261745149P 2012-12-21 2012-12-21
US61/745,149 2012-12-21

Publications (1)

Publication Number Publication Date
WO2014099057A1 (en) 2014-06-26

Family

ID=50978985

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/058641 WO2014099057A1 (en) 2012-12-21 2013-09-06 Lipid bilayers for dna molecule organization and uses thereof

Country Status (1)

Country Link
WO (1) WO2014099057A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6228326B1 (en) * 1996-11-29 2001-05-08 The Board Of Trustees Of The Leland Stanford Junior University Arrays of independently-addressable supported fluid bilayer membranes
US20080274905A1 (en) * 2005-09-30 2008-11-06 The Trustees Of Columbia University In The City Of New York Microfluidic cells with parallel arrays of individual dna molecules

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DUTTA ET AL.: "Selective tethering of ligands and proteins to a microfluidically patterned electroactive fluid lipid bilayer array.", LANGMUIR, vol. 26, no. 12, 15 June 2010 (2010-06-15), pages 9835 - 9841 *
GORMAN ET AL.: "Nanofabricated racks of aligned and anchored DNA substrates for single-molecule imaging.", LANGMUIR, vol. 26, no. 2, 19 January 2010 (2010-01-19), pages 1372 - 1379 *
LARSSON ET AL.: "Characterization of DNA immobilization and subsequent hybridization on a 2D arrangement of streptavidin on a biotin-modified lipid bilayer supported on Si02", ANAL CHEM, vol. 75, no. 19, 1 October 2003 (2003-10-01), pages 5080 - 5087 *
VISNAPUU ET AL.: "The importance of surfaces in single-molecule bioscience.", MOL BIOSYST, vol. 4, no. 5, May 2008 (2008-05-01), pages 394 - 403 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112285189A (en) * 2020-09-28 2021-01-29 上海天能生命科学有限公司 Method for remotely controlling electrophoresis apparatus based on image recognition
CN112285189B (en) * 2020-09-28 2021-06-25 上海天能生命科学有限公司 Method for remotely controlling electrophoresis apparatus based on image recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13865472

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13865472

Country of ref document: EP

Kind code of ref document: A1