CN115094127A - Method for in-situ detection of binding position of protein and deoxyribonucleotide - Google Patents

Method for in-situ detection of binding position of protein and deoxyribonucleotide

Publication number
CN115094127A
Authority
CN
China
Prior art keywords
target
protein
strain
dna
transcription factor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210163961.9A
Other languages
Chinese (zh)
Inventor
倪磊
金帆
李飞旋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN202210163961.9A
Publication of CN115094127A
Priority to PCT/CN2022/140081 (WO2023160163A1)

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C07K14/21 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Pseudomonadaceae (F)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/74 Vectors or expression systems specially adapted for prokaryotic hosts other than E. coli, e.g. Lactobacillus, Micromonospora
    • C12N15/78 Vectors or expression systems specially adapted for prokaryotic hosts other than E. coli, e.g. Lactobacillus, Micromonospora for Pseudomonas
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/78 Hydrolases (3) acting on carbon to nitrogen bonds other than peptide bonds (3.5)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y305/00 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5)
    • C12Y305/04 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4)
    • C12Y305/04012 Hydrolases acting on carbon-nitrogen bonds, other than peptide bonds (3.5) in cyclic amidines (3.5.4) dCMP deaminase (3.5.4.12)
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/50 Mutagenesis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/22 Vectors comprising a coding region that has been codon optimised for expression in a respective host
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12R INDEXING SCHEME ASSOCIATED WITH SUBCLASSES C12C - C12Q, RELATING TO MICROORGANISMS
    • C12R2001/00 Microorganisms; Processes using microorganisms
    • C12R2001/01 Bacteria or Actinomycetales; using bacteria or Actinomycetales
    • C12R2001/38 Pseudomonas
    • C12R2001/385 Pseudomonas aeruginosa

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Zoology (AREA)
  • General Health & Medical Sciences (AREA)
  • Wood Science & Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biotechnology (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biochemistry (AREA)
  • Biomedical Technology (AREA)
  • Analytical Chemistry (AREA)
  • Microbiology (AREA)
  • Medicinal Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Immunology (AREA)
  • Plant Pathology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for in-situ detection of the binding position of a protein on deoxyribonucleotide (DNA). The method comprises: constructing a target strain expressing a cytosine deoxynucleotide deaminase-target protein fusion protein and a control strain expressing the target protein alone, wherein neither strain contains a uracil-DNA glycosylase gene; inducing protein expression in both strains and then extracting their genomic DNA; and performing high-throughput sequencing on the genomic DNA, identifying the point mutations in the genomic DNA of each strain, removing the point mutations shared by the target strain and the control strain, and screening the remaining target-strain point mutations for those located in non-coding regions to obtain the target protein binding sites. The method exploits the ability of cytosine deoxynucleotide deaminase to mutate DNA inside bacteria: once a target strain expressing the fusion protein has been constructed, the final result is obtained by genome sequencing alone, which makes the method convenient and fast.

Description

Method for in-situ detection of binding position of protein and deoxyribonucleotide
Technical Field
The invention belongs to the technical field of synthetic biology, and particularly relates to a method for detecting a binding position of a protein and deoxyribonucleotide in situ.
Background
The binding site of a transcriptional regulator protein on genomic DNA is usually a conserved nucleotide sequence, and locating the binding sites of a transcription factor of interest on the genome helps researchers identify which genes that transcription factor regulates. Methods for detecting protein-DNA interactions are therefore key to analyzing the gene regulatory circuits and signal transduction pathways of living systems.
At present, the most common method for detecting protein-DNA interactions is chromatin immunoprecipitation followed by sequencing (ChIP-seq). In bacteria, a transcription factor fused to a tag sequence is first overexpressed; the DNA is then fragmented, and the fragments bound by the transcription factor are enriched and isolated; finally, the sequences of these fragments are determined by sequencing. In addition, the Calling-cards method has been developed, in which the transcription factor of interest is fused to the transposon-guiding protein Sir4 so that the transposon inserts into the genome at regions bound by the transcription factor; the insertion positions are then determined by restriction digestion, circularization and sequencing, which reveals the binding sites of the transcription factor. The Calling-cards method is currently used by relatively few researchers.
The existing chromatin immunoprecipitation sequencing technology has the following drawbacks: 1) the procedure is lengthy and labor-intensive; 2) it involves many steps, so small errors in individual steps can accumulate until the final data are unsatisfactory, and because each error is small, troubleshooting a failed experiment is difficult; 3) many technical details require attention during the experiment, for example how the protein is fixed, how the DNA is purified, and the sonication conditions used to fragment the DNA all strongly affect the final result; many experimenters find these requirements hard to master quickly, which limits wider adoption of the method.
It is therefore important to find a method for rapidly detecting protein-DNA interactions.
Disclosure of Invention
To overcome the defects of the prior art, the invention aims to provide a method for in-situ detection of the binding position of a protein on deoxyribonucleotide (hereinafter abbreviated as DNA). Through theoretical derivation and optimization of the experimental method, the invention achieves a genome-wide scan of the binding sites of a target protein, uses bioinformatics to predict the conserved nucleotide sequence bound by the protein, and detects the interaction between the target protein and DNA inside microbial cells. The specific technical scheme is as follows:
The invention provides a method for in-situ detection of the binding position of a protein on deoxyribonucleotide, comprising the following steps:
1) constructing a target strain expressing a cytosine deoxynucleotide deaminase-target protein fusion protein and a control strain expressing the target protein;
wherein neither the target strain nor the control strain contains a uracil-DNA glycosylase gene;
2) inducing protein expression in the target strain and the control strain, and then extracting the genomic DNA of each strain;
3) performing high-throughput sequencing on the genomic DNA of the target strain and of the control strain, identifying the point mutations in the genomic DNA of each strain, removing the point mutations shared by the target strain and the control strain, and screening the remaining target-strain point mutations for those located in non-coding regions to obtain the target protein binding sites.
Further, in step 3), n groups of genomic DNA from the target strain and n groups from the control strain are subjected to high-throughput sequencing to obtain the target protein binding sites, specifically: performing high-throughput sequencing on the genomic DNA of the n groups of target strains and the n groups of control strains; identifying the point mutations in the genomic DNA of the target strains and of the control strains; removing the point mutations shared by the target strains and the control strains; screening the remaining target-strain point mutations for those common to all n groups of target strains; and then screening these common point mutations for those located in non-coding regions to obtain the target protein binding sites;
wherein 2 ≤ n ≤ 5.
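The screening logic of step 3) can be sketched with plain set operations. This is only an illustrative sketch: the `Mutation` record, the function name and the `is_noncoding` callback are hypothetical, and pooling the control replicates into one background set is an assumption, since the text only says that mutations shared by the target and control strains are removed.

```python
from typing import Callable, NamedTuple


class Mutation(NamedTuple):
    """One point-mutation call: 1-based genome position, reference base, observed base."""
    pos: int
    ref: str
    alt: str


def candidate_binding_sites(
    target_replicates: list[set[Mutation]],
    control_replicates: list[set[Mutation]],
    is_noncoding: Callable[[int], bool],
) -> list[Mutation]:
    """Return target-strain mutations that survive the three screening steps."""
    n = len(target_replicates)
    if not 2 <= n <= 5 or len(control_replicates) != n:
        raise ValueError("expected n target and n control replicates with 2 <= n <= 5")
    # Assumption: a mutation seen in any control replicate counts as background.
    background = set().union(*control_replicates)
    # Step 1: remove mutations shared with the control strain.
    cleaned = [replicate - background for replicate in target_replicates]
    # Step 2: keep only mutations common to all n target replicates.
    shared = set.intersection(*cleaned)
    # Step 3: keep only mutations that fall in non-coding regions.
    return sorted(m for m in shared if is_noncoding(m.pos))
```

In practice the mutation sets would come from a variant caller run on each sequenced sample; which caller and thresholds to use is not specified in the text.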
Further, the target strain expressing the cytosine deoxynucleotide deaminase-target protein fusion protein is constructed as follows: a cytosine deoxynucleotide deaminase-target protein fusion protein expression vector is constructed and then introduced into a strain that does not contain a uracil-DNA glycosylase gene;
the control strain expressing the target protein is constructed as follows: a target protein expression vector is constructed and then introduced into a strain that does not contain a uracil-DNA glycosylase gene.
Furthermore, in the cytosine deoxynucleotide deaminase-target protein fusion protein expression vector, the cytosine deoxynucleotide deaminase is located at the N-terminus (amino terminus) of the fusion protein, the target protein is located at the C-terminus (carboxyl terminus), and the two are connected by a linker of 8 glycines.
Further, inducing protein expression in the target strain and the control strain comprises: reviving the strains, picking single colonies, and culturing them with shaking at 37 °C for 6-20 hours in LB medium supplemented with an inducer;
preferably, the inducer is arabinose;
preferably, the LB medium contains 10 g/L sodium chloride, 5 g/L yeast extract and 10 g/L peptone.
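As a small worked example of the LB recipe, the per-litre amounts can be scaled to any batch volume. The function and dictionary names below are hypothetical; only the three concentrations come from the text.

```python
# Grams per litre, as listed for the LB medium.
LB_GRAMS_PER_LITRE = {
    "sodium chloride": 10.0,
    "yeast_extract": 5.0,
    "peptone": 10.0,
}


def lb_masses(volume_ml: float) -> dict[str, float]:
    """Return the grams of each component needed for volume_ml of LB medium."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return {
        name: grams_per_litre * volume_ml / 1000.0
        for name, grams_per_litre in LB_GRAMS_PER_LITRE.items()
    }
```

For instance, `lb_masses(500)` gives the amounts for a 500 ml batch.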
Further, the target protein is a transcription factor.
Further, the method for in-situ detection of the binding position of a protein on deoxyribonucleotide also comprises calculating the conserved DNA sequence bound by the target protein, specifically:
taking each target protein binding site obtained in step 3) as the center, extending m base pairs upstream and m base pairs downstream along the genomic DNA to obtain a DNA fragment sequence containing the target protein binding site, where 10 ≤ m ≤ 200;
and saving the DNA fragment sequences containing the target protein binding sites to a text document, and calculating the conserved DNA sequence bound by the target protein with the bioinformatics tool MEME Suite.
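The window extraction can be sketched as follows, assuming the genome is available as a single string and binding sites are 1-based positions. The function names, the FASTA output format and the clipping at the sequence ends (rather than wrapping around a circular chromosome) are assumptions made for illustration.

```python
def extract_windows(genome: str, sites: list[int], m: int = 50) -> dict[str, str]:
    """Return {name: sequence} covering m bp on each side of every 1-based site."""
    if not 10 <= m <= 200:
        raise ValueError("m is expected to satisfy 10 <= m <= 200")
    windows = {}
    for pos in sites:
        if not 1 <= pos <= len(genome):
            raise ValueError(f"site {pos} lies outside the genome")
        start = max(0, pos - 1 - m)        # 0-based inclusive start, clipped at the 5' end
        end = min(len(genome), pos + m)    # 0-based exclusive end, clipped at the 3' end
        windows[f"site_{pos}"] = genome[start:end]
    return windows


def to_fasta(windows: dict[str, str]) -> str:
    """Serialise the windows as FASTA text that can be saved to a plain text file."""
    return "".join(f">{name}\n{seq}\n" for name, seq in windows.items())
```

The resulting text can be written to a file and supplied to the motif-discovery tool.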
In a second aspect, the invention provides a method for predicting the intracellular concentration of a protein whose transcription is driven by a promoter, comprising: using the above method for in-situ detection of the binding position of a protein on deoxyribonucleotide to mutate the target promoter of a transcription factor, and predicting the intracellular concentration of the protein expressed from that promoter according to the following formula:
$x = 1 - \exp(-K_5 \cdot [\mathrm{Protein}] \cdot t)$
wherein,
$x$ represents the proportion of mutated DNA at the transcription factor target promoter;
$[\mathrm{Protein}]$ represents the intracellular concentration of the protein expressed by transcription from the promoter;
$t$ represents the total time over which the mutation process occurs;
$K_5$ denotes the equilibrium constant, and $K_5$ is expressed as follows:
Figure BDA0003515305020000031
$k_{TIC1}$ and $k_{TIC2}$ are constants, $k_{translation}$ is the translation rate, $\gamma_{mRNA}$ is the mRNA degradation rate, and $\gamma_{Protein}$ is the protein degradation rate.
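To show how the formula would be used for prediction, it can be inverted to give the protein concentration from a measured mutant proportion. This is a sketch under the assumption that the constant K5 and the time t are already known; the function names and the numeric values used in any call are hypothetical.

```python
import math


def mutated_fraction(k5: float, protein: float, t: float) -> float:
    """Forward model: x = 1 - exp(-K5 * [Protein] * t)."""
    return 1.0 - math.exp(-k5 * protein * t)


def protein_concentration(x: float, k5: float, t: float) -> float:
    """Inverse model: [Protein] = -ln(1 - x) / (K5 * t)."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must satisfy 0 <= x < 1")
    if k5 <= 0 or t <= 0:
        raise ValueError("K5 and t must be positive")
    # log1p(-x) equals ln(1 - x) and stays accurate when x is small.
    return -math.log1p(-x) / (k5 * t)
```

The units of the result follow from the units chosen for K5 and t, which the text does not fix.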
Further, the proportion x of the mutant DNA on the transcription factor target promoter is expressed as follows:
$x = 1 - \exp(-k_m \cdot t)$
wherein,
$k_m$ represents the mutation rate at the transcription factor binding site;
t represents the total time of occurrence of the mutation process.
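If the mutant proportion x is measured at several induction times, the mutation rate can be estimated by linearising the expression as -ln(1 - x) = k_m·t and fitting a straight line through the origin. The least-squares fit below is one possible approach and is not described in the text; the function name is hypothetical.

```python
import math


def estimate_mutation_rate(times: list[float], fractions: list[float]) -> float:
    """Least-squares estimate of k_m from x = 1 - exp(-k_m * t).

    Fits y = k_m * t through the origin, where y = -ln(1 - x).
    """
    if len(times) != len(fractions) or not times:
        raise ValueError("need the same non-zero number of times and fractions")
    ys = []
    for x in fractions:
        if not 0.0 <= x < 1.0:
            raise ValueError("each fraction must satisfy 0 <= x < 1")
        ys.append(-math.log1p(-x))
    denominator = sum(t * t for t in times)
    if denominator == 0:
        raise ValueError("at least one time point must be non-zero")
    return sum(t * y for t, y in zip(times, ys)) / denominator
```

The estimate has units of inverse time, matching the units used for the time points.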
Further, the proportion x of the mutant DNA on the transcription factor target promoter is expressed as follows:
$x = 1 - \exp(-K_3 \cdot \theta \cdot t)$
wherein,
$\theta$ represents the proportion of the DNA that is bound by the transcription factor protein, and $\theta$ is expressed as follows:
Figure BDA0003515305020000032
$[\mathrm{LasR}]$ represents the concentration of the transcription factor protein, and $k_d$ represents an equilibrium constant;
$K_3$ is a constant, expressed as follows:
$K_3 = [\mathrm{RNAP}] \cdot [\mathrm{DNA}_{all}] \cdot K_1 \cdot K_2 \cdot k_{TIC1}$
$[\mathrm{RNAP}]$ denotes the concentration of RNA polymerase, $[\mathrm{DNA}_{all}]$ denotes the total concentration of target DNA in the bacterial cell, and $K_1$ and $K_2$ are the equilibrium constants of the following process, in which the transcription factor recruits RNA polymerase:
Figure BDA0003515305020000041
RNAP denotes RNA polymerase, LasR denotes the transcription factor protein, RNAP-LasR$_2$-DNA denotes the complex of RNA polymerase with the transcription factor and DNA, and LasR-TIC denotes the open-state transcription initiation complex with the transcription factor protein bound;
preferably, the θ is determined by the method for detecting the binding position of the protein to the deoxyribonucleotide in situ according to the present invention.
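Read together, the three expressions for x suggest a simple relation between the constants. The short derivation below assumes that all three describe the same promoter over the same total time t; it restates the formulas given above and does not supply the expressions for K5 or θ that appear only as figures.

```latex
\begin{align*}
x &= 1 - \exp(-k_m \, t) = 1 - \exp(-K_3 \, \theta \, t) = 1 - \exp(-K_5 \, [\mathrm{Protein}] \, t) \\
\Rightarrow\quad k_m &= K_3 \, \theta = K_5 \, [\mathrm{Protein}] \\
\Rightarrow\quad \theta &= \frac{-\ln(1 - x)}{K_3 \, t}, \qquad
[\mathrm{Protein}] = \frac{-\ln(1 - x)}{K_5 \, t}
\end{align*}
```

Under this reading, a measured mutant proportion x and a known induction time t fix the product of each constant with its associated quantity.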
The invention has the beneficial effects that:
(1) The invention provides a method for in-situ detection of the binding position of a protein on deoxyribonucleotide. It exploits the ability of cytosine deoxynucleotide deaminase to mutate DNA inside bacteria: a target strain expressing a cytosine deoxynucleotide deaminase-target protein fusion protein is constructed, and the final result is then obtained by genome sequencing alone. The expression plasmid for the fusion protein is built with basic techniques such as PCR and simple molecular cloning, which is convenient, fast and time-saving and requires no elaborate experimental design. The operating threshold is low: once the strain has been constructed, only bacterial culture and sequencing are needed.
(2) The target protein binding sites obtained by the method for in-situ detection of the binding position of a protein on deoxyribonucleotide can also help in understanding the transcription level of the target promoters bound by the transcription factor protein.
Drawings
FIG. 1 is a schematic diagram of an experiment for detecting binding sites of a target protein and DNA by fused cytosine deoxynucleotide deaminase;
FIG. 2 shows the base mutations in the rhlI gene promoter region after 3, 6, 9 and 12 hours of induction in shaking culture;
FIG. 3 shows gel electrophoresis mobility shift assays in which the LasR protein binds multiple promoters detected by sequencing;
FIG. 4 is a graph of the ratio of promoter mutations at different transcription levels as a function of the transcription level of the corresponding promoter;
FIG. 5 shows the conserved nucleotide sequence prediction for LasR binding.
Detailed Description
In order that the invention may be more clearly understood, it will now be further described with reference to the following examples and the accompanying drawings. The examples are for illustration only and do not limit the invention in any way. In the examples, each raw reagent material is commercially available, and the experimental method not specifying the specific conditions is a conventional method and a conventional condition well known in the art, or a condition recommended by an instrument manufacturer.
The principle of the invention is as follows: cytosine deoxynucleotide deaminase is fused to the target protein, and the DNA-mutating ability of the deaminase inside bacteria is used to mutate the DNA near the bound fusion protein; the positions of the mutations are then found by sequencing, and these positions are the binding sites of the target protein on the genome. Implementation comprises designing the fusion protein expression, constructing the target strain, culturing the bacteria, and sequencing and analysis. The specific steps are as follows:
1) Design of fusion protein expression: the design of fusion protein expression is a key part of the invention and comprises optimizing expression of the cytosine deoxynucleotide deaminase gene and selecting the expression vector. First, according to the codon usage of the target strain, the web tool Jcat (http://www.jcat.de/) is used to optimize the coding sequence of the original cytosine deoxynucleotide deaminase gene so that the gene uses the codons most commonly used in the target strain; the optimized gene is synthesized by a commercial company. Then, polymerase chain reaction primers are designed to amplify the optimized cytosine deoxynucleotide deaminase gene, the target protein gene, and the plasmid vector for inducible expression of the fusion protein, in preparation for the subsequent Gibson assembly reaction. Preferably, the cytosine deoxynucleotide deaminase is placed at the N-terminus of the fusion protein and the target protein at the C-terminus, with a linker of 8 glycines between them.
2) Construction of the target strain: after the target protein gene fragment and the codon-optimized cytosine deoxynucleotide deaminase gene have been amplified, the two DNA fragments are joined by overlap extension polymerase chain reaction, and purification yields a DNA fragment in which the two genes are fused through the intermediate linker sequence. This DNA fragment is joined to the linearized plasmid vector for inducible expression of the fusion protein by Gibson assembly and introduced into the Escherichia coli competent strain Top10 by chemical transformation; the bacteria are spread on gentamicin-resistance plates, and clones are picked and verified by sequencing to obtain the correctly assembled target plasmid.
In parallel, the target protein gene fragment is joined to the linearized plasmid vector for inducible expression by Gibson assembly and introduced into the Escherichia coli competent strain Top10 by chemical transformation; the bacteria are spread on gentamicin-resistance plates, and clones are picked and verified by sequencing to obtain the correctly assembled plasmid expressing the target protein.
Separately, a strain in which the bacterial intracellular uracil-DNA glycosylase gene is deleted is constructed. The uracil-DNA glycosylase gene is first located on the bacterial genome and then knocked out by homologous recombination, so that the protein it encodes cannot repair the mutated DNA bases.
After the strain lacking the intracellular uracil-DNA glycosylase gene has been obtained, the target plasmid carrying the cloned fusion of the cytosine deoxynucleotide deaminase gene and the target protein gene, and the plasmid expressing the target protein, are each introduced into it by electroporation. Single clones are picked on gentamicin-resistance plates, correct clones are identified by polymerase chain reaction, and the clones are grown in shaking culture and preserved, giving the target strain containing the target plasmid and the control strain expressing the target protein without the fused cytosine deoxynucleotide deaminase gene. The preserved strains are stored in glycerol at a final concentration of 35% in a freezer at -80 °C.
3) Bacterial culture: the target strain containing the target plasmid and the control strain expressing the target protein without the fused cytosine deoxynucleotide deaminase gene, both constructed above, are first streaked on plates and incubated in a constant-temperature incubator at 37 °C for 20 hours. Single colonies are then picked and cultured with shaking at 37 °C in LB medium (10 g/L sodium chloride, 5 g/L yeast extract and 10 g/L peptone) containing the inducer arabinose at a final concentration of 0.4% (mass fraction); each culture has a volume of 1 ml, and the total induction time with shaking is 6-20 hours. The bacteria are collected by centrifugation, and their genomic DNA is extracted with a commercial genome extraction kit.
4) Sequencing and analysis: three groups of target strain samples and three groups of control strain samples are set up for sequencing. Genome sequencing of the bacteria is performed by a commercial company. After the genome-wide point mutation results are obtained, the point mutations shared by the target strain and the control strain are first removed, and the point mutations common to all three groups of target strains are then identified among the remaining point mutations. The common point mutations located in non-coding regions of genes are then screened out; these are the binding sites of the target protein. Next, with each mutation site as the center, the genome sequence is extended 50 base pairs upstream and downstream, and a DNA fragment sequence with a length of 100 base pairs is extracted for each mutation site. These sequences are saved in batches to a text document, and the bioinformatics tool MEME Suite (meme-suite.org) is used to calculate the conserved DNA sequence bound by the transcription factor protein.
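For the final motif-discovery step, one way to run MEME on the saved sequences is through its command-line program. The sketch below only assembles the argument list; the file names, the number of motifs and the motif width limits are assumptions, since the text does not state which MEME settings were used, and the sequences are assumed to be saved in FASTA format.

```python
def build_meme_command(
    fasta_path: str,
    out_dir: str,
    n_motifs: int = 1,
    min_width: int = 6,
    max_width: int = 30,
) -> list[str]:
    """Assemble a `meme` command line for DNA motif discovery."""
    if min_width > max_width:
        raise ValueError("min_width must not exceed max_width")
    return [
        "meme", fasta_path,
        "-dna",                      # treat the input as DNA sequences
        "-oc", out_dir,              # output directory, overwritten if present
        "-mod", "zoops",             # zero or one motif occurrence per sequence
        "-nmotifs", str(n_motifs),   # number of motifs to report
        "-minw", str(min_width),     # minimum motif width
        "-maxw", str(max_width),     # maximum motif width
        "-revcomp",                  # also search the reverse-complement strand
    ]
```

With MEME installed locally, the list can be passed to `subprocess.run(cmd, check=True)`; the same sequences can alternatively be uploaded through the MEME Suite web interface.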
Example 1
The feasibility of the method of the invention is described in detail below, taking the intracellular detection of the DNA binding sites of the transcription factor LasR protein in Pseudomonas aeruginosa as an example. The experimental scheme is shown in FIG. 1, and the specific procedure is as follows:
1) Design of expression of the cytosine deoxynucleotide deaminase-LasR fusion protein
In designing the fusion protein, the cytosine deoxynucleotide deaminase is placed at the N-terminus of the fusion protein and LasR at the C-terminus, with a linker of 8 glycines between them.
The amino acid sequence of the transcription factor LasR protein is shown in SEQ ID NO.1, and its nucleotide sequence is shown in SEQ ID NO.2.
The amino acid sequence of the cytosine deoxynucleotide deaminase is shown in SEQ ID NO.3. According to the codon usage of the target strain Pseudomonas aeruginosa, the web tool Jcat (http://www.jcat.de/) is used to optimize the coding sequence of the original cytosine deoxynucleotide deaminase gene; the nucleotide sequence of the optimized cytosine deoxynucleotide deaminase gene is shown in SEQ ID NO.4, and the gene is synthesized by a commercial company.
The plasmid vector for inducible expression of the fusion protein is the pJN105 plasmid vector, a tool plasmid that allows a target gene to be expressed upon induction with arabinose. An arabinose-responsive promoter on the plasmid controls expression of the target gene; its nucleotide sequence is shown in SEQ ID NO.5, in which the bold text marks the arabinose promoter sequence region and the black triangle indicates the position where the target gene is inserted.
Polymerase chain reaction primers are designed to amplify the codon-optimized cytosine deoxynucleotide deaminase gene, the LasR gene fragment, and the plasmid vector pJN105 for inducible expression of the fusion protein.
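The idea behind the codon optimization step, choosing for each amino acid the codon the host uses most often, can be illustrated with a toy back-translation. This is not the algorithm implemented by the web tool named above, and the usage frequencies in `TOY_USAGE` are invented placeholders rather than real Pseudomonas aeruginosa values; all names are hypothetical.

```python
def most_frequent_codons(usage: dict[str, dict[str, float]]) -> dict[str, str]:
    """For each amino acid, pick the codon with the highest usage frequency."""
    return {aa: max(codons, key=codons.get) for aa, codons in usage.items()}


def back_translate(protein: str, usage: dict[str, dict[str, float]]) -> str:
    """Encode a protein sequence using only the host's preferred codons."""
    best = most_frequent_codons(usage)
    try:
        return "".join(best[aa] for aa in protein)
    except KeyError as exc:
        raise ValueError(f"no codon usage entry for residue {exc.args[0]!r}") from exc


# Placeholder frequencies for illustration only (not measured values).
TOY_USAGE = {
    "M": {"ATG": 1.0},
    "G": {"GGC": 0.6, "GGT": 0.2, "GGA": 0.1, "GGG": 0.1},
    "K": {"AAG": 0.7, "AAA": 0.3},
    "*": {"TGA": 0.6, "TAA": 0.2, "TAG": 0.2},
}
```

A real usage table would cover all twenty amino acids and be derived from the host's annotated coding sequences.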
2) Construction of target and control strains
The designed polymerase chain reaction primers are used to amplify the codon-optimized cytosine deoxynucleotide deaminase gene, the LasR gene fragment, and the plasmid vector pJN105 for inducible expression of the fusion protein. After the LasR gene fragment and the codon-optimized cytosine deoxynucleotide deaminase gene have been amplified, the two DNA fragments are joined by overlap extension polymerase chain reaction, and purification yields a DNA fragment in which the two genes are fused through the intermediate 8-glycine linker sequence.
This DNA fragment is joined to the linearized pJN105 plasmid vector by Gibson assembly and introduced into the Escherichia coli competent strain Top10 by chemical transformation; the bacteria are spread on gentamicin-resistance plates, and clones are picked and verified by sequencing, finally giving the correctly assembled target plasmid expressing the cytosine deoxynucleotide deaminase-target protein fusion protein.
In parallel, the LasR gene fragment is joined to the linearized pJN105 plasmid vector by Gibson assembly and introduced into the Escherichia coli competent strain Top10 by chemical transformation; the bacteria are spread on gentamicin-resistance plates, and clones are picked and verified by sequencing, finally giving the correctly assembled plasmid expressing the LasR protein.
The PA0750 gene (gene No. 750) on the Pseudomonas aeruginosa genome is knocked out by homologous recombination to obtain a Pseudomonas aeruginosa strain in which the intracellular uracil-DNA glycosylase gene is deleted.
The target plasmid expressing the cytosine deoxynucleotide deaminase-target protein fusion protein and the plasmid expressing the LasR protein are each introduced by electroporation into the Pseudomonas aeruginosa strain lacking the intracellular uracil-DNA glycosylase gene. Single clones are picked on gentamicin-resistance plates, correct clones are identified by polymerase chain reaction, and the clones are grown in shaking culture and preserved, giving the target strain containing the target plasmid and the control strain expressing the LasR protein without the fused cytosine deoxynucleotide deaminase gene. The preserved strains are stored in glycerol at a final concentration of 35% in a freezer at -80 °C.
3) Bacterial culture
The target strain containing the target plasmid and the control strain expressing the LasR protein without the fused cytosine deoxynucleotide deaminase gene, both constructed above, were first streaked on plates and incubated in a constant-temperature incubator at 37 ℃ for 20 hours. Colonies were then picked and grown with shaking at 37 ℃ in LB medium (10 g/L sodium chloride, 5 g/L yeast extract, 10 g/L peptone) containing the inducer arabinose at a final concentration of 0.4% (mass fraction). Each culture was 1 ml, and the total induction time was 12 hours. The bacteria were collected by centrifugation, and genomic DNA was extracted with a commercial genome extraction kit.
4) Sequencing and analysis: three target-strain samples and three control-strain samples were prepared for sequencing. Whole-genome sequencing of the bacteria was performed by a commercial company. After the genome-wide point mutations were obtained, the point mutations shared by the target strain and the control strain were first removed, and the point mutations common to all three target-strain samples were then identified among the remaining ones. The common point mutations located in non-coding regions were then selected, giving the binding sites of the target transcription factor. Next, with each mutation site as the center, the genomic sequence was extended 50 base pairs upstream and 50 base pairs downstream, and a DNA fragment 100 base pairs long was extracted for each mutation. These sequences were saved in batches to a text file, and the conserved DNA sequence bound by the transcription factor protein was calculated with the bioinformatics tool MEME Suite (meme-suite.org).
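The filtering logic of step 4) can be sketched with plain set operations. This is a minimal illustration under stated assumptions: a point mutation is modelled as a `(position, ref_base, alt_base)` tuple, "common to the target strain and the control strain" is read as "present in any control sample", and non-coding regions are supplied as inclusive coordinate intervals. The function name and the toy inputs are hypothetical and do not come from the patent.

```python
# Sketch of the point-mutation filtering in step 4).
# All names and toy inputs below are illustrative assumptions.
from typing import Iterable, List, Set, Tuple

Mutation = Tuple[int, str, str]  # (genome position, reference base, mutated base)

def candidate_binding_sites(
    target_reps: List[Set[Mutation]],
    control_reps: List[Set[Mutation]],
    noncoding: Iterable[Tuple[int, int]],
) -> List[Mutation]:
    # 1) Drop every mutation that also appears in any control sample.
    control_all = set().union(*control_reps) if control_reps else set()
    filtered = [rep - control_all for rep in target_reps]
    # 2) Keep only mutations shared by all target samples.
    common = set.intersection(*filtered) if filtered else set()
    # 3) Keep only mutations that fall inside a non-coding interval.
    regions = list(noncoding)
    hits = [m for m in common if any(s <= m[0] <= e for s, e in regions)]
    return sorted(hits)

# Toy input: three target samples, three control samples, one non-coding region.
target = [
    {(100, "C", "T"), (250, "C", "T"), (900, "G", "A")},
    {(100, "C", "T"), (250, "C", "T")},
    {(100, "C", "T"), (250, "C", "T"), (400, "C", "T")},
]
control = [{(250, "C", "T")}, set(), {(900, "G", "A")}]
sites = candidate_binding_sites(target, control, [(50, 150)])
print(sites)
```

In the toy input only the mutation at position 100 survives all three filters, because position 250 also occurs in a control sample and positions 400 and 900 are not shared by all target samples.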
Example 2
After the target strain and the control strain constructed in Example 1 were revived by streaking, colonies were picked and grown with shaking at 37 ℃ in LB medium (10 g/L sodium chloride, 5 g/L yeast extract, 10 g/L peptone), 1 ml per culture, with the inducer arabinose added to a final concentration of 0.4% (mass fraction). The total shaking time was 3, 6, 9 or 12 hours. The bacteria were collected by centrifugation, genomic DNA was extracted with a commercial genome extraction kit, and the data were analyzed by the method of Example 1.
FIG. 2 shows the mutations detected, after different induction times, in a promoter region known to bind LasR (the promoter of the rhlI gene). The boxed positions are the DNA bases at which mutation occurred. The mutation rate in the promoter region increased markedly with induction time until the site was almost completely mutated.
Example 3
After the target strain and the control strain constructed in Example 1 were revived by streaking, colonies were picked and grown with shaking at 37 ℃ in LB medium (10 g/L sodium chloride, 5 g/L yeast extract, 10 g/L peptone) and induced for 12 hours with 0.4% arabinose and 10 μmol/L 3-oxododecanoyl homoserine lactone. The bacteria were then collected for genome sequencing, and the data were analyzed by the method of Example 1.
Mutations were detected in a total of 16 promoter regions. Ten of these are known promoters: PA1003, PA1431, PA2426, PA2763, PA2769, PA3104, PA3326, PA3384, rhlI and PA3904. In addition, 6 promoters not previously reported were found: PA0717, PA0727, PA0861, PA1131, PA3347 and PA5295. The binding sites obtained from genome sequencing were verified by electrophoretic mobility shift assay (EMSA); as shown in FIG. 3, all of the new promoters have LasR binding activity.
Example 4
Because the ability of cytosine deoxynucleotide deaminase to mutate DNA depends on the transcriptional activity of the target DNA, the transcription level at a given position can be estimated from the mutation ratio measured at that position. On this basis, a theoretical model based on a steady-state assumption was established and verified experimentally. The model is as follows:
Figure BDA0003515305020000081
Formula (1) is the binding equilibrium between the transcription factor and the target promoter, with equilibrium constant $k_d$. LasR denotes the transcription factor protein. Let $\theta$ be the fraction of the DNA that is bound by the transcription factor protein; then $\theta$ can be expressed as:
Figure BDA0003515305020000082
Let the total concentration of target DNA in the bacterial cell be $[\mathrm{DNA_{all}}]$, a constant; then
$[\mathrm{LasR_2\text{-}DNA}] = [\mathrm{DNA_{all}}]\cdot\theta$ (3)
The process by which the transcription factor recruits ribonucleic acid (hereinafter abbreviated as RNA) polymerase can be written as
$\mathrm{RNAP} + \mathrm{LasR_2\text{-}DNA} \overset{K_1}{\rightleftharpoons} \mathrm{RNAP\text{-}LasR_2\text{-}DNA} \overset{K_2}{\rightleftharpoons} \mathrm{LasR\text{-}TIC}$ (4)
where RNAP denotes RNA polymerase, RNAP-LasR$_2$-DNA denotes the complex of RNA polymerase with the transcription factor and DNA, and LasR-TIC denotes the open transcription initiation complex with the transcription factor protein bound. $K_1$ and $K_2$ are the equilibrium constants of the two steps, respectively. From formula (4), the following relationship is obtained:
$[\mathrm{LasR\text{-}TIC}] = [\mathrm{RNAP}]\cdot[\mathrm{LasR_2\text{-}DNA}]\cdot K_1\cdot K_2$ (5)
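As a worked step, formula (5) follows directly if $K_1$ and $K_2$ are read as association equilibrium constants of the two consecutive steps in formula (4). That reading is an assumption, chosen because it reproduces formula (5) term by term:

```latex
\begin{aligned}
K_1 &= \frac{[\mathrm{RNAP\text{-}LasR_2\text{-}DNA}]}{[\mathrm{RNAP}]\,[\mathrm{LasR_2\text{-}DNA}]},
\qquad
K_2 = \frac{[\mathrm{LasR\text{-}TIC}]}{[\mathrm{RNAP\text{-}LasR_2\text{-}DNA}]} \\[4pt]
K_1 K_2 &= \frac{[\mathrm{LasR\text{-}TIC}]}{[\mathrm{RNAP}]\,[\mathrm{LasR_2\text{-}DNA}]}
\;\Longrightarrow\;
[\mathrm{LasR\text{-}TIC}] = [\mathrm{RNAP}]\,[\mathrm{LasR_2\text{-}DNA}]\,K_1 K_2
\end{aligned}
```

The intermediate complex concentration cancels in the product, which is why only the two end species and the two constants appear in formula (5).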
The mutation process is a first-order reaction that must be mediated by LasR-TIC and can be written as:
Figure BDA0003515305020000091
where the mutation rate $k_m$ is proportional to the concentration of LasR-TIC, i.e. $k_m = k_{TIC1}\cdot[\mathrm{LasR\text{-}TIC}]$, with $k_{TIC1}$ a constant. The fraction $x$ of mutated DNA can then be derived from formula (6):
$x = 1-\exp(-k_m\cdot t)$
where $t$ is the total time over which the mutation process occurs, i.e. the time for which protein expression is induced in the target strain and the control strain of the present invention. Combining formulas (3) and (5), the fraction of mutated bases is finally expressed as:
$x = 1-\exp(-K_3\cdot\theta\cdot t)$, where $K_3 = [\mathrm{RNAP}]\cdot[\mathrm{DNA_{all}}]\cdot K_1\cdot K_2\cdot k_{TIC1}$ is a constant (7)
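The step from the first-order mutation process to formula (7) can be written out as follows. This is a sketch, assuming that no DNA is mutated at the start of induction ($x(0)=0$) and that $k_m$ stays constant during induction, as the steady-state assumption implies:

```latex
\begin{aligned}
\frac{dx}{dt} &= k_m\,(1-x), \quad x(0)=0
\;\Longrightarrow\; x(t) = 1-\exp(-k_m t) \\[4pt]
k_m &= k_{TIC1}\,[\mathrm{LasR\text{-}TIC}]
     = k_{TIC1}\,[\mathrm{RNAP}]\,[\mathrm{LasR_2\text{-}DNA}]\,K_1 K_2
     && \text{by (5)} \\
    &= k_{TIC1}\,[\mathrm{RNAP}]\,[\mathrm{DNA_{all}}]\,\theta\,K_1 K_2
     = K_3\,\theta
     && \text{by (3)}
\end{aligned}
```

Substituting $k_m = K_3\,\theta$ into $x(t)$ gives formula (7).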
According to the central dogma, the process by which DNA is transcribed and translated to express protein can be written as the following set of reactions:
Figure BDA0003515305020000092
Figure BDA0003515305020000093
Figure BDA0003515305020000094
where mRNA denotes messenger RNA and Protein denotes protein; $k_{transcription}$ is the transcription rate constant, $k_{translation}$ is the translation rate, $\gamma_{mRNA}$ is the mRNA degradation rate, and $\gamma_{Protein}$ is the protein degradation rate. According to work published by Xie et al. in 2006 (DOI:10.1103/PhysRevLett.97.168302), the average intracellular protein concentration in bacteria is
$[\mathrm{Protein}] = \dfrac{k_{transcription}\cdot k_{translation}}{\gamma_{mRNA}\cdot\gamma_{Protein}}$ (8)
The transcription process can be regarded as a zero-order reaction mediated by LasR-TIC, and the transcription rate constant can be expressed as:
$k_{transcription} = k_{TIC2}\cdot[\mathrm{LasR\text{-}TIC}]$ (9)
where $k_{TIC2}$ is a constant. From formulas (8) and (9), combined with formulas (3) and (5),
$[\mathrm{Protein}] = K_4\cdot\theta$ (10)
where $K_4 = k_{translation}\cdot k_{TIC2}\cdot[\mathrm{RNAP}]\cdot[\mathrm{DNA_{all}}]\cdot K_1\cdot K_2/(\gamma_{mRNA}\cdot\gamma_{Protein})$ is a constant.
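The substitution that leads to formula (10) can be spelled out as below. This is a sketch, assuming that the steady-state mean protein level of formula (8) has the product-over-product form implied by the definition of $K_4$:

```latex
\begin{aligned}
[\mathrm{Protein}]
 &= \frac{k_{transcription}\,k_{translation}}{\gamma_{mRNA}\,\gamma_{Protein}}
 = \frac{k_{TIC2}\,[\mathrm{LasR\text{-}TIC}]\,k_{translation}}{\gamma_{mRNA}\,\gamma_{Protein}}
 && \text{by (9)} \\[4pt]
 &= \frac{k_{translation}\,k_{TIC2}\,[\mathrm{RNAP}]\,[\mathrm{DNA_{all}}]\,K_1 K_2}{\gamma_{mRNA}\,\gamma_{Protein}}\;\theta
 = K_4\,\theta
 && \text{by (3) and (5)}
\end{aligned}
```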
Further, the relationship between the intracellular concentration [Protein] of the protein transcribed and expressed from the promoter DNA targeted by the transcription factor protein and the mutation ratio $x$ on the corresponding promoter DNA can be expressed as follows:
$x = 1-\exp\!\left(-\dfrac{K_3}{K_4}\cdot[\mathrm{Protein}]\cdot t\right)$ (11)
As can be seen from formula (11), the mutation ratio of the target promoter DNA bound by the transcription factor protein has a negative exponential relationship with the concentration of the protein expressed from it, and the mutation ratio tends to 1 as the time tends to infinity.
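As a worked step, the prefactor in the exponent can be reduced to rate constants only. This sketch assumes that formula (11) is obtained by substituting $\theta = [\mathrm{Protein}]/K_4$ from formula (10) into formula (7):

```latex
\begin{aligned}
x &= 1-\exp\!\left(-\frac{K_3}{K_4}\,[\mathrm{Protein}]\;t\right) \\[4pt]
\frac{K_3}{K_4}
 &= \frac{[\mathrm{RNAP}]\,[\mathrm{DNA_{all}}]\,K_1 K_2\,k_{TIC1}}
         {k_{translation}\,k_{TIC2}\,[\mathrm{RNAP}]\,[\mathrm{DNA_{all}}]\,K_1 K_2 \,/\, (\gamma_{mRNA}\,\gamma_{Protein})}
  = \frac{k_{TIC1}\,\gamma_{mRNA}\,\gamma_{Protein}}{k_{translation}\,k_{TIC2}}
\end{aligned}
```

The concentrations of RNA polymerase and target DNA and the recruitment constants $K_1$, $K_2$ cancel, so the ratio depends only on $k_{TIC1}$, $k_{TIC2}$, $k_{translation}$, $\gamma_{mRNA}$ and $\gamma_{Protein}$, the quantities listed for $K_5$ in claim 8.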
Based on Examples 1 and 4 of the present invention, the theoretical curve relating the mutation ratio (y) of the target promoter DNA bound by the transcription factor protein LasR to the concentration (x) of the protein expressed from it was determined to be y = 1 - exp(-K·x), with K = 2.075.
In this example, LasR-binding promoters with different transcription levels were further constructed. The concentration of the protein expressed from each promoter was characterized with a fluorescent protein, and the mutation frequency at the corresponding time was calculated from first-generation (Sanger) sequencing results, as shown in FIG. 4. The theoretical curve fits the experimental data well, which indicates that the predicted relationship between the transcription level of the promoter DNA and the mutation ratio produced on it through the transcription factor protein is correct.
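The reported curve can be evaluated in both directions with a few lines of code. This is a minimal sketch: the value K = 2.075 is taken from the text, while the function names are illustrative and the unit of the concentration x is not stated in the text, so inputs are assumed to be on the same scale as the one used for the fit.

```python
# Sketch: evaluate y = 1 - exp(-K * x) and its inverse.
# Function names are illustrative; the unit of x is an assumption (same scale as the fit).
import math

K_FIT = 2.075  # value of K reported in the text

def mutation_ratio(protein_conc: float, k: float = K_FIT) -> float:
    """Mutation ratio y predicted for a protein concentration x."""
    if protein_conc < 0:
        raise ValueError("protein_conc must be non-negative")
    return 1.0 - math.exp(-k * protein_conc)

def protein_conc_from_ratio(ratio: float, k: float = K_FIT) -> float:
    """Invert the curve: x = -ln(1 - y) / K, defined for 0 <= y < 1."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must satisfy 0 <= ratio < 1")
    return -math.log(1.0 - ratio) / k

y = mutation_ratio(0.5)
x_back = protein_conc_from_ratio(y)
print(y, x_back)
```

The inverse is undefined at a mutation ratio of exactly 1, which reflects the saturation of the curve: once a site is fully mutated, the expression level can no longer be resolved from it.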
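One simple way to estimate K from paired measurements is sketched below. It is not the fitting procedure used in the patent, which is not described. It linearizes the curve as -ln(1 - y) = K·x and takes the least-squares slope through the origin. The data points are synthetic, generated from the curve itself with K = 2.075 purely to check that the routine recovers the value; they are not experimental results.

```python
# Sketch: estimate K in y = 1 - exp(-K * x) by a linearized least-squares fit.
# This is an assumed fitting approach; the points below are synthetic, not measured.
import math
from typing import Sequence

def fit_k(conc: Sequence[float], ratio: Sequence[float]) -> float:
    """Least-squares slope through the origin of z = -ln(1 - y) against x."""
    if len(conc) != len(ratio) or len(conc) == 0:
        raise ValueError("conc and ratio must be non-empty and of equal length")
    num = 0.0
    den = 0.0
    for x, y in zip(conc, ratio):
        if not 0.0 <= y < 1.0:
            raise ValueError("each ratio must satisfy 0 <= y < 1")
        z = -math.log(1.0 - y)
        num += x * z
        den += x * x
    if den == 0.0:
        raise ValueError("at least one concentration must be non-zero")
    return num / den

# Synthetic check: points generated from the curve with K = 2.075.
xs = [0.1, 0.25, 0.5, 1.0, 1.5]
ys = [1.0 - math.exp(-2.075 * x) for x in xs]
k_hat = fit_k(xs, ys)
print(k_hat)
```

The linearization weights points near saturation heavily, because small errors in y close to 1 are amplified by the logarithm; a nonlinear fit on the original scale would be a reasonable alternative for noisy data.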
Example 5
Further, the sequences of the promoters found in Example 3 were saved in batches to a text file, and the conserved DNA sequence bound by LasR was calculated with the bioinformatics tool MEME Suite (meme-suite.org); the result is shown in FIG. 5.
It should be understood that the above examples are given only for clarity of illustration and do not limit the embodiments. Other variations and modifications in different forms will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here, and obvious variations or modifications derived from them remain within the scope of protection of the invention.
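Preparing the motif-search input can be sketched as follows: cut a window around each mutation site and write the windows as FASTA text, which can then be saved to a file and submitted to MEME Suite. This is a minimal illustration, assuming 0-based site coordinates, a linear genome string, and windows clipped at the sequence ends; the function names, record names and the toy genome are hypothetical.

```python
# Sketch: extract +/- flank windows around mutation sites and format them as FASTA.
# Names, 0-based coordinates and end clipping are illustrative assumptions.
from typing import Dict, Iterable

def site_windows(genome: str, sites: Iterable[int], flank: int = 50) -> Dict[str, str]:
    """Return genome[site - flank : site + flank] for each site, clipped at the ends."""
    if flank <= 0:
        raise ValueError("flank must be positive")
    windows: Dict[str, str] = {}
    for site in sites:
        if not 0 <= site < len(genome):
            raise ValueError(f"site {site} lies outside the genome")
        start = max(0, site - flank)
        end = min(len(genome), site + flank)
        windows[f"site_{site}"] = genome[start:end]
    return windows

def to_fasta(records: Dict[str, str]) -> str:
    """Format name -> sequence records as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())

# Toy genome of 400 bases; a real run would load the reference genome instead.
genome = "ACGT" * 100
windows = site_windows(genome, [200, 10])
fasta = to_fasta(windows)
print(fasta)
```

With the default flank of 50, an interior site yields a 100-base window, matching the fragment length described in Example 1; sites closer than 50 bases to either end give shorter windows in this sketch.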
The amino acid and nucleotide sequences involved in the invention are as follows:
Amino acid sequence of the transcription factor protein LasR used for testing (SEQ ID NO.1)
MALVDGFLELERSSGKLEWSAILQKMASDLGFSKILFGLLPKDSQDYENAFIVGNYPAAWREHYDRAGYARVDPTVSHCTQSVLPIFWEPSIYQTRKQHEFFEEASAAGLVYGLTMPLHGARGELGALSLSVEAENRAEANRFMESVLPTLWMLKDYALQSGAGLAFEHPVSKPVVLTSREKEVLQWCAIGKTSWEISVI CNCSEANVNFHMGNIRRKFGVTSRRVAAIMAVNLGLITL
Nucleotide sequence of transcription factor protein LasR for testing (SEQ ID NO.2)
ATGGCCTTGGTTGACGGTTTTCTTGAGCTGGAACGCTCAAGTGGAAAATTGGAGTGGAGCGCCATCCTGCAGAAGATGGCGAGCGACCTTGGATTCTCGAAGATCCTGTTCGGCCTGTTGCCTAAGGACAGCCAGGACTACGAGAACGCCTTCATCGTCGGCAACTACCCGGCCGCCTGGCGCGAGCATTACGACCGGGCTGGCTACGCGCGGGTCGACCCGACGGTCAGTCACTGTACCCAGAGCGTACTGCCGATTTTCTGGGAACCGTCCATCTACCAGACGCGAAAGCAGCACGAGTTCTTCGAGGAAGCCTCGGCCGCCGGCCTGGTGTATGGGCTGACCATGCCGCTGCATGGTGCTCGCGGCGAACTCGGCGCGCTGAGCCTCAGCGTGGAAGCGGAAAACCGGGCCGAGGCCAACCGTTTCATGGAGTCGGTCCTGCCGACCCTGTGGATGCTCAAGGACTACGCACTGCAGAGCGGTGCCGGACTGGCCTTCGAACATCCGGTCAGCAAACCGGTGGTTCTGACCAGCCGGGAGAAGGAAGTGTTGCAGTGGTGCGCCATCGGCAAGACCAGTTGGGAGATATCGGTTATCTGCAACTGCTCGGAAGCCAATGTGAACTTCCATATGGGAAATATTCGGCGGAAGTTCGGTGTGACCTCCCGCCGCGTAGCGGCCATTATGGCCGTTAATTTGGGTCTTATTACTCTCTGA
Amino acid sequence of cytosine deoxynucleotide deaminase (SEQ ID NO.3)
MDSLLMNRREFLYQFKNVRWAKGRRETYLCYVVKRRDSATSFSLDFGYLRNKNGCHVELLFLRYISDWDLDPGRCYRVTWFISWSPCYDCARHVADFLRGNPNLSLRIFTARLYFCEDRKAEPEGLRRLHRAGVQIAIMTFKDYFYCWNTFVENHGRTFKAWEGLHENSVRLSRQLRRILLPLYEVDDLRDAFRT
Optimized nucleotide sequence of cytosine deoxynucleotide deaminase (SEQ ID NO.4)
ATGGACAGCCTGCTGATGAACCGCCGCGAGTTCCTGTACCAGTTCAAGAACGTGCGCTGGGCCAAGGGCCGCCGCGAGACCTACCTGTGCTACGTGGTGAAGCGCCGCGACAGCGCCACCAGCTTCAGCCTGGACTTCGGCTACCTGCGCAACAAGAACGGCTGCCACGTGGAGCTGCTGTTCCTGCGCTACATCAGCGACTGGGACCTGGACCCGGGCCGCTGCTACCGCGTGACCTGGTTCATCAGCTGGAGCCCGTGCTACGACTGCGCCCGCCACGTGGCCGACTTCCTGCGCGGCAACCCGAACCTGAGCCTGCGCATCTTCACCGCCCGCCTGTACTTCTGCGAGGACCGCAAGGCCGAGCCGGAGGGCCTGCGCCGCCTGCACCGCGCCGGCGTGCAGATCGCCATCATGACCTTCAAGGACTACTTCTACTGCTGGAACACCTTCGTGGAGAACCACGGCCGCACCTTCAAGGCCTGGGAGGGCCTGCACGAGAACAGCGTGCGCCTGAGCCGCCAGCTGCGCCGCATCCTGCTGCCGCTGTACGAGGTGGACGACCTGCGCGACGCCTTCCGCACCTAA
Nucleotide sequence of the pJN105 plasmid vector used for cloning, in which the bold text marks the arabinose promoter sequence region and the black triangle marks the insertion position of the target gene (SEQ ID NO.5)
Figure BDA0003515305020000111
Figure BDA0003515305020000121
Figure BDA0003515305020000131
SEQUENCE LISTING
<110> Shenzhen advanced technology research institute of Chinese academy of sciences
<120> method for detecting binding position of protein and deoxyribonucleotide in situ
<130> CP122010004C
<160> 5
<170> PatentIn version 3.3
<210> 1
<211> 239
<212> PRT
<213> Artificial sequence
<400> 1
Met Ala Leu Val Asp Gly Phe Leu Glu Leu Glu Arg Ser Ser Gly Lys
1 5 10 15
Leu Glu Trp Ser Ala Ile Leu Gln Lys Met Ala Ser Asp Leu Gly Phe
20 25 30
Ser Lys Ile Leu Phe Gly Leu Leu Pro Lys Asp Ser Gln Asp Tyr Glu
35 40 45
Asn Ala Phe Ile Val Gly Asn Tyr Pro Ala Ala Trp Arg Glu His Tyr
50 55 60
Asp Arg Ala Gly Tyr Ala Arg Val Asp Pro Thr Val Ser His Cys Thr
65 70 75 80
Gln Ser Val Leu Pro Ile Phe Trp Glu Pro Ser Ile Tyr Gln Thr Arg
85 90 95
Lys Gln His Glu Phe Phe Glu Glu Ala Ser Ala Ala Gly Leu Val Tyr
100 105 110
Gly Leu Thr Met Pro Leu His Gly Ala Arg Gly Glu Leu Gly Ala Leu
115 120 125
Ser Leu Ser Val Glu Ala Glu Asn Arg Ala Glu Ala Asn Arg Phe Met
130 135 140
Glu Ser Val Leu Pro Thr Leu Trp Met Leu Lys Asp Tyr Ala Leu Gln
145 150 155 160
Ser Gly Ala Gly Leu Ala Phe Glu His Pro Val Ser Lys Pro Val Val
165 170 175
Leu Thr Ser Arg Glu Lys Glu Val Leu Gln Trp Cys Ala Ile Gly Lys
180 185 190
Thr Ser Trp Glu Ile Ser Val Ile Cys Asn Cys Ser Glu Ala Asn Val
195 200 205
Asn Phe His Met Gly Asn Ile Arg Arg Lys Phe Gly Val Thr Ser Arg
210 215 220
Arg Val Ala Ala Ile Met Ala Val Asn Leu Gly Leu Ile Thr Leu
225 230 235
<210> 2
<211> 720
<212> DNA
<213> Artificial sequence
<400> 2
atggccttgg ttgacggttt tcttgagctg gaacgctcaa gtggaaaatt ggagtggagc 60
gccatcctgc agaagatggc gagcgacctt ggattctcga agatcctgtt cggcctgttg 120
cctaaggaca gccaggacta cgagaacgcc ttcatcgtcg gcaactaccc ggccgcctgg 180
cgcgagcatt acgaccgggc tggctacgcg cgggtcgacc cgacggtcag tcactgtacc 240
cagagcgtac tgccgatttt ctgggaaccg tccatctacc agacgcgaaa gcagcacgag 300
ttcttcgagg aagcctcggc cgccggcctg gtgtatgggc tgaccatgcc gctgcatggt 360
gctcgcggcg aactcggcgc gctgagcctc agcgtggaag cggaaaaccg ggccgaggcc 420
aaccgtttca tggagtcggt cctgccgacc ctgtggatgc tcaaggacta cgcactgcag 480
agcggtgccg gactggcctt cgaacatccg gtcagcaaac cggtggttct gaccagccgg 540
gagaaggaag tgttgcagtg gtgcgccatc ggcaagacca gttgggagat atcggttatc 600
tgcaactgct cggaagccaa tgtgaacttc catatgggaa atattcggcg gaagttcggt 660
gtgacctccc gccgcgtagc ggccattatg gccgttaatt tgggtcttat tactctctga 720
<210> 3
<211> 195
<212> PRT
<213> Artificial sequence
<400> 3
Met Asp Ser Leu Leu Met Asn Arg Arg Glu Phe Leu Tyr Gln Phe Lys
1 5 10 15
Asn Val Arg Trp Ala Lys Gly Arg Arg Glu Thr Tyr Leu Cys Tyr Val
20 25 30
Val Lys Arg Arg Asp Ser Ala Thr Ser Phe Ser Leu Asp Phe Gly Tyr
35 40 45
Leu Arg Asn Lys Asn Gly Cys His Val Glu Leu Leu Phe Leu Arg Tyr
50 55 60
Ile Ser Asp Trp Asp Leu Asp Pro Gly Arg Cys Tyr Arg Val Thr Trp
65 70 75 80
Phe Ile Ser Trp Ser Pro Cys Tyr Asp Cys Ala Arg His Val Ala Asp
85 90 95
Phe Leu Arg Gly Asn Pro Asn Leu Ser Leu Arg Ile Phe Thr Ala Arg
100 105 110
Leu Tyr Phe Cys Glu Asp Arg Lys Ala Glu Pro Glu Gly Leu Arg Arg
115 120 125
Leu His Arg Ala Gly Val Gln Ile Ala Ile Met Thr Phe Lys Asp Tyr
130 135 140
Phe Tyr Cys Trp Asn Thr Phe Val Glu Asn His Gly Arg Thr Phe Lys
145 150 155 160
Ala Trp Glu Gly Leu His Glu Asn Ser Val Arg Leu Ser Arg Gln Leu
165 170 175
Arg Arg Ile Leu Leu Pro Leu Tyr Glu Val Asp Asp Leu Arg Asp Ala
180 185 190
Phe Arg Thr
195
<210> 4
<211> 588
<212> DNA
<213> Artificial sequence
<400> 4
atggacagcc tgctgatgaa ccgccgcgag ttcctgtacc agttcaagaa cgtgcgctgg 60
gccaagggcc gccgcgagac ctacctgtgc tacgtggtga agcgccgcga cagcgccacc 120
agcttcagcc tggacttcgg ctacctgcgc aacaagaacg gctgccacgt ggagctgctg 180
ttcctgcgct acatcagcga ctgggacctg gacccgggcc gctgctaccg cgtgacctgg 240
ttcatcagct ggagcccgtg ctacgactgc gcccgccacg tggccgactt cctgcgcggc 300
aacccgaacc tgagcctgcg catcttcacc gcccgcctgt acttctgcga ggaccgcaag 360
gccgagccgg agggcctgcg ccgcctgcac cgcgccggcg tgcagatcgc catcatgacc 420
ttcaaggact acttctactg ctggaacacc ttcgtggaga accacggccg caccttcaag 480
gcctgggagg gcctgcacga gaacagcgtg cgcctgagcc gccagctgcg ccgcatcctg 540
ctgccgctgt acgaggtgga cgacctgcgc gacgccttcc gcacctaa 588
<210> 5
<211> 5970
<212> DNA
<213> Artificial sequence
<400> 5
gcccggaatg ccgggctggc tgggcggctc ctcgccgggg ccggtcggta gttgctgctc 60
gcccggatac agggtcggga tgcggcgcag gtcgccatgc cccaacagcg attcgtcctg 120
gtcgtcgtga tcaaccacca cggcggcact gaacaccgac aggcgcaact ggtcgcgggg 180
ctggccccac gccacgcggt cattgaccac gtaggccgac acggtgccgg ggccgttgag 240
cttcacgacg gagatccagc gctcggccac caagtccttg actgcgtatt ggaccgtccg 300
caaagaacgt ccgatgagct tggaaagtgt cttctggctg accaccacgg cgttctggtg 360
gcccatctgc gccacgaggt gatgcagcag cattgccgcc gtgggtttcc tcgcaataag 420
cccggcccac gcctcatgcg ctttgcgttc cgtttgcacc cagtgaccgg gcttgttctt 480
ggcttgaatg ccgatttctc tggactgcgt ggccatgctt atctccatgc ggtagggtgc 540
cgcacggttg cggcaccatg cgcaatcagc tgcaactttt cggcagcgcg acaacaatta 600
tgcgttgcgt aaaagtggca gtcaattaca gattttcttt aacctacgca atgagctatt 660
gcggggggtg ccgcaatgag ctgttgcgta cccccctttt ttaagttgtt gatttttaag 720
tctttcgcat ttcgccctat atctagttct ttggtgccca aagaagggca cccctgcggg 780
gttcccccac gccttcggcg cggctccccc tccggcaaaa agtggcccct ccggggcttg 840
ttgatcgact gcgcggcctt cggccttgcc caaggtggcg ctgccccctt ggaacccccg 900
cactcgccgc cgtgaggctc ggggggcagg cgggcgggct tcgccttcga ctgcccccac 960
tcgcataggc ttgggtcgtt ccaggcgcgt caaggccaag ccgctgcgcg gtcgctgcgc 1020
gagccttgac ccgccttcca cttggtgtcc aaccggcaag cgaagcgcgc aggccgcagg 1080
ccggaggctt ttccccagag aaaattaaaa aaattgatgg ggcaaggccg caggccgcgc 1140
agttggagcc ggtgggtatg tggtcgaagg ctgggtagcc ggtgggcaat ccctgtggtc 1200
aagctcgtgg gcaggcgcag cctgtccatc agcttgtcca gcagggttgt ccacgggccg 1260
agcgaagcga gccagccggt ggccgctcgc ggccatcgtc cacatatcca cgggctggca 1320
agggagcgca gcgaccgcgc agggcgaagc ccggagagca agcccgtagg gcgccgcagc 1380
cgccgtaggc ggtcacgact ttgcgaagca aagtctagtg agtatactca agcattgagt 1440
ggcccgccgg aggcaccgcc ttgcgctgcc cccgtcgagc cggttggaca ccaaaaggga 1500
ggggcaggca tggcggcata cgcgatcatg cgatgcaaga agctggcgaa aatgggcaac 1560
gtggcggcca gtctcaagca cgcctaccgc gagcgcgaga cgcccaacgc tgacgccagc 1620
aggacgccag agaacgagca ctgggcggcc agcagcaccg atgaagcgat gggccgactg 1680
cgcgagttgc tgccagagaa gcggcgcaag gacgctgtgt tggcggtcga gtacgtcatg 1740
acggccagcc cggaatggtg gaagtcggcc agccaagaac agcaggcggc gttcttcgag 1800
aaggcgcaca agtggctggc ggacaagtac ggggcggatc gcatcgtgac ggccagcatc 1860
caccgtgacg aaaccagccc gcacatgacc gcgttcgtgg tgccgctgac gcaggacggc 1920
aggctgtcgg ccaaggagtt catcggcaac aaagcgcaga tgacccgcga ccagaccacg 1980
tttgcggccg ctgtggccga tctagggctg caacggggca tcgagggcag caaggcacgt 2040
cacacgcgca ttcaggcgtt ctacgaggcc ctggagcggc caccagtggg ccacgtcacc 2100
atcagcccgc aagcggtcga gccacgcgcc tatgcaccgc agggattggc cgaaaagctg 2160
ggaatctcaa agcgcgttga gacgccggaa gccgtggccg accggctgac aaaagcggtt 2220
cggcaggggt atgagcctgc cctacaggcc gccgcaggag cgcgtgagat gcgcaagaag 2280
gccgatcaag cccaagagac ggcccgagac cttcgggagc gcctgaagcc cgttctggac 2340
gccctggggc cgttgaatcg ggatatgcag gccaaggccg ccgcgatcat caaggccgtg 2400
ggcgaaaagc tgctgacgga acagcgggaa gtccagcgcc agaaacaggc ccagcgccag 2460
caggaacgcg ggcgcgcaca tttccccgaa aagtgccacc tggcggcgtt gtgacaattt 2520
accgaacaac tccgcggccg ggaagccgat ctcggcttga acgaattgtt aggtggcggt 2580
acttgggtcg atatcaaagt gcatcacttc ttcccgtatg cccaactttg tatagagagc 2640
cactgcggga tcgtcaccgt aatctgcttg cacgtagatc acataagcac caagcgcgtt 2700
ggcctcatgc ttgaggagat tgatgagcgc ggtggcaatg ccctgcctcc ggtgctcgcc 2760
ggagactgcg agatcataga tatagatctc actacgcggc tgctcaaacc tgggcagaac 2820
gtaagccgcg agagcgccaa caaccgcttc ttggtcgaag gcagcaagcg cgatgaatgt 2880
cttactacgg agcaagttcc cgaggtaatc ggagtccggc tgatgttggg agtaggtggc 2940
tacgtctccg aactcacgac cgaaaagatc aagagcagcc cgcatggatt tgacttggtc 3000
agggccgagc ctacatgtgc gaatgatgcc catacttgag ccacctaact ttgttttagg 3060
gcgactgccc tgctgcgtaa catcgttgct gctgcgtaac atcgttgctg ctccataaca 3120
tcaaacatcg acccacggcg taacgcgctt gctgcttgga tgcccgaggc atagactgta 3180
caaaaaaaca gtcataacaa gccatgaaaa ccgccactgc gccgttacca ccgctgcgtt 3240
cggtcaaggt tctggaccag ttgcgtgagc gcatacgcta cttgcattac agtttacgaa 3300
ccgaacaggc ttatgtcaac tgggttcgtg ccttcatccg tttccacggt gtgcgtccat 3360
gggcaaatat tatacgcaag gcgacaaggt gctgatgccg ctggcgattc aggttcatca 3420
tgccgtttgt gatggcttcc atgtcggcag aatgcttaat gaattacaac agtttttatg 3480
catgcgccca atacgcaaac cgcctctccc cgcgcgttgg ccgattcatt aatgcagctg 3540
gcacgacagg tttcccgact ggaaagcggg cagtgagcgc aacgcaatta atgtgagtta 3600
gctcactcat taggcacccc aggctttaca ctttatgctt ccggctcgta tgttgtgtgg 3660
aattgtgagc ggataacaat ttcacacagg aaacagctat gaccatgatt acgccaagcg 3720
cgcaattaac cctcactaaa gggaacaaaa gctgggtacc gggccccccc tcgaggtcga 3780
cggtatcgat gcataatgtg cctgtcaaat ggacgaagca gggattctgc aaaccctatg 3840
ctactccgtc aagccgtcaa ttgtctgatt cgttaccaat tatgacaact tgacggctac 3900
atcattcact ttttcttcac aaccggcacg gaactcgctc gggctggccc cggtgcattt 3960
tttaaatacc cgcgagaaat agagttgatc gtcaaaacca acattgcgac cgacggtggc 4020
gataggcatc cgggtggtgc tcaaaagcag cttcgcctgg ctgatacgtt ggtcctcgcg 4080
ccagcttaag acgctaatcc ctaactgctg gcggaaaaga tgtgacagac gcgacggcga 4140
caagcaaaca tgctgtgcga cgctggcgat atcaaaattg ctgtctgcca ggtgatcgct 4200
gatgtactga caagcctcgc gtacccgatt atccatcggt ggatggagcg actcgttaat 4260
cgcttccatg cgccgcagta acaattgctc aagcagattt atcgccagca gctccgaata 4320
gcgcccttcc ccttgcccgg cgttaatgat ttgcccaaac aggtcgctga aatgcggctg 4380
gtgcgcttca tccgggcgaa agaaccccgt attggcaaat attgacggcc agttaagcca 4440
ttcatgccag taggcgcgcg gacgaaagta aacccactgg tgataccatt cgcgagcctc 4500
cggatgacga ccgtagtgat gaatctctcc tggcgggaac agcaaaatat cacccggtcg 4560
gcaaacaaat tctcgtccct gatttttcac caccccctga ccgcgaatgg tgagattgag 4620
aatataacct ttcattccca gcggtcggtc gataaaaaaa tcgagataac cgttggcctc 4680
aatcggcgtt aaacccgcca ccagatgggc attaaacgag tatcccggca gcaggggatc 4740
attttgcgct tcagccatac ttttcatact cccgccattc agagaagaaa ccaattgtcc 4800
atattgcatc agacattgcc gtcactgcgt cttttactgg ctcttctcgc taaccaaacc 4860
ggtaaccccg cttattaaaa gcattctgta acaaagcggg accaaagcca tgacaaaaac 4920
gcgtaacaaa agtgtctata atcacggcag aaaagtccac attgattatt tgcacggcgt 4980
cacactttgc tatgccatag catttttatc cataagatta gcggatccta cctgacgctt 5040
tttatcgcaa ctctctactg tttctccata cccgtttttt tgggctagcg aattcataac 5100
actggccgtc gttttacaac gtcgtgactg ggaaaaccct ggcgttaccc aacttaatcg 5160
ccttgcagca catccccctt tcgccagctg gcgtaatagc gaagaggccc gcaccgatcg 5220
cccttcccaa cagttgcgca gcctgaatgg cgaatggaaa ttgtaagcgt taatattttg 5280
ttaaaattcg cgttaaattt ttgttaaatc agctcatttt ttaaccaata ggccgactgc 5340
gatgagtggc agggcggggc gtaatttttt taaggcagtt attggtgccc ttaaacgcct 5400
ggtgctacgc ctgaataagt gataataagc ggatgaatgg cagaaattcg aaagcaaatt 5460
cgacccggtc gtcggttcag ggcagggtcg ttaaatagcc gcttatgtct attgctggtt 5520
taccggttta ttgactaccg gaagcagtgt gaccgtgtgc ttctcaaatg cctgaggcca 5580
gtttgctcag gctctccccg tggaggtaat aattgacgat atgatcattt attctgcctc 5640
ccagagcctg ataaaaacgg tgaatccgtt agcgaggtgc cgccggcttc cattcaggtc 5700
gaggtggccc ggctccatgc accgcgacgc aacgcgggga ggcagacaag gtatagggcg 5760
gcgaggcggc tacagccgat agtctggaac agcgcactta cgggttgctg cgcaacccaa 5820
gtgctaccgg cgcggcagcg tgacccgtgt cggcggctcc aacggctcgc catcgtccag 5880
aaaacacggc tcatcgggca tcggcaggcg ctgctgcccg cgccgttccc attcctccgt 5940
ttcggtcaag gctggcaggt ctggttccat 5970

Claims (10)

1. A method for detecting the binding position of a protein and deoxyribonucleotides in situ, comprising:
1) constructing a target strain for expressing the cytosine deoxynucleotide deaminase-target protein fusion protein and a control strain for expressing the target protein;
the target strain and the control strain do not contain uracil DNA glycosidase genes;
2) after inducing the protein expression of the target strain and the control strain, extracting the genome DNA of the target strain and the control strain;
3) performing high-throughput sequencing on the genomic DNA of the target strain and the genomic DNA of the control strain, analyzing the results to obtain the point mutations of the genomic DNA of the target strain and of the control strain, removing the point mutations common to the target strain and the control strain, and selecting, from the remaining point mutations of the target strain, the point mutations located in non-coding regions to obtain the target protein binding sites.
2. The method according to claim 1, wherein in step 3) n groups each of genomic DNA of the target strain and of the control strain are subjected to high-throughput sequencing to obtain the target protein binding sites, specifically comprising: performing high-throughput sequencing on the genomic DNA of the n groups of target strains and the n groups of control strains, analyzing the results to obtain the point mutations of the genomic DNA of the target strains and of the control strains, removing the point mutations common to the target strains and the control strains, selecting from the remaining point mutations of the target strains the point mutations common to the n groups of target strains, and then selecting the common point mutations located in non-coding regions to obtain the target protein binding sites;
wherein 2 ≤ n ≤ 5.
3. The method of claim 1, wherein the target strain expressing the cytosine deoxynucleotide deaminase-target protein fusion protein is constructed by a method comprising: constructing a cytosine deoxynucleotide deaminase-target protein fusion protein expression vector, and then introducing the fusion protein expression vector into a strain containing no uracil DNA glycosidase gene;
the control strain expressing the target protein is constructed by a method comprising: constructing a target protein expression vector, and then introducing the target protein expression vector into a strain containing no uracil DNA glycosidase gene.
4. The method of claim 3, wherein in the cytosine deoxynucleotide deaminase-target protein fusion protein expression vector, the cytosine deoxynucleotide deaminase is located at the N-terminus of the fusion protein, the target protein is located at the C-terminus of the fusion protein, and the cytosine deoxynucleotide deaminase and the target protein are linked by 8 glycines.
5. The method of claim 1, wherein inducing protein expression of the target strain and the control strain comprises: reviving the strains, picking colonies, and culturing them with shaking at 37 ℃ for 6-20 hours in LB medium with an inducer added;
preferably, the inducer is arabinose;
preferably, the formulation of the LB medium is: 10 g/L sodium chloride, 5 g/L yeast extract and 10 g/L peptone.
6. The method of claim 1, wherein the protein of interest is a transcription factor.
7. The method of claim 1, further comprising calculating the conserved DNA sequence bound by the target protein, comprising the steps of:
taking the target protein binding site obtained in step 3) as the center, extending m base pairs upstream and m base pairs downstream on the genomic DNA to obtain a DNA fragment sequence containing the target protein binding site, wherein 10 ≤ m ≤ 200;
and saving the DNA fragment sequence containing the target protein binding site to a text file, and calculating the conserved DNA sequence bound by the target protein using the bioinformatics tool MEME Suite.
8. A method for predicting the intracellular concentration of a protein transcriptionally expressed from a promoter, comprising: mutating a transcription factor-targeted promoter based on the method of claim 1, and predicting the intracellular concentration of a protein transcriptionally expressed by the promoter according to the following formula;
$x = 1-\exp(-K_5\cdot[\mathrm{Protein}]\cdot t)$
wherein,
$x$ represents the proportion of mutated DNA on the promoter targeted by the transcription factor;
[Protein] represents the intracellular concentration of the protein transcriptionally expressed from the promoter;
$t$ represents the total time over which the mutation process occurs;
$K_5$ denotes the equilibrium constant, and $K_5$ is expressed as follows:
$K_5 = \dfrac{k_{TIC1}\cdot\gamma_{mRNA}\cdot\gamma_{Protein}}{k_{translation}\cdot k_{TIC2}}$
$k_{TIC1}$ and $k_{TIC2}$ are constants, $k_{translation}$ is the translation rate, $\gamma_{mRNA}$ is the mRNA degradation rate, and $\gamma_{Protein}$ is the protein degradation rate.
9. The prediction method of claim 8, wherein the proportion $x$ of mutated DNA on the promoter targeted by the transcription factor is expressed as follows:
$x = 1-\exp(-k_m\cdot t)$
wherein,
$k_m$ represents the mutation rate of the transcription factor binding site;
$t$ represents the total time over which the mutation process occurs.
10. The prediction method of claim 8, wherein the proportion $x$ of mutated DNA on the promoter targeted by the transcription factor is expressed as follows:
$x = 1-\exp(-K_3\cdot\theta\cdot t)$
wherein,
$\theta$ represents the fraction of the DNA that is bound by the transcription factor protein, and $\theta$ is expressed as follows:
Figure FDA0003515305010000031
[LasR] represents the concentration of the transcription factor protein, and $k_d$ represents an equilibrium constant;
$K_3$ is a constant, and $K_3$ is expressed as follows:
$K_3 = [\mathrm{RNAP}]\cdot[\mathrm{DNA_{all}}]\cdot K_1\cdot K_2\cdot k_{TIC1}$
[RNAP] represents the concentration of RNA polymerase, $[\mathrm{DNA_{all}}]$ represents the total concentration of target DNA in the bacterial cell, and $K_1$ and $K_2$ are, respectively, the equilibrium constants of the two steps in the following process by which the transcription factor recruits ribonucleic acid polymerase:
$\mathrm{RNAP} + \mathrm{LasR_2\text{-}DNA} \overset{K_1}{\rightleftharpoons} \mathrm{RNAP\text{-}LasR_2\text{-}DNA} \overset{K_2}{\rightleftharpoons} \mathrm{LasR\text{-}TIC}$
RNAP denotes RNA polymerase, LasR denotes the transcription factor protein, RNAP-LasR$_2$-DNA denotes the complex of RNA polymerase with the transcription factor and DNA, and LasR-TIC denotes the open transcription initiation complex with the transcription factor protein bound;
preferably, $\theta$ is determined using the method of claim 1.
CN202210163961.9A 2022-02-22 2022-02-22 Method for in-situ detection of binding position of protein and deoxyribonucleotide Pending CN115094127A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210163961.9A CN115094127A (en) 2022-02-22 2022-02-22 Method for in-situ detection of binding position of protein and deoxyribonucleotide
PCT/CN2022/140081 WO2023160163A1 (en) 2022-02-22 2022-12-19 Method for detecting binding position of protein and deoxyribonucleotide in situ

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210163961.9A CN115094127A (en) 2022-02-22 2022-02-22 Method for in-situ detection of binding position of protein and deoxyribonucleotide

Publications (1)

Publication Number Publication Date
CN115094127A true CN115094127A (en) 2022-09-23

Family

ID=83287479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210163961.9A Pending CN115094127A (en) 2022-02-22 2022-02-22 Method for in-situ detection of binding position of protein and deoxyribonucleotide

Country Status (2)

Country Link
CN (1) CN115094127A (en)
WO (1) WO2023160163A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023160163A1 (en) * 2022-02-22 2023-08-31 中国科学院深圳先进技术研究院 Method for detecting binding position of protein and deoxyribonucleotide in situ
WO2024065721A1 (en) * 2022-09-30 2024-04-04 Peking University Methods of determining genome-wide dna binding protein binding sites by footprinting with double stranded dna deaminase

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102482639B (en) * 2009-04-03 2016-01-06 医学研究会 Activation induction cytidine deaminase (AID) mutant and using method
CN114380922A (en) * 2016-06-15 2022-04-22 中国科学院上海营养与健康研究所 Fusion protein for generating point mutation in cell, preparation and application thereof
CN115094127A (en) * 2022-02-22 2022-09-23 中国科学院深圳先进技术研究院 Method for in-situ detection of binding position of protein and deoxyribonucleotide

Also Published As

Publication number Publication date
WO2023160163A1 (en) 2023-08-31

Similar Documents

Publication Publication Date Title
Rombel et al. ORF-FINDER: a vector for high-throughput gene identification
CN115094127A (en) Method for in-situ detection of binding position of protein and deoxyribonucleotide
AU2022203184A1 (en) Sequencing controls
Qi et al. A one-step PCR-based method for rapid and efficient site-directed fragment deletion, insertion, and substitution mutagenesis
Robert-Le Meur et al. Polynucleotide phosphorylase of Escherichia coli induces the degradation of its RNase III processed messenger by preventing its translation
CN113136374A (en) Preparation and application of recombinant mutant Tn5 transposase
Isono et al. Mutations affecting the structural genes and the genes coding for modifying enzymes for ribosomal proteins in Escherichia coli
Hughes et al. High-throughput screening of cellulase F mutants from multiplexed plasmid sets using an automated plate assay on a functional proteomic robotic workcell
EP2209903A1 (en) Promoter detection and analysis
CN108165551B (en) Improved promoter, T vector composed of improved promoter and application of improved promoter
CN112899296A (en) Transposase screening report vector and preparation method and application thereof
Huttanus et al. Targeted mutagenesis and high-throughput screening of diversified gene and promoter libraries for isolating gain-of-function mutations
CN108060168B (en) Improved promoter, T vector composed of improved promoter and application of improved promoter
Contursi et al. Identification and autonomous replication capability of a chromosomal replication origin from the archaeon Sulfolobus solfataricus
CN108728477B (en) Efficient transposition mutation system and construction method
CN111019922B (en) Mutant restriction enzyme BsaI and preparation method and application thereof
CN114606234A (en) Japanese eel aromatase cyp19a1 gene promoter and application thereof
CN107603979B (en) Promoter-like gene for efficiently expressing foreign protein and application thereof
CN107475257B (en) Promoter-like gene for efficiently promoting expression of foreign protein and application thereof
Bauer et al. Vectors for determining the differential expression of genes in heterocysts and vegetative cells of Anabaena sp. strain PCC 7120
Marsch-Moreno et al. pTn5cat: a Tn5-derived genetic element to facilitate insertion mutagenesis, promoter probing, physical mapping, cloning, and marker exchange in phytopathogenic and other Gram-negative bacteria
CN107012231B (en) The SNP marker and its application of phenomenon assessment are penetrated into for rhesus macaque genetic background in machin
CN116751792B (en) Transcription factor downstream gene screening method
Séraphin Identification of transiently interacting proteins and of stable protein complexes
CA2375821A1 (en) Novel vectors for improving cloning and expression in low copy number plasmids

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination