CN111696627B - Design method of long-chain RNA specific probe - Google Patents

Info

Publication number
CN111696627B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010225368.3A
Other languages
Chinese (zh)
Other versions
CN111696627A (en)
Inventor
张晓娜
曹群发
庞盼盼
韩峻松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI BIOCHIP CO Ltd
Original Assignee
SHANGHAI BIOCHIP CO Ltd
Application filed by SHANGHAI BIOCHIP CO Ltd
Priority to CN202010225368.3A
Publication of CN111696627A
Application granted
Publication of CN111696627B
Current legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/20: Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6834: Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837: Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Genetics & Genomics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Analytical Chemistry (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the field of biotechnology, in particular to a method for designing long-chain RNA specific probes. The probes can detect the expression abundance or differential expression of at least two kinds of long-chain RNA among mRNA, circular RNA and long non-coding RNA. Probes obtained by the design method have high sensitivity and strong specificity, and can be used for simultaneous, rapid, high-throughput detection of the expression abundance of regulatory RNA molecules, such as circular RNA and long non-coding RNA, and of protein-coding mRNA in trace samples.

Description

Design method of long-chain RNA specific probe
Technical Field
The invention relates to the technical field of biology, in particular to a design method of a long-chain RNA specific probe.
Background
Circular RNAs (circRNAs) are a class of novel RNA molecules characterized by a covalently closed loop and are widely found in eukaryotes. circRNAs are derived from exon or intron regions of genes and are abundant in mammalian cells. The formation of circRNAs differs from the canonical splicing of linear RNAs: the 5' end of the donor exon is joined end to end with the 3' end of the acceptor exon, forming a reverse splice site (backsplicing). The existing circRNA formation models mainly include the following (see FIG. 1):
(1) "lariat-driven circularization" or "exon-skipping circularization", as shown in FIG. 1A;
(2) "intron-pairing-driven circularization" or "direct backsplicing", as shown in FIG. 1B;
(3) the circular intronic RNA (ciRNA) formation model, as shown in FIG. 1C;
(4) the circularization model dependent on RNA binding proteins (RBPs), as shown in FIG. 1D;
(5) the alternative circularization model, analogous to alternative splicing, as shown in FIG. 1E.
Prior studies have shown that most circRNAs are conserved across species. The circular structure also resists degradation by RNase R, which makes circRNAs stable. circRNAs are receiving increasing attention because of their expression specificity, the complexity of their regulation, and their important roles in disease development. Like miRNAs and long non-coding RNAs, circRNAs have become a new research focus in the RNA field. The technique most commonly used at present to measure the expression abundance of circular RNA is real-time PCR, but its throughput is low. Because circRNA research has only just begun, a reliable high-throughput detection technology is urgently needed to meet research requirements.
Long non-coding RNA (lncRNA) broadly refers to linear non-coding RNA longer than 200 bases. Its overall expression abundance is lower than that of mRNA, its conservation is poor, about 40% of lncRNAs carry a polyA tail, and its expression is more tissue-specific. It has important functions in transcriptional silencing, transcriptional activation, chromosome modification, nuclear transport, etc. lncRNA has been likened to the dark matter of the universe; in recent years, research has found that it is involved in a variety of biological processes, is an important basis for maintaining gene function, and is associated with a variety of complex diseases. According to the positional relationship between a long non-coding RNA and its nearest mRNA, long non-coding RNAs can be classified into: antisense long non-coding RNA (Antisense lncRNA), sense (synonymous) long non-coding RNA (Sense lncRNA), intronic long non-coding RNA (Intronic lncRNA), intergenic long non-coding RNA (Intergenic lncRNA), bidirectional long non-coding RNA (Bidirectional lncRNA), and enhancer long non-coding RNA (Enhancer lncRNA).
Gene chip technology uses microarray techniques to attach high-density DNA fragments to solid-phase surfaces such as glass slides or silicon wafers in a defined order or arrangement, by high-speed robotic spotting or in-situ synthesis, and then applies fluorescence- or biotin-labeled target fragments and the principle of complementary base hybridization to carry out large-scale gene expression studies. It is a leading-edge biotechnology in the life sciences that developed alongside the Human Genome Project. The classification and diagnosis of diseases have since improved further, and gene-chip-based feature selection has played a key role in this. After more than ten years of development, gene chip technology has steadily matured and is widely applied across the life sciences. However, the prior art offers no gene chip or method that can simultaneously detect the expression abundance of multiple kinds of long-chain RNA, such as mRNA, circular RNA and long non-coding RNA.
Disclosure of Invention
In view of the above-described drawbacks of the prior art, an object of the present invention is to provide a method for designing a long-chain RNA-specific probe, which is used to solve the problems of the prior art.
To achieve the above and other related objects, a first aspect of the present invention provides a method for designing a specific probe for simultaneously detecting two or more long-chain RNAs, comprising the steps of:
S100, designing probes for a target gene according to preset probe types to obtain candidate probes, wherein the preset probe types are at least two of an mRNA probe, a circular RNA probe, and a long non-coding RNA probe;
S200, comparing the candidate probe sequences with the full-length target sequences of all probes;
S300, if the comparison result of a candidate probe meets the preset values, retaining the candidate probe as a specific probe;
if the comparison result of a candidate probe does not meet the preset values, eliminating the candidate probe;
S400, if a target gene has no retained specific probe, redesigning probes for the target gene as candidate probes according to the preset probe types, and returning to step S200;
if the specific probes retained for a target gene cover only some of the preset probe types, redesigning candidate probes of the remaining preset probe types for the target gene, and returning to step S200, until the specific probes retained for the target gene cover all the preset probe types.
Specifically, in S300, meeting the preset values means: the similarity between the candidate probe and each of the full-length target sequences of all probes does not exceed a first preset value, and the length of consecutive identical bases between the candidate probe and each of the full-length target sequences of all probes does not exceed a second preset value.
Not meeting the preset values means: the similarity between the candidate probe and at least one of the full-length target sequences of all probes exceeds the first preset value, or the length of consecutive identical bases between the candidate probe and at least one of the full-length target sequences of all probes exceeds the second preset value.
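The two conditions above can be written compactly. The notation is introduced here only for illustration and does not appear in the original text: $p$ is a candidate probe, $T$ is the set of full-length target sequences of all probes, $\operatorname{sim}(p,t)$ is the similarity score, $\operatorname{run}(p,t)$ is the length of the longest stretch of consecutive identical bases shared by $p$ and $t$, and $\theta_1$, $\theta_2$ are the first and second preset values. Whether $T$ includes the probe's own intended target is not stated in the text; in practice it would presumably be excluded, since a probe matches its own target completely.

```latex
\[
\text{retain}(p) \iff \forall\, t \in T:\;
  \operatorname{sim}(p,t) \le \theta_1 \;\wedge\; \operatorname{run}(p,t) \le \theta_2
\]
\[
\text{eliminate}(p) \iff \exists\, t \in T:\;
  \operatorname{sim}(p,t) > \theta_1 \;\vee\; \operatorname{run}(p,t) > \theta_2
\]
```

The second statement is the logical negation of the first, so every candidate probe is either retained or eliminated, never both.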
Preferably, the sequence used to design the mRNA probe in S100 is selected from the longest transcript sequence of the mRNA of each gene of interest.
Preferably, the sequence used to design the circular RNA probe in S100 is selected from the group consisting of fragments of the reverse splice sequences of circular RNAs of the respective genes of interest.
Preferably, the circular RNA probe in S100 is a circular RNA probe whose binding site on the reverse splice sequence of the target gene is located at the reverse splice site.
Preferably, the specific sequences used in S100 to design the antisense long non-coding RNA or sense long non-coding RNA probes are selected from: fragments of the regions in which the antisense long non-coding RNA or sense long non-coding RNA does not overlap the mRNA;
the specific sequences used to design the intronic long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA or enhancer long non-coding RNA probes are selected from: fragments of the longest long non-coding RNA of each target gene.
In a second aspect, the invention provides a system for designing long-chain RNA-specific probes, which can be used to design specific probes for simultaneous detection of at least two long-chain RNAs in mRNA, circular RNA or long non-coding RNA.
The system comprises:
a design module 1 for designing probes for a target gene according to preset probe types to obtain candidate probes, wherein the preset probe types are at least two of an mRNA probe, a circular RNA probe, and a long non-coding RNA probe;
an alignment module 2 for aligning the candidate probe sequences with the full-length target sequences of all probes;
a screening module 3 for determining whether the comparison result of each candidate probe with the full-length target sequences of all probes meets the preset values; if so, the candidate probe is retained as a specific probe; if not, the candidate probe is eliminated;
an iteration module 4 for determining whether each target gene has a retained specific probe; if not, probes for the target gene are redesigned as candidate probes according to the preset probe types, and the alignment module 2 is executed again;
if a target gene has retained specific probes but they cover only some of the preset probe types, candidate probes of the remaining preset probe types are redesigned for the target gene and the alignment module 2 is executed again, until the specific probes retained for the target gene cover all the preset probe types.
Further, the sequence used to design the mRNA probe is selected from the sequence of the longest transcript of the mRNA of each target gene.
Further, a fragment whose binding site on the longest transcript of the target gene is located at the 3' end of that transcript is selected as the mRNA candidate probe for the target gene.
Further, the sequences used to design the circular RNA probes are selected from the group consisting of: reverse splicing sequence of circular RNA of each target gene.
Still further, the circular RNA probe is a circular RNA probe whose binding site on the reverse splice sequence of the target gene is located at the reverse splice site.
Further, the sequences used to design the antisense long non-coding RNA or sense long non-coding RNA probes are selected from: sequences of the regions in which the long non-coding RNA does not overlap the mRNA of the target gene;
the sequences used to design the intronic long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA or enhancer long non-coding RNA probes are selected from: fragments of the longest long non-coding RNA of each target gene.
In particular, in the screening module 3:
If the similarity between a candidate probe and each of the full-length target sequences of all probes does not exceed a first preset value, and the length of consecutive identical bases between the candidate probe and each of those sequences does not exceed a second preset value, the comparison result of the candidate probe is determined to meet the preset values.
If the similarity between a candidate probe and at least one of the full-length target sequences of all probes exceeds the first preset value, or the length of consecutive identical bases between the candidate probe and at least one of those sequences exceeds the second preset value, the comparison result of the candidate probe is determined not to meet the preset values.
In a third aspect, the present invention provides a storage medium having stored thereon a computer program which, when executed by a computer, implements a method of designing a specific probe for simultaneous detection of two or more long-chain RNAs.
A fourth aspect of the present invention provides a service terminal, the service terminal comprising a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory, so that the service terminal can realize a method for designing a specific probe for detecting more than two long-chain RNAs simultaneously when executing the computer program.
As described above, the long-chain RNA specific probe, the design method and the gene chip thereof have the following beneficial effects:
1) Through the probe and the gene chip which are specifically designed, the long-chain RNA to be detected can be specifically hybridized and captured by utilizing the hybridization principle of the gene chip without removing linear RNA;
2) Nearly 80,000 circular RNAs, 70,000 long non-coding RNAs and 20,000 mRNAs can be detected at the same time, enabling simultaneous, rapid, high-throughput detection of the expression abundance of regulatory RNA molecules, such as circular RNAs and long non-coding RNAs, and of protein-coding mRNAs in trace samples.
3) The gene chip technology is better applied to long-chain RNA expression profile analysis, overcomes the defect that various RNAs of the existing transcriptome cannot be detected simultaneously, and provides a technical method for the research of transcription regulation networks.
Drawings
FIG. 1 shows the five circRNA formation models of the prior art, wherein panel A is the lariat-driven circularization or exon-skipping circularization model; panel B is the intron-pairing-driven circularization or direct backsplicing model; panel C is the ciRNA formation model; panel D is the RBP-dependent circularization model; and panel E is the alternative circularization model.
FIG. 2 is a schematic diagram of the six kinds of long non-coding RNA of the prior art, classified by the position of the long non-coding RNA on the genome, wherein a is intergenic long non-coding RNA, b is intronic long non-coding RNA, c is bidirectional long non-coding RNA, d is enhancer long non-coding RNA, e is sense long non-coding RNA, and f is antisense long non-coding RNA.
FIG. 3 is a schematic diagram showing six regulatory mechanisms of long non-coding RNA in the prior art.
FIG. 4 shows a flow chart of the overall design of a long-chain RNA-specific probe of the present invention.
FIG. 5 is a schematic diagram showing the design of a long-chain RNA probe of the present invention.
FIG. 6 shows a schematic diagram of a system for designing long-chain RNA-specific probes according to the present invention.
FIG. 7 shows a schematic diagram of a service terminal for designing long-chain RNA-specific probes according to the present invention.
FIG. 8 shows chip probe expression values alongside relative expression values from quantitative PCR, used to verify probe sensitivity and specificity.
FIG. 9 is a statistical chart showing the signal values of various long-chain RNAs detected by the gene chip according to the embodiment of the present invention.
FIG. 10 shows a scan of a gene chip for detecting abundance of long-chain RNA expression according to an embodiment of the present invention.
Description of the element reference numerals in FIGS. 6 and 7
1. Design module
2. Alignment module
3. Screening module
4. Iteration module
5. Processor
6. Memory
Detailed Description
In the present invention, the term "long-chain RNA" includes mRNA, long-chain non-coding RNA, circRNA, and the like.
The term "probe" refers to a DNA or RNA nucleic acid sequence of known sequence that is complementary to a gene of interest.
The term "specific probe" refers to a probe with strong specificity, such that probes do not interfere with one another when multiple long-chain RNAs are detected simultaneously.
The term "full-length target sequence" refers to the full-length sequence of the RNA that contains the fragment recognized by the probe; the RNA sequence given in a sequence database is typically the sequence of the sense strand of the DNA.
The term "similarity" refers to a score for how alike two sequences are. A DNA sequence is an arrangement of the four bases A, T, C and G; the bases of two sequences are compared under one of the available scoring schemes (e.g., a match scoring matrix), and the resulting score is the similarity.
The term "longest transcript" is defined as follows: a transcript is the mature mRNA that a gene forms by transcription to encode a protein. Because mRNA can be spliced in different ways as it forms, one gene may have multiple transcripts; the transcript with the longest sequence is the longest transcript of the gene.
One embodiment of the present invention provides a method for designing a specific probe for simultaneously detecting two or more long-chain RNAs, comprising the steps of:
S100, designing probes for a target gene according to preset probe types to obtain candidate probes, wherein the preset probe types are at least two of an mRNA probe, a circular RNA probe, and a long non-coding RNA probe;
S200, comparing the candidate probe sequences with the full-length target sequences of all probes;
S300, if the comparison result of a candidate probe meets the preset values, retaining the candidate probe as a specific probe;
if the comparison result of a candidate probe does not meet the preset values, eliminating the candidate probe;
S400, if a target gene has no retained specific probe, redesigning probes for the target gene as candidate probes according to the preset probe types, and returning to step S200;
if the specific probes retained for a target gene cover only some of the preset probe types, redesigning candidate probes of the remaining preset probe types for the target gene, and returning to step S200, until the specific probes retained for the target gene cover all the preset probe types.
Specifically, in step S100, the preset probe types may be two types, namely the mRNA probe and the circular RNA probe, the mRNA probe and the long non-coding RNA probe, or the circular RNA probe and the long non-coding RNA probe; or they may be all three types: the mRNA probe, the circular RNA probe and the long non-coding RNA probe.
In the preferred embodiment shown in FIG. 4, the preset probe types are all three: the mRNA probe, the circular RNA probe and the long non-coding RNA probe.
Specifically, in step S300:
If the similarity between a candidate probe and each of the full-length target sequences of all probes does not exceed a first preset value, and the length of consecutive identical bases between the candidate probe and each of those sequences does not exceed a second preset value, the comparison result of the candidate probe is determined to meet the preset values.
If the similarity between a candidate probe and at least one of the full-length target sequences of all probes exceeds the first preset value, or the length of consecutive identical bases between the candidate probe and at least one of those sequences exceeds the second preset value, the comparison result of the candidate probe is determined not to meet the preset values.
A determination that the preset values are not met may be made after the candidate probe has been compared with the full-length target sequences of all probes. Alternatively, the comparison may stop as soon as the first full-length target sequence is found whose similarity exceeds the first preset value, or whose run of consecutive identical bases with the candidate probe exceeds the second preset value; the candidate probe is then determined not to meet the preset values.
Further, the first and second preset values may vary with the comparison program used. In the initial design stage, experiments are needed to verify that the preset values are reasonable, so as to ensure the specificity of the probes screened under them. For example, with the Bedtools comparison program, the first preset value may be no more than 75% and the second preset value may be no more than 15.
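The screening rule of step S300, including the option of stopping at the first offending target sequence, can be sketched as follows. This is a minimal illustration, not the comparison program used in the invention: the similarity measure here is assumed to be the best ungapped identity of the probe against any equal-length window of the target, the function names are hypothetical, and the probe's own intended target is assumed to have been left out of `off_targets`. Only the thresholds (75% and 15) come from the text.

```python
from typing import Iterable


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Extend the run that ended at the previous base of both sequences.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best = cur[j]
        prev = cur
    return best


def best_window_identity(probe: str, target: str) -> float:
    """Assumed similarity: highest fraction of matching bases over all
    ungapped placements of the probe on the target (0.0 if target is shorter)."""
    n = len(probe)
    if n == 0 or len(target) < n:
        return 0.0
    best = 0
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        matches = sum(1 for x, y in zip(probe, window) if x == y)
        best = max(best, matches)
    return best / n


def is_specific(probe: str, off_targets: Iterable[str],
                max_similarity: float = 0.75, max_run: int = 15) -> bool:
    """Return False at the first target that breaks either preset value."""
    for target in off_targets:
        if best_window_identity(probe, target) > max_similarity:
            return False  # first preset value exceeded: stop early
        if longest_common_run(probe, target) > max_run:
            return False  # second preset value exceeded: stop early
    return True
```

A real pipeline would more likely take similarity and match lengths from an alignment tool's output; the pure-Python loops above are quadratic and are only meant to make the two conditions concrete.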
In a preferred embodiment as shown in FIG. 4, the sequence used to design the mRNA probe in S100 is selected from the longest transcript sequence of the mRNA of each target gene.
The sequence of the target gene can be confirmed using the prior art, for example, the target gene sequence can be derived from GenBank database.
The longest transcript of the target gene can be confirmed using prior art techniques, e.g., the longest transcript can be derived from the RefSeq database.
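Selecting the longest transcript per gene can be illustrated with a short sketch. It assumes the transcripts have already been downloaded and parsed into `(gene_id, transcript_id, sequence)` tuples; that record layout, the function name and the tie-breaking rule (the first transcript seen wins) are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def longest_transcripts(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, Tuple[str, str]]:
    """Map each gene to (transcript_id, sequence) of its longest transcript."""
    best: Dict[str, Tuple[str, str]] = {}
    for gene_id, transcript_id, sequence in records:
        # Keep the first transcript seen for a gene unless a strictly longer one appears.
        if gene_id not in best or len(sequence) > len(best[gene_id][1]):
            best[gene_id] = (transcript_id, sequence)
    return best
```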
Further, as shown in FIG. 5, an mRNA probe whose binding site on the longest transcript of the target gene lies at the 3' end of that transcript is selected as the mRNA candidate probe for the target gene. The 3' end of the longest transcript generally refers to the fragment within 300 bases of the 3'-terminal base.
Reverse transcription of the sample to be tested starts from the 3' end, so placing the mRNA probe at the 3' end improves detection sensitivity. When identifying the fragments in which an antisense or sense long non-coding RNA does not overlap an mRNA, the longest transcript of the gene is used as the full-length target sequence of the target gene, which makes the identification more accurate.
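The restriction of mRNA candidate probes to the 3' end can be sketched as an enumeration of probe-length windows inside the last 300 bases of the transcript. The 300-base region comes from the text; the 60 nt default probe length is an assumed value, and the function name is hypothetical.

```python
from typing import List, Tuple


def three_prime_windows(transcript: str, probe_len: int = 60,
                        region: int = 300) -> List[Tuple[int, str]]:
    """Return (start, subsequence) for every probe-length window that lies
    entirely within the last `region` bases of the transcript (5'->3')."""
    if region <= 0 or probe_len <= 0:
        return []
    tail = transcript[-region:]            # whole transcript if it is shorter than region
    offset = len(transcript) - len(tail)   # position of the tail within the transcript
    return [(offset + i, tail[i:i + probe_len])
            for i in range(len(tail) - probe_len + 1)]
```

A probe design program would still rank these windows by its own criteria; the sketch only shows which positions are eligible.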
Further, the sequence used to design the circular RNA probe in S100 is selected from the group consisting of: fragments of the reverse splice sequence of the circular RNA.
The reverse splice sequence refers to the sequence formed when the 5' end of the splice donor exon and the 3' end of the splice acceptor exon of the linear sequence of the circular RNA are joined end to end to form a loop.
The linear sequences of circular RNAs can be obtained using prior art techniques, for example by collecting sequences from multiple databases such as circBase and CIRCpedia and removing redundancy based on sequence and chromosomal location.
The redundant sequences can be removed using existing software, such as Bedtools.
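As an illustration of removing redundancy by sequence and chromosomal location, the following plain-Python sketch drops any record whose locus or sequence has already been seen. It is not a Bedtools invocation, and the record fields, the class name and the rule for combining the two criteria (a duplicate on either one is dropped) are assumptions; the text does not specify them.

```python
from typing import Iterable, List, NamedTuple


class CircRecord(NamedTuple):
    source: str    # database the record came from (illustrative label)
    chrom: str
    start: int
    end: int
    strand: str
    sequence: str


def remove_redundancy(records: Iterable[CircRecord]) -> List[CircRecord]:
    """Keep the first record for each locus and each sequence."""
    seen_loci = set()
    seen_seqs = set()
    kept: List[CircRecord] = []
    for rec in records:
        locus = (rec.chrom, rec.start, rec.end, rec.strand)
        seq = rec.sequence.upper()
        if locus in seen_loci or seq in seen_seqs:
            continue  # same position or same sequence as an earlier record
        seen_loci.add(locus)
        seen_seqs.add(seq)
        kept.append(rec)
    return kept
```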
Further, as shown in FIG. 5, in S100, a circular RNA probe whose binding site on the reverse splice sequence of the target gene is located at the reverse splice site is selected as the circular RNA candidate probe for the target gene.
The reverse splice site is the only region of the circular RNA that differs from the corresponding linear RNA; it is produced by joining the 5' end of the splice donor exon to the 3' end of the splice acceptor exon end to end.
Because circular RNA is produced by a specific splicing mode, reverse splicing, its reverse splice sequence contains a specific reverse splice site (backsplice site) that linear RNA lacks. A circular RNA probe designed at the reverse splice site can therefore specifically detect circular RNA in a sample.
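A junction-spanning candidate sequence can be sketched by joining the tail of the linear circular-RNA sequence to its head, which reproduces the sequence found only across the reverse splice site. The sketch assumes the linear sequence is stored 5'->3' from the first to the last circularized exon, uses an assumed probe length of 60 nt centred on the junction, and uses a hypothetical function name.

```python
def junction_probe(linear_seq: str, probe_len: int = 60) -> str:
    """Return the sequence spanning the reverse splice site: the last half
    of the linear sequence followed by its first half."""
    if probe_len <= 0 or probe_len % 2:
        raise ValueError("probe_len must be a positive even number")
    if len(linear_seq) < probe_len:
        raise ValueError("circular RNA is shorter than the requested probe")
    half = probe_len // 2
    # In the circle, the 3' end of the linear sequence is joined to its 5' end.
    return linear_seq[-half:] + linear_seq[:half]
```

Because this arrangement of bases does not occur in the linear transcript, a probe placed here is the natural choice for telling circular from linear RNA, as the paragraph above explains.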
Further, since long non-coding RNAs have different types, the rules of designing probes for different types of long non-coding RNAs are different, and thus probes need to be designed according to the types of long non-coding RNAs.
In the preferred embodiment shown in FIG. 4, in S100:
The sequences used to design the antisense long non-coding RNA or sense long non-coding RNA probes are selected from: fragments of the regions in which the antisense long non-coding RNA or sense long non-coding RNA does not overlap the mRNA.
The sequences used to design the intronic long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA or enhancer long non-coding RNA probes are selected from: fragments of the longest long non-coding RNA of each target gene.
These sequences are chosen for designing the long non-coding RNA probes so that a long non-coding RNA probe does not also detect the mRNA of the same gene.
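Finding the regions of an antisense or sense long non-coding RNA that do not overlap an mRNA amounts to interval subtraction on genomic coordinates. The sketch below assumes half-open `(start, end)` intervals on the same chromosome and ignores strand, since an antisense transcript overlaps its mRNA by position on the opposite strand; the function name and coordinate convention are assumptions.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open: start inclusive, end exclusive


def subtract_intervals(region: Interval,
                       blockers: Iterable[Interval]) -> List[Interval]:
    """Return the parts of `region` not covered by any interval in `blockers`."""
    start, end = region
    free: List[Interval] = []
    cursor = start
    for b_start, b_end in sorted(blockers):
        if b_end <= cursor:
            continue              # blocker lies entirely before the uncovered part
        if b_start >= end:
            break                 # blocker lies entirely after the region
        if b_start > cursor:
            free.append((cursor, b_start))
        cursor = max(cursor, b_end)
        if cursor >= end:
            break
    if cursor < end:
        free.append((cursor, end))
    return free
```

For example, an lncRNA exon at (100, 500) with mRNA exons at (50, 150), (200, 250) and (450, 600) leaves (150, 200) and (250, 450) available for probe design.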
The transcript sequences of long non-coding RNAs may be obtained by collecting sequences from databases such as Ensembl, NCBI, UCSC, GENCODE and NONCODE and removing redundancy according to sequence and chromosomal position.
The probe design in S100 may be performed using existing probe design software, for example eArray, the professional gene chip probe design software from Agilent.
Specifically, in the preferred embodiment shown in FIG. 4, the method can design specific probes for simultaneously detecting mRNA, circular RNA and long non-coding RNA. The design proceeds as follows: the longest mRNA transcripts, the circular RNA reverse splice sequences and the sequences for designing long non-coding RNA probes, obtained as described above, are combined into one file and imported into the eArray software; the parameters are set according to established probe design principles such as GC content and annealing temperature; and the probes are then designed. Once the design parameters are set, they must not be changed until all probes used in the same experiment have been designed, so that all designed probes share the same hybridization temperature.
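The two ideas in this paragraph, pooling all design sequences into one file and fixing the design parameters for the whole experiment, can be sketched as follows. The FASTA header scheme, the GC bounds and all names are hypothetical; the input format expected by eArray is not described in the text and is not assumed here. The frozen dataclass merely illustrates parameters that cannot be altered once set.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Dict


@dataclass(frozen=True)
class DesignParams:
    probe_length: int = 60   # within the 50-70 nt range given in the text
    gc_min: float = 0.35     # hypothetical bound, not from the text
    gc_max: float = 0.65     # hypothetical bound, not from the text


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def passes_gc(seq: str, params: DesignParams) -> bool:
    return params.gc_min <= gc_fraction(seq) <= params.gc_max


def merge_to_fasta(mrna: Dict[str, str], circ: Dict[str, str],
                   lnc: Dict[str, str]) -> str:
    """Combine the three sequence sets into one FASTA text, tagging each
    header with its RNA type so the probe type can be recovered later."""
    lines = []
    for rna_type, table in (("mRNA", mrna), ("circRNA", circ), ("lncRNA", lnc)):
        for name, seq in table.items():
            lines.append(f">{rna_type}|{name}")
            lines.append(seq)
    return "".join(line + "\n" for line in lines)
```

Attempting to assign to a field of a `DesignParams` instance raises `FrozenInstanceError`, which mirrors the requirement that parameters stay fixed until every probe for the experiment has been designed.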
Furthermore, because the genome is complex, preventing probes from hybridizing to multiple target sequences requires specificity screening by iterative checking: each candidate probe is compared with the full-length target sequences of all probes and screened according to whether its similarity to those sequences, and the length of consecutive identical bases it shares with them, meet the preset values. The screening conditions are: a similarity of no more than 75%, and no more than 15 consecutive identical bases between the candidate probe and a full-length target sequence. A candidate probe that meets the conditions is regarded as highly specific and is retained as a specific probe; one that does not is regarded as poorly specific and is discarded. It is then determined whether any specific probe has been retained; if not, the process returns to redesign the probes. If specific probes have been retained, it is further determined whether each target gene has an mRNA specific probe, a long non-coding RNA specific probe and a circular RNA specific probe among them. If so, the design of specific probes for that target gene is complete; if not, probes of the missing types continue to be designed.
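The final check in this paragraph, whether every target gene has a retained probe of each preset type, can be sketched as a small bookkeeping function. The data layout (a list of gene identifiers and a list of `(gene, probe_type)` pairs), the type labels and the function name are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def missing_probe_types(
    genes: Iterable[str],
    retained: Iterable[Tuple[str, str]],
    required: Sequence[str] = ("mRNA", "circRNA", "lncRNA"),
) -> Dict[str, List[str]]:
    """Return, for each gene lacking at least one required probe type,
    the list of types that still need to be designed."""
    have: Dict[str, set] = {gene: set() for gene in genes}
    for gene, probe_type in retained:
        have.setdefault(gene, set()).add(probe_type)
    todo: Dict[str, List[str]] = {}
    for gene, types in have.items():
        missing = [t for t in required if t not in types]
        if missing:
            todo[gene] = missing
    return todo
```

An empty result means the design is complete; otherwise the listed types are the ones to send back to the design step.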
Further, when probes are designed by software, either a single probe or a plurality of candidate probes may be designed for each probe type of each target gene. A designed probe that meets the specificity screening conditions is used directly as a specific probe; a probe that does not meet them is redesigned. For the redesign, the number of probes generated by the software may be set to more than one while all other parameters remain unchanged. The software randomly generates a plurality of probes that satisfy the set parameters, and these probes are compared with the full-length target sequences of all probes (including the newly designed probes and the specific probes already screened). Probes that pass the specificity screening are saved as specific probes, and probes that fail are not saved. Any one of the saved specific probes may be selected for the experiment. If no specific probe is produced in a given round, the probe is redesigned again until a specific probe is produced. Because existing probe design software, when designing probes for a given sequence, randomly outputs the requested number of probes from among all probes that meet the parameter requirements, redesign can yield different probe sequences even when the same input sequence and design parameters are used.
Further, the length of the specific probe for mRNA, circular RNA or long non-coding RNA is 50-70nt.
For example, 55nt, 60nt, 65nt, 70nt may be used.
Furthermore, two or more of the mRNA-specific probes, circular RNA-specific probes and long non-coding RNA-specific probes designed by this method are integrated into a gene chip, and the gene chip can simultaneously detect any two or all three of mRNA, circular RNA and long non-coding RNA.
In a preferred embodiment, as shown in FIG. 6, a system for designing long-chain RNA-specific probes is provided, which can be used to design specific probes for simultaneous detection of at least two types of long-chain RNA among mRNA, circular RNA and long non-coding RNA.
The system comprises:
a design module 1 for designing probes of a target gene as candidate probes according to preset probe types, the preset probe types being at least two of an mRNA probe, a circular RNA probe and a long non-coding RNA probe;
an alignment module 2 for aligning the candidate probe sequences with the full-length target sequences of all probes;
a screening module 3 for determining whether the comparison results of the candidate probes against the full-length target sequences of all probes meet the preset values; if so, the candidate probe is retained as a specific probe; if not, the candidate probe is eliminated;
an iteration module 4 for determining whether specific probes have been retained for each target gene; if a target gene has no retained specific probe, the probes of that target gene are redesigned as candidate probes according to the preset probe types, and the alignment module 2 is executed again.
If a target gene has retained specific probes but the retained specific probes cover only some of the preset probe types, probes of the remaining preset probe types are redesigned for that target gene as candidate probes, and the alignment module 2 is executed again, until the specific probes retained for the target gene include probes of all the preset types.
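The interplay of the design, alignment, screening and iteration modules can be sketched as a loop. This is a simplified illustration: `design_candidates` and `is_specific` are hypothetical callables standing in for the design module and for the combined alignment and screening modules, and the `max_rounds` cap is an added safeguard that the description does not mention.

```python
from typing import Callable, Dict, Iterable, List


def design_specific_probes(
    genes: List[str],
    probe_types: List[str],
    design_candidates: Callable[[str, str], Iterable[str]],
    is_specific: Callable[[str], bool],
    max_rounds: int = 10,
) -> Dict[str, Dict[str, str]]:
    """Iterate design -> screening until every gene has one retained
    specific probe for each preset probe type (or max_rounds is hit)."""
    retained: Dict[str, Dict[str, str]] = {gene: {} for gene in genes}
    for _ in range(max_rounds):
        # Iteration module: find gene/type pairs still lacking a probe.
        missing = [
            (gene, ptype)
            for gene in genes
            for ptype in probe_types
            if ptype not in retained[gene]
        ]
        if not missing:
            break
        for gene, ptype in missing:
            # Design module: request fresh candidates for this pair only.
            for candidate in design_candidates(gene, ptype):
                # Alignment + screening modules: keep the first passer.
                if is_specific(candidate):
                    retained[gene][ptype] = candidate
                    break
    return retained
```

Only the missing gene/type pairs are redesigned in each round, which matches the rule that already retained specific probes are kept while the remaining preset types are designed.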
Specifically, in the design module 1, the preset probe types may be two types of mRNA probes and circular RNA probes, may be two types of mRNA probes and long-chain non-coding RNA probes, may be two types of circular RNA probes and long-chain non-coding RNA probes, and may be three types of mRNA probes, circular RNA probes and long-chain non-coding RNA probes.
Further, the sequence used to design the mRNA probe is selected from the longest transcript sequence of the mRNA of each target gene.
Further, a fragment whose binding site lies at the 3' end of the longest transcript of the target gene is selected as the mRNA candidate probe of that target gene.
Further, the sequence used to design the circular RNA probe is selected from fragments of the reverse splice sequence of the circular RNA.
Further, a fragment whose binding site on the reverse splice sequence of the target gene is located at the reverse splice site is selected as the circular RNA candidate probe of that target gene.
Further, the sequence used to design an antisense long non-coding RNA probe or a sense (synonymous) long non-coding RNA probe is selected from fragments that bind to the region of the antisense or sense long non-coding RNA of the target gene that does not overlap with the mRNA;
the sequence used to design a probe for an intronic long non-coding RNA, an intergenic long non-coding RNA, a bidirectional long non-coding RNA or an enhancer long non-coding RNA is selected from fragments that bind to the longest long non-coding RNA sequence of each target gene.
Specifically, in the screening module 3:
if the similarity between a candidate probe and the full-length target sequences of all probes does not exceed a first preset value, and the length of consecutive identical bases between the candidate probe and the full-length target sequences of all probes does not exceed a second preset value, the comparison result of the candidate probe is determined to meet the preset values;
if the similarity between a candidate probe and at least one of the full-length target sequences of all probes exceeds the first preset value, or the length of consecutive identical bases between the candidate probe and at least one of the full-length target sequences of all probes exceeds the second preset value, the comparison result of the candidate probe is determined not to meet the preset values.
Further, the first preset value and the second preset value may vary with the alignment program used. At the initial design stage it must be verified that the preset values are set reasonably, so as to ensure the specificity of the probes selected under them. For example, the first preset value may be not more than 75%, and the second preset value may be not more than 15.
The candidate probes pass through the screening module and the iteration module in order to increase the specificity between the probes and the target RNAs. The specific probes obtained after screening and iteration ensure that the detection probes for circular RNA, long non-coding RNA and mRNA have high specificity and sensitivity.
It should be understood that the division of the above apparatus into modules is merely a division of logical functions; in an actual implementation the modules may be fully or partially integrated into one physical entity, or may be physically separate. The modules may all be implemented as software invoked by a processing element, or all in hardware; alternatively, some modules may be implemented as software invoked by a processing element and the others in hardware. For example, the x module may be a separately arranged processing element, may be integrated in a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code whose function is called and executed by a processing element of the apparatus. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit having signal processing capability. In implementation, each step of the above method, or each of the above modules, may be carried out by an integrated logic circuit of hardware in the processing element or by instructions in software form.
Yet another embodiment of the present invention provides a storage medium having stored thereon a computer program which, when executed by a computer, implements a method of designing a specific probe for simultaneous detection of two or more long-chain RNAs.
Further, the storage medium includes: various media capable of storing program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.
As shown in FIG. 8, a further embodiment of the present invention provides a service terminal including a processor 5 and a memory 6; the memory 6 is used for storing a computer program, and the processor 5 is used for executing the computer program stored in the memory 6, so that the service terminal, when executing the computer program, implements the method of designing a specific probe for simultaneous detection of two or more long-chain RNAs.
The memory 6 is used for storing a computer program. Preferably, the memory 6 comprises: various media capable of storing program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.
The processor 5 is connected to the memory 6 for executing the computer program stored in the memory 6, so that the service terminal executes the design method described above.
Preferably, the processor 5 may be a general-purpose processor, including a central processing unit (Central Processing Unit, abbreviated as CPU), a network processor (Network Processor, abbreviated as NP), etc.; but also digital signal processors (Digital Signal Processor, DSP for short), application specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), field programmable gate arrays (Field Programmable Gate Array, FPGA for short) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
Taking the ESR2 gene as an example, specific probes are designed for simultaneously detecting the mRNA, long non-coding RNA and circular RNA of the ESR2 gene; that is, the three preset probe types are mRNA-specific probes, long non-coding RNA-specific probes and circular RNA-specific probes.
S100: The sequence of the ESR2 gene (NM_001214902) is found in GenBank, and the longest transcript of the ESR2 gene (accession number NM_001214902) is found in the RefSeq database. A probe for detecting the mRNA is designed from this sequence as a candidate probe, with the following sequence:
ATAAAAGAGTTTTGGGAATACACTGAGCTTTGAGTGAAAGAAGCTGCAGTGGCCTCCCTG(SEQ ID NO:1)
The long non-coding RNA ENST00000359491, which is transcribed in the same direction as the ESR2 gene and partially overlaps its exons, is found in the Ensembl database, and a probe for it is designed as a candidate probe, with the following sequence:
ATACCTGAGCAAGTGAAATTAAGAAGGGAATTGAAGCAAATATTCCTGACATCCAAGTGG(SEQ ID NO:2)
The circular RNA hsa_circ_0102409 derived from the ESR2 gene is found in the circBase database; it is formed by head-to-tail reverse splicing of exon 7 through exon 12 of the ESR2 gene. Based on the 5' end sequence (SEQ ID NO: 3) and the 3' end sequence (SEQ ID NO: 4) of the circular RNA, a probe covering the splice site (SEQ ID NO: 5) is designed as a candidate probe. The sequences are as follows:
GGATGAGGGGAAATGCGTAGAAGGAATTCTGGAAATCTTTGACATGCTCCTGGCAACTACTTCAAGGTTTCGAGAGTTAAAACTCCAACACAAAGAATATCTCTGTGTCAAGGCCATGATCCTGCTCAATTCCA(SEQ ID NO:3)
CCATTATACTTGCCCACGAATCTTTGAGAACATTATAATGACCTTTGTGCCTCTTCTTGCAAGGTGTTTTCTCAGCTGTTATCTCAAGACATGGATATAAAAAACTCACCATCTAGCCTTAATTCTCCTTCCTCCTACAACTGCAGTCAATCCATCTTACCCCTGGAGCACGGCTCCATATACATACCTTCCTCCTATGTAGACAGCCACCATGAATATCCAGCCATGACATTCTATAGCCCTGCTGTGATGAATTACAGCATTCCCAGCAATGTCACTAACTTGGAAGGTGGGCCTGGTCGGCAGACCACAAGCCCAAATGTGTTGTGGCCAACACCTGGGCACCTTTCTCCTTTAGTGGTCCATCGCCAGTTATCACATCTGTATGCGGAACCTCAAAAGAGTCCCTGGTGTGAAGCAAGATCGCTAGAACACACCTTACCTGTAAACAG(SEQ ID NO:4)
GCCATGATCCTGCTCAATTCCACCATTATACTTGCCCACGAATCTTTGAGAACATTATAA(SEQ ID NO:5)
This probe can specifically detect the circular RNA hsa_circ_0102409.
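The construction of the splice-site probe can be illustrated with the sequences listed above. Comparing them shows that SEQ ID NO: 5 consists of the last 22 nt of SEQ ID NO: 3 followed by the first 38 nt of SEQ ID NO: 4. The sketch below reproduces that join; the function name and the `upstream_nt` parameter are hypothetical, and only the end of SEQ ID NO: 3 and the start of SEQ ID NO: 4 are copied in.

```python
def junction_probe(
    upstream: str, downstream: str, length: int = 60, upstream_nt: int = 22
) -> str:
    """Build a probe spanning a splice junction by joining the tail of the
    upstream sequence to the head of the downstream sequence."""
    if not 0 < upstream_nt < length:
        raise ValueError("the probe must cover bases on both sides of the junction")
    downstream_nt = length - upstream_nt
    if len(upstream) < upstream_nt or len(downstream) < downstream_nt:
        raise ValueError("flanking sequences are too short for the requested probe")
    return (upstream[-upstream_nt:] + downstream[:downstream_nt]).upper()


# Last 25 nt of SEQ ID NO: 3 and first 40 nt of SEQ ID NO: 4, from the listing.
SEQ3_TAIL = "AAGGCCATGATCCTGCTCAATTCCA"
SEQ4_HEAD = "CCATTATACTTGCCCACGAATCTTTGAGAACATTATAATG"

probe = junction_probe(SEQ3_TAIL, SEQ4_HEAD)
print(probe)  # matches SEQ ID NO: 5
```

Because the probe contains bases from both sides of the splice site, it matches the contiguous sequence only where the two ends are joined, which is the property the description relies on for detecting the circular RNA.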
S200: the candidate probes are aligned with the full-length target sequences of all probes.
S300: the comparison results of all candidate probes meet the preset values, so all candidate probes are retained as specific probes.
S400: the specific probes comprise all preset probe types, and the design of the specific probes of the gene is completed.
The specific probe design method for detecting mRNA, circular RNA or long non-coding RNA of other target genes in the gene chip is the same as ESR2 gene.
Verifying specificity and sensitivity of specific probes and synthesizing chip
In order to verify the sensitivity and specificity of the probes after screening and iterative detection, quantitative PCR verification was performed on the THBS1 gene, the intron-derived long non-coding RNA ENST00000478845, and the circular RNA hsa_circ_0034426 formed by circularization of exon 2 through exon 7. As shown in FIG. 7, the fold changes of these three long-chain RNAs of the THBS1 gene relative to the control group show the same trend of expression change on the chip and in quantitative PCR.
The quantitative PCR experiment steps are as follows:
(I) First-strand cDNA synthesis
1. The RNA was taken out of the -80 ℃ freezer and thawed at 4 ℃, and the reaction system was then prepared in a 0.2 ml PCR tube as follows:
2. The PCR tube was incubated at 37 ℃ for 15 min, denatured at 98 ℃ for 5 min, and then held at 4 ℃.
(II) SYBR Green qPCR
1. The reaction mixture (for a 384-well plate) was prepared in a 1.5 mL centrifuge tube:
2. The reactions were run on the PCR instrument: incubation at 50 ℃ for 2 min, then at 95 ℃ for 10 min, followed by 40 cycles of 95 ℃ for 15 seconds and 60 ℃ for 1 min, and finally a melting curve stage.
(III) The primer sequences are as follows:
THBS1
An upstream primer: GAACGGGACAACTGCCAGTA (SEQ ID NO: 6)
A downstream primer: ACCTACAGCGAGTCCAGGAT (SEQ ID NO: 7)
ENST00000478845
An upstream primer: TCGCGCATTCTTGGAAGTCT (SEQ ID NO: 8)
A downstream primer: TGCCAGAGGGTGAAAAGCAA (SEQ ID NO: 9)
hsa_circ_0034426
An upstream primer: CTGCAAAAAGGTGTCCTGCC (SEQ ID NO: 10)
A downstream primer: TCAGGAACAGGACGCCTAGT (SEQ ID NO: 11)
After verification, Agilent was commissioned to manufacture, under strict quality control conditions, the customized high-throughput chip for detecting long-chain RNA gene expression abundance using inkjet-printing in situ chemical synthesis technology.
Detection of expression abundance of long-chain RNA
The experimental operation comprises the following specific steps:
1. Extracting and purifying total RNA of sample
Total RNA was extracted from the sample with Trizol and then purified with a QIAGEN RNeasy kit (cat. no. 74106). The detailed procedure is as follows (see the RNeasy Mini Protocol):
1) μg of total RNA was dissolved in 100 μl of RNase-free water, and 350 μl of buffer RLT was added and thoroughly mixed.
2) 250 μl of absolute ethanol was added and mixed well by pipetting.
3) The total of 700 μl of solution containing the total RNA was transferred to an RNeasy mini column seated in a 2 ml collection tube and centrifuged at 13200 rpm for 15 seconds, and the filtrate was discarded.
4) μl of buffer RW1 was pipetted into the RNeasy mini column and centrifuged at 13200 rpm for 15 seconds, and the filtrate was discarded.
5) μl of DNase I was added to 70 μl of buffer RDD, mixed well, added to the column and left at room temperature for 15 min.
6) μl of buffer RW1 was pipetted into the RNeasy mini column and centrifuged at 13200 rpm for 15 seconds, and the filtrate was discarded.
7) μl of buffer RPE was pipetted into the RNeasy mini column and centrifuged at 13200 rpm for 15 seconds, the filtrate was discarded, and the step was repeated once.
8) The collection tube was replaced with a new one and the column was centrifuged at 13200 rpm for 2 min, and the column was then transferred into an elution tube.
9) The RNeasy mini column was transferred into a collection tube.
10) 30 μl of RNase-free water was added, and the column was left to stand for 1 min and then centrifuged at 13200 rpm for 1 min.
11) The 30 μl of sample in the elution tube was returned to the column, left to stand for 1 min and centrifuged at 13200 rpm for 1 min.
12) The RNA concentration and A260/280 were measured with a NanoDrop (NanoDrop ND-1000 UV-VIS spectrophotometer).
2. Linear amplification of RNA and Cy3 fluorescent labeling
1) The one-color spike-in (RNA Spike-In Kit, One-Color, Agilent 5188-5282) was prepared. The spike-in was diluted with dilution buffer according to the RNA starting amount, as shown in Table 1:
TABLE 1 RNA spike-in
2) Reverse transcription: a reaction solution with the composition shown in Table 2 was prepared:
TABLE 2 Composition of the reverse transcription reaction solution
Component                          Volume
Total RNA (10-200 ng)              1.5 μl
Diluted one-color spike-in         2.0 μl
T7 Promoter Primer                 0.8 μl
Nuclease-free water (white cap)    1.0 μl
Total volume                       5.3 μl
The reaction solution was incubated at 65 ℃ for 10 min on a PCR instrument (MJ PTC-100) and then placed on ice for 5 min. Meanwhile, the 5X First Strand Buffer was preheated at 80 ℃ for 3 min and kept at room temperature until use. A reverse transcription mixed solution was prepared with the composition shown in Table 3:
TABLE 3 Composition of the reverse transcription mixed solution
Component                          Volume
5X First Strand Buffer             2.0 μl
0.1 M DTT                          1.0 μl
10 mM dNTP mix                     0.5 μl
AffinityScript RNase Block Mix     1.2 μl
Total volume                       4.7 μl
The 4.7 μl of reverse transcription mixed solution was added to the denatured, ice-cooled RNA, mixed, briefly centrifuged and incubated on the PCR instrument under the following conditions: 40 ℃ for 2 hours; 70 ℃ for 15 minutes (inactivation); 4 ℃ for 5 minutes.
3) Fluorescent labeling
A fluorescent labeling mixed solution (Low Input Quick Amp Labeling Kit, one-Color, agilent 5190-2305) was prepared, and the specific compositions are shown in Table 4:
TABLE 4 Composition of the fluorescent labeling mixed solution
Component                          Volume
Nuclease-free water                0.75 μl
5X Transcription Buffer            3.2 μl
0.1 M DTT                          0.6 μl
NTP mix                            1.0 μl
T7 RNA polymerase mixture          0.21 μl
Cy3-CTP                            0.24 μl
Total volume                       6.0 μl
The 6.0 μl of fluorescent labeling mixed solution was added, mixed, briefly centrifuged and incubated on the PCR instrument to obtain the fluorescent-labeled product. Reaction conditions: 40 ℃ for 2 hours; 4 ℃ for 5 minutes.
4) Fluorescent-labeled product purification
A) 84 μl of nuclease-free water was added to bring the total volume to 100 μl.
B) 350 μl of buffer RLT was added and mixed well.
C) 250 μl of absolute ethanol was added and mixed well, without centrifugation.
D) The 700 μl of mixture was transferred to the column and centrifuged at 13000 rpm, 4 ℃ for 30 seconds. The flow-through was discarded.
E) 500 μl of buffer RPE was added and the column was centrifuged at 13000 rpm, 4 ℃ for 30 seconds. The flow-through was discarded.
F) Another 500 μl of buffer RPE was added and the column was centrifuged at 13000 rpm, 4 ℃ for 60 seconds. The flow-through was discarded.
G) The collection tube was replaced with a new one, the empty column was centrifuged at 13000 rpm, 4 ℃ for 30 seconds, and the column was transferred to an elution tube.
H) 30 μl of nuclease-free water was added, and the column was left to stand for 1 min and centrifuged at 13000 rpm, 4 ℃ for 30 seconds.
I) The 30 μl of sample in the elution tube was returned to the column, left to stand for 1 min and centrifuged at 13000 rpm, 4 ℃ for 30 seconds.
J) The RNA concentration, Cy3 concentration and A260/280 were measured with a NanoDrop.
The requirements that the purified fluorescent-labeled product must meet for each chip format are shown in Table 5:
TABLE 5 Requirements for the purified fluorescent-labeled product
Chip format    cRNA yield     Cy3 incorporation
1x chip        > 5 μg         > 6 pmol/μg
2x chip        > 3.75 μg      > 6 pmol/μg
4x chip        > 1.65 μg      > 6 pmol/μg
8x chip        > 0.825 μg     > 6 pmol/μg
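The requirements in Table 5 can be checked from the NanoDrop readings with a short calculation. The sketch below assumes a 30 μl elution volume (as in the purification steps), computes yield as concentration × volume and Cy3 incorporation as pmol of Cy3 per μg of cRNA, and reads the last row of Table 5 as the 8x format used in Tables 6 to 8; that reading and all function names are assumptions.

```python
# Minimum cRNA yield (μg) per chip format, from Table 5.
MIN_YIELD_UG = {"1x": 5.0, "2x": 3.75, "4x": 1.65, "8x": 0.825}
# Minimum Cy3 incorporation (pmol Cy3 per μg cRNA), from Table 5.
MIN_CY3_PMOL_PER_UG = 6.0


def crna_yield_ug(crna_ng_per_ul: float, elution_volume_ul: float = 30.0) -> float:
    """Total cRNA yield in μg from concentration and elution volume."""
    return crna_ng_per_ul * elution_volume_ul / 1000.0


def cy3_incorporation(cy3_pmol_per_ul: float, crna_ng_per_ul: float) -> float:
    """Cy3 incorporation in pmol of Cy3 per μg of cRNA."""
    return cy3_pmol_per_ul / crna_ng_per_ul * 1000.0


def passes_labeling_qc(
    chip_format: str,
    crna_ng_per_ul: float,
    cy3_pmol_per_ul: float,
    elution_volume_ul: float = 30.0,
) -> bool:
    """True if both the yield and the Cy3 incorporation exceed Table 5."""
    enough_yield = (
        crna_yield_ug(crna_ng_per_ul, elution_volume_ul) > MIN_YIELD_UG[chip_format]
    )
    enough_dye = (
        cy3_incorporation(cy3_pmol_per_ul, crna_ng_per_ul) > MIN_CY3_PMOL_PER_UG
    )
    return enough_yield and enough_dye
```

For example, a reading of 200 ng/μl cRNA and 2.0 pmol/μl Cy3 corresponds to 6 μg of cRNA at 10 pmol/μg, which satisfies every format in the table.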
3. Gene chip hybridization
The purified fluorescent-labeled product is hybridized with the probes on the circular RNA gene chip by complementary base pairing. The hybridization kit is the Gene Expression Hybridization Kit (Agilent 5188-5242). The specific steps are as follows:
1) The fragmentation mixed solution was prepared as shown in Table 6:
TABLE 6 Composition of the fragmentation mixed solution
Component                    1x           2x           4x            8x
Cy3-cRNA                     5 μg         3.75 μg      1.65 μg       600 ng
10X blocker                  50 μL        25 μL        11 μL         5 μL
Nuclease-free water          to 240 μL    to 120 μL    to 52.8 μL    to 24 μL
25X fragmentation buffer     10 μL        5 μL         2.2 μL        1 μL
Total volume                 250 μL       125 μL       55 μL         25 μL
2) The mixed solution was incubated at 60 ℃ for 30 min, cooled on ice for 1 min and briefly centrifuged.
3) An equal volume of 2X GEx Hybridization Buffer HI-RPM was added and mixed, as shown in Table 7.
TABLE 7 Composition of the hybridization mixed solution
Component                                 1x        2x        4x       8x
cRNA from the fragmentation mix           250 μL    125 μL    55 μL    25 μL
2X GEx Hybridization Buffer HI-RPM        250 μL    125 μL    55 μL    25 μL
4) The mixture was centrifuged at 13000 rpm for 1 min and then placed on ice.
5) The hybridization chamber (Agilent G2534A) was placed on a level benchtop, a gasket cover slide was placed in it, and the samples were loaded at the volumes shown in Table 8:
TABLE 8 Hybridization loading volumes
Component              1x        2x        4x        8x
Preparation volume     500 μL    250 μL    110 μL    50 μL
Hybridization volume   490 μL    240 μL    100 μL    40 μL
6) The gene chip was placed onto the gasket cover slide with the "Agilent" side facing down, the hybridization chamber was quickly assembled, and hybridization was carried out in a hybridization oven (Agilent G2545A) at 65 ℃ and 10 rpm for 17 h.
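The volumes in Tables 6 and 7 can be worked out for a given cRNA concentration as sketched below. The cRNA volume is the required mass divided by the measured concentration, and water makes up the difference to the fill volume before the fragmentation buffer is added. The dictionary layout and function name are hypothetical; the numbers are those of Table 6.

```python
# Per-format values from Table 6 (mass in ng, volumes in μL).
FRAG_MIX = {
    "1x": {"crna_ng": 5000.0, "blocker_ul": 50.0, "fill_to_ul": 240.0, "frag_buffer_ul": 10.0},
    "2x": {"crna_ng": 3750.0, "blocker_ul": 25.0, "fill_to_ul": 120.0, "frag_buffer_ul": 5.0},
    "4x": {"crna_ng": 1650.0, "blocker_ul": 11.0, "fill_to_ul": 52.8, "frag_buffer_ul": 2.2},
    "8x": {"crna_ng": 600.0, "blocker_ul": 5.0, "fill_to_ul": 24.0, "frag_buffer_ul": 1.0},
}


def plan_hybridization(chip_format: str, crna_ng_per_ul: float) -> dict:
    """Pipetting volumes (μL) for the fragmentation and hybridization mixes."""
    spec = FRAG_MIX[chip_format]
    crna_ul = spec["crna_ng"] / crna_ng_per_ul
    water_ul = spec["fill_to_ul"] - spec["blocker_ul"] - crna_ul
    if water_ul < 0:
        raise ValueError("cRNA is too dilute to fit in the fragmentation mix")
    frag_total_ul = spec["fill_to_ul"] + spec["frag_buffer_ul"]
    return {
        "crna_ul": round(crna_ul, 2),
        "blocker_ul": spec["blocker_ul"],
        "water_ul": round(water_ul, 2),
        "frag_buffer_ul": spec["frag_buffer_ul"],
        "fragmentation_total_ul": round(frag_total_ul, 2),
        # Table 7: an equal volume of 2X hybridization buffer is added.
        "hyb_buffer_ul": round(frag_total_ul, 2),
    }
```

The explicit error for a negative water volume flags samples that would need concentrating before they can be loaded at the stated mass.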
4. Gene chip washing and scanning
1) 2 ml of 10% Triton X-102 was added to each of GE Wash Buffer 1 and GE Wash Buffer 2, and GE Wash Buffer 2 was preheated overnight at 37 ℃.
2) The gene chip that had completed hybridization was taken out of the hybridization oven, the hybridization chamber was disassembled, and the gene chip was washed according to steps 1 to 3 in Table 9:
TABLE 9 Gene chip washing procedure
Step                           Wash solution       Temperature         Washing time
Disassembly                    GE Wash Buffer 1    Room temperature    -
Wash with GE Wash Buffer 1     GE Wash Buffer 1    Room temperature    1 min
Wash with GE Wash Buffer 2     GE Wash Buffer 2    37 ℃                1 min
GE Wash Buffer 1 and GE Wash Buffer 2 are from the Gene Expression Wash Buffer Kit (Agilent, cat. no. 5188-5327).
3) The washed gene chip was loaded into a slide holder and scanned with a scanner (Agilent Microarray Scanner G2565CA). The scan parameters are shown in Table 10:
TABLE 10 Gene chip scan parameters
5. Data analysis
After the raw data were normalized with the limma package in R, the expression abundance of the long-chain RNAs and the differentially expressed RNAs were analyzed using the fold change (expression difference multiple) and Student's t-test.
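The fold-change and t-test step can be illustrated with a minimal sketch. It is not the limma workflow used in the invention: it assumes the inputs are already normalized log2 intensities for one RNA, uses the pooled-variance form of Student's t statistic, and applies the 2-fold threshold mentioned in the results. The function names are hypothetical, and no p-value is computed.

```python
import math
from statistics import mean, variance
from typing import Sequence


def log2_fold_change(experiment: Sequence[float], control: Sequence[float]) -> float:
    """Difference of mean log2 intensities (experiment minus control)."""
    return mean(experiment) - mean(control)


def student_t_statistic(experiment: Sequence[float], control: Sequence[float]) -> float:
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(experiment), len(control)
    pooled = ((n1 - 1) * variance(experiment) + (n2 - 1) * variance(control)) / (
        n1 + n2 - 2
    )
    return (mean(experiment) - mean(control)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


def classify(
    experiment: Sequence[float], control: Sequence[float], fold: float = 2.0
) -> str:
    """Label an RNA as 'up', 'down' or 'unchanged' at the given fold threshold."""
    lfc = log2_fold_change(experiment, control)
    threshold = math.log2(fold)
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"
```

Working on the log2 scale makes up- and down-regulation symmetric, so a 2-fold change in either direction corresponds to an absolute log2 difference of 1.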
The abundance of expression of the detected mRNA, long non-coding RNA, and circular RNA in the experimental and control groups is shown in FIGS. 9 and 10.
Taking prostate cancer tumor tissue (experimental group) and paracancerous tissue (control group) as an example, gene chip screening and data analysis identified, relative to the control group, 328 mRNAs up-regulated by 2-fold and 892 mRNAs down-regulated by 2-fold, for a total of 1220 mRNAs with 2-fold differential expression. For long non-coding RNAs, 447 were up-regulated by 2-fold and 840 were down-regulated by 2-fold, for a total of 1287 long non-coding RNAs with 2-fold differential expression. For circular RNAs, 508 were up-regulated by 2-fold and 1706 were down-regulated by 2-fold, for a total of 2214 circular RNAs with 2-fold differential expression.
In summary, the present invention effectively overcomes the disadvantages of the prior art and has high industrial utility value.
The above embodiments merely illustrate the principles of the present invention and its effects, and are not intended to limit the invention. Those skilled in the art may modify or vary the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications and variations made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the present invention.
Sequence listing
<110> Shanghai biochip Co., ltd
<120> method for designing long-chain RNA-specific probe
<160> 11
<170> SIPOSequenceListing 1.0
<210> 1
<211> 60
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 1
ataaaagagt tttgggaata cactgagctt tgagtgaaag aagctgcagt ggcctccctg 60
<210> 2
<211> 60
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 2
atacctgagc aagtgaaatt aagaagggaa ttgaagcaaa tattcctgac atccaagtgg 60
<210> 3
<211> 134
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 3
ggatgagggg aaatgcgtag aaggaattct ggaaatcttt gacatgctcc tggcaactac 60
ttcaaggttt cgagagttaa aactccaaca caaagaatat ctctgtgtca aggccatgat 120
cctgctcaat tcca 134
<210> 4
<211> 452
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 4
ccattatact tgcccacgaa tctttgagaa cattataatg acctttgtgc ctcttcttgc 60
aaggtgtttt ctcagctgtt atctcaagac atggatataa aaaactcacc atctagcctt 120
aattctcctt cctcctacaa ctgcagtcaa tccatcttac ccctggagca cggctccata 180
tacatacctt cctcctatgt agacagccac catgaatatc cagccatgac attctatagc 240
cctgctgtga tgaattacag cattcccagc aatgtcacta acttggaagg tgggcctggt 300
cggcagacca caagcccaaa tgtgttgtgg ccaacacctg ggcacctttc tcctttagtg 360
gtccatcgcc agttatcaca tctgtatgcg gaacctcaaa agagtccctg gtgtgaagca 420
agatcgctag aacacacctt acctgtaaac ag 452
<210> 5
<211> 60
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 5
gccatgatcc tgctcaattc caccattata cttgcccacg aatctttgag aacattataa 60
<210> 6
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 6
gaacgggaca actgccagta 20
<210> 7
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 7
acctacagcg agtccaggat 20
<210> 8
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 8
tcgcgcattc ttggaagtct 20
<210> 9
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 9
tgccagaggg tgaaaagcaa 20
<210> 10
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 10
ctgcaaaaag gtgtcctgcc 20
<210> 11
<211> 20
<212> DNA
<213> Artificial sequence (Artificial Sequence)
<400> 11
tcaggaacag gacgcctagt 20

Claims (14)

1. A method of designing a specific probe for simultaneous detection of two or more long-chain RNAs, comprising the steps of:
S100, designing long-chain RNA probes of a target gene as candidate probes according to preset probe types, wherein the preset probe types are at least two of an mRNA probe, a circular RNA probe and a long non-coding RNA probe;
s200, comparing the candidate probe sequences with the full-length target sequences of all probes;
s300, if the comparison result of a candidate probe accords with a preset value, reserving the candidate probe as a specific probe;
If the comparison result of the candidate probe does not accord with the preset value, eliminating the candidate probe;
the coincidence preset value is: the similarity of the candidate probes and the full-length target sequences of all probes does not exceed a first preset value, and the base length which is continuously the same as that of the full-length target sequences of all probes does not exceed a second preset value;
the non-compliance with the preset value is: the similarity of the candidate probe and at least one target sequence in the full-length target sequence of the probe exceeds a first preset value, or the base length continuously identical with at least one target sequence in the full-length target sequence of the probe exceeds a second preset value;
s400, if a target gene has no reserved specific probe, redesigning a long-chain RNA probe of the target gene as a candidate probe according to a preset probe type, and continuing to execute S200;
if the specific probe reserved by the target gene only comprises part of the types of the preset probe types, the other preset probe types of the target gene are redesigned to be used as candidate probes, and the step S200 is continuously executed until all the specific probes reserved by the target gene comprise the probes of the preset types.
2. The method of claim 1, wherein the sequence used to design the mRNA probe in S100 is selected from the group consisting of the mRNA longest transcript sequences of each gene of interest.
3. The method of claim 1, wherein the sequence used to design the circular RNA probe in S100 is selected from the group consisting of fragments of the reverse splice sequence of circular RNA.
4. A method according to claim 3, wherein the circular RNA probe in S100 is selected from circular RNA probes having a binding site for the reverse splice sequence of the target gene at the reverse splice site.
5. The method of claim 1, wherein the sequence used in S100 to design the antisense long non-coding RNA probe or the sense (synonymous) long non-coding RNA probe is selected from: fragments of the region of the antisense long non-coding RNA or sense long non-coding RNA that does not overlap with the mRNA; and the sequence used to design the intronic long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA or enhancer long non-coding RNA probe is selected from: fragments of the longest long non-coding RNA of each target gene.
6. A system for designing long-chain RNA-specific probes, characterized in that the system is used for designing specific probes for simultaneous detection of at least two long-chain RNAs in mRNA, circular RNA or long-chain non-coding RNA by the method of any one of claims 1-5.
7. The system of claim 6, wherein the system comprises:
a design module for designing probes of a target gene as candidate probes according to preset probe types, wherein the preset probe types are at least two of an mRNA probe, a circular RNA probe and a long non-coding RNA probe;
an alignment module for aligning the candidate probe sequences with the full-length target sequences of all probes;
the screening module is used for judging whether the comparison result of the candidate probes and the full-length target sequences of all probes accords with a preset value or not, if so,
retaining the candidate probe as a specific probe; if not, eliminating the candidate probe;
the iteration module is used for judging whether the specific probes reserved for each target gene exist or not,
if a target gene has no reserved specific probe, redesigning the probe of the target gene as a candidate probe according to a preset probe type, and continuously executing the comparison module (2);
if a target gene has reserved specific probes, and the reserved specific probes only comprise part of the preset probe types, redesigning the rest preset probe types of the target gene as candidate probes, and continuing to execute the comparison module (2) until all the specific probes reserved for the target gene comprise the probes of the preset types.
8. The system of claim 6, wherein the sequence used to design the mRNA probe is selected from the group consisting of the mRNA longest transcript sequence of each gene of interest.
9. The system of claim 6, wherein the sequence used to design the circular RNA probe is selected from: fragments of the back-splice sequence of the circular RNA.
10. The system of claim 9, wherein the circular RNA probe is selected from circular RNA probes whose binding site on the back-splice sequence of the target gene is located at the back-splice site.
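Claims 9 and 10 restrict circular RNA probes to binding sites at the back-splice site. One way to picture this is to enumerate probe-length windows that straddle the junction formed when the 3' end of the spliced circular RNA sequence is joined to its 5' end. The sketch below rests on assumptions not found in the claims: the function names, the default probe length of 60, the minimum flank of 20 bases on each side of the junction, and the use of a DNA alphabet are all illustrative.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_windows(circ_seq: str, probe_len: int = 60, min_flank: int = 20) -> List[str]:
    """Return target windows of probe_len bases that span the back-splice junction.

    circ_seq is the linear 5'->3' sequence of the circular RNA; the junction
    joins its last base to its first base. Each window keeps at least
    min_flank bases on both sides of the junction.
    """
    if min_flank < 1 or 2 * min_flank > probe_len:
        raise ValueError("min_flank must satisfy 1 <= min_flank <= probe_len / 2")
    if probe_len > len(circ_seq):
        raise ValueError("probe_len exceeds the circular RNA length")
    windows = []
    for left in range(min_flank, probe_len - min_flank + 1):
        right = probe_len - left
        # 'left' bases upstream of the junction + 'right' bases downstream of it
        windows.append(circ_seq[-left:] + circ_seq[:right])
    return windows
```

A candidate probe for a window would then be its `reverse_complement`, which is subsequently passed to alignment and screening like any other candidate.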
11. The system of claim 6, wherein the sequence used to design a probe for an antisense long non-coding RNA or a sense long non-coding RNA is selected from: fragments of the regions of the antisense or sense long non-coding RNA that do not overlap with mRNA; and the sequence used to design a long non-coding RNA probe for an intronic, intergenic, bidirectional, or enhancer long non-coding RNA is selected from: fragments of the longest long non-coding RNA of each target gene.
12. The system of claim 6, wherein, in the screening module,
if the similarity between a candidate probe and each of the full-length target sequences of all probes does not exceed a first preset value, and the length of consecutive identical bases between the candidate probe and each of those target sequences does not exceed a second preset value, the alignment result between the candidate probe and the full-length target sequences of the probes is determined to meet the preset value;
and if the similarity between a candidate probe and at least one of the full-length target sequences of all probes exceeds the first preset value, or the length of consecutive identical bases between the candidate probe and at least one of those target sequences exceeds the second preset value, the alignment result between the candidate probe and the full-length target sequences of the probes is determined not to meet the preset value.
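The two-threshold screening rule of claim 12 can be illustrated with a small pure-Python sketch. Several points are assumptions rather than features of the claims: similarity is approximated here by the best ungapped identity over all placements of the probe on a target, whereas a real pipeline would more likely take these figures from an alignment tool; all sequences are assumed to be written in the same orientation; the probe's own intended target is skipped; and the threshold values are left to the caller because the claims only call them preset values.

```python
from typing import Dict


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest stretch of consecutive identical bases shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for base_a in a:
        cur = [0] * (len(b) + 1)
        for j, base_b in enumerate(b, start=1):
            if base_a == base_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def best_ungapped_identity(probe: str, target: str) -> float:
    """Highest fraction of matching positions over all ungapped placements."""
    n = len(probe)
    if n == 0 or len(target) < n:
        return 0.0
    best = 0
    for offset in range(len(target) - n + 1):
        matches = sum(p == t for p, t in zip(probe, target[offset:offset + n]))
        best = max(best, matches)
    return best / n


def is_specific(probe: str, own_target_id: str, targets: Dict[str, str],
                max_similarity: float, max_run: int) -> bool:
    """Retain the probe only if no other target exceeds either preset value."""
    for target_id, seq in targets.items():
        if target_id == own_target_id:
            continue  # assumption: the intended target is not screened against
        if best_ungapped_identity(probe, seq) > max_similarity:
            return False
        if longest_shared_run(probe, seq) > max_run:
            return False
    return True
```

Exceeding either limit against a single off-target sequence is enough to eliminate the candidate, which mirrors the "or" condition in the second branch of the claim.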
13. A storage medium having a computer program stored thereon, which when executed by a computer performs the method of any of claims 1-5.
14. A service terminal, characterized in that the service terminal comprises a processor and a memory; the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, so as to cause the service terminal to perform the method according to any one of claims 1 to 5.
CN202010225368.3A 2020-03-26 2020-03-26 Design method of long-chain RNA specific probe Active CN111696627B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010225368.3A CN111696627B (en) 2020-03-26 2020-03-26 Design method of long-chain RNA specific probe

Publications (2)

Publication Number Publication Date
CN111696627A CN111696627A (en) 2020-09-22
CN111696627B CN111696627B (en) 2024-02-23

Family

ID=72476294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010225368.3A Active CN111696627B (en) 2020-03-26 2020-03-26 Design method of long-chain RNA specific probe

Country Status (1)

Country Link
CN (1) CN111696627B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083516B (en) * 2022-07-13 2023-03-21 北京先声医学检验实验室有限公司 Panel design and evaluation method for detecting gene fusion based on targeted RNA sequencing technology

Citations (6)

Publication number Priority date Publication date Assignee Title
US5639612A (en) * 1992-07-28 1997-06-17 Hitachi Chemical Company, Ltd. Method for detecting polynucleotides with immobilized polynucleotide probes identified based on Tm
CN105803101A (en) * 2016-05-20 2016-07-27 上海伯豪生物技术有限公司 Probe, gene chip and method for detecting expression abundance of circular RNA
CN106676109A (en) * 2016-12-08 2017-05-17 新疆医科大学第附属医院 ENST00000418539.1, preparation or diagnostic agent or medicine or kit, and application of ENST00000418539.1
WO2018001258A1 (en) * 2016-06-30 2018-01-04 厦门艾德生物医药科技股份有限公司 Probe for nucleic acid enrichment and capture, and design method thereof
CN108342390A (en) * 2018-02-13 2018-07-31 中国科学院苏州生物医学工程技术研究所 Long non-coding RNA for early diagnosis of human prostate cancer, and preparation and use thereof
CN109706269A (en) * 2019-02-06 2019-05-03 浙江农林大学 Multiplex ligation-dependent probe amplification identification kit capable of detecting multiple avian respiratory pathogens

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US20100297622A1 (en) * 2009-05-20 2010-11-25 Honghua Li Method for high-throughput gene expression profile analysis
US10900070B2 (en) * 2015-05-01 2021-01-26 The General Hospital Corporation Multiplex analysis of gene expression in individual living cells

Non-Patent Citations (2)

Title
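古再丽努尔・阿不都热依木; 买尔旦・马合木提; 韩静; 张萌萌; 马玉娇; 柳惠斌. Expression profile analysis and preliminary validation of long non-coding RNAs in clear cell renal cell carcinoma. Journal of Xinjiang Medical University (新疆医科大学学报). 2017, (06), full text. *
贾纯琰; 季小阳; 白雪; 戴豪扬; 王建蒙; 张文广. Regulatory mechanisms of long non-coding RNAs and methods for their prediction in livestock. China Animal Husbandry & Veterinary Medicine (中国畜牧兽医). 2017, (07), full text. *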

Also Published As

Publication number Publication date
CN111696627A (en) 2020-09-22

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant