WO2021227129A1 - Universal high-throughput sequencing adapter and application thereof - Google Patents

Universal high-throughput sequencing adapter and application thereof

Info

Publication number
WO2021227129A1
Authority
WO
WIPO (PCT)
Prior art keywords
throughput sequencing
sequence
stranded
sequencing adapter
strand
Prior art date
Application number
PCT/CN2020/092418
Other languages
English (en)
French (fr)
Inventor
曹彦东
周洋
扶媛媛
杨颖�
张丽婷
Original Assignee
北京安智因生物技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京安智因生物技术有限公司
Publication of WO2021227129A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6806 Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 Nucleic acid amplification reactions
    • C12Q1/6858 Allele-specific amplification
    • C CHEMISTRY; METALLURGY
    • C40 COMBINATORIAL TECHNOLOGY
    • C40B COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
    • C40B50/00 Methods of creating libraries, e.g. combinatorial synthesis
    • C40B50/06 Biochemical methods, e.g. using enzymes or whole viable microorganisms

Definitions

  • The invention relates to the field of gene sequencing, in particular to a universal high-throughput sequencing adapter and its application in sequencing library construction.
  • High-Throughput Sequencing high-throughput sequencing technology
  • NGS Next Generation Sequencing Technology
  • In Sanger sequencing, a short synthetic oligonucleotide primer hybridized to a single-stranded DNA template is extended by DNA polymerase to synthesize new DNA fragments, and the fragments are separated by polyacrylamide gel electrophoresis to read the DNA sequence.
  • NGS sequencing usually uses massively parallel sequencing (MPS), which can realize simultaneous sequencing of multiple samples and multiple sites, greatly improving sequencing throughput.
  • MPS massively parallel sequencing
  • high-throughput sequencing technology has been proven to have high accuracy and sensitivity in clinical genetic testing.
  • However, it is affected by noise and errors introduced during library construction and sequencing, so it is difficult to determine whether low-frequency mutations in the sequencing results are genuine.
  • The proportion of index hopping can be as high as 2% (Illumina, Effects of Index Misassignment on Multiplexing and Downstream Analysis).
  • The mainstream NGS sequencing platforms are Ion Torrent and Illumina.
  • Different platforms use different technical procedures for library construction, so libraries are not universal: a library suitable for the Ion Torrent platform cannot generate sequencing data on the Illumina platform, and vice versa. This greatly restricts clinical application, so it is necessary to find a universal library that can be applied to different sequencing platforms.
  • sequencing adapter research is the focus of library research.
  • The design of sequencing adapters mainly follows two directions. One is to improve the shape of the adapter, for example Y-shaped or U-shaped adapters, in order to reduce or avoid adapter dimers and increase the amount of usable sequencing data; the other is to add specific molecular tags to the adapter structure to identify errors introduced during library construction.
  • sequencing libraries prepared through the above two research directions can still only be used for fixed sequencing platforms, and cannot be used in mainstream sequencing platforms such as Ion Torrent and Illumina at the same time.
  • the Ion Torrent platform and the Illumina platform have different sequencing principles and different methods for constructing sequencing templates.
  • The Ion Torrent platform uses emulsion PCR to construct the sequencing template; the Illumina platform uses bridge amplification or exclusion amplification to construct the sequencing template.
  • Ion Torrent generally uses linear adapters for library construction; Illumina generally uses Y-shaped adapters.
  • the technical problem to be solved by the present invention is to overcome the disadvantages that the high-throughput sequencing adapter in the prior art cannot realize the compatibility of different sequencing platforms, and the applicability is not strong.
  • the first objective of the present invention is to seek a universal high-throughput sequencing adapter suitable for multiple sequencing platforms
  • the second objective of the present invention is to seek a method for preparing a universal high-throughput sequencing adapter suitable for multiple sequencing platforms
  • the third objective of the present invention is to seek the application of a universal high-throughput sequencing adapter
  • the fourth objective of the present invention is to seek a method for detecting low-frequency gene mutations.
  • the present invention provides the following technical solutions:
  • The present invention provides a single-stranded adapter, characterized in that the following are connected in sequence in the single-stranded adapter:
  • the free arm includes a library amplification primer binding region and a carrier binding region;
  • the double-stranded complementary region contains two or more sequencing primer binding regions of the sequencing platform.
  • the double-stranded complementary region further includes a tag sequence.
  • the tag sequence is located at one end of the double-stranded complementary region away from the free arm.
  • the tag sequence consists of 6-12 random bases.
  • The length of the free arm of the single-stranded adapter is 30-56 bp, and the length of the double-stranded complementary region is 40-58 bp.
  • the free arm can be composed of the following sequence:
  • the double-stranded complementary region may be composed of the following sequence:
  • XXXXXX represents a tag sequence composed of 6-12 random bases
  • the "N” represents any base of A, T, C, G, or NA (no base).
  • the free arm further includes a tag sequence, and the tag sequence is consistent with the double-stranded complementary region tag sequence.
  • the free arm can be composed of the following sequence:
  • the "XXXXXX" represents a tag sequence composed of 6-12 random bases
  • the "N” represents any base of A, T, C, G, or NA (no base).
  • the present invention also provides a Y-type high-throughput sequencing adapter, characterized in that the sequencing adapter includes a first single strand and a second single strand;
  • the first single strand and the second single strand respectively include:
  • the free arm includes a library amplification primer binding region and a carrier binding region;
  • the double-stranded complementary region contains two or more sequencing primer binding regions of the sequencing platform.
  • the free arm sequences of the first single strand and the second single strand are not complementary, and the first single strand and the second single strand may be annealed to form a Y-shaped structure double strand.
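
The Y-shaped geometry described above can be illustrated with a minimal Python sketch. It assumes the layout stated later in this document (first strand: free arm then complementary region, 5' to 3'; second strand: complementary region then free arm, 5' to 3'). The function names and the toy sequences are hypothetical and are not the patent's SEQ ID sequences; the arm check only rules out perfect complementarity, which is a simplification.

```python
# Watson-Crick complement table for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def forms_y_shape(first_arm: str, first_stem: str,
                  second_stem: str, second_arm: str) -> bool:
    """Check the two conditions for a Y-shaped adapter.

    The stems (double-stranded complementary regions) must pair with each
    other, while the free arms must not be perfectly complementary.
    """
    stems_pair = second_stem == reverse_complement(first_stem)
    arms_pair = second_arm == reverse_complement(first_arm)
    return stems_pair and not arms_pair


# Toy example (hypothetical sequences, far shorter than real adapters).
print(forms_y_shape("AAGGCC", "GATTACA", "TGTAATC", "TTTTGG"))
```

A real design check would also screen the arms for partial complementarity and secondary structure, which this sketch does not attempt.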
  • the double-stranded complementary region includes a tag sequence, and the tag sequence is located at one end of the double-stranded complementary region away from the free arm.
  • the sequencing platform includes, but is not limited to, Illumina, Ion Torrent, PacBio, Roche, Helicos, and ABI platforms; preferably, the sequencing platform is Ion Torrent and Illumina platforms.
  • the second single-stranded free arm further includes a tag sequence.
  • the tag sequence in the free arm is the same as the tag sequence in the double-stranded complementary region; more preferably, the tag sequence in the free arm is close to the end of the double-stranded complementary region.
  • The length of the double-stranded complementary region of the first single strand and the second single strand is 40-58 bp; the length of the free arm of the first single strand is 30-45 bp, and the length of the free arm of the second single strand is 35-56 bp; the tag sequence is composed of 6-12 random bases.
  • the 3'end of the free arm of the first or second single strand is modified for stability
  • thio modification is carried out
  • the phosphodiester bond between the last 3 bases at the 3'end is replaced by phosphorothioate.
  • the first single-stranded sequence is as follows:
  • Double-stranded complementary region sequence
  • the "XXXXXX" represents a tag sequence composed of 6-12 random bases
  • the "N” represents any base of A, T, C, G, or NA (no base).
  • the first single strand is connected by a free arm and a double-strand complementary region in a 5'-3' direction sequentially.
  • the second single-stranded sequence is as follows:
  • Double-stranded complementary region sequence
  • the "XXXXXX" represents a tag sequence composed of 6-12 random bases
  • the "N” represents any base of A, T, C, G, or NA (no base).
  • the second single-stranded double-stranded complementary region and the free arm are connected in a 5'-3' direction sequentially.
  • the present invention also provides a high-throughput sequencing adapter set, characterized in that the sequencing adapter set includes the above-mentioned high-throughput sequencing adapter.
  • the high-throughput sequencing adapter set further includes another Y-type high-throughput sequencing adapter as follows: the Y-type high-throughput sequencing adapter includes third and fourth single strands;
  • the sequence of the other Y-type high-throughput sequencing adaptor is similar to the sequence of the above-mentioned Y-type high-throughput sequencing adaptor, except that the sequence of the double-stranded complementary region is different;
  • sequence of the double-strand complementary region of the third single-strand is as follows:
  • sequence of the double-strand complementary region of the fourth single-stranded is complementary to the sequence of the third single-stranded double-strand complementary region;
  • the "XXXXXX" represents a tag sequence composed of 6-12 random bases
  • the "N” represents any base of A, T, C, G, or NA (no base).
  • connection sequence between the free arms of the third and fourth single strands and the double-strand complementary region sequence is the same as that of the first and second single strands.
  • the single-stranded sequences of the Y-type high-throughput sequencing adapter are as follows:
  • the first single-stranded sequence (SEQ ID NO.1):
  • the second single-stranded sequence (SEQ ID NO.2):
  • the third single-stranded sequence (SEQ ID NO.3):
  • the fourth single-stranded sequence (SEQ ID NO.4):
  • the single-stranded sequences of the Y-type high-throughput sequencing adapter are as follows:
  • the first single-stranded sequence (SEQ ID NO.5):
  • the second single-stranded sequence (SEQ ID NO.6):
  • the third single-stranded sequence (SEQ ID NO.7):
  • the fourth single-stranded sequence (SEQ ID NO. 8):
  • SEQ ID NO.1-8 in the sequence listing does not contain "XXXXXX".
  • the present invention also provides a composition, characterized in that the composition comprises the above-mentioned high-throughput sequencing linker or linker set.
  • the present invention also provides a complex, which is characterized in that the complex is connected to the above-mentioned high-throughput sequencing adapter or adapter set.
  • The present invention also provides a kit, characterized in that the kit comprises the above-mentioned high-throughput sequencing adapter or adapter set.
  • the kit is a high-throughput sequencing library building kit or a gene sequence enrichment kit.
  • the present invention also provides a method for preparing the above-mentioned high-throughput sequencing adapter, which is characterized in that it comprises the following steps:
  • S1: synthesize the first and second single-stranded sequences separately;
  • S2: anneal the two single-stranded sequences of S1 to obtain the high-throughput sequencing adapter.
  • the present invention also provides a method for constructing a sequencing library, which is characterized in that:
  • S1: prepare the target fragment of the sample to be tested;
  • S2: connect the aforementioned high-throughput sequencing adapter or adapter set to the target fragment of S1 to obtain a ligation product;
  • S3: amplify the ligation product of S2, and obtain the sequencing library of the sample to be tested after purification.
  • the present invention also provides a method for detecting low-frequency gene mutations, and is characterized in that it comprises the following steps:
  • S1: prepare the above-mentioned high-throughput sequencing adapters or adapter sets; for the same sample, the tag sequences are the same;
  • S2: amplify the target fragments of the sample to be tested and digest the primers;
  • S3: connect the digested product of S2 to the high-throughput sequencing adapter or adapter set of S1 to obtain a ligation product, amplify the ligation product, and obtain a sequencing library after purification;
  • S4: sequence the sequencing library of S3, correct the sequencing data according to the tag sequence of the high-throughput sequencing adapter, and perform mutation analysis based on the corrected sequencing data.
  • the mutation analysis in step S4 is: based on the fact that a specific mutation appears in both the sense strand and the antisense strand of the same read, it is determined as a true low-frequency mutation.
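
The strand-concordance rule in step S4 can be sketched as follows. This is a minimal illustration, not the applicant's analysis pipeline: the `(fragment_id, strand, mutation)` tuples and the function name `confirmed_mutations` are assumptions made for the example, with `fragment_id` standing in for whatever key groups the sense and antisense observations of the same read.

```python
from collections import defaultdict


def confirmed_mutations(observations):
    """Keep only mutations seen on both strands of the same fragment.

    observations: iterable of (fragment_id, strand, mutation) tuples,
    where strand is "+" (sense) or "-" (antisense).
    Returns a sorted list of (fragment_id, mutation) pairs.
    """
    strands_seen = defaultdict(set)
    for fragment_id, strand, mutation in observations:
        strands_seen[(fragment_id, mutation)].add(strand)
    # A mutation supported by only one strand is treated as a library
    # construction or sequencing error and is dropped.
    return sorted(key for key, strands in strands_seen.items()
                  if strands == {"+", "-"})


observations = [
    ("f1", "+", "L858R"),
    ("f1", "-", "L858R"),   # both strands agree: kept
    ("f2", "+", "T790M"),   # sense strand only: discarded
]
print(confirmed_mutations(observations))
```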
  • the sample to be tested is genomic DNA.
  • the present invention also provides the following applications of the above-mentioned high-throughput sequencing adapter, adapter set, composition, complex or kit:
  • the universal high-throughput sequencing adapter provided by the present invention includes paired double-stranded complementary regions and unpaired single-stranded free arms.
  • the distal end of the paired double-stranded part contains the tag sequence, and the non-free ends of the two free arms contain the tag sequence.
  • the base composition of the tag sequence carried by the same sample is consistent. According to the consistency of the tag sequence, it can be judged whether there is cross-contamination during the library construction process. After using some models of the Illumina sequencing platform for sequencing, the analysis of sequencing data can determine whether there is index hopping based on whether the base composition of the tag sequence of the same read is consistent.
  • the distal end of the paired double-stranded part contains a tag sequence, and the base composition of the tag sequence should be the same for the sense strand and antisense strand of the same read.
  • A specific mutation must appear in both the sense strand and the antisense strand of the same read to be judged a true mutation. If a read carries the mutation only in the sense strand or only in the antisense strand, the mutation can be judged to be an error from library construction or sequencing and is excluded from subsequent analysis to avoid false positives.
  • The universal high-throughput sequencing adapter provided by the present invention uses only a single tag sequence, combined with the requirement that a specific mutation be present in both the sense strand and the antisense strand.
  • The base composition of the tag sequence of a sample should be the same as the base composition of the tag sequence in its reads.
  • If the tag sequence of a sample is not the same as the tag sequence in a read, the read does not belong to this sample, that is, index hopping has occurred.
  • The design of the present invention can effectively overcome the index hopping problem inherent to some sequencing platforms, and allows low-frequency mutations to be interpreted as genuine or not.
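
The tag-consistency check described above can be sketched in a few lines of Python. The data layout (each read represented by the list of tag copies recovered from it) and the function names are hypothetical choices made for illustration; the tag strings are toy values.

```python
def read_belongs_to_sample(sample_tag, tags_in_read):
    """A read is accepted only if every tag copy found in it matches the sample tag."""
    return len(tags_in_read) > 0 and all(tag == sample_tag for tag in tags_in_read)


def hopping_fraction(sample_tag, reads):
    """Fraction of reads whose tag copies disagree with the sample tag.

    reads: list of reads, each given as the list of tag sequences recovered
    from that read (for example one from the double-stranded region and one
    from the free arm).
    """
    if not reads:
        return 0.0
    rejected = sum(not read_belongs_to_sample(sample_tag, tags) for tags in reads)
    return rejected / len(reads)


reads = [
    ["ACGTAC", "ACGTAC"],  # consistent with the sample
    ["ACGTAC", "TTGCAA"],  # tag copies disagree: index hopping suspected
    ["ACGTAC", "ACGTAC"],  # consistent with the sample
    ["GGGTTT", "GGGTTT"],  # consistent, but belongs to another sample
]
print(hopping_fraction("ACGTAC", reads))
```

Exact string matching is used here for simplicity; a production pipeline would likely tolerate sequencing errors within the tag, which is outside the scope of this sketch.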
  • The universal high-throughput sequencing adapter includes a PN adapter and an AN adapter; the double-stranded complementary region of the PN adapter consists of 40 to 58 bases; the double-stranded complementary region of the AN adapter consists of 40 to 58 bases; the 5' free arm of the PN adapter or the AN adapter is composed of 30 to 45 bases; the 3' free arm of the PN adapter or the AN adapter is composed of 35 to 56 bases; the tag sequence is composed of 6 to 12 bases, so as to achieve at least 114048 bases.
  • FIG. 1 Schematic diagram of the structure of the universal high-throughput sequencing adapter shown in Example 1;
  • FIG. 1 The quality control map of the universal high-throughput library 2100 in Example 2, including the quality control map of the universal high-throughput sequencing adapter library 2100 (sample R19054232);
  • FIG. 1 The quality control map of the universal high-throughput library 2100 in Example 2, including the quality control map of the universal high-throughput sequencing adapter library 2100 (sample R20005128);
  • The terms "comprising", "including", "having", "containing" or "involving" are inclusive or open-ended, and do not exclude other unlisted elements or method steps.
  • the term “consisting of” is considered a preferred embodiment of the term “comprising”. If in the following a certain group is defined as comprising at least a certain number of embodiments, this should also be understood as revealing a group preferably consisting of only these embodiments.
  • nucleic acid refers to any molecule comprising ribonucleic acid, deoxyribonucleic acid or its analogue unit, preferably a polymeric molecule.
  • the nucleic acid can be single-stranded or double-stranded.
  • the single-stranded nucleic acid may be a nucleic acid of one strand of denatured double-stranded DNA.
  • the single-stranded nucleic acid may be a single-stranded nucleic acid that is not derived from any double-stranded DNA.
  • Complementary refers to hydrogen-bond base pairing between the nucleotide bases G, A, T, C, and U, such that when two given polynucleotides or polynucleotide sequences anneal to each other, A pairs with T and G pairs with C in DNA, and G pairs with C and A pairs with U in RNA.
  • the “sequencing adapter” in the present invention refers to a double-stranded nucleotide sequence connected to the two ends of the target fragment to be sequenced.
  • The double-stranded oligonucleotide sequence can be fully complementary or only partially complementary, such as a "Y-type" linker formed because part of the terminal sequence is not complementary.
  • the sequencing linker of the present invention is preferably such a "Y-type” linker.
  • the composition of the nucleotide sequence of the sequencing adapter is related to the applicable sequencing platform.
  • the composition can include library amplification primer sequence, sample tag sequence, sequencing primer sequence, etc.; and the sequence length of the sequencing adapter is also related to the sequencing platform.
  • The length of the linker can be specifically: the 3' free arm sequence is 35-56 bp, the 5' free arm sequence is 30-45 bp, and the double-stranded complementary region sequence is 40-58 bp.
  • Fig. 1 is a preferred universal high-throughput sequencing "Y-type" linker of the present invention, which includes a PN linker and an AN linker, which can be respectively located at either end of the target sequence.
  • Both the PN linker and the AN linker comprise a double-stranded complementary region, a single-stranded 5'free arm and a single-stranded 3'free arm.
  • the PN linker of the universal high-throughput sequencing linker and the double-stranded complementary region of the AN linker both include a tag sequence, and the tag sequence is composed of 6 to 12 bases.
  • The non-free end of the single-stranded 3' free arm of the AN adaptor of the high-throughput sequencing adaptor also contains the same base composition as the tag sequence.
  • The non-free end of the single-stranded 3' free arm of the PN adaptor of the above-mentioned universal high-throughput sequencing adaptor contains the same base composition as the tag sequence.
  • The 3' end of the 3' free arm of the AN linker and of the PN linker of the universal high-throughput sequencing linker is thio-modified; preferably, the phosphodiester bonds between the last 3 bases are replaced by phosphorothioate.
  • the double-stranded complementary ends of the universal high-throughput sequencing linker AN linker and PN linker can be connected to the original gene fragment through a ligation reaction by a ligase.
  • the 5'free arm of the AN adaptor and the 3'free arm of the PN adaptor are non-complementary paired single strands and cannot be connected to the original gene fragments, thereby ensuring the efficiency of connecting the universal high-throughput sequencing adaptor to the DNA fragments.
  • The PN linker and the AN linker each refer to a partially double-stranded fragment (Y-type structure) containing a double-stranded complementary region and single-stranded 3'/5' free arms. During library construction, each is connected to one end of the target sequence, and the nucleotide sequences of the two are preferably different.
  • The "free arm" in the present invention refers to the region of the linker sequence whose bases are not complementarily paired, such as the unpaired region of the PN linker or AN linker of the present invention. Therefore, even where it is not explicitly stated that the sequences of the free arms are not complementary, those skilled in the art should understand that the two sequences are not complementary and can form a Y-shaped structure in some cases.
  • the free arm of the present invention includes a library amplification primer region; in other embodiments, the 3'free arm of the present invention also includes a tag sequence.
  • the “double-stranded complementary region” in the present invention refers to the double-stranded complementary region contained in the sequencing adapter. This region usually contains sequencing primer sequences.
  • The double-stranded complementary region of the present invention contains sequencing primer sequences for at least two sequencing platforms.
  • the "tag sequence” in the present invention refers to a nucleotide sequence with a base length of 6 to 12 bp, which is used to identify different library samples.
  • non-free end in the present invention refers to the end where the double-stranded complementary region of the PN linker or the AN linker is connected to the 3'or 5'free arm of the single strand.
  • the "free end” in the present invention refers to the 3'end of the single-stranded 3'free arm or the 5'end of the single-stranded 5'free arm of the PN linker or the AN linker.
  • The "high-throughput sequencing platform" in the present invention refers to sequencing platforms such as Ion Torrent, Illumina, Roche 454, and ABI. Although the sequencing platforms are preferably Ion Torrent and Illumina in the present invention, they are not limited thereto. It is clear in the art that, based on the inventive concept of the present invention, sequencing primer sequences can be selected for any two or more platforms and built into the linker sequence to prepare the compatible high-throughput sequencing linker of the present invention. In addition, because sequencers under the same type of sequencing platform share basically the same sequencing principle, the method of the present invention is applicable to all models under the same platform, for example, in the Ion Torrent sequencing platform.
  • The "low-frequency mutation" in the present invention refers to various mutations whose gene mutation frequency is less than 5%, including less than 5%, less than 4%, less than 3%, less than 2%, less than 1%, etc.
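
As a small worked illustration of the 5% threshold in the definition above, the following sketch computes a mutation frequency from read counts and applies the cutoff. The function names and the read counts are hypothetical; the only value taken from the text is the 5% threshold.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads covering a position that carry the mutation."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def is_low_frequency(vaf: float, threshold: float = 0.05) -> bool:
    """Low-frequency mutation as defined in the text: frequency below 5%."""
    return 0.0 < vaf < threshold


vaf = variant_allele_frequency(alt_reads=5, total_reads=1000)
print(vaf, is_low_frequency(vaf))
```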
  • AN1/PN1 is a set of universal high-throughput sequencing adapters with shorter sequences
  • AN2/PN2 is another set of universal high-throughput sequencing adapters with longer sequences.
  • the specific preparation method is as follows:
  • the end of the double-strand complementary region of the AN linker contains a tag sequence, where the tag sequence is 6-12 random bases "X".
  • the 3'free arm of the AN linker also contains a tag sequence of 6-12 random bases "X”, and is connected to the non-free end of the 3'free arm of the AN linker.
  • the end of the double-stranded complementary region of the PN linker contains a tag sequence, where the tag sequence is 6-12 random bases "X”.
  • the 3'free arm of the PN linker also contains a tag sequence of 6-12 random bases "X", and is connected to the non-free end of the 3'free arm of the PN linker. *Represents the thio modification site.
  • In the AN linker and the PN linker of the universal high-throughput sequencing linker, the phosphodiester bonds between the last 3 bases at the 3' end of the 3' free arm are replaced by phosphorothioate.
  • the universal high-throughput sequencing adapters AN1/PN1 and AN2/PN2 prepared in Example 1 were used in the experiment, respectively.
  • the number of universal high-throughput sequencing adapters corresponds to the number of samples to be tested. For example, if the number of samples to be tested is 10, 10 sets of universal high-throughput sequencing adapter sets are prepared correspondingly, and each set of universal high-throughput sequencing adapter sets includes PN1 adapters and AN1 adapters.
  • the base sequence composition of the tag sequence in the same group of PN1 adaptors and AN1 adaptors is the same, and the base sequence composition of the tag sequence in different adaptor groups is different.
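
The per-sample tag assignment described above (same tag within one PN/AN adapter group, different tags across groups) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `assign_sample_tags`, the `seed` parameter, and the dictionary layout are hypothetical, and tags are drawn uniformly at random without any of the additional design constraints a real tag set might need.

```python
import random


def assign_sample_tags(sample_ids, tag_length=6, seed=None):
    """Give each sample one random tag shared by its PN and AN adapters.

    Tags are 6-12 bases long, as stated in the text, and are unique
    across samples.
    """
    if not 6 <= tag_length <= 12:
        raise ValueError("tag_length must be between 6 and 12")
    if len(set(sample_ids)) > 4 ** tag_length:
        raise ValueError("more samples than distinct tags of this length")
    rng = random.Random(seed)
    used = set()
    assignment = {}
    for sample_id in sample_ids:
        tag = "".join(rng.choice("ACGT") for _ in range(tag_length))
        while tag in used:  # redraw until the tag is new
            tag = "".join(rng.choice("ACGT") for _ in range(tag_length))
        used.add(tag)
        # Same tag in both adapters of the group.
        assignment[sample_id] = {"PN": tag, "AN": tag}
    return assignment


tags = assign_sample_tags([f"sample{i}" for i in range(1, 11)], tag_length=8, seed=1)
for sample_id, group in tags.items():
    print(sample_id, group["PN"], group["AN"])
```

With 6 random bases there are 4^6 = 4096 possible tags, and with 12 bases 4^12 = 16777216, so ten samples are far below the available tag space.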
  • Sample genomic DNA extraction Take peripheral blood samples 1 and 2 (corresponding to R190542432 and R20005128 respectively) for genomic DNA extraction.
  • the sample DNA was extracted in accordance with the operating instructions of the nucleic acid extraction reagent (DR181003-48) produced by Beijing Anzhiyin Biotechnology Co., Ltd.
  • The target regions to be examined are the entire coding regions and the alternative splicing regions (extending 20 bp from exons into introns) of the ACTA2, COL3A1, FBN1, MYH11, MYLK, SMAD3, TGFBR1, and TGFBR2 genes.
  • The multiplex PCR primer pool for the target detection area was designed with Ion Ampliseq Designer, and synthesized and provided by Thermo Fisher.
  • Target fragment amplification the specific implementation is as follows:
  • The ligase is Fast T4 DNA Ligase produced by Shanghai Yisheng Biotechnology Co., Ltd.
  • The ligation buffer is 5× Fast Ligation Buffer produced by Shanghai Yisheng Biotechnology Co., Ltd.
  • the specific implementation is as follows:
  • the Ion Torrent platform and the Illumina platform are used to perform sequencing verification on the above-mentioned high-throughput library, as follows:
  • The library is diluted after purification and quality inspection, and template preparation is carried out with the Ion 520™ & Ion 530™ Kit–OT according to the kit operating procedures. After template preparation on the IonTouch 2 instrument, the Ion GeneStudio™ S5 Plus gene sequencer is used for sequencing and data analysis.
  • the library was diluted, and Miseq DX Reagent Kit v3 was used to proceed in accordance with the kit operating procedures. Sequencing and data analysis were performed on the Miseq DX gene sequencer.
  • the average read length of the above two samples is ⁇ 200bp, indicating that all samples in the sample are read through, that is, the bases between the beginning and the end of the target fragment to be tested can be identified;
  • Mean depth average sequencing depth
  • On Target target rate
  • Uniformity is ⁇ 90 %, indicating that the amplification efficiency of each read in the target area to be tested and the efficiency of connecting the universal high-throughput adapter are similar.
  • the above parameters all indicate that the two ends of the target segment to be tested are successfully connected to the universal high-throughput sequencing adapter, and the sequencing is successful; it indicates that the library connected to the universal high-throughput sequencing adapter can be sequenced on the Ion GeneStudio TM S5 Plus gene sequencer.
  • the data output of the above two samples are both ⁇ 0.5G, the Reads data are both ⁇ 3M, and the proportion of Q30 is ⁇ 75%, indicating that the two samples are successfully sequenced; indicating that the two samples are successfully connected to the universal high-throughput sequencing adapter at both ends of the target segment to be tested.
  • the library connected to the universal high-throughput sequencing adapter can be sequenced on the Miseq DX gene sequencer.
  • the use of the sequencing adapters prepared by the present invention to build a library can meet the sequencing requirements of the Ion GeneStudio TM S5 Plus platform and the Miseq DX platform at the same time, that is, meet the requirements of the two mainstream sequencing platforms of the Ion Torrent platform and the Illumina platform at the same time.
  • the sequencing adapter of the present invention has the properties of a universal library-building adapter.
  • the library applicable to the Ion GeneStudio TM S5Plus sequencer can be applied to other Ion Torrent platform sequencers, such as PGM, Proton, etc.
  • the library applicable to the Miseq DX gene sequencer can be applied to other types of sequencers on the Illumina platform, such as MiniSeq, NextSeq, etc. Therefore, it can be clarified that the library connected with the universal sequencing adapter of the present invention can be applied to all types of sequencers on the Ion Torrent platform and the Illumina platform.
  • This embodiment further verifies the application of the sequencing adapter of the present invention in low-frequency detection, and specifically provides a detection method for judging the authenticity of low-frequency mutations, which can correct sequencing errors introduced by index hopping.
  • the technical circuit diagram is shown in Figure 4, which specifically includes the following steps:
  • sample is a commercial tumor SNV 5% gDNA standard (GW-OGTM005), which is serially diluted with commercial human genomic DNA (G304A) to a mutation frequency of 2.5%, 1.25%, and 0.5%, named as sample 1, sample 2.
  • Sample 3 Sample 4.
  • the target area to be inspected is the designated hot spot area of EGFR(L858R/T790M/ ⁇ E746_ ⁇ A750)/PIK3CA(E545K)/KRAS(G12D/G13D/A146T)/NRAS(Q61K) gene.
  • the multiple PCR primer pool of the target detection area uses Thermo Fisher's Ampliseq colon&lung panel. 3 replicates for each sample.
  • Target fragment amplification the specific implementation is as follows:
  • the high-throughput sequencing adapter set adopts the PN1/AN1 and PN2/AN2 described in Example 1. Taking the PN2/AN2 test data as an example, 4 sets of adapter sets are prepared as follows: The sample sequence tags are ATCACG; CGATGT; TTAGGC; TGACCA. For specific preparation methods, refer to Example 1.
  • the amplified library was purified using Ampure magnetic beads, and the purified library was quantified using QUBIT 4.0.
  • the library concentration is calculated according to the dilution factor.
  • the library concentration higher than 1ng/uL can be used for subsequent experimental steps, and the library construction fails when the library concentration is lower than 1ng/uL.
  • the library was diluted, and Miseq DX Reagent Kit v3 was used to proceed in accordance with the kit operating procedures. Sequencing and data analysis were performed on the Miseq DX gene sequencer.
  • Sequencing data analysis mainly includes the following contents:
  • sequencing tag sequence and the universal high-throughput sequencing adapter AN adapter double-stranded end tag sequence base sequence to form a consistent identification of sample cross-contamination and tag hopping (index hopping) introduction
  • the sequencing error For the above-mentioned sequencing data classified into the same sample source, further use the sequencing tag sequence and the universal high-throughput sequencing adapter AN adapter double-stranded end tag sequence base sequence to form a consistent identification of sample cross-contamination and tag hopping (index hopping) introduction The sequencing error.
  • a universal high-throughput sequencing adapter is used. After the sequencing is completed, the obtained sequencing data is analyzed. First, the tag sequence is used to identify the source data of the same sample, and the sample is divided into four mutation frequencies of sample 1, sample 2, sample 3, and sample 4. Then identify whether the double-stranded partial tag sequence of the read linker is the same as the sample tag sequence, and eliminate the index hopping problem. Then, the authenticity of the mutation site is further recognized by whether the positive read and negative read with the same tag sequence have the same mutation site. However, mutations in which only positive or negative reads exist, or mutations in which the tag sequence in the read is inconsistent with the sample tag are excluded, so as to realize the correct identification of low-frequency mutations.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Organic Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Health & Medical Sciences (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Engineering & Computer Science (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • Analytical Chemistry (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Immunology (AREA)
  • Physics & Mathematics (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • General Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

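Provided are a universal high-throughput sequencing adapter and applications thereof. The universal high-throughput sequencing adapter comprises a double-stranded complementary region and single-stranded free arms. The sequencing adapter is compatible with multiple sequencing platforms, including Ion Torrent and Illumina, is suited to clinical testing and cost saving, and can also be applied to judging the authenticity of low-frequency mutations.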

Description

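Universal high-throughput sequencing adapter and application thereof
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202010407833.5, entitled "Universal high-throughput sequencing adapter and application thereof", filed with the China Patent Office on May 14, 2020, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of gene sequencing, and in particular to a universal high-throughput sequencing adapter used in sequencing library construction and applications thereof.
Background
With the development of gene sequencing technology, high-throughput sequencing is increasingly used in clinical practice, for example in newborn screening for high-risk diseases, diagnosis of genetic diseases and carrier testing, and pharmacogenetic testing to guide individualized dosage, drug selection, and drug response. High-throughput sequencing, also known as next-generation sequencing (NGS), is defined relative to traditional Sanger sequencing. In Sanger sequencing, a synthetic short oligonucleotide primer hybridized to a single-stranded DNA template is extended by DNA polymerase to synthesize new DNA fragments, and the fragments are separated by polyacrylamide electrophoresis to read the DNA sequence. Since its introduction, Sanger sequencing has been regarded as the gold standard of DNA sequencing because of its long read length and high accuracy, and it remains the gold standard for validating next-generation sequencing results. However, its throughput is low, so testing many sites in many genes at once is time-consuming and laborious.
NGS generally uses massively parallel sequencing (MPS), which sequences multiple samples and multiple sites simultaneously and greatly increases throughput. High-throughput sequencing has been shown to be highly accurate and sensitive in clinical genetic testing. However, because of noise and errors arising during library construction and sequencing, it is difficult to tell whether a low-frequency mutation in the sequencing results is real. One example is data contamination introduced by index hopping: compared with bridge amplification, the index hopping rate under exclusion amplification (ExAmp) can reach 2% (Illumina, "Effects of index misassignment on multiplexing and downstream analysis" [Z]).
The most widely used NGS platforms at present are Ion Torrent and Illumina. Because their sequencing principles differ, the platforms use different library construction workflows, so libraries are not interchangeable: a library suitable for the Ion Torrent platform cannot produce sequencing data on the Illumina platform, and vice versa. This greatly restricts clinical use, so a universal library suitable for different sequencing platforms is needed.
Library construction is an important part of NGS, and sequencing adapters are a focus of library research. Current adapter design follows two main directions. One tries to improve the adapter shape, for example Y-shaped or U-shaped adapters, to reduce or avoid adapter dimers and increase the amount of usable sequencing data. The other adds specific molecular tags to the adapter structure to identify errors arising during library construction. In the prior art, however, sequencing libraries prepared along either direction can still be used only on a fixed sequencing platform, and cannot be used on both mainstream platforms, Ion Torrent and Illumina. Because the sequencing principles of the two platforms differ, they also build sequencing templates in completely different ways: the Ion Torrent platform uses emulsion PCR, whereas the Illumina platform uses bridge amplification or exclusion amplification. In line with these template construction methods, Ion Torrent generally uses linear adapters and Illumina generally uses Y-shaped adapters for library construction. Consequently, conventional high-throughput sequencing adapters are each suited to a single platform and cannot serve both Ion Torrent and Illumina.
The present invention is proposed in view of this.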
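Summary of the invention
The technical problem to be solved by the present invention is to overcome defects of prior-art high-throughput sequencing adapters, such as the lack of compatibility across different sequencing platforms and limited applicability.
Accordingly, the first object of the present invention is to provide a universal high-throughput sequencing adapter suitable for multiple sequencing platforms;
the second object of the present invention is to provide a method for preparing a universal high-throughput sequencing adapter suitable for multiple sequencing platforms;
the third object of the present invention is to provide an application of the universal high-throughput sequencing adapter;
the fourth object of the present invention is to provide a method for detecting low-frequency gene mutations.
To achieve the above objects, the present invention provides the following technical solutions:
The present invention provides a single-stranded adapter, characterized in that the single-stranded adapter comprises, connected in sequence,
1) a free arm, and
2) a double-stranded complementary region, wherein
the free arm comprises a library amplification primer binding region and a carrier binding region;
the double-stranded complementary region comprises sequencing primer binding regions for two or more sequencing platforms.
In some embodiments, the double-stranded complementary region further comprises a tag sequence.
Preferably, the tag sequence is located at the end of the double-stranded complementary region distal to the free arm.
In some embodiments, the tag sequence consists of 6~12 random bases.
In some embodiments, the free arm of the single-stranded adapter is 30-56 bp long and the double-stranded complementary region is 40-58 bp long.
In some specific embodiments,
the free arm may consist of the following sequence:
5’-NNNNNNNNNNNNNNNACCGAGATCTACACTCTTTCCCTACACGAC-3’;
the double-stranded complementary region may consist of the following sequence:
5’-GCTCTTCCGATNNNNNNNNNNNNCCTCTCTATGGGCAGTCGGTGATXXXXXX-3’;
wherein "XXXXXX" denotes a tag sequence consisting of 6~12 random bases;
"N" denotes any of the bases A, T, C, and G, or NA (no base).
In some embodiments, the free arm further comprises a tag sequence, and this tag sequence is identical to the tag sequence of the double-stranded complementary region.
In some specific embodiments,
the free arm may consist of the following sequence:
5’-ACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGNNNNNNNNNNNNNNN-3’,
"XXXXXX" denotes a tag sequence consisting of 6~12 random bases;
"N" denotes any of the bases A, T, C, and G, or NA (no base).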
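The present invention further provides a Y-shaped high-throughput sequencing adapter, characterized in that the sequencing adapter comprises a first single strand and a second single strand;
the first single strand and the second single strand each comprise:
1) a free arm, and
2) a double-stranded complementary region, wherein
the free arm comprises a library amplification primer binding region and a carrier binding region;
the double-stranded complementary region comprises sequencing primer binding regions for two or more sequencing platforms.
In some embodiments, the free arm sequences of the first and second single strands are not complementary, and the first and second single strands can be annealed to form a Y-shaped duplex.
In some embodiments, the double-stranded complementary region comprises a tag sequence located at the end of the double-stranded complementary region distal to the free arm.
In some embodiments, the sequencing platforms include, but are not limited to, the Illumina, Ion Torrent, PacBio, Roche, Helicos, and ABI platforms; preferably, the sequencing platforms are the Ion Torrent and Illumina platforms.
In some embodiments, the free arm of the second single strand further comprises a tag sequence.
In some preferred embodiments, the tag sequence in the free arm is identical to the tag sequence in the double-stranded complementary region; more preferably, the tag sequence in the free arm lies near the end adjoining the double-stranded complementary region.
In some embodiments, the double-stranded complementary regions of the first and second single strands are 40-58 bp long; the free arm of the first single strand is 30-45 bp long and the free arm of the second single strand is 35-56 bp long; the tag sequence consists of 6~12 bp of random bases.
In some embodiments, the 3’ end of the free arm of the first or second single strand carries a stabilizing modification;
preferably, a phosphorothioate modification;
more preferably, the phosphodiester bonds between the last 3 bases at the 3’ end are replaced by phosphorothioate bonds.
In some embodiments, the first single strand has the following sequences:
Free arm sequence:
5’-NNNNNNNNNNNNNNNACCGAGATCTACACTCTTTCCCTACACGAC-3’;
Double-stranded complementary region sequence:
5’-GCTCTTCCGATNNNNNNNNNNCCTGCGTGTCTCCGACTCAGCTAXXXXXX-3’
"XXXXXX" denotes a tag sequence consisting of 6~12 random bases;
"N" denotes any of the bases A, T, C, and G, or NA (no base).
In some preferred embodiments, in the first single strand the free arm and the double-stranded complementary region are connected in that order in the 5’-3’ direction.
In some embodiments, the second single strand has the following sequences:
Free arm sequence:
5’-ACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGNNNNNNNNNNNNNNN-3’;
Double-stranded complementary region sequence:
5’-XXXXXXTAGCTGAGTCGGAGACACGCAGGNNNNNNNNNNATCGGAAGAGC-3’;
"XXXXXX" denotes a tag sequence consisting of 6~12 random bases;
"N" denotes any of the bases A, T, C, and G, or NA (no base).
In some preferred embodiments, in the second single strand the double-stranded complementary region and the free arm are connected in that order in the 5’-3’ direction.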
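The present invention further provides a high-throughput sequencing adapter set, characterized in that the adapter set comprises the high-throughput sequencing adapter described above.
In some embodiments, the high-throughput sequencing adapter set further comprises another Y-shaped high-throughput sequencing adapter, which comprises a third and a fourth single strand;
the sequence of this other Y-shaped adapter is similar to that of the Y-shaped adapter described above, differing only in the double-stranded complementary region;
wherein the double-stranded complementary region of the third single strand has the following sequence:
5’-GCTCTTCCGATNNNNNNNNNNNNCCTCTCTATGGGCAGTCGGTGATXXXXXX-3’;
the double-stranded complementary region of the fourth single strand is complementary to that of the third single strand;
"XXXXXX" denotes a tag sequence consisting of 6~12 random bases;
"N" denotes any of the bases A, T, C, and G, or NA (no base).
In some preferred embodiments, the free arm and the double-stranded complementary region of the third and fourth single strands are connected in the same order as in the first and second single strands.
In some preferred embodiments, the single strands of the Y-shaped high-throughput sequencing adapters have the following sequences:
First single strand (SEQ ID NO.1):
5’-ACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCCTGCGTGTCTCCGACTCAGCTAXXXXXX-3’;
Second single strand (SEQ ID NO.2):
5’-XXXXXXTAGCTGAGTCGGAGACACGCAGGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATG-3’;
Third single strand (SEQ ID NO.3):
5’-ACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCCTCTCTATGGGCAGTCGGTGATXXXXXX-3’
Fourth single strand (SEQ ID NO.4):
5’-XXXXXXATCACCGACTGCCCATAGAGAGGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATG-3’.
In other preferred embodiments, the single strands of the Y-shaped high-throughput sequencing adapters have the following sequences:
First single strand (SEQ ID NO.5):
5’-CAAAGAGCGAGGACACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATTCTCCATCCACCTGCGTGTCTCCGACTCAGCTAXXXXXX-3’;
Second single strand (SEQ ID NO.6):
5’-XXXXXXTAGCTGAGTCGGAGACACGCAGGTGGATGGAGAATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGGTCCTCGCTCTTTG-3’;
Third single strand (SEQ ID NO.7):
5’-CAAAGAGCGAGGACACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCCGCTTTCGCCCTCTCTATGGGCAGTCGGTGATXXXXXX-3’
Fourth single strand (SEQ ID NO.8):
5’-XXXXXXATCACCGACTGCCCATAGAGAGGGCGAAAGCGGAGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGGTCCTCGCTCTTTG-3’.
Owing to constraints in generating the sequence listing, SEQ ID NO.1-8 in the sequence listing do not include "XXXXXX".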
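The present invention further provides a composition, characterized in that the composition comprises the high-throughput sequencing adapter or adapter set described above.
The present invention further provides a complex, characterized in that the complex is ligated to the high-throughput sequencing adapter or adapter set described above.
The present invention further provides a kit, characterized in that the kit comprises the high-throughput sequencing adapter or adapter set described above.
In some embodiments, the kit is a high-throughput sequencing library construction kit or a gene sequence enrichment kit.
The present invention further provides a method for preparing the high-throughput sequencing adapter described above, characterized by comprising the following steps:
S1: synthesizing the first and second single-strand sequences separately;
S2: specifically annealing the two single-strand sequences of S1 to obtain the high-throughput sequencing adapter.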
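Annealing in S2 depends on the double-stranded regions of the two strands being reverse complements of each other. The following is a minimal Python sketch of that check, assuming the fixed (non-tag) duplex portions of the first and second single strands given in SEQ ID NO.1 and SEQ ID NO.2; the function and variable names are illustrative and do not come from the source.

```python
# Illustrative check (names are hypothetical, not from the source):
# the fixed duplex portions of the two strands should be reverse
# complements so that they can anneal into the stem of the Y.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Duplex portion of the first single strand (SEQ ID NO.1), tag XXXXXX omitted.
FIRST_DUPLEX = "GCTCTTCCGATCCTGCGTGTCTCCGACTCAGCTA"
# Duplex portion of the second single strand (SEQ ID NO.2), tag XXXXXX omitted.
SECOND_DUPLEX = "TAGCTGAGTCGGAGACACGCAGGATCGGAAGAGC"

duplex_can_anneal = reverse_complement(FIRST_DUPLEX) == SECOND_DUPLEX
print(duplex_can_anneal)
```

The tag positions are left out because the source describes the tags only as having the same base composition on both strands; how a concrete tag is oriented on each strand is not specified here.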
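The present invention further provides a method for constructing a sequencing library, characterized by:
S1: preparing target fragments from the sample to be tested;
S2: ligating the high-throughput sequencing adapter or adapter set described above to the target fragments of S1 to obtain a ligation product;
S3: amplifying the ligation product of S2 and purifying it to obtain the sequencing library of the sample to be tested.
The present invention further provides a method for detecting low-frequency gene mutations, characterized by comprising the following steps:
S1: preparing the high-throughput sequencing adapter or adapter set described above, the tag sequence being identical for a given sample;
S2: amplifying target fragments from the sample to be tested and digesting the primers;
S3: ligating the digestion product of S2 to the high-throughput sequencing adapter or adapter set of S1 to obtain a ligation product, amplifying the ligation product, and purifying it to obtain a sequencing library;
S4: sequencing the library of S3, correcting the sequencing data according to the tag sequence of the high-throughput sequencing adapter, and performing mutation analysis on the corrected sequencing data.
In some embodiments, the mutation analysis in step S4 is as follows: a given mutation is judged to be a true low-frequency mutation if it appears on both the sense strand and the antisense strand of the same read.
In some preferred embodiments, the sample to be tested is genomic DNA.
The present invention further provides the following applications of the high-throughput sequencing adapter, adapter set, composition, complex, or kit described above:
a. use in sequencing library construction or in products for preparing sequencing libraries;
b. use in high-throughput sequencing or in the preparation of high-throughput sequencing products;
c. use in detecting low-frequency gene mutations or in the preparation of products for detecting low-frequency gene mutations;
d. use in in vitro diagnosis or in the preparation of in vitro diagnostic products;
e. use in target gene enrichment or amplification enrichment.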
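Beneficial technical effects of the present invention:
1) Libraries built with the universal high-throughput sequencing adapter of the present invention, or with a kit containing it, can be run and produce sequencing data on all sequencer models of the mainstream Ion Torrent and Illumina platforms. Library construction kits and methods are therefore no longer restricted by the existing sequencing platform, which meets increasingly diverse clinical needs. For a given testing need, only one library construction kit has to be developed to produce sequencing data on all Ion Torrent and Illumina sequencer models, which also reduces development cost and time for companies using the adapter.
2) The universal high-throughput sequencing adapter provided by the present invention comprises a paired double-stranded complementary region and unpaired single-stranded free arms. The distal end of the paired double-stranded portion contains a tag sequence, and the non-free ends of the two free arms contain a tag sequence. All tag sequences for a given sample have the same base composition, so tag consistency can be used to judge whether cross-contamination occurred during library construction. After sequencing on certain Illumina sequencer models, data analysis can judge whether index hopping occurred by checking whether the tag sequences within the same read have the same base composition.
3) The universal high-throughput sequencing adapter provided by the present invention comprises a paired double-stranded complementary region and unpaired single-stranded free arms. The distal end of the paired double-stranded portion contains a tag sequence, and the sense and antisense strands of the same read should carry tag sequences with the same base composition. A given mutation must appear on both the sense strand and the antisense strand of the same read to be judged a true mutation. If a mutation appears only on the sense strand or only on the antisense strand, it can be attributed to an error in library construction or sequencing and is not counted as a mutation in downstream analysis, which avoids false positives.
4) The universal high-throughput sequencing adapter provided by the present invention relies on a single tag sequence, together with two conditions: a given mutation must be present on both the sense and antisense strands, and the base composition of the sample tag sequence should be identical to that of the tag sequence within the read. When the tag sequence of a sample differs from the tag sequence within a read, the read does not belong to that sample, that is, index hopping has occurred. The design of the present invention effectively overcomes the index hopping problem inherent to some sequencing platforms and allows the authenticity of low-frequency mutations to be judged.
5) The universal high-throughput sequencing adapter provided by the present invention comprises a paired double-stranded complementary region and unpaired single-stranded free arms. By way of example, the universal adapter comprises a PN adapter and an AN adapter; the double-stranded complementary region of the PN adapter consists of 40~58 bases; the double-stranded complementary region of the AN adapter consists of 40~58 bases; the 5’ free arm of the PN or AN adapter consists of 30~45 bases; the 3’ free arm of the PN or AN adapter consists of 35~56 bases; and the tag sequence consists of 6~12 bases, so that at least 114048 universal high-throughput sequencing adapters can be constructed.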
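The tag-consistency check described in effects 2) and 4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's analysis pipeline: the `Read` class, its field names, and the example reads are hypothetical, and exact string equality stands in for whatever matching rule a real pipeline would use (for instance, handling of strand orientation or sequencing errors in the tag).

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical record for one sequenced read."""
    index_tag: str   # tag read from the free-arm position, used to assign the sample
    duplex_tag: str  # tag read from the distal end of the double-stranded region


def tags_agree(read: Read) -> bool:
    """A read is kept only if both copies of the tag match exactly."""
    return read.index_tag == read.duplex_tag


# Hypothetical reads; the tag values are the sample tags listed in Example 4.
reads = [
    Read(index_tag="ATCACG", duplex_tag="ATCACG"),  # consistent
    Read(index_tag="ATCACG", duplex_tag="CGATGT"),  # mismatch: possible index hopping
    Read(index_tag="TTAGGC", duplex_tag="TTAGGC"),  # consistent
]

kept = [r for r in reads if tags_agree(r)]
flagged = [r for r in reads if not tags_agree(r)]
print(len(kept), len(flagged))
```

A mismatch cannot by itself distinguish index hopping from cross-contamination during library construction; the source treats both as reasons to exclude the read.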
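Brief description of the drawings
To explain the specific embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for that description are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art could derive other drawings from them without inventive effort.
Figure 1. Schematic structure of the universal high-throughput sequencing adapter of Example 1;
Figure 2. Agilent 2100 quality-control trace of the universal high-throughput library of Example 2, containing the universal high-throughput sequencing adapter (sample R19054232);
Figure 3. Agilent 2100 quality-control trace of the universal high-throughput library of Example 2, containing the universal high-throughput sequencing adapter (sample R20005128);
Figure 4. Technical workflow of Example 4.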
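Detailed description
The technical solutions of the present invention are described clearly and completely below with reference to the drawings. The embodiments described are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.
The following terms and definitions are provided only to aid understanding of the present invention. They should not be construed as having a scope narrower than that understood by a person skilled in the art.
Unless otherwise defined below, all technical and scientific terms used in the detailed description are intended to have the meaning commonly understood by a person skilled in the art. Although the following terms are believed to be well understood by a person skilled in the art, the definitions below are set out to better explain the present invention.
As used herein, the terms "including", "comprising", "having", "containing", and "involving" are inclusive or open-ended and do not exclude additional, unrecited elements or method steps. The term "consisting of" is regarded as a preferred embodiment of the term "comprising". If a group is defined below as comprising at least a certain number of embodiments, this should also be understood as disclosing a group that preferably consists only of those embodiments.
An indefinite or definite article used with a singular noun, such as "a", "an", or "the", includes the plural of that noun.
The terms "about" and "substantially" denote an interval of accuracy that a person skilled in the art understands to still ensure the technical effect of the feature in question. The terms generally indicate a deviation of ±10%, preferably ±5%, from the stated value.
In addition, the terms first, second, third, (a), (b), (c), and the like in the description and claims are used to distinguish similar elements and do not necessarily describe a sequential or chronological order. Terms so used are interchangeable under appropriate circumstances, and the embodiments described herein can be carried out in orders other than those described or illustrated.
The term "nucleic acid" or "nucleic acid sequence" refers to any molecule, preferably a polymeric molecule, comprising units of ribonucleic acid, deoxyribonucleic acid, or analogs thereof. The nucleic acid may be single-stranded or double-stranded. A single-stranded nucleic acid may be one strand of denatured double-stranded DNA, or it may be a single-stranded nucleic acid not derived from any double-stranded DNA.
The term "complementary" as used herein relates to hydrogen-bonded base pairing between the nucleotide bases G, A, T, C, and U, such that when two given polynucleotides or polynucleotide sequences anneal to each other, A pairs with T and G pairs with C in DNA, and G pairs with C and A pairs with U in RNA.
Other terms are defined in the description of the various aspects of the present invention.
A "sequencing adapter" as used herein is a double-stranded nucleotide sequence ligated to both ends of the target fragment to be sequenced. The double-stranded oligonucleotide may be fully complementary or only partly complementary, for example a "Y-shaped" adapter formed because the terminal portions of the sequences are not complementary; the sequencing adapter of the present invention is preferably such a "Y-shaped" adapter. The composition of the adapter's nucleotide sequence depends on the sequencing platform it is used with; it may include, for example, a library amplification primer sequence, a sample tag sequence, and a sequencing primer sequence. The length of the adapter also depends on the sequencing platform and can be chosen by a person skilled in the art. In some embodiments of the present invention, the adapter lengths may be as follows: the 3’ free arm is 35~56 bp, the 5’ free arm is 30~45 bp, and the double-stranded complementary region is 40~58 bp.
By way of example, Figure 1 shows a preferred universal high-throughput sequencing "Y-shaped" adapter of the present invention, comprising a PN adapter and an AN adapter, either of which may be located at either end of the target sequence. The PN adapter and the AN adapter each comprise a double-stranded complementary region, a single-stranded 5’ free arm, and a single-stranded 3’ free arm. In some embodiments, the double-stranded complementary regions of both the PN adapter and the AN adapter comprise a tag sequence consisting of 6~12 bases. In some preferred embodiments, the non-free end of the single-stranded 3’ free arm of the AN adapter further comprises bases identical to the tag sequence, and the non-free end of the single-stranded 3’ free arm of the PN adapter likewise comprises bases identical to the tag sequence. In some embodiments, to increase adapter stability and prevent hydrolysis, the 3’ ends of the 3’ free arms of the AN and PN adapters carry a phosphorothioate modification; preferably, the phosphodiester bonds between the last 3 bases are replaced by phosphorothioate bonds. By way of example, a ligase can be used to ligate the double-stranded complementary ends of the AN and PN adapters to the original gene fragments. The 5’ free arm of the AN adapter and the 3’ free arm of the PN adapter are unpaired single strands and therefore cannot be ligated to the original gene fragments, which safeguards the efficiency of ligation between the universal adapter and the DNA fragments.
"PN adapter" and "AN adapter" as used in this specification each refer to a partially double-stranded fragment (Y-shaped structure) comprising a double-stranded complementary region and single-stranded 3’/5’ free arms. During library construction each is ligated to one end of the target sequence, and their nucleotide sequences are preferably different.
A "free arm" as used herein is a region of the adapter sequence whose bases are not paired, such as the unpaired region of the PN or AN adapter of the present invention. Thus, even where it is not explicitly stated that the free arm sequences are not complementary, a person skilled in the art will understand that they are not, and that in some cases they can form a Y-shaped structure. In some embodiments, the free arm of the present invention comprises a library amplification primer region; in other embodiments, the 3’ free arm of the present invention further comprises a tag sequence.
A "double-stranded complementary region" as used herein is the region of the sequencing adapter in which the two strands are complementary. This region usually contains a sequencing primer sequence; the double-stranded complementary region of the present invention contains sequencing primer sequences for at least two sequencing platforms.
A "tag sequence" as used herein is a nucleotide sequence 6~12 bp long that is used to distinguish different library samples.
The "non-free end" as used herein is the end at which the double-stranded complementary region of the PN or AN adapter joins the single-stranded 3’ or 5’ free arm.
The "free end" as used herein is the 3’ end of the single-stranded 3’ free arm or the 5’ end of the single-stranded 5’ free arm of the PN or AN adapter.
A "high-throughput sequencing platform" as used herein is a sequencing platform such as Ion Torrent, Illumina, Roche454, or ABI. Although the present invention prefers Ion Torrent and Illumina, it is not limited to them. It will be clear to a person skilled in the art that, following the inventive concept, primer sequences can be selected for any two or more platforms and built into the adapter sequence to prepare the compatible high-throughput sequencing adapter of the present invention. Furthermore, because sequencers on the same type of platform share essentially the same sequencing principle, the method of the present invention applies to all models on a given platform: all models on the Ion Torrent platform, including but not limited to Ion GeneStudio™ S5 Plus, PGM, and Proton, and all models on the Illumina platform, including but not limited to Miseq DX, MiniSeq, and NextSeq, are suitable for the present invention.
A "low-frequency mutation" as used herein is a gene mutation with a frequency below 5%, including mutations below 5%, below 4%, below 3%, below 2%, and below 1%.
The present invention is further described by the drawings and the following examples, which only illustrate specific embodiments of the present invention and should not be construed as limiting its scope in any way. Unless otherwise stated, the experimental methods disclosed herein use techniques conventional in the art; the universal high-throughput sequencing adapters were synthesized by 生工生物工程(上海)股份有限公司, and the reagents and raw materials used in the examples are all commercially available.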
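Example 1 Design and preparation of high-throughput sequencing adapters
Two sets of universal high-throughput sequencing adapters, AN1/PN1 and AN2/PN2, were designed according to the structure shown in Figure 1. AN1/PN1 is a set of universal adapters with shorter sequences, and AN2/PN2 is a set with longer sequences.
The adapters were prepared as follows:
The sequencing adapter strands shown in sequences 1-4 below were prepared; sequences 1 and 2 were annealed to form the AN1 adapter, and sequences 3 and 4 were annealed to form the AN2 adapter.
Sequence 1:
5’-XXXXXXTAGCTGAGTCGGAGACACGCAGGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTA*T*G-3’; (3’ free arm strand)
Sequence 2:
5’-ACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCCTGCGTGTCTCCGACTCAGCTAXXXXXX-3’; (5’ free arm strand)
Sequence 3:
5’-XXXXXXTAGCTGAGTCGGAGACACGCAGGTGGATGGAGAATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGGTCCTCGCTCTT*T*G-3’; (3’ free arm strand)
Sequence 4:
5’-CAAAGAGCGAGGACACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATTCTCCATCCACCTGCGTGTCTCCGACTCAGCTAXXXXXX-3’; (5’ free arm strand)
The sequencing adapter strands shown in sequences 5-8 below were prepared; sequences 5 and 6 were annealed to form the PN1 adapter, and sequences 7 and 8 were annealed to form the PN2 adapter.
Sequence 5:
5’-ACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCCTCTCTATGGGCAGTCGGTGATXXXXXX-3’
Sequence 6:
5’-XXXXXXATCACCGACTGCCCATAGAGAGGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTA*T*G-3’
Sequence 7:
5’-CAAAGAGCGAGGACACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCCGCTTTCGCCCTCTCTATGGGCAGTCGGTGATXXXXXX-3’
Sequence 8:
5’-XXXXXXATCACCGACTGCCCATAGAGAGGGCGAAAGCGGAGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGGTCCTCGCTCTT*T*G-3’
The end of the double-stranded complementary region of the AN adapter contains a tag sequence of 6~12 random bases "X". The 3’ free arm of the AN adapter also contains a tag sequence of 6~12 random bases "X", located at the non-free end of that arm. The end of the double-stranded complementary region of the PN adapter likewise contains a tag sequence of 6~12 random bases "X", and the 3’ free arm of the PN adapter also contains a tag sequence of 6~12 random bases "X", located at the non-free end of that arm. An asterisk (*) marks a phosphorothioate modification site: to increase adapter stability and prevent hydrolysis, the phosphodiester bonds between the last 3 bases at the 3’ end of the 3’ free arm of the AN and PN adapters are replaced by phosphorothioate bonds.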
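The asterisk notation can be checked mechanically when preparing an oligo order. The sketch below is an illustration only, assuming the convention used in the sequences above, where `*` sits between two bases and marks the bond joining them; the function name `parse_modified_oligo` is hypothetical.

```python
def parse_modified_oligo(annotated: str) -> tuple[str, list[int]]:
    """Split an annotated oligo into its base sequence and bond positions.

    Each returned position p means the bond between base p and base p + 1
    (1-based) is marked with '*', i.e. is a phosphorothioate bond.
    """
    bases = []
    positions = []
    for char in annotated:
        if char == "*":
            positions.append(len(bases))  # number of bases seen so far
        else:
            bases.append(char)
    return "".join(bases), positions


# 3' tail of sequence 1 as written above.
sequence, bonds = parse_modified_oligo("ATCTCGTA*T*G")
n = len(sequence)
# Both marked bonds should lie between the last three bases.
last_three_protected = bonds == [n - 2, n - 1]
print(sequence, bonds, last_three_protected)
```

Two modified bonds are the most that three terminal bases can share, which matches the two asterisks in each 3’ free arm strand.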
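Example 2 Preparation of universal high-throughput sequencing libraries for aorta-related genes
The universal high-throughput sequencing adapters AN1/PN1 and AN2/PN2 prepared in Example 1 were each used in the experiment.
The number of universal adapter sets corresponds to the number of samples to be tested. For example, if there are 10 samples, 10 universal adapter sets are prepared, each containing a PN1 adapter and an AN1 adapter. Within a set, the PN1 and AN1 adapters carry tag sequences with the same base sequence; different sets carry tag sequences with different base sequences.
The target fragments are ligated to the adapter sets to obtain ligation products, with gene fragments from the same sample ligated to the same universal adapter set; the ligation products are amplified to obtain amplification products, which are purified to obtain the universal high-throughput sequencing library of the sample.
The specific steps are as follows:
1. DNA extraction and quality control
(1) Genomic DNA extraction: genomic DNA was extracted from peripheral blood samples 1 and 2 (corresponding to R190542432 and R20005128, respectively). Sample DNA was extracted according to the instructions for the nucleic acid extraction reagent (DR181003-48) produced by 北京安智因生物技术有限公司.
(2) DNA purity was measured with NanodropOne, and double-stranded DNA concentration was measured with Qubit 4.0. The DNA was diluted to 5 ng/µL for later use.
2. Library construction
The target regions are the full coding regions and alternative splicing regions (exons extended 20 bp into the introns) of the ACTA2, COL3A1, FBN1, MYH11, MYLK, SMAD3, TGFBR1, and TGFBR2 genes. The multiplex PCR primer pools for the target regions were designed with Ion Ampliseq Designer and synthesized and supplied by Thermo Fisher.
(1) Target fragment amplification was carried out as follows:
Component    Reaction volume
Multiplex PCR master mix    2 µL
Primer pool 1/2    5 µL
Genomic DNA (5 ng/µL)    2 µL
Nuclease-free water    1 µL
Total volume    10 µL
Reaction conditions
Figure PCTCN2020092418-appb-000001
Figure PCTCN2020092418-appb-000002
(2) The digestion reaction was carried out as follows:
The amplification products of primer pool 1 and primer pool 2 from the same sample were combined (20 µL), and 2 µL of digestion reaction premix was added. Reaction conditions:
Reaction temperature    Reaction time
50℃    10 min
55℃    10 min
60℃    20 min
10℃    Hold
(3) Samples 1 and 2 were ligated with ligase to the universal high-throughput sequencing adapters AN1/PN1 and AN2/PN2, respectively. The ligase was Fast T4 DNA Ligase and the ligation buffer was 5×Fast Ligation Buffer, both produced by 上海翊圣生物科技有限公司. The procedure was as follows:
The ligation reaction was set up according to the table below:
Component    Reaction volume
Ligase    2 µL
Ligation buffer    4 µL
Universal high-throughput sequencing adapter, PN adapter (10 µM)    1 µL
Universal high-throughput sequencing adapter, AN adapter (10 µM)    1 µL
Digested PCR product    22 µL
Total volume    30 µL
Reaction conditions
Reaction temperature    Reaction time
22℃    30 min
68℃    5 min
72℃    5 min
10℃    Hold
(4) Purification and amplification were carried out as follows:
The ligation product was purified with Ampure magnetic beads, and the purified product was amplified by PCR.
Reaction system
Component    Reaction volume
PCR MIX    25 µL
Forward primer (5 µM)    5 µL
Reverse primer (5 µM)    5 µL
Purified ligation product    20 µL
Total volume    50 µL
Reaction conditions
Figure PCTCN2020092418-appb-000003
(5) Library purification and quantification were carried out as follows:
The amplified library was purified with Ampure magnetic beads, and the purified library was quality-checked and quantified with the Agilent 2100 and QUBIT 4.0. The 2100 quality-control traces are shown in Figures 2 and 3: the main peak of the library fragment length lies near 400 bp and is a single sharp peak, indicating that the universal high-throughput sequencing adapters were ligated to both ends of the original gene fragments. The library concentration is calculated from the dilution factor; a library above 1 ng/µL can proceed to the subsequent experimental steps, and library construction has failed if the concentration is below 1 ng/µL.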
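Example 3 Sequencing analysis on the Ion Torrent and Illumina platforms
In this example, the Ion Torrent platform and the Illumina platform were each used to sequence the above high-throughput libraries for verification, as follows:
1. Sequencing on the Ion GeneStudio™ S5 Plus sequencer (Ion Torrent platform) was carried out as follows:
The purified and quality-checked library was diluted and processed with the Ion 520™ & Ion 530™ Kit–OT according to the kit operating procedure. After template preparation on the IonTouch 2 instrument, sequencing and data analysis were performed on the Ion GeneStudio™ S5 Plus sequencer.
2. Sequencing on the Miseq DX sequencer (Illumina platform) was carried out as follows:
The purified and quality-checked library was diluted and processed with the Miseq DX Reagent Kit v3 according to the kit operating procedure, and sequencing and data analysis were performed on the Miseq DX sequencer.
The sequencing results were analyzed as follows:
For the Ion GeneStudio™ S5 Plus platform:
1. The concentration of the universal high-throughput sequencing library was analyzed; the library concentration and fragment length distribution met the requirements for sequencing;
2. The output of the Ion GeneStudio™ S5 Plus platform was analyzed, mainly the number of ≥Q20 bases, the number of reads, the mean read length, On Target, and Uniformity. See the table below:
Figure PCTCN2020092418-appb-000004
3. The mean read length of both samples is ≥200 bp, indicating that all target fragments were read through, that is, every base between the start and the end of each target fragment could be identified. Mean depth (average sequencing depth) is ≥500× for both, indicating that every target fragment was sequenced more than 500 times. On Target (on-target rate) is ≥95% for both, indicating that 95% of the sequenced bases align within the target region. Uniformity is ≥90% for both, indicating that amplification efficiency and universal-adapter ligation efficiency are similar for every read in the target region. These parameters all indicate that both ends of the target fragments in the two samples were successfully ligated to the universal high-throughput sequencing adapter and that sequencing succeeded, so a library ligated to the universal adapter can be sequenced on the Ion GeneStudio™ S5 Plus sequencer.
For the Miseq DX platform:
1. The output of the Miseq DX platform was analyzed, mainly the data yield, the number of reads, and the Q30 percentage. See the table below.
Figure PCTCN2020092418-appb-000005
2. The data yield of both samples is ≥0.5 G, the read count is ≥3 M, and the Q30 proportion is ≥75%, indicating that both samples were sequenced successfully, that both ends of the target fragments were successfully ligated to the universal high-throughput sequencing adapter, and that a library ligated to the universal adapter can be sequenced on the Miseq DX sequencer.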
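The acceptance thresholds stated above for the two platforms can be written as a small lookup and check. This is a minimal sketch: the threshold values come from the text, while the dictionary keys, the function name `passes_qc`, and the example metric values are hypothetical (the measured values of the two samples appear only in figures that are not reproduced here).

```python
# Minimum acceptable values taken from the text of this example.
ION_S5_THRESHOLDS = {
    "mean_read_length_bp": 200,
    "mean_depth": 500,
    "on_target_pct": 95,
    "uniformity_pct": 90,
}
MISEQ_THRESHOLDS = {
    "yield_gb": 0.5,
    "reads_millions": 3,
    "q30_pct": 75,
}


def passes_qc(metrics: dict, thresholds: dict) -> bool:
    """Return True if every metric meets or exceeds its minimum."""
    return all(metrics[name] >= minimum for name, minimum in thresholds.items())


# Hypothetical run metrics, NOT the values measured in the patent.
example_ion_run = {
    "mean_read_length_bp": 215,
    "mean_depth": 812,
    "on_target_pct": 97.3,
    "uniformity_pct": 93.1,
}
example_miseq_run = {"yield_gb": 0.42, "reads_millions": 3.4, "q30_pct": 81.0}

ion_ok = passes_qc(example_ion_run, ION_S5_THRESHOLDS)
miseq_ok = passes_qc(example_miseq_run, MISEQ_THRESHOLDS)
print(ion_ok, miseq_ok)
```

In this made-up case the MiSeq run fails only because its yield is below 0.5 G, which shows that every threshold must be met for a run to count as successful.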
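In summary, libraries built with the sequencing adapters prepared according to the present invention meet the sequencing requirements of the Ion GeneStudio™ S5 Plus platform and the Miseq DX platform at the same time, that is, of both mainstream sequencing platforms, Ion Torrent and Illumina. The sequencing adapter of the present invention therefore has the properties of a universal library-construction adapter.
In addition, because all sequencer models on the Ion Torrent platform share the same sequencing principle and process, a library suitable for the Ion GeneStudio™ S5 Plus sequencer can be used on other Ion Torrent sequencers, such as PGM and Proton. Likewise, all sequencer models on the Illumina platform share the same sequencing principle and process, so a library suitable for the Miseq DX sequencer can be used on other Illumina sequencers, such as MiniSeq and NextSeq. It follows that a library ligated to the universal sequencing adapter of the present invention can be used on all sequencer models of the Ion Torrent and Illumina platforms.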
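Example 4 Judging the authenticity of low-frequency mutations
This example further verifies the use of the sequencing adapter of the present invention in low-frequency detection, and specifically provides a method for judging the authenticity of low-frequency mutations that can correct sequencing errors introduced by index hopping. The technical workflow is shown in Figure 4 and comprises the following steps:
1. Sample dilution
(1) The sample is a commercial tumor SNV 5% gDNA standard (GW-OGTM005), serially diluted with commercial human genomic DNA (G304A) to mutation frequencies of 2.5%, 1.25%, and 0.5%; the four frequencies (5%, 2.5%, 1.25%, 0.5%) are named sample 1, sample 2, sample 3, and sample 4.
(2) Double-stranded DNA concentration was measured with Qubit 4.0. The DNA was diluted to 5 ng/µL for later use.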
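The arithmetic behind the dilution series can be sketched as follows. This is an illustration under explicit assumptions that the source does not state: the 5% standard and the wild-type genomic DNA are at the same concentration, the wild-type DNA carries none of the mutations, and each dilution is made directly from the 5% stock rather than stepwise. The function name and the 20 µL working volume are hypothetical.

```python
def mix_volumes(target_pct: float, stock_pct: float = 5.0,
                total_ul: float = 20.0) -> tuple[float, float]:
    """Return (standard_ul, wild_type_ul) giving the target mutation frequency.

    Assumes both DNAs are at equal concentration and the wild-type DNA
    has a mutation frequency of 0%.
    """
    if not 0 < target_pct <= stock_pct:
        raise ValueError("target frequency must be in (0, stock frequency]")
    standard_ul = total_ul * target_pct / stock_pct
    return standard_ul, total_ul - standard_ul


# Samples 2-4 of this example (sample 1 is the undiluted 5% standard).
for name, pct in [("sample 2", 2.5), ("sample 3", 1.25), ("sample 4", 0.5)]:
    standard, wild_type = mix_volumes(pct)
    print(f"{name}: {standard} uL standard + {wild_type} uL wild-type DNA")
```

For example, a 0.5% sample corresponds to one part standard in ten parts total under these assumptions.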
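2. Library construction
The target regions are the designated hotspot regions of the EGFR (L858R/T790M/ΔE746_A750), PIK3CA (E545K), KRAS (G12D/G13D/A146T), and NRAS (Q61K) genes. The multiplex PCR primer pool for the target regions is Thermo Fisher's Ampliseq colon&lung panel. Each sample was run in 3 replicates.
(1) Target fragment amplification was carried out as follows:
Component    Reaction volume
Multiplex PCR master mix    4 µL
Primer pool    10 µL
Genomic DNA (5 ng/µL)    2 µL
Nuclease-free water    4 µL
Total volume    20 µL
Reaction conditions
Figure PCTCN2020092418-appb-000006
(2) The digestion reaction was carried out as follows:
2 µL of digestion reaction premix was added to the above PCR product. Reaction conditions:
Reaction temperature    Reaction time
50℃    10 min
55℃    10 min
60℃    20 min
10℃    Hold
(3) Ligation of the universal high-throughput sequencing adapters:
I. Preparation of the universal adapter sets: the adapter sets are the PN1/AN1 and PN2/AN2 sets described in Example 1. Taking the PN2/AN2 test data as an example, 4 adapter sets were prepared with the sample tag sequences ATCACG, CGATGT, TTAGGC, and TGACCA. For the preparation method, see Example 1.
II. Ligation of the universal adapters was carried out as follows:
Reaction system
Component    Reaction volume
Ligase    2 µL
Ligation buffer    4 µL
Universal high-throughput sequencing adapter, PN adapter (10 µM)    1 µL
Universal high-throughput sequencing adapter, AN adapter (10 µM)    1 µL
Digested PCR product    22 µL
Total volume    30 µL
Reaction conditions
Reaction temperature    Reaction time
22℃    30 min
68℃    5 min
72℃    5 min
10℃    Hold
(4) Purification and amplification were carried out as follows:
The ligation product was purified with Ampure magnetic beads, and the purified product was amplified by PCR.
Reaction system
Component    Reaction volume
PCR MIX    25 µL
Forward primer (5 µM)    5 µL
Reverse primer (5 µM)    5 µL
Purified ligation product    20 µL
Total volume    50 µL
Reaction conditions
Figure PCTCN2020092418-appb-000007
Figure PCTCN2020092418-appb-000008
(5) Library purification and quantification were carried out as follows:
The amplified library was purified with Ampure magnetic beads, and the purified library was quantified with QUBIT 4.0.
The library concentration is calculated from the dilution factor; a library above 1 ng/µL can proceed to the subsequent experimental steps, and library construction has failed if the concentration is below 1 ng/µL.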
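The concentration check described above amounts to a one-line calculation. The sketch below assumes the usual convention that the instrument reading is multiplied by the dilution factor; the function names and example readings are hypothetical, and because the text does not say how a library at exactly 1 ng/µL is treated, the sketch requires a value strictly above the threshold.

```python
def library_concentration(measured_ng_per_ul: float, dilution_factor: float) -> float:
    """Back-calculate the undiluted library concentration in ng/uL."""
    return measured_ng_per_ul * dilution_factor


def library_passes(concentration_ng_per_ul: float, minimum: float = 1.0) -> bool:
    """Libraries above the minimum proceed; lower values mean construction failed."""
    return concentration_ng_per_ul > minimum


# Hypothetical readings, not measurements from the patent.
good = library_concentration(0.25, 10)   # 2.5 ng/uL
poor = library_concentration(0.125, 4)   # 0.5 ng/uL
print(good, library_passes(good), poor, library_passes(poor))
```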
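3. Sequencing on the Miseq DX sequencer (Illumina platform) was carried out as follows:
The purified and quality-checked library was diluted and processed with the Miseq DX Reagent Kit v3 according to the kit operating procedure, and sequencing and data analysis were performed on the Miseq DX sequencer.
4. Sequencing data analysis mainly comprises the following:
(1) The tag sequence is used to identify data from the same sample source: sequencing data carrying the same tag sequence are assigned to the same sample;
(2) For sequencing data assigned to the same sample source, the sequencing tag sequence is compared with the tag sequence at the double-stranded end of the AN adapter of the universal high-throughput sequencing adapter; agreement between the two base sequences is used to identify sequencing errors introduced by sample cross-contamination and index hopping.
(3) For sequencing data assigned to the same sample source, it is further required that the sense and antisense strands at a mutation site carry tag sequences with the same base composition, that is, the AN end and the PN end should carry tag sequences with the same base composition.
The results are shown in the tables below: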
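Sample 1 (expected mutation frequency 5%)
Gene    Mutation site    Mutation frequency    Forward reads    Reverse reads
EGFR    L858R    5%    198    201
EGFR    T790M    5.5%    213    187
EGFR    ΔE746_A750    4.7%    217    168
PIK3CA    E545K    6.8%    204    196
KRAS    G12D    5.3%    199    200
KRAS    G13D    4.5%    184    216
KRAS    A146T    7%    214    186
NRAS    Q61K    4.3%    204    193
Sample 2 (expected mutation frequency 2.5%)
Gene    Mutation site    Mutation frequency    Forward reads    Reverse reads
EGFR    L858R    2.8%    207    191
EGFR    T790M    1.3%    212    187
EGFR    ΔE746_A750    2.1%    244    139
PIK3CA    E545K    3.8%    177    223
KRAS    G12D    2.5%    188    212
KRAS    G13D    2.2%    197    203
KRAS    A146T    4%    211    189
NRAS    Q61K    1.3%    199    197
Sample 3 (expected mutation frequency 1.25%)
Gene    Mutation site    Mutation frequency    Forward reads    Reverse reads
EGFR    L858R    1.5%    216    180
EGFR    T790M    2.5%    245    152
EGFR    ΔE746_A750    1%    249    143
PIK3CA    E545K    2%    225    175
KRAS    G12D    1.3%    199    199
KRAS    G13D    1.8%    192    208
KRAS    A146T    1%    203    197
NRAS    Q61K    1.8%    194    204
Sample 4 (expected mutation frequency 0.5%)
Gene    Mutation site    Mutation frequency    Forward reads    Reverse reads
EGFR    L858R    1.5%    216    181
EGFR    T790M    0.5%    245    155
EGFR    ΔE746_A750    0.3%    227    154
PIK3CA    E545K    1%    218    182
KRAS    G12D    0.5%    182    218
KRAS    G13D    1.5%    200    200
KRAS    A146T    0.8%    205    195
NRAS    Q61K    0.5%    217    182
A universal high-throughput sequencing adapter is used during library construction, and the sequencing data are analyzed after sequencing. First, the tag sequence is used to identify data from the same sample, splitting the data into sample 1, sample 2, sample 3, and sample 4, one per mutation frequency. Next, the tag sequence in the double-stranded part of the adapter in each read is checked against the sample tag sequence to rule out index hopping. Then the authenticity of each mutation site is assessed by whether forward and reverse reads carrying the same tag sequence show the same mutation. Mutations supported only by forward reads or only by reverse reads, and mutations in reads whose tag sequence disagrees with the sample tag, are excluded, so that low-frequency mutations are identified correctly.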
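The filtering logic in the paragraph above can be sketched in Python. This is a simplified illustration, not the analysis software used in the example: the function name, the tuple layout of an observation, and the example observations are hypothetical, and a real pipeline would work from aligned reads and apply read-count thresholds that the text does not specify.

```python
from collections import defaultdict


def call_true_variants(observations, sample_tag):
    """Keep variants seen on both strands in reads whose tag matches the sample.

    observations: iterable of (variant, strand, duplex_tag), where strand is
    "+" for a forward read and "-" for a reverse read.
    """
    strands_seen = defaultdict(set)
    for variant, strand, duplex_tag in observations:
        if duplex_tag != sample_tag:
            continue  # tag mismatch: excluded as index hopping or contamination
        strands_seen[variant].add(strand)
    # A variant is accepted only when both forward and reverse reads support it.
    return sorted(v for v, strands in strands_seen.items() if {"+", "-"} <= strands)


# Hypothetical observations for a sample tagged ATCACG.
observations = [
    ("EGFR L858R", "+", "ATCACG"),
    ("EGFR L858R", "-", "ATCACG"),
    ("KRAS G12D", "+", "ATCACG"),   # forward strand only
    ("NRAS Q61K", "+", "ATCACG"),
    ("NRAS Q61K", "-", "CGATGT"),   # reverse read carries another sample's tag
]

accepted = call_true_variants(observations, "ATCACG")
print(accepted)
```

Here only EGFR L858R is accepted: KRAS G12D lacks reverse-strand support, and the reverse-strand NRAS Q61K read is discarded before strand support is counted.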
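5. The results show that library construction and sequencing with the universal high-throughput sequencing adapter of the present invention can effectively detect low-frequency mutations with frequencies below 5%; low-frequency mutations at a frequency of 0.5% can also be effectively detected, which further lowers the detection limit for low-frequency mutations.
The above description of specific embodiments does not limit the present application. A person skilled in the art may make various changes or modifications based on the present application, and these fall within the scope of the appended claims as long as they do not depart from the spirit of the present application.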

Claims (24)

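  1. A Y-shaped high-throughput sequencing adapter, characterized in that the sequencing adapter comprises a first single strand and a second single strand;
    the first single strand and the second single strand each comprise:
    1) a free arm, and
    2) a double-stranded complementary region, wherein
    the free arm comprises a library amplification primer binding region and a carrier binding region;
    the double-stranded complementary region comprises sequencing primer binding regions for two or more sequencing platforms.
  2. The high-throughput sequencing adapter of claim 1, characterized in that the free arm sequences of the first and second single strands are not complementary, and the first and second single strands can be annealed to form a Y-shaped duplex.
  3. The high-throughput sequencing adapter of any one of claims 1-2, characterized in that the double-stranded complementary region comprises a tag sequence located at the end of the double-stranded complementary region distal to the free arm.
  4. The high-throughput sequencing adapter of any one of claims 1-3, characterized in that the sequencing platforms include, but are not limited to, the Illumina, Ion Torrent, PacBio, Roche, Helicos, and ABI platforms; preferably, the sequencing platforms are the Ion Torrent and Illumina platforms.
  5. The high-throughput sequencing adapter of any one of claims 1-4, characterized in that the free arm of the second single strand further comprises a tag sequence.
  6. The high-throughput sequencing adapter of claim 5, characterized in that the tag sequence in the free arm is identical to the tag sequence in the double-stranded complementary region; preferably, the tag sequence in the free arm lies near the end adjoining the double-stranded complementary region.
  7. The high-throughput sequencing adapter of any one of claims 1-6, characterized in that the double-stranded complementary regions of the first and second single strands are 40-58 bp long; the free arm of the first single strand is 30-45 bp long and the free arm of the second single strand is 35-56 bp long; the tag sequence consists of 6~12 bp of random bases.
  8. The high-throughput sequencing adapter of any one of claims 1-7, characterized in that the 3’ end of the free arm of the first or second single strand carries a stabilizing modification; preferably, a phosphorothioate modification; more preferably, the phosphodiester bonds between the last 3 bases at the 3’ end are replaced by phosphorothioate bonds.
  9. The high-throughput sequencing adapter of any one of claims 1-8, characterized in that the first single strand has the following sequences:
    Free arm sequence:
    5’-NNNNNNNNNNNNNNNACCGAGATCTACACTCTTTCCCTACACGAC-3’;
    Double-stranded complementary region sequence:
    5’-GCTCTTCCGATNNNNNNNNNNCCTGCGTGTCTCCGACTCAGCTAXXXXXX-3’
    "XXXXXX" denotes a tag sequence consisting of 6~12 random bases;
    "N" denotes any of the bases A, T, C, and G, or NA (no base).
  10. The high-throughput sequencing adapter of claim 9, characterized in that the second single strand has the following sequences:
    Free arm sequence:
    5’-ACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGNNNNNNNNNNNNNNN-3’;
    Double-stranded complementary region sequence:
    5’-XXXXXXTAGCTGAGTCGGAGACACGCAGGNNNNNNNNNNATCGGAAGAGC-3’;
    "XXXXXX" denotes a tag sequence consisting of 6~12 random bases;
    "N" denotes any of the bases A, T, C, and G, or NA (no base).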
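  11. A high-throughput sequencing adapter set, characterized in that the adapter set comprises the high-throughput sequencing adapter of any one of claims 1-10.
  12. The high-throughput sequencing adapter set of claim 11, characterized in that the adapter set further comprises the following Y-shaped high-throughput sequencing adapter:
    the Y-shaped high-throughput sequencing adapter comprises a third and a fourth single strand, whose sequences differ from those of the high-throughput sequencing adapter of claims 1-10 only in the double-stranded complementary region;
    wherein the double-stranded complementary region of the third single strand has the following sequence:
    5’-GCTCTTCCGATNNNNNNNNNNNNCCTCTCTATGGGCAGTCGGTGATXXXXXX-3’;
    the double-stranded complementary region of the fourth single strand is complementary to that of the third single strand;
    "XXXXXX" denotes a tag sequence consisting of 6~12 random bases;
    "N" denotes any of the bases A, T, C, and G, or NA (no base).
  13. The high-throughput sequencing adapter set of claim 12, characterized in that the single strands of the Y-shaped high-throughput sequencing adapters have the following sequences:
    First single strand:
    5’-ACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCCTGCGTGTCTCCGACTCAGCTAXXXXXX-3’;
    Second single strand:
    5’-XXXXXXTAGCTGAGTCGGAGACACGCAGGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATG-3’;
    Third single strand:
    5’-ACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCCTCTCTATGGGCAGTCGGTGATXXXXXX-3’
    Fourth single strand:
    5’-XXXXXXATCACCGACTGCCCATAGAGAGGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATG-3’.
  14. The high-throughput sequencing adapter set of claim 12, characterized in that the single strands of the Y-shaped high-throughput sequencing adapters have the following sequences:
    First single strand:
    5’-CAAAGAGCGAGGACACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATTCTCCATCCACCTGCGTGTCTCCGACTCAGCTAXXXXXX-3’;
    Second single strand:
    5’-XXXXXXTAGCTGAGTCGGAGACACGCAGGTGGATGGAGAATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGGTCCTCGCTCTTTG-3’;
    Third single strand:
    5’-CAAAGAGCGAGGACACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTCCGCTTTCGCCCTCTCTATGGGCAGTCGGTGATXXXXXX-3’
    Fourth single strand:
    5’-XXXXXXATCACCGACTGCCCATAGAGAGGGCGAAAGCGGAGATCGGAAGAGCACGTCTGAACTCCAGTCACXXXXXXATCTCGTATGGTCCTCGCTCTTTG-3’.
  15. A composition, characterized in that the composition comprises the high-throughput sequencing adapter of any one of claims 1-10 or the high-throughput sequencing adapter set of any one of claims 11-14.
  16. A complex, characterized in that the complex is ligated to the high-throughput sequencing adapter of any one of claims 1-10 or the high-throughput sequencing adapter set of any one of claims 11-14.
  17. A kit, characterized in that the kit comprises the high-throughput sequencing adapter of any one of claims 1-10 or the high-throughput sequencing adapter set of any one of claims 11-14.
  18. The kit of claim 17, characterized in that the kit is a high-throughput sequencing library construction kit or a gene sequence enrichment kit.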
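  19. A method for preparing the high-throughput sequencing adapter of any one of claims 1-10, characterized by comprising the following steps:
    S1: synthesizing the first and second single-strand sequences separately;
    S2: specifically annealing the two single-strand sequences of S1 to obtain the high-throughput sequencing adapter.
  20. A method for constructing a sequencing library, characterized by:
    S1: preparing target fragments from the sample to be tested;
    S2: ligating the high-throughput sequencing adapter of any one of claims 1-10 or the high-throughput sequencing adapter set of any one of claims 11-14 to the target fragments of S1 to obtain a ligation product;
    S3: amplifying the ligation product of S2 and purifying it to obtain the sequencing library of the sample to be tested.
  21. A method for detecting low-frequency gene mutations, characterized by comprising the following steps:
    S1: preparing the high-throughput sequencing adapter of any one of claims 1-10 or the high-throughput sequencing adapter set of any one of claims 11-14, the tag sequence being identical for a given sample;
    S2: amplifying target fragments from the sample to be tested and digesting the primers;
    S3: ligating the digestion product of S2 to the high-throughput sequencing adapter or adapter set of S1 to obtain a ligation product, amplifying the ligation product, and purifying it to obtain a sequencing library;
    S4: sequencing the library of S3, correcting the sequencing data according to the tag sequence of the high-throughput sequencing adapter, and performing mutation analysis on the corrected sequencing data.
  22. The method for detecting low-frequency gene mutations of claim 21, characterized in that the mutation analysis in step S4 is as follows: a given mutation is judged to be a true low-frequency mutation if it appears on both the sense strand and the antisense strand of the same read.
  23. The method for detecting low-frequency gene mutations of any one of claims 21-22, characterized in that the sample to be tested is genomic DNA.
  24. The following uses of the high-throughput sequencing adapter of any one of claims 1-10, the high-throughput sequencing adapter set of any one of claims 11-14, the composition of claim 15, the complex of claim 16, or the kit of claims 17-18:
    a. use in sequencing library construction or in products for preparing sequencing libraries;
    b. use in high-throughput sequencing or in the preparation of high-throughput sequencing products;
    c. use in detecting low-frequency gene mutations or in the preparation of products for detecting low-frequency gene mutations;
    d. use in in vitro diagnosis or in the preparation of in vitro diagnostic products;
    e. use in target gene enrichment or amplification enrichment.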
PCT/CN2020/092418 2020-05-14 2020-05-26 Universal high-throughput sequencing adapter and application thereof WO2021227129A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010407833.5A CN111471754B (zh) 2020-05-14 2020-05-14 Universal high-throughput sequencing adapter and application thereof
CN202010407833.5 2020-05-14

Publications (1)

Publication Number Publication Date
WO2021227129A1 true WO2021227129A1 (zh) 2021-11-18

Family

ID=71759877

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/092418 WO2021227129A1 (zh) 2020-05-14 2020-05-26 Universal high-throughput sequencing adapter and application thereof

Country Status (2)

Country Link
CN (1) CN111471754B (zh)
WO (1) WO2021227129A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
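CN115831233A (zh) * 2023-02-07 2023-03-21 杭州联川基因诊断技术有限公司 Method, device, and medium for mTag-based preprocessing of targeted sequencing data
WO2023092872A1 (zh) * 2021-11-26 2023-06-01 广州达安基因股份有限公司 Method for high-throughput sequencing based on internal references with known tags
WO2023092601A1 (zh) * 2021-11-29 2023-06-01 京东方科技集团股份有限公司 UMI molecular tag and use thereof, adapter, adapter ligation reagent and kit, and library construction method
CN117286231A (zh) * 2023-09-28 2023-12-26 广州精检生物技术有限公司 Detection method based on the Ion Torrent sequencing platform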

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
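CN112301432B (zh) * 2020-12-29 2021-04-06 北京贝瑞和康生物技术有限公司 Method and kit for constructing a whole-genome high-throughput sequencing library
CN115029425B (zh) * 2022-05-26 2023-04-18 北京爱普益生物科技有限公司 High-throughput sequencing STR detection kit compatible with multiple sequencing platforms and use thereof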

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
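CN107858414A (zh) * 2017-10-18 2018-03-30 广州漫瑞生物信息技术有限公司 High-throughput sequencing adapter, preparation method thereof, and use thereof in ultra-low-frequency mutation detection
CN111118001A (zh) * 2019-12-31 2020-05-08 苏州贝康医疗器械有限公司 Universal adapter for multiple sequencing platforms, and library construction method and kit suitable for multiple sequencing platforms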

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201615486D0 (en) * 2016-09-13 2016-10-26 Inivata Ltd Methods for labelling nucleic acids
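CN108893466B (zh) * 2018-06-04 2021-04-13 上海奥根诊断技术有限公司 Sequencing adapter, sequencing adapter set, and method for detecting ultra-low-frequency mutations
CN110827920B (zh) * 2018-08-14 2022-11-22 武汉华大医学检验所有限公司 Sequencing data analysis method and device, and high-throughput sequencing method
CN110257480A (zh) * 2019-07-04 2019-09-20 北京京诺玛特科技有限公司 Nucleic acid sequencing adapter and method for constructing a sequencing library with it
CN110734908B (zh) * 2019-11-15 2021-06-08 福州福瑞医学检验实验室有限公司 Method for constructing a high-throughput sequencing library and kit for library construction
CN111073961A (zh) * 2019-12-20 2020-04-28 苏州赛美科基因科技有限公司 High-throughput method for detecting rare gene mutations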

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
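CN107858414A (zh) * 2017-10-18 2018-03-30 广州漫瑞生物信息技术有限公司 High-throughput sequencing adapter, preparation method thereof, and use thereof in ultra-low-frequency mutation detection
CN111118001A (zh) * 2019-12-31 2020-05-08 苏州贝康医疗器械有限公司 Universal adapter for multiple sequencing platforms, and library construction method and kit suitable for multiple sequencing platforms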

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
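WO2023092872A1 (zh) * 2021-11-26 2023-06-01 广州达安基因股份有限公司 Method for high-throughput sequencing based on internal references with known tags
WO2023092601A1 (zh) * 2021-11-29 2023-06-01 京东方科技集团股份有限公司 UMI molecular tag and use thereof, adapter, adapter ligation reagent and kit, and library construction method
CN115831233A (zh) * 2023-02-07 2023-03-21 杭州联川基因诊断技术有限公司 Method, device, and medium for mTag-based preprocessing of targeted sequencing data
CN117286231A (zh) * 2023-09-28 2023-12-26 广州精检生物技术有限公司 Detection method based on the Ion Torrent sequencing platform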

Also Published As

Publication number Publication date
CN111471754A (zh) 2020-07-31
CN111471754B (zh) 2021-01-29

Similar Documents

Publication Publication Date Title
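WO2021227129A1 (zh) Universal high-throughput sequencing adapter and application thereof
CN108893466B (zh) Sequencing adapter, sequencing adapter set, and method for detecting ultra-low-frequency mutations
CN107190329B (zh) DNA-based library construction and detection method for quantitative sequencing of fusion genes, and use thereof
US11286524B2 (en) Multi-position double-tag connector set for detecting gene mutation and preparation method therefor and application thereof
WO2019114146A1 (zh) Method for enriching gene target regions and library construction kit
CN105442054B (zh) Method for multi-target-site amplification library construction from plasma cell-free DNA
CN109971827B (zh) Library construction method and library construction kit for plasma DNA
CN109844137B (zh) Barcoded circular library construction for identifying chimeric products
CN110036117A (zh) Method for increasing the throughput of single-molecule sequencing by concatenating short DNA fragments
WO2019144582A1 (zh) Probe and method for high-throughput-sequencing targeted capture of target regions for detecting gene mutations and known and unknown gene fusion types
CN110643680B (zh) Adapter suitable for ultra-low-input DNA sequencing and use thereof
CN106939344B (zh) Adapter for next-generation sequencing
CN113502287A (zh) Molecular tag adapter and method for constructing a sequencing library
CN113337501B (zh) Hairpin adapter and use thereof in dual-index library construction
CN113005121A (zh) Adapter element, kit, and related uses
CN110004225B (zh) Kit, primers, and method for individualized genetic testing for tumor chemotherapy drugs
WO2020232635A1 (zh) Construction of a sequencing library based on methylated DNA target regions, and *** and use
CN110564838A (zh) Multiplex PCR primer *** for genotyping of neonatal glycogen storage disease and use thereof
CN113249437A (zh) Library construction method for sRNA sequencing
CN116445581A (zh) Method for preparing a high-throughput amplicon library of oligodendroglioma-related genes, multiplex PCR primer pairs, and use
CN113046835A (zh) Sequencing library construction method for detecting lentivirus *** sites and method for detecting lentivirus *** sites
CN111808855B (zh) Method for constructing a universal genetic testing library for hereditary familial hypercholesterolemia and kit therefor
CN108728515A (zh) Library construction and sequencing data analysis method for detecting low-frequency ctDNA mutations using the duplex method
CN116246704B (zh) *** for non-invasive prenatal testing of the fetus
CN110423805B (zh) Multiplex PCR primer *** for genotyping of neonatal mucopolysaccharidosis and use thereof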

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20935873

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15/03/2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20935873

Country of ref document: EP

Kind code of ref document: A1