WO2014086037A1 - Method for constructing nucleic acid sequencing library and applications thereof - Google Patents


Info

Publication number
WO2014086037A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequencing
nucleic acid
library
primer
tag
Prior art date
Application number
PCT/CN2012/086164
Other languages
French (fr)
Chinese (zh)
Inventor
刘琳
卢丽华
林丹妮
尹烨
陈叶观
何毅敏
Original Assignee
深圳华大基因科技服务有限公司
Priority date
Filing date
Publication date
Application filed by 深圳华大基因科技服务有限公司
Priority to PCT/CN2012/086164
Publication of WO2014086037A1


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1093 General methods of preparing gene libraries, not provided for in other subgroups

Definitions

  • The present invention relates to the field of biotechnology, in particular to a method of constructing a nucleic acid sequencing library and its uses, and more particularly to a method of constructing a nucleic acid sequencing library, a nucleic acid sequencing library, a nucleic acid sequencing method, and a method of determining nucleic acid sequence information.
  • Background art
  • The second-generation sequencing technologies represented by Illumina Solexa, ABI SOLiD and Roche 454 have greatly reduced the cost of sequencing, have developed rapidly in recent years, and have become an important tool for genomics research. Unlike Sanger sequencing, which is based on chain termination, second-generation sequencing mainly adopts a strategy of sequencing while synthesizing (sequencing by synthesis).
  • A defining feature of second-generation sequencing is its high throughput: hundreds of millions of DNA fragments can be sequenced in parallel. Currently, a single run of a high-throughput sequencer can generate up to 200 Gb of data, equivalent to sequencing one person's whole genome about 65 times.
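As a rough check of the coverage figure above, fold coverage is simply total output divided by genome size. The sketch below assumes a haploid human genome of about 3.1 Gb, a value that is not given in the text.

```python
# Assumption: haploid human genome size of ~3.1 Gb (not stated in the source).
RUN_OUTPUT_GB = 200.0
HUMAN_GENOME_GB = 3.1


def fold_coverage(output_gb: float, genome_gb: float) -> float:
    """Average depth = total bases sequenced / genome size."""
    return output_gb / genome_gb


depth = fold_coverage(RUN_OUTPUT_GB, HUMAN_GENOME_GB)
print(f"about {depth:.0f}x coverage")  # prints "about 65x coverage"
```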
  • These high-throughput methods break the genome into a series of small fragments by sonication or other methods, ligate linkers to both ends of each fragment, and then amplify the fragments by bridge PCR or emulsion PCR using primers that anneal to the linkers.
  • The present invention aims to solve at least one of the technical problems existing in the prior art.
  • The invention proposes an isolated nucleic acid tag consisting of an oligonucleotide having the sequence set forth in any one of SEQ ID NOs: 1-6.
  • The present invention provides six nucleic acid tags, namely: ACTCTTAC (SEQ ID NO: 1), GATGGACT (SEQ ID NO: 2), TATGGTAG (SEQ ID NO: 3), CCATATCC (SEQ ID NO: 4), CTAGCGCT (SEQ ID NO: 5) and ATATAAGA (SEQ ID NO: 6).
  • ACTCTTAC SEQ ID NO: 1
  • GATGGACT SEQ ID NO: 2
  • TATGGTAG SEQ ID NO: 3
  • CCATATCC SEQ ID NO: 4
  • CTAGCGCT SEQ ID NO: 5
  • ATATAAGA SEQ ID NO: 6
  • By ligating such a nucleic acid tag to a DNA fragment of a nucleic acid sample (or an equivalent thereof), a tagged nucleic acid sequencing library is obtained. Sequencing this library yields both the sequence of the nucleic acid sample and the sequence of the tag, so the sample source of the nucleic acid sample can be accurately identified from the tag sequence.
  • Sequencing libraries for a plurality of nucleic acid samples can therefore be constructed simultaneously. By mixing the sequencing libraries derived from different samples, sequencing them together, and classifying the resulting nucleic acid sequences by their nucleic acid tags, nucleic acid sequence information for the plurality of samples can be obtained.
  • This makes full use of high-throughput sequencing technologies such as Solexa sequencing, which can then sequence nucleic acids from multiple samples simultaneously, increasing the efficiency and throughput of high-throughput sequencing and reducing the cost of obtaining sequence information for nucleic acid samples.
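Tag-based classification of pooled reads can be pictured with the minimal Python sketch below. It assumes each read arrives as a (tag read, insert read) pair and that a hypothetical sample sheet maps each tag to a sample; only exact tag matches are accepted here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample sheet: tags from SEQ ID NOs: 1 and 2 assigned to two samples.
SAMPLE_BY_TAG: Dict[str, str] = {
    "ACTCTTAC": "sample_A",
    "GATGGACT": "sample_B",
}


def demultiplex(reads: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group insert reads by sample according to their tag read (exact match only).

    Reads whose tag is not in the sample sheet are collected under "undetermined".
    """
    bins: Dict[str, List[str]] = defaultdict(list)
    for tag_read, insert_read in reads:
        sample = SAMPLE_BY_TAG.get(tag_read, "undetermined")
        bins[sample].append(insert_read)
    return dict(bins)


pooled = [("ACTCTTAC", "GATTACA"), ("GATGGACT", "CCGGTT"), ("NNNNNNNN", "TTTTAA")]
print(demultiplex(pooled))
```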
  • The invention proposes a set of isolated PCR primers consisting of oligonucleotides having the sequences set forth in SEQ ID NOs: 7-12.
  • Using these PCR primers as 5' primers, the nucleic acid tag can be introduced at the 5' end of the nucleic acid sequencing library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent and obtaining a tagged nucleic acid sequencing library. By sequencing this library, the sequence of the nucleic acid sample and the sequence of the tag can be obtained, and the sample source of the nucleic acid sample can then be accurately identified from the tag sequence.
  • Sequencing libraries for a plurality of nucleic acid samples can be constructed simultaneously, and by mixing and sequencing the libraries derived from different samples, the resulting nucleic acid sequences can be classified by sample based on the nucleic acid tags.
  • These PCR primers are also referred to herein as "tag primers" or "PCR tag primers".
  • The invention proposes a set of isolated PCR primers consisting of oligonucleotides having the sequences set forth in SEQ ID NOs: 13-18.
  • Using these tag primers, the nucleic acid tag can be introduced at the 3' end of the nucleic acid sequencing library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent and obtaining a tagged nucleic acid sequencing library. By sequencing this library, the sequence of the nucleic acid sample and the sequence of the tag can be obtained, and the sample source of the nucleic acid sample can then be accurately identified from the tag sequence.
  • Sequencing libraries for a plurality of nucleic acid samples can be constructed simultaneously, and by mixing and sequencing the libraries derived from different samples, the resulting nucleic acid sequences can be classified by sample based on the nucleic acid tags.
  • The invention proposes a method of constructing a nucleic acid sequencing library.
  • The method comprises: fragmenting a nucleic acid sample to obtain DNA fragments; end-repairing the DNA fragments to obtain end-repaired DNA fragments; adding a base A to the ends of the end-repaired DNA fragments to obtain DNA fragments having a sticky end A; ligating the two ends of each DNA fragment having the sticky end A to a first linker and a second linker, respectively, to obtain a linker-ligated product; amplifying the linker-ligated product with a 5' primer and a 3' primer to obtain an amplification product; and isolating the amplification product, the amplification product constituting the nucleic acid sequencing library, wherein at least one of the first linker, the second linker, the 5' primer and the 3' primer comprises a nucleic acid tag, so that the amplification product contains at least one nucleic acid tag.
  • The nucleic acid tag is at least one selected from the oligonucleotides having the sequences shown in SEQ ID NOs: 1-6. According to an embodiment of the present invention, the tags shown in SEQ ID NOs: 1-6 are preferably introduced into both the 5' primer and the 3' primer.
  • The 5' primer is at least one selected from the oligonucleotides having the sequences shown in SEQ ID NOs: 7-12.
  • The 3' primer is at least one selected from the oligonucleotides having the sequences shown in SEQ ID NOs: 13-18.
  • With this method, a nucleic acid sequencing library can be constructed efficiently, and the tag primers described above can be efficiently introduced into at least one of the 3' end and the 5' end of the library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent to obtain a tagged nucleic acid sequencing library.
  • Sequencing libraries for a plurality of nucleic acid samples can be constructed simultaneously, and by mixing and sequencing the libraries derived from different samples, the resulting nucleic acid sequences can be classified by sample based on the nucleic acid tags.
  • The inventors have surprisingly found that when nucleic acid sequencing libraries carrying different nucleic acid tags are constructed from the same sample with this method, using oligonucleotides that carry the different tags, the resulting sequencing data show very good stability and repeatability, so that multiple samples can be sequenced in the same reaction system.
  • The method of constructing a sequencing library may further have the following additional technical features:
  • The nucleic acid sample is a genomic DNA sample.
  • The nucleic acid sample is a human genomic DNA sample.
  • The DNA fragments are 100-800 bp in length.
  • The fragmentation is carried out by at least one of nebulization, sonication, HydroShear and enzymatic digestion.
  • The end repair of the DNA fragments is carried out with Klenow fragment, T4 DNA polymerase and T4 polynucleotide kinase.
  • The base A is added to the ends of the end-repaired DNA fragments using Klenow Fragment (3'→5' exo-) polymerase.
  • The amplification uses the oligonucleotides shown in SEQ ID NOs: 7-12 as 5' primers and the oligonucleotides shown in SEQ ID NOs: 13-18 as 3' primers.
  • The amplification product is isolated by electrophoresis on a 2% agarose gel followed by purification.
  • The invention proposes a nucleic acid sequencing library constructed by the method of constructing a sequencing library described above.
  • The inventors found that the constructed sequencing library is suitable for second-generation sequencing technology, especially Solexa sequencing. Therefore, by mixing the sequencing libraries derived from different samples, sequencing them simultaneously, and classifying the resulting nucleic acid sequences by their nucleic acid tags, nucleic acid sequence information for a plurality of samples can be obtained.
  • This makes full use of high-throughput sequencing technologies such as Solexa sequencing, which can then sequence nucleic acids from multiple samples simultaneously, increasing the efficiency and throughput of high-throughput sequencing and reducing the cost of obtaining sequence information for nucleic acid samples.
  • The invention proposes a nucleic acid sequencing method.
  • The method comprises the steps of: constructing a sequencing library from a nucleic acid sample by the method of constructing a sequencing library described above; and sequencing the sequencing library.
  • The inventors found that the constructed sequencing library is suitable for second-generation sequencing technology, especially Solexa sequencing.
  • With this tagged-library technique, sequencing libraries derived from different samples can be mixed and sequenced simultaneously, and the resulting nucleic acid sequences can be classified by their nucleic acid tags to obtain nucleic acid sequence information for multiple samples.
  • Sequencing is performed on a second-generation sequencing platform, using a nucleic acid sequencing library that carries a tag at both its 3' end and its 5' end, sometimes referred to herein as a dual-tag library.
  • By contrast, a conventional single-tag library is one in which a tag is introduced only at the 3' end of the library, by linker ligation or by PCR.
  • With N tags, single-tag libraries allow N libraries to be pooled for mixed sequencing, whereas with the same N tags the dual-tag library of the present invention allows N × N libraries to be pooled and sequenced.
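The N × N figure follows from pairing tags combinatorially: each library is identified by the ordered pair (5'-end tag, 3'-end tag) rather than by a single tag. A minimal sketch using the six tags listed above, assuming every pairing (including a tag paired with itself) is usable:

```python
from itertools import product
from typing import List, Tuple

TAGS: List[str] = ["ACTCTTAC", "GATGGACT", "TATGGTAG", "CCATATCC", "CTAGCGCT", "ATATAAGA"]


def dual_tag_pairs(tags_5p: List[str], tags_3p: List[str]) -> List[Tuple[str, str]]:
    """All ordered (5'-end tag, 3'-end tag) combinations; each can mark one library."""
    return list(product(tags_5p, tags_3p))


single_capacity = len(TAGS)                       # one tag per library
dual_capacity = len(dual_tag_pairs(TAGS, TAGS))   # one tag pair per library
print(single_capacity, dual_capacity)  # prints "6 36"
```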
  • The specific steps for sequencing the dual-tag library of the invention differ from those for a conventional single-tag library.
  • A conventional single-tag library is generally sequenced as follows: first, the tag is introduced at the 3' end of the library by linker ligation or PCR; the single-tag library is then sequenced by synthesis (SBS). For example, on the Illumina Solexa/HiSeq platform, sequencing starts from a sequencing primer and proceeds (synthesis and reading) in the 5'→3' direction, so the sequence read is, in order: the 5'-end sequence of the insert → the tag sequence → the 3'-end sequence.
  • Sequencing the sequencing library of the present invention (i.e., the dual-tag library) further comprises: sequencing from the 5' end of the sequencing library to obtain, in order, first sequencing data of the 5' end, the 3'-end tag sequence, the 5'-end tag sequence, and second sequencing data of the 3' end.
  • More specifically, sequencing the dual-tag library of the present invention comprises: sequencing with a first sequencing primer to obtain the first sequencing data of the 5' end; sequencing with a second sequencing primer to obtain the 3'-end tag sequence; obtaining the 5'-end tag sequence during synthesis of the library coding strand; and sequencing with a third sequencing primer to obtain the second sequencing data of the 3' end, wherein the first sequencing primer binds to the 3' end of the library template strand, the second sequencing primer binds to the 5' end of the library template strand, and the third sequencing primer binds to the 3' end of the library coding strand.
  • In detail: first, the first sequencing primer binds to the 3' end of the library template strand, and sequencing by synthesis yields the first sequencing data, Read1, i.e., the 5'-end sequence information of the library coding strand.
  • Next, the second sequencing primer binds to the 5' end of the library template strand, and sequencing yields the first tag sequencing data, i.e., the sequence of the library's 3'-end tag. After deblocking, the library coding strand is synthesized from the oligonucleotide on the sequencing chip that pairs with the 3' end of the library template strand; during this synthesis the second tag sequencing data, i.e., the sequence of the library's 5'-end tag, is read, and synthesis of the library coding strand is then completed.
  • Finally, the third sequencing primer binds to the 3' end of the library coding strand, and sequencing yields the second sequencing data, Read2, i.e., the 3'-end sequence information of the library.
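Because each cluster yields Read1, the 3'-end tag, the 5'-end tag and Read2 in that order, assigning a read pair to a sample amounts to looking up its tag pair. The sketch below is illustrative only: the `Cluster` container and the sample sheet are hypothetical, and exact tag matching is assumed.

```python
from typing import Dict, List, NamedTuple, Tuple


class Cluster(NamedTuple):
    """Reads from one cluster, in the order they are produced (hypothetical container)."""
    read1: str   # first sequencing data (5' end of the library)
    tag_3p: str  # first tag read: the library's 3'-end tag
    tag_5p: str  # second tag read: the library's 5'-end tag
    read2: str   # second sequencing data (3' end of the library)


# Hypothetical sample sheet keyed by (5'-end tag, 3'-end tag).
SampleSheet = Dict[Tuple[str, str], str]


def demultiplex_dual(clusters: List[Cluster], sheet: SampleSheet) -> Dict[str, List[Tuple[str, str]]]:
    """Assign each (Read1, Read2) pair to the sample whose tag pair matches exactly."""
    out: Dict[str, List[Tuple[str, str]]] = {}
    for c in clusters:
        sample = sheet.get((c.tag_5p, c.tag_3p), "undetermined")
        out.setdefault(sample, []).append((c.read1, c.read2))
    return out


# The same two tags in swapped positions identify two different samples.
sheet: SampleSheet = {("ACTCTTAC", "GATGGACT"): "s1", ("GATGGACT", "ACTCTTAC"): "s2"}
clusters = [
    Cluster("AAGT", "GATGGACT", "ACTCTTAC", "CCTA"),
    Cluster("GGCA", "ACTCTTAC", "GATGGACT", "TTAG"),
]
print(demultiplex_dual(clusters, sheet))
```

Keying on the ordered pair is what lets N tags distinguish N × N libraries.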
  • The invention proposes a method of determining nucleic acid sequence information.
  • The method comprises the steps of: sequencing a nucleic acid sample by the method described above to obtain a sequencing result; and determining sequence information of the nucleic acid sample based on the sequencing result.
  • In this way, nucleic acid sequence information for a plurality of samples can be determined efficiently.
  • The invention proposes a kit for constructing a nucleic acid sequencing library.
  • The kit comprises: a first PCR primer, which is an oligonucleotide represented by SEQ ID NOs: 7-12 or the oligonucleotide represented by SEQ ID NO: 19; and a second PCR primer, which is an oligonucleotide represented by SEQ ID NOs: 13-18.
  • With this kit, the tag primers described above can be introduced into at least one of the 3' end and the 5' end of the nucleic acid sequencing library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent to obtain a tagged nucleic acid sequencing library.
  • By sequencing this library, the sequence of the nucleic acid sample and the sequence of the tag can be obtained, and the sample source of the nucleic acid sample can be accurately identified from the tag sequence.
  • Sequencing libraries for a plurality of nucleic acid samples can be constructed simultaneously, and by mixing and sequencing the libraries derived from different samples, the resulting nucleic acid sequences can be classified by sample based on the nucleic acid tags.
  • FIG. 1 is a schematic flow diagram of a method of constructing a sequencing library according to an embodiment of the present invention.
  • FIG. 2 shows a comparison between a library using the 5'-end (P5-end) tag primer sequences of the present invention and a library using conventional tags, according to an embodiment of the present invention.
  • FIG. 3 is a graph of light intensity versus cycle number for a library using the 5'-end (P5-end) tag primer sequences of the present invention and a library using conventional tags, according to an embodiment of the present invention.
  • FIG. 4 is a graph of base distribution versus cycle number for a library using the 5'-end (P5-end) tag primer sequences of the present invention and a library using conventional tags, according to an embodiment of the present invention.
  • FIG. 5 is a graph of error rate versus cycle number for a library using the 5'-end (P5-end) tag primer sequences of the present invention and a library using conventional tags, according to an embodiment of the present invention.
  • The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance, or as implicitly indicating the number of the technical features indicated.
  • Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features.
  • "A plurality of" means two or more unless otherwise specified.
  • The invention proposes an isolated nucleic acid tag consisting of an oligonucleotide having the sequence set forth in any one of SEQ ID NOs: 1-6.
  • The present invention provides the six nucleic acid tags listed in Table 1, namely: ACTCTTAC (SEQ ID NO: 1), GATGGACT (SEQ ID NO: 2), TATGGTAG (SEQ ID NO: 3), CCATATCC (SEQ ID NO: 4), CTAGCGCT (SEQ ID NO: 5) and ATATAAGA (SEQ ID NO: 6).
  • By ligating such a nucleic acid tag to a DNA fragment of a nucleic acid sample (or an equivalent thereof), a tagged nucleic acid sequencing library is obtained. Sequencing this library yields both the sequence of the nucleic acid sample and the sequence of the tag, so the sample source of the nucleic acid sample can be accurately identified from the tag sequence.
  • Sequencing libraries for a plurality of nucleic acid samples can be constructed simultaneously, and by mixing and sequencing the libraries derived from different samples, the resulting nucleic acid sequences can be classified by sample based on the nucleic acid tags.
  • In designing the tags, the G/T content at each base position across the pooled tags must be considered. In Solexa/HiSeq sequencing, bases G and T are excited by the same light and bases A and C by the other, so the balance between "GT" content and "AC" content at each position must be taken into account. Finally, the accuracy and repeatability of the output data must be considered. The present invention takes all of these factors into account in tag design and avoids runs of 3 or more consecutive identical bases within the tag sequences, which reduces the error rate both during tag synthesis and during sequencing.
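The two design rules above can be checked mechanically. The sketch below tests the six listed tags for runs of 3 or more identical bases and computes, for each cycle, the fraction of tags that read G or T. Treating "balanced" as both channel groups being present at every position is an assumption for illustration; the text gives no numeric threshold.

```python
from typing import List, Tuple

TAGS: List[str] = ["ACTCTTAC", "GATGGACT", "TATGGTAG", "CCATATCC", "CTAGCGCT", "ATATAAGA"]


def longest_run(seq: str) -> int:
    """Length of the longest stretch of a single repeated base."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def gt_fraction_per_position(tags: List[str]) -> List[float]:
    """Fraction of tags with G or T at each position (G and T share one excitation light)."""
    length = len(tags[0])
    return [sum(t[i] in "GT" for t in tags) / len(tags) for i in range(length)]


def check_tag_set(tags: List[str], max_run: int = 2) -> Tuple[bool, bool, List[float]]:
    runs_ok = all(longest_run(t) <= max_run for t in tags)
    fractions = gt_fraction_per_position(tags)
    # Assumed criterion: neither channel group is missing at any cycle.
    balanced = all(0.0 < f < 1.0 for f in fractions)
    return runs_ok, balanced, fractions


runs_ok, balanced, fractions = check_tag_set(TAGS)
print(runs_ok, balanced, [round(f, 2) for f in fractions])
```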
  • The tags of the present invention are 8 bp in length, and any two of them differ at several base positions (at least 4 of the 8 for the sequences listed above), so a sequencing error or synthesis error at any single one of the 8 bases does not affect the final identification of the tag.
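Error tolerance of this kind depends on the minimum pairwise Hamming distance between tags: if every pair differs at d or more positions, a read with up to ⌊(d - 1)/2⌋ errors is still closer to its true tag than to any other. A minimal sketch for the listed tags, with single-mismatch assignment as an illustrative choice rather than a procedure stated in the text:

```python
from itertools import combinations
from typing import List, Optional

TAGS: List[str] = ["ACTCTTAC", "GATGGACT", "TATGGTAG", "CCATATCC", "CTAGCGCT", "ATATAAGA"]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags: List[str]) -> int:
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def assign_tag(observed: str, tags: List[str], max_mismatches: int = 1) -> Optional[str]:
    """Return the single tag within max_mismatches of the observed tag read, else None."""
    hits = [t for t in tags if hamming(observed, t) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


print(min_pairwise_distance(TAGS))   # prints 4
print(assign_tag("ACTCTTAG", TAGS))  # one error in the last base, prints ACTCTTAC
print(assign_tag("NNNNNNNN", TAGS))  # prints None
```

For the sequences as listed, the smallest distance is 4 (between SEQ ID NO: 2 and SEQ ID NO: 3), which is enough for a single-base error to be assigned unambiguously.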
  • The present invention proposes PCR primers carrying the above tags, which can efficiently introduce the tags into the 5' end or the 3' end of the sequencing library by a PCR reaction.
  • The invention proposes a set of isolated PCR primers (also referred to as tag primers) consisting of oligonucleotides having the sequences set forth in SEQ ID NOs: 7-12.
  • Using these tag primers, the nucleic acid tag can be introduced at the 5' end of the nucleic acid sequencing library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent to obtain a tagged nucleic acid sequencing library. By sequencing this library, the sequence of the nucleic acid sample and the sequence of the tag can be obtained, and the sample source of the nucleic acid sample can then be accurately identified from the tag sequence.
  • Sequencing libraries for a plurality of nucleic acid samples can be constructed simultaneously; by mixing and sequencing the libraries derived from different samples and classifying the resulting nucleic acid sequences by their nucleic acid tags, nucleic acid sequence information for a variety of samples can be obtained.
  • The invention proposes a set of isolated PCR primers consisting of oligonucleotides having the sequences set forth in SEQ ID NOs: 13-18.
  • Using these tag primers, the nucleic acid tag can be introduced at the 3' end of the nucleic acid sequencing library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent to obtain a tagged nucleic acid sequencing library. By sequencing this library, the sequence of the nucleic acid sample and the sequence of the tag can be obtained, and the sample source of the nucleic acid sample can then be accurately identified from the tag sequence.
  • Sequencing libraries for a plurality of nucleic acid samples can be constructed simultaneously, and by mixing and sequencing the libraries derived from different samples, the resulting nucleic acid sequences can be classified by sample based on the nucleic acid tags.
  • This improves the efficiency and throughput of high-throughput sequencing technology and reduces the cost of determining the sequence information of nucleic acid samples.
  • TCTTTCCCTACACGACGCTCTTCCGATCT SEQ ID NO: 7
  • TTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 13) tag sequence (index2-2) TATGGTAG (SEQ ID NO: 2)
  • CTTTCCCTACACGACGCTCTTCCGATCT SEQ ID NO: 8
  • GTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 15) tag sequence (index2-4) CCATATCC (SEQ ID NO: 4)
  • CTTTCCCTACACGACGCTCTTCCGATCT SEQ ID NO: 10
  • TTCAGACGTGTGCTCTTCCGATCT SEQ ID NO: 16
  • tag sequence (index2-5) CTAGCGCT (SEQ ID NO: 5)
  • CTTTCCCTACACGACGCTCTTCCGATCT SEQ ID NO: 11
  • GTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 17) tag sequence (index2-6) ATATAAGA (SEQ ID NO: 6)
  • The 5'-end tag primer sequence is also referred to herein as the P5-end tag primer sequence.
  • The 3'-end tag primer sequence is also referred to herein as the P7-end tag primer sequence.
  • The invention proposes a kit for constructing a nucleic acid sequencing library.
  • The kit may comprise: a first PCR primer, which is an oligonucleotide represented by SEQ ID NOs: 7-12 or the oligonucleotide represented by SEQ ID NO: 19 (5'-AATGATACGGCGACCA…); and a second PCR primer, which is an oligonucleotide represented by SEQ ID NOs: 13-18.
  • Using the kit, the nucleic acid tag is linked to a DNA fragment of the nucleic acid sample or an equivalent thereof to obtain a tagged nucleic acid sequencing library; sequencing this library yields the sequence of the nucleic acid sample and the sequence of the tag, and the sample source of the nucleic acid sample can be accurately identified from the tag sequence.
  • Sequencing libraries for a plurality of nucleic acid samples can be constructed simultaneously, so that the libraries of different samples can be mixed and sequenced together and the resulting nucleic acid sequences classified by their nucleic acid tags to obtain sequence information for a variety of samples. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, which sequence nucleic acids from multiple samples at once, improving the efficiency and throughput of high-throughput sequencing and reducing the cost of determining the sequence information of nucleic acid samples.
  • The method of constructing a nucleic acid sequencing library of the present invention may comprise the following steps.
  • S100: Fragmentation
  • The nucleic acid sample is fragmented to obtain DNA fragments.
  • The type of nucleic acid sample that can be processed is not limited; the method can be applied to any common biological sample, for example plants such as Arabidopsis thaliana and rice, animals such as humans and mice, and microorganisms such as E. coli.
  • The nucleic acid sample is a genomic DNA sample.
  • The nucleic acid sample is a human genomic DNA sample.
  • The DNA fragments are 100-800 bp in length, preferably 500 bp, which can further improve the efficiency of library construction and subsequent sequencing.
  • Genomic DNA may be fragmented by any known method.
  • The fragmentation is carried out by at least one of nebulization, sonication, HydroShear and enzymatic digestion.
  • Sonication is preferred for fragmenting the genomic DNA.
  • The inventors have found that when genomic DNA is fragmented by sonication, the resulting fragment length is easy to control and subsequent sequencing operations are not affected.
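The fragmentation and size range above can be illustrated with a toy in-silico model. The sketch below cuts a linear sequence at uniformly random positions, a crude stand-in that does not model the physics of sonication, and keeps fragments in the 100-800 bp range; the genome length and number of breaks are arbitrary illustrative values.

```python
import random
from typing import List


def random_fragments(genome_len: int, n_breaks: int, rng: random.Random) -> List[int]:
    """Cut a linear molecule at n_breaks distinct random positions; return fragment lengths."""
    cuts = sorted(rng.sample(range(1, genome_len), n_breaks))
    edges = [0] + cuts + [genome_len]
    return [end - start for start, end in zip(edges, edges[1:])]


def size_select(lengths: List[int], lo: int = 100, hi: int = 800) -> List[int]:
    """Keep fragments whose length falls within [lo, hi] bp."""
    return [n for n in lengths if lo <= n <= hi]


rng = random.Random(0)  # fixed seed for reproducibility
lengths = random_fragments(1_000_000, 2_000, rng)  # mean fragment length ~500 bp
kept = size_select(lengths)
print(len(lengths), len(kept), sum(kept) / len(kept))
```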
  • The DNA fragments are end-repaired to obtain end-repaired DNA fragments.
  • Those skilled in the art can end-repair the DNA fragments by any known method, and many commercial kits are available in the art.
  • The end repair of the DNA fragments is carried out with Klenow fragment, T4 DNA polymerase and T4 polynucleotide kinase.
  • A base A is added to the ends of the end-repaired DNA fragments to obtain DNA fragments having a sticky end A.
  • Each end-repaired random fragment consists of two oligonucleotide strands, and base A is added at the 3' end of both strands.
  • The addition of A to the ends of the end-repaired DNA fragments is carried out using Klenow Fragment (3'→5' exo-) polymerase.
  • The two ends of each DNA fragment having the sticky end A are ligated to the first linker and the second linker, respectively, to obtain a linker-ligated product.
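Why A-tailing is paired with linker ligation can be shown with a toy string model. The sketch assumes, as is common for TA ligation but not stated in the text, that the linkers carry a single 3'-T overhang; strands are written 5'→3' and the fragment sequence is arbitrary.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def end_repaired(top: str) -> Tuple[str, str]:
    """A blunt-ended duplex as (top, bottom), both strands written 5'->3'."""
    return top, revcomp(top)


def a_tail(duplex: Tuple[str, str]) -> Tuple[str, str]:
    """Add one A to the 3' end of each strand, giving a 3'-A overhang at both ends."""
    top, bottom = duplex
    return top + "A", bottom + "A"


def overhangs_pair(overhang_1: str, overhang_2: str) -> bool:
    """Single-base 3' overhangs can anneal only if they are complementary."""
    return overhang_1.translate(COMPLEMENT) == overhang_2


fragment = a_tail(end_repaired("GAAC"))
print(fragment)                  # prints ('GAACA', 'GTTCA')
print(overhangs_pair("A", "T"))  # A-tailed end with an assumed T-overhang linker: True
print(overhangs_pair("A", "A"))  # two A-tailed fragment ends: False
```

The False result for two A-tailed ends reflects the usual rationale for this design: A-tailing discourages fragments from ligating to each other while still allowing them to ligate to T-overhang linkers.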
  • For the linkers used herein, those skilled in the art can select the linker-ligation procedure according to the sequencing platform used, and may also refer to the manufacturer's instructions.
  • The linker-ligated product is amplified using a 5' primer and a 3' primer to obtain an amplification product.
  • At least one of the first linker, the second linker, the 5' primer and the 3' primer comprises a nucleic acid tag, so that the amplification product contains at least one nucleic acid tag.
  • The nucleic acid tag is at least one selected from the oligonucleotides having the sequences set forth in SEQ ID NOs: 1-6. According to an embodiment of the present invention, the tags of SEQ ID NOs: 1-6 are preferably introduced into both the 5' primer and the 3' primer.
  • The 5' primer is at least one selected from the oligonucleotides having the sequences shown in SEQ ID NOs: 7-12, and the 3' primer is at least one selected from the oligonucleotides having the sequences shown in SEQ ID NOs: 13-18.
  • The oligonucleotides shown in SEQ ID NOs: 7-12, or the oligonucleotide shown in SEQ ID NO: 19, are used as 5' primers, and the PCR primers consisting of the oligonucleotides shown in SEQ ID NOs: 13-18, which carry the nucleic acid tags described above, are used as 3' primers.
  • Amplification is performed using the oligonucleotides shown in SEQ ID NOs: 7-12 as 5' primers and the oligonucleotides shown in SEQ ID NOs: 13-18 as 3' primers. It should be noted that these tag primers were obtained by the inventors through extensive screening and were significantly superior to other primer combinations.
  • The amplification product is isolated, and the amplification product constitutes the nucleic acid sequencing library.
  • The method of isolating and recovering the amplification product is not particularly limited; those skilled in the art can select an appropriate method and apparatus according to the characteristics of the amplification product, for example electrophoresis followed by recovery of target fragments of a specific length.
  • The amplification product is isolated by electrophoresis on a 2% agarose gel followed by purification.
  • With this method, a nucleic acid sequencing library can be constructed efficiently, and the tag primers described above can be efficiently introduced into at least one of the 3' end and the 5' end of the library by a PCR reaction, thereby linking the nucleic acid tag to the DNA fragment of the nucleic acid sample or its equivalent to obtain a tagged nucleic acid sequencing library.
  • Sequencing libraries for a plurality of nucleic acid samples can be constructed simultaneously, and by mixing and sequencing the libraries derived from different samples, the resulting nucleic acid sequences can be classified by sample based on the nucleic acid tags.
  • The inventors have surprisingly found that when nucleic acid sequencing libraries carrying different nucleic acid tags are constructed from the same sample with this method, using oligonucleotides that carry the different tags, the resulting sequencing data show very good stability and repeatability, so that multiple samples can be sequenced in the same reaction system.
  • The invention proposes a nucleic acid sequencing library constructed by the method of constructing a sequencing library described above.
  • The inventors found that the constructed sequencing library is suitable for second-generation sequencing technology, especially Solexa sequencing. Therefore, by mixing the sequencing libraries derived from different samples, sequencing them simultaneously, and classifying the resulting nucleic acid sequences by their nucleic acid tags, nucleic acid sequence information for a plurality of samples can be obtained.
  • This makes full use of high-throughput sequencing technologies such as Solexa sequencing, which can then sequence nucleic acids from multiple samples simultaneously, increasing the efficiency and throughput of high-throughput sequencing and reducing the cost of obtaining sequence information for nucleic acid samples.
  • the invention provides a nucleic acid sequencing method.
  • the method may comprise the steps of: constructing a sequencing library for the nucleic acid sample according to the method of constructing the sequencing library as described above; and sequencing the sequencing library.
  • the inventors found that the constructed genome sequencing library is suitable for second-generation sequencing technology, especially Solexa sequencing.
  • libraries from different samples are mixed and sequenced simultaneously. The resulting nucleic acid sequences are then classified by nucleic acid tag to obtain the nucleic acid sequence information of a plurality of samples.
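The tag-based classification step can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It makes three assumptions. First, each read arrives as an (index read, insert read) pair of strings. Second, the tags are the six eight-base tags of SEQ ID NOs: 1-6. Third, at most one mismatch per tag is tolerated, which mirrors the %Index 0/1 mismatch metrics reported later. The names `assign_sample` and `demultiplex` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Eight-base tags of SEQ ID NOs: 1-6 (the keys are illustrative labels).
TAGS: Dict[str, str] = {
    "SEQ1": "ACTCTTAC",
    "SEQ2": "GATGGACT",
    "SEQ3": "TATGGTAG",
    "SEQ4": "CCATATCC",
    "SEQ5": "CTAGCGCT",
    "SEQ6": "ATATAAGA",
}


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, tags: Dict[str, str], max_mismatch: int = 1) -> Optional[str]:
    """Return the label of the closest tag, or None if too distant or ambiguous."""
    scored = sorted((hamming(index_read, seq), label) for label, seq in tags.items())
    best_dist, best_label = scored[0]
    if best_dist > max_mismatch:
        return None
    if len(scored) > 1 and scored[1][0] == best_dist:
        return None  # two tags are equally close: do not guess
    return best_label


def demultiplex(reads: List[Tuple[str, str]], tags: Dict[str, str],
                max_mismatch: int = 1) -> Dict[str, List[str]]:
    """Bin insert reads by the sample tag found in their index read."""
    bins: Dict[str, List[str]] = {label: [] for label in tags}
    bins["undetermined"] = []
    for index_read, insert_read in reads:
        label = assign_sample(index_read, tags, max_mismatch)
        bins[label if label is not None else "undetermined"].append(insert_read)
    return bins
```

Consider `demultiplex([("ACTCTTAC", "AAAA"), ("GATGGACA", "CCCC"), ("NNNNNNNN", "GGGG")], TAGS)`. It files "AAAA" under SEQ1 because the index read is an exact match. It files "CCCC" under SEQ2 because the index read has one mismatch. "GGGG" goes to undetermined.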
  • sequencing is performed using a second generation sequencing platform.
  • the nucleic acid sequencing library of the present invention, which is tagged at both the 3' and 5' ends, can be sequenced in two stages. First, the end containing the 3' tag is sequenced with a sequencing primer to obtain the first-end sequencing data Read1. Then the second end, containing the 5' tag, is sequenced with a sequencing primer to obtain the second-end sequencing data Read2.
  • the sequencing library can be efficiently sequenced.
  • Sequencing primers can also be used for sequencing directly, depending on where the tag is set.
  • in one embodiment of the present invention, sequencing the sequencing library proceeds in four steps. First, the library is sequenced from the 5' end with a sequencing primer to obtain the 5'-end sequencing data Read1. Next, the 3'-end tag sequence is obtained, followed by the 5'-end tag sequence. Finally, the 3' end is sequenced with a sequencing primer to obtain the sequencing data Read2.
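The read order in this embodiment is Read1, then the 3'-end tag, then the 5'-end tag, then Read2. It can be illustrated by splitting one cluster's base calls by cycle range. This is a hypothetical representation under two assumptions: the calls for a cluster are available as a single string in cycle order, and both tags are eight bases long, like SEQ ID NOs: 1-6. The read lengths are illustrative parameters, not values taken from the source.

```python
from typing import Dict


def split_cycles(calls: str, read1_len: int, tag_len: int, read2_len: int) -> Dict[str, str]:
    """Split one cluster's base calls (in cycle order) into the four reads."""
    expected = read1_len + 2 * tag_len + read2_len
    if len(calls) != expected:
        raise ValueError(f"expected {expected} cycles, got {len(calls)}")
    end_read1 = read1_len
    end_tag3 = end_read1 + tag_len
    end_tag5 = end_tag3 + tag_len
    return {
        "read1": calls[:end_read1],               # insert sequence from the 5' end
        "tag_3prime": calls[end_read1:end_tag3],  # tag at the library's 3' end
        "tag_5prime": calls[end_tag3:end_tag5],   # tag at the library's 5' end
        "read2": calls[end_tag5:],                # insert sequence from the 3' end
    }
```

A single-tag library would need only one tag segment between the two reads. The extra segment here is what carries the second tag of the dual-tag design.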
  • the invention proposes a method of determining nucleic acid sequence information.
  • the method may comprise the steps of: sequencing a nucleic acid sample according to the method described above to obtain a sequencing result; and determining sequence information of the nucleic acid sample based on the sequencing result.
  • the nucleic acid sequence information of a plurality of samples can be efficiently determined.
  • NanoDrop 1000: take 1 ⁇ 2 ⁇ ⁇ of the human peripheral blood genomic DNA sample and measure the sample concentration, OD260/280 ratio, OD260/230 ratio, and other parameters with a NanoDrop 1000.
  • the samples were subjected to agarose gel electrophoresis.
  • based on the electrophoresis result and the measured OD values, determine whether the total amount and quality of the sample meet the requirements, and hence whether the sample can proceed to library preparation.
  • Sample purity: OD260/280 should be between 1.8 and 2.0, with no protein, polysaccharide, or RNA contamination. Sample concentration: at least 100 ng/μl;
  • Sample amount: to ensure the quality of the prepared library, the total sample amount should be no less than 45 ⁇ ⁇.
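The numeric acceptance criteria above can be expressed as a simple check. This is a minimal sketch that assumes the concentration unit is ng/μl. The contamination and total-amount criteria are judged from the gel and are not modeled here. `SampleQC` and `qc_failures` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SampleQC:
    concentration_ng_per_ul: float  # NanoDrop concentration
    od260_280: float                # NanoDrop OD260/280 ratio


def qc_failures(sample: SampleQC, min_concentration: float = 100.0,
                ratio_range: Tuple[float, float] = (1.8, 2.0)) -> List[str]:
    """List the numeric criteria the sample fails; an empty list means it passes them."""
    failures: List[str] = []
    low, high = ratio_range
    if not low <= sample.od260_280 <= high:
        failures.append(f"OD260/280 {sample.od260_280} outside {low}-{high}")
    if sample.concentration_ng_per_ul < min_concentration:
        failures.append(
            f"concentration {sample.concentration_ng_per_ul} ng/ul below {min_concentration}")
    return failures
```

Returning the list of failed criteria, rather than a single boolean, shows which requirement a rejected sample misses.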
  • Sample fragmentation: nebulization or Covaris shearing can break the sample DNA into fragments of 100 to 800 bp, with the main band at about 500 bp. If the sample DNA is already fragmented, this step can be skipped.
  • Adapter: the Adapter Oligo Mix is formed by annealing the commonly used sense strand, PE adapter F, with the antisense strand, PE adapter R.
  • the sample obtained in step 5 was electrophoresed on a 2% agarose gel at 100 V for 120 min;
  • PC Primer 1 is one of the oligonucleotides shown in SEQ ID NOs: 7 to 12 (carrying a tag of the present invention); … TTCCGATCT (SEQ ID NO: 19, a primer without a tag); and
  • the PCR product was electrophoresed on a 2% agarose gel at 100 V for 120 min. The band at n + 120 bp (where n is the insert size) was excised and recovered using the QIAquick Gel Extraction Kit (Qiagen).
  • the DNA was dissolved in 40 μl of ⁇.
  • Sequencing Primer 1: 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 23). Sequencing was performed on a HiSeq 2000 according to the manufacturer's protocol.
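Choosing where to cut the gel under the n + 120 bp rule is simple arithmetic. This minimal sketch assumes, as the text implies, that the 120 bp is a fixed addition to an insert of n bp (presumably adapter and primer sequence). For example, an insert of about 500 bp, matching the main band after shearing, gives a band at about 620 bp. The function names are hypothetical.

```python
from typing import Tuple


def expected_band_bp(insert_bp: int, added_bp: int = 120) -> int:
    """Position to excise on the gel: insert size n plus the 120 bp added to it."""
    if insert_bp <= 0:
        raise ValueError("insert size must be positive")
    return insert_bp + added_bp


def band_window(insert_min_bp: int, insert_max_bp: int, added_bp: int = 120) -> Tuple[int, int]:
    """Expected gel range for a span of insert sizes."""
    return expected_band_bp(insert_min_bp, added_bp), expected_band_bp(insert_max_bp, added_bp)
```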
  • sequencing libraries were constructed as described in the general method above. The oligonucleotides shown in SEQ ID NOs: 7 to 12 served as 5' tag primers, and the conventional P7-end primer sequence served as the 3' tag primer. The resulting libraries carry the tags shown in Table 1 (corresponding to SEQ ID NOs: 1-6) at the 5' end and a conventional tag at the 3' end. They were named METqdiMBEPEI-46, METqdiMBGPEI-137, METqdiMBDPEI-37, METqdiMBFPEI-122, METqdiMBBPEI-169, and METqdiMBBPEI-109.
  • the above six libraries were sequenced in four steps:
    1. The first sequencing primer binds to the 3' end of the library template strand for sequencing. The first sequencing data, Read1, is the 5'-end sequence information of the library.
    2. The second sequencing primer binds to the 5' end of the library template strand for sequencing. This yields the first tag sequencing data, i.e., the sequence of the library's 3'-end tag.
    3. The strand is deblocked. An oligonucleotide on the sequencing chip that matches the 3' end of the library template strand then allows the library coding strand to be synthesized. During this synthesis, the second tag sequencing data, i.e., the sequence of the library's 5'-end tag, is read.
    4. After the library coding strand has been synthesized, the third sequencing primer binds to the 3' end of the coding strand for sequencing by synthesis. The second sequencing data, Read2, is the 3'-end sequence information of the library.
  • First sequencing primer: 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 23); second sequencing primer: 5'-GATCGGAAGAGCACACGTCTGAACTCCAGTCAC (SEQ ID NO: 24);
  • Third sequencing primer: 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 25).
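A reverse-complement helper can check how the primer sequences listed above relate to each other. In this minimal sketch, the computation shows that the third sequencing primer equals the reverse complement of the second sequencing primer plus one extra 3'-terminal T. The source does not comment on this, so treat it only as an observation about the listed sequences.

```python
# Complement table for DNA bases (N maps to itself).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


SECOND_PRIMER = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"   # second sequencing primer
THIRD_PRIMER = "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"   # third sequencing primer (SEQ ID NO: 25)

if __name__ == "__main__":
    print(revcomp(SECOND_PRIMER) + "T" == THIRD_PRIMER)
```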
  • the "5' end of the library" refers to the 5' end of the library coding strand;
  • the "3' end of the library" refers to the 3' end of the library coding strand;
  • unless otherwise specified, "3' end" or "5' end" refers to the 3' end or 5' end of the library coding strand.
  • sequencing libraries with no tag at the 5' end and a conventional tag at the 3' end were constructed and named CSZPE0120821001 and CSZPE0120821002.
  • RawCluster/Tile: the number of identifiable DNA clusters per tile over the entire sequencing run;
  • %PF: the percentage of DNA clusters passing filter over the entire sequencing run;
  • %Index 0 mismatch: the proportion of reads whose tag sequence has 0 mismatches;
  • %Index 1 mismatch: the proportion of reads whose tag sequence has 1 mismatch;
  • Filtered adaptor: filtered adapter sequences;
  • Quality value: reflects sequencing quality and ranges from 0 to 40. Within this range, a higher value indicates better quality.
  • Q20: the proportion of all bases with a quality value greater than 20, which reflects the quality of the sequenced reads. The closer the value is to 1, the better the sequencing quality.
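Q20 can be computed directly from per-base quality scores. This minimal sketch assumes the qualities are ASCII-encoded Phred scores with offset 33. Older Illumina pipelines used offset 64, so `offset` is a parameter. The text defines Q20 as "greater than 20", whereas Q20 is commonly counted as Q ≥ 20; the `strict` flag switches between the two. By the Phred definition, a score of 20 corresponds to a 1% error probability.

```python
from typing import Iterable, List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into Phred scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: float) -> float:
    """Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def q20_fraction(qualities: Iterable[str], offset: int = 33, strict: bool = False) -> float:
    """Fraction of all bases at Q20 or above (strictly above 20 if strict=True)."""
    total = passed = 0
    for quality in qualities:
        for q in phred_scores(quality, offset):
            total += 1
            if (q > 20) if strict else (q >= 20):
                passed += 1
    if total == 0:
        raise ValueError("no bases supplied")
    return passed / total
```

For example, the quality string "II#5" decodes to scores 40, 40, 2, and 20. That gives a Q20 fraction of 0.75 with the inclusive definition and 0.5 with the strict one.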
  • six libraries had the tags introduced at the 5' (P5) end: METqdiMBEPEI-46, METqdiMBGPEI-137, METqdiMBDPEI-37, METqdiMBFPEI-122, METqdiMBBPEI-169, and METqdiMBBPEI-109. For all six, the combined proportion %Index 0 mismatch + %Index 1 mismatch exceeded 90%, and the tag primer sequences were sequenced normally. This demonstrates that the nucleic acid tags, tag primers, library construction method, and sequencing method provided by the present invention are usable.
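The >90% criterion above combines the two index metrics defined earlier. This minimal sketch assumes that the index reads of one library are compared against that library's single expected tag. The function names are hypothetical, and this is not the software that produced the reported statistics.

```python
from typing import Dict, Iterable


def mismatches(read: str, tag: str) -> int:
    """Hamming distance between an index read and the expected tag."""
    if len(read) != len(tag):
        raise ValueError("index read and tag must have equal length")
    return sum(a != b for a, b in zip(read, tag))


def index_mismatch_percentages(index_reads: Iterable[str], expected_tag: str) -> Dict[str, float]:
    """Percentages of index reads with 0 and with exactly 1 mismatch to the expected tag."""
    total = zero = one = 0
    for read in index_reads:
        total += 1
        d = mismatches(read, expected_tag)
        if d == 0:
            zero += 1
        elif d == 1:
            one += 1
    if total == 0:
        raise ValueError("no index reads supplied")
    return {"%Index 0 mismatch": 100 * zero / total,
            "%Index 1 mismatch": 100 * one / total}


def passes_threshold(percentages: Dict[str, float], threshold: float = 90.0) -> bool:
    """True if %Index 0 mismatch + %Index 1 mismatch exceeds the threshold."""
    return percentages["%Index 0 mismatch"] + percentages["%Index 1 mismatch"] > threshold
```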
  • Figures 2-5 compare sequencing quality between two sets of libraries. The first set comprises the libraries of this example with the tags of the present invention introduced at the 5' end, represented by METqdiMBEPEI-46 and METqdiMBGPEI-137. The second set comprises libraries with conventional tags introduced at the 3' end (CSZPE0120821001 and CSZPE0120821002).
  • FIG. 2 shows how the quality value changes with cycle number for libraries using the 5'-end (P5-end) tag primer sequences of the present invention and for libraries using a common tag. The abscissa indicates the cycle number and the ordinate the quality value;
  • FIG. 3 shows how the light intensity changes with cycle number for libraries using the 5'-end (P5-end) tag primer sequences of the present invention and for libraries using a common tag. The abscissa indicates the cycle number and the ordinate the light intensity.
  • FIG. 4 shows the base distribution by cycle number for libraries using the 5'-end (P5-end) tag primer sequences of the present invention and for libraries using a common tag. The abscissa (Reads) indicates the cycle number, and the ordinate (Percent) the percentage of each base in that cycle.
  • FIG. 5 shows how the error rate changes with cycle number for libraries using the 5'-end (P5-end) tag primer sequences of the present invention and for libraries using a common tag. The abscissa (Position along reads) indicates the cycle number. The ordinate (% Error-rate) indicates the error rate, i.e., the proportion of sequencing errors in that cycle. The solid line shows the error rate (Error rate), and the dashed line shows the proportion of nucleotides that could not be called (Blank rate). The figure compares the error rate of the different libraries along the reads (error-rate along reads).
  • Sample purity: OD260/280 should be between 1.8 and 2.0, with no protein, polysaccharide, or RNA contamination;
  • Sample amount: the minimum requirement is 50 ng.
  • the sample DNA was fragmented with transposomes into fragments of about 300 bp.
  • the specific conditions are as follows:
  • the fragment size was checked with a Bioanalyzer and ranged from approximately 150 bp to 1 kb.
  • the 50 ⁇ primer mixture includes the following:
  • PC primer 1 is: 5'-AATGATACGGCGACCACCGA (SEQ ID NO: 20);
  • PC primer 2 is: 5'-CAAGCAGAAGACGGCATACGA (SEQ ID NO: 21).
  • NNNNNNNN is at least one of the tag sequences shown in SEQ ID NOs: 1 to 6.
  • the method for constructing a nucleic acid sequencing library of the present invention can be effectively applied to the construction and sequencing of a DNA sequencing library of sample DNA, and the obtained library has good quality and accurate sequencing results.

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Biomedical Technology (AREA)
  • Wood Science & Technology (AREA)
  • Biotechnology (AREA)
  • General Engineering & Computer Science (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Plant Pathology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Provided are a method for constructing a nucleic acid sequencing library and applications thereof. The method comprises the following steps:
1. fragmenting a nucleic acid sample to obtain DNA fragments;
2. performing end repair on the DNA fragments to obtain end-repaired DNA fragments;
3. adding a base A to the ends of the end-repaired DNA fragments to obtain DNA fragments with an A sticky end;
4. ligating the two ends of the DNA fragments with the A sticky end to a first adaptor and a second adaptor, respectively, to obtain adaptor-ligated products;
5. amplifying the adaptor-ligated products using a 5' primer and a 3' primer to obtain amplification products; and
6. isolating the amplification products, which constitute the nucleic acid sequencing library.

At least one of the first adaptor, the second adaptor, the 5' primer, and the 3' primer comprises a nucleic acid tag, so that the amplification products contain at least one nucleic acid tag.

Description

Method for constructing a nucleic acid sequencing library and applications thereof
Priority information
None
Technical field
The present invention relates to the field of biotechnology, and in particular to a method of constructing a nucleic acid sequencing library and its uses. More particularly, it relates to a method of constructing a nucleic acid sequencing library, a nucleic acid sequencing library, a nucleic acid sequencing method, and a method of determining nucleic acid sequence information.
Background art
Second-generation sequencing technologies, represented by Illumina Solexa, AB SOLiD, and Roche 454, have greatly reduced the cost of sequencing. They have developed rapidly in recent years and have become important tools for genomics research. Compared with Sanger sequencing, which is based on chain termination, second-generation sequencing adopts a sequencing-by-synthesis strategy. Its defining feature is high throughput: hundreds of millions of DNA fragments can be sequenced simultaneously. A single high-throughput sequencer can currently generate up to 200 Gb of data in one run, equivalent to sequencing a human genome 65 times over. In this approach, the genome is first broken into a series of small fragments by sonication or other methods, and adapters are added to both ends of the fragments. Bridge PCR or emulsion PCR with adapter primers then generates the basic sequencing units. Finally, universal sequencing primers designed from part of the adapter sequence are used to sequence the genomic DNA.
However, the methods currently used to construct sequencing libraries from nucleic acid samples and to sequence them still need improvement.
Summary of the invention
The present invention aims to solve at least one of the technical problems in the prior art.
In a first aspect, the present invention provides an isolated nucleic acid tag consisting of an oligonucleotide having a sequence shown in SEQ ID NOs: 1-6. According to embodiments of the present invention, six nucleic acid tags are thus provided: ACTCTTAC (SEQ ID NO: 1), GATGGACT (SEQ ID NO: 2), TATGGTAG (SEQ ID NO: 3), CCATATCC (SEQ ID NO: 4), CTAGCGCT (SEQ ID NO: 5), and ATATAAGA (SEQ ID NO: 6). The inventors found that ligating a nucleic acid tag of the present invention to DNA fragments of a nucleic acid sample, or their equivalents, yields a tagged nucleic acid sequencing library. Sequencing this library yields both the sequence of the nucleic acid sample and the sequence of the tag, and the tag sequence accurately identifies the source of each sample. Using these tags, sequencing libraries for multiple nucleic acid samples can be constructed at the same time. The libraries derived from different samples can then be mixed and sequenced together, and the resulting sequences can be sorted by tag to obtain the nucleic acid sequence information of each sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, which can sequence nucleic acids from many samples simultaneously. It thereby increases the efficiency and throughput of high-throughput sequencing and reduces the cost of determining the sequence information of nucleic acid samples.
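Reads can be assigned to these six tags despite sequencing errors only if the tags are far enough apart. This minimal sketch uses the standard Hamming-distance argument, which is not a criterion stated in the source. SEQ ID NO: 6 is taken as the eight-base ATATAAGA, as written in the Chinese text. The smallest pairwise distance between the eight-base tags is 4, between GATGGACT and TATGGTAG. An index read with a single mismatch therefore stays closer to its own tag than to any other.

```python
from itertools import combinations
from typing import Sequence

# Eight-base tags of SEQ ID NOs: 1-6.
TAGS = ["ACTCTTAC", "GATGGACT", "TATGGTAG", "CCATATCC", "CTAGCGCT", "ATATAAGA"]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length tags differ."""
    if len(a) != len(b):
        raise ValueError("tags must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(tags: Sequence[str]) -> int:
    """Smallest Hamming distance between any two tags in the set."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


def correctable_errors(min_distance: int) -> int:
    """Mismatches that can be corrected unambiguously: floor((d - 1) / 2)."""
    return (min_distance - 1) // 2
```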
In a second aspect, the present invention provides a set of isolated PCR primers consisting of oligonucleotides having the sequences shown in SEQ ID NOs: 7-12. When this set of PCR primers is used as 5' primers, PCR introduces the primers, and with them the nucleic acid tags, at the 5' end of a nucleic acid sequencing library. This links a nucleic acid tag to the DNA fragments of a nucleic acid sample, or their equivalents, and yields a tagged library. Sequencing this library yields both the sequence of the nucleic acid sample and the sequence of the tag, and the tag sequence accurately identifies the source of each sample. Libraries from multiple samples can thus be constructed simultaneously, mixed, sequenced together, and sorted by tag to obtain the nucleic acid sequence information of each sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, increases sequencing efficiency and throughput, and reduces the cost of determining nucleic acid sequence information. Note that PCR primers are sometimes also referred to herein as "tag primers" or "PCR tag primers".
In a third aspect, the present invention provides a set of isolated PCR primers consisting of oligonucleotides having the sequences shown in SEQ ID NOs: 13-18. When this set of PCR primers is used as 3' primers, PCR introduces the tag primers described above at the 3' end of a nucleic acid sequencing library. This links a nucleic acid tag to the DNA fragments of a nucleic acid sample, or their equivalents, and yields a tagged library. Sequencing this library yields both the sequence of the nucleic acid sample and the sequence of the tag, and the tag sequence accurately identifies the source of each sample. Libraries from multiple samples can thus be constructed simultaneously, mixed, sequenced together, and sorted by tag to obtain the nucleic acid sequence information of each sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, increases sequencing efficiency and throughput, and reduces the cost of determining nucleic acid sequence information.
In a fourth aspect, the present invention provides a method of constructing a nucleic acid sequencing library. According to embodiments of the present invention, the method comprises the following steps:
1. fragmenting a nucleic acid sample to obtain DNA fragments;
2. end-repairing the DNA fragments to obtain end-repaired DNA fragments;
3. adding a base A to the ends of the end-repaired DNA fragments to obtain DNA fragments with an A sticky end;
4. ligating the two ends of the DNA fragments with the A sticky end to a first adapter and a second adapter, respectively, to obtain adapter-ligated products;
5. amplifying the adapter-ligated products with a 5' primer and a 3' primer to obtain amplification products; and
6. isolating the amplification products, which constitute the nucleic acid sequencing library.

At least one of the first adapter, the second adapter, the 5' primer, and the 3' primer comprises a nucleic acid tag, so that the amplification products contain at least one nucleic acid tag. According to some embodiments, the nucleic acid tag is at least one of the oligonucleotides having the sequences shown in SEQ ID NOs: 1-6. The tags shown in SEQ ID NOs: 1-6 are preferably introduced into both the 5' primer and the 3' primer. In some preferred examples, the 5' primer is at least one of the oligonucleotides shown in SEQ ID NOs: 7-12, and the 3' primer is at least one of the oligonucleotides shown in SEQ ID NOs: 13-18.
The method can efficiently construct a nucleic acid sequencing library for nucleic acid sequencing. It can also efficiently introduce the tag primers described above into at least one of the 3' end and the 5' end of the library by PCR, thereby linking a nucleic acid tag to the DNA fragments of a nucleic acid sample, or their equivalents, to obtain a tagged library. Sequencing this library yields both the sequence of the nucleic acid sample and the sequence of the tag, and the tag sequence accurately identifies the source of each sample. Sequencing libraries for multiple nucleic acid samples can therefore be constructed simultaneously, mixed, sequenced together, and sorted by tag to obtain the nucleic acid sequence information of each sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, increases efficiency and throughput, and reduces the cost of determining nucleic acid sequence information. In addition, the inventors surprisingly found that the resulting sequencing data are highly stable and reproducible when the above method is used to construct libraries containing different nucleic acid tags from the same sample, using oligonucleotides that carry different tags. Multiple samples can therefore be sequenced in the same reaction system.
According to embodiments of the present invention, the method of constructing a sequencing library may further have the following additional technical features. In one embodiment of the present invention, the nucleic acid sample is a genomic DNA sample.
In one embodiment of the present invention, the nucleic acid sample is a human genomic DNA sample.
In one embodiment of the present invention, the DNA fragments are 100-800 bp in length.
In one embodiment of the present invention, the fragmentation is carried out by at least one of nebulization, sonication, HydroShear, and enzymatic digestion.
In one embodiment of the present invention, end repair of the DNA fragments is carried out using Klenow, T4 polymerase, and T4 polynucleotide kinase.
In one embodiment of the present invention, the base A is added to the ends of the end-repaired DNA fragments using Klenow Fragment (3'→5' exo-) polymerase.
In one embodiment of the present invention, the amplification uses the oligonucleotides shown in SEQ ID NOs: 7-12 as 5' primers and the oligonucleotides shown in SEQ ID NOs: 13-18 as 3' primers.
In one embodiment of the present invention, the amplification products are isolated by electrophoresis on a 2% agarose gel followed by purification.
In a fifth aspect, the present invention provides a nucleic acid sequencing library constructed by the method of constructing a sequencing library described above. The inventors found that the constructed genome sequencing library is suitable for second-generation sequencing technology, especially Solexa sequencing. Libraries derived from different samples can therefore be mixed, sequenced simultaneously, and sorted by nucleic acid tag to obtain the sequence information of each sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, increases efficiency and throughput, and reduces the cost of determining nucleic acid sequence information.
In a sixth aspect, the present invention provides a nucleic acid sequencing method. According to embodiments of the present invention, the method comprises constructing a sequencing library from a nucleic acid sample by the method described above and then sequencing that library. The inventors found that the constructed genome sequencing library is suitable for second-generation sequencing technology, especially Solexa sequencing. Using tagged-library techniques, libraries derived from different samples can be mixed, sequenced simultaneously, and sorted by nucleic acid tag to obtain the sequence information of each sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, increases efficiency and throughput, and reduces the cost of determining nucleic acid sequence information.
According to a preferred embodiment of the nucleic acid sequencing method of the present invention, sequencing is performed on a second-generation sequencing platform. The library constructed in this embodiment carries tags at both the 3' end and the 5' end and is sometimes referred to herein as a "dual-tag library". By contrast, a common tagged library is a single-tag library, in which the tag is introduced at the 3' end of the library by adapter ligation or PCR. With a common single-tag library, N tags allow N libraries to be pooled and sequenced together. With the same N tags, the dual-tag library of the present invention allows N × N libraries to be pooled and sequenced.
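The N × N capacity follows from identifying each library by its pair of tags. This minimal sketch makes two assumptions: the same N tags are available at both ends, and every (5'-end tag, 3'-end tag) combination is usable. With the six tags of SEQ ID NOs: 1-6, this gives 36 combinations, compared with 6 for a single-tag library. The function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Eight-base tags of SEQ ID NOs: 1-6.
TAGS = ["ACTCTTAC", "GATGGACT", "TATGGTAG", "CCATATCC", "CTAGCGCT", "ATATAAGA"]


def dual_tag_pairs(tags_5prime: Sequence[str], tags_3prime: Sequence[str]) -> List[Tuple[str, str]]:
    """All (5'-end tag, 3'-end tag) combinations available for pooling."""
    return list(product(tags_5prime, tags_3prime))


def assign_libraries(library_names: Sequence[str], tags_5prime: Sequence[str],
                     tags_3prime: Sequence[str]) -> Dict[Tuple[str, str], str]:
    """Give each library a distinct tag pair; fail if there are more libraries than pairs."""
    pairs = dual_tag_pairs(tags_5prime, tags_3prime)
    if len(library_names) > len(pairs):
        raise ValueError(f"{len(library_names)} libraries but only {len(pairs)} tag pairs")
    return dict(zip(pairs, library_names))
```

After demultiplexing, a read is attributed to a library only when both of its tags match that library's pair.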
In addition, according to some embodiments of the present invention, the dual-tag library of the present invention is sequenced differently from a common single-tag library. A common single-tag library is generally sequenced as follows. First, the tag is introduced at the 3' end of the library by adapter ligation or PCR. The single-tag library is then subjected to sequencing by synthesis (SBS), for example on the Illumina Solexa/HiSeq platform, where a sequencing primer is extended while the bases are read. The synthesis/reading direction is 5'→3', and the sequences read are, in order, the 5'-terminal sequence of the insert, the tag sequence, and the 3'-terminal sequence. According to a preferred embodiment of the present invention, every synthesis/read performed when sequencing the dual-tag library is likewise in the 5'→3' direction.
According to an embodiment of the present invention, sequencing the sequencing library of the present invention (i.e., the dual-tag library) further comprises sequencing from the 5' end of the library to obtain, in order: first sequencing data at the 5' end, the 3'-end tag sequence, the 5'-end tag sequence, and second sequencing data at the 3' end. Specifically, according to an embodiment of the present invention, sequencing the dual-tag library further comprises:

- sequencing with a first sequencing primer to obtain the first sequencing data at the 5' end;
- sequencing with a second sequencing primer to obtain the 3'-end tag sequence;
- obtaining the 5'-end tag sequence during synthesis of the library coding strand;
- sequencing with a third sequencing primer to obtain the second sequencing data at the 3' end.

The first sequencing primer binds to the 3' end of the library template strand, the second sequencing primer binds to the 5' end of the library template strand, and the third sequencing primer binds to the 3' end of the library coding strand.
According to another embodiment of the present invention, sequencing the dual-tag library further comprises four stages:

1. The first sequencing primer binds to the 3' end of the library template strand. Sequencing by synthesis yields the first sequencing data, Read1, which is the 5'-end sequence of the library coding strand.
2. The second sequencing primer binds to the 5' end of the library template strand. Sequencing yields the first tag sequencing data, which is the sequence of the library's 3'-end tag.
3. The strand is deblocked, and the library coding strand is synthesized from an oligonucleotide on the sequencing chip (flow cell) that matches the 3' end of the library template strand. The second tag sequencing data, which is the sequence of the library's 5'-end tag, is read during this synthesis.
4. Once the library coding strand has been synthesized, the third sequencing primer binds to its 3' end. Sequencing by synthesis yields the second sequencing data, Read2, which is the 3'-end sequence of the library.
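The four-read order described above can be written down as a small data structure. The Python sketch below only restates that order, and it assumes that each cluster yields exactly four reads in run order. `ReadStep`, `DUAL_TAG_RUN` and `split_cluster` are hypothetical names and are not part of any instrument software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadStep:
    name: str        # label used in this sketch only
    primed_by: str   # what primes synthesis for this read
    binds_to: str    # where that primer binds
    reports: str     # what the read reports


# Read order for the dual-tag library, transcribed from the text above.
DUAL_TAG_RUN = [
    ReadStep("Read1", "first sequencing primer",
             "3' end of template strand", "5'-end sequence of coding strand"),
    ReadStep("Tag read 1", "second sequencing primer",
             "5' end of template strand", "3'-end tag"),
    ReadStep("Tag read 2", "flow-cell oligonucleotide (after deblocking)",
             "3' end of template strand", "5'-end tag"),
    ReadStep("Read2", "third sequencing primer",
             "3' end of coding strand", "3'-end sequence of library"),
]


def split_cluster(reads):
    """Name the four reads of one cluster, given in run order.

    Returns the two insert reads and the (5'-end tag, 3'-end tag) pair.
    """
    if len(reads) != len(DUAL_TAG_RUN):
        raise ValueError("expected one read per step of DUAL_TAG_RUN")
    read1, tag3, tag5, read2 = reads
    return {"read1": read1, "read2": read2, "tag_pair": (tag5, tag3)}


if __name__ == "__main__":
    for step in DUAL_TAG_RUN:
        print(f"{step.name}: {step.reports}")
```

The 3'-end tag is read before the 5'-end tag. For that reason `split_cluster` swaps them when it builds the (5'-end, 3'-end) key.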
In a seventh aspect, the present invention provides a method for determining nucleic acid sequence information. According to an embodiment of the present invention, the method comprises two steps. First, a nucleic acid sample is sequenced according to the method described above to obtain a sequencing result. Second, the sequence information of the nucleic acid sample is determined from that result. In this way, the nucleic acid sequence information of multiple samples can be determined efficiently.
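When samples are pooled, determining per-sample sequence information from the run amounts to sorting reads by their tag pair. The Python sketch below shows one possible approach and is not the method prescribed here. It assumes a sample sheet keyed by (5'-end tag, 3'-end tag) and allows one mismatch per tag. That threshold is an illustrative choice based on the spacing of the Table 1 tags. All names are hypothetical.

```python
from collections import defaultdict

TAGS = ["ACTCTTAC", "TATGGTAG", "GATGGACT", "CCATATCC", "CTAGCGCT", "ATATAAGA"]


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_tag(observed, tags, max_mismatch=1):
    """Return the single known tag within max_mismatch of observed, else None."""
    hits = [t for t in tags
            if len(t) == len(observed) and hamming(observed, t) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


def demultiplex(clusters, sample_sheet, tags=TAGS, max_mismatch=1):
    """Bin reads by sample using the (5'-end tag, 3'-end tag) pair.

    clusters: iterable of (read, observed 5'-end tag, observed 3'-end tag)
    sample_sheet: dict mapping (5'-end tag, 3'-end tag) -> sample name
    Reads whose tag pair cannot be resolved go to "undetermined".
    """
    bins = defaultdict(list)
    for read, tag5, tag3 in clusters:
        key = (match_tag(tag5, tags, max_mismatch),
               match_tag(tag3, tags, max_mismatch))
        bins[sample_sheet.get(key, "undetermined")].append(read)
    return dict(bins)


if __name__ == "__main__":
    sheet = {("ACTCTTAC", "TATGGTAG"): "S1", ("GATGGACT", "CCATATCC"): "S2"}
    clusters = [
        ("r1", "ACTCTTAC", "TATGGTAG"),
        ("r2", "ACTCTTAA", "TATGGTAG"),  # one-base error in the 5'-end tag
        ("r3", "GATGGACT", "CCATATCC"),
        ("r4", "NNNNNNNN", "CCATATCC"),  # unreadable 5'-end tag
    ]
    print(demultiplex(clusters, sheet))
```

`match_tag` returns a tag only when exactly one candidate lies within the mismatch limit. An ambiguous read is therefore sent to "undetermined" instead of being assigned to a sample by guesswork.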
In an eighth aspect, the present invention provides a kit for constructing a nucleic acid sequencing library. According to an embodiment of the present invention, the kit comprises:

- a first PCR primer, which is one of the oligonucleotides shown in SEQ ID NOs: 7-12 or the oligonucleotide shown in SEQ ID NO: 19;
- a second PCR primer, which is one of the oligonucleotides shown in SEQ ID NOs: 13-18.

With this set of PCR primers, the tags described above can be introduced by PCR at the 3' end, the 5' end, or both ends of the nucleic acid sequencing library. Linking a nucleic acid tag in this way to a DNA fragment of a nucleic acid sample, or to its equivalent, yields a tagged nucleic acid sequencing library. Sequencing that library yields both the sequence of the nucleic acid sample and the sequence of the tag, and the tag sequence identifies the sample of origin accurately. Sequencing libraries for many nucleic acid samples can therefore be constructed at the same time. The libraries from different samples can be pooled and sequenced together, and the reads can then be classified by tag to obtain the sequence information of each sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, raises the efficiency and throughput of high-throughput sequencing, and lowers the cost of determining nucleic acid sequence information.
Additional aspects and advantages of the present invention are set forth in part in the description that follows. Others will become apparent from that description, or may be learned by practicing the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic flow chart of a method for constructing a sequencing library according to an embodiment of the present invention;
Fig. 2 plots quality value against cycle number for a library using the 5'-end (P5-end) tag primer sequences of the present invention and for a library using common tags, according to an embodiment of the present invention;
Fig. 3 plots light intensity against cycle number for a library using the 5'-end (P5-end) tag primer sequences of the present invention and for a library using common tags, according to an embodiment of the present invention;
Fig. 4 plots base distribution against cycle number for a library using the 5'-end (P5-end) tag primer sequences of the present invention and for a library using common tags, according to an embodiment of the present invention;
Fig. 5 plots error rate against cycle number for a library using the 5'-end (P5-end) tag primer sequences of the present invention and for a library using common tags, according to an embodiment of the present invention.
Detailed description of the invention
Embodiments of the present invention are described in detail below. The terms "first" and "second" are used for descriptive purposes only. They do not indicate or imply relative importance, and they do not implicitly indicate the number of the technical features referred to. A feature defined by "first" or "second" may therefore explicitly or implicitly include one or more such features. Further, in the description of the present invention, "a plurality of" means two or more unless otherwise stated.
Nucleic acid tags and tag primers
In a first aspect, the present invention provides isolated nucleic acid tags, each consisting of an oligonucleotide having a sequence shown in SEQ ID NOs: 1-6. According to embodiments of the present invention, the six nucleic acid tags listed in Table 1 are ACTCTTAC (SEQ ID NO: 1), GATGGACT (SEQ ID NO: 2), TATGGTAG (SEQ ID NO: 3), CCATATCC (SEQ ID NO: 4), CTAGCGCT (SEQ ID NO: 5) and ATATAAGA (SEQ ID NO: 6).

The inventors found that a tagged nucleic acid sequencing library is obtained by linking a nucleic acid tag according to an embodiment of the present invention to a DNA fragment of a nucleic acid sample, or to its equivalent. Sequencing this library yields both the sequence of the nucleic acid sample and the sequence of the tag, and the tag sequence identifies the sample of origin accurately. These tags therefore allow sequencing libraries to be constructed for many nucleic acid samples at the same time. The libraries from different samples can be pooled and sequenced together, and the reads can then be classified by tag to obtain the sequence information of each sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, raises efficiency and throughput, and lowers the cost of determining nucleic acid sequence information.

Tag design must first consider how much the tag sequences differ from one another and how reliably their bases are called. When fewer than six samples are pooled, the G/T content at each base position of the pooled tags must also be considered. In Solexa/HiSeq sequencing, bases G and T are excited by the same light, as are bases A and C, so the "GT" content and the "AC" content at each position must be balanced. Finally, the accuracy and reproducibility of the data output are considered.

The tags of the present invention were designed with all of these factors in mind. They contain no runs of three or more consecutive identical bases, which lowers the error rate during oligonucleotide synthesis and during sequencing. Because each tag is embedded in a primer, hairpin structures and identity with the sequencing primers or their reverse complements were also avoided as far as possible. In addition, the inventors found that the tags, which are 8 bp long, differ from one another at five or more bases. A sequencing or synthesis error at any single one of the 8 bases therefore does not prevent the tag from being identified.
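Three of the design rules above can be checked mechanically. The Python sketch below is a simplified illustration under two stated assumptions. First, "balance" is reduced to requiring at least one A/C base and at least one G/T base at every position of the pooled tags, because the text gives no numeric threshold. Second, the hairpin and primer-identity checks are omitted. Run on the six tags as listed, the sketch finds no homopolymer longer than 2 and finds both channel groups present at every position of the full set. It also reports a minimum pairwise Hamming distance of 4, between TATGGTAG and GATGGACT. A distance of 4 is still enough to assign a tag with a single-base error unambiguously, because the erroneous tag stays at least 3 mismatches away from every other tag. Function names are hypothetical.

```python
from itertools import combinations

TAGS = ["ACTCTTAC", "TATGGTAG", "GATGGACT", "CCATATCC", "CTAGCGCT", "ATATAAGA"]

# Per the text: A and C share one excitation, G and T share the other.
AC = set("AC")
GT = set("GT")


def min_pairwise_distance(tags):
    """Smallest Hamming distance over all pairs of equal-length tags."""
    return min(sum(x != y for x, y in zip(a, b))
               for a, b in combinations(tags, 2))


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base."""
    if not seq:
        return 0
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def channel_balanced(tags):
    """True if every position has at least one A/C base and one G/T base."""
    for column in zip(*tags):
        bases = set(column)
        if not (bases & AC and bases & GT):
            return False
    return True


if __name__ == "__main__":
    print(min_pairwise_distance(TAGS))
    print(max(longest_homopolymer(t) for t in TAGS))
    print(channel_balanced(TAGS), channel_balanced(TAGS[1:3]))
```

The two-tag pool `TAGS[1:3]` fails the balance check because its first position contains only T and G. This is the kind of small-pool case the text warns about.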
Next, the present invention provides PCR primers carrying the above tags. These primers can introduce the tags efficiently by PCR at the 5' end or the 3' end of a sequencing library.
In a second aspect, the present invention provides a set of isolated PCR primers (also referred to as tag primers) consisting of oligonucleotides having the sequences shown in SEQ ID NOs: 7-12. When this set of PCR primers is used as 5' primers, PCR introduces the tags described above at the 5' end of a nucleic acid sequencing library. Linking a nucleic acid tag in this way to a DNA fragment of a nucleic acid sample, or to its equivalent, yields a tagged nucleic acid sequencing library. Sequencing that library yields both the sequence of the nucleic acid sample and the sequence of the tag, and the tag sequence identifies the sample of origin accurately. Sequencing libraries for many nucleic acid samples can therefore be constructed at the same time. The libraries from different samples can be pooled and sequenced together, and the reads can then be classified by tag to obtain the sequence information of each sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, raises efficiency and throughput, and lowers the cost of determining nucleic acid sequence information.
In a third aspect, the present invention provides a set of isolated PCR primers consisting of oligonucleotides having the sequences shown in SEQ ID NOs: 13-18. When this set of PCR primers is used as 3' primers, PCR introduces the tags described above at the 3' end of a nucleic acid sequencing library. Linking a nucleic acid tag in this way to a DNA fragment of a nucleic acid sample, or to its equivalent, yields a tagged nucleic acid sequencing library. Sequencing that library yields both the sequence of the nucleic acid sample and the sequence of the tag, and the tag sequence identifies the sample of origin accurately. Sequencing libraries for many nucleic acid samples can therefore be constructed at the same time. The libraries from different samples can be pooled and sequenced together, and the reads can then be classified by tag to obtain the sequence information of each sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, raises efficiency and throughput, and lowers the cost of determining nucleic acid sequence information.
The nucleotide sequences of the above tags and their primers are summarized in Table 1 below:
Table 1

Tag sequence (index2-1): ACTCTTAC (SEQ ID NO: 1)
5'-end tag primer sequence*: 5'-AATGATACGGCGACCACCGAGATCTACACACTCTTACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 7)
3'-end tag primer sequence**: CAAGCAGAAGACGGCATACGAGATACTCTTACGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 13)

Tag sequence (index2-2): TATGGTAG (SEQ ID NO: 2)
5'-end tag primer sequence: 5'-AATGATACGGCGACCACCGAGATCTACACTATGGTAGCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 8)
3'-end tag primer sequence: CAAGCAGAAGACGGCATACGAGATTATGGTAGGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 14)

Tag sequence (index2-3): GATGGACT (SEQ ID NO: 3)
5'-end tag primer sequence: 5'-AATGATACGGCGACCACCGAGATCTACACGATGGACTTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 9)
3'-end tag primer sequence: CAAGCAGAAGACGGCATACGAGATGATGGACTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 15)

Tag sequence (index2-4): CCATATCC (SEQ ID NO: 4)
5'-end tag primer sequence: 5'-AATGATACGGCGACCACCGAGATCTACACCCATATCCCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 10)
3'-end tag primer sequence: CAAGCAGAAGACGGCATACGAGATCCATATCCGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 16)

Tag sequence (index2-5): CTAGCGCT (SEQ ID NO: 5)
5'-end tag primer sequence: 5'-AATGATACGGCGACCACCGAGATCTACACCTAGCGCTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 11)
3'-end tag primer sequence: CAAGCAGAAGACGGCATACGAGATCTAGCGCTGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 17)

Tag sequence (index2-6): ATATAAGA (SEQ ID NO: 6)
5'-end tag primer sequence: 5'-AATGATACGGCGACCACCGAGATCTACACATATAAGATCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 12)
3'-end tag primer sequence: CAAGCAGAAGACGGCATACGAGATATATAAGAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 18)

Notes:
* The 5'-end tag primer sequence is also referred to herein as the P5-end tag primer sequence.
** The 3'-end tag primer sequence is also referred to herein as the P7-end tag primer sequence.
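Table 1 shows that each tag primer consists of a fixed constant region with the 8-bp tag spliced in. Judging only from the sequences as printed, the tag in each 5'-end primer follows AATGATACGGCGACCACCGAGATCTACAC, and the tag in each 3'-end primer follows CAAGCAGAAGACGGCATACGAGAT. These prefixes are an inference from the table, not a rule stated in the text. The Python sketch below uses the inferred prefixes to recover the tag from a primer sequence, which can serve as a quick consistency check when transcribing primer orders. Names are hypothetical, and `str.removeprefix` requires Python 3.9 or later.

```python
# Constant regions preceding the tag, inferred from the Table 1 sequences.
P5_PREFIX = "AATGATACGGCGACCACCGAGATCTACAC"  # 5'-end (P5-end) tag primers
P7_PREFIX = "CAAGCAGAAGACGGCATACGAGAT"       # 3'-end (P7-end) tag primers
TAG_LEN = 8


def extract_tag(primer, prefix, tag_len=TAG_LEN):
    """Return the tag_len bases immediately after the constant prefix."""
    seq = primer.removeprefix("5'-").replace(" ", "").upper()
    if not seq.startswith(prefix):
        raise ValueError("primer does not begin with the expected constant region")
    return seq[len(prefix):len(prefix) + tag_len]


if __name__ == "__main__":
    seq7 = ("5'-AATGATACGGCGACCACCGAGATCTACACACTCTTAC"
            "TCTTTCCCTACACGACGCTCTTCCGATCT")
    seq13 = ("CAAGCAGAAGACGGCATACGAGATACTCTTACGTGACTGGAG"
             "TTCAGACGTGTGCTCTTCCGATCT")
    print(extract_tag(seq7, P5_PREFIX), extract_tag(seq13, P7_PREFIX))
```

For a matched pair of primers from the same table row, both calls should return the same 8-bp tag.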
To this end, in a further aspect, the present invention provides a kit for constructing a nucleic acid sequencing library. According to an embodiment of the present invention, the kit may comprise:

- a first PCR primer, which is one of the oligonucleotides shown in SEQ ID NOs: 7-12 or the oligonucleotide shown in SEQ ID NO: 19 (5'-AATGATACGGCGACCA…);
- a second PCR primer, which is one of the oligonucleotides shown in SEQ ID NOs: 13-18.

With the PCR primers in this kit, the tags described above can be introduced by PCR at the 3' end, the 5' end, or both ends of the nucleic acid sequencing library. Linking a nucleic acid tag in this way to a DNA fragment of a nucleic acid sample, or to its equivalent, yields a tagged nucleic acid sequencing library. Sequencing that library yields both the sequence of the nucleic acid sample and the sequence of the tag, and the tag sequence identifies the sample of origin accurately. Sequencing libraries for many nucleic acid samples can therefore be constructed at the same time. The libraries from different samples can be pooled and sequenced together, and the reads can then be classified by tag to obtain the sequence information of each sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, raises efficiency and throughput, and lowers the cost of determining nucleic acid sequence information.
Method for constructing a nucleic acid sequencing library and applications thereof
For ease of understanding, a method for constructing a nucleic acid sequencing library with the tags and tag primers of the present invention is described below.
According to an embodiment of the present invention, and referring to Fig. 1, the method for constructing a nucleic acid sequencing library may comprise the following steps:
S100: Fragmentation
The nucleic acid sample is fragmented to obtain DNA fragments.
According to embodiments of the present invention, there is no restriction on the type of nucleic acid sample that can be processed. The method applies to any common biological sample, for example:

- plants, such as Arabidopsis thaliana and rice;
- animals, such as humans and mice;
- microorganisms, such as Escherichia coli.

In one embodiment of the present invention, the nucleic acid sample is a genomic DNA sample. In one embodiment, it is a human genomic DNA sample. In one embodiment, the DNA fragments are 100-800 bp long, preferably with the main band at 500 bp, which further improves the efficiency of library construction and subsequent sequencing. Genomic DNA may be fragmented by any known method. In one embodiment, fragmentation is performed by at least one of nebulization, sonication, HydroShear and enzymatic digestion. Sonication is preferred: the inventors found that it gives fragment lengths that are easy to control and does not affect subsequent sequencing.
S200: End repair
The DNA fragments are end-repaired to obtain end-repaired DNA fragments. Those skilled in the art may use any known end-repair method, and many commercial kits are available. In one embodiment of the present invention, end repair is performed with Klenow, T4 DNA polymerase and T4 polynucleotide kinase.
S300: Adding base A
Base A is added to the ends of the end-repaired DNA fragments to obtain DNA fragments with an A sticky end. According to an embodiment of the present invention, each end-repaired random fragment consists of two oligonucleotide strands, and base A is added at the 3' end of these strands. According to an embodiment, base A can be added at the 3' ends of both strands. In one embodiment of the present invention, A is added to the ends of the end-repaired DNA fragments with Klenow Fragment (3'→5' exo-) polymerase.
S400: Adapter ligation
The two ends of each DNA fragment carrying an A sticky end are ligated to a first adapter and a second adapter, respectively, to obtain an adapter-ligated product. Those skilled in the art can choose the adapters to suit the sequencing platform used, and can follow the manufacturer's instructions for the ligation procedure.
S500: Amplification
The adapter-ligated product is amplified with a 5' primer and a 3' primer to obtain an amplification product. At least one of the first adapter, the second adapter, the 5' primer and the 3' primer contains a nucleic acid tag, so that the amplification product carries at least one nucleic acid tag. According to some embodiments of the present invention, the nucleic acid tag is at least one oligonucleotide selected from those having the sequences shown in SEQ ID NOs: 1-6. According to embodiments of the present invention, the tags shown in SEQ ID NOs: 1-6 are preferably introduced into both the 5' primer and the 3' primer. According to some preferred examples of the present invention:

- the 5' primer is at least one oligonucleotide selected from those having the sequences shown in SEQ ID NOs: 7-12;
- the 3' primer is at least one oligonucleotide selected from those having the sequences shown in SEQ ID NOs: 13-18.
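Under a simplifying model, the layout of a dual-tagged amplification product can be written out directly. The Python sketch below assumes that the 3' ends of the tag primers anneal to the ligated adapter ends. On that assumption, the top strand of the product is the 5' primer, then the insert, then the reverse complement of the 3' primer. The model ignores adapter details and PCR artifacts, and the function names and the 12-bp insert are hypothetical. Under the model, the 5'-end tag appears on the top strand as written, and the 3'-end tag appears as its reverse complement.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def expected_amplicon(p5_primer, insert, p7_primer):
    """Top strand (5'->3') of the final PCR product in a toy model.

    Assumes the 3' ends of both primers are fully covered by the ligated
    adapters, so the product is the 5' primer, then the insert, then the
    reverse complement of the 3' primer.
    """
    return p5_primer + insert + reverse_complement(p7_primer)


if __name__ == "__main__":
    p5 = ("AATGATACGGCGACCACCGAGATCTACACACTCTTAC"
          "TCTTTCCCTACACGACGCTCTTCCGATCT")            # SEQ ID NO: 7
    p7 = ("CAAGCAGAAGACGGCATACGAGATCCATATCC"
          "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT")      # SEQ ID NO: 16
    amplicon = expected_amplicon(p5, "GGGAAACCCTTT", p7)
    # The 5'-end tag appears as written; the 3'-end tag appears reverse-complemented.
    print("ACTCTTAC" in amplicon, reverse_complement("CCATATCC") in amplicon)
```

In this example the 5' primer carries tag index2-1 and the 3' primer carries tag index2-4. The product therefore belongs to one of the 36 pairings that six tags allow.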
According to an embodiment of the present invention, amplification uses one of the oligonucleotides shown in SEQ ID NOs: 7-12, or the oligonucleotide shown in SEQ ID NO: 19, as the 5' primer. It uses one of the oligonucleotides shown in SEQ ID NOs: 13-18 as the 3' primer, so that the amplification product carries the nucleic acid tags described above. In one embodiment of the present invention, amplification uses one of the oligonucleotides shown in SEQ ID NOs: 7-12 as the 5' primer and one of those shown in SEQ ID NOs: 13-18 as the 3' primer. It should be noted that the inventors obtained these tag primers through extensive screening, and that they perform significantly better than other primer combinations.
Finally, the amplification product is isolated; this product constitutes the nucleic acid sequencing library. According to embodiments of the present invention, the method for isolating and recovering the amplification product is not particularly limited. Those skilled in the art can choose suitable methods and equipment according to the characteristics of the product, for example recovering a target fragment of a particular length by electrophoresis. In one embodiment of the present invention, the amplification product is isolated by electrophoresis on a 2% agarose gel followed by purification.
This method efficiently constructs nucleic acid sequencing libraries. Through PCR, it also efficiently introduces the tags described above at the 3' end, the 5' end, or both ends of the library. Linking a nucleic acid tag in this way to a DNA fragment of a nucleic acid sample, or to its equivalent, yields a tagged nucleic acid sequencing library. Sequencing that library yields both the sequence of the nucleic acid sample and the sequence of the tag, and the tag sequence identifies the sample of origin accurately. Sequencing libraries for many nucleic acid samples can therefore be constructed at the same time. The libraries from different samples can be pooled and sequenced together, and the reads can then be classified by tag to obtain the sequence information of each sample. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, raises efficiency and throughput, and lowers the cost of determining nucleic acid sequence information. In addition, the inventors found, to their surprise, that the sequencing data were highly stable and reproducible when libraries carrying different nucleic acid tags were constructed from the same sample by the above method. Multiple samples can therefore be sequenced in the same reaction system.
In yet another aspect, the present invention provides a nucleic acid sequencing library constructed by the library construction method described above. The inventors found that the constructed genomic sequencing library is suitable for second-generation sequencing technology, in particular Solexa sequencing. Libraries derived from different samples can therefore be pooled and sequenced together, and the reads classified by nucleic acid tag to obtain the nucleic acid sequence information of multiple samples. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, which can sequence nucleic acids from multiple samples simultaneously, increasing the efficiency and throughput of high-throughput sequencing and reducing the cost of determining the sequence information of nucleic acid samples.
In yet another aspect, the present invention provides a nucleic acid sequencing method. According to an embodiment of the invention, the method may comprise the following steps: constructing a sequencing library from a nucleic acid sample by the library construction method described above; and sequencing the library. The inventors found that the constructed genomic sequencing library is suitable for second-generation sequencing technology, in particular Solexa sequencing. Libraries derived from different samples can therefore be pooled and sequenced together, and the reads classified by nucleic acid tag to obtain the nucleic acid sequence information of multiple samples. This makes full use of high-throughput sequencing technologies such as Solexa sequencing, increasing the efficiency and throughput of high-throughput sequencing and reducing the cost of determining the sequence information of nucleic acid samples.
According to a preferred embodiment of the invention, sequencing is performed on a second-generation sequencing platform. Note that a library constructed according to the invention with tags at both the 3' and 5' ends can be sequenced by first using a sequencing primer to sequence the end containing the 3' tag, giving sequencing data Read1 for the first end, and then using a sequencing primer to sequence the second end containing the 5' tag, giving sequencing data Read2 for the second end. In this way the library can be sequenced efficiently. Depending on where the tags are placed, sequencing can also be performed directly with the sequencing primers. In one embodiment of the invention, sequencing proceeds from the 5' end: a sequencing primer is first used to sequence from the 5' end, giving the 5'-end sequencing data Read1; the sequence of the 3'-end tag is then obtained, followed by the 5'-end tag sequence; finally, a sequencing primer is used to sequence the 3' end, giving sequencing data Read2. Both tags can thus be read efficiently, which markedly increases the number of samples that can be sequenced simultaneously and makes full use of the throughput of the high-throughput sequencer.
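The gain from reading both tags is combinatorial: with m tags available at the 5' end and n at the 3' end, up to m × n libraries can be pooled and still told apart. The sketch below enumerates the tag combinations and assigns the four reads of one cluster, taken in the order described above, to a sample. The toy tag sequences, sample IDs, and exact-match lookup are all assumptions for illustration.

```python
from itertools import product

# Order in which the four reads of one cluster are produced, as described
# above: Read1 (5' end), 3'-end tag, 5'-end tag, Read2 (3' end).
READ_ORDER = ("read1", "tag_3p", "tag_5p", "read2")


def build_sample_sheet(tags_5p, tags_3p):
    """Give every (5' tag, 3' tag) combination its own hypothetical sample ID."""
    return {pair: f"sample_{i}"
            for i, pair in enumerate(product(tags_5p, tags_3p), start=1)}


def assign(sample_sheet, segments):
    """Map one cluster's four reads to a sample using its exact tag pair."""
    record = dict(zip(READ_ORDER, segments))
    return sample_sheet.get((record["tag_5p"], record["tag_3p"]), "undetermined")


# Toy tags: 2 tags at the 5' end x 3 tags at the 3' end -> 6 distinguishable samples.
sheet = build_sample_sheet(["AAA", "CCC"], ["GGG", "TTT", "ACA"])
print(len(sheet))                                             # 6
print(assign(sheet, ("ACGTTGCA", "TTT", "CCC", "TTGACCAT")))  # sample_5
```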
In yet another aspect, the present invention provides a method for determining nucleic acid sequence information. According to an embodiment of the invention, the method may comprise the following steps: sequencing a nucleic acid sample by the method described above to obtain sequencing results; and determining the sequence information of the nucleic acid sample based on the sequencing results. In this way, nucleic acid sequence information can be determined efficiently for multiple samples.
It should be noted that the method for constructing a nucleic acid sequencing library according to embodiments of the present invention was arrived at by the inventors only through painstaking creative work and optimization.
The solution of the present invention is explained below with reference to examples. Those skilled in the art will appreciate that the following examples merely illustrate the invention and should not be regarded as limiting its scope. Where specific techniques or conditions are not indicated in the examples, the techniques or conditions described in the literature of the field are followed (for example, J. Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, Chinese translation by Huang Peitang et al., Science Press), or the product instructions are followed. Reagents or instruments for which no manufacturer is given are conventional, commercially available products, for example from Illumina.
Solexa/HiSeq tagged libraries and the general method for constructing them
1. Sample testing and acceptance criteria
1.1 Sample testing
Take 1-2 μg of human peripheral blood genomic DNA and measure the sample concentration, OD260/280 ratio, OD260/230 ratio, and related values with a NanoDrop 1000.
Run the sample on an agarose gel.
Based on the electrophoresis result and the measured OD values, judge whether the total amount and quality of the sample are acceptable and whether sample preparation can proceed.
1.2 Sample quality acceptance criteria
Sample purity: OD260/280 between 1.8 and 2.0, with no protein, polysaccharide, or RNA contamination;
Sample concentration: at least 100 ng/μl;
Sample integrity: the DNA should show no degradation;
Sample amount: to ensure library quality, the total amount of sample should be no less than 45 μg.
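The acceptance criteria in section 1.2 amount to a simple pass/fail check. The sketch below is only an illustration under stated assumptions: the field names are hypothetical, contamination by protein, polysaccharide, or RNA is approximated solely through the OD260/280 ratio, and the minimum total amount is passed in as a parameter rather than hard-coded (the value 5 μg in the usage lines is an arbitrary example).

```python
from dataclasses import dataclass


@dataclass
class DnaSample:
    a260_280: float        # OD260/280 ratio measured on the NanoDrop
    conc_ng_per_ul: float  # concentration in ng/ul
    volume_ul: float       # available volume in ul
    degraded: bool         # judged from the agarose gel image


def qc_failures(sample: DnaSample, min_total_ug: float) -> list:
    """Return the acceptance criteria the sample fails (empty list = pass)."""
    failures = []
    if not 1.8 <= sample.a260_280 <= 2.0:
        failures.append("purity: OD260/280 outside 1.8-2.0")
    if sample.conc_ng_per_ul < 100:
        failures.append("concentration below 100 ng/ul")
    if sample.degraded:
        failures.append("integrity: DNA is degraded")
    total_ug = sample.conc_ng_per_ul * sample.volume_ul / 1000  # ng -> ug
    if total_ug < min_total_ug:
        failures.append(f"amount: {total_ug:.1f} ug below {min_total_ug} ug")
    return failures


print(qc_failures(DnaSample(1.9, 150, 50, False), min_total_ug=5))       # []
print(len(qc_failures(DnaSample(1.6, 80, 20, True), min_total_ug=5)))   # 4
```

Returning the list of failed criteria, rather than a single boolean, mirrors the manual step in section 1.1 where the operator decides whether sample preparation can proceed.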
2. Sample fragmentation
Two fragmentation methods can be used, nebulization and Covaris shearing. Both break the sample DNA into fragments of 100-800 bp, with the main band at about 500 bp. If the sample is already fragmented DNA, this step can be skipped.
3. End repair
1) Prepare the end-repair reaction in a 1.5 ml centrifuge tube:
Figure imgf000014_0001
2) Incubate at 20 °C for 30 min in a Thermomixer.
3) Purify on a column with the QIAquick PCR Purification Kit (Qiagen) and elute in 34 μl of elution buffer (EB).
4. A-tailing (adding an "A" base to the ends)
1) Prepare the A-tailing reaction in a 1.5 ml centrifuge tube:
Sample from step 3    32 μl
10× Blue buffer    5 μl
dATP    10 μl
Klenow (3'-5' exo-)    3 μl
Total volume    50 μl
2) Incubate at 37 °C for 30 min in a Thermomixer.
3) Purify on a column with the MinElute PCR Purification Kit (Qiagen) and elute in 12 μl of EB.
5. Adapter ligation
1) Prepare the adapter ligation reaction in a 1.5 ml centrifuge tube:
PE library:
Figure imgf000015_0001
Note: *PE Adapter Oligo Mix is formed by annealing the commonly used sense oligonucleotide PE adapter F with the antisense oligonucleotide PE adapter R.
2) Incubate at 20 °C for 15 min in a Thermomixer.
3) Purify on a column with the QIAquick PCR Purification Kit (Qiagen) and elute in 30 μl of EB.
6. DNA fragment size selection
1) Run the sample from step 5 on a 2% agarose gel at 100 V for 120 min;
2) Excise the gel slice at n + 120 bp (n = insert size);
3) Recover the DNA with the QIAquick Gel Extraction Kit (Qiagen) and elute in 40 μl of EB.
7. PCR reaction
1) Prepare the PCR reaction in a 0.2 ml PCR tube:
PE library:
Figure imgf000015_0002
Note: *PCR Primer 1 is an oligonucleotide of SEQ ID NO: 7-12 (tag primers carrying a tag of the invention), or …TTCCGATCT (SEQ ID NO: 19, a primer without a tag); and
**PCR Primer 2 is an oligonucleotide of SEQ ID NO: 13-18 (tag primers carrying a tag of the invention), or a conventional P7-end tag primer sequence carrying a known tag (from Illumina's Paired-End DNA Sample Prep Kit, catalog no. IP-102-1001).
2) Run the following program in a thermal cycler:
PE library:
98 °C 30 s
10 cycles of:
  98 °C 10 s
  65 °C 30 s
  72 °C 30 s or 50 s
72 °C 5 min
4 °C hold
8. Gel recovery and purification of the PCR product
Run the PCR product on a 2% agarose gel at 100 V for 120 min, excise the gel slice at n + 120 bp (n = insert size), and recover the DNA with the QIAquick Gel Extraction Kit (Qiagen), eluting in 40 μl of EB.
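The "n + 120 bp" rule used for both gel cuts is simple arithmetic, sketched below. The 100-800 bp range comes from the fragmentation step; reading the extra 120 bp as adapter and primer sequence added to the insert is our assumption, and the function names are hypothetical.

```python
# Values taken from the general method; interpreting the extra 120 bp as
# adapter/primer sequence added to the insert is an assumption.
FRAGMENT_RANGE_BP = (100, 800)  # size range after nebulization or Covaris shearing
ADDED_BP = 120                  # the "+120 bp" used when excising the gel slice


def cut_position(insert_bp: int) -> int:
    """Gel position (bp) to excise for a desired insert size n."""
    low, high = FRAGMENT_RANGE_BP
    if not low <= insert_bp <= high:
        raise ValueError(f"insert {insert_bp} bp is outside the sheared range {low}-{high} bp")
    return insert_bp + ADDED_BP


def insert_size(band_bp: int) -> int:
    """Inverse: insert size n implied by a band at band_bp."""
    return band_bp - ADDED_BP


print(cut_position(500))  # 620, for the ~500 bp main band after shearing
print(insert_size(620))   # 500
```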
9. Sequencing
When a DNA PE (paired-end) library is constructed, the sequencing primer is Sequencing Primer 1: 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 23). Sequencing is performed on a HiSeq2000 according to the manufacturer's protocol.
The sequencing results are processed with SOAP.
Example 1
First, libraries carrying the tags shown in Table 1 at the 5' end and a conventional tag at the 3' end were constructed as described in the general method above, using the oligonucleotides of SEQ ID NO: 7-12 as 5' tag primers and the conventional P7-end tag primer sequence as the 3' tag primer. Corresponding to SEQ ID NO: 1-6, they were named, in order: METqdiMBEPEI-46, METqdiMBGPEI-137, METqdiMBDPEI-37, METqdiMBFPEI-122, METqdiMBBPEI-169, and METqdiMBBPEI-109.
Note that in this example the six libraries above were sequenced as follows. First, a first sequencing primer that binds the 3' end of the library template strand was used for sequencing by synthesis, giving the first sequencing data Read1, i.e., the 5'-end sequence of the library. Next, a second sequencing primer that binds the 5' end of the library template strand was used, giving the first tag read, i.e., the sequence of the library's 3'-end tag. The strand was then deblocked and, by means of the oligonucleotides on the sequencing chip that match the 3' end of the library template strand, synthesis could be continued with the second sequencing primer while the library coding strand was synthesized, giving the second tag read, i.e., the sequence of the library's 5'-end tag. Finally, after the library coding strand had been synthesized, a third sequencing primer that binds the 3' end of the library coding strand was used for sequencing by synthesis, giving the second sequencing data Read2, i.e., the 3'-end sequence of the library. The sequencing primers used were as follows:
First sequencing primer: 5'-ACACTCTTTCCCTACACGACGCTCTTCCGATCT (SEQ ID NO: 23);
Second sequencing primer: 5'-GATCGGAAGAGCACACGTCTGAACTCCAGTCAC (SEQ ID NO: 24);
Third sequencing primer: 5'-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT (SEQ ID NO: 25).
It should also be noted that, herein, "the 5' end of the library" means the 5' end of the library coding strand, and "the 3' end of the library" means the 3' end of the library coding strand. Unless otherwise specified, "3' end" or "5' end" refers to the 3' or 5' end of the library coding strand.
Next, using the oligonucleotide of SEQ ID NO: 19 as the 5' primer (without a tag) and the conventional P7-end tag primer sequence as the 3' tag primer, libraries with no tag at the 5' end and a conventional tag at the 3' end were constructed as described in the general method and named CSZPE0120821001 and CSZPE0120821002.
The eight libraries above were sequenced and evaluated with Solexa/HiSeq technology; the results are summarized in Table 2 below.
Table 2: Sequencing results of the Solexa/HiSeq tagged libraries
Figure imgf000017_0001
Note: RawCluster/Tile: number of identifiable DNA clusters per tile over the entire sequencing run;
%PF: percentage of DNA clusters passing filter over the entire sequencing run;
%Error Rate: error rate;
%Align: alignment rate;
%Index 0 mismatch: proportion of tag reads with 0 mismatches;
%Index 1 mismatch: proportion of tag reads with 1 mismatch;
Filtered adaptor: adapter sequences filtered out.
In addition, the quality values of the eight libraries were determined. The quality value (Q-value) reflects sequencing quality and ranges from 0 to 40; within this range, a higher value means better quality. Q20 is the proportion of all bases whose quality value is greater than 20. It reflects the quality of the sequenced reads: the closer it is to 1, the better the sequencing quality.
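The Q-value here is the Phred score, Q = -10 log10(P), where P is the probability that a base call is wrong, so Q20 corresponds to a 1% error probability. The sketch below computes a Q20 fraction from FASTQ-style quality strings. Assumptions: the ASCII offset (33 in current Illumina output; older Illumina pipelines used 64) is a parameter, and because the text says "greater than 20" while the common convention counts Q ≥ 20, both readings are available through a strict flag. The function names are illustrative.

```python
def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred quality value: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode(qual: str, offset: int = 33) -> list:
    """Convert a FASTQ quality string to integer Phred scores."""
    return [ord(ch) - offset for ch in qual]


def q20_fraction(quality_strings, offset: int = 33, strict: bool = False) -> float:
    """Fraction of all bases with quality >= 20 (or > 20 when strict=True)."""
    scores = [q for qual in quality_strings for q in decode(qual, offset)]
    if strict:
        passing = sum(q > 20 for q in scores)
    else:
        passing = sum(q >= 20 for q in scores)
    return passing / len(scores)


reads_qual = ["IIII", "+++5"]  # toy Phred+33 strings: 40,40,40,40 and 10,10,10,20
print(phred_to_error(20))                      # ~0.01, one expected error per 100 bases
print(q20_fraction(reads_qual))                # 0.625
print(q20_fraction(reads_qual, strict=True))   # 0.5
```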
As the table above and the quality-value data show, when the eight libraries were pooled and sequenced on one chip, the mean Q20 of R1 was above 0.8 and the mean Q20 of R2 was very close to this value (R1 and R2 denote Read1 and Read2). The libraries without a tag at the 5' (P5) end (CSZPE0120821001 and CSZPE0120821002) are human DNA, and their alignment rates to the reference sequence were all above 90%. The libraries with introduced tags are mainly human DNA; no alignment rate is given for them. For the six libraries with a tag introduced at the 5' (P5) end (METqdiMBEPEI-46, METqdiMBGPEI-137, METqdiMBDPEI-37, METqdiMBFPEI-122, METqdiMBBPEI-169, METqdiMBBPEI-109), the proportion (Index 0 mismatch + Index 1 mismatch) exceeded 90% and the tag primer sequences were sequenced normally, demonstrating that the nucleic acid tags, tag primers, library construction method, and sequencing method provided by the invention are usable.
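The "%Index 0 mismatch" and "%Index 1 mismatch" columns in Table 2 are per-library tallies of how many tag reads match the expected tag exactly or with a single mismatch. A minimal sketch of that tally follows, assuming equal-length tags; the reads and the tag are invented for illustration.

```python
from collections import Counter


def mismatch_profile(index_reads, expected_tag):
    """Percent of index reads with 0 and with 1 mismatch to the expected tag."""
    counts = Counter()
    for read in index_reads:
        if len(read) != len(expected_tag):
            raise ValueError("index read and tag lengths differ")
        counts[sum(a != b for a, b in zip(read, expected_tag))] += 1
    n = len(index_reads)
    return 100 * counts[0] / n, 100 * counts[1] / n


# Toy data for one library whose expected (hypothetical) tag is ACGTACGT.
index_reads = ["ACGTACGT"] * 18 + ["ACGTACGA", "TTTTACGT"]
pct0, pct1 = mismatch_profile(index_reads, "ACGTACGT")
print(pct0, pct1)        # 90.0 5.0
print(pct0 + pct1 > 90)  # True: meets the ">90%" level reported above
```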
In addition, Figures 2-5 compare the quality of libraries constructed in this example with a tag of the invention introduced at the 5' end (represented by the METqdiMBEPEI-46 and METqdiMBGPEI-137 libraries) with that of libraries carrying a conventional tag introduced at the 3' end (CSZPE0120821001 and CSZPE0120821002). Figure 2 shows how the quality value changes with cycle number for libraries using the 5'-end (P5-end) tag primer sequences of the invention and for libraries using ordinary tags; the abscissa is the cycle number and the ordinate is the quality. Figure 3 shows how the light intensity changes with cycle number for the same libraries; the abscissa (Cycle) is the cycle number and the ordinate is the mean signal intensity (Signal mean). Figure 4 shows how the base distribution changes with cycle number; the abscissa (Position along reads) is the cycle number and the ordinate (Percent) is the percentage of each base in that cycle, i.e., the base composition measured at each position along the reads (base percentage composition along reads). Figure 5 shows how the error rate changes with cycle number; the abscissa (Position along reads) is the cycle number and the ordinate (% Error-rate) is the proportion of sequencing errors in that cycle. The solid line is the error rate (Error rate) and the dashed line is the proportion of bases that could not be analyzed (Blank Rate); the figure shows the differences in error rate between libraries (Error-rate along reads).
The results in Figures 2-5 show no obvious difference between libraries with a tag of the invention introduced at the 5' end and libraries with a conventional tag introduced at the 3' end; using a 5'-end (P5-end) tag does not affect the overall sequencing results of the library.
In addition, the expected cluster density for this run was 4.2 million per tile, with a PF of 90.2%. The data are usable, indicating that the results obtained in this example are reliable.
Example 2: Construction of a transposon library
1. Sample testing and acceptance criteria
1.1 Sample testing
Take 1-2 μg of human peripheral blood genomic DNA and measure the sample concentration, OD260/280 ratio, OD260/230 ratio, and related values with a NanoDrop 1000. Run the sample on an agarose gel. Based on the electrophoresis result and the measured OD values, judge whether the total amount and quality of the sample are acceptable and whether sample preparation can proceed.
1.2 Sample quality acceptance criteria
Sample purity: OD260/280 between 1.8 and 2.0, with no protein, polysaccharide, or RNA contamination;
Sample integrity: the DNA should show no degradation;
Sample amount: the minimum requirement is as low as 50 ng.
2. Sample fragmentation
Fragment the DNA enzymatically with transposomes, cutting the sample DNA into fragments of about 300 bp. The specific conditions are as follows:
Figure imgf000019_0001
3. Purification
Purify on a column with the Zymo DNA Clean & Concentrator-5 Kit.
Figure imgf000019_0002
Check the fragment size with a Bioanalyzer; fragments are roughly 150 bp to 1 kb.
4. PCR reaction
1) Prepare a 50 μl PCR reaction in a PCR tube:
PE or PEI library:
Figure imgf000020_0001
The 50× primer mix contains the following:
PCR Primer 1: 5'-AATGATACGGCGACCACCGA (SEQ ID NO: 20);
PCR Primer 2: 5'-CAAGCAGAAGACGGCATACGA (SEQ ID NO: 21).
Adapter 1: … (SEQ ID NO: 22);
Adapter 2: …AG-3' (where NNNNNNNN is at least one tag sequence selected from SEQ ID NO: 1-6).
2) Run the following program in a thermal cycler:
72 °C 3 min
95 °C 30 s
9 cycles of:
  95 °C 10 s
  62 °C 30 s
  72 °C 3 min
4 °C hold
5. Purification of the PCR product
Purify with AMPure XP beads, washing with freshly prepared 80% ethanol.
6. Sequencing
Sequencing was performed on the HiSeq2000 platform according to the manufacturer's protocol, and the analysis was the same as in Example 1. The results show that the library construction method of the invention can be applied to constructing tagged standard libraries by the Nextera transposon method.
Industrial Applicability
The method of the invention for constructing a nucleic acid sequencing library can be applied effectively to constructing and sequencing DNA sequencing libraries from sample DNA; the resulting libraries are of good quality and the sequencing results are accurate.
Although specific embodiments of the invention have been described in detail, those skilled in the art will understand that, in light of all the teachings disclosed, various modifications and substitutions can be made to those details, and all such changes fall within the scope of protection of the invention. The full scope of the invention is given by the appended claims and any equivalents thereof.
In the description of this specification, references to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example", "some examples", and the like mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Claims

1. A method for constructing a nucleic acid sequencing library, characterized by comprising the following:
fragmenting a nucleic acid sample to obtain DNA fragments;
performing end repair on the DNA fragments to obtain end-repaired DNA fragments;
adding a base A to the ends of the end-repaired DNA fragments to obtain DNA fragments having a sticky end A;
ligating the two ends of the DNA fragments having the sticky end A to a first adapter and a second adapter, respectively, to obtain a ligation product having adapters;
amplifying the ligation product having adapters with a 5' primer and a 3' primer to obtain an amplification product; and isolating the amplification product, the amplification product constituting a nucleic acid sequencing library,
wherein
at least one of the first adapter, the second adapter, the 5' primer, and the 3' primer comprises a nucleic acid tag, so that the amplification product contains at least one nucleic acid tag.
2. The method according to claim 1, characterized in that the nucleic acid tag is at least one selected from oligonucleotides having the sequences shown in SEQ ID NO: 1-6.
3. The method according to claim 1, characterized in that the 5' primer is at least one selected from oligonucleotides having the sequences shown in SEQ ID NO: 7-12, and the 3' primer is at least one selected from oligonucleotides having the sequences shown in SEQ ID NO: 13-18.
4. The method according to claim 3, characterized in that the nucleic acid sample is a genomic DNA sample.
5. The method according to claim 4, characterized in that the nucleic acid sample is a human genomic DNA sample.
6. The method according to claim 3, characterized in that the DNA fragments are 100-800 bp in length.
7. The method according to claim 1, characterized in that the fragmentation is performed by at least one of nebulization, sonication, HydroShear, and enzymatic digestion.
8. The method according to claim 3, characterized in that the end repair of the DNA fragments is performed with Klenow, T4 polymerase, and T4 polynucleotide kinase.
9. The method according to claim 3, characterized in that adding A to the ends of the end-repaired DNA fragments is performed with Klenow Fragment (3'-5' exo-) polymerase.
10. The method according to claim 3, characterized in that the amplification product is isolated by electrophoresis on a 2% agarose gel followed by purification.
11. A nucleic acid sequencing library constructed by the method according to any one of claims 1-10.
12. A nucleic acid sequencing method, characterized by comprising the following steps:
constructing a sequencing library from a nucleic acid sample by the method according to any one of claims 1-10; and
sequencing the sequencing library.
13. The method according to claim 12, characterized in that the sequencing is performed on a second-generation sequencing platform.
14、根据权利要求 13所述的方法, 其特征在于,对所述测序文库进行测序进一步包括: 从所述测序文库的 5'端开始测序, 以便依次获得 5'端的第一测序数据、 3'端标签序列 及 5'端标签序列以及 3'端的第二测序数据。 14. The method according to claim 13, wherein sequencing the sequencing library further includes: starting sequencing from the 5' end of the sequencing library to sequentially obtain the first sequencing data at the 5' end and the 3' end. The tag sequence at the end, the tag sequence at the 5' end, and the second sequencing data at the 3' end.
15. The method according to claim 14, wherein sequencing the sequencing library further comprises the following steps:
sequencing with a first sequencing primer to obtain the first sequencing data at the 5' end;
sequencing with a second sequencing primer to obtain the 3'-end tag sequence;
obtaining the 5'-end tag sequence through synthesis of the library coding strand; and
sequencing with a third sequencing primer to obtain the second sequencing data at the 3' end,
wherein
the first sequencing primer binds to the 3' end of the library template strand,
the second sequencing primer binds to the 5' end of the library template strand, and the third sequencing primer binds to the 3' end of the library coding strand.
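The strand geometry of claim 15 can be checked with a simplified model of one library molecule. The sketch assumes the coding strand reads, 5' to 3', as 5'-end tag + insert + 3'-end tag, omits the adapter and primer-binding sequences entirely, and reports both tags in coding-strand orientation; real instruments may report a tag as its reverse complement. `simulate_reads` is a hypothetical helper, not part of the patent.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_reads(insert: str, tag5: str, tag3: str, read_len: int) -> tuple[str, str, str, str]:
    """Return (read1, 3'-end tag, 5'-end tag, read2) for one library molecule.

    Assumed layout of the coding strand, 5' to 3': tag5 + insert + tag3
    (adapter and primer-binding sequences omitted)."""
    coding = tag5 + insert + tag3
    template = revcomp(coding)  # 5' to 3': revcomp(tag3) + revcomp(insert) + revcomp(tag5)
    # First primer, bound at the template's 3' end: copying the template
    # reproduces the coding strand from the 5' end of the insert.
    read1 = coding[len(tag5):len(tag5) + read_len]
    # Second primer, bound toward the template's 5' end: extension
    # continues across the 3'-end tag.
    tag3_read = coding[len(tag5) + len(insert):]
    # 5'-end tag, read during synthesis of the coding strand (reported here
    # in coding-strand orientation, an assumption).
    tag5_read = coding[:len(tag5)]
    # Third primer, bound at the coding strand's 3' end: copying the coding
    # strand reproduces the template, i.e. the insert reverse-complemented.
    read2 = template[len(tag3):len(tag3) + read_len]
    return read1, tag3_read, tag5_read, read2


print(simulate_reads("AACCGGTTAC", tag5="GATC", tag3="CTAG", read_len=4))
# ('AACC', 'CTAG', 'GATC', 'GTAA')
```

The tuple follows the order of claim 14; the second insert read comes out as the reverse complement of the insert's 3' end, which is why downstream tools reverse-complement it before combining the two reads.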
16. A method for determining nucleic acid sequence information, comprising the following steps:
sequencing a nucleic acid sample by the method according to any one of claims 12-15 to obtain sequencing results; and
determining the sequence information of the nucleic acid sample based on the sequencing results.
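Claim 16 leaves open how sequence information is derived from the reads. One simple possibility, shown only as an assumed example, is to merge the two insert reads when the insert is shorter than their combined length: the second read is reverse-complemented and the longest exact suffix/prefix overlap is used. Production tools also score mismatches and base qualities; `merge_pair` is a hypothetical helper that does neither.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Reconstruct an insert from a read pair by exact overlap.

    read2 is reverse-complemented so both reads are on the coding strand; the
    longest suffix of read1 equal to a prefix of it is taken as the overlap.
    Returns None if no overlap of at least `min_overlap` bases exists."""
    rc2 = revcomp(read2)
    lowest = max(min_overlap, 1)  # an overlap of 0 would silently concatenate
    for olen in range(min(len(read1), len(rc2)), lowest - 1, -1):
        if read1[-olen:] == rc2[:olen]:
            return read1 + rc2[olen:]
    return None


insert = "TTAGCATCAACTGTCGATCCATTA"  # hypothetical 24 bp insert
read1 = insert[:16]                   # 16-cycle read from the 5' end
read2 = revcomp(insert)[:16]          # 16-cycle read from the 3' end
print(merge_pair(read1, read2, min_overlap=5))
```

Merging only applies when the two reads overlap; longer inserts leave a gap that has to be bridged by alignment to a reference or by assembly.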

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2012/086164 WO2014086037A1 (en) 2012-12-07 2012-12-07 Method for constructing nucleic acid sequencing library and applications thereof

Publications (1)

Publication Number Publication Date
WO2014086037A1 (en) 2014-06-12

Family

ID=50882790

Country Status (1)

Country Link
WO (1) WO2014086037A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008093098A2 (en) * 2007-02-02 2008-08-07 Illumina Cambridge Limited Methods for indexing samples and sequencing multiple nucleotide templates
CN102409047A (en) * 2010-09-21 2012-04-11 深圳华大基因科技有限公司 Method for building sequencing library by hybridization

Non-Patent Citations (1)

Title
FAIRCLOTH, B.C. ET AL.: "Not All Sequence Tags Are Created Equal: Designing and Validating Sequence Identification Tags Robust to Indels", PLoS ONE, vol. 7, no. 8, August 2012 (2012-08-01), pages 1-11 *

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN107075513A (en) * 2014-09-12 2017-08-18 深圳华大基因科技有限公司 Isolated oligonucleotides and their use in nucleic acid sequencing
CN107075513B (en) * 2014-09-12 2020-11-03 深圳华大智造科技有限公司 Isolated oligonucleotides and their use in nucleic acid sequencing
CN105506748A (en) * 2016-01-18 2016-04-20 北京百迈客生物科技有限公司 DNA high-throughput sequencing and library construction method
CN106282332A (en) * 2016-08-08 2017-01-04 中国科学院北京基因组研究所 Label and primer for multiple nucleic acid sequencing
CN106282332B (en) * 2016-08-08 2019-11-15 中国科学院北京基因组研究所 Label and primer for multiple nucleic acid sequencing

Similar Documents

Publication Publication Date Title
CN105506125B (en) A DNA sequencing method and a second-generation sequencing library
CN106795514B (en) Bubble joint and application thereof in nucleic acid library construction and sequencing
US10400279B2 (en) Method for constructing a sequencing library based on a single-stranded DNA molecule and application thereof
TWI742059B (en) DNA amplification method
KR102458022B1 (en) Methods of sequencing nucleic acids in mixtures and compositions related thereto
US20210363570A1 (en) Method for increasing throughput of single molecule sequencing by concatenating short dna fragments
WO2020135259A1 (en) Sequencing library construction kit and use method and application thereof
AU2021204166B2 (en) Reagents, kits and methods for molecular barcoding
WO2012116661A1 (en) Dna tag and use thereof
JP7033602B2 (en) Barcoded DNA for long range sequencing
WO2012037880A1 (en) Dna tag and application thereof
WO2012068919A1 (en) Dna library and preparation method thereof, and method and device for detecting snps
TW201321518A (en) Method of micro-scale nucleic acid library construction and application thereof
JP2019501641A (en) Rapid sequencing of short DNA fragments using nanopore technology
EP2844766B1 (en) Targeted dna enrichment and sequencing
WO2012037883A1 (en) Nucleic acid tags and use thereof
WO2013192292A1 (en) Massively-parallel multiplex locus-specific nucleic acid sequence analysis
WO2021052310A1 (en) Dna library construction method
WO2013086964A1 (en) Method for enrichment, library construction and snp analysis of gene regions in complex genome of higher plant
WO2020177012A1 (en) Nucleic acid sequence for direct rna library construction, method for directly constructing sequencing library based on rna samples, and use thereof
WO2012037881A1 (en) Nucleic acid tags and use thereof
AU2013325107A1 (en) Method of producing a normalised nucleic acid library using solid state capture material
WO2012037875A1 (en) Dna tags and use thereof
CN111979307A (en) Targeted sequencing method for detecting gene fusion
WO2021027236A1 (en) Method for constructing dna library and application thereof

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 12889695

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 12889695

Country of ref document: EP

Kind code of ref document: A1