Library, reagent and application for next-generation sequencing quality assessment
Technical field
This application relates to the field of nucleic acid sequencing quality evaluation, and more particularly to a library, a reagent and applications thereof for next-generation sequencing quality assessment.
Background
High-throughput sequencing is a very important technology that plays an essential role in both biological research and clinical practice, and its role in precision medicine is becoming increasingly important. As next-generation sequencing (NGS) gains importance in precision medicine, the demands on its accuracy rise accordingly. Current mainstream NGS platforms, such as Illumina and Ion Proton, can reach an accuracy of 99.9%, but longer read lengths and more complex base composition can both reduce sequencing accuracy. To better serve precision medicine, sequencing technology must be continually improved.
The basic NGS workflow currently comprises construction of the sequencing library, amplification of the library signal, conversion of base signals by the sequencing enzyme into optical signals that the sequencer can detect, and finally conversion of the optical signals back into base information by computer software.
Several steps of this workflow can easily introduce sequencing errors and lower accuracy: (1) during library construction, DNA fragmentation may cause base mutations or deletions, and PCR amplification may introduce incorrect base pairing; (2) library signal amplification is also usually performed by PCR, so the fidelity of the enzyme may likewise introduce sequencing errors; (3) when base signals are converted into electrical signals, the dNTPs are usually modified, so the sequencing enzyme must also be engineered to work with them, which can compromise its fidelity to some extent and thus reduce sequencing accuracy; (4) when the data analysis software finally converts optical signals into base information, factors such as fluorescence background, impurities and weak signals may cause signal-processing errors.
Normally, NGS accuracy can be verified by first-generation (Sanger) sequencing. However, this approach is cumbersome and ill-suited to sequencing technology development, where the error rate introduced at each step of the sequencing workflow needs to be improved in a targeted way.
Summary
The purpose of this application is to provide a new library, reagent and applications thereof for NGS quality assessment.
One aspect of this application discloses a library for NGS quality assessment. The library is a single-stranded DNA library of known sequences having different base features, with adapter sequences and index sequences already joined to it. The single-stranded DNA library of known sequences having different base features includes at least one of high-AT single-stranded DNA, high-GC single-stranded DNA, poly-structure (homopolymer) single-stranded DNA and hairpin-structure single-stranded DNA.
It should be noted that, when sequencing real sequences of unknown composition, various base features that affect sequencing accuracy may be present, such as high AT content, high GC content, poly structures and hairpin structures. This application uses artificial synthesis to create a single-stranded DNA library of known sequences carrying these base features. By comparing the sequencing results with the known sequences, it is possible to determine exactly which sequencing deviations the sequencing platform in use produces, and thus to assess NGS quality. These sequencing deviations can further be corrected in a targeted manner, thereby improving sequencing accuracy.
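The comparison of sequencing results with known sequences can be illustrated with a minimal Python sketch. It assumes each read starts at the first base of its known library sequence and that only substitutions occur; insertions and deletions, which are common in poly structures, would require a proper aligner. The function name `compare_to_reference` and the "REF>READ" labels are illustrative choices, not part of this application.

```python
from collections import Counter


def compare_to_reference(read, reference):
    """Ungapped comparison of one read against its known library sequence.

    Returns (mismatch_rate, substitutions), where substitutions counts
    'REF>READ' labels such as 'G>A'. Assumes the read starts at position 0
    of the reference and contains no insertions or deletions.
    """
    if len(read) > len(reference):
        raise ValueError("read is longer than the known reference")
    substitutions = Counter()
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base != read_base:
            substitutions[f"{ref_base}>{read_base}"] += 1
    rate = sum(substitutions.values()) / len(read) if read else 0.0
    return rate, substitutions
```

For example, `compare_to_reference("ACGAACGT", "ACGTACGT")` returns a mismatch rate of 0.125 with a single "T>A" substitution; tallying these labels across many reads shows which kinds of deviation a platform tends to produce.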
It will be appreciated that, in addition to NGS quality assessment, the library of this application can, as described above, also be used to correct and optimize NGS so as to improve sequencing accuracy or sequencing quality.
It should also be noted that, for ease of use, and to further shorten the library construction workflow and reduce the base errors it introduces, this application preferably joins the adapter sequence and index sequence to the library in advance; that is, the adapter and index sequences are synthesized directly together with the library sequence. This avoids a separate reaction step for adding adapter and index sequences to the library. The specific adapter and index sequences may follow those of existing sequencing platforms and are not limited herein.
Preferably, both ends of the library also carry universal primer binding sequences.
It should be noted that the universal primer binding sequences allow libraries with different sequences to be amplified with the same primer pair. For example, if the six libraries of this application share the same universal primer binding sequences, all six can be amplified with a single primer pair instead of one primer pair per library.
Preferably, the library of this application consists of at least one of the sequences shown in SEQ ID NO.7, SEQ ID NO.8, SEQ ID NO.9, SEQ ID NO.10, SEQ ID NO.11 and SEQ ID NO.12.
It should be noted that the libraries shown in SEQ ID NO.7 to 12 represent one implementation of this application, and these six libraries have been verified to assess and optimize NGS quality effectively. Guided by this application, those skilled in the art can synthesize further libraries on this basis for NGS quality assessment or optimization.
Another aspect of this application discloses a cloning vector comprising a plasmid and an insert fragment, wherein the insert fragment comprises the library of this application.
Preferably, the plasmid is pMD18-T or pMD19-T.
It should be noted that, in a preferred embodiment of this application, the synthetic library sequence is inserted into a plasmid, so the library only needs to be synthesized once; thereafter, library sequences can be obtained in unlimited quantity by plasmid replication.
Another aspect of this application discloses an engineered strain comprising a recipient bacterium and the cloning vector of this application introduced into and maintained in the recipient bacterium.
Preferably, the recipient bacterium used in this application is Escherichia coli.
It should be noted that, once the library is cloned into a plasmid, the single-stranded DNA library only needs to be synthesized once and can then be used indefinitely without resynthesis, reducing synthesis cost. In subsequent use, the required library is obtained simply by culturing the engineered strain and extracting the plasmid. Moreover, because the library sequence already includes the sequencing adapter of the corresponding sequencing platform, it can be sequenced after simple library preparation. The whole process is simple, convenient and highly stable.
Another aspect of this application discloses a reagent for NGS quality assessment, comprising the library, the cloning vector or the engineered strain of this application.
The library, cloning vector and engineered strain of this application can all be used for NGS quality assessment, or for correcting and optimizing NGS to improve sequencing accuracy or sequencing quality; any of them can therefore be made into a kit for convenient use.
Preferably, the reagent of this application further comprises universal primers, the forward primer being the sequence shown in SEQ ID NO.13 and the reverse primer being the sequence shown in SEQ ID NO.14.
It should be noted that the universal primers are designed against the universal primer binding sequences at both ends of the library and can amplify the library or the cloning vector to obtain the library sequence. For ease of use, this application includes the universal primers as a separately packaged component of the kit.
It should also be noted that cloning vectors such as pMD18-T or pMD19-T have their own plasmid amplification primers: primers designed against the plasmid can amplify different insert fragments simultaneously. In that case, no universal primer binding sequences need to be designed at the library ends, and library amplification or sequencing can use the plasmid amplification primers directly, without the separate universal primers of SEQ ID NO.13 and SEQ ID NO.14. The specific approach is not limited herein.
More preferably, the reagent of this application further comprises a splint oligo, which is the sequence shown in SEQ ID NO.15.
It should be noted that the splint oligo serves to circularize the library DNA. One implementation of this application sequences with DNA nanoball (DNB) technology, which requires the library to be circularized. It will be appreciated that, if DNA nanoball technology is not used, the splint oligo can be omitted; this is not specifically limited herein.
Another aspect of this application discloses the use of the library, cloning vector, engineered strain or reagent of this application in: assessing the relationship between bases and sequencing quality; assessing the base preference and accuracy of amplification enzymes; assessing the accuracy of sequencing enzymes; assessing or improving base signal extraction; detecting NGS accuracy; detecting the error rate of each step from library construction to sequencing; or optimizing the protocol of each step from library construction to sequencing.
It should be noted that the library of this application, and the cloning vector, engineered strain and reagent based on it, can be used for NGS quality assessment. The principle is to compare the deviations between the sequencing results and the known library sequences; these deviations can be used to assess sequencing quality and the accuracy of amplification enzymes and sequencing enzymes, or as the basis for optimization.
It will be appreciated that, on the basis of this principle, the library, cloning vector, engineered strain, reagent and the like of this application can be used to assess, detect and optimize each step of the NGS workflow; this is not specifically limited herein.
Another aspect of this application discloses a method for improving nucleic acid sequencing accuracy, comprising: sequencing a single-stranded DNA library of known sequences having different base features; comparing the sequencing results with the known sequences; statistically analyzing the sequencing deviations associated with the different base features; and correcting the sequencing software algorithm according to these deviations, thereby improving nucleic acid sequencing accuracy. The single-stranded DNA library of known sequences having different base features includes at least one of high-AT single-stranded DNA, high-GC single-stranded DNA, poly-structure single-stranded DNA and hairpin-structure single-stranded DNA.
Preferably, the poly-structure single-stranded DNA includes at least one of poly A, poly T, poly G and poly C single-stranded DNA.
Preferably, the single-stranded DNA library is the library of this application.
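The statistical analysis of deviations by base feature can be sketched as follows for one feature, poly structures. This is a minimal illustration under stated assumptions: reads are compared without gaps starting at the first base of the known sequence, and a reference position counts as "homopolymer" when it lies in a run of at least `min_run` identical bases (the default of 4 is an arbitrary choice). Both function names are hypothetical.

```python
def homopolymer_mask(reference, min_run=4):
    """Mark reference positions that lie in a run of >= min_run identical bases."""
    mask = [False] * len(reference)
    start = 0
    while start < len(reference):
        end = start
        while end < len(reference) and reference[end] == reference[start]:
            end += 1
        if end - start >= min_run:
            for pos in range(start, end):
                mask[pos] = True
        start = end
    return mask


def error_rate_by_feature(reads, reference, min_run=4):
    """Mismatch rate inside vs. outside homopolymer runs.

    Ungapped comparison; each read is assumed to start at position 0 of the reference.
    """
    mask = homopolymer_mask(reference, min_run)
    tallies = {"homopolymer": [0, 0], "other": [0, 0]}  # [errors, bases compared]
    for read in reads:
        for pos, base in enumerate(read[: len(reference)]):
            tally = tallies["homopolymer" if mask[pos] else "other"]
            tally[1] += 1
            if base != reference[pos]:
                tally[0] += 1
    return {feature: (err / n if n else 0.0) for feature, (err, n) in tallies.items()}
```

The same pattern extends to other features (for example, windows of high GC content or predicted hairpin stems) by replacing the mask.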
It should be noted that the method of this application for improving nucleic acid sequencing accuracy in fact applies the principle of this application, using its library to assess NGS quality and thereby optimize and improve sequencing accuracy. It will be appreciated that, on the same principle and building on the "method for improving nucleic acid sequencing accuracy" of this application, this application can also provide a method for nucleic acid sequencing quality assessment, a method for assessing the relationship between bases and sequencing quality, a method for assessing the base preference and accuracy of amplification enzymes, a method for assessing sequencing enzyme accuracy, a method for assessing or improving base signal extraction, a method for detecting NGS accuracy, a method for detecting the error rate of each step from library construction to sequencing, or a method for optimizing the protocol of each such step; this is not specifically limited herein.
It should also be noted that, just as the method of this application can improve nucleic acid sequencing accuracy, it can also be used to assess the base preference and accuracy of an amplification enzyme. For example, by comparing the sequencing results of the single-stranded DNA library before and after amplification with the enzyme, the enzyme's contribution to sequencing deviation can be analyzed, allowing its accuracy to be assessed; analyzing the specific types of sequencing deviation reveals the enzyme's base preference. Sequencing enzyme accuracy is assessed on a similar principle. In addition, the key to how the method improves sequencing accuracy is that, after the sequencing results are compared with the known sequences, the sequencing software algorithm is corrected, and this includes the processing of base signal extraction; the method can therefore be used to improve or assess base signal extraction.
By adopting the above technical solutions, this application has the following beneficial effects:
The library of this application is designed to contain various structurally controlled base features. Sequencing these known sequences makes it possible to assess how different base features affect NGS and which deviations they cause, thereby assessing NGS quality; these deviations can then be corrected in a targeted way to optimize NGS. The method of this application for improving nucleic acid sequencing accuracy uses the base features of the library of this application: by comparing the sequencing results with the known library sequences, it obtains the sequencing deviations for each base feature, which can guide improvements to the sequencing software algorithm and thus improve sequencing accuracy. The method can effectively reduce sequencing deviation and provides a simple and effective way to improve sequencing accuracy.
Brief description of the drawings
Fig. 1 shows the sequencing results for the first 50 bp of the library of SEQ ID NO.7 in an embodiment of this application;
Fig. 2 shows the sequencing results for the first 50 bp of the library of SEQ ID NO.8 in an embodiment of this application;
Fig. 3 shows the sequencing results for the first 50 bp of the library of SEQ ID NO.9 in an embodiment of this application;
Fig. 4 shows the sequencing results for the first 50 bp of the library of SEQ ID NO.10 in an embodiment of this application;
Fig. 5 shows the sequencing results for the first 50 bp of the library of SEQ ID NO.11 in an embodiment of this application;
Fig. 6 shows the sequencing results for the first 50 bp of the library of SEQ ID NO.12 in an embodiment of this application;
Fig. 7 shows the Q30 distribution of the high-GC library in an embodiment of this application.
Detailed description of embodiments
Through extensive experiments and research, this application found that, in the actual sequencing of diverse targets, the complexity of base composition is currently an important factor affecting NGS quality. For sequences with evenly distributed AT and GC and few poly and hairpin structures, Illumina and Ion Proton can reach 99.9% accuracy; however, for sequences with high AT content, high GC content, or many poly and hairpin structures, sequencing accuracy drops markedly and may fail to meet the demand for accurate sequencing in precision medicine.
To address this, this application proposes and develops a single-stranded DNA library of known sequences with different base features, whose sequences deliberately include specially designed base features such as high AT content, high GC content, poly structures and hairpin structures; in one implementation of this application, these are the six single-stranded DNAs shown in SEQ ID NO.7 to SEQ ID NO.12. The designed library, whose known sequences carry specific base features, is subjected to NGS, and the deviations between the sequencing results and the known library sequences are analyzed and compared. This reveals NGS accuracy or sequencing quality for each base feature, and the deviations found can be corrected to optimize NGS.
It should be noted that, before constructing the library, this application first designed a set of standard nucleic acids containing the various base features required by the library; part or all of the sequences of this set are then selected for library construction. In one implementation of this application, the standard nucleic acids consist of at least one of six single-stranded DNAs, whose sequences are, in order, those shown in SEQ ID NO.1, SEQ ID NO.2, SEQ ID NO.3, SEQ ID NO.4, SEQ ID NO.5 and SEQ ID NO.6. The libraries of SEQ ID NO.7 to 12 in this application correspond, in order, to the standard nucleic acids of SEQ ID NO.1 to 6.
This application is described in further detail below by way of specific examples. The following examples merely illustrate this application further and should not be construed as limiting it.
Example
This example first designed a set of standard nucleic acid sequences containing base features such as high AT content, high GC content, poly structures and hairpin structures, then designed libraries for these standard sequences, adding the BGISEQ adapter sequence, index sequences and universal primer binding sequences to the library sequences. The designed library sequences were synthesized, inserted into the pMD19-T plasmid and introduced into Escherichia coli (E. coli) to produce an engineered strain. Plasmids were extracted from the engineered strain to obtain the library sequences, which were then sequenced by NGS to assess sequencing quality. Details are as follows:
I. Design of standard nucleic acids
Based on base features commonly encountered in actual sequencing, such as high AT content, high GC content, poly structures and hairpin structures, this example designed six standard nucleic acid sequences, each assigned a different index sequence, as shown in Table 1.
Table 1 Sequences of the standard nucleic acids
The six standard nucleic acid sequences of this example comprise two high-GC sequences, two high-AT sequences and two random sequences. The two random sequences are generic sequences with similar A, C, G and T contents and serve as controls for comparative analysis. Each standard nucleic acid sequence corresponds to one index (barcode) sequence used to distinguish the sequences. The two high-GC sequences and the two high-AT sequences contain hairpin structures and poly structures.
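When designing such standard sequences, candidates can be screened for the intended features before synthesis. The sketch below computes GC content, the longest homopolymer (poly structure) and simple hairpin candidates. It is illustrative only: the stem length of 6 and loop range of 3 to 20 bases are arbitrary assumptions, the hairpin search is purely sequence-based and does not compute folding energy (a dedicated folding tool would be needed for that), and all function names are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (output in uppercase)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_content(seq):
    """Fraction of G and C bases."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base (a poly structure)."""
    best, run, previous = 0, 0, ""
    for base in seq.upper():
        run = run + 1 if base == previous else 1
        previous = base
        best = max(best, run)
    return best


def find_hairpin_stems(seq, stem=6, min_loop=3, max_loop=20):
    """Candidate hairpins: a stem-length k-mer whose reverse complement occurs
    downstream after a loop of min_loop..max_loop bases.

    Returns (stem_start, partner_start) pairs, keeping the first partner found
    for each stem start.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        partner = revcomp(seq[i:i + stem])
        lo = i + stem + min_loop
        hi = min(len(seq), i + stem + max_loop + stem)
        j = seq.find(partner, lo, hi)
        if j != -1:
            hits.append((i, j))
    return hits
```

For example, the high-AT standard nucleic acid SEQ ID NO.1 has a GC content of about 0.23 and contains a run of five G bases.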
II. Design and construction of library sequences
For each of the six standard nucleic acid sequences designed in this example, most of the sequence was selected to build a library, an adapter sequence suitable for BGISEQ was inserted into each library, and the same universal primer binding sequences were attached to both ends of all six library sequences. The library sequences designed for the six single-stranded DNA standard nucleic acids of SEQ ID NO.1 to SEQ ID NO.6 are, in order, the sequences shown in SEQ ID NO.7 to SEQ ID NO.12.
SEQ ID NO.7:
5’-GATATCTGCAGGCATAGAATGAATATTATTGAATCAATAATTAAAGTCGGAGGCCAAGCGGTCTTA
GGAAGACAAACTAGTACGTCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTTTACAACTACAGATAAT
GGGCTGGATACATGGAATGATTATAGATATATTAAGGAATAATGTTAATTAATGCCTAAATTAATTAATCTAAGGGG
GTTAATACTTCAGCCTGTGATATC-3’;
SEQ ID NO.8:
5’-GATATCTGCAGGCATGAATAATAATGGAATAGCAATAATTAAAGTCGGAGGCCAAGCGGTCTTAGG
AAGACAACGATCAGTACCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTTATATAATGTAATACATAA
TATTAATATATTAATTATTGTATGATTGTTATCTATTACAGTCTAGTACTGACCCGTAGACATATATGCCCCCGATT
AATTACTTATCAGCCTGTGATATC-3’;
SEQ ID NO.9:
5’-GATATCTGCAGGCATCGGCCGCGGCGTCCAGTGCGCGGCGCTAGAGCCGGCAAGTCGGAGGCCAAG
CGGTCTTAGGAAGACAACGCTATGTACCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTTCCGCCGCG
GTCGCTTGTCCGGCCGCCGGTCCGGCGCCGGCGGCGCAAAGTGCCAGGCCGAGCCGGCGAACCAGCGGTCCGAAAAA
CACGGACACTCAGCCTGTGATATC-3’;
SEQ ID NO.10:
5’-GATATCTGCAGGCATCACCGCCGAGGCCGCGGCGGAGACCGCCGGCGCAGGAAGTCGGAGGCCAAG
CGGTCTTAGGAAGACAACAGAGTGTACCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTTCAAACTAC
CGGCGCGGCGCTCCTCCGGCCGTCCGCCGCCGACCGGCGGCGGCGTTCCGGTGTGGCACTCCAGGTGGCCGGTTCTC
TGCCAAGCGTCAGCCTGTGATATC-3’;
SEQ ID NO.11:
5’-GATATCTGCAGGCATGAAGAACAACCCCGCACGACGCCTACCAACCAAGTCGGAGGCCAAGCGGTC
TTAGGAAGACAACTGTATCGTACAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTTGCTGTTCGCGGCC
GATGTTCGTATAAGATATAAGTTTGGGTATATTCCAGTTTATCGATCGTATCGAAATGTATGAGTTTATACAGGTCC
TACTTCAACTCAGCCTGTGATATC-3’;
SEQ ID NO.12:
5’-GATATCTGCAGGCATACTAGACCAGTTCATTATTATAGTGCTAGCCAAAGTCGGAGGCCAAGCGGT
CTTAGGAAGACAAACATCAACGTCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTTGACGGATTCCCT
CGCTTTCTATTGGCTGACAGTACAAGTAACATAGGTTGGGTCGGTTAACCCTGCCGTCACAAGTGGAACGATGTTAA
TAGTTGCGGTCAGCCTGTGATATC-3’;
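Each library sequence above has a fixed layout: 5' universal primer binding sequence, an insert taken from the standard nucleic acid, the BGISEQ adapter with a 10 bp index in its middle, a second insert, and the 3' universal primer binding sequence. A minimal Python parser for that layout is sketched below; the flank and adapter strings are copied from the description of these libraries in this example, while the function name and the error handling are illustrative assumptions.

```python
# Fixed elements shared by the six library sequences of this example.
SITE_5P = "GATATCTGCAGGCAT"
SITE_3P = "TCAGCCTGTGATATC"
ADAPTER_LEFT = "AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA"
ADAPTER_RIGHT = "CAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTT"
INDEX_LEN = 10


def parse_library(seq):
    """Split a library sequence into 5' insert, 10 bp index and 3' insert.

    Assumed layout:
    SITE_5P + insert + ADAPTER_LEFT + index + ADAPTER_RIGHT + insert + SITE_3P.
    Raises ValueError if a fixed element is missing.
    """
    seq = seq.upper()
    if not (seq.startswith(SITE_5P) and seq.endswith(SITE_3P)):
        raise ValueError("universal primer binding sequences not found at both ends")
    left = seq.find(ADAPTER_LEFT)
    if left < 0:
        raise ValueError("adapter sequence not found")
    index_start = left + len(ADAPTER_LEFT)
    index_end = index_start + INDEX_LEN
    if not seq.startswith(ADAPTER_RIGHT, index_end):
        raise ValueError("adapter does not continue after the index")
    return {
        "insert_5p": seq[len(SITE_5P):left],
        "index": seq[index_start:index_end],
        "insert_3p": seq[index_end + len(ADAPTER_RIGHT):len(seq) - len(SITE_3P)],
    }
```

Applied to SEQ ID NO.7, this yields the index ACTAGTACGT, a 28 bp 5' insert and a 102 bp 3' insert; both inserts occur within the standard nucleic acid of SEQ ID NO.1.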
In the six library sequences above, "GATATCTGCAGGCAT" is the universal primer binding sequence at the 5' end and "TCAGCCTGTGATATC" is the universal primer binding sequence at the 3' end; the universal primers are designed against these two sequences. "AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAANNNNNNNNNNCAACTCCTTGGCTCACAGAACGACATGGCTACGATCCGACTT" is the adapter sequence containing the index, where "NNNNNNNNNN" is the 10 bp index sequence. The index sequences of the libraries are: SEQ ID NO.7, "ACTAGTACGT"; SEQ ID NO.8, "CGATCAGTAC"; SEQ ID NO.9, "CGCTATGTAC"; SEQ ID NO.10, "CAGAGTGTAC"; SEQ ID NO.11, "CTGTATCGTA"; SEQ ID NO.12, "ACATCAACGT".
The universal primers of this example are the forward primer shown in SEQ ID NO.13 and the reverse primer shown in SEQ ID NO.14:
SEQ ID NO.13:5’-GATATCTGCAGGCAT-3’;
SEQ ID NO.14:5’-GATATCACAGGCTGA-3’.
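The relationship between these primers and the two binding sequences can be checked in a few lines of Python: the forward primer (SEQ ID NO.13) is identical to the 5' binding sequence, and the reverse primer (SEQ ID NO.14) is the reverse complement of the 3' binding sequence, so it anneals to the 3' end of the library strand as written. The helper names below are illustrative.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


SITE_5P = "GATATCTGCAGGCAT"  # 5' universal primer binding sequence
SITE_3P = "TCAGCCTGTGATATC"  # 3' universal primer binding sequence
FORWARD = "GATATCTGCAGGCAT"  # SEQ ID NO.13
REVERSE = "GATATCACAGGCTGA"  # SEQ ID NO.14


def primers_match(forward, reverse, site_5p, site_3p):
    """True if the forward primer equals the 5' site and the reverse primer
    is the reverse complement of the 3' site."""
    return forward == site_5p and reverse == revcomp(site_3p)
```

Because every library carries the same two sites, this single check covers all six libraries, which is why one primer pair suffices.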
This example sequences with DNA nanoball technology, which requires the library DNA to be circularized; for this purpose, this example designed a splint oligo, the sequence shown in SEQ ID NO.15:
SEQ ID NO.15:5’-ATGCCTGCAGATATCGATATCACAGGCTGA-3’.
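The splint sequence follows directly from the two library ends: SEQ ID NO.15 is exactly the reverse complement of the junction formed when the 3' universal primer binding sequence of a library strand is placed next to its 5' universal primer binding sequence, so it can bridge the two ends for ligation. The sketch below only verifies this sequence relationship; the ligation chemistry itself follows the circularization kit and is not modelled, and the function name is illustrative.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


SITE_5P = "GATATCTGCAGGCAT"  # 5' universal primer binding sequence of each library
SITE_3P = "TCAGCCTGTGATATC"  # 3' universal primer binding sequence of each library
SPLINT = "ATGCCTGCAGATATCGATATCACAGGCTGA"  # SEQ ID NO.15


def splint_for(site_5p, site_3p):
    """Oligo complementary to the junction formed when the 3' end is joined to the 5' end."""
    return revcomp(site_3p + site_5p)
```

`splint_for(SITE_5P, SITE_3P)` reproduces SEQ ID NO.15, 15 bases annealing to each end of the library strand.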
The libraries of SEQ ID NO.7 to SEQ ID NO.12, the universal primers and the splint oligo of this example were all synthesized by Sangon Biotech (Shanghai).
III. Construction of the cloning vector and engineered strain
The synthesized library sequences were cloned, and the cloning vectors were introduced into E. coli. The cloning vectors and engineered strains of this example were constructed by GenScript (Nanjing).
IV. Library preparation
The preserved engineered strain was cultured overnight in LB medium at 37 °C, and plasmids were extracted with a Thermo Fisher extraction kit according to the kit instructions. The extracted plasmids were then amplified by PCR with the universal primers, and the PCR products were circularized and used directly for sequencing.
1. Plasmid extraction
Plasmids in this example were extracted with the plasmid extraction kit, following the steps in the kit manual; the steps are not repeated here.
2. PCR amplification
The 100 μL PCR reaction contained 20 μL of 5× high-fidelity enzyme reaction buffer, 5 μL of dNTP mix (10 mM each), 1 μL of high-fidelity enzyme (1 U/μL), 6 μL of 20 μM forward primer, 6 μL of 20 μM reverse primer, 1 μL of extracted plasmid template and 61 μL of ddH2O, for a total of 100 μL.
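The reaction recipe can be sanity-checked with C1V1 = C2V2: the components sum to 100 μL, giving 1× buffer, 0.5 mM of each dNTP, 0.01 U/μL of enzyme (1 U per reaction) and 1.2 μM of each primer. The sketch below reproduces that arithmetic; reading "10 mM" as the concentration of each dNTP follows the wording of the recipe, and the table layout and function name are illustrative.

```python
# (component, stock concentration, unit, volume in uL); None = not tracked
REACTION = [
    ("5x high-fidelity buffer", 5.0, "x", 20.0),
    ("dNTP mix (each dNTP)", 10.0, "mM", 5.0),
    ("high-fidelity enzyme", 1.0, "U/uL", 1.0),
    ("forward primer", 20.0, "uM", 6.0),
    ("reverse primer", 20.0, "uM", 6.0),
    ("plasmid template", None, None, 1.0),
    ("ddH2O", None, None, 61.0),
]


def final_concentrations(reaction):
    """Return (total volume in uL, {component: (final concentration, unit)}).

    Uses C1 * V1 = C2 * V2 with V2 equal to the summed reaction volume.
    """
    total = sum(row[3] for row in reaction)
    finals = {
        name: (stock * volume / total, unit)
        for name, stock, unit, volume in reaction
        if stock is not None
    }
    return total, finals
```

If a component volume is changed, the ddH2O volume must be adjusted so that the total stays at 100 μL.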
PCR conditions: 98 °C for 3 min; 33 cycles of 98 °C for 20 s, 60 °C for 15 s and 72 °C for 30 s; then 72 °C for 5 min and hold at 4 °C.
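The programmed hold times add up to 180 s + 33 × (20 + 15 + 30) s + 300 s = 2625 s, about 44 minutes. Actual run time is longer because ramping between temperatures is not included, and the final 4 °C hold is open-ended. A small sketch of the calculation, with an illustrative step representation:

```python
# Thermal program from the text as (temperature in C, hold time in s).
INITIAL = [(98, 180)]
CYCLE = [(98, 20), (60, 15), (72, 30)]
FINAL = [(72, 300)]
CYCLES = 33


def total_hold_seconds(initial, cycle, cycles, final):
    """Sum of programmed hold times; ramp time and the 4 C hold are excluded."""
    return (
        sum(t for _, t in initial)
        + cycles * sum(t for _, t in cycle)
        + sum(t for _, t in final)
    )
```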
3. Circularization of PCR products
In this example, the PCR products were first purified with magnetic beads, and the purified products were then circularized using the BGIseq500 SE50 circularization library preparation kit and workflow. For the specific circularization steps, refer to the kit instructions; they are not repeated here.
V. Library sequencing and sequencing accuracy testing
To verify that the synthesized libraries of known sequence are compatible with sequencing on the BGISEQ platform, the six libraries of SEQ ID NO.7 to SEQ ID NO.12 obtained in this example were verified by SE50+10 sequencing with the BGISEQ500 SE50 kit.
DNBs were prepared from the circularized products of the six libraries following the BGISEQ500 operating procedure. Then 15 μL of each prepared DNB was mixed into a 90 μL DNB pool, chips were loaded according to the standard procedure, and sequencing was performed in SE50+10 mode.
The sequencing results of the six libraries of SEQ ID NO.7 to SEQ ID NO.12 were separated according to the index sequences. For each of the six libraries, the first 50 bp of the sequencing results were identical to the corresponding standard nucleic acid sequence; these results are shown in Figs. 1 to 6, which correspond in order to the six libraries of SEQ ID NO.7 to SEQ ID NO.12. This shows that the libraries were constructed successfully and that the basecalling algorithm was accurate.
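Separating reads by index can be sketched as nearest-index assignment by Hamming distance. The six indexes of this example differ pairwise at no fewer than 3 of their 10 positions (CGATCAGTAC and CGCTATGTAC are the closest pair), so an index read with a single error still lies closer to its own index than to any other. The one-mismatch tolerance below is an assumption for illustration; the demultiplexing rule actually used by the BGISEQ software is not stated here, and the function names are hypothetical.

```python
from itertools import combinations

# Index sequences of the six libraries, as listed in this example.
INDEXES = {
    "SEQ ID NO.7": "ACTAGTACGT",
    "SEQ ID NO.8": "CGATCAGTAC",
    "SEQ ID NO.9": "CGCTATGTAC",
    "SEQ ID NO.10": "CAGAGTGTAC",
    "SEQ ID NO.11": "CTGTATCGTA",
    "SEQ ID NO.12": "ACATCAACGT",
}


def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indexes):
    """Smallest Hamming distance between any two index sequences."""
    return min(hamming(a, b) for a, b in combinations(indexes.values(), 2))


def assign_library(read_index, indexes, max_mismatches=1):
    """Library whose index lies within max_mismatches of the read's index.

    Returns None when no index, or more than one index, is that close.
    """
    hits = [name for name, idx in indexes.items()
            if hamming(read_index, idx) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None
```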
VI. Assessment of sequencing quality
To compare the relationship between sequencing quality and bases, this example performed SE100 sequencing with the BGISEQ500 SE100+10 sequencing kit on the high-AT library of SEQ ID NO.7 (the "high-AT library") and the high-GC library of SEQ ID NO.9 (the "high-GC library").
DNB preparation and chip loading were the same as in the library sequencing and sequencing accuracy test above, except that in this experiment DNBs were prepared only for the libraries of SEQ ID NO.7 and SEQ ID NO.9, which were then sequenced on the instrument in SE100 mode.
The sequencing quality of the two libraries was compared, as shown in Table 2: the high-GC library of SEQ ID NO.9 has a lower Q30 than the high-AT library of SEQ ID NO.7 and a higher error rate. Future improvements to sequencing technology can therefore target GC-rich libraries specifically.
Table 2 Comparison of sequencing quality between the two libraries
Library | PredQual | GC content (%) | Q10 (%) | Q20 (%) | Q30 (%) | EsErr (%)
High-AT library | 33 | 27.05 | 99.16 | 98.02 | 91.44 | 0.23
High-GC library | 33 | 75.47 | 98.16 | 94.18 | 85.05 | 0.68
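Q10, Q20 and Q30 in Table 2 are the percentages of bases whose Phred quality is at least 10, 20 and 30; a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q30 means a 0.1% chance of a wrong call. The sketch below computes these percentages from per-base scores, together with a mean Phred-implied error rate. Whether the EsErr column is computed this way is not stated in this example, so that part is an assumption, and the function names are illustrative.

```python
def phred_to_error(q):
    """Error probability implied by a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def quality_summary(quals):
    """Percent of bases at >= Q10/Q20/Q30 and mean Phred-implied error rate (%)."""
    n = len(quals)
    if n == 0:
        raise ValueError("no quality scores")
    summary = {f"Q{t}%": 100 * sum(q >= t for q in quals) / n for t in (10, 20, 30)}
    summary["mean error %"] = 100 * sum(phred_to_error(q) for q in quals) / n
    return summary
```

For the scores [40, 30, 20, 10], this gives Q10% = 100, Q20% = 75 and Q30% = 50.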
In addition, the relationship between bases and quality values was analyzed further. As shown in Fig. 7, the Q30 distribution of the high-GC library, Q30 drops markedly at 60 bp, 68 bp, 81 bp, 91 bp and 97 bp. The sequence at these positions shares a common feature: when base G is followed by A, the sequencing quality of the A deteriorates. This points to a direction for future optimization of sequencing technology.
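The G-followed-by-A effect can be checked by stratifying per-base quality by the preceding base, alongside a per-position Q30 profile of the kind plotted in Fig. 7. This is a minimal sketch assuming reads have already been parsed into (sequence, list of Phred scores) pairs, for example from FASTQ; the function names and defaults are illustrative.

```python
from statistics import mean


def quality_by_preceding_base(reads, base="A", preceding="G"):
    """Mean quality of `base` when preceded by `preceding` vs. by any other base.

    reads: iterable of (sequence, list of Phred scores) pairs of equal length.
    Returns (mean_in_context, mean_other); None where nothing was observed.
    """
    in_context, other = [], []
    for seq, quals in reads:
        for i in range(1, len(seq)):
            if seq[i] == base:
                (in_context if seq[i - 1] == preceding else other).append(quals[i])
    return (mean(in_context) if in_context else None,
            mean(other) if other else None)


def q30_by_position(reads):
    """Fraction of reads with Q >= 30 at each sequencing cycle."""
    length = max(len(quals) for _, quals in reads)
    profile = []
    for pos in range(length):
        scores = [quals[pos] for _, quals in reads if pos < len(quals)]
        profile.append(sum(s >= 30 for s in scores) / len(scores))
    return profile
```

A clearly lower first value than second value from `quality_by_preceding_base` would be consistent with the drop in A quality after G described above.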
Thus, the standard nucleic acids of this example and the libraries based on them can be used to assess the base preference and accuracy of NGS, detect NGS accuracy and assess NGS quality; moreover, targeted optimization based on analysis of the sequencing results against the base features can improve nucleic acid sequencing accuracy.
The foregoing is a further detailed description of this application in connection with specific embodiments, and the specific implementation of this application should not be regarded as limited to these descriptions. Those of ordinary skill in the art to which this application belongs may make several simple deductions or substitutions without departing from the concept of this application.
SEQUENCE LISTING
<110>Shenzhen Hua Da life science institute
<120>For library, reagent and the application of the assessment of two generation sequencing qualities
<130> 17I25566-A23542
<160> 15
<170> PatentIn version 3.3
<210> 1
<211> 150
<212> DNA
<213>Artificial sequence
<400> 1
tacaactaca gataatgggc tggatacatg gaatgattat agatatatta aggaataatg 60
ttaattaatg cctaaattaa ttaatctaag ggggttaata ctatgtgtta attaatctta 120
ttagaatgaa tattattgaa tcaataatta 150
<210> 2
<211> 150
<212> DNA
<213>Artificial sequence
<400> 2
atataatgta atacataata ttaatatatt aattattgta tgattgatat ctattacagt 60
ctagtactga cccgtagaca tatatgcccc cgattaatta cttaggctta ttaataatat 120
ataggaataa taatggaata gcaataatta 150
<210> 3
<211> 150
<212> DNA
<213>Artificial sequence
<400> 3
ccgccgcggt cgcttgtccg gccgccggtc cggcgccggc ggcgcaaagt gccaggccga 60
gccggcgaac cagcggtccg aaaaacacgg acacggtaac ctcaccacga tggccggccg 120
cggcgtccag tgcgcggcgc tagagccggc 150
<210> 4
<211> 150
<212> DNA
<213>Artificial sequence
<400> 4
caaactaccg gcgcggcgct cctccggccg tccgccgccg accggcggcg gcgttccggt 60
gtggcactcc aggtggccgg ttctctgcca agcggcaggc gaaaaatcga cggccaccgc 120
cgaggccgcg gcggagaccg ccggcgcagg 150
<210> 5
<211> 150
<212> DNA
<213>Artificial sequence
<400> 5
gctgttcgcg gccgatgttc gtataagata taagtttggg tatattccag tttatcgatc 60
gtatcgaaat gtatgagttt atacaggtcc tacttcaaca agcggcactt tactaccgtg 120
aagaacaacc ccgcacgacg cctaccaacc 150
<210> 6
<211> 150
<212> DNA
<213>Artificial sequence
<400> 6
gacggattcc ctcgctttct attggctgac agtacaagta acataggttg ggtcggttaa 60
ccctgccgtc acaagtggaa cgatgttaat agttgcggaa ccctatgttc ggcggaatac 120
tagaccagtt cattattata gtgctagcca 150
<210> 7
<211> 244
<212> DNA
<213>Artificial sequence
<400> 7
gatatctgca ggcatagaat gaatattatt gaatcaataa ttaaagtcgg aggccaagcg 60
gtcttaggaa gacaaactag tacgtcaact ccttggctca cagaacgaca tggctacgat 120
ccgactttac aactacagat aatgggctgg atacatggaa tgattataga tatattaagg 180
aataatgtta attaatgcct aaattaatta atctaagggg gttaatactt cagcctgtga 240
tatc 244
<210> 8
<211> 244
<212> DNA
<213>Artificial sequence
<400> 8
gatatctgca ggcatgaata ataatggaat agcaataatt aaagtcggag gccaagcggt 60
cttaggaaga caacgatcag taccaactcc ttggctcaca gaacgacatg gctacgatcc 120
gacttatata atgtaataca taatattaat atattaatta ttgtatgatt gttatctatt 180
acagtctagt actgacccgt agacatatat gcccccgatt aattacttat cagcctgtga 240
tatc 244
<210> 9
<211> 244
<212> DNA
<213>Artificial sequence
<400> 9
gatatctgca ggcatcggcc gcggcgtcca gtgcgcggcg ctagagccgg caagtcggag 60
gccaagcggt cttaggaaga caacgctatg taccaactcc ttggctcaca gaacgacatg 120
gctacgatcc gacttccgcc gcggtcgctt gtccggccgc cggtccggcg ccggcggcgc 180
aaagtgccag gccgagccgg cgaaccagcg gtccgaaaaa cacggacact cagcctgtga 240
tatc 244
<210> 10
<211> 244
<212> DNA
<213>Artificial sequence
<400> 10
gatatctgca ggcatcaccg ccgaggccgc ggcggagacc gccggcgcag gaagtcggag 60
gccaagcggt cttaggaaga caacagagtg taccaactcc ttggctcaca gaacgacatg 120
gctacgatcc gacttcaaac taccggcgcg gcgctcctcc ggccgtccgc cgccgaccgg 180
cggcggcgtt ccggtgtggc actccaggtg gccggttctc tgccaagcgt cagcctgtga 240
tatc 244
<210> 11
<211> 244
<212> DNA
<213>Artificial sequence
<400> 11
gatatctgca ggcatgaaga acaaccccgc acgacgccta ccaaccaagt cggaggccaa 60
gcggtcttag gaagacaact gtatcgtaca actccttggc tcacagaacg acatggctac 120
gatccgactt gctgttcgcg gccgatgttc gtataagata taagtttggg tatattccag 180
tttatcgatc gtatcgaaat gtatgagttt atacaggtcc tacttcaact cagcctgtga 240
tatc 244
<210> 12
<211> 244
<212> DNA
<213>Artificial sequence
<400> 12
gatatctgca ggcatactag accagttcat tattatagtg ctagccaaag tcggaggcca 60
agcggtctta ggaagacaaa catcaacgtc aactccttgg ctcacagaac gacatggcta 120
cgatccgact tgacggattc cctcgctttc tattggctga cagtacaagt aacataggtt 180
gggtcggtta accctgccgt cacaagtgga acgatgttaa tagttgcggt cagcctgtga 240
tatc 244
<210> 13
<211> 15
<212> DNA
<213>Artificial sequence
<400> 13
gatatctgca ggcat 15
<210> 14
<211> 15
<212> DNA
<213>Artificial sequence
<400> 14
gatatcacag gctga 15
<210> 15
<211> 30
<212> DNA
<213>Artificial sequence
<400> 15
atgcctgcag atatcgatat cacaggctga 30