Summary of the invention
The technical problem to be solved by one aspect of the present invention is to provide a method for detecting the insertion site of a transgenic exogenous sequence with good accuracy.
According to one aspect of the present invention, a method for detecting the insertion site of a transgenic exogenous sequence is provided, comprising:
aligning paired short fragment sequences against an exogenous insert fragment sequence to determine exogenous one-sided short fragments (single reads), the paired short fragment sequences being obtained by paired-end resequencing of the sequenced fragments of a sample to be tested;
aligning the paired short fragment sequences against a reference genome sequence to determine genomic one-sided short fragments (single reads); and
determining the insertion site of the exogenous sequence in the genome sequence according to the intersection of the exogenous one-sided short fragments and the genomic one-sided short fragments.
According to one embodiment of the method of the present invention, the exogenous one-sided short fragments comprise paired short fragment sequences in which only one short fragment sequence aligns to the exogenous insert fragment sequence; the genomic one-sided short fragments comprise paired short fragment sequences in which only one short fragment sequence aligns to the reference genome sequence.
According to one embodiment of the method of the present invention, the exogenous one-sided short fragments comprise normal paired short fragment sequences in which only one short fragment sequence aligns to the exogenous insert fragment sequence, and aligns to it only once; the genomic one-sided short fragments comprise normal paired short fragment sequences in which only one short fragment sequence aligns to the reference genome sequence, and aligns to it only once.
According to one embodiment of the method of the present invention, the method further comprises: filtering the paired short fragment sequences to remove unqualified short fragment sequences.
According to one embodiment of the method of the present invention, filtering the paired short fragment sequences to remove unqualified paired short fragment sequences comprises: removing paired short fragment sequences in which the number of bases with a sequencing quality below a predetermined threshold exceeds 50% of the number of bases in the short fragment sequence; and/or removing paired short fragment sequences in which the number of bases with an uncertain sequencing result exceeds 10% of the number of bases in the paired short fragment sequence; and/or removing adapter sequences from the paired short fragment sequences.
According to one embodiment of the method of the present invention, the length of the sequenced fragments of the sample to be tested is 170-500 bp, 500-1000 bp, 1000-2000 bp or 2000-10000 bp; and/or the length of the short fragment sequences is 40-75 bp or 75-200 bp; and/or the total number of bases obtained by sequencing the sequenced fragments of the sample to be tested is 5-10 times, 10-20 times or more than 20 times the total number of bases of the reference genome sequence.
In the method for detecting the insertion site of a transgenic exogenous sequence provided by the embodiments of the present invention, the short fragment sequences obtained by paired-end resequencing are aligned against the exogenous insert fragment to obtain the one-sided short fragments enriched at both ends of the exogenous insert fragment, and the same short fragment sequences are aligned against the reference genome sequence to obtain the single reads at both ends of the exogenous insert fragment. The intersection of these two sets is then taken, and the positions of the intersection sequences on the reference sequence are determined. The position of the insertion site is determined by counting the support of the intersection sequences at each corresponding site on the reference sequence. The method makes full use of the positional characteristics of the exogenous insert fragment and has good accuracy.
The technical problem to be solved by another aspect of the present invention is to provide a system for detecting the insertion site of a transgenic exogenous sequence with good accuracy.
According to another aspect of the present invention, a system for detecting the insertion site of a transgenic exogenous sequence is provided, comprising:
a sequencing unit, for performing paired-end resequencing on the sequenced fragments of a sample to be tested to obtain paired short fragment sequences;
an exogenous one-sided short fragment determining unit, for aligning the paired short fragment sequences against an exogenous insert fragment sequence to determine exogenous one-sided short fragments (single reads);
a genomic one-sided short fragment determining unit, for aligning the paired short fragment sequences against a reference genome sequence to determine genomic one-sided short fragments (single reads); and
an insertion site determining unit, for determining the insertion site of the exogenous sequence in the genome sequence according to the intersection of the exogenous one-sided short fragments and the genomic one-sided short fragments.
According to one embodiment of the system of the present invention, the exogenous one-sided short fragments comprise paired short fragment sequences in which only one short fragment sequence aligns to the exogenous insert fragment sequence; the genomic one-sided short fragments comprise paired short fragment sequences in which only one short fragment sequence aligns to the reference genome sequence.
According to one embodiment of the system of the present invention, the exogenous one-sided short fragments comprise normal paired short fragment sequences in which only one short fragment sequence aligns to the exogenous insert fragment sequence, and aligns to it only once; the genomic one-sided short fragments comprise normal paired short fragment sequences in which only one short fragment sequence aligns to the reference genome sequence, and aligns to it only once.
According to one embodiment of the system of the present invention, the system further comprises a filtering unit, for filtering the paired short fragment sequences to remove unqualified short fragment sequences.
According to one embodiment of the system of the present invention, the filtering unit removes paired short fragment sequences in which the number of bases with a sequencing quality below a predetermined threshold exceeds 50% of the number of bases in the short fragment sequence; and/or removes paired short fragment sequences in which the number of bases with an uncertain sequencing result exceeds 10% of the number of bases in the paired short fragment sequence; and/or removes adapter sequences from the paired short fragment sequences.
According to one embodiment of the system of the present invention, the length of the sequenced fragments of the sample to be tested is 170-500 bp, 500-1000 bp, 1000-2000 bp or 2000-10000 bp; and/or the length of the short fragment sequences is 40-75 bp or 75-200 bp; and/or the total number of bases obtained by sequencing the sequenced fragments of the sample to be tested is 5-10 times, 10-20 times or more than 20 times the total number of bases of the reference genome sequence.
In the system for detecting the insertion site of a transgenic exogenous sequence provided by the embodiments of the present invention, the exogenous one-sided short fragment determining unit aligns the short fragment sequences obtained by paired-end resequencing against the exogenous insert fragment to obtain the one-sided short fragments enriched at both ends of the exogenous insert fragment, and the genomic one-sided short fragment determining unit aligns the same short fragment sequences against the reference genome sequence to obtain the single reads at both ends of the exogenous insert fragment. The insertion site determining unit takes the intersection of these two sets and determines the positions of the intersection sequences on the reference sequence. The position of the insertion site is determined by counting the support of the intersection sequences at each corresponding site on the reference sequence. The system makes full use of the positional characteristics of the exogenous insert fragment and has good accuracy.
The description of the present invention is provided for the purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical application of the invention, and to enable those of ordinary skill in the art to understand the invention and thereby design various embodiments, with various modifications, suited to particular uses.
Detailed description of the embodiments
The present invention is described more fully below with reference to the accompanying drawings, in which exemplary embodiments of the invention are illustrated.
Fig. 1 is a schematic diagram of the method for detecting the insertion site of a transgenic exogenous sequence according to the present invention.
As shown in Fig. 1, in step S102, the paired short fragment sequences obtained by paired-end resequencing of the sequenced fragments of the sample to be tested are aligned against the exogenous insert fragment sequence to determine the exogenous one-sided short fragments (single reads). The exogenous one-sided short fragments comprise paired short fragment sequences in which only one of the two short fragment sequences aligns to the exogenous insert fragment sequence. When the sequencing data of the transgenic sample are aligned against the exogenous insert fragment sequence (insert), unpaired short fragment sequences that align to the exogenous sequence are enriched at both ends of the insert fragment; these are labeled single reads 1. The exogenous insert fragment usually refers to the exogenous sequence or transposon sequence introduced by transgenic technology, and can be obtained, for example, by cloning.
In step S104, the paired short fragment sequences are aligned against the reference genome sequence to determine the genomic one-sided short fragments. The genomic one-sided short fragments comprise paired short fragment sequences in which only one of the two short fragment sequences aligns to the reference genome sequence. When the sequencing data are aligned against the reference genome sequence (reference), unpaired short fragment sequences are likewise enriched at both ends of the insert fragment; these are labeled single reads 2. The reference genome sequence usually refers to the genome sequence of a given species, which can be obtained by sequencing and assembly.
In step S106, the insertion site of the exogenous sequence in the genome sequence is determined according to the intersection of the exogenous one-sided short fragments and the genomic one-sided short fragments. The single reads in the sets single reads 1 and single reads 2 are intersected, and the positions of the intersection sequences on the reference sequence are determined.
In the above embodiment, two alignments are performed. The short fragment sequences obtained by paired-end resequencing are aligned against the exogenous insert fragment to obtain the single reads enriched at both ends of the exogenous insert fragment, and the same short fragment sequences are aligned against the reference genome sequence to obtain the single reads at both ends of the exogenous insert fragment. The intersection of these two sets is then taken, and the positions of the intersection sequences on the reference sequence are determined. The position of the insertion site is determined by counting the support of the intersection sequences at each corresponding site on the reference sequence. This makes full use of the positional characteristics of the exogenous insert fragment and gives good accuracy.
Fig. 2 is a flowchart of one embodiment of the method for detecting the insertion site of a transgenic exogenous sequence according to the present invention.
As shown in Fig. 2, in step 202, paired-end (pair-end) resequencing is performed on the sequenced fragments of the sample to be tested to obtain paired short fragment sequences.
The DNA sample to be tested is randomly broken into fragments of a certain length, which are called sequenced fragments. The length of the sequenced fragments is chosen, for example, from 170-500 bp, 500-1000 bp, 1000-2000 bp or 2000-10000 bp. During sequencing, each sequenced fragment is sequenced from both ends, which yields a pair of short fragment sequences for the two ends of that fragment. Each pair of short fragment sequences is assigned an ID number, and the two short fragment sequences of a pair share the same ID number. The total number of bases obtained by sequencing the sequenced fragments of the sample to be tested is 5-10 times, 10-20 times or more than 20 times the total number of bases of the reference genome sequence, to ensure the required genome coverage. Preferably, the total number of bases obtained by sequencing is more than 20 times the genome size.
In step 204, the paired short fragment sequences obtained by paired-end resequencing are filtered.
After the short fragment sequences obtained by paired-end resequencing are received, unqualified short fragment sequences are removed by filtering. For example, a sequencing sequence is removed if the number of bases whose sequencing quality is below a predetermined low-quality threshold exceeds, for example, 50% of the number of bases in the whole short fragment sequence; the low-quality threshold is determined by the specific sequencing technology and sequencing environment, for example a base quality value of B (ASCII value). A sequencing sequence is removed if the number of bases with an uncertain sequencing result (such as N in Illumina GA sequencing results) exceeds 10% of the number of bases in the whole short fragment sequence. Adapter sequences in the short fragment sequences are removed: the short fragment sequences are aligned against the exogenous sequences introduced by the experiment, such as the various adapter sequences, and a sequence that contains such an exogenous sequence is considered unqualified and is removed. Removing unqualified short fragment sequences by filtering improves the accuracy of detection.
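The filtering criteria of step 204 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the function names are hypothetical, a base is assumed to be low quality when its quality character is at or below `B`, and adapters are detected by a plain substring search.

```python
from typing import Iterable, Tuple

Read = Tuple[str, str]  # (bases, quality string), both of equal length


def read_is_qualified(read: Read,
                      low_quality_char: str = "B",
                      max_low_quality_fraction: float = 0.5,
                      max_uncertain_fraction: float = 0.1) -> bool:
    """Check one short fragment sequence against the two count-based criteria."""
    bases, quality = read
    if not bases or len(bases) != len(quality):
        return False
    # Assumption: a base counts as low quality when its quality character
    # is at or below the threshold character.
    low_quality = sum(1 for q in quality if q <= low_quality_char)
    uncertain = sum(1 for b in bases if b in "Nn")
    return (low_quality <= max_low_quality_fraction * len(bases)
            and uncertain <= max_uncertain_fraction * len(bases))


def pair_is_qualified(read1: Read, read2: Read,
                      adapters: Iterable[str] = ()) -> bool:
    """Keep a pair only if both mates pass and neither contains an adapter."""
    adapters = tuple(adapters)
    for read in (read1, read2):
        if not read_is_qualified(read):
            return False
        # Assumption: exact substring match stands in for adapter alignment.
        if any(adapter in read[0] for adapter in adapters):
            return False
    return True
```

A pair is discarded as a whole as soon as either mate fails, which keeps the ID-based pairing used in the later steps intact.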
In step 206, the paired short fragment sequences are aligned against the exogenous insert fragment sequence to obtain the exogenous one-sided fragments.
The alignment can be performed with any common short-sequence alignment software, such as soap or bwa. The lengths of the sequenced fragments should be substantially the same, although a certain floating range is allowed. Short fragment sequences obtained by sequencing fragments whose length lies within the normal range are called normal short sequences; short fragment sequences obtained by sequencing fragments outside the normal range are called abnormal short sequences. The floating range can be set as required. During alignment, the minimum alignment length of the resequenced short fragment sequences is 40 bp, and the maximum number of mismatches allowed per short sequence is kept as small as possible to ensure precise alignment. For example, for a short fragment sequence length of 90 bp, the maximum number of mismatches is set to 1 or 2. The value actually set can vary with the length of the short fragment sequences.
After alignment, the short fragment sequences obtained by resequencing fall into three types: 1. soap reads: normal short sequences that exist in pairs and both align to the exogenous insert fragment sequence; 2. single reads: of two paired normal short sequences, only one aligns to the exogenous insert fragment sequence, and such short sequences are labeled single reads (in addition, the alignment software may label paired abnormal short sequences as single reads, in which case a filtering step can be added to remove these abnormal short fragment sequences); 3. unmap reads: neither of the two paired short sequences aligns to the exogenous insert fragment sequence, and such short sequences are labeled unmap reads. From the alignment result, the single reads of which only one mate aligns to the exogenous insert fragment sequence, and aligns to it only once, are extracted; this ensures the specificity of the alignment result. The single reads obtained are stored, for example in a document named single file 1.
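The three-way classification and the extraction rule above can be illustrated with a short Python sketch. The function names and the use of per-mate hit counts as input are assumptions made for the illustration; real alignment software reports this information in its own output format.

```python
def classify_pair(mate1_hits: int, mate2_hits: int,
                  normal_insert: bool = True) -> str:
    """Label a read pair from the number of alignment hits of each mate."""
    aligned = (mate1_hits > 0) + (mate2_hits > 0)
    if aligned == 2:
        # Both mates align; pairs outside the allowed insert range are abnormal.
        return "soap" if normal_insert else "abnormal"
    if aligned == 1:
        return "single"
    return "unmap"


def is_specific_single(mate1_hits: int, mate2_hits: int) -> bool:
    """True for a single read whose aligned mate has exactly one hit."""
    return (classify_pair(mate1_hits, mate2_hits) == "single"
            and max(mate1_hits, mate2_hits) == 1)
```

Only pairs for which `is_specific_single` holds would be written to single file 1 in this sketch.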
In step 208, the paired short fragment sequences are aligned against the reference genome sequence to obtain the genomic one-sided fragments.
The alignment can be performed with any common short-sequence alignment software, such as soap or bwa. After alignment, the short sequences obtained by resequencing fall into three types: 1. soap reads: normal short sequences that exist in pairs and both align to the reference genome; 2. single reads: of two paired normal short sequences, only one aligns to the reference genome, and such short sequences are labeled single reads (in addition, the alignment software may label paired abnormal short sequences as single reads, in which case a filtering step can be added to remove these abnormal short fragment sequences); 3. unmap reads: neither of the two paired short sequences aligns to the reference genome, and such short sequences are labeled unmap reads. From the alignment result, the single reads of which only one mate aligns to the reference genome, and aligns to it only once, are extracted to ensure the specificity of the alignment result, and are placed in a document named single file 2. The unpaired single reads in single file 2 are sorted by sample number, and the alignment results are separated by chromosome.
In step 210, the insertion site of the exogenous sequence is determined from the exogenous one-sided fragments and the genomic one-sided fragments.
The single reads with the same ID number in single file 1 and single file 2 are extracted. The intersection is taken by the ID numbers of the single reads in single file 1 and single file 2: single reads that have the same ID number in the two files and are mates of each other are the qualifying single reads of the intersection. (Of two paired short fragment sequences, when one aligns to the reference genome, the other may align to the exogenous insert fragment. The two short fragment sequences of such a pair have the same ID number.) The intersection is sorted by the corresponding site on the reference genome (the start position at which the short sequence aligns to the reference genome), the length of the short fragment sequence is added, and the support of the intersected single reads at each corresponding site on the reference sequence is counted to determine the insertion site of the transgenic exogenous sequence. The support of the short fragment sequences at each corresponding site on the reference sequence forms a trough-shaped curve, that is, a lowest point with a raised peak on either side. There is a peak of enriched short fragment sequences on each side of the insertion site; the closer to the insertion site, the lower the support, until a gap in the short fragment sequences (the lowest point of the support) appears near the insertion site. The range of this gap (lowest point) can be identified as the insertion position of the exogenous sequence.
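The ID-based intersection, the per-site support count and the search for the lowest point can be sketched as follows. This is a simplified illustration with hypothetical data structures (dictionaries keyed by pair ID) and a single chromosome; it takes the global minimum of the support between the outermost covered positions, whereas real data would likely need smoothing and per-chromosome handling.

```python
from collections import Counter
from typing import Dict, List, Tuple

# pair ID -> mate number (1 or 2) aligned to the exogenous insert fragment
ExoSingles = Dict[str, int]
# pair ID -> (mate number, chromosome, start position on the reference)
GenomeSingles = Dict[str, Tuple[int, str, int]]


def intersect_single_reads(exo: ExoSingles,
                           genome: GenomeSingles) -> List[Tuple[str, int]]:
    """Keep genome-side reads whose mate (same ID, other mate) hit the insert."""
    hits = []
    for pair_id, (mate, chrom, start) in genome.items():
        exo_mate = exo.get(pair_id)
        if exo_mate is not None and exo_mate != mate:
            hits.append((chrom, start))
    return sorted(hits)  # sorted by chromosome, then start position


def site_support(starts: List[int], read_length: int) -> Counter:
    """Count how many intersected reads cover each reference position."""
    support = Counter()
    for start in starts:
        support.update(range(start, start + read_length))
    return support


def lowest_support_range(support: Counter) -> Tuple[int, int]:
    """Return the first and last position holding the minimum support."""
    first, last = min(support), max(support)
    depth = [(pos, support.get(pos, 0)) for pos in range(first, last + 1)]
    lowest = min(d for _, d in depth)
    gap = [pos for pos, d in depth if d == lowest]
    return gap[0], gap[-1]
```

In this sketch the returned range plays the role of the gap between the two peaks; when the minimum support is not zero, the lowest positions may also include the outer edges of the peaks, so the result should be inspected rather than used blindly.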
In the above embodiment, whether a sequenced fragment is a normal fragment or an abnormal fragment is determined from the set length and its floating range. According to one embodiment of the present invention, before alignment, the deviation (SD value) of the insert fragment length (that is, of the sequenced fragments) is first calculated and a suitable insert fragment range is determined, which improves the accuracy of the alignment and yields a reasonable alignment result. The insert fragment lengths and the number of occurrences of each length are counted, and the length with the highest frequency of occurrence is denoted M; each insert fragment length not equal to M is denoted x, and its number of occurrences is denoted n. The SD value can then be calculated from M, x and n using the formula shown.
The distribution of insert fragment lengths should approximately follow a normal distribution whose peak is at the mean insert fragment length, and the insert fragment length has a floating upper limit and a floating lower limit. From the mean insert fragment length and the numbers of occurrences of the insert fragments shorter than the mean length, a left SD value (L-SD) is calculated, and the floating lower limit is obtained as lower limit = (mean length - L-SD value). From the mean insert fragment length and the numbers of occurrences of the insert fragments longer than the mean length, a right SD value (R-SD) is calculated, and the floating upper limit is obtained as upper limit = (R-SD value + mean length). Insert fragment lengths lying between the lower and upper limits form the reasonable insert fragment range.
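The left and right SD values can be illustrated with the following Python sketch. The SD formula itself is not reproduced in this text, so the sketch assumes a count-weighted root-mean-square deviation from the most frequent length M; this is an assumption for illustration only, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def one_sided_sd(counts: Counter, center: int, left: bool) -> float:
    """Count-weighted RMS deviation of the lengths on one side of the center.

    Assumed form: sqrt(sum(n * (x - M)**2) / sum(n)) over the selected side.
    """
    side = [(x, n) for x, n in counts.items()
            if (x < center if left else x > center)]
    total = sum(n for _, n in side)
    if total == 0:
        return 0.0
    return math.sqrt(sum(n * (x - center) ** 2 for x, n in side) / total)


def insert_length_range(lengths: Iterable[int]) -> Tuple[float, float]:
    """Return (lower limit, upper limit) = (M - L-SD, M + R-SD)."""
    counts = Counter(lengths)
    center = counts.most_common(1)[0][0]  # most frequent length, M
    return (center - one_sided_sd(counts, center, left=True),
            center + one_sided_sd(counts, center, left=False))
```

Computing the two sides separately lets an asymmetric length distribution produce an asymmetric allowed range, which is the point of using L-SD and R-SD rather than a single SD value.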
Fig. 3 is a flowchart of an application example of the method for detecting the insertion site of a transgenic exogenous sequence according to the present invention. In this application example, the insertion of the exogenous sequence and the sequencing are simulated. The biological sample to be tested is transgenic Arabidopsis into which an exogenous gene fragment has been inserted: a 10 kb exogenous sequence is randomly inserted into the Arabidopsis reference genome to serve as the Arabidopsis transgenic sample, the maq simulate software is used to simulate sequencing of this sample, and the result of the simulated sequencing serves as the sequencing data.
Transgenic exogenous sequence: a 10 kb fragment of the mouse genome; source database: UCSC Genome Browser; URL: http://hgdownload.cse.ucsc.edu/goldenPath/mm9/chromosomes/
Reference genome sequence: the Arabidopsis genome; source database: Ensembl Genome Browser; URL: http://plants.ensembl.org/Arabidopsis_thaliana/Info/Index
An exogenous sequence 10 kb in length is inserted into the genome in simulation, and sequencing is then simulated. The simulation program is maq simulate, for which the following parameters need to be set: -d, -N, -1, -2, fq1, fq2 and simupars.dat. The parameters are described in detail below. The -d parameter is the sequenced fragment length and is set to 500. The -N parameter is the total number of short fragment sequences to be obtained by sequencing; it is determined from the sequencing depth, which is one of the indices for evaluating sequencing and is the ratio of the total number of bases obtained by sequencing (bp) to the genome size. It is calculated with the formula N = sequencing depth × total reference genome length / (2 × reads length). In this example the simulated sequencing depth is 20×, the total reference genome length is 121 Mb and the short fragment sequence length is set to 75 bp, so -N is set to about 16M. The -1 and -2 parameters are the lengths of short fragment sequence 1 and short fragment sequence 2 in paired-end resequencing, and are both set to 75 in this example. fq1 and fq2 are the output files: the simulated sequencing data, that is, short fragment sequence 1 and short fragment sequence 2, are stored in fastq format in the fq1 and fq2 files respectively. simupars.dat is a system file of the maq simulate software that determines the length and quality values of the short fragment sequences.
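The arithmetic behind the -N setting can be checked with a few lines of Python. The function name is hypothetical; the formula and the input values are those given above.

```python
def maq_n_value(depth: int, genome_length_bp: int, read_length_bp: int) -> int:
    """N = depth x genome length / (2 x read length), rounded down.

    The factor 2 accounts for the two reads produced from each fragment.
    """
    return depth * genome_length_bp // (2 * read_length_bp)


n_value = maq_n_value(depth=20, genome_length_bp=121_000_000, read_length_bp=75)
print(n_value)  # 16133333, that is, about 16 million
```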
As shown in Fig. 3, in step S301, the sequencing data are received and preprocessed (because simulated data are used here, no preprocessing is performed) and stored in fastq file format.
Step S302 comprises two parts, as follows:
(1) soap alignment of the sequencing data against the exogenous insert sequence;
(2) soap alignment of the sequencing data against the reference genome sequence.
For the soap alignments of the two parts above, the following parameters need to be set: -p, -a, -b, -D, -o, -2, -u, -m, -x, -s, -l, -v. The parameters are described in detail below. The -p parameter is the memory required when the script runs. The -a parameter is, for paired-end sequencing, the input fq1 file obtained by resequencing (the file containing short fragment sequence 1). The -b parameter is, for paired-end sequencing, the input fq2 file obtained by resequencing (the file containing short fragment sequence 2). The -D parameter is the reference genome sequence, input in fasta file format (the first line of a fasta sequence file is any text description beginning with a greater-than sign ">" or a semicolon ";" and is used to label the sequence; the sequence itself starts on the second line and may use only the established nucleotide or amino acid code symbols). There are three output parameters. The -o parameter outputs the paired short fragment sequences that align to the reference genome or the exogenous sequence, and its output file has the suffix .soap. The -2 parameter outputs the short fragment sequences of pairs in which only one mate aligns to the reference genome or the exogenous sequence, and its output file has the suffix .single. The -u parameter outputs the paired short fragment sequences that do not align to the reference genome or the exogenous sequence, and its output file has the suffix .unmap. The -t parameter is not set, in order to retain the original ID numbers of the short fragment sequences. The -m and -x parameters give the floating range of the insert fragment: -m is the floating lower limit of the sequenced fragment length, that is, (negative percentage × sequenced fragment length), and -x is the floating upper limit of the sequenced fragment length, that is, (positive percentage × sequenced fragment length).
In the present invention, in order to find qualifying short fragment sequences over the widest possible range, the floating range of the sequenced fragments is relaxed, and -m and -x are set to the sequenced fragment length - 0.88 × the sequenced fragment length and the sequenced fragment length + 0.88 × the sequenced fragment length, respectively. The -s parameter is the minimum alignment length and is set to 40. The -l parameter is the length of the seed sequence used in the initial alignment (the error rate at the 3' end of long fragment sequences is high, so a sequence of a certain length starting from the 5' end is used as the seed sequence) and is set to 32. The -v parameter is the maximum number of mismatches allowed per short fragment sequence during alignment; in the present invention it is set as small as possible to ensure precise alignment. Care should be taken that the soap parameter settings are consistent between the two alignments.
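The -m and -x values and the resulting argument list can be sketched in Python as follows. The sketch only builds the list and does not run any program; the function names and all file names are hypothetical, the flags are those described above, and the exact command line should be checked against the documentation of the soap version in use.

```python
from typing import List, Tuple


def insert_bounds(fragment_length: int, fraction: float) -> Tuple[int, int]:
    """Return (-m, -x) as fragment length -/+ fraction x fragment length."""
    delta = round(fraction * fragment_length)
    return fragment_length - delta, fragment_length + delta


def soap_arguments(fq1: str, fq2: str, reference: str, prefix: str,
                   max_mismatches: int,
                   fragment_length: int = 500,
                   fraction: float = 0.88) -> List[str]:
    """Assemble the flags described in the text into one argument list."""
    low, high = insert_bounds(fragment_length, fraction)
    return [
        "soap",
        "-a", fq1, "-b", fq2, "-D", reference,
        "-o", prefix + ".soap",      # pairs in which both mates align
        "-2", prefix + ".single",    # pairs in which only one mate aligns
        "-u", prefix + ".unmap",     # pairs in which no mate aligns
        "-m", str(low), "-x", str(high),
        "-s", "40", "-l", "32", "-v", str(max_mismatches),
    ]
```

For a 500 bp sequenced fragment and a fraction of 0.88, `insert_bounds` gives 60 and 940. Building both alignments (against the exogenous insert and against the reference genome) from the same function is one simple way to keep the parameter settings consistent.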
In step S303, the short fragment sequences of the sequencing data that align to the exogenous insert fragment sequence are extracted and stored in the document single file 1;
In step S304, the short fragment sequences of the sequencing data that align to the reference genome are extracted and stored in the document single file 2;
In step S305, the short fragment sequences in the single files are processed: paired short fragment sequences are removed from each single file, and the single reads with an alignment count of 1 are retained, to ensure the specificity of the short fragment sequence alignment. Paired short fragment sequences appear in a single file because a floating range is set for the sequenced fragments during alignment, for example ± X × the insert fragment length; paired abnormal short fragment sequences in the alignment result that fall outside this range are also placed in the single file. Therefore, to ensure the specificity of the alignment, the paired short fragment sequences in the single file must be removed and only the single reads with an alignment count of 1 retained.
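The clean-up of a single file in step S305 can be sketched as follows. The record layout is a hypothetical simplification of an alignment output line, chosen only to show the two conditions: the pair ID occurs once in the file, and the read aligned exactly once.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class SingleRecord(NamedTuple):
    pair_id: str  # ID number shared by the two mates of a pair
    mate: int     # 1 or 2
    hits: int     # number of alignment positions reported for this read
    chrom: str
    start: int


def keep_specific_single_reads(
        records: Iterable[SingleRecord]) -> List[SingleRecord]:
    """Drop pairs present with both mates and reads with more than one hit."""
    records = list(records)
    mates_per_pair = Counter(r.pair_id for r in records)
    return [r for r in records
            if mates_per_pair[r.pair_id] == 1 and r.hits == 1]
```

An abnormal pair shows up in the file with both mates under the same ID, so counting IDs is enough to detect and discard it.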
In step S306, the single reads in single file 2 obtained in step S305 are sorted by sample number and separated by chromosome.
In step S307, to determine the positions on the reference genome that correspond to the single reads obtained by alignment against the exogenous sequence, single file 1 is intersected with the single reads in single file 2 sorted in step S306, which gives the positions of the intersection on the reference genome.
In step S308, the intersection obtained is sorted by the corresponding site on the reference genome, the length of the short fragment sequence is added, and the support of the short fragment sequences at each site of the reference sequence is counted.
There is a peak of enriched short fragment sequences on each side of the insertion site; the closer to the insertion site, the lower the support of the short fragment sequences, until a gap in the short fragment sequences appears near the insertion site. The range of this gap is determined to be the insertion position of the exogenous sequence (the position inside the dotted circle indicated by the arrow in Fig. 4).
The present invention can determine fairly accurately a range for the insertion site of the exogenous sequence (from a few bases to more than 100 bases). Although it cannot locate the insertion site exactly, the exact insertion site can be found by combining it with a conventional PCR (polymerase chain reaction) experiment. Verification by a large number of simulation experiments shows that the detection efficiency of the invention is high: the detection rate is 92.15% ± 0.01%, the false positive rate is 3.87% ± 0.013%, and the false negative rate is 4.4% ± 0.05%. The present invention detects the insertion site by bioinformatics means, with a short cycle and low cost, and solves the low detection efficiency and high cost of current purely experimental techniques.
Fig. 5 illustrates a structural diagram of an embodiment of the transgenic exogenous sequence insertion site detection system of the present invention. As shown in Fig. 5, the system comprises a sequencing unit 51, an exogenous one-sided short fragment determining unit 52, a genomic one-sided short fragment determining unit 53, and an insertion site determining unit 54. The sequencing unit 51 obtains paired short fragment sequences by performing paired-end resequencing on the sequenced fragments of the test sample. The exogenous one-sided short fragment determining unit 52 aligns the paired short fragment sequences against the exogenous insert fragment sequence to determine the exogenous one-sided short fragments (single reads). The genomic one-sided short fragment determining unit 53 aligns the paired short fragment sequences against the reference genome sequence to determine the genomic one-sided short fragments. The insertion site determining unit 54 determines the insertion site of the exogenous sequence in the genome sequence according to the intersection of the exogenous one-sided short fragments and the genomic one-sided short fragments. Here, the exogenous one-sided short fragments comprise the paired short fragment sequences in which only one short fragment sequence aligns to the exogenous insert fragment sequence; the genomic one-sided short fragments comprise the paired short fragment sequences in which only one short fragment sequence aligns to the reference genome sequence. According to one embodiment of the present invention, the exogenous one-sided short fragments comprise the normal paired short fragment sequences in which only one short fragment sequence aligns, and aligns only once, to the exogenous insert fragment sequence; the genomic one-sided short fragments comprise the normal paired short fragment sequences in which only one short fragment sequence aligns, and aligns only once, to the reference genome sequence.
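The selection performed by units 52 to 54 can be sketched in Python as follows. This is a minimal illustration, not the implementation of the invention: it assumes the alignment step has already been run by some external aligner and summarized as the number of alignment hits per mate, and the names `PairHits`, `single_read_pairs`, and `candidate_pairs` are hypothetical. The `unique_only` flag corresponds to the embodiment in which the aligned short fragment must align only once.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class PairHits:
    """Hypothetical summary record: alignment hit counts for the two mates of one pair."""
    mate1_hits: int
    mate2_hits: int


def single_read_pairs(hits: Dict[str, PairHits], unique_only: bool = False) -> Set[str]:
    """Return IDs of pairs in which exactly one mate aligns to the target sequence.

    With unique_only=True, the aligned mate must also have exactly one hit.
    """
    selected: Set[str] = set()
    for pair_id, h in hits.items():
        aligned = [n for n in (h.mate1_hits, h.mate2_hits) if n > 0]
        if len(aligned) != 1:
            continue  # both mates or neither mate aligned: not a single read
        if unique_only and aligned[0] != 1:
            continue  # the aligned mate has multiple hits
        selected.add(pair_id)
    return selected


def candidate_pairs(
    insert_hits: Dict[str, PairHits],
    genome_hits: Dict[str, PairHits],
    unique_only: bool = False,
) -> Set[str]:
    """Intersect the exogenous single reads with the genomic single reads."""
    exogenous = single_read_pairs(insert_hits, unique_only)
    genomic = single_read_pairs(genome_hits, unique_only)
    return exogenous & genomic
```

Pairs in the returned set have one mate on the exogenous insert and one mate on the reference genome, so the genomic alignment positions of these pairs bracket the region in which the insertion site is expected to lie.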
According to one embodiment of the present invention, the length of the sequenced fragments of the test sample is 170-500 bp, 500-1000 bp, 1000-2000 bp, or 2000-10000 bp; the length of the short fragment sequences is 40-75 bp or 75-200 bp; and the total number of bases obtained by sequencing the fragments of the test sample is 5-10 times, 10-20 times, or more than 20 times the total number of bases in the reference genome sequence.
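The relation between the sequenced base total and the reference genome size stated above is simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes that both short fragment sequences of a pair have the same length.

```python
import math


def sequencing_depth(total_sequenced_bases: int, reference_genome_bases: int) -> float:
    """Ratio of sequenced bases to reference genome bases (the 'times' value)."""
    if reference_genome_bases <= 0:
        raise ValueError("reference genome size must be positive")
    return total_sequenced_bases / reference_genome_bases


def read_pairs_needed(reference_genome_bases: int, read_length: int, target_depth: int) -> int:
    """Number of read pairs needed to reach target_depth.

    Assumes each pair contributes 2 * read_length bases.
    """
    if read_length <= 0:
        raise ValueError("read length must be positive")
    return math.ceil(target_depth * reference_genome_bases / (2 * read_length))
```

For instance, under these assumptions a 400 Mb reference genome sequenced with 100 bp paired short fragment sequences needs 20,000,000 pairs to reach a 10-fold base total.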
Fig. 6 illustrates a structural diagram of another embodiment of the transgenic exogenous sequence insertion site detection system of the present invention. Compared with Fig. 5, the system of this embodiment further comprises a filtering unit 65 for filtering the paired short fragment sequences to remove unqualified short fragment sequences. For example, the filtering unit 65 removes paired short fragment sequences in which the number of bases with sequencing quality below a predetermined threshold exceeds 50% of the bases of the short fragment sequence; removes paired short fragment sequences in which the number of bases with uncertain sequencing results exceeds 10% of the bases of the paired short fragment sequence; and removes adapter sequences from the paired short fragment sequences.
In the above embodiment, removing unqualified short fragment sequences with the filtering unit improves the accuracy of detection.
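The filtering criteria of the filtering unit can be sketched as follows. This is an illustration under stated assumptions rather than the filter of the invention: the quality threshold of 20 is a placeholder for the unspecified predetermined threshold, a pair is discarded if either of its short fragment sequences fails, adapter removal is simplified to exact string matching, and adapters are trimmed before the other checks. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical read representation: (bases, per-base quality scores)
Read = Tuple[str, Sequence[int]]


def read_passes(bases: str, quals: Sequence[int], min_quality: int = 20,
                max_low_quality_fraction: float = 0.5,
                max_uncertain_fraction: float = 0.1) -> bool:
    """Check one short fragment sequence against the two fraction criteria."""
    n = len(bases)
    if n == 0 or n != len(quals):
        return False  # empty or malformed reads are rejected
    low_quality = sum(1 for q in quals if q < min_quality)
    uncertain = sum(1 for b in bases if b in "Nn")
    return (low_quality <= max_low_quality_fraction * n
            and uncertain <= max_uncertain_fraction * n)


def trim_adapter(bases: str, quals: Sequence[int], adapter: str) -> Read:
    """Cut the read at the first exact occurrence of the adapter (simplification)."""
    idx = bases.find(adapter)
    if idx == -1:
        return bases, quals
    return bases[:idx], quals[:idx]


def filter_pairs(pairs: List[Tuple[Read, Read]], adapter: Optional[str] = None,
                 min_quality: int = 20) -> List[Tuple[Read, Read]]:
    """Keep only pairs in which both reads pass, trimming adapters first if given."""
    kept = []
    for read1, read2 in pairs:
        if adapter:
            read1 = trim_adapter(read1[0], read1[1], adapter)
            read2 = trim_adapter(read2[0], read2[1], adapter)
        if (read_passes(read1[0], read1[1], min_quality)
                and read_passes(read2[0], read2[1], min_quality)):
            kept.append((read1, read2))
    return kept
```

Trimming before checking means a read that consists entirely of adapter becomes empty and is rejected together with its mate; a real pipeline would typically use a dedicated trimming tool that tolerates mismatches.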
For the functions of each device or unit in Fig. 5 to Fig. 6, reference may be made to the description of the corresponding parts in the method embodiments above; for brevity, they are not described in detail here.
The detection method and system provided by the invention are based on whole-genome resequencing technology. They solve the problem that the four existing variation detection techniques cannot accurately detect variant sites, a problem that other experimental techniques also cannot resolve. The method and system are accurate, simple, fast, and low in cost, and provide a scientific, accurate, and reliable means of detecting large-fragment variations in the detection and supervision of genetically modified organisms and their products.
It will be understood by those skilled in the art that each device in Fig. 5 to Fig. 6 can be implemented by a separate computing device, or the devices can be integrated into a single device. In Fig. 5 to Fig. 6, the devices are shown as blocks to illustrate their functions. These functional blocks can be implemented in hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. For example, one or two functional blocks can be implemented as code running on a microprocessor, a digital signal processor (DSP), or any other suitable computing device. The code can represent any combination of procedures, functions, subprograms, programs, routines, subroutines, modules, or instructions, data structures, or program statements. The code can reside in a computer-readable medium. The computer-readable medium can include one or more storage devices, including, for example, RAM, flash memory, ROM, EPROM, EEPROM, registers, hard disks, removable hard disks, CD-ROMs, or any other form of storage medium well known in the art. The computer-readable medium can also include a carrier wave encoding a data signal.
Those skilled in the art will recognize the interchangeability of hardware, firmware, and software configurations in these cases, and will know how best to implement the described functionality for each particular application.