CN106326689A - Method and device for determining site subject to selection in colony - Google Patents
- Publication number: CN106326689A (application CN201510358145.3A)
- Authority: CN (China)
- Prior art keywords: snp, colony, data, selection, site
- Legal status: Pending (status as listed by Google Patents; an assumption, not a legal conclusion)
Abstract
The invention discloses a method and a device for determining sites subject to selection in a population. The method includes the following steps. First, nucleic acid sequencing data are acquired for a population sample. The population sample comes from a plurality of individuals of one species and can be divided into 2n first-level subpopulations according to n pairs of preset indicators, where n is a natural number. Second, SNP detection is performed on the nucleic acid sequencing data to obtain population SNP data, which include SNP data for a plurality of first-level subpopulations. Third, the polymorphism differences between different first-level subpopulations are compared on the basis of the population SNP data to determine SNPs subject to selection; these SNPs are the sites subject to selection. The invention further provides a device and a system for determining sites subject to selection in a population. The method, device and/or system can determine sites subject to selection accurately.
Description
Technical field
The present invention relates to the field of biology, in particular to population genetics. More specifically, it relates to a method and a device for determining sites subject to selection in a population.
Background
Next-generation sequencing (NGS) technology has matured and its cost has steadily fallen. As a result, NGS-based research techniques for many different purposes have appeared in rapid succession. RNA-Seq is one such NGS-based technique. It sequences the transcriptome of a sample and is mainly used to reveal gene expression patterns in that sample, and it is widely used. RNA-Seq sequencing data can also be used to detect polymorphic sites, including SNP sites, across the transcribed regions of the whole genome.
Summary of the invention
According to one aspect, the present invention provides a method for determining sites subject to selection in a population. The selection includes at least one of artificial selection and natural selection. The method comprises the following steps:
(1) acquiring nucleic acid sequencing data of a population sample. The population sample comes from a plurality of individuals of one species, optionally from the same tissue, or the same part, of a plurality of individuals of the species. It can be divided into 2n first-level subpopulations according to n pairs of preset indicators, where n is a natural number;
(2) performing SNP detection on the nucleic acid sequencing data of (1) to obtain population SNP data, which include SNP data for a plurality of first-level subpopulations;
(3) based on the population SNP data of (2), comparing the polymorphism differences between different first-level subpopulations to determine SNPs subject to selection; these SNPs are the sites subject to selection.
In one embodiment of the invention, the nucleic acid sequencing data are transcriptome sequencing data obtained with RNA-Seq. A preset indicator can be any feature that differs between two individual samples. In one embodiment, the preset indicators relate to geography and/or biological traits. For example, different geographic origins, or one or more differing traits, can serve as indicators for an initial division of the population.
In one embodiment of the invention, population structure analysis is performed before or after step (3). Based on the population SNP data of (2), population structure analysis is carried out on the population sample to obtain a population structure analysis result. Optionally, the analysis includes at least one of phylogenetic tree construction, principal component analysis and STRUCTURE analysis. In another embodiment of the invention, the population sample is further re-divided based on the population structure analysis result: the resulting division replaces the original first-level subpopulations, and step (3) is then carried out to determine the sites subject to selection in the population.
According to another aspect, the present invention provides a method for analyzing population structure based on population transcriptome data. The method includes the following steps.
First, nucleic acid sequencing data of a population sample are acquired. The population sample comes from a plurality of individuals of one species, optionally from the same tissue, or the same part, of a plurality of individuals of the species. It can be divided into 2n first-level subpopulations according to n pairs of preset indicators, where n is a natural number.
Second, SNP detection is performed on the nucleic acid sequencing data to obtain population SNP data, which include SNP data for a plurality of first-level subpopulations.
Third, based on the population SNP data, the polymorphism differences between different first-level subpopulations are compared to determine SNPs subject to selection, and/or population structure analysis is performed on the population.
According to a further aspect, the present invention provides a device for determining sites subject to selection in a population. The device can be used to implement the method described above and includes:
- a data input unit for inputting data;
- a data output unit for outputting data;
- a processor for executing a machine-executable program, where executing the program carries out the method of any aspect or embodiment of the present invention;
- a storage unit, connected to the data input unit, the data output unit and the processor, for storing data, including the machine-executable program.
Those skilled in the art will understand that the machine-executable program can be stored on a storage medium. The storage medium may be read-only memory, random access memory, a magnetic disk, an optical disc, or the like.
According to another aspect, the present invention provides a system for determining sites subject to selection in a population. The system can be used to implement all or some of the steps of the method of any aspect or embodiment described above. It includes:
- a sequencing data acquisition apparatus for acquiring nucleic acid sequencing data of a population sample. The population sample comes from a plurality of individuals of one species, optionally from the same tissue, or the same part, of a plurality of individuals of the species. It can be divided into 2n first-level subpopulations according to n pairs of preset indicators, where n is a natural number;
- an SNP detection apparatus, connected to the sequencing data acquisition apparatus, for performing SNP detection on the nucleic acid sequencing data to obtain population SNP data, which include SNP data for a plurality of first-level subpopulations;
- a target site determination apparatus, connected to the SNP detection apparatus, for comparing the polymorphism differences between different first-level subpopulations based on the population SNP data to determine SNPs subject to selection; these SNPs are the sites subject to selection.
Utilize the method for the invention described above, device and/or system can determine in colony the site by selection accurately.
The method of the present invention and/or device, concentrate on the subgenomic transcription region of more general importance, it is possible to turns based on the colony obtained
Record notebook data, it is thus achieved that gene expression data, discloses the gene expression rule of sample, and this is beneficial to disclose genetic background difference bar
Gene expression rule under part, is the further expansion to population selection scopes such as RAD, GBS.And, it is obtained in that again group
Body SNP data, disclose group structure and population genetic evolution laws.The inventive method, device and/or system can be in order to specifications
Colony's transcript profile is resurveyed sequence analysis process, reduces and analyzes risk, it is possible to high efficiency, high-quality and high standard complete colony's project
Analysis.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments taken together with the accompanying drawings, in which:
Fig. 1 is a flowchart of the steps of a method for determining sites subject to selection in a population according to one embodiment of the present invention.
Fig. 2 is a flowchart of the steps of a method for determining sites subject to selection in a population according to one embodiment of the present invention.
Fig. 3 is a flowchart of the steps of a method for determining sites subject to selection in a population according to one embodiment of the present invention.
Fig. 4 is a schematic diagram of a device for determining sites subject to selection in a population according to one embodiment of the present invention.
Fig. 5 is a schematic diagram of a system for determining sites subject to selection in a population according to one embodiment of the present invention.
Fig. 6 is a schematic diagram of the population genetic composition inferred by Frappe from population SNPs in one embodiment of the present invention.
Fig. 7 is a schematic diagram of the phylogenetic tree inferred with the neighbor-joining method from population SNPs in one embodiment of the present invention.
Fig. 8 is a schematic diagram of the PCA results based on population SNPs in one embodiment of the present invention.
Fig. 9 is a schematic diagram of the sites subject to selection detected from population SNPs by the Arlequin program in one embodiment of the present invention.
Fig. 10 is a schematic diagram of the sites subject to selection detected from population SNPs by the Global FST test program in one embodiment of the present invention.
Fig. 11 is a schematic diagram of the sites subject to selection detected from population SNPs by the BayeScan program in one embodiment of the present invention.
Detailed description of the invention
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings. Throughout the drawings, identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary. They serve only to explain the present invention and should not be construed as limiting it.

The terms "first-level", "second-level" and the like are used herein only for convenience of description. They should not be understood as indicating or implying relative importance or any order. In the description of the present invention, unless otherwise specified, "a plurality of" means two or more.

Herein, unless otherwise explicitly specified and limited, terms such as "connected" and "connection" should be understood broadly. A connection may be fixed, detachable or integral. It may be mechanical or electrical. It may be direct, indirect through an intermediary, or internal between two elements.
According to one embodiment of the present invention, as shown in Fig. 1, the present invention provides a method for determining sites subject to selection in a population. The selection includes at least one of artificial selection and natural selection. The method comprises the following steps:
S10: acquiring nucleic acid sequencing data of a population sample. The population sample comes from a plurality of individuals of one species, optionally from the same tissue, or the same part, of a plurality of individuals of the species. It can be divided into 2n first-level subpopulations according to n pairs of preset indicators, where n is a natural number.
S20: performing SNP detection on the nucleic acid sequencing data of S10 to obtain population SNP data, which include SNP data for a plurality of first-level subpopulations.
S30: based on the population SNP data of S20, comparing the polymorphism differences between different first-level subpopulations to determine SNPs subject to selection; these SNPs are the sites subject to selection.
According to one embodiment of the present invention, the nucleic acid sequencing data are transcriptome sequencing data obtained with RNA-Seq. The study objects are multiple individuals of the same species with different genetic backgrounds. High-throughput sequencing of their transcriptome samples yields, in a single pass, population-level polymorphism data for the transcribed regions of that species' genome. These data include population SNP data and genome-wide/transcript-level expression information.

The data can be used to address biological questions such as:
- the evolutionary relationships and differences in genetic composition among the individuals studied;
- gene clusters that have co-evolved;
- sites in subpopulations and individuals that are subject to artificial or natural selection under a specific selection regime;
- functional modules and metabolic pathways that are significantly differentially expressed between subpopulations.

Compared with conventional transcriptome resequencing of a small number of samples, and with population approaches such as RAD and GBS, the study region of the present invention is concentrated on the transcribed regions of the genome, and gene expression can be quantified. This helps reveal gene expression patterns under different genetic backgrounds and further extends the scope of population approaches such as RAD and GBS.
A preset indicator can be any feature that differs between two individual samples. According to one embodiment of the present invention, the preset indicators relate to geography and/or biological traits. For example, different geographic origins, or one or more differing traits, can serve as indicators for an initial division of the population.
According to one embodiment of the present invention, as shown in Fig. 2, the method further includes step S23, population structure analysis, which is performed before step S30. In S23, population structure analysis is carried out on the population sample, based on the population SNP data of S20, to obtain a population structure analysis result. Optionally, the analysis includes at least one of phylogenetic tree construction, principal component analysis (PCA) and STRUCTURE analysis.
The phylogenetic tree can be constructed with the neighbor-joining method. Alternatively, the relationships can be built with the MEGA software (http://www.megasoftware.net). The genotypes at all SNP sites of each sample are concatenated into one sequence per individual sample, and these sequences form the MEGA input file. From the differences between the individual sample sequences, MEGA builds the phylogenetic tree with any of three methods: Maximum Likelihood, Least Squares and Maximum Parsimony.
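The concatenation step above can be sketched in Python. The text does not specify how heterozygous genotypes are written into the MEGA input, so this sketch assumes they are collapsed into standard IUPAC ambiguity codes (for example A/G becomes R). The helper names `build_pseudo_sequences`, `p_distance` and `to_fasta` are hypothetical. The simple p-distance treats an ambiguity code as just another character. It illustrates the kind of pairwise distance a neighbor-joining tree is built from, and is not MEGA's own distance model.

```python
# Assumption: heterozygotes are encoded with IUPAC ambiguity codes.
IUPAC_HET = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def genotype_to_char(genotype: str) -> str:
    """Collapse a diploid genotype such as 'AG' into one alignment character."""
    a, b = genotype[0], genotype[1]
    if a == b:
        return a  # homozygous: keep the base itself
    return IUPAC_HET[frozenset((a, b))]


def build_pseudo_sequences(genotypes: dict[str, list[str]]) -> dict[str, str]:
    """Concatenate each sample's SNP genotypes (same site order) into one string."""
    return {
        sample: "".join(genotype_to_char(g) for g in calls)
        for sample, calls in genotypes.items()
    }


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of SNP positions at which two pseudo-sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("pseudo-sequences must cover the same SNP sites")
    return sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)


def to_fasta(seqs: dict[str, str]) -> str:
    """Render the pseudo-sequences as FASTA text, one record per sample."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


if __name__ == "__main__":
    demo = {
        "S1": ["AA", "CT", "GG"],
        "S2": ["AG", "CC", "GG"],
        "S3": ["GG", "TT", "GA"],
    }
    seqs = build_pseudo_sequences(demo)
    print(to_fasta(seqs), end="")
    print(p_distance(seqs["S1"], seqs["S2"]))
```

Every sample must list its genotypes in the same site order, otherwise the columns of the resulting alignment will not correspond.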
In statistics, principal component analysis (PCA) is a technique for simplifying a data set. It is a linear transformation that maps the data to a new coordinate system. In that system, the greatest variance of any projection of the data lies on the first coordinate (the first principal component), the second greatest variance lies on the second coordinate (the second principal component), and so on.

PCA is often used to reduce the dimensionality of a data set while retaining the features that contribute most to its variance. It does so by keeping the low-order principal components and ignoring the high-order ones, because the low-order components tend to retain the most important aspects of the data.

This embodiment follows the reference A tutorial on Principal Components Analysis (Lindsay I. Smith, 2002-02). First, the SNP data are converted into a numeric matrix according to the actual SNP data:
- a genotype consistent with the reference sequence is coded 0;
- the opposite (homozygous non-reference) genotype is coded 2;
- a degenerate (heterozygous) base is coded 1.
The values are then standardized, and a linear vector equation is built by the method introduced above, where i denotes the i-th sample and runs from 1 to k. The matrix a is solved with the equation-solving capabilities of R-language packages. The vectors of the first four principal components are extracted from the data characteristics of each sample, and the clustering of the individuals is displayed with these vectors as coordinate axes.
STRUCTURE analysis can be performed with the Structure software
(http://pritch.bsd.uchicago.edu/software/structure2_1.html). From the genotype data of the SNP sites, the software infers whether distinct groups exist and determines the group to which each individual belongs. Following the software manual, the genotype file of the population SNPs is converted into the Structure input format. Up to 50,000 simulation iterations are then run under the admixture model. Assuming that multiple groups exist, this calculates the probability that each individual belongs to each (sub)population, which allows the individuals to be classified.

In one embodiment of the invention, individuals can be further screened on the basis of this classification. For example, once the individuals have been classified according to the population structure analysis result, the information of each individual sample is extracted, and questionable individuals, such as those with an unclear classification or obvious outliers, are removed.
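The classification and screening described above can be sketched from a STRUCTURE-style membership (Q) matrix. The sketch assumes the per-individual membership probabilities have already been parsed into a Python dict. The cut-off `min_membership=0.7` for an "unclear classification" is a hypothetical value, because the text gives no threshold. The function names are likewise hypothetical.

```python
def assign_clusters(q_matrix: dict[str, list[float]], min_membership: float = 0.7):
    """Assign each individual to its highest-membership cluster, or flag it as ambiguous."""
    assigned: dict[str, int] = {}
    ambiguous: list[str] = []
    for sample, probs in q_matrix.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= min_membership:
            assigned[sample] = best
        else:
            ambiguous.append(sample)  # unclear classification: candidate for removal
    return assigned, ambiguous


def regroup(assigned: dict[str, int]) -> dict[int, list[str]]:
    """Collect the retained individuals into the new subpopulations."""
    groups: dict[int, list[str]] = {}
    for sample, cluster in assigned.items():
        groups.setdefault(cluster, []).append(sample)
    return groups


if __name__ == "__main__":
    q = {"S1": [0.95, 0.05], "S2": [0.90, 0.10], "S3": [0.55, 0.45], "S4": [0.10, 0.90]}
    kept, dropped = assign_clusters(q)
    print(regroup(kept), dropped)
```

The groups returned by `regroup` correspond to the new subpopulations that can replace the original first-level subpopulations before step S30.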
According to one embodiment of the present invention, the population sample is further re-divided based on the population structure analysis result. The new subpopulations given by this division replace the original first-level subpopulations, and step S30 is then performed on the new subpopulations and their SNP data to determine the sites subject to selection in the population. Classifying or reclassifying the population and its subpopulations with the population structure analysis result in this way helps identify sites subject to selection accurately.
According to one embodiment of the present invention, as shown in Fig. 3, the method further includes step S23, population structure analysis, which is performed after step S30. In S23, population structure analysis is carried out on the population sample, based on the population SNP data of S20, to obtain a population structure analysis result. Optionally, the analysis includes at least one of phylogenetic tree construction, principal component analysis (PCA), STRUCTURE analysis and Frappe analysis of population genetic composition.
According to one embodiment of the present invention, the nucleic acid sequencing data of the population sample consist of the nucleic acid sequencing data of the individual samples that make up the population sample. The data for each individual sample must be no less than 4G. This supports accurate SNP detection, and accurate population SNP data in turn support accurate identification of sites subject to selection.
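The per-sample data requirement can be turned into a read count with simple arithmetic. This sketch assumes that "4G" means 4 × 10^9 filtered bases, and it uses the PE91 strategy of Example 1 (two 91 bp reads per pair). Under those assumptions each sample needs about 22 million read pairs, and a 30-individual population needs 120 Gb in total. The helper names are hypothetical.

```python
def read_pairs_needed(target_bases: int, read_length: int = 91) -> int:
    """Minimum number of read pairs whose combined length reaches target_bases."""
    bases_per_pair = 2 * read_length  # PE91: two 91 bp reads per pair
    return -(-target_bases // bases_per_pair)  # integer ceiling division


def cohort_bases(n_samples: int, bases_per_sample: int) -> int:
    """Total filtered data for a cohort at a fixed per-sample target."""
    return n_samples * bases_per_sample


per_sample = 4 * 10**9  # assumption: "4G" read as 4 x 10^9 filtered bases
print(read_pairs_needed(per_sample))  # 21978022
print(cohort_bases(30, per_sample))   # 120000000000
```

Ceiling division is used because a fractional read pair cannot be sequenced, and rounding down would fall just short of the target.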
According to one embodiment of the present invention, the population sample comes from individuals of the same species with different genetic backgrounds. For population analysis, the population sample should preferably contain no fewer than 30 individual samples. All individuals involved should be divisible, according to some indicator, into at least two subpopulations (the first-level subpopulations) for subsequent variation analysis. According to one embodiment, each first-level subpopulation preferably includes at least 10 individual samples, which supports variation analysis.

According to one embodiment, all individual samples are cultivated under identical conditions and then sampled at the same tissue or part to obtain the population sample. This makes population analysis of these data, including differential gene expression analysis, meaningful. The genetic differences between individual samples (the variable under study) already exist, so sampling under identical conditions allows the differentially expressed genes obtained to be interpreted in terms of those genetic differences. Otherwise, with several variables present, the cause of differential expression would be ambiguous.

For example, suppose the study population is divided into two classes, saline-alkali tolerant and non-tolerant. All individuals grown under the same environment can be treated with the same dose of saline, and root tips can then be sampled at a specific time after treatment (for example, 1 hour). Differentially expressed genes identified by the subsequent population analysis can then be used to reveal the species' mechanism of saline-alkali tolerance, and the differential expression can be attributed to differences in genetic background.
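The sampling recommendations above can be checked mechanically before sequencing. The following is a minimal sketch using the thresholds stated in the text: at least 30 individuals, at least two first-level subpopulations, and at least 10 individuals per subpopulation. The function name and the layout of the input dict are hypothetical.

```python
def check_design(subpops: dict[str, list[str]], min_total: int = 30,
                 min_groups: int = 2, min_per_group: int = 10) -> list[str]:
    """Return warnings; an empty list means the design meets every threshold."""
    warnings: list[str] = []
    total = sum(len(members) for members in subpops.values())
    if total < min_total:
        warnings.append(f"only {total} individuals (recommended >= {min_total})")
    if len(subpops) < min_groups:
        warnings.append(f"only {len(subpops)} subpopulation(s) (need >= {min_groups})")
    for name, members in subpops.items():
        if len(members) < min_per_group:
            warnings.append(f"subpopulation {name!r} has {len(members)} individuals "
                            f"(recommended >= {min_per_group})")
    return warnings


if __name__ == "__main__":
    design = {
        "tolerant": [f"T{i}" for i in range(20)],
        "sensitive": [f"S{i}" for i in range(8)],
    }
    for w in check_design(design):
        print(w)
```

The thresholds are keyword arguments, so a stricter design can be checked without changing the function.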
According to one embodiment of the present invention, a first-level subpopulation includes at least one second-level subpopulation; optionally, a second-level subpopulation includes at least 10 individuals. Second-level subpopulations are obtained by dividing a first-level subpopulation with one or more other indicators that differ from those used to divide the population. The method of any embodiment of the present invention can accurately identify sites subject to selection in the multi-level subpopulations produced by such repeated division.
According to one embodiment of the present invention, S30 (comparing the polymorphism differences between different first-level subpopulations, based on the population SNP data, to determine SNPs subject to selection) includes the following. Based on the population SNP data, the heterozygosity differences of the same SNP site across different first-level subpopulations are compared with at least two test methods. SNP sites supported by at least two test methods are determined to be SNPs subject to selection. Optionally, the test methods include F-statistics, analysis of molecular variance and hierarchical Bayesian methods.

In some embodiments of the invention, the degree of heterozygosity difference at each site is assessed with one of the following combinations:
- two or all three of the Arlequin program, the Global FST test program and the BayeScan program;
- at least two, or all three, of the Arlequin, BayeScan and Datacal methods.
An SNP site is judged to be a site subject to selection when it is supported by at least two, or all three, of these test methods, that is, when the results of at least two methods agree that the heterozygosity difference of the SNP between subpopulations is significant. This supports accurate identification.
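The "supported by at least two methods" rule above amounts to a vote across the outlier lists produced by each program. This sketch assumes each program's significant SNPs have already been extracted into a set of SNP IDs. It does not parse actual Arlequin, Global FST or BayeScan output files, and the dictionary keys and SNP IDs shown are placeholders.

```python
from collections import Counter


def consensus_outliers(calls_by_method: dict[str, set[str]], min_support: int = 2) -> list[str]:
    """SNP IDs flagged as significant by at least `min_support` test methods."""
    support = Counter()
    for snps in calls_by_method.values():
        support.update(set(snps))  # count each method at most once per SNP
    return sorted(snp for snp, n in support.items() if n >= min_support)


if __name__ == "__main__":
    calls = {
        "Arlequin": {"snp1", "snp2", "snp5"},
        "GlobalFST": {"snp2", "snp3"},
        "BayeScan": {"snp2", "snp5", "snp7"},
    }
    print(consensus_outliers(calls))                 # ['snp2', 'snp5']
    print(consensus_outliers(calls, min_support=3))  # ['snp2']
```

Setting `min_support` to the number of methods gives the stricter "all three methods" variant described in the text.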
According to one embodiment of the present invention, comparing the heterozygosity differences of the same SNP site across different first-level subpopulations with at least two test methods, and keeping the SNP sites supported by at least two methods as SNPs subject to selection, includes the following. For each SNP site, a heterozygosity difference value between different first-level subpopulations is calculated. SNP sites whose heterozygosity difference value is not less than a threshold are determined to be sites subject to selection.

In one embodiment of the invention, the heterozygosity difference value is expressed as $F_{ST}$ (fixation index). $F_{ST}$ is used to evaluate the genetic distance and differences between populations, and it measures the degree of differentiation between them. It was developed from a special case of the F-statistics introduced by Sewall Wright in 1922. The null hypothesis of $F_{ST}$ is that, when the populations are not differentiated, the frequency difference of the minor allele at a polymorphic site within and between (sub)populations is not significant.

There are many ways to calculate $F_{ST}$. Although the specific methods differ, the underlying theory is the same, namely the definition given by Hudson (1992):

$$F_{ST} = 1 - \frac{\Pi_{Within}}{\Pi_{Between}}$$

The two terms are obtained as follows.
- $\Pi_{Between}$: one sample is drawn from each of two subpopulations (Between) to form a pair, and the SNP genotype difference of that pair is calculated. This is done for all such pairs, and the average is $\Pi_{Between}$.
- $\Pi_{Within}$: two samples are drawn from one subpopulation (Within) to form a pair, and their SNP genotype difference is calculated. This is done for all such pairs, and the average is $\Pi_{Within}$. With two subpopulations, $\Pi_{Within}$ can first be calculated for each subpopulation separately and the two values then added.

In this embodiment, the formula derived from the above principle, in combination with the structure of the existing subpopulation SNP data, is as follows:
According to one embodiment of the present invention, the present invention provides a method for analyzing population structure based on population transcriptome data. The method includes the following steps.
First, nucleic acid sequencing data of a population sample are acquired. The population sample comes from a plurality of individuals of one species, optionally from the same tissue, or the same part, of a plurality of individuals of the species. It can be divided into 2n first-level subpopulations according to n pairs of preset indicators, where n is a natural number.
Second, SNP detection is performed on the nucleic acid sequencing data to obtain population SNP data, which include SNP data for a plurality of first-level subpopulations.
Third, based on the population SNP data, the polymorphism differences between different first-level subpopulations are compared to determine SNPs subject to selection, and/or population structure analysis is performed on the population.
According to one embodiment of the present invention, as shown in Fig. 4, the present invention provides a device 100 for determining sites subject to selection in a population. The device 100 is used to implement the method for determining sites subject to selection in a population described above, and includes:
- a data input unit 110 for inputting data;
- a data output unit 120 for outputting data;
- a processor 130 for executing a machine-executable program, where executing the program carries out the method of any aspect or embodiment of the present invention;
- a storage unit 140, connected to the data input unit 110, the data output unit 120 and the processor 130, for storing data, including the machine-executable program.
Those skilled in the art will understand that the machine-executable program can be stored on a storage medium. The storage medium may be read-only memory, random access memory, a magnetic disk, an optical disc, or the like.
According to one embodiment of the present invention, as shown in Fig. 5, the present invention provides a system 1000 for determining sites subject to selection in a population. The system can be used to implement all or some of the steps of the method of any aspect or embodiment described above. The system 1000 includes:
- a sequencing data acquisition apparatus 1100 for acquiring nucleic acid sequencing data of a population sample. The population sample comes from a plurality of individuals of one species, optionally from the same tissue, or the same part, of a plurality of individuals of the species. It can be divided into 2n first-level subpopulations according to n pairs of preset indicators, where n is a natural number;
- an SNP detection apparatus 1200, connected to the sequencing data acquisition apparatus 1100, for performing SNP detection on the nucleic acid sequencing data to obtain population SNP data, which include SNP data for a plurality of first-level subpopulations;
- a target site determination apparatus 1300, connected to the SNP detection apparatus 1200, for comparing the polymorphism differences between different first-level subpopulations based on the population SNP data to determine SNPs subject to selection; these SNPs are the sites subject to selection.
The method, device and/or system of any embodiment of the present invention can accurately determine sites subject to selection in a population. The method and/or device focus mainly on the transcribed regions of the genome, which are of greater general importance.

From the population transcriptome data obtained, gene expression data can be derived to reveal gene expression patterns in the samples. This helps reveal gene expression patterns under different genetic backgrounds, and it further extends the scope of population approaches such as RAD and GBS. Population SNP data can also be obtained from the same data to reveal population structure and population genetic evolution.

The method, device and/or system can be used to standardize the analysis workflow for population transcriptome resequencing and reduce analysis risk. Population projects can then be analyzed efficiently, with high quality and to a high standard.
The method for determining sites subject to selection, and the population project analysis device and/or system, of the present invention are described in detail below with reference to the drawings and to examples with specific sample data. The embodiments described with reference to the drawings are exemplary. They serve only to explain the present invention and should not be construed as limiting it. Unless otherwise explained, the reagents, sequences (adapters, tags and primers), software and instruments that are not specifically described in the following examples are conventional commercial or open-source products, for example the transcriptome library construction kit purchased from Illumina.
Embodiment one
Reference sequence, sequencing strategy, sample requirements and other notes:
i) Reference sequence: a genome reference sequence of reasonably good quality is required.
ii) Sequencing strategy: use a PE91 strategy (paired-end sequencing yielding read pairs in which every read is 91 bp long), with at least 4G of filtered data per sample.
iii) Samples should come from the same species, from individuals with different genetic backgrounds.
iv) For the whole study population, 30 or more individuals are recommended. All individuals should be divisible, according to some indicator, into two or more subpopulations (to facilitate variation analysis), and each subpopulation should preferably contain more than 10 individuals.
v) All samples should be grown under identical conditions and then sampled from the same tissue or site. The reason is that the genetic differences among samples (the variable of interest) already exist; only when sampling is done under identical conditions can the differentially expressed genes obtained be interpreted purely in terms of genetic differences. Otherwise, the presence of multiple variables makes the cause of differential expression ambiguous. For example, suppose the study population is divided into two classes, saline-alkali tolerant and non-tolerant. All individuals grown in the same environment can be treated with the same dose of saline, and root tips sampled at a specific time (e.g., 1 hour) after treatment. The differentially expressed genes subsequently identified may then reveal the mechanism of saline-alkali tolerance in the species, because the differential expression is caused by differences in genetic background.
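As a rough sanity check on the per-sample data requirement, the number of PE91 read pairs needed follows from dividing the target yield by the 2 × 91 = 182 bases in each pair. This is a minimal sketch that assumes "4G" means 4 × 10^9 bases of filtered data, which the text does not state explicitly; `read_pairs_for_target` is a hypothetical helper name.

```python
def read_pairs_for_target(target_bases: int, read_len: int = 91) -> int:
    """Return the number of paired-end read pairs needed to reach target_bases.

    Assumes both reads of a pair have length read_len (PE91 -> 2 x 91 bp).
    """
    bases_per_pair = 2 * read_len
    # Integer ceiling division avoids floating-point rounding on large counts.
    return -(-target_bases // bases_per_pair)


if __name__ == "__main__":
    # Hypothetical reading of "4G" as 4 x 10^9 bases of filtered data per sample.
    print(read_pairs_for_target(4_000_000_000))  # prints 21978022
```

Under that assumption each sample needs roughly 22 million filtered read pairs.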
To standardize the analysis workflow of population transcriptome resequencing projects, reduce analysis risk, and complete projects efficiently, with high quality and to a high standard, a population transcriptome resequencing analysis method is proposed here, mainly comprising:
I. Experimental workflow
After total RNA is extracted from the samples and DNA is digested with DNase I, eukaryotic mRNA is enriched with Oligo(dT) magnetic beads (for prokaryotes, rRNA is removed with a kit before proceeding to the next step). Fragmentation reagent is added and the mRNA is broken into short fragments at elevated temperature in a Thermomixer. First-strand cDNA is synthesized using the fragmented mRNA as template, and a second-strand synthesis reaction is then prepared to synthesize second-strand cDNA. The cDNA is purified with a kit and end-repaired, an "A" base is added to the 3' end and adapters are ligated, fragment size selection is performed, and finally PCR amplification is carried out. After the constructed library passes quality control on an Agilent 2100 Bioanalyzer and an ABI StepOnePlus Real-Time PCR System, it is sequenced on an Illumina HiSeq™ 2000 or another sequencer.
II. Bioinformatic analysis content
1) Standard RNA-Seq analysis
This includes data filtering, gene expression quantification, identification of genes differentially expressed between groups together with their GO and KEGG pathway enrichment analysis, SNP calling and annotation, etc.
2) Analyses based on population SNP data
The consensus-sequence prediction for each individual sample in the standard RNA-Seq analysis, i.e. the intermediate step of SNP identification (SNP calling), is collated into SNP data at the population level, which are used for the following analyses:
a. Population structure analysis: this includes phylogenetic tree construction, principal component analysis (PCA) and STRUCTURE analysis. All three reflect the structure of the population, but each has a different emphasis. Phylogenetic tree construction focuses on revealing the evolutionary relationships among individuals in the population; PCA focuses on revealing the principal factors underlying differences in genetic background among individuals; STRUCTURE analysis focuses on comparing and quantifying the genetic composition of each individual and displaying graphically the similarities and differences in genetic composition between individuals.
b. Detection of sites subject to selection: selection (whether artificial or natural) usually plays a very important role in population differentiation (the formation of subgroups). From the SNP data of the subgroups, the difference in polymorphism between subgroups (Fst) can be computed for every site, and the sites with significant Fst differences can be tested. These sites are potential sites subject to selection and can help researchers further understand the selection process acting on particular subgroups.
Fst (fixation index) is mainly used to evaluate the genetic distance and differences between populations; it is an index of the degree of differentiation between populations, developed by Sewall Wright in 1922 as a special case of F-statistics. The null hypothesis for FST is that, when the population is not differentiated, the differences in minor allele frequency at polymorphic sites within and between groups are not significant. There are many methods for calculating FST; although the specific calculations differ, the underlying theory is the same, namely the definition given by Hudson (1992):
$$F_{ST} = 1 - \frac{\Pi_{\mathrm{Within}}}{\Pi_{\mathrm{Between}}}$$
where Π_Between is obtained by drawing one sample from each of the two populations (Between) to form a pair and computing the difference between the SNP genotypes of that pair; the differences are computed in this way for all such pairs and averaged to give Π_Between.
Π_Within is obtained by drawing two samples from one population (Within) to form a pair and computing the difference between their SNP genotypes; the differences are computed for all such pairs and averaged to give Π_Within. When there are two populations, Π_Within is first computed for each population separately and the results are then combined.
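The pairwise-difference definition above can be sketched for a single SNP as follows. This is a minimal illustration, not the patent's implementation: it assumes each population is given as a string or list of allele copies (a diploid genotype contributes two copies), that "difference" means an allele mismatch, and that the two within-population values are combined by averaging, as in Hudson et al. (1992). The function names are hypothetical.

```python
from itertools import combinations, product


def _mean_mismatch(pairs):
    """Fraction of allele pairs that differ."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one pair of alleles")
    return sum(a != b for a, b in pairs) / len(pairs)


def pi_within(alleles):
    """Mean pairwise difference between allele copies drawn from one population."""
    return _mean_mismatch(combinations(alleles, 2))


def pi_between(alleles1, alleles2):
    """Mean difference between one allele from each of two populations."""
    return _mean_mismatch(product(alleles1, alleles2))


def hudson_fst(alleles1, alleles2):
    """F_ST = 1 - Pi_Within / Pi_Between for one biallelic site."""
    pw = (pi_within(alleles1) + pi_within(alleles2)) / 2  # assumed: average, not sum
    pb = pi_between(alleles1, alleles2)
    if pb == 0:
        return 0.0  # site monomorphic in both populations: no differentiation
    return 1.0 - pw / pb


if __name__ == "__main__":
    print(hudson_fst("AAAA", "GGGG"))  # fixed difference between populations -> 1.0
```

Note that this estimator can become slightly negative when the populations are not differentiated (e.g. `hudson_fst("AAGG", "AAGG")` is about -0.33 with these tiny samples).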
3) Additional analyses based on gene expression data
a. Cluster analysis and PCA: based on gene expression data, the individuals in the population can be clustered and analyzed by PCA to show how individuals differ at the gene expression level. These results can be cross-checked and compared with the phylogenetic tree and PCA results built from the SNP data.
b. Co-expression network construction and comparison between groups: in many biological processes, multiple genes (co-expressed genes) are often expressed cooperatively under many conditions to achieve specific functions. Starting from gene expression data of many different individuals, numerous co-expression gene modules can be constructed. On this basis, researchers can analyze: i) which co-expression modules are active (expressed at higher levels) under given conditions, which helps reveal the gene expression patterns underlying those conditions; ii) which co-expression modules are active in which specific individual(s), which helps elucidate the biological functions of some modules; and iii) how the modules constructed above compare between subpopulations. Differences at this higher level of co-expression modules between individuals can reveal new information that cannot be seen in conventional differential gene expression data (which assume that genes are independent of one another and ignore the interactions between them).
In summary, taking individuals of one species with multiple different genetic backgrounds as the study objects, high-throughput sequencing of their transcriptome samples yields, in a single experiment, polymorphism data for the transcribed regions of the genome at the population level of that species (population SNPs) together with genome-wide gene/transcript expression information. This can in turn reveal biological questions such as (i) the evolutionary relationships and differences in genetic composition among the studied individuals, (ii) gene clusters that have co-evolved under specific selection, (iii) sites subject to artificial/natural selection within subpopulations, and (iv) functional modules and metabolic pathways with significantly different expression between individuals or subpopulations. Compared with conventional transcriptome resequencing of a small number of samples, this method also provides population SNP data, which can be used to address biological questions such as population structure, population evolutionary history, the evolutionary relationships of individuals in the population, and potential sites subject to selection. Compared with population-level techniques such as RAD and GBS, the method concentrates on the transcribed regions of the genome, which are of more general importance. Meanwhile, the invention can quantify gene expression, which helps reveal gene expression patterns against different genetic backgrounds and further extends the scope of population-level approaches such as RAD and GBS.
Embodiment two
The operating procedure is described step by step with an example below:
I. Conventional transcriptome resequencing workflow
Blood or tissue samples were obtained from 34 giant pandas from different regions, namely the Qinling, Minshan, Liangshan, Qionglai and Xiangling mountains. Of these, 2 samples came from Liangshan, numbered GP37 and GP52 (both blood); 7 came from Minshan, numbered GP14-19 and GP51 (all blood); 8 came from Qinling, numbered GP3-8 (blood), GP10 (tissue) and GP12 (blood); 15 came from Qionglai, numbered GP2, GP13, GP22-31, GP33 and GP35-36 (all blood); and 2 came from Xiangling, numbered GP38-39 (both blood). Transcriptome nucleic acid extraction, library construction and sequencing were carried out as in the preceding embodiment to obtain sequencing data for each sample. The 34 samples were divided into 5 first-level subpopulations according to region.
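The grouping above can be written down as a sample-to-region table and checked for consistency. The sketch below expands the ID ranges exactly as listed in the text (e.g. "GP14-19" means GP14 to GP19 inclusive) and confirms that the five first-level subpopulations contain 2, 7, 8, 15 and 2 samples, 34 in total. The helper names and the dictionary layout are illustrative assumptions, not part of the patent's file formats.

```python
from collections import Counter

# Sample IDs per region, copied from the text; ranges are inclusive.
REGIONS = {
    "Liangshan": ["GP37", "GP52"],
    "Minshan": ["GP14-19", "GP51"],
    "Qinling": ["GP3-8", "GP10", "GP12"],
    "Qionglai": ["GP2", "GP13", "GP22-31", "GP33", "GP35-36"],
    "Xiangling": ["GP38-39"],
}


def expand_ids(spec, prefix="GP"):
    """Expand 'GP14-19' into ['GP14', ..., 'GP19']; single IDs pass through."""
    body = spec[len(prefix):]
    if "-" in body:
        lo, hi = (int(x) for x in body.split("-"))
        return [f"{prefix}{k}" for k in range(lo, hi + 1)]
    return [spec]


def sample_to_region(regions):
    """Map every sample ID to its region, rejecting IDs listed twice."""
    mapping = {}
    for region, specs in regions.items():
        for spec in specs:
            for sid in expand_ids(spec):
                if sid in mapping:
                    raise ValueError(f"{sid} assigned to more than one region")
                mapping[sid] = region
    return mapping


if __name__ == "__main__":
    m = sample_to_region(REGIONS)
    print(len(m), Counter(m.values()))
```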
After data filtering and quality control, the clean sequencing data (clean data) are aligned to the genome reference sequence, for example with SOAP or BWA using default settings, and SNPs are identified for each sample (SNP calling). The clean data are aligned to the gene-set reference sequence to calculate the expression of each gene, identify genes differentially expressed between groups, and perform GO and KEGG pathway enrichment analysis. The clean data are also aligned to the genome reference sequence, for example with TopHat or STAR, to predict alternative splicing and novel transcripts, and various statistics are compiled, including data volume before and after filtering, read mapping statistics, genome coverage statistics, and plots assessing library randomness.
II. Calling population SNPs and population evolution analysis based on population SNPs
Starting from the consensus information of each individual relative to the genome reference sequence obtained in the previous step (i.e. the cns files output by SOAPsnp), the population SNP data are assembled at the level of all individuals: the union of the SNPs of all individual samples is taken as the population SNP data. Based on these population SNPs, population evolution analysis is carried out, including phylogenetic tree construction, principal component analysis and analysis of individual genetic composition. This process requires a few simple configuration files, described as follows:
individual.txt: sample (individual) information file; each line holds the information for one sample and has 6 columns, as shown in Table 1.
Table 1
snp.lst: list of population SNP (genotype) files; the population SNP file format is shown in Table 2.
Table 2
Column | Content |
---|---|
1 | Chromosome ID |
2 | Allele (SNP) position |
3 | Nucleotide at the corresponding reference-sequence site |
4 | Genotypes of the sequenced samples, separated by spaces, in the same order as in the individual file |
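A reader for the population SNP format in Table 2 might look like the sketch below. It assumes that columns are whitespace-separated, that each sample's genotype is a single token (for example an IUPAC code such as "R"), and that blank lines and lines starting with "#" can be skipped; none of these details are specified in the text, and `parse_population_snps` and `SnpRecord` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class SnpRecord:
    chrom: str            # column 1: chromosome ID
    pos: int              # column 2: SNP position
    ref: str              # column 3: reference nucleotide
    genotypes: List[str]  # column 4: one genotype per sample, in individual-file order


def parse_population_snps(lines: Iterable[str], n_samples: int) -> Iterator[SnpRecord]:
    """Yield one SnpRecord per data line, checking the genotype count."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        chrom, pos, ref, genotypes = fields[0], int(fields[1]), fields[2], fields[3:]
        if len(genotypes) != n_samples:
            raise ValueError(
                f"line {line_no}: expected {n_samples} genotypes, got {len(genotypes)}"
            )
        yield SnpRecord(chrom, pos, ref, genotypes)
```

Checking the genotype count against the number of individuals catches the mismatch between the SNP file and the individual file that the text warns about.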
population.txt: information on the two subpopulations used for selection-site analysis. The first column is the subgroup name, which may differ from that in the individual file; the second column is the sample abbreviation ID, which must appear in column 4 of the individual file.
*.gff: genome GFF file, used during selection-site analysis to identify the genes in which the selected sites lie; optional.
1) Calling population SNPs
SOAPsnp is used to detect the SNPs of each sample, and the SNP data of all individual samples are integrated to obtain population SNP data. Specifically:
First, making full use of the published giant panda genome information (Zhao S, et al. Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation. Nat Genet. 45(1): 67-71 (2013)), the dbSNP data corresponding to the giant panda genome are downloaded from the NCBI website as prior information for SOAPsnp, and, based on current research results, the prior probability of a heterozygous SNP is set to 0.0010 and that of a homozygous SNP to 0.0005. With these parameters set, SOAPsnp is used to compare the filtered data against the giant panda reference genome, producing CNS files. Because each sample's genome contains some regions of low sequencing depth, the files containing the genotype probabilities of all samples are combined here, and a maximum likelihood method is used to integrate the data of all samples, generating a pseudo-genome that contains every site for each sample. The genotype with the highest probability is chosen as each sample's consensus genotype, and high-quality SNPs are detected using information such as genotype and sequencing depth. After the consensus sequence of each sample is obtained, the results are saved in population SNP format, giving the population SNP data.
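How the quoted priors interact with read evidence can be illustrated with a deliberately simplified Bayesian genotype call at one biallelic site. This is not SOAPsnp's model (which uses base qualities and a richer error model); it assumes only reference/alternative read counts and a single per-read error rate of 0.01, and picks the genotype with the highest posterior probability. The function names are hypothetical.

```python
import math

# Priors quoted in the text: heterozygous SNP 0.0010, homozygous SNP 0.0005.
PRIORS = {"hom_ref": 1.0 - 0.0010 - 0.0005, "het": 0.0010, "hom_alt": 0.0005}


def genotype_log_likelihoods(n_ref, n_alt, error=0.01):
    """Simplified model: each read shows the wrong allele with probability `error`."""
    return {
        "hom_ref": n_ref * math.log(1 - error) + n_alt * math.log(error),
        "het": (n_ref + n_alt) * math.log(0.5),
        "hom_alt": n_ref * math.log(error) + n_alt * math.log(1 - error),
    }


def call_genotype(n_ref, n_alt, error=0.01):
    """Return (most probable genotype, its posterior probability)."""
    ll = genotype_log_likelihoods(n_ref, n_alt, error)
    log_post = {g: ll[g] + math.log(p) for g, p in PRIORS.items()}
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())  # log-sum-exp normalizer
    best = max(log_post, key=log_post.get)
    return best, math.exp(log_post[best] - top) / norm


if __name__ == "__main__":
    for counts in [(10, 0), (10, 10), (0, 10), (0, 1)]:
        print(counts, call_genotype(*counts))
```

The last case shows why low-depth regions are a concern: a single alternative read is not enough to overcome the small SNP priors, so the call stays homozygous reference.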
2) Population evolution analysis
The population SNP results are taken as input, and several programs are called in an integrated manner to carry out population evolution analysis based on the population SNPs, including tree, PCA, STRUCTURE and Frappe analyses, as follows.
The program is named PopuStruct.pl; its parameters are described in Table 3. Note that the population SNP file must correspond to the individual file. STRUCTURE takes a long time to run; if time is short, it is advisable to run the population structure analysis with Frappe first to obtain preliminary results.
Table 3
Parameter | Description |
---|---|
-indi <s> | Information on each individual in the population; individual order must match the population SNP file. Required. |
-list <s> | List of population SNP genotype files. Required. |
-OutDir <s> | Output path; default is the current path. |
-prefix <s> | Prefix for output scripts; default "Pop". |
-Struct <y/n> | Whether to run population structure analysis with STRUCTURE; default "y". |
-Tree <y/n> | Whether to construct a phylogenetic tree; default "y". |
-Frappe <y/n> | Whether to run population structure analysis with Frappe; default "y". |
-PCA <y/n> | Whether to run principal component analysis; default "y". |
-queue <s> | Queue to which jobs are submitted; default bc.q. |
-project <s> | Value of the -P parameter used when submitting jobs; default rdtest. |
-help | Show help information. |
Output files (results)
i) The Frappe and STRUCTURE result files can be adjusted and plotted with Excel. The results are shown in Fig. 6, a schematic diagram of population genetic variation inferred by Frappe from the population SNPs. In the figure, each separated block represents a population, each position on the horizontal axis represents one sample, and the differently colored blocks represent K different (or highly divergent) ancestors; the plot shows, for each strain, the proportion of its genetic composition contributed by each hypothetical ancestral component. If a sample spans two different colored segments, it may be an intermediate type between two subgroups. The larger K is, the more the differences between samples are amplified and the finer the division; K can be chosen according to the actual results, taking the value that fully reflects the structural relationships of all samples. In the figure K takes the values 2, 3, 4 and 5; at K = 3 the population is divided into 3 subpopulations, which essentially fully reflects the structural relationships of all samples.
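Reading the Frappe/STRUCTURE output described above amounts to inspecting each sample's row of ancestry proportions (the "Q matrix", one column per ancestral component). The sketch below assigns each sample to its largest component and flags it as a possible intermediate type when its second-largest component reaches a cutoff; the 0.2 cutoff and the function name are illustrative assumptions, not values from the text.

```python
import numpy as np


def summarize_ancestry(q, sample_ids, admix_threshold=0.2):
    """Return {sample: (major component index, possibly intermediate?)}.

    q: samples x K matrix of ancestry proportions; each row should sum to 1.
    """
    q = np.asarray(q, dtype=float)
    if not np.allclose(q.sum(axis=1), 1.0, atol=1e-3):
        raise ValueError("each row of the Q matrix should sum to 1")
    result = {}
    for sid, row in zip(sample_ids, q):
        order = np.argsort(row)[::-1]            # components, largest first
        major = int(order[0])
        second = row[order[1]] if row.size > 1 else 0.0
        result[sid] = (major, bool(second >= admix_threshold))
    return result


if __name__ == "__main__":
    q = [[0.95, 0.05, 0.0], [0.1, 0.85, 0.05], [0.5, 0.45, 0.05]]  # K = 3, toy values
    print(summarize_ancestry(q, ["s1", "s2", "s3"]))
```

Repeating this for each K (2 to 5 in the figure) is one way to see how the grouping sharpens as K increases.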
ii) The tree result file is adjusted with MEGA software; the result is shown in Fig. 7, a schematic diagram of the phylogenetic tree inferred from the population SNPs by the neighbor-joining method. In the figure, the shorter the branch distance, the closer the evolutionary relationship between the two branches. Samples from the same subgroup should cluster together or lie close to one another, so the figure shows how closely related the groups are. As Fig. 7 shows, this population can be divided into 3 subpopulations.
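The text does not say which genetic distance the tree was built from. As an illustration of the kind of input a neighbor-joining program takes, the sketch below computes an allele-sharing distance between every pair of individuals from genotypes coded 0/1/2 (copies of the alternative allele), ignoring sites where either genotype is missing (coded as a negative number). The coding and the function name are assumptions; the neighbor-joining step itself is not shown.

```python
import numpy as np


def allele_sharing_distance(genotypes):
    """Pairwise distance = mean |g_a - g_b| / 2 over sites typed in both samples.

    genotypes: individuals x SNPs, coded 0/1/2; negative values mean missing.
    """
    g = np.array(genotypes, dtype=float)      # copy so the caller's data is untouched
    g[g < 0] = np.nan                         # mark missing genotypes
    n = g.shape[0]
    dist = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            diff = np.abs(g[a] - g[b]) / 2.0  # 0, 0.5 or 1 per site
            valid = ~np.isnan(diff)
            dist[a, b] = dist[b, a] = diff[valid].mean() if valid.any() else np.nan
    return dist


if __name__ == "__main__":
    g = [[0, 0, 2, 2], [0, 0, 2, 2], [2, 2, 0, 0], [0, 1, 2, -1]]
    print(np.round(allele_sharing_distance(g), 3))
```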
iii) The PCA results need to be plotted with Excel; they are shown in Fig. 8, a schematic diagram of PCA results based on the population SNPs. In the figure, differently shaped markers represent samples from different subgroups, and each point represents one sample. The horizontal and vertical coordinates of a point are, respectively, that sample's elements in the first and second eigenvectors, and the size of the corresponding eigenvalue reflects the proportion of the total variation explained by that principal component. Comparing this figure with the actual grouping of the samples shows how well the samples are grouped and can indicate whether they should be regrouped into new subgroups.
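The coordinates described above can be reproduced in a few lines of NumPy. This minimal sketch assumes a complete genotype matrix coded 0/1/2 with samples as rows; it centers each SNP, takes a singular value decomposition, and returns the first eigenvectors of the sample covariance matrix (one element per sample, as in the text) together with the fraction of variance each eigenvalue explains. The patent does not name its PCA tool, and the function name is hypothetical.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Return (per-sample eigenvector coordinates, explained-variance fractions)."""
    x = np.array(genotypes, dtype=float)      # rows = samples, columns = SNPs
    x = x[:, x.std(axis=0) > 0]               # drop SNPs monomorphic in this sample set
    x = x - x.mean(axis=0)                    # center each SNP column
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    eigvals = s ** 2 / (x.shape[0] - 1)       # eigenvalues of the covariance matrix
    explained = eigvals / eigvals.sum()
    # Columns of u are eigenvectors of the sample-by-sample matrix x @ x.T.
    return u[:, :n_components], explained[:n_components]


if __name__ == "__main__":
    g = [[0, 0, 1], [0, 0, 0], [0, 1, 0], [2, 2, 1], [2, 2, 2], [2, 1, 2]]
    coords, explained = genotype_pca(g)
    print(np.round(coords, 3), np.round(explained, 3))
```

In the toy input the first three and last three samples form two groups, and the first component separates them while explaining most of the variance.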
III. Detection of sites subject to selection
Combining Embodiment One with the structure of the population SNP data obtained above, the formula is derived as follows:
In the above formula, X_i^j is the frequency of the minor allele (the second base) at SNP site i in subpopulation j; n_i^j is the physical position on the chromosome of SNP site i in subpopulation j; and n_j is the total number of SNP sites in subpopulation j used for the comparative analysis. Based on the population structure results above, the variable j is re-set to 3, and i takes the values of the SNP sites finally retained.
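The quantity X_i^j, the minor-allele frequency at one SNP within each subpopulation, can be computed as in the sketch below. It assumes diploid genotypes written as two-letter strings (e.g. "AG") and defines the minor ("second") base as the less common allele across the whole population; both are assumptions, since the text does not fix the genotype encoding, and the function name is hypothetical.

```python
from collections import Counter


def subpopulation_minor_freqs(genotypes, sample_pop):
    """Return (minor base, {subpopulation: frequency of that base}) for one SNP.

    genotypes: {sample_id: two-letter genotype such as "AG"}
    sample_pop: {sample_id: subpopulation name}
    """
    overall = Counter()
    for gt in genotypes.values():
        overall.update(gt)                      # count allele copies across all samples
    ranked = [base for base, _ in overall.most_common()]
    if len(ranked) < 2:
        return None, {}                         # monomorphic site: no minor allele
    minor = ranked[1]                           # the "second base"
    counts = {}
    for sid, gt in genotypes.items():
        c = counts.setdefault(sample_pop[sid], [0, 0])
        c[0] += gt.count(minor)                 # minor-allele copies
        c[1] += len(gt)                         # total allele copies
    return minor, {pop: k / n for pop, (k, n) in counts.items()}


if __name__ == "__main__":
    gts = {"s1": "AA", "s2": "AG", "s3": "GG", "s4": "AA"}
    pops = {"s1": "P1", "s2": "P1", "s3": "P2", "s4": "P2"}
    print(subpopulation_minor_freqs(gts, pops))  # ('G', {'P1': 0.25, 'P2': 0.5})
```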
Based on the population SNPs, the above calculation is performed by calling several programs to detect sites that may be subject to selection between subpopulations. The program is named SnpSelect.pl, and the methods used are Arlequin, BayeScan and Datacal; the parameters for each program, including threshold settings, are given in Table 4.
Usage: perl SnpSelect.pl <snp.list> <individual> <2population.txt> [options]
where the 2population file contains information on the two subpopulations taking part in the selection-site analysis; its format is described above.
Table 4
Output file
i) Arlequin results, shown in Fig. 9. Fig. 9 shows the results of the Arlequin program detecting sites subject to selection based on the population SNPs. The horizontal axis is the heterozygosity of each site at the population level, and the vertical axis is the difference in heterozygosity at that site between subgroups (Fst). Points in the upper enclosed region are sites under positive selection (q < 0.01 or q < 0.05); points in the lower enclosed region are sites under balancing selection (q < 0.01 or q < 0.05).
ii) Global FST test results, shown in Fig. 10. Fig. 10 shows the results of the Global FST test program detecting sites subject to selection based on the population SNPs. The horizontal axis is the heterozygosity of each site at the population level, and the vertical axis is the difference in heterozygosity at that site between subgroups (Fst). Sites with the top 1% of Fst values are taken as candidate sites; that is, the points above the horizontal line are the detected sites subject to selection.
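The top-1% rule used for the Global FST test can be sketched as a simple ranking. This assumes per-site Fst values have already been computed (for example with an estimator such as Hudson's) and are held in a dictionary keyed by site; the rounding rule (at least one site, rounded up) and the function name are assumptions, and ties at the cutoff are broken arbitrarily.

```python
import math


def top_fraction_sites(fst_by_site, fraction=0.01):
    """Return the set of sites whose Fst falls in the top `fraction` of all sites."""
    if not fst_by_site:
        return set()
    k = max(1, math.ceil(len(fst_by_site) * fraction))   # number of sites to keep
    ranked = sorted(fst_by_site.items(), key=lambda kv: kv[1], reverse=True)
    return {site for site, _ in ranked[:k]}


if __name__ == "__main__":
    fst = {f"site{i}": i / 200 for i in range(200)}       # toy Fst values
    print(sorted(top_fraction_sites(fst)))                # ['site198', 'site199']
```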
iii) BayeScan results, shown in Fig. 11. Fig. 11 shows the results of the BayeScan program detecting sites subject to selection based on the population SNPs. The horizontal axis is the heterozygosity of each site at the population level, and the vertical axis is the base-10 logarithm of the test q-value for that site. Sites with q < 0.1 are taken as candidate sites subject to selection; that is, the points to the right of the vertical line in the figure are the candidates.
Combining Figs. 9 to 11, in the selection-site analysis, sites supported by at least two of the above methods are judged to be the final sites subject to selection.
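The "at least two methods" rule can be expressed as a vote over the candidate sets produced by each program. In this sketch the method names and site IDs are placeholders; each method contributes at most one vote per site, and sites reaching `min_support` votes are kept.

```python
from collections import Counter


def consensus_sites(candidates_by_method, min_support=2):
    """Return sites reported as candidates by at least `min_support` methods."""
    support = Counter()
    for sites in candidates_by_method.values():
        support.update(set(sites))              # one vote per method per site
    return {site for site, n in support.items() if n >= min_support}


if __name__ == "__main__":
    candidates = {
        "Arlequin": {"a", "b", "c"},
        "GlobalFST": {"b", "c", "d"},
        "BayeScan": {"c", "e"},
    }
    print(sorted(consensus_sites(candidates)))  # ['b', 'c']
```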
In the description of this specification, reference to the terms "an embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principle and spirit of the invention; the scope of the invention is defined by the claims and their equivalents.
Claims (11)
1. A method for determining a site subject to selection in a population, the selection including at least one of artificial selection and natural selection, characterized by comprising the following steps:
(1) obtaining nucleic acid sequencing data of a population sample, the population sample coming from multiple individuals of a species; optionally, the population sample comes from the same tissue of multiple individuals of a species or from the same region of multiple individuals of a species; the population sample can be divided into 2n first-level subpopulations according to n pairs of preset indicators, n being a natural number;
(2) detecting, based on the nucleic acid sequencing data in (1), to obtain population SNP data, the population SNP data including SNP data of multiple first-level subpopulations;
(3) comparing, based on the population SNP data in (2), the differences in polymorphism among different first-level subpopulations to determine SNPs subject to selection, the SNPs subject to selection being the sites subject to selection.
2. The method of claim 1, characterized by comprising, before or after carrying out (3):
performing population structure analysis on the population based on the population SNP data to obtain population structure analysis results;
optionally, the population structure analysis includes at least one of phylogenetic tree construction, principal component analysis and STRUCTURE analysis.
3. The method of claim 2, characterized in that the population sample is divided based on the population structure analysis results, and the resulting division replaces the first-level subpopulations.
4. The method of any one of claims 1-3, characterized in that the nucleic acid sequencing data of the population sample consist of the nucleic acid sequencing data of each individual sample making up the population sample, the nucleic acid sequencing data of each individual sample being no less than 4G.
5. The method of any one of claims 1-3, characterized in that the nucleic acid sequencing data are transcriptome sequencing data.
6. The method of any one of claims 1-3, characterized in that the preset indicators relate to geography and/or biological traits.
7. The method of any one of claims 1-3, characterized in that each first-level subpopulation includes at least 10 individuals.
8. The method of any one of claims 1-3, characterized in that the first-level subpopulation includes at least one second-level subpopulation;
optionally, the second-level subpopulation includes at least 10 individuals.
9. The method of any one of claims 1-3, characterized in that comparing, based on the population SNP data, the differences in polymorphism among different first-level subpopulations to determine the SNPs subject to selection includes:
comparing, based on the population SNP data and using at least two test methods, the differences in heterozygosity at the same SNP site among the different first-level subpopulations, and determining SNP sites supported by at least two test methods as SNPs subject to selection;
optionally, the test methods include F-statistics, analysis of molecular variance and multilayer protection.
10. The method of claim 9, characterized in that comparing, using at least two test methods, the differences in heterozygosity at the same SNP site among different first-level subpopulations, and determining SNP sites supported by at least two test methods as SNPs subject to selection, includes:
calculating the heterozygosity difference value of the SNP site among the different first-level subpopulations, and determining SNP sites whose heterozygosity difference value is not less than a threshold as sites subject to selection.
11. A device for determining a site subject to selection in a population, characterized by comprising:
a data input unit for inputting data;
a data output unit for outputting data;
a processor for executing a machine-executable program, execution of the machine-executable program including the method of any one of claims 1-10;
a storage unit, connected to the data input unit, the data output unit and the processor, for storing data, including the machine-executable program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510358145.3A CN106326689A (en) | 2015-06-25 | 2015-06-25 | Method and device for determining site subject to selection in colony |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106326689A true CN106326689A (en) | 2017-01-11 |
Family
ID=57729366
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510358145.3A Pending CN106326689A (en) | 2015-06-25 | 2015-06-25 | Method and device for determining site subject to selection in colony |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106326689A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109486975A (en) * | 2018-12-29 | 2019-03-19 | 深圳华大生命科学研究院 | The molecular typing methods of salmonella typhimurium and the combination of special SNP site |
CN109545278A (en) * | 2018-12-18 | 2019-03-29 | 北京林业大学 | A kind of method of plant identification lncRNA and interaction of genes |
CN111462812A (en) * | 2020-03-11 | 2020-07-28 | 西北大学 | Multi-target phylogenetic tree construction method based on feature hierarchy |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101914628A (en) * | 2010-09-02 | 2010-12-15 | 深圳华大基因科技有限公司 | Method and system for detecting polymorphism locus of genome target region |
CN102952854A (en) * | 2011-08-25 | 2013-03-06 | 深圳华大基因科技有限公司 | Single cell sorting and screening method and device thereof |
CN103898199A (en) * | 2012-12-27 | 2014-07-02 | 上海天昊生物科技有限公司 | High-flux nucleic acid analysis method and application thereof |
CN103911380A (en) * | 2013-02-06 | 2014-07-09 | 深圳华大基因科技有限公司 | EPAS1 gene mutant and application thereof |
- 2015-06-25 CN CN201510358145.3A patent/CN106326689A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109545278A (en) * | 2018-12-18 | 2019-03-29 | 北京林业大学 | A kind of method of plant identification lncRNA and interaction of genes |
CN109486975A (en) * | 2018-12-29 | 2019-03-19 | 深圳华大生命科学研究院 | The molecular typing methods of salmonella typhimurium and the combination of special SNP site |
CN111462812A (en) * | 2020-03-11 | 2020-07-28 | 西北大学 | Multi-target phylogenetic tree construction method based on feature hierarchy |
CN111462812B (en) * | 2020-03-11 | 2023-03-24 | 西北大学 | Multi-target phylogenetic tree construction method based on feature hierarchy |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Sheng et al. | Multi-perspective quality control of Illumina RNA sequencing data analysis | |
Vincent et al. | Next-generation sequencing (NGS) in the microbiological world: How to make the most of your money | |
CN111933218B (en) | Optimized metagenome binding method for analyzing microbial community | |
Guichoux et al. | Outlier loci highlight the direction of introgression in oaks | |
Clark et al. | Performance comparison of exome DNA sequencing technologies | |
Wolf | Principles of transcriptome analysis and gene expression quantification: an RNA‐seq tutorial | |
Emerson et al. | The genetic basis of evolutionary change in gene expression levels | |
Wang et al. | Genomic innovation and regulatory rewiring during evolution of the cotton genus Gossypium | |
Oliveira et al. | High-throughput sequencing for algal systematics | |
Hejase et al. | From summary statistics to gene trees: methods for inferring positive selection | |
Zhou et al. | A chronological atlas of natural selection in the human genome during the past half-million years | |
EP3497241A1 (en) | Ultra-low coverage genome sequencing and uses thereof | |
CN106326689A (en) | Method and device for determining site subject to selection in colony | |
Wallace et al. | Molecular population genetics of inversion breakpoint regions in Drosophila pseudoobscura | |
CN109461473B (en) | Method and device for acquiring concentration of free DNA of fetus | |
US11473133B2 (en) | Methods for validation of microbiome sequence processing and differential abundance analyses via multiple bespoke spike-in mixtures | |
Hollister et al. | Bioinformation and 'omic approaches for characterization of environmental microorganisms | |
Ye et al. | High-resolution metagenomics of human gut microbiota generated by nanopore and illumina hybrid metagenome assembly | |
Goswami et al. | RNA-Seq for revealing the function of the transcriptome | |
Hu et al. | Inferring species compositions of complex fungal communities from long- and short-read sequence data | |
Holtgräwe et al. | A partially phase-separated genome sequence assembly of the Vitis rootstock ‘Börner’ (Vitis riparia × Vitis cinerea) and its exploitation for marker development and targeted mapping | |
Rossato et al. | CRISPR-Cas9-based repeat depletion for high-throughput genotyping of complex plant genomes | |
Yuan et al. | An organism-wide ATAC-seq peak catalog for the bovine and its use to identify regulatory variants | |
Zhang et al. | NyuWa Genome Resource: Deep Whole Genome Sequencing Based Chinese Population Variation Profile and Reference Panel | |
Kielpinski et al. | Reproducible analysis of sequencing-based RNA structure probing data with user-friendly tools |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170111 |