CN103914631A - Comprehensive genomic estimated breeding value (GEBV) method and application on the basis of single nucleotide polymorphism (SNP) chip - Google Patents

Comprehensive genomic estimated breeding value (GEBV) method and application on the basis of single nucleotide polymorphism (SNP) chip

Publication number
CN103914631A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410067189.6A
Other languages
Chinese (zh)
Inventor
丁向东
张勤
李秀金
王胜
张哲
王重龙
黄菊
李乐义
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Agricultural University
Original Assignee
China Agricultural University
Application filed by China Agricultural University
Priority to CN201410067189.6A
Publication of CN103914631A

Abstract

The invention belongs to the technical field of bioinformatics and provides a method for estimating comprehensive genomic breeding values (GEBV) based on a single nucleotide polymorphism (SNP) chip. The method comprises: (1) obtaining data files and pre-processing them to obtain reliable pre-processed data; (2) estimating genomic breeding values from the pre-processed data of step (1), using either the genomic best linear unbiased prediction (GBLUP) method or a Bayesian method, to obtain each individual's genomic breeding value for a single trait; (3) estimating the comprehensive genomic breeding value: step (2) is repeated to obtain each individual's genomic breeding values for multiple traits, from which the comprehensive genomic breeding value is calculated. The method integrates SNP chip information, pedigree information and phenotype information to support selection decisions in animal breeding; it can promote the application of genomic selection in livestock breeding and make better use of its advantages.

Description

A comprehensive genomic breeding value estimation method based on an SNP chip, and its application
Technical field
The invention belongs to the technical field of bioinformatics, and specifically relates to a comprehensive genomic breeding value estimation method based on an SNP chip and its application.
Background
The wide application of genetic theory and computers is an essential feature of modern animal breeding. Since the 1980s, selection and planned mating of livestock and poultry based on breeding values have been the main method of animal breeding, and breeding value estimation has become a core topic of animal genetics. In essence, breeding value estimation uses the performance records of the individual itself and/or of its relatives, suitably weighted, to improve the accuracy of selection.
With the development of molecular biology and computer technology, large numbers of molecular markers have been discovered, and breeders have begun to incorporate genomic information into the genetic evaluation of livestock and poultry. This enables early selection of individuals, shortens the generation interval and reduces breeding costs, and also improves selection for low-heritability traits and threshold traits. At present, two kinds of model use genomic information: marker-assisted selection (MAS) and genomic selection (GS). MAS uses part of the genomic information as a supplement to polygenic information. GS extends MAS: it uses information from the whole genome to genetically evaluate individuals, thereby dissecting the true breeding value.
Genomic selection addresses the problems faced by marker-assisted selection. Its principle is to use genome-wide high-density marker map information together with phenotype information to estimate the effect of each marker or chromosome segment, and then to sum all effects to obtain the genomic estimated breeding value. Its basic assumption is that each QTL affecting a quantitative trait is in linkage disequilibrium (LD) with at least one marker of the genome-wide high-density marker map. Genomic selection can therefore capture all QTL affecting the trait, overcoming the shortcoming of conventional marker-assisted selection that the markers explain only a small part of the genetic variance, and so predict breeding values accurately.
Genomic selection is now widely used in molecular breeding of livestock and poultry around the world, especially in dairy cattle, pig and chicken breeding, and China has carried out preliminary research and application in this field. Unlike conventional genetic evaluation, which relies on pedigree information alone, genomic selection involves processing chip data, estimating individual genomic breeding values, and calculating a comprehensive breeding value over multiple traits. It uses more information and requires more computation, so an efficient platform is needed to integrate molecular data, phenotypic data and pedigree information and to calculate genomic breeding values, so as to automate and systematize the molecular breeding of livestock and poultry.
Summary of the invention
Breeding value: the value of a breeding animal for breeding purposes; in quantitative genetics, the sum of the additive effects of the genes determining a quantitative trait is called the individual's breeding value for that trait.
Genomic breeding value: the breeding value obtained by summing the SNP effects across an individual's whole genome.
Comprehensive (aggregate) breeding value: the breeding values of multiple traits combined by weighting according to the traits' relative importance in breeding and in economic terms; it can be expressed as $\sum_i w_i a_i$, where $a_i$ is the individual's breeding value for trait $i$ and $w_i$ is the weight of trait $i$.
Reference population: the individuals in this population have both SNP chip genotype information and phenotypic data, so genome-wide SNP marker effects can be estimated from the reference population and then used to predict the genomic breeding values of individuals in the candidate population.
Candidate population: consists of individuals that have SNP chip genotype information only.
To address the deficiencies of the prior art, the object of the invention is to provide a comprehensive genomic breeding value estimation method based on an SNP chip, and its application.
To achieve this object, the invention provides a comprehensive genomic breeding value estimation method based on an SNP chip, comprising the following steps:
Step S1: obtain the data files and pre-process them to obtain reliable pre-processed data;
Step S2: estimate genomic breeding values from the pre-processed data obtained in step S1, using either the GBLUP method or a Bayesian method, to obtain each individual's genomic breeding value for a single trait;
Step S3: estimate the comprehensive genomic breeding value: repeat step S2 to obtain each individual's genomic breeding values for multiple traits, and calculate the comprehensive genomic breeding value.
Preferably, step S1 comprises:
Step S11: obtain and pre-process the SNP chip data, including reading the SNP chip data and imputing missing genotypes;
Step S12: obtain and pre-process the pedigree file, tracing the parental pedigree back 5-10 generations;
Step S13: obtain and pre-process the phenotypic data, selecting the phenotypic values of the individuals in the pedigree file of step S12.
More preferably, step S1 specifically comprises:
Step S11: obtain the SNP chip data and store the files in a compressed format to save disk space; missing genotypes are imputed with the Beagle program, which fills in the SNP markers or individuals with missing values in the chip genotypes and so improves the quality of the chip genotype data;
Step S12: based on the individuals in the SNP chip data of step S11, select from the pedigree file the individuals that have SNP chip information and trace their parental pedigree back 5-10 generations. Using the SNP information, perform parentage verification on the parent-offspring relationships recorded in the pedigree file and adjust the original pedigree file accordingly: where a recorded parent-offspring relationship is inconsistent with the parentage verification result, the pedigree file is rearranged according to that result. The pedigree file contains three fields: individual ID, dam ID and sire ID.
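The pedigree tracing in step S12 can be sketched as a breadth-first walk up the pedigree. This is only an illustration under stated assumptions: the function name `trace_pedigree`, the dictionary layout (ID mapped to a `(sire, dam)` pair, `None` for an unknown parent) and the default of 10 generations are hypothetical choices, not details taken from the patent's Fortran implementation.

```python
def trace_pedigree(pedigree, genotyped, max_generations=10):
    """Return the genotyped individuals plus their ancestors.

    pedigree: dict mapping individual ID -> (sire ID, dam ID); None = unknown.
    genotyped: iterable of IDs that have SNP chip data.
    max_generations: how many generations to trace back (5-10 in the text).
    """
    selected = set(genotyped)
    frontier = set(genotyped)
    for _ in range(max_generations):
        parents = set()
        for ind in frontier:
            for parent in pedigree.get(ind, (None, None)):
                # Only expand parents that have not been reached already.
                if parent is not None and parent not in selected:
                    parents.add(parent)
        if not parents:
            break  # no further ancestors are recorded
        selected |= parents
        frontier = parents
    return selected
```

Individuals returned by this function would then be written to the reduced pedigree file used for building the relationship matrix.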
Preferably, in step S11, imputation of missing genotypes is followed by genotype quality control.
Preferably, the quality control parameters are the call rate and minor allele frequency of each SNP marker, the Hardy-Weinberg equilibrium test, and the call rate of each individual.
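A minimal sketch of these quality control filters is given below. It assumes genotypes are coded as allele counts 0/1/2 with `None` for a missing call; the function names and the call-rate and chi-square thresholds are hypothetical, and only the MAF threshold of 0.01 is a value that appears in this document (Embodiment 1).

```python
def individual_call_rate(row):
    """Fraction of SNPs with a non-missing genotype for one individual."""
    return sum(g is not None for g in row) / len(row)


def hwe_chi2(calls):
    """Chi-square statistic (1 df) for Hardy-Weinberg equilibrium at one SNP."""
    n = len(calls)
    p = sum(calls) / (2 * n)  # frequency of the counted allele
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    observed = [calls.count(0), calls.count(1), calls.count(2)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def snp_qc(genotypes, min_call_rate=0.90, min_maf=0.01, max_hwe_chi2=None):
    """Return the column indices of SNPs that pass all filters.

    genotypes: list of individuals, each a list of 0/1/2 codes or None.
    """
    keep = []
    for j in range(len(genotypes[0])):
        calls = [row[j] for row in genotypes if row[j] is not None]
        if not calls or len(calls) / len(genotypes) < min_call_rate:
            continue  # SNP call rate too low
        p = sum(calls) / (2 * len(calls))
        if min(p, 1 - p) < min_maf:
            continue  # minor allele frequency too low
        if max_hwe_chi2 is not None and hwe_chi2(calls) > max_hwe_chi2:
            continue  # departs from Hardy-Weinberg equilibrium
        keep.append(j)
    return keep
```

Individuals can be filtered in the same way by comparing `individual_call_rate` with a chosen threshold before the SNP filters are applied.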
Preferably, in step S2:
When the GBLUP method is used to estimate genomic breeding values, the data may be (a) genomic information only, (b) genomic information together with the pedigree information of the SNP-genotyped individuals only, or (c) genomic information together with pedigree information;
When a Bayesian method is used to estimate genomic breeding values, the data are genomic information only, and the effect of each SNP marker on the chip is estimated with a Markov chain Monte Carlo algorithm.
Preferably, when GBLUP is applied to genomic information only, a molecular-information-based relationship matrix between the individuals of the reference and candidate populations (the G matrix) is built from the SNP chip data file and the phenotypic data file and inverted, and the genomic breeding values of the candidate population are obtained by solving the mixed model equations;
When GBLUP is applied to genomic information and residual polygenic information, the molecular-information-based G matrix and the pedigree-based genetic relationship matrix (the A matrix) between the individuals of the reference and candidate populations are both built and inverted, and the genomic breeding values of the candidate population are obtained by solving the mixed model equations;
When GBLUP is applied to genomic information and pedigree information, individuals that have phenotypic data but no genotype information are added to the reference population, enlarging it; the relationship matrix of the reference and candidate populations (the H matrix) is built and inverted, and the genomic breeding values of the candidate population are obtained by solving the mixed model equations.
Preferably, in step S3, the comprehensive genomic breeding value is estimated in one of two ways:
(1) Without the pedigree index (the mean of the parents' breeding values): the genomic breeding values of the traits are weighted by each trait's economic weight to give the comprehensive genomic breeding value, which is used for selecting individuals;
or (2) with the pedigree index: for each trait, the individual's pedigree index and its genomic breeding value are first combined by weighting into a new value, which serves as the final genomic breeding value of that trait; the weights are the reliabilities of the individual's pedigree index and genomic breeding value, respectively. Once the new genomic breeding values of all traits have been obtained, the comprehensive genomic breeding value is calculated as in (1).
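Option (2) can be illustrated with a short sketch. The text only says that the two reliabilities are used as merging weights; normalizing the weights so that they sum to 1 is an assumption made here, and the function names are hypothetical.

```python
def pedigree_index(sire_ebv, dam_ebv):
    """Pedigree index: the mean of the parents' breeding values."""
    return 0.5 * (sire_ebv + dam_ebv)


def blend_with_pedigree_index(gebv, rel_gebv, pi, rel_pi):
    """Combine a GEBV and a pedigree index for one trait.

    Each value is weighted by its reliability; dividing by the sum of the
    reliabilities (an assumption, not stated in the text) keeps the result
    on the same scale as the inputs.
    """
    return (rel_gebv * gebv + rel_pi * pi) / (rel_gebv + rel_pi)
```

The blended value of each trait would then replace the plain GEBV before the economic weights of option (1) are applied.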
Another object of the invention is to provide the application of the SNP-chip-based comprehensive genomic breeding value estimation method in animal breeding.
Beneficial effects of the invention:
1. The method integrates SNP chip information, pedigree information and phenotype information, so that selection and retention decisions for breeding animals can be made from their SNP information, automating and systematizing animal molecular breeding;
2. The method is implemented in the scientific programming language Fortran and supports multithreading, which speeds up computation and shortens run time; it runs on both Linux and Windows;
3. The method can promote the application of genomic selection in livestock breeding and make better use of its advantages in animal breeding.
Brief description of the drawings
Fig. 1: flow chart of the method of the invention in Embodiment 1;
Fig. 2: flow chart of relationship matrix construction for the GBLUP-type methods in Embodiment 1;
Fig. 3: flow chart of the method of the invention in Embodiment 3.
Detailed description of the embodiments
The following embodiments illustrate the invention but do not limit its scope.
Embodiment 1
The experimental data comprise 5439 Chinese Holstein cows born between 2004 and 2012. All cows were genotyped with the Illumina 50K SNP chip (54001 SNP markers), and conventional breeding values were estimated for five milk production traits: milk yield, milk protein yield, protein percentage, milk fat yield and fat percentage. Referring to Fig. 1, the method comprises the following steps:
Step S1: obtain the data files and edit and pre-process them to obtain reliable pre-processed data;
Step S11: read the SNP chip data file and first impute missing genotypes with the Beagle program. A minor allele frequency (MAF) of 0.01 is then used as the quality control criterion: SNPs with MAF below 0.01 or with unknown chromosomal position are removed, leaving 47160 SNPs for analysis;
Step S12: trace the pedigree of the 5439 cows back 10 generations, giving 130852 individuals, which are used to build the A matrix;
Step S13: take the breeding values of the 130852 individuals of step S12 for the five milk production traits as phenotypes, and generate the final phenotypic data file.
Step S2: estimate genomic breeding values from the pre-processed data of step S1 using the GBLUP method, obtaining each individual's genomic breeding value for a single trait.
Reference and candidate populations: the 5439 cows are divided into two groups. The 4455 cows born before 2008 form the reference population, and the 984 cows born in or after 2008 form the candidate population.
The GBLUP method using genomic information and pedigree information is selected; as shown in Fig. 2, it comprises the following steps:
1. Build the G matrix from the 5439 genotyped cows and invert it.
The G matrix is built from the SNP information in the marker file:
$$G = \frac{ZZ'}{2\sum_i p_i(1-p_i)}$$
The two alleles at each SNP locus are coded 1 and 2, and $p_i$ is the frequency of allele 2 at the $i$-th SNP (calculated from the sample). The number of rows of $Z$ is the number of individuals and the number of columns is the number of SNP loci used. Each element of $Z$ is $0 - 2p_i$ for homozygote 11, $2 - 2p_i$ for homozygote 22, and $1 - 2p_i$ for heterozygote 12 or 21.
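The construction of G described above can be sketched in a few lines of plain Python. The function name `build_g_matrix` and the input layout (one list per individual holding the number of copies of allele 2 at each SNP, with no missing values) are assumptions made for this illustration; a production implementation would use an optimized linear algebra library.

```python
def build_g_matrix(genotypes):
    """Genomic relationship matrix G = ZZ' / (2 * sum_i p_i (1 - p_i)).

    genotypes: n x m list; entry = copies of allele 2 (0 for homozygote 11,
    1 for a heterozygote, 2 for homozygote 22).
    """
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of allele 2 at each SNP, computed from the sample.
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Centered genotype matrix Z: elements 0-2p, 1-2p or 2-2p.
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [
        [sum(z[a][k] * z[b][k] for k in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]
```

The result is symmetric, so in practice only one triangle needs to be computed and stored.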
2. Build the A matrix from the 130852 individuals traced through the pedigree in step S12, and invert it. Of these, 22106 cows that were not genotyped are selected and added to the reference population.
The genetic relationship matrix A is built from the pedigree information. Each element of A is calculated recursively from the relationships of the parents,
where $s_i$ ($s_j$) and $d_i$ ($d_j$) are the sire and dam of individual $i$ ($j$).
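The recursion formula itself is not reproduced in this text. The sketch below uses the standard tabular method, $a_{ii} = 1 + \tfrac{1}{2}a_{s_i d_i}$ and $a_{ij} = \tfrac{1}{2}(a_{j s_i} + a_{j d_i})$ for $j < i$, on the assumption that this is the recursion intended. It also assumes that individuals are numbered 1..n with parents listed before offspring and that 0 denotes an unknown parent; the function name is hypothetical.

```python
def build_a_matrix(pedigree):
    """Numerator relationship matrix A by the tabular method.

    pedigree: list of (sire, dam) pairs for individuals 1..n, in order;
    0 means the parent is unknown; parents must precede their offspring.
    """
    n = len(pedigree)
    # Row and column 0 stand for the unknown parent and stay at zero.
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s, d = pedigree[i - 1]
        # Diagonal: 1 plus the inbreeding coefficient of individual i.
        a[i][i] = 1.0 + 0.5 * a[s][d]
        for j in range(1, i):
            # Relationship with an older individual j: the average of the
            # relationships between j and the two parents of i.
            a[i][j] = a[j][i] = 0.5 * (a[j][s] + a[j][d])
    return [row[1:] for row in a[1:]]
```

For a pedigree of 130852 animals the dense matrix would be very large, so a real implementation would more likely build the inverse of A directly with sparse rules; this sketch is only meant to make the recursion concrete.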
The pedigree file lists all individuals in three columns: individual ID, sire ID and dam ID. The data file must satisfy the following:
A: the individual column must include every individual that appears in the sire or dam column;
B: no offspring may appear in the individual column before its parents; in general the file can be sorted by date of birth, earliest first;
C: individuals are numbered consecutively with natural numbers starting from 1.
3. From $A^{-1}$ and $G^{-1}$, calculate the H matrix, which covers both non-genotyped and genotyped individuals, and invert it.
This is one-step blending: individuals without genotype information are included at the same time, so that pedigree information and genomic information are used jointly. The new matrix is called the H matrix:
$$H = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix} = \begin{bmatrix} A_{11} + A_{12}A_{22}^{-1}(G - A_{22})A_{22}^{-1}A_{21} & A_{12}A_{22}^{-1}G \\ GA_{22}^{-1}A_{21} & G \end{bmatrix}$$
Solving the mixed model equations requires the inverse of H, and building H first and then inverting it is cumbersome. That step can be skipped by computing the inverse directly:
$$H^{-1} = A^{-1} + \begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}$$
The $G^{-1}$ in the $H^{-1}$ formula can be obtained in two ways: one uses the G matrix, the other uses $G_a$.
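The direct formula for the inverse of H amounts to adding a correction to one block of the inverse of A. The sketch below assumes that the three inverses have already been computed and that individuals are ordered with the non-genotyped block first and the genotyped block last, matching the partition in which $H_{22} = G$; the function name is hypothetical.

```python
def build_h_inverse(a_inv, g_inv, a22_inv):
    """H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]].

    a_inv:   inverse of the full A matrix (n x n), non-genotyped animals first.
    g_inv:   inverse of G for the genotyped animals (n2 x n2).
    a22_inv: inverse of the genotyped block of A (n2 x n2).
    """
    n, n2 = len(a_inv), len(g_inv)
    offset = n - n2  # index of the first genotyped animal
    h_inv = [row[:] for row in a_inv]  # copy so the input is not modified
    for i in range(n2):
        for j in range(n2):
            h_inv[offset + i][offset + j] += g_inv[i][j] - a22_inv[i][j]
    return h_inv
```

When G equals the genotyped block of A, the correction vanishes and the result is simply the inverse of A, which is a convenient sanity check.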
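4. Set up the mixed model equations and estimate the genomic breeding values of the candidate population for the five milk production traits. The mixed model equations take the following form:
(1) if $e \sim N(0, I\sigma_e^2)$:
$$\begin{bmatrix} 1'1 & 1'Z \\ Z'1 & Z'Z + A^{-1}k \end{bmatrix} \begin{bmatrix} \mu \\ \alpha \end{bmatrix} = \begin{bmatrix} 1'y \\ Z'y \end{bmatrix}$$
(2) if $e \sim N(0, R\sigma_e^2)$:
$$\begin{bmatrix} 1'R^{-1}1 & 1'R^{-1}Z \\ Z'R^{-1}1 & Z'R^{-1}Z + A^{-1}k \end{bmatrix} \begin{bmatrix} \mu \\ \alpha \end{bmatrix} = \begin{bmatrix} 1'R^{-1}y \\ Z'R^{-1}y \end{bmatrix}$$
The individual genomic breeding values can be obtained by solving these equations iteratively.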
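As an illustration of solving case (1) iteratively, the sketch below assembles the equations and applies Gauss-Seidel iteration. The choice of Gauss-Seidel, the function name and the convergence settings are assumptions, since the text only says the solution is iterative. The relationship inverse and the scalar `k` are passed in as given.

```python
def solve_mme(y, z, rel_inv, k, n_iter=2000, tol=1e-10):
    """Solve [[1'1, 1'Z], [Z'1, Z'Z + rel_inv*k]] [mu, alpha]' = [1'y, Z'y]'.

    y: n phenotypes; z: n x q design matrix; rel_inv: q x q inverse
    relationship matrix; k: scalar multiplying rel_inv.
    Returns (mu, alpha).
    """
    n, q = len(y), len(rel_inv)
    size = q + 1
    lhs = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    lhs[0][0] = float(n)
    rhs[0] = sum(y)
    for c in range(q):
        lhs[0][c + 1] = lhs[c + 1][0] = sum(z[r][c] for r in range(n))
        rhs[c + 1] = sum(z[r][c] * y[r] for r in range(n))
        for d in range(q):
            ztz = sum(z[r][c] * z[r][d] for r in range(n))
            lhs[c + 1][d + 1] = ztz + rel_inv[c][d] * k
    sol = [0.0] * size
    for _ in range(n_iter):
        max_change = 0.0
        for i in range(size):
            # Gauss-Seidel update: solve equation i for unknown i, using
            # the most recent values of all other unknowns.
            others = sum(lhs[i][j] * sol[j] for j in range(size) if j != i)
            new = (rhs[i] - others) / lhs[i][i]
            max_change = max(max_change, abs(new - sol[i]))
            sol[i] = new
        if max_change < tol:
            break
    return sol[0], sol[1:]
```

Gauss-Seidel converges here because the coefficient matrix is symmetric positive definite whenever `k` is positive and the relationship inverse is positive definite.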
Step S3: estimate the comprehensive genomic breeding value.
Because milk fat yield and milk protein yield are driven mainly by milk yield, fat percentage and protein percentage, the comprehensive breeding value for milk production is based on three traits: milk yield, fat percentage and protein percentage. Milk yield reflects a cow's milking capacity, while fat percentage and protein percentage reflect milk quality; together the three traits evaluate a cow's merit for milk production. For example, suppose a cow's genomic breeding values (GEBV) for milk yield, fat percentage and protein percentage are 114.7737, 0.06257 and 0.09157. Ignoring the cow's pedigree index, and with trait weights (set according to economic importance) of 0.09, 93.75 and 312.50, the comprehensive genomic breeding value for milk production is 114.7737 × 0.09 + 0.06257 × 93.75 + 0.09157 × 312.50 = 44.81. The comprehensive breeding values of other individuals are obtained in the same way and can be used for selection.
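The worked example can be reproduced with a weighted sum. The numbers below are the GEBVs and weights quoted in the text; the function and dictionary key names are hypothetical.

```python
def comprehensive_gebv(gebvs, weights):
    """Weighted sum of single-trait GEBVs using the traits' economic weights."""
    return sum(weights[trait] * value for trait, value in gebvs.items())


# Values from the example cow in Embodiment 1.
cow_gebvs = {"milk_yield": 114.7737, "fat_pct": 0.06257, "protein_pct": 0.09157}
trait_weights = {"milk_yield": 0.09, "fat_pct": 93.75, "protein_pct": 312.50}

cow_total = comprehensive_gebv(cow_gebvs, trait_weights)
print(round(cow_total, 2))  # 44.81
```

Applying the same function to every candidate gives a single ranking criterion across the three traits.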
Embodiment 2: accuracy of genomic breeding value estimation
To test the practical performance of the method, the accuracy of the genomic breeding values of the 984 candidate cows of Embodiment 1 for the five milk production traits (the correlation between an individual's conventional breeding value and its genomic breeding value) is used as the criterion; the higher the accuracy, the better the method performs.
Table 1. Comparison of genomic breeding value estimation accuracy
As Table 1 shows, compared with the traditional pedigree index based on parental information (the mean of the parents' breeding values), the accuracy of the genomic breeding value is 13%-30% higher, so selecting individuals on genomic breeding values is considerably more accurate. Moreover, the genomic breeding value of a candidate can be obtained from SNP chip genotype information alone, which only requires DNA extracted from any tissue of the individual and can be produced early in the animal's life with high-throughput genotyping. Genomic selection can therefore be used for early selection of animals, with higher accuracy than selection on parental information alone.
Embodiment 3: accuracy of genomic breeding values on simulated data
The simulated data come from the 2011 QTL-MAS Workshop, which used the LDSO software to simulate an outbred population: 1000 generations of a historical population with 1000 individuals per generation, followed by 30 generations of a recent population with 150 individuals per generation. The analysis uses the data of the last generation: 20 sire families, each sire mated to 10 dams and each dam producing 15 offspring. There are 9990 SNP markers evenly distributed over 5 chromosomes; each chromosome is 1 Morgan long and carries 1998 evenly spaced SNP markers.
The simulated data provide a pedigree file, an SNP marker file and a phenotype file. In the phenotype file, phenotypic values are simulated for 10 randomly chosen individuals out of each full-sib family of 15 offspring; the other 5 individuals are used for validation. The true breeding values of all 3000 offspring are also provided.
The method is applied to these simulated data with the same steps as in Embodiment 1, except that in step S2 genomic breeding values are estimated from the pre-processed data of step S1 not only with the GBLUP method but also with Bayesian methods, giving each individual's genomic breeding value for a single trait. Three Bayesian methods are used:
1. Estimating genomic breeding values with the BayesA method
Prior assumptions of the BayesA method:
(1) the effect $g_k$ of each SNP has its own effect variance $\sigma_{g_k}^2$;
(2) the SNP effect variance $\sigma_{g_k}^2$ follows a scaled inverse chi-square distribution with $v$ degrees of freedom and scale parameter $S$;
(3) the residual variance $\sigma_e^2$ follows a scaled inverse chi-square distribution with $-2$ degrees of freedom and scale parameter 0;
(4) $P(b) \propto$ constant: the fixed effects and the population mean follow uniform distributions.
The likelihood function is
$$P(y \mid g, b, \sigma_e^2) \propto (\sigma_e^2)^{-\frac{n}{2}} \exp\left(-\frac{1}{2\sigma_e^2}\sum_{i=1}^{n}\Big(y_i - \sum_{j=1}^{f} x_{ij}b_j - \sum_{k=1}^{q} z_{ik}g_k\Big)^2\right),$$
so the joint posterior distribution is
$$P(g, b, \sigma_g^2, \sigma_e^2 \mid y) \propto P(y \mid g, b, \sigma_e^2)\,P(g, b, \sigma_g^2, \sigma_e^2) = P(y \mid g, b, \sigma_e^2)\,P(g \mid \sigma_g^2)\,P(\sigma_g^2)\,P(b)\,P(\sigma_e^2)$$
From this the full conditional posterior distribution of each variable is derived. Gibbs sampling from these full conditional posterior distributions then gives the estimate of each variable.
The Gibbs sampling procedure:
Step 1: initialization
Assign initial values to all unknown parameters: $\theta^{(0)} = [b_1^{(0)}, \ldots, b_f^{(0)}, g_1^{(0)}, \ldots, g_q^{(0)}, \sigma_{g_1}^{2(0)}, \ldots, \sigma_{g_q}^{2(0)}, \sigma_e^{2(0)}]$.
Step 2: update $\sigma_e^2$
Sample from its full conditional posterior distribution $\chi^{-2}\big(n-2,\ (y-Xb-Zg)'(y-Xb-Zg)/(n-2)\big)$.
Step 3: update $b_j$
Sample from the full conditional posterior distribution of $b_j$:
$$N\Big((X_j'X_j)^{-1}(X_j'y - X_j'X_{-j}b_{-j} - X_j'Zg),\ (X_j'X_j)^{-1}\sigma_e^2\Big)$$
Step 4: update $\sigma_{g_k}^2$
Sample from its full conditional posterior distribution $\chi^{-2}\big(v+1,\ (vS + g_k'g_k)/(v+1)\big)$.
Step 5: update $g_k$
Sample from the full conditional posterior distribution of $g_k$:
$$N\Big((z_k'z_k + \sigma_e^2/\sigma_{g_k}^2)^{-1}(z_k'y - z_k'Xb - z_k'Z_{-k}g_{-k}),\ (z_k'z_k + \sigma_e^2/\sigma_{g_k}^2)^{-1}\sigma_e^2\Big)$$
Step 6: repeat steps 2-5
until the chain converges to its stationary distribution and enough samples have been obtained.
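The sampler can be sketched in plain Python as follows. This is a simplified illustration, not the patent's Fortran implementation: the only fixed effect is an overall mean, the hyperparameter defaults `v` and `s` and the chain lengths are hypothetical, and convergence is not monitored. A scaled inverse chi-square draw with `df` degrees of freedom and scale `scale` is generated as `df * scale` divided by a chi-square variate.

```python
import random


def inv_chi2(df, scale, rng):
    """Draw from a scaled inverse chi-square distribution."""
    return df * scale / rng.gammavariate(df / 2.0, 2.0)


def bayes_a(y, z, v=4.2, s=0.04, n_iter=2000, burn_in=500, seed=1):
    """Gibbs sampler for BayesA with an overall mean as the only fixed effect.

    y: n phenotypes; z: n x q genotype matrix. Returns posterior mean SNP effects.
    """
    rng = random.Random(seed)
    n, q = len(y), len(z[0])
    mu = sum(y) / n                      # step 1: initial values
    g = [0.0] * q
    var_g = [s] * q
    zz = [sum(z[i][k] ** 2 for i in range(n)) for k in range(q)]
    resid = [y[i] - mu for i in range(n)]  # y - 1*mu - Z*g
    g_sum, kept = [0.0] * q, 0
    for it in range(n_iter):
        # Step 2: residual variance from chi^-2(n-2, e'e/(n-2)).
        sse = sum(r * r for r in resid)
        var_e = inv_chi2(n - 2, sse / (n - 2), rng)
        # Step 3: overall mean from its normal full conditional.
        resid = [r + mu for r in resid]
        mu = rng.gauss(sum(resid) / n, (var_e / n) ** 0.5)
        resid = [r - mu for r in resid]
        for k in range(q):
            # Step 4: effect variance of SNP k from chi^-2(v+1, (vS+g_k^2)/(v+1)).
            var_g[k] = inv_chi2(v + 1, (v * s + g[k] ** 2) / (v + 1), rng)
            # Step 5: effect of SNP k, conditional on all other effects.
            c = zz[k] + var_e / var_g[k]
            rhs = sum(z[i][k] * (resid[i] + z[i][k] * g[k]) for i in range(n))
            new_g = rng.gauss(rhs / c, (var_e / c) ** 0.5)
            for i in range(n):
                resid[i] += z[i][k] * (g[k] - new_g)
            g[k] = new_g
        if it >= burn_in:                # step 6: keep post-burn-in samples
            kept += 1
            for k in range(q):
                g_sum[k] += g[k]
    return [total / kept for total in g_sum]
```

An individual's genomic breeding value is then the sum of its genotype codes multiplied by the posterior mean effects. Keeping a running residual vector avoids recomputing the full model fit for every SNP update.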
2. Estimating genomic breeding values with the BayesB method
The key difference between BayesB and BayesA lies in the assumption about SNP effects. BayesA assumes that every SNP has an effect and that the effect variances differ between SNPs. BayesB assumes that most SNPs have no effect and that only a few SNPs have an effect, each with its own effect variance. The prior for $\sigma_{g_k}^2$ is: $\sigma_{g_k}^2 = 0$ with probability $\pi$, and $\sigma_{g_k}^2 \sim \chi^{-2}(v, S)$ with probability $1-\pi$. The value of $\pi$ is set by the user and is the proportion of SNP markers with zero effect; it is usually set to 0.95. MCMC sampling for BayesB is identical to the Gibbs sampling of BayesA, except that $\sigma_{g_k}^2$ is sampled with a Metropolis-Hastings (MH) step rather than drawn directly from an inverse chi-square distribution.
3. Estimating genomic breeding values with the BayesCpi method
BayesCpi was proposed to address shortcomings of BayesA and BayesB. It differs from BayesB in two notable ways. First, BayesB fixes the value of $\pi$ in advance, whereas BayesCpi treats $\pi$ as an unknown parameter inferred from the data and the prior information. Second, regarding the prior for SNP effect variances: for the SNPs with an effect (probability $1-\pi$), BayesB assumes that each has its own effect variance, whereas BayesCpi assumes that they share a common effect variance. In theory BayesCpi is superior, but in various situations BayesB and BayesA perform almost as well.
The genomic breeding values of the 1000 offspring predicted by GBLUP and the Bayesian methods are taken, and the correlation coefficient between the genomic breeding values and the true breeding values is used as the accuracy: the higher the correlation, the more accurate the estimates. The regression between genomic and true breeding values is also computed, and the regression coefficient measures the unbiasedness of the estimates: the closer it is to 1.0, the less biased the genomic breeding values.
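Both criteria can be computed from paired lists of predicted and true values. The sketch below regresses the true breeding values on the genomic breeding values, which is a common convention but an assumption here, since the direction of the regression is not spelled out; the function name is hypothetical.

```python
def accuracy_and_unbiasedness(gebv, tbv):
    """Return (correlation, regression slope of tbv on gebv)."""
    n = len(gebv)
    mean_g = sum(gebv) / n
    mean_t = sum(tbv) / n
    s_gg = sum((g - mean_g) ** 2 for g in gebv)
    s_tt = sum((t - mean_t) ** 2 for t in tbv)
    s_gt = sum((g - mean_g) * (t - mean_t) for g, t in zip(gebv, tbv))
    correlation = s_gt / (s_gg * s_tt) ** 0.5
    slope = s_gt / s_gg  # assumed direction: true value regressed on GEBV
    return correlation, slope
```

A slope below 1.0 under this convention would indicate that the predictions are too widely spread, and a slope above 1.0 that they are too compressed.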
Table 2. Accuracy and unbiasedness of genomic breeding value estimates for the 1000 offspring
Table 2 gives the accuracy and unbiasedness of the genomic breeding values estimated with the three Bayesian methods (BayesA, BayesB, BayesCpi) and the GBLUP method. The accuracy of all three Bayesian methods exceeds 0.91; all three are suitable for these data, and their accuracy and unbiasedness are nearly identical. Table 2 also shows that the genomic breeding values estimated by the three Bayesian methods are essentially the true breeding values, with accuracy well above the 0.65 of conventional breeding value estimation. GBLUP is slightly lower because it uses genomic information only; when pedigree information is also taken into account, its accuracy rises to 0.956, the best of all. The unbiasedness of every method is close to 1.0, showing that the method hardly over- or underestimates genomic breeding values relative to the true breeding values. This demonstrates the reliability of the method.
Embodiment 4: application
The Chinese Holstein is the main dairy breed in China and is raised throughout the country; most of the milk and milk powder consumed daily comes from this breed, so selecting high-quality, high-yielding cows is the core of dairy cattle breeding in China. Traditional breeding schemes are costly, however, which limits the returns from dairy breeding. As SNP-chip-based genomic selection became a research focus in international dairy cattle breeding, China began genomic selection of Chinese Holstein cattle in 2008. By the end of 2013, the Chinese Holstein reference population for genomic selection consisted of 6000 cows and 400 bulls. Since 2012, 1224 young bulls from 28 bull stations across the country have been genotyped with a genome-wide high-density chip and genetically evaluated with the method of the invention. Genomic breeding values were estimated for three milk production traits (milk yield, protein percentage and fat percentage) and for conformation (type), mammary system, feet and legs, and somatic cell score, and combined according to the following formula:
$$GPI_1 = 20 \times \left[30 \times \frac{Milk}{459} + 15 \times \frac{Fatpct}{0.16} + 25 \times \frac{Propct}{0.08} + 5 \times \frac{Type}{5} + 10 \times \frac{MS}{5} + 5 \times \frac{FL}{5} - 10 \times \frac{SCS - 3}{0.16}\right]$$
where Milk is the milk yield GEBV (genomic breeding value) and 30/459 is the weight of that trait (calculated from its economic importance; likewise for the other traits), Fatpct is the fat percentage GEBV, Propct is the protein percentage GEBV, Type is the conformation GEBV, MS is the mammary system GEBV, FL is the feet-and-legs GEBV, and SCS is the somatic cell score GEBV. This gives the comprehensive genomic breeding value of Chinese Holstein bulls, i.e. the genomic performance index (CPI), which serves as the final criterion for selecting elite bulls. In 2012 and 2013, 491 young bulls were selected on the basis of the genomic performance index of the 1224 young bulls to take part in the national improved-breed subsidy program. Frozen semen from the elite bulls can be sold nationwide, accelerating the spread of superior genes, raising the overall level of China's dairy industry and supporting its healthy and orderly development.
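The index can be written as a small function. This sketch assumes that the leading factor 20 multiplies the whole sum, that each trait term is its weight times the GEBV divided by the trait's scaling constant (as the stated weight 30/459 for milk yield suggests), and that the feet-and-legs term uses the FL GEBV; the function and argument names are hypothetical.

```python
def genomic_performance_index(milk, fatpct, propct, type_, ms, fl, scs):
    """Index combining seven trait GEBVs with the weights quoted in the text."""
    return 20 * (
        30 * milk / 459
        + 15 * fatpct / 0.16
        + 25 * propct / 0.08
        + 5 * type_ / 5
        + 10 * ms / 5
        + 5 * fl / 5
        - 10 * (scs - 3) / 0.16   # a higher somatic cell score lowers the index
    )
```

For instance, a bull whose only non-zero GEBV is a milk yield of 459 and whose somatic cell score GEBV is 3 receives 20 × 30 = 600 index points under these assumptions.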
Although the invention has been described in detail above by way of general description, specific embodiments and tests, modifications or improvements can be made on the basis of the invention, as will be apparent to those skilled in the art. Such modifications or improvements, made without departing from the spirit of the invention, all fall within the scope of protection claimed.

Claims (9)

1. the comprehensive genome breeding value method of estimation based on SNP chip, is characterized in that, described method comprises:
Step S1, obtains data file, and described data file is carried out to pre-service, obtains reliable preprocessed data;
Step S2, the preprocessed data that step S1 is obtained carries out the estimation of genome breeding value, utilizes GBLUP method to estimate genome breeding value, or utilizes bayes method to estimate genome breeding value, obtains the genes of individuals group breeding value of single proterties;
Step S3, comprehensive genome breeding value is estimated, repeating step S2 obtains the genes of individuals group breeding value of multiple proterties, calculates comprehensive genome breeding value.
2. The method according to claim 1, characterized in that step S1 comprises:
Step S11: obtaining and pre-processing SNP chip data, including reading the SNP chip data and imputing missing genotypes;
Step S12: obtaining and pre-processing the pedigree file, tracing the parental pedigree back 5-10 generations;
Step S13: obtaining and pre-processing phenotypic data, selecting the phenotypic values of the individuals in the pedigree file of step S12.
3. The method according to claim 2, characterized in that step S1 specifically comprises:
Step S11: obtaining the SNP chip data and storing the files in a compressed format to save disk space; the missing-genotype imputation uses the Beagle program to fill in SNP markers or individuals with missing chip genotypes, improving the quality of chip genotyping;
Step S12: according to the individuals with SNP chip data in step S11, selecting from the pedigree file the individuals that have SNP chip information and tracing their parental pedigree back 5-10 generations; verifying the parent-offspring relationships recorded in the pedigree file by parentage testing based on the SNP information, and adjusting the original pedigree file according to the test results, so that where a recorded parent-offspring relationship is inconsistent with the parentage test result, the pedigree file is rebuilt according to the test result; wherein the pedigree file contains three fields: individual ID, dam ID, and sire ID.
4. The method according to claim 2 or 3, characterized in that, in step S11, the missing-genotype imputation is followed by genotype quality control.
5. The method according to claim 4, characterized in that the quality-control parameters of the genotype quality control are the call rate and minimum allele frequency of each SNP marker, the Hardy-Weinberg equilibrium test, and the call rate of each individual.
6. The method according to claim 1, characterized in that, in step S2:
when the genomic breeding value is estimated with the GBLUP method, the data used are one of the following: genomic information only; genomic information together with pedigree information restricted to the SNP-genotyped individuals; or genomic information together with pedigree information;
when the genomic breeding value is estimated with the Bayesian method, only genomic information is used, and the effect of each SNP marker on the SNP chip is estimated with a Markov chain Monte Carlo algorithm.
7. The method according to claim 6, characterized in that:
when only genomic information is used to estimate the genomic breeding value with the GBLUP method, a marker-based relationship matrix between individuals of the reference population and the candidate population (the G matrix) is built from the SNP chip data file and the phenotypic data file and inverted, and the genomic breeding values of the candidate population are obtained by solving the mixed model equations;
when genomic information and residual polygenic information are used to estimate the genomic breeding value with the GBLUP method, the marker-based relationship matrix between individuals of the reference population and the candidate population (the G matrix) and the pedigree-based relationship matrix between those individuals (the A matrix) are both built and inverted, and the genomic breeding values of the candidate population are obtained by solving the mixed model equations;
when genomic information and pedigree information are used to estimate the genomic breeding value with the GBLUP method, individuals that have phenotypic data but no genotype information are added to the reference population, according to the reference and candidate population information, to enlarge the reference population; a relationship matrix for the reference and candidate populations (the H matrix) is built and inverted, and the genomic breeding values of the candidate population are obtained by solving the mixed model equations.
8. The method according to claim 1, characterized in that, in step S3, the comprehensive genomic breeding value is estimated by one of two methods:
(1) without considering the pedigree index: the single-trait genomic breeding values are weighted by the economic weight of each trait to produce the comprehensive genomic breeding value, which is used for selecting individuals;
or (2) considering the pedigree index: for each trait, the individual's pedigree index and its genomic breeding value are first combined by weighting into a new value, which serves as the final genomic breeding value for that trait, the combining weights being the reliabilities of the individual's pedigree index and of its genomic breeding value respectively; once the new genomic breeding values of all traits have been obtained, the comprehensive genomic breeding value is calculated as described in (1).
9. Use of the method according to any one of claims 1-8 in animal breeding.
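The genomic-information-only GBLUP route of claim 7 (build the G matrix, invert it, solve the mixed model equations) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation. The claims do not say how G is constructed, so VanRaden's first method is assumed here. The variance ratio `lam` (residual variance divided by genetic variance) is assumed known, and a small `ridge` is added to the diagonal because the centered G matrix is singular. The function names and the simulated data are hypothetical.

```python
import numpy as np


def vanraden_g(M: np.ndarray) -> np.ndarray:
    """Genomic relationship matrix from an (n, m) genotype matrix coded 0/1/2.

    Assumes genotypes are already imputed (no missing values) and that at
    least one SNP is polymorphic.
    """
    p = M.mean(axis=0) / 2.0          # allele frequency of each SNP
    W = M - 2.0 * p                   # center each SNP column
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup(y_ref: np.ndarray, G: np.ndarray, ref_idx: np.ndarray,
          lam: float, ridge: float = 0.01):
    """Solve the mixed model equations for y = 1*mu + Z*g + e.

    y_ref   : phenotypes of the reference individuals
    G       : relationship matrix over reference plus candidate individuals
    ref_idx : row indices in G of the reference individuals, same order as y_ref
    lam     : residual variance / genetic variance (assumed known)
    ridge   : value added to the diagonal of G so that it can be inverted
    Returns the estimated mean and the GEBVs of all n individuals.
    """
    n = G.shape[0]
    n_ref = len(ref_idx)
    X = np.ones((n_ref, 1))                       # overall mean only
    Z = np.zeros((n_ref, n))
    Z[np.arange(n_ref), ref_idx] = 1.0            # links records to individuals
    G_inv = np.linalg.inv(G + ridge * np.eye(n))
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * G_inv]])
    rhs = np.concatenate([X.T @ y_ref, Z.T @ y_ref])
    sol = np.linalg.solve(lhs, rhs)
    return sol[0], sol[1:]


# Simulated toy data: 30 reference and 10 candidate animals, 200 SNPs.
rng = np.random.default_rng(0)
genotypes = rng.binomial(2, 0.3, size=(40, 200)).astype(float)
G = vanraden_g(genotypes)
ref_idx = np.arange(30)
y_ref = rng.normal(size=30)
mu_hat, gebv = gblup(y_ref, G, ref_idx, lam=1.0)
candidate_gebv = gebv[30:]                        # candidates have no phenotype
```

Candidates receive a GEBV only through their genomic relationships with the reference animals, which is why G is built over both populations before it is inverted.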
CN201410067189.6A 2014-02-26 2014-02-26 Comprehensive genomic estimated breeding value (GEBV) method and application on the basis of single nucleotide polymorphism (SNP) chip Pending CN103914631A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410067189.6A CN103914631A (en) 2014-02-26 2014-02-26 Comprehensive genomic estimated breeding value (GEBV) method and application on the basis of single nucleotide polymorphism (SNP) chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410067189.6A CN103914631A (en) 2014-02-26 2014-02-26 Comprehensive genomic estimated breeding value (GEBV) method and application on the basis of single nucleotide polymorphism (SNP) chip

Publications (1)

Publication Number Publication Date
CN103914631A true CN103914631A (en) 2014-07-09

Family

ID=51040308

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410067189.6A Pending CN103914631A (en) 2014-02-26 2014-02-26 Comprehensive genomic estimated breeding value (GEBV) method and application on the basis of single nucleotide polymorphism (SNP) chip

Country Status (1)

Country Link
CN (1) CN103914631A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101647423A (en) * 2008-08-11 2010-02-17 北京六马养猪科技有限公司 Method for influence factor analysis and character precisely-quantifying breeding of boar population
CN101792798A (en) * 2010-01-19 2010-08-04 扬州大学 Method for breeding new variety of Sutai ETEC F18+ and disease resistant pigs
CN101892312A (en) * 2010-06-10 2010-11-24 扬州大学 Molecular breeding method for chicken quality

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
CHONG-LONG WANG ET AL: "Comparison of five methods for genomic breeding value estimation for the common dataset of the 15th QTL-MAS Workshop", 《PROCEEDINGS OF THE 15TH EUROPEAN WORKSHOP ON QTL MAPPING AND MARKER ASSISTED SELECTION》 *
何桑 等: "基因型填充方法介绍及比较", 《中国畜牧杂志》 *
张勤 等: "《动物重要经济性状基因的分离与应用》", 29 February 2012 *
张哲等: "畜禽基因组选择研究进展", 《科学通报》 *
张文灿等: "《国外畜禽生产新技术》", 30 April 2003 *
张猛: "西门塔尔牛部分经济性状全基因组选择的初步研究", 《中国优秀硕士学位论文全文数据库 农业科技辑》 *
范翌鹏等: "全基因组选择及其在奶牛育种中应用进展", 《中国奶牛》 *
赵书广: "《中国养猪大成》", 31 May 2013 *

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512510A (en) * 2015-12-03 2016-04-20 集美大学 Algorithm for assessing heritability through genome data
CN105512510B (en) * 2015-12-03 2019-03-08 集美大学 A method of genetic force is assessed by genomic data
CN106570350A (en) * 2015-12-17 2017-04-19 复旦大学 Single nucleotide polymorphisms site parting algorithm
CN106570350B (en) * 2015-12-17 2019-04-05 复旦大学 Mononucleotide polymorphic site parting algorithm
CN105868584B (en) * 2016-05-23 2019-03-26 厦门胜芨科技有限公司 The method for carrying out full-length genome selection and use by choosing extreme character individual
CN105868584A (en) * 2016-05-23 2016-08-17 厦门胜芨科技有限公司 Method for performing whole genome selective breeding by selecting extreme character individual
CN106169034A (en) * 2016-05-26 2016-11-30 中国农业科学院作物科学研究所 Genomic information auxiliary breeding means I breeding parent based on SNP clustering information and PAV variation information selects
CN106169034B (en) * 2016-05-26 2019-03-26 中国农业科学院作物科学研究所 Breeding parent selection of the genomic information auxiliary breeding means I- based on SNP clustering information and PAV variation information
CN106755441A (en) * 2016-12-29 2017-05-31 华南农业大学 A kind of method that gene group selection based on multiple characters carries out forest multiple characters pyramiding breeding
CN106755441B (en) * 2016-12-29 2020-08-07 华南农业大学 Method for performing forest multi-character polymerization breeding based on multi-character genome selection
CN107590364A (en) * 2017-08-29 2018-01-16 集美大学 A kind of quick bayes method of new estimation genomic breeding value
WO2019047083A1 (en) * 2017-09-06 2019-03-14 深圳华大生命科学研究院 Method and device for determining snp loci set and applications thereof
CN111051537A (en) * 2017-09-06 2020-04-21 深圳华大生命科学研究院 Method and device for determining SNP site set and application thereof
CN111051537B (en) * 2017-09-06 2024-05-14 深圳华大生命科学研究院 Method and device for determining SNP locus set and application of method and device
CN107967409B (en) * 2017-11-24 2021-04-23 中国农业大学 Pig whole genome low-density SNP chip and manufacturing method and application thereof
CN107967409A (en) * 2017-11-24 2018-04-27 中国农业大学 One boar full-length genome low-density SNP chip and preparation method thereof and application
WO2019153823A1 (en) * 2018-02-07 2019-08-15 山东省农业科学院奶牛研究中心 Pedigree information simplification-based breeding method for selecting high-production performance a2a2 homozygous genotype dairy cattle
CN108388765A (en) * 2018-02-12 2018-08-10 中国农业科学院作物科学研究所 Full-length genome selection and use value unbiased esti-mator tool GS1.0 based on the network platform
CN109101786A (en) * 2018-08-29 2018-12-28 广东省农业科学院动物科学研究所 A kind of genomic breeding value estimation method for integrating dominant effect
CN109101786B (en) * 2018-08-29 2021-02-09 广东省农业科学院动物科学研究所 Genome breeding value estimation method integrating dominant effect
CN109741789A (en) * 2019-01-22 2019-05-10 袁隆平农业高科技股份有限公司 A kind of full-length genome prediction technique and device based on RRBLUP
CN109741789B (en) * 2019-01-22 2021-02-02 隆平农业发展股份有限公司 Whole genome prediction method and device based on RRBLUP
CN109727642A (en) * 2019-01-22 2019-05-07 袁隆平农业高科技股份有限公司 Full-length genome prediction technique and device based on Random Forest model
CN109727641A (en) * 2019-01-22 2019-05-07 袁隆平农业高科技股份有限公司 A kind of full-length genome prediction technique and device
WO2020229641A1 (en) 2019-05-14 2020-11-19 Agriculture And Food Development Authority (Teagasc) A method and system for estimation of the breeding value of an animal for eating quality and/or commercial yield prediction
GB2599289A (en) * 2019-05-14 2022-03-30 Agriculture And Food Dev Authority Teagasc A method and system for estimation of the breeding value of an animal for eating quality and/or commercial yield prediction
CN110610744A (en) * 2019-09-11 2019-12-24 华中农业大学 Efficient whole genome selection method capable of realizing parallel operation and high accuracy
CN110564832A (en) * 2019-09-12 2019-12-13 广东省农业科学院动物科学研究所 Genome breeding value estimation method based on high-throughput sequencing platform and application
CN110564832B (en) * 2019-09-12 2023-06-23 广东省农业科学院动物科学研究所 Genome breeding value estimation method based on high-throughput sequencing platform and application
CN111223524A (en) * 2020-01-10 2020-06-02 多谱(武汉)生物科技有限公司 Genotype determination method and system for biological breeding
CN111524545A (en) * 2020-04-30 2020-08-11 天津诺禾致源生物信息科技有限公司 Method and apparatus for whole genome selective breeding
CN111524545B (en) * 2020-04-30 2023-11-10 天津诺禾致源生物信息科技有限公司 Method and device for whole genome selective breeding
CN113470744A (en) * 2021-06-04 2021-10-01 中国农业大学 Pedigree inference method and device based on SNP (Single nucleotide polymorphism) site data and electronic equipment
CN113470744B (en) * 2021-06-04 2024-05-24 中国农业大学 Pedigree inference method and device based on SNP locus data and electronic equipment
CN113555063A (en) * 2021-07-28 2021-10-26 仲恺农业工程学院 Threshold character genome breeding value estimation method based on SNP chip and application

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140709