CN115050419A - Breeding method for selecting corn bract tightness based on whole genome - Google Patents

Publication number
CN115050419A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210572175.4A
Other languages
Chinese (zh)
Inventor
崔震海
敖曼
关义新
刘云灿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Institute of Geography and Agroecology of CAS
Original Assignee
Northeast Institute of Geography and Agroecology of CAS
Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G16B20/40 Population genetics; Linkage disequilibrium
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Forestry; Mining

Abstract

The invention discloses a breeding method for selecting corn bract tightness based on the whole genome, and belongs to the technical field of plant breeding. The invention provides a method for predicting corn bract tightness by genome-wide selection (GS), which comprises: taking whole-genome data of maize inbred lines from an association population as raw data; imputing missing genotypes after quality-control preprocessing; screening out genotype data at different marker densities; performing principal component analysis and kinship analysis to obtain covariate data; fitting corn bract tightness data from different environments to a mixed model to obtain BLUP values; substituting the genotype data, the covariate data and the BLUP values into different statistical models for analysis; selecting the optimal prediction model; and predicting bract tightness from the genotype of the material to be tested. The method provides a basis for GS breeding and molecular breeding of other agronomic traits of corn.

Description

Breeding method for selecting corn bract tightness based on whole genome
Technical Field
The invention belongs to the technical field of plant breeding, and particularly relates to a breeding method for selecting corn bract tightness based on a whole genome.
Background
Corn bracts are modified leaves that wrap the ear and protect the kernels from diseases, insect pests and dehydration during development. With the continuous expansion of the corn planting area, manual harvesting can hardly keep pace, so mechanized harvesting has become an inevitable trend. Ear-related traits, including bract length, bract number and bract tightness, directly or indirectly affect the mechanized harvesting of corn. Bract tightness is a composite trait based on several bract measurements, including bract length, bract width and bract thickness. Jiang et al. (2020) found that bract tightness was significantly negatively correlated with both bract thickness and bract width, indicating that thicker and wider bracts would wrap more tightly. High kernels et al. (1999), after reviewing the domestic and foreign literature, concluded that bract tightness acts as a negatively correlated trait affecting the water content of corn kernels. Appropriate bract tightness benefits the growth, development and mechanical harvesting of corn.
With the continuous development of sequencing technology and of molecular markers in the field of molecular breeding, marker-assisted selection (MAS) has become the dominant force in crop improvement in recent years (Stuber et al. 1999). However, many agronomic traits of corn are quantitative traits, and the effective use of MAS in breeding for complex quantitative traits is severely limited by factors such as marker quality, the accuracy of detecting small-effect QTL, and environmental variation (Hasan et al. 2021, Platten et al. 2019, Hospital et al. 2009). Genome-wide selection (GS) breeding builds a training model from the genotypes and phenotypes of a population, substitutes the genotype data of the breeding population into the model to obtain genomic estimated breeding values (GEBVs), and selects among the population to be tested according to the GEBVs (Cui et al. 2020).
Because it analyzes markers covering the whole genome, GS can account for the total genetic variation and is therefore superior to MAS. This feature of GS shifts the role of phenotype data from the direct selection of lines to the construction of the model. How to improve the accuracy of the model built for GS is the key question of current GS research. At present, there is no good screening method for GS using genotypes from multiple sequencing platforms, so the prediction accuracy of directly integrated genotypes is too low. In addition, no GS analysis of corn bract tightness has been reported. Therefore, a method for analyzing bract tightness by genome-wide selection (GS) is needed.
Disclosure of Invention
The invention aims to provide a method for predicting corn bract tightness by genome-wide selection and a method for breeding according to corn bract tightness.
The invention provides a breeding method for selecting corn bract tightness based on the whole genome, which comprises the following steps:
Step 1: selecting whole-genome maize data from 4 genotyping platforms as raw data, and screening the raw data by quality-control preprocessing;
Step 2: imputing missing genotypes in the data screened in step 1, calculating the linkage disequilibrium r² between markers across the genome, and screening out genotype data at different marker densities;
Step 3: performing principal component analysis on the genotype data at the different marker densities to obtain principal component analysis data; performing kinship analysis on the genotype data at the different marker densities to obtain kinship analysis data;
Step 4: collecting corn bract tightness data in advance and fitting them to a mixed model to obtain BLUP values;
Step 5: performing GS analysis with the principal component analysis data and kinship analysis data obtained in step 3 and the BLUP values obtained in step 4 to obtain the optimal statistical model;
Step 6: determining the genotypes of the maize inbred lines to be tested, substituting them into the optimal model obtained in step 5 to obtain genomic estimated breeding values (GEBVs), and selecting materials by their GEBV ranking as preferred breeding materials according to the breeder's needs.
Further defined, the data of the four sequencing platforms in step 1 are: SNP data from two gene chips for maize inbred lines, transcriptome sequencing genotype data for maize inbred lines, and genotyping data from a reduced-representation genome sequencing (GBS) platform for maize inbred lines; the two gene chips are the 50K and 600K chips.
Further defined, the preprocessing in step 1 is: screening with a missing rate of less than 20% and an MAF greater than 0.05 as the criteria.
Further defined, the imputation of missing genotypes in step 2 uses Beagle 4.0 software; in step 2, the r² thresholds are 0.8, 0.5, 0.2, 0.1 and 0.01.
Further defined, in step 3, principal component analysis is performed with the function prcomp in the R language, and kinship analysis is performed with the GAPIT software package.
Further defined, the mixed linear model in step 4 is: $y_i = \mu + f_i + e_i + \varepsilon_i$, where $y_i$ is the phenotypic value of the i-th family, $\mu$ is the mean of the phenotypes across multiple environments, $f_i$ is the variety effect, $e_i$ is the environmental effect, and $\varepsilon_i$ is the residual.
Further defined, the specific steps of the GS analysis in step 5 are as follows:
(1) genome prediction: the BayesA, BayesB, BayesC, BL, BRR and gBLUP models are used, and genome prediction is carried out with the R packages "rrBLUP v4.5" and "BGLR";
(2) prediction accuracy: from the corn bract tightness data of lines with known genotypes, 80% of the inbred lines are randomly drawn as the training set and the remaining 20% are used as the test set; the genotype data of the test set are introduced into the prediction model to calculate the genomic estimated breeding values of the test set; this is repeated 100 times; and correlation analysis between the phenotype data of the test set (the actual breeding values) and the genomic estimated breeding values identifies the statistical model with the highest accuracy.
Further defined, the prediction model is a BayesA, BayesB, BayesC, BL, BRR or gBLUP model.
The invention provides computer equipment for a breeding method for selecting corn bract tightness, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein when the processor executes the computer program, the breeding method for selecting corn bract tightness based on whole genome is realized.
The present invention provides a non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements a method of selecting corn bract tightness based on whole genome as described above.
Beneficial effects: the method uses a maize association population as material and corn bract tightness as the phenotypic index, carries out GS analysis of bract tightness under different statistical models, marker densities, sequencing platforms, sampling strategies and population structures, and weighs sequencing price, computing time and prediction accuracy to obtain an optimal prediction scheme, thereby providing a basis for GS breeding of other agronomic traits of corn.
By carrying out GS analysis of bract tightness on different genotyping platforms and at different marker densities, the method selects the optimal platform and marker density for corn bract tightness without reducing GS prediction accuracy, which greatly reduces sequencing cost. In addition, GS analysis of bract tightness is carried out under different population structures, and the subgroup with the highest GS prediction accuracy for bract tightness is identified; applying this subgroup to breeding for bract tightness can greatly improve breeding efficiency. Based on the GS results for corn bract tightness in this patent, the selection method can be extended to more corn populations and traits.
The prediction accuracies obtained with different sequencing platforms, molecular marker densities, statistical models, sampling strategies and population structures are compared in order to find the optimal GS breeding strategy for important agronomic traits of corn, including bract tightness, so that prediction accuracy and research cost are jointly optimized and a theoretical basis is provided for further molecular breeding.
Drawings
FIG. 1 is a flow chart of optimal statistical model screening for bract tightness GS;
FIG. 2 is a flow chart of optimal marker density and sequencing platform screening for bract tightness GS;
FIG. 3 is a flow chart of optimal sampling strategy and population structure screening for bract tightness GS;
FIG. 4 shows the GS prediction accuracy of bract tightness in the association population for different genotyping platforms under different statistical models;
FIG. 5 shows the GS prediction accuracy of bract tightness in the association population for different genotyping platforms at different marker densities;
FIG. 6 shows the heritability estimates of bract tightness; the abscissa is the sequencing platform and the ordinate is heritability;
FIG. 7 shows the GS prediction accuracy of bract tightness under different sampling strategies;
FIG. 8 shows the GS prediction accuracy of bract tightness under different population structures.
Detailed Description
1. Genome-wide selection (GS): based on genomic estimated breeding values (GEBVs), individuals are genetically evaluated by genotyping molecular markers covering the whole genome and using genome-level genetic information, which yields more accurate breeding value estimates.
2. BLUP value: best linear unbiased prediction is a statistical method used in breeding to predict individual breeding values, i.e., for genetic evaluation. It can improve prediction accuracy.
3. Correlation analysis: the analysis of two or more correlated variables in order to measure how closely they are related. Some connection or probabilistic relationship must exist between the variables for correlation analysis to be carried out.
4. Heritability: the ratio of genetic variance to total (phenotypic) variance, which can be used as an index for selecting hybrid progeny. Heritability is divided into broad-sense heritability and narrow-sense heritability. Quantitative traits are strongly affected by environmental factors, and phenotypic variation may arise from genetic effects, environmental effects, and genotype-by-environment interaction.
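As a compact restatement of definition 4, the two kinds of heritability can be written as variance ratios. The symbols below are notation assumed here for illustration and do not appear in the original text: $\sigma_G^2$ is the total genetic variance, $\sigma_A^2$ the additive genetic variance, $\sigma_E^2$ the environmental variance, $\sigma_{G \times E}^2$ the genotype-by-environment interaction variance, and $\sigma_P^2$ the phenotypic variance. The decomposition of $\sigma_P^2$ assumes that genotype and environment are uncorrelated.

```latex
H^2 = \frac{\sigma_G^2}{\sigma_P^2}, \qquad
h^2 = \frac{\sigma_A^2}{\sigma_P^2}, \qquad
\sigma_P^2 = \sigma_G^2 + \sigma_E^2 + \sigma_{G \times E}^2
```

Here $H^2$ is the broad-sense and $h^2$ the narrow-sense heritability; since $\sigma_A^2 \le \sigma_G^2$, it follows that $h^2 \le H^2$.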
Example 1. Breeding method based on prediction of maize bract tightness
A breeding method based on prediction of corn bract tightness comprises the following steps:
(1) The genotype files required for the study, from the 4 genotyping platforms plus 1 integrated file, were obtained from the website http://maizego.org/resources.html. The 50K genotypes were obtained by Yang et al. (2011), who extracted DNA from leaves of the association population (513 maize inbred lines) and genotyped it with the MaizeSNP50 BeadChip, which contains 56110 SNPs; the transcriptome genotypes were obtained by Fu et al. (2013), who extracted RNA from kernels of 368 inbred lines of the association population 15 days after pollination and sequenced it on the Illumina platform; the GBS and 600K genotypes were obtained by Liu et al. (2017), who extracted DNA from leaves of 469 and 153 inbred lines and genotyped it using GBS (reduced-representation genome sequencing) and the Affymetrix Axiom Maize 600K array, respectively. The raw data downloaded from the website were processed by the following steps to obtain 5 genotype files.
① The raw data were opened in TASSEL 5 software and screened with a missing rate of less than 20% and an MAF (minor allele frequency) greater than 0.05 as the criteria;
② Missing genotypes were imputed with Beagle 4.0 software;
③ The linkage disequilibrium (LD) r² between markers in the genome was calculated with PLINK software, with the r² threshold set to 0.8, 0.5, 0.2, 0.1 and 0.01, giving the names of the markers retained at each density; the larger the value, the higher the marker density. Genotype files at the different marker densities were extracted by the retained marker names and converted in R from base calls (A, T, G, C, etc.) to numeric codes (0, 1 and 2);
④ Principal component analysis was performed with the function prcomp in R to obtain a PCA file;
⑤ Kinship analysis was performed with the GAPIT software package to obtain a Kinship (K) file.
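The quality-control thresholds and the numeric recoding described in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the TASSEL or R procedure used in the patent: genotypes are assumed to be biallelic two-letter calls such as "AG", the missing code "NN" and the function names are hypothetical, and the numeric code is taken to be the count of the minor allele.

```python
from collections import Counter

MISSING = "NN"  # assumed code for a missing genotype call


def snp_stats(calls):
    """Return (missing_rate, maf) for one SNP given calls like 'AA', 'AG', 'NN'."""
    observed = [c for c in calls if c != MISSING]
    missing_rate = (len(calls) - len(observed)) / len(calls)
    alleles = Counter(a for c in observed for a in c)
    if len(alleles) < 2:
        return missing_rate, 0.0  # monomorphic or fully missing SNP
    return missing_rate, min(alleles.values()) / sum(alleles.values())


def passes_qc(calls, max_missing=0.20, min_maf=0.05):
    """Apply the thresholds from the text: missing rate < 20% and MAF > 0.05."""
    missing_rate, maf = snp_stats(calls)
    return missing_rate < max_missing and maf > min_maf


def to_dosage(calls):
    """Recode calls as 0/1/2 copies of the minor allele (None where missing)."""
    alleles = Counter(a for c in calls if c != MISSING for a in c)
    minor = min(alleles, key=alleles.get)
    return [None if c == MISSING else c.count(minor) for c in calls]
```

In the workflow of the text, imputation (step ②) comes before recoding, so no missing values would remain at that point; the `None` branch is kept only so the sketch also works on unimputed input.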
(2) Bract tightness was measured in Sanya City, Hainan Province, in 2015 and 2016 and in Liaoning Province in 2016, and BLUP (best linear unbiased prediction) values were calculated with a mixed linear model. The mixed linear model is: $y_i = \mu + f_i + e_i + \varepsilon_i$, where $y_i$ is the phenotypic value of the i-th family, $\mu$ is the mean of the phenotypes across multiple environments, $f_i$ is the variety effect, $e_i$ is the environmental effect, and $\varepsilon_i$ is the residual. For each environment, the mean value is taken as a fixed effect and the variety and environment effects are treated as random effects; the BLUP value is the sum of the fixed effect and the estimated variety effect. Bract tightness was measured in 438 inbred lines, but the number of inbred lines of the association population differs among the genotype files. Therefore, starting from the 438 inbred lines in the phenotype file, the inbred line names shared between each sequencing platform and the phenotype file were identified, finally generating 5 phenotype files containing the corresponding inbred lines (the integrated genotype file combining the 4 sequencing platforms, 50K, 600K, GBS and RNA-seq have 438, 133, 380 and 315 inbred lines, respectively);
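Matching the phenotype file against each genotype file by shared inbred line names, as described in step (2), amounts to a set intersection. A minimal sketch follows; the function name and the line names used in the example are hypothetical.

```python
def match_lines(phenotypes, genotyped_lines):
    """Keep BLUP values only for inbred lines that also appear in a genotype file.

    phenotypes: dict mapping line name -> BLUP value
    genotyped_lines: iterable of line names present in one genotype file
    """
    shared = set(phenotypes) & set(genotyped_lines)
    return {line: phenotypes[line] for line in sorted(shared)}


# Hypothetical example: three phenotyped lines, two of them genotyped.
blup = {"line_A": 2.1, "line_B": 3.0, "line_C": 1.5}
platform_lines = ["line_B", "line_A", "line_D"]
matched = match_lines(blup, platform_lines)
```

Running this once per genotype file would give one phenotype file per platform, each restricted to the lines that platform actually covers.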
(3) GS analysis was performed using the genotype files from step (1) and the phenotype files from step (2).
Genome prediction model: the environment (the environmental factor captures the influence of different environments on prediction) and the first three principal components (PCA) are taken as fixed effects, and the total additive genetic effect (genotype) and the residual are taken as random effects, giving the mixed linear statistical model $y = \mu + X\beta + Zu + \varepsilon$, where $y$ is the vector of observed bract tightness values ($n \times 1$) and $n$ is the number of families; $\mu$ is the overall mean of the phenotype; $X$ is the design matrix of the fixed effects ($n \times p$) and $\beta$ is the vector of fixed effects ($p \times 1$), where, when $p = 3$, the first three principal components from the principal component analysis in step (1) are taken as fixed effects; $Z$ is the design matrix of the random effects ($n \times q$); $u$ is the vector of random total additive genetic effects (genotypes) of the individuals ($q \times 1$); and $\varepsilon$ ($n \times 1$) is the residual. The random effects follow normal distributions, $u \sim N(0, K\sigma_u^2)$ and $\varepsilon \sim N(0, I\sigma_\varepsilon^2)$, where $I$ is the identity matrix, $K$ is the kinship matrix obtained from the kinship analysis in step (1), $\sigma_u^2$ is the variance of the additive genetic effect of the individuals, and $\sigma_\varepsilon^2$ is the residual variance. $u$ is the genomic estimated breeding value (GEBV) that ultimately needs to be calculated.
gBLUP genome prediction was carried out with the R package "rrBLUP v4.5" using the mixed linear model described above, and BayesA, BayesB, BayesC, BL and BRR predictions were carried out with the R package "BGLR".
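To make the role of the kinship matrix K concrete, the following sketch computes gBLUP-style predictions in plain Python. It is a simplified illustration, not the rrBLUP implementation: the variance ratio `lam` (residual variance divided by additive genetic variance) is assumed to be known rather than estimated by REML, the only fixed effect is an overall mean estimated by the simple average, and all function and argument names are hypothetical. Under those assumptions the predicted genetic values of untested lines are K_test,train (K_train + lam I)^-1 (y - mean).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gblup_predict(y_train, k_train, k_test_train, lam):
    """Predict genetic values of test lines from training phenotypes.

    k_train: kinship among training lines (n x n)
    k_test_train: kinship between test and training lines (t x n)
    lam: assumed known ratio of residual to additive genetic variance
    """
    n = len(y_train)
    mean = sum(y_train) / n
    a = [[k_train[i][j] + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(a, [v - mean for v in y_train])
    return [sum(k * w for k, w in zip(row, alpha)) for row in k_test_train]
```

A larger `lam` (lower heritability) shrinks the predictions more strongly toward zero, which is the behaviour expected of BLUP.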
Calculating the prediction accuracy: among all inbred lines, 80% were randomly selected as the training set and the remaining 20% as the test set, and this was repeated 100 times. The phenotypic data of the test set are taken as the actual breeding values (TBV), and the genotypic and phenotypic data of the training set are used to "train" the prediction model. After the genotype data of the test set are introduced into the prediction model of (3), the genomic estimated breeding values (GEBVs) of the test set can be calculated. Pearson correlation analysis between TBV and GEBV gives the prediction accuracy of genome selection.
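The accuracy calculation described above (random 80/20 split, 100 repeats, Pearson correlation between observed and predicted values) can be sketched as follows. The prediction model itself is passed in as a callable; `fit_predict` and the other names are hypothetical placeholders, and any model that returns a predicted value per test line could be plugged in.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_validate(lines, phenotype, fit_predict,
                   test_fraction=0.2, repeats=100, seed=0):
    """Mean Pearson r between observed and predicted values over random splits.

    fit_predict(train_lines, test_lines) is a user-supplied callable (assumed
    here) that trains on train_lines and returns {line: predicted value}.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = list(lines)
        rng.shuffle(shuffled)
        n_test = max(2, round(len(shuffled) * test_fraction))
        test, train = shuffled[:n_test], shuffled[n_test:]
        predicted = fit_predict(train, test)
        scores.append(pearson([phenotype[l] for l in test],
                              [predicted[l] for l in test]))
    return sum(scores) / len(scores)
```

The correlation is undefined when all predictions in a test set are identical, so a real pipeline would need to guard against that case.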
GS analysis was performed on the five genotype files using the BayesA, BayesB, BayesC, BL, BRR and gBLUP models; the prediction results were tested for significance with SPSS software, and comparison of the prediction accuracy among the models showed that gBLUP is the optimal model.
The models are specified as follows:
The Bayes model is
$y = Xb + \sum_{k=1}^{q} z_k g_k + e$
where $y$ is the vector of n phenotypic observations; $b$ is the vector of fixed effects and $X$ is the design matrix of the fixed effects; $q$ is the number of SNPs, $z_k$ is the genotype vector of the k-th SNP and $g_k$ is the effect of the k-th SNP; $e$ is the vector of random residuals.
BayesA: all SNPs are assumed to have an effect; SNP effects are normally distributed, and their variances follow a scaled inverse chi-square distribution.
BayesB: only a small fraction of marker loci have an effect, while most other chromosome segments have none. The proportion of loci with an effect must be set in advance.
BayesC: the same as BayesB, except that the proportion of loci with an effect is unknown and is not preset.
BL: marker effects are normally distributed with variances that follow an exponential distribution.
BRR: marker effects follow a Gaussian distribution, and the hyperparameters, the mean α and the standard deviation λ, follow a gamma distribution.
gBLUP: $y = Xb + Zg + e$, where $y$ is the phenotype vector, $X$ is the design matrix of the fixed effects, $b$ is the vector of fixed effects, $Z$ is the design matrix of the genetic effects, $g$ is the vector of additive genetic effects, and $e$ is the vector of normally distributed random residuals. A genomic relationship matrix constructed from the markers replaces the relationship matrix built from pedigree information.
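One common way to build a genomic relationship matrix from markers is G = W W' / (2 Σ p_j (1 - p_j)), where W holds the 0/1/2 allele counts centred by twice the allele frequency p_j. The sketch below implements that formula in plain Python. Whether the kinship matrix in the patent was computed exactly this way is an assumption; the text only states that the GAPIT package was used.

```python
def genomic_relationship(dosages):
    """Genomic relationship matrix from 0/1/2 allele counts.

    dosages: list of individuals, each a list of allele counts per SNP.
    Returns an n x n matrix (list of lists).
    """
    n, m = len(dosages), len(dosages[0])
    # Allele frequency of the counted allele at each SNP.
    freqs = [sum(ind[j] for ind in dosages) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    # Centre each genotype by its expected value 2p.
    centered = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in dosages]
    return [[sum(a * b for a, b in zip(ri, rj)) / scale for rj in centered]
            for ri in centered]


# Three individuals genotyped at two SNPs (illustrative values only).
G = genomic_relationship([[0, 2], [2, 0], [1, 1]])
```

The diagonal of G measures how far an individual's genotype departs from the population average, and off-diagonal entries are positive for lines that share more alleles than expected by chance.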
(4) In actual breeding, the genotypes of the maize inbred lines to be tested are first determined, and the gBLUP statistical model trained on the training population is used to calculate their GEBVs. According to the breeder's needs, the breeding materials whose GEBVs rank at the top, or those that rank at the bottom, are selected, so that materials can be retained or eliminated without spending time and effort on measuring phenotypic data.
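The selection step in (4) is a ranking of candidate lines by GEBV. A minimal sketch, with a hypothetical function name and made-up example values, where the breeder chooses the fraction to keep and whether high or low values are wanted:

```python
def select_by_gebv(gebv, fraction=0.1, highest=True):
    """Return the names of the top (or bottom) fraction of lines ranked by GEBV.

    gebv: dict mapping line name -> genomic estimated breeding value
    """
    ranked = sorted(gebv, key=gebv.get, reverse=highest)
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[:n_keep]


# Illustrative values only, not results from the patent.
candidates = {"a": 0.3, "b": -0.1, "c": 0.9, "d": 0.2}
top_half = select_by_gebv(candidates, fraction=0.5)
bottom_half = select_by_gebv(candidates, fraction=0.5, highest=False)
```

Because "appropriate" bract tightness is the goal, which end of the ranking is preferred depends on the breeding objective, hence the `highest` switch.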
The technical scheme of the invention is mainly divided into the following parts:
1. Screening of the optimal statistical model
As shown in FIG. 1, the data of the 4 sequencing platforms (50K, 600K, GBS and RNA-seq) and the integrated genotypes were processed into 5 genotype files in the required format, and the bract tightness BLUP values were used as the phenotype file. 80% of the whole association population was randomly drawn as the training population, and the remaining 20% formed the test population. The BayesA, BayesB, BayesC, BL and BRR models in the BGLR package and the gBLUP model in the rrBLUP package of the R language were used for GS analysis. Correlation analysis of the GS predictions of bract tightness was performed with SPSS software to obtain the prediction accuracy and thereby screen for the optimal statistical model.
2. Screening of the optimal marker density and sequencing platform
As shown in FIG. 2, PLINK software was used to vary the r² threshold and obtain genotype files at different marker densities for each sequencing platform. Each of the 5 genotype files (the 4 platforms plus the integrated file) has 6 marker densities, giving 30 genotype data sets in total. 80% of the whole association population was randomly drawn as the training population and the remaining 20% as the test population, and the optimal statistical model from part 1 was used for GS analysis. Correlation analysis of the GS predictions of bract tightness was performed with SPSS software to obtain the prediction accuracy and thereby screen for the optimal marker density and the optimal sequencing platform.
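How an r² threshold controls marker density can be illustrated with a small greedy thinning routine. This is a simplification written for illustration and not PLINK's algorithm: r² is taken as the squared Pearson correlation of 0/1/2 allele counts, every pair of markers is compared (PLINK works within sliding windows), and the function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two allele-count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return sxy * sxy / (sxx * syy)


def prune(markers, threshold):
    """Greedily keep a marker only if its r2 with every kept marker is <= threshold.

    markers: dict mapping marker name -> list of 0/1/2 allele counts
    """
    kept = []
    for name, dosage in markers.items():
        if all(r_squared(dosage, markers[k]) <= threshold for k in kept):
            kept.append(name)
    return kept


# Illustrative genotypes: s2 duplicates s1, s3 is weakly correlated with s1.
markers = {
    "s1": [0, 1, 2, 0, 1, 2],
    "s2": [0, 1, 2, 0, 1, 2],
    "s3": [0, 2, 1, 1, 0, 2],
}
dense = prune(markers, 0.8)
sparse = prune(markers, 0.2)
```

With the threshold at 0.8 only the redundant marker s2 is dropped, while at 0.2 marker s3 is dropped as well, which mirrors the statement that a larger r² value gives a higher marker density.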
3. Screening of the optimal sampling strategy and population structure
As shown in FIG. 3, the sampling strategy is as follows: 10-90% of the association population is randomly drawn as the prediction population, and the rest forms the training population. Two treatments are set, within subgroups (Within) and across subgroups (Across). In the within-subgroup treatment, lines are sampled proportionally from each of the 4 subgroups (MIXED, NSS, SS, TST) as the training population, and the other lines of the same subgroup serve as the prediction population; in the across-subgroup treatment, lines are sampled proportionally from 1 of the 4 subgroups as the training population, and all remaining lines (across the 4 subgroups) serve as the prediction population. Using the optimal statistical model, marker density and sequencing platform for bract tightness GS selected in parts 1 and 2, the prediction accuracies of the different sampling strategies and population structures are compared, and the optimal sampling strategy and population structure are finally screened out.
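The two treatments can be sketched as two small splitting functions. The sketch follows one reading of the description above, namely that in the across-subgroup treatment every line not drawn into the training set becomes part of the prediction population; the function names and the subgroup assignment used in the example are hypothetical.

```python
import random


def split_within(subgroup_of, subgroup, train_fraction, rng):
    """Train and predict inside one subgroup."""
    members = [l for l, g in subgroup_of.items() if g == subgroup]
    rng.shuffle(members)
    n_train = round(len(members) * train_fraction)
    return members[:n_train], members[n_train:]


def split_across(subgroup_of, subgroup, train_fraction, rng):
    """Train on a sample of one subgroup, predict all remaining lines."""
    train, _ = split_within(subgroup_of, subgroup, train_fraction, rng)
    chosen = set(train)
    predict = [l for l in subgroup_of if l not in chosen]
    return train, predict


# Hypothetical panel: 10 lines labelled SS and 10 labelled NSS.
panel = {"L%d" % i: ("SS" if i < 10 else "NSS") for i in range(20)}
rng = random.Random(1)
within_train, within_pred = split_within(panel, "SS", 0.8, rng)
across_train, across_pred = split_across(panel, "SS", 0.8, rng)
```

Repeating either split many times and scoring each with the Pearson correlation between observed and predicted values gives the accuracy comparison between the Within and Across treatments.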
Example 2. Computer device for the breeding method for selecting corn bract tightness
A computer device for the breeding method for selecting corn bract tightness comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the whole-genome-based breeding method for selecting corn bract tightness of Example 1.
Example 3. A non-transitory computer-readable storage medium
A non-transitory computer-readable storage medium has stored thereon a computer program that, when executed by a processor, implements the whole-genome-based breeding method for selecting corn bract tightness of Example 1. The storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
The effect of the invention was verified by the following tests:
1. Effect of GS models on prediction accuracy under different genotyping platforms
To screen for the statistical model best suited to bract tightness GS, the BayesA, BayesB, BayesC, BL and BRR models of the BGLR package and the gBLUP model of the rrBLUP package in R were compared. Multiple genotyping platforms (integrated, 50K, 600K, GBS and RNA-seq) were considered together, and the influence of the genotyping platform and the statistical model on the GS prediction of corn bract tightness was compared. One sampling proportion is used for illustration: 80% of the whole association population was randomly drawn as the training population and the remaining 20% as the test population. As shown in FIG. 4, the significance test showed that, for all 5 genotyping platforms, the GS prediction accuracy of the gBLUP model was significantly higher than that of the other models. With the gBLUP model, the 50K gene chip had the highest prediction accuracy among the 5 sequencing platforms, reaching 36.74%, followed by the GBS sequencing platform at 36.18%, and then RNA-seq, the integrated genotypes and the 600K gene chip, in that order. The results show that the optimal statistical model for GS prediction of bract tightness is gBLUP, and that under this model the prediction accuracy of the 50K and GBS platforms is slightly higher than that of the other platforms.
In FIG. 4, the legend represents the different GS statistical models, the abscissa shows the genotyping platforms and the ordinate shows the GS prediction accuracy. Sampling was random across the whole association population and was repeated 100 times. Duncan's test was used to test the significance of differences; the results are shown as the letters a-h in the figure.
2. Effect of marker Density on GS prediction accuracy under different genotyping platforms
The primary marker numbers of different genotyping platforms (integration, 50K, 600K, GBS, RNA-seq) are as follows: 196758, 47368, 436972, 887, 93081 SNPs. To screen for optimal marker density for the bract tightness GS, different marker density genotypes were set by varying the Linkage Disequilibrium (LD) r2 parameter in the plink software. r2 is respectively 0.01, 0.1, 0.2, 0.5 and 0.8, namely the marking density is as follows: the integrated genotypes respectively have 167, 8417, 20905, 52826 and 90292 SNPs, the 50K genotypes respectively have 113, 8485, 18292, 29537 and 35964 SNPs, the 600K genotypes respectively have 428, 4524, 33568, 111713 and 172319 SNPs, and the GBS genotypes respectively have 64, 676, 743, 801 and 817; RNA-seq genotypes have 80, 4395, 14465, 33978, and 49888 SNPs, respectively. One sampling proportion is selected for explanation, 80% of the whole related population is randomly selected as a training population, and the rest 20% is selected as a testing population. As shown in FIG. 5, the optimal marker density of different genotyping platforms was found to be different by significance test. The 50K and GBS sequencing platforms have the highest prediction accuracy when r2 is 1.0, namely, any original mark is not deleted; 600K, RNA-seq and the integrated sequencing platform predicted the highest accuracy at r2 of 0.5, 0.2 and 0.8, respectively. However, the prediction accuracy of all marker densities under all sequencing platforms was not significantly different except 0.01. Therefore, the sequencing price and the prediction accuracy are comprehensively considered, the cost for selecting the marker density of 0.1 is low, and the GS result is not obviously influenced. In addition, the GBS sequencing platform is low in price, and the prediction accuracy reaches 34.92% when r2 is 0.1. 
In conclusion, a marker density of r2 = 0.1 on the GBS platform (only 676 markers) is the best choice for GS breeding of bract tightness.
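The effect of the r2 threshold on marker number can be illustrated with a minimal sketch. It assumes genotypes coded as 0/1/2 allele counts and uses a simple greedy rule (keep a marker only if its r2 with every already-kept marker is at or below the threshold). This is a simplification for illustration, not plink's windowed pruning procedure, and the function names and toy SNPs are hypothetical.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic marker carries no LD information
    return cov * cov / (var_a * var_b)


def prune_by_ld(markers, threshold):
    """Greedy pruning: keep a marker only if r2 with all kept markers <= threshold."""
    kept = []
    for name, genotypes in markers:
        if all(r_squared(genotypes, g) <= threshold for _, g in kept):
            kept.append((name, genotypes))
    return [name for name, _ in kept]


# Toy data (hypothetical): 4 SNPs scored in 6 inbred lines.
toy_markers = [
    ("snp1", [0, 0, 2, 2, 0, 2]),
    ("snp2", [0, 0, 2, 2, 0, 2]),  # identical to snp1, so r2 = 1
    ("snp3", [2, 0, 2, 0, 0, 2]),  # r2 with snp1 is 1/9
    ("snp4", [0, 2, 0, 2, 2, 0]),  # mirror image of snp3, so r2 = 1
]

for threshold in (0.01, 0.1, 0.2, 0.5, 0.8):
    print(threshold, prune_by_ld(toy_markers, threshold))
```

As in the marker counts listed above, a stricter (smaller) r2 threshold retains fewer markers: the toy set keeps one SNP at 0.01 and 0.1 and two SNPs from 0.2 upward.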
In Fig. 5, the legend identifies the marker densities, the abscissa shows the genotyping platforms, and the ordinate shows the GS prediction accuracy. Lines were sampled at random from the whole association population, and sampling was repeated 100 times. Duncan's multiple range test was used to test the significance of the differences; the results are shown as letters a-g on the figure.
3. Heritability of bract tightness under different genotyping platforms
As mentioned above, the optimal marker density is r2 = 0.1, at which the integration, 50K, 600K, GBS and RNA-seq platforms have 8417, 8485, 4524, 676 and 4395 SNP markers, respectively. As shown in Fig. 6, the narrow-sense heritability of the bract tightness phenotype values and of their BLUP values was evaluated on the different sequencing platforms. The heritability of bract tightness ranged from 0.02 to 0.71. The 600K gene chip platform gave the highest heritability, 0.71. The heritability on the GBS platform did not differ noticeably from that on the 50K, integration and RNA-seq platforms, but differed considerably from that on the 600K platform. In theory, higher heritability should bring higher GS prediction accuracy, yet the prediction accuracy on the 600K platform is not high, which may be caused by the large number of missing values in the 600K phenotype data. In conclusion, heritability is highest on the 600K platform.
All bract tightness phenotype data in Fig. 6 and their BLUP values were derived from the 15SY, 16SY and 16FS field surveys.
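For reference, narrow-sense heritability is conventionally defined as the share of phenotypic variance that is due to additive genetic variance. The text does not state which estimator was used for Fig. 6, so the expression below is only the textbook definition; the symbols $\sigma_A^2$ (additive genetic variance) and $\sigma_P^2$ (total phenotypic variance) are assumed notation, not taken from the source.

```latex
h^2 = \frac{\sigma_A^2}{\sigma_P^2}
```

Under this definition, a value of 0.71 would mean that about 71% of the observed variation in bract tightness is attributed to additive genetic effects.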
4. Impact of different sampling strategies on GS prediction accuracy
So far, the optimal model (gBLUP), the optimal marker density (r2 = 0.1) and the optimal sequencing platform (GBS) have been selected for GS analysis of bract tightness. To further analyze the influence of different sampling strategies on GS prediction accuracy, random sampling was carried out with the gBLUP model on the GBS platform at a marker density of r2 = 0.1: 10-90% of the whole association population was randomly drawn as the prediction population, and the remainder served as the modeling population. As the sampling proportion of the prediction population increases, the GS prediction accuracy shows a downward trend (Fig. 7). The maximum prediction accuracy, 35.24%, is obtained when the prediction population makes up 10% of the association population, that is, when the training population makes up 90%. In Fig. 7, the abscissa is the sampling proportion of the prediction population and the ordinate is the prediction accuracy. The number of random samples is 100.
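The repeated random-sampling scheme described above can be sketched as follows. This is a minimal illustration, assuming that prediction accuracy is the Pearson correlation between observed and predicted values in the prediction population, averaged over repeats. The predictor here is a deliberately simple stand-in (class means at one hypothetical marker), not gBLUP, and all names and the simulated data are assumptions made for the example.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has (near) zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx <= 1e-12 or syy <= 1e-12:
        return 0.0
    return sxy / sqrt(sxx * syy)


def split_population(line_ids, prediction_fraction, rng):
    """Randomly draw a fraction of lines as the prediction population."""
    shuffled = list(line_ids)
    rng.shuffle(shuffled)
    n_pred = round(len(shuffled) * prediction_fraction)
    return shuffled[n_pred:], shuffled[:n_pred]  # (modeling, prediction)


def mean_accuracy(phenotypes, fit_and_predict, prediction_fraction,
                  n_repeats=100, seed=0):
    """Average correlation between observed and predicted values over repeats."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_repeats):
        modeling, prediction = split_population(
            sorted(phenotypes), prediction_fraction, rng)
        predicted = fit_and_predict(modeling, prediction)
        observed = [phenotypes[i] for i in prediction]
        accuracies.append(pearson(observed, predicted))
    return sum(accuracies) / len(accuracies)


# Simulated toy data (hypothetical): one marker with an additive effect plus noise.
data_rng = random.Random(42)
genotype = {f"line{i}": data_rng.choice([0, 2]) for i in range(50)}
phenotype = {k: g + data_rng.gauss(0, 1) for k, g in genotype.items()}


def class_mean_predictor(modeling, prediction):
    """Stand-in predictor: mean phenotype of modeling lines with the same genotype."""
    means = {}
    for g in (0, 2):
        values = [phenotype[i] for i in modeling if genotype[i] == g]
        means[g] = sum(values) / len(values) if values else 0.0
    return [means[genotype[i]] for i in prediction]


for fraction in (0.1, 0.5, 0.9):
    accuracy = mean_accuracy(phenotype, class_mean_predictor, fraction)
    print(fraction, round(accuracy, 3))
```

Passing the predictor as a function keeps the sampling loop independent of the statistical model, so the same loop could wrap any of the GS models compared in this document.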
5. Effect of different population structures on GS prediction accuracy
To analyze the effect of different population structures on the GS prediction accuracy of bract tightness, we again used gBLUP as the optimal statistical model, r2 = 0.1 as the optimal marker density and GBS as the optimal sequencing platform. After the association population was divided into four sub-populations (MIXED, NSS, SS and TST), random sampling was performed under two population structures: within sub-population (Within) and across sub-populations (Across) (Fig. 8). Within-sub-population sampling confines both the prediction population and the modeling population to one sub-population; for example, 10-90% of the SS sub-population is randomly drawn as the prediction population and the remaining SS inbred lines form the modeling population. Across-sub-population sampling confines only the prediction population to one sub-population, while the modeling population may come from any sub-population; for example, 10-90% of the SS sub-population is randomly drawn as the prediction population and all remaining inbred lines form the modeling population. Overall, the GS prediction accuracy shows a downward trend as the sampling proportion of the prediction population increases. For the MIXED, NSS and SS sub-populations, across-sub-population sampling gives higher GS prediction accuracy than within-sub-population sampling, whereas the opposite holds for the TST sub-population. The results show that, whichever sampling strategy is used, the SS sub-population gives the highest prediction accuracy of all sub-populations, so the SS sub-population is the most suitable for studying the bract tightness trait of the association population.
In Fig. 8, the legend colors indicate the sampling proportion of the test population; the abscissa shows the population structure, that is, sampling within and across sub-populations, each drawn at random in the four sub-populations (MIXED, NSS, SS and TST); the ordinate is the prediction accuracy. The number of random samples is 100.
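The difference between the Within and Across sampling schemes can be made concrete with a short sketch. It assumes each inbred line carries a single sub-population label. The function names, the line identifiers and the equal sub-population sizes are hypothetical choices for illustration only.

```python
import random


def split_within(subpop_of, target, prediction_fraction, rng):
    """Within: prediction and modeling populations both come from the target sub-population."""
    members = [line for line, subpop in subpop_of.items() if subpop == target]
    rng.shuffle(members)
    n_pred = round(len(members) * prediction_fraction)
    return members[n_pred:], members[:n_pred]  # (modeling, prediction)


def split_across(subpop_of, target, prediction_fraction, rng):
    """Across: prediction population comes from the target sub-population;
    every remaining line, from any sub-population, is in the modeling population."""
    _, prediction = split_within(subpop_of, target, prediction_fraction, rng)
    chosen = set(prediction)
    modeling = [line for line in subpop_of if line not in chosen]
    return modeling, prediction


# Toy population (hypothetical): 10 lines in each of the four sub-populations.
lines = {}
for subpop in ("MIXED", "NSS", "SS", "TST"):
    for i in range(10):
        lines[f"{subpop}_{i}"] = subpop

rng = random.Random(0)
within_model, within_pred = split_within(lines, "SS", 0.2, rng)
across_model, across_pred = split_across(lines, "SS", 0.2, rng)

print(len(within_model), len(within_pred))  # 8 2
print(len(across_model), len(across_pred))  # 38 2
```

With the same 20% prediction proportion, the Across scheme trains on 38 lines instead of 8, which is one plausible reason it can give higher accuracy when the other sub-populations are informative for the target.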
In conclusion, GS breeding of bract tightness in the association population performs best with gBLUP as the model, the GBS platform at a marker density of LD r2 = 0.1 (676 markers), 10% of the lines drawn as the prediction population, and the SS sub-population as the population structure.

Claims (10)

1. A breeding method for selecting corn bract tightness based on the whole genome, characterized in that the breeding method comprises the following steps:
step 1: selecting corn whole-genome data from 4 gene sequencing platforms as original data, and performing quality-control preprocessing and screening on the original data;
step 2: imputing the missing genotypes in the data preprocessed and screened in step 1, calculating the linkage disequilibrium r2 between different markers of the genome, and screening out gene data of markers at different densities;
step 3: performing principal component analysis on the gene data of the different-density markers to obtain principal component analysis gene data; performing kinship analysis on the gene data of the different-density markers to obtain kinship analysis gene data;
step 4: collecting corn bract tightness data in advance, and fitting the corn bract tightness data to a mixed model to obtain BLUP values;
step 5: performing GS analysis on the principal component analysis gene data and kinship analysis gene data obtained in step 3 together with the BLUP values obtained in step 4 to obtain an optimal statistical model;
step 6: determining the genotype of the maize inbred line to be tested, substituting the genotype into the optimal model obtained in step 5 to obtain a genomic estimated breeding value (GEBV), and selecting lines as preferred breeding material according to their GEBV values and the breeder's requirements.
2. The method of claim 1, wherein the data of the four sequencing platforms in step 1 are: SNP data from two gene chips for maize inbred lines, genotype data from transcriptome sequencing of the maize inbred lines, and genotype data from a simplified genome sequencing platform for the maize inbred lines; the two gene chips are the 50K and 600K chips.
3. The method according to claim 1, wherein the preprocessing in step 1 screens markers according to the criteria of a missing rate below 20% and a minor allele frequency (MAF) above 0.05.
4. The method of claim 1, wherein the missing genotypes in step 2 are imputed with the Beagle 4.0 software; the r2 parameters in step 2 are 0.8, 0.5, 0.2, 0.1 and 0.01; and r2 is calculated with the plink software.
5. The method of claim 1, wherein the principal component analysis in step 3 is performed with the prcomp function in the R language, and the kinship analysis is performed with the GAPIT software package.
6. The method according to claim 1, wherein the mixed linear model in step 4 is: $y_i = \mu + f_i + e_i + \varepsilon_i$, where $y_i$ is the phenotypic value of the i-th family, $\mu$ is the mean phenotype across environments, $f_i$ is the variety effect, $e_i$ is the environmental effect, and $\varepsilon_i$ is the residual.
7. The method according to claim 1, wherein the specific steps of the GS analysis in step 5 are as follows:
(1) genome prediction: adopting the BayesA, BayesB, BayesC, BL, BRR and gBLUP models, and using the R packages 'rrBLUP' v4.5 and 'BGLR' to carry out genome prediction;
(2) prediction accuracy: from the corn bract tightness data of lines with known genotypes, randomly drawing 80% of the inbred lines as a training set and using the remaining 20% as a test set; substituting the genotype data of the test set into the prediction model to calculate the genomic estimated breeding values of the test set; repeating this 100 times; and performing correlation analysis between the phenotype data of the test set, that is, the actual breeding values, and the genomic estimated breeding values to identify the statistical model with the highest accuracy.
8. The method of claim 7, wherein the predictive model is: a Bayesian model, a BayesC model, a BL model, a BRR model, and a gBLUP model.
9. A computer device for a breeding method for selecting corn bract tightness, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the breeding method for selecting corn bract tightness based on the whole genome according to any one of claims 1 to 8.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the breeding method for selecting corn bract tightness based on the whole genome according to any one of claims 1 to 8.
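The marker screening criteria of claim 3 (missing rate below 20% and MAF above 0.05) can be sketched as a simple filter. This is a minimal illustration, assuming genotypes coded as 0/1/2 counts of one allele with None for a missing call. The function names, default parameter names and toy markers are hypothetical and are not taken from any tool named in the claims.

```python
def missing_rate(calls):
    """Fraction of lines with no genotype call at this marker."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """Frequency of the rarer allele among the non-missing calls (0/1/2 coding)."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))  # frequency of the counted allele
    return min(freq, 1 - freq)


def passes_qc(calls, max_missing=0.20, min_maf=0.05):
    """Keep a marker only if missing rate < max_missing and MAF > min_maf."""
    return missing_rate(calls) < max_missing and minor_allele_frequency(calls) > min_maf


# Toy markers (hypothetical) scored in 10 inbred lines.
toy = {
    "snp_ok":          [0, 2, 2, 0, 2, 0, None, 2, 0, 2],        # 10% missing, MAF 4/9
    "snp_missing":     [0, None, None, 2, None, 0, 2, 2, 0, 2],  # 30% missing
    "snp_monomorphic": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],           # MAF 0
    "snp_borderline":  [0, 2, None, 2, 0, None, 2, 0, 2, 0],     # exactly 20% missing
}

kept = [name for name, calls in toy.items() if passes_qc(calls)]
print(kept)  # ['snp_ok']
```

Both comparisons are strict, so a marker with exactly 20% missing calls is removed, which matches the "less than 20%" wording of the claim.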

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210572175.4A CN115050419A (en) 2022-05-24 2022-05-24 Breeding method for selecting corn bract tightness based on whole genome

Publications (1)

Publication Number Publication Date
CN115050419A true CN115050419A (en) 2022-09-13

Family

ID=83158940

Country Status (1)

Country Link
CN (1) CN115050419A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116467596A (en) * | 2023-04-11 | 2023-07-21 | Guangzhou National Modern Agricultural Industry Science and Technology Innovation Center | Training method of rice grain length prediction model, morphology prediction method and apparatus
CN116467596B (en) * | 2023-04-11 | 2024-03-26 | Guangzhou National Modern Agricultural Industry Science and Technology Innovation Center | Training method of rice grain length prediction model, morphology prediction method and apparatus
CN117672360A (en) * | 2024-01-30 | 2024-03-08 | Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences | Genome selection method, device, equipment and medium based on transfer learning
CN117672360B (en) * | 2024-01-30 | 2024-06-11 | Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences | Genome selection method, device, equipment and medium based on transfer learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination