CN117524308A - SNP locus combination for presuming human genetic relationship grade and application thereof - Google Patents

Info

Publication number
CN117524308A
CN117524308A CN202310586302.0A CN202310586302A CN117524308A CN 117524308 A CN117524308 A CN 117524308A CN 202310586302 A CN202310586302 A CN 202310586302A CN 117524308 A CN117524308 A CN 117524308A
Authority
CN
China
Prior art keywords
snp
individuals
relationship
individual
genetic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310586302.0A
Other languages
Chinese (zh)
Other versions
CN117524308B (en)
Inventor
李彩霞
赵雯婷
刘京
赵兴春
姚伊人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Appraisal Center Of Ministry Of Public Security
Original Assignee
Appraisal Center Of Ministry Of Public Security
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Appraisal Center Of Ministry Of Public Security
Priority to CN202310586302.0A (CN117524308B)
Priority claimed from CN202310586302.0A (CN117524308B)
Publication of CN117524308A
Application granted
Publication of CN117524308B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/40 Population genetics; Linkage disequilibrium
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • Genetics & Genomics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Medical Informatics (AREA)
  • Ecology (AREA)
  • Physiology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides an SNP locus combination for inferring the degree of human kinship (genetic relationship grade) and applications thereof. In a first aspect, the invention provides an SNP locus combination for inferring the kinship degree between individuals. The combination comprises 9000 SNP loci; while meeting the requirements of kinship degree analysis, it is suitable for testing the trace and degraded biological material commonly found at scenes and is therefore well suited to forensic use. In a second aspect, based on the combination of 9000 SNP loci, the invention establishes a likelihood-ratio-based kinship inference algorithm, which accurately predicts forensic genealogical kinship within 5 degrees (inclusive), with an accuracy of 99% and no false negatives.

Description

SNP locus combination for presuming human genetic relationship grade and application thereof
Technical Field
The invention relates to an SNP locus combination for inferring the degree of human kinship, and belongs to the technical field of bioinformatics.
Background
Single nucleotide polymorphisms (SNPs) are the third generation of genetic markers in forensic science. They are widely distributed, have a low mutation rate and high genetic stability, and can be used for individual identification, kinship analysis, biogeographic inference, and morphological phenotype prediction, which makes them important genetic markers for forensic individual identification. SNP-based kinship inference, especially the inference of distant kinship, is also called forensic SNP genealogy: high-density SNP genotyping data are obtained with a whole-genome SNP chip or by whole-genome resequencing, and kinship within 7 degrees is then inferred with a computational model.
However, DNA recovered from forensic biological samples is often limited by sample conditions and tends to be present in trace amounts or degraded. Whole-genome resequencing is costly and requires a large amount of DNA, which makes it difficult to apply in forensic work; whole-genome SNP chips require high-quality DNA, which limits the forensic scenarios in which they can be used. It is therefore necessary to screen whole-genome SNP analysis results and construct a low-density SNP locus combination that can be used to infer kinship.
Disclosure of Invention
The invention provides an SNP locus combination for inferring the degree of human kinship and its application in inferring that degree. The combination is a low-density SNP locus combination that meets the requirements of kinship degree analysis while being better suited to the identification of forensic samples.
A first aspect of the invention provides an SNP locus combination for inferring the degree of human kinship, comprising 9000 SNP loci; information on the 9000 SNP loci is shown in Table 2.
A second aspect of the invention provides a primer, probe, or gene chip for detecting the above SNP locus combination.
A third aspect of the invention provides use of the above SNP locus combination in any of the following:
(1) Constructing a DNA chip, a capture sequencing kit, or another application kit;
(2) Inferring the kinship degree;
(3) Genetic analysis of kinship.
A fourth aspect of the invention provides use of the above primer, probe, or gene chip in any of the following:
(1) Constructing a DNA chip, a capture sequencing kit, or another application kit;
(2) Inferring the kinship degree;
(3) Genetic analysis of kinship.
A fifth aspect of the invention provides a method for inferring the kinship degree, in which the kinship degree is inferred from the above SNP locus combination.
In one embodiment, the method comprises the following steps:
collecting DNA of a first individual and a second individual whose kinship is to be determined;
acquiring genotyping data of the SNP locus combination for the first individual and the second individual;
when the kinship between the first individual and the second individual is presumed to be of degree m, calculating the genotype joint likelihood function $P(s_1, s_2, \ldots, s_{9000}, v_N \mid H_m)$ of the SNP locus combination of the two individuals under degree-m kinship and the genotype joint likelihood function $P(s_1, s_2, \ldots, s_{9000}, v_N \mid H_0)$ of the SNP locus combination of the two individuals under no kinship, where m is an integer from 1 to 7; calculating $LR_m$ according to formula 1; when $LR_m$ is greater than or equal to the threshold t, the first individual and the second individual are indicated to have degree-m kinship;
when no degree is presumed for the kinship between the first individual and the second individual, calculating the genotype joint likelihood functions $P(s_1, s_2, \ldots, s_{9000}, v_N \mid H_{1\sim7})$ of the SNP locus combination of the two individuals under kinship of degrees 1 to 7, respectively, and the genotype joint likelihood function $P(s_1, s_2, \ldots, s_{9000}, v_N \mid H_0)$ under no kinship; calculating $LR_{1\sim7}$ according to formula 2; the kinship degree corresponding to the maximum of $LR_{1\sim7}$ is taken as the kinship degree of the first individual and the second individual.
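The two decision rules in the steps above can be sketched in a few lines. This is a minimal illustration rather than the patented implementation. It assumes that formulas 1 and 2, which are not reproduced in this text, take the usual likelihood-ratio form LR_m = P(... | H_m) / P(... | H_0), and it works with base-10 log-likelihoods so that the threshold t is compared on the lgLR scale reported in Example 2. The function names and the sample log-likelihood values are hypothetical.

```python
def lg_lr(log10_lik_hm, log10_lik_h0):
    """Return lgLR_m = log10 P(data | H_m) - log10 P(data | H_0).

    Assumption: formula 1 is an ordinary likelihood ratio, so on the
    log10 scale it becomes a difference of log-likelihoods.
    """
    return log10_lik_hm - log10_lik_h0


def supports_degree(lglr_m, t):
    """Presumed-degree case: degree m is supported when lgLR_m >= t."""
    return lglr_m >= t


def most_likely_degree(lglr_by_degree):
    """No presumed degree: return the degree (1..7) with the largest lgLR."""
    return max(lglr_by_degree, key=lglr_by_degree.get)


# Hypothetical log10 joint likelihoods for one pair of individuals.
log10_lik_h0 = -5210.0
log10_lik = {1: -5300.0, 2: -5195.5, 3: -5201.0, 4: -5207.2,
             5: -5209.1, 6: -5209.8, 7: -5210.3}
lglr = {m: lg_lr(v, log10_lik_h0) for m, v in log10_lik.items()}
```

With these made-up values, `most_likely_degree(lglr)` selects degree 2, and `supports_degree(lglr[2], 2.0)` holds because the degree-2 lgLR exceeds the threshold.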
In one embodiment, the kinship degree is defined as follows:
when the first individual and the second individual are lineal relatives, the kinship degree is the number of meioses separating them; when they are related as full siblings, the kinship degree is the sum of the numbers of meioses from each individual to their common ancestors, minus 1; when they are related as half siblings, the kinship degree is the sum of the numbers of meioses from each individual to their common ancestor. For example, a parent and child are separated by 1 meiosis (degree 1), full siblings are each 1 meiosis from their parents (1 + 1 - 1 = degree 1), and half siblings are each 1 meiosis from the shared parent (1 + 1 = degree 2).
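The definition above amounts to a small counting rule, sketched below. The sketch reads the full-sibling rule as "sum of meioses minus 1", which is the reading consistent with full siblings being first-degree relatives. The function name and the relation labels ("lineal", "full", "half") are hypothetical.

```python
def kinship_degree(relation, meioses_first, meioses_second=0):
    """Return the kinship degree from meiosis counts.

    relation:
      "lineal" - direct line; meioses_first is the number of meioses
                 separating the two individuals (meioses_second unused).
      "full"   - related as full siblings; the arguments are the meioses
                 from each individual to the common ancestors.
      "half"   - related as half siblings; the arguments are the meioses
                 from each individual to the common ancestor.
    """
    if relation == "lineal":
        return meioses_first
    total = meioses_first + meioses_second
    if relation == "full":
        return total - 1  # assumed reading: sum of meioses minus 1
    if relation == "half":
        return total
    raise ValueError(f"unknown relation type: {relation!r}")


parent_child = kinship_degree("lineal", 1)     # 1 meiosis apart
full_siblings = kinship_degree("full", 1, 1)   # 1 + 1 - 1
half_siblings = kinship_degree("half", 1, 1)   # 1 + 1
```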
In one embodiment, biological samples of the first individual and the second individual whose kinship is to be determined are collected, and DNA of the two individuals is obtained from the biological samples.
In one specific embodiment, a primer, probe, or gene chip for detecting the SNP locus combination is designed and used to test the DNA of the first individual and the second individual, thereby acquiring the genotyping data of the SNP locus combination for both individuals.
In a specific embodiment, the genotype joint likelihood function of the SNP locus combination is calculated as follows:
acquiring the population frequencies of the genotypes corresponding to the 9000 SNP loci;
obtaining, from the presumed kinship degree and a hidden Markov model, the probability of the gene flow represented by the inheritance vector v at each of the 9000 SNP loci;
calculating the genotype joint likelihood function of the 9000-SNP locus combination according to formula 3:
$P(s_1, s_2, \ldots, s_{9000}, v_N) = \sum P(v_N) \cdot P(S_N \mid v_N)$    (formula 3)
In formula 3, $P(v_N)$ is the probability of the gene flow represented by the inheritance vector v at the N-th SNP locus, and $P(S_N \mid v_N)$ is the product of the population frequencies of the genotypes corresponding to the N-th SNP locus.
The invention provides a low-density SNP locus combination comprising 9000 SNP loci. This number of loci meets the requirements of kinship degree analysis while remaining suitable for testing the biological material commonly found at scenes, so the combination is well suited to forensic use. Based on the combination of 9000 SNP loci, the invention establishes a likelihood-ratio-based kinship inference algorithm, which accurately predicts forensic genealogical kinship within 5 degrees (inclusive), with an accuracy of 99% and no false negatives.
Drawings
FIG. 1 shows the distribution of the SNP locus combination on the human chromosomes according to one embodiment of the invention;
FIG. 2 shows the distribution of centiMorgan distances and physical distances between the 9k SNPs according to one embodiment of the invention;
FIG. 3 shows the MAF distribution of the 9k SNPs according to one embodiment of the invention;
FIG. 4a shows the LR distribution for first-degree kinship;
FIG. 4b shows the LR distribution for second-degree kinship;
FIG. 4c shows the LR distribution for third-degree kinship;
FIG. 4d shows the LR distribution for fourth-degree kinship;
FIG. 4e shows the LR distribution for fifth-degree kinship;
FIG. 4f shows the LR distribution for sixth-degree kinship;
FIG. 4g shows the LR distribution for seventh-degree kinship;
FIG. 5 shows the ROC curves for distinguishing related pairs from unrelated pairs in real pedigrees based on lgLR.
Detailed Description
To make the objects, technical solutions, and advantages of the invention clearer, the technical solutions in the embodiments of the invention are described clearly and completely below. The described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
Example 1 Screening of the 9k SNP loci
The whole human genome was assayed, and the 9k SNP locus combination was obtained by a screening method comprising the following steps:
1.1 Preliminary screening:
a. SNP loci across the human genome were detected with the Wegene GSA chip and the Wegene CGA chip; the Wegene GSA chip detects about 690,000 SNPs and the Wegene CGA chip about 710,000 SNPs, and the 541,756 SNPs shared by both chips were selected;
b. X-chromosome, Y-chromosome, and mitochondrial SNPs were deleted and autosomal SNPs retained, leaving 500,753 SNPs;
c. loci also present in the Single Nucleotide Polymorphism Database (dbSNP 151) were selected, leaving 463,744 SNPs;
d. biallelic loci were selected and multiallelic loci deleted, leaving 387,026 SNPs;
e. loci with other variants at the same position were removed, leaving 386,731 SNPs;
f. loci shared with the 1000 Genomes Project data were selected, leaving 386,077 SNPs;
g. loci with a minor allele frequency (MAF) of 0 in the East Asian population were deleted, leaving 374,010 SNPs;
h. loci with a chip genotyping call rate below 99.9% were removed, leaving 311,979 SNPs.
1.2 Fine screening:
a. the target number of loci was set to 9200;
b. loci were allocated to each chromosome in proportion to its centiMorgan length;
c. the centiMorgan length of each chromosome was divided by its allocated number of loci to obtain the centiMorgan length of a segment;
d. the locus with the highest MAF in each segment was selected;
e. because some segments contained none of the 311,979 preliminarily screened loci, the target number of loci was adjusted to 9000;
f. the highest-MAF locus was selected in each segment, and segments without preliminarily screened loci were skipped;
g. from the selected loci, the 9000 with the highest MAF were retained.
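The fine-screening steps can be sketched as three small functions: proportional allocation, per-segment selection, and a final cut by MAF. This is an illustrative sketch under stated assumptions, not the procedure actually used. The source does not say how fractional allocations were rounded, so largest-remainder rounding is assumed here; the function names, the tuple layout `(snp_id, cm_position, maf)`, and the rs-style identifiers are hypothetical.

```python
import math


def allocate_loci(cm_lengths, target):
    """Split `target` loci across chromosomes in proportion to cM length.

    Assumption: largest-remainder rounding, so the counts sum to `target`.
    """
    total = sum(cm_lengths.values())
    exact = {c: target * length / total for c, length in cm_lengths.items()}
    alloc = {c: math.floor(x) for c, x in exact.items()}
    leftover = target - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda c: exact[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc


def pick_highest_maf_per_segment(loci, chrom_cm_length, n_segments):
    """Keep the highest-MAF locus in each equal-length cM segment.

    loci: list of (snp_id, cm_position, maf) on one chromosome.
    Segments that contain no candidate locus are simply skipped.
    """
    seg_len = chrom_cm_length / n_segments
    best = {}
    for snp_id, cm_pos, maf in loci:
        seg = min(int(cm_pos // seg_len), n_segments - 1)
        if seg not in best or maf > best[seg][2]:
            best[seg] = (snp_id, cm_pos, maf)
    return [best[s] for s in sorted(best)]


def keep_top_maf(loci, n):
    """Retain the n loci with the highest MAF."""
    return sorted(loci, key=lambda x: x[2], reverse=True)[:n]


# Toy example: three chromosomes and a handful of made-up loci.
alloc = allocate_loci({"1": 300.0, "2": 200.0, "3": 100.0}, 60)
candidates = [("rs_a", 0.5, 0.10), ("rs_b", 0.9, 0.45), ("rs_c", 2.5, 0.30)]
selected = pick_highest_maf_per_segment(candidates, 3.0, 3)
```

In the toy example the middle segment has no candidate locus, so only two loci are selected; this is the situation that led the target to be lowered from 9200 to 9000.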
The final number of loci was 9000. The number of SNP loci allocated to each chromosome and their distribution on the human chromosomes are shown in Table 1 and FIG. 1. The gaps visible in FIG. 1 arise mainly because the loci designed on the chips do not cover those regions; possible reasons include that the region is a centromeric region or that loci in the region carry essentially no frequency information. The IDs of all SNP loci are shown in Table 2.
The selected SNP loci were mapped onto the autosomes by centiMorgan distance and by physical distance, and histograms of the pairwise centiMorgan distances and physical distances between loci were drawn. As shown in FIG. 2, the selected loci are distributed approximately uniformly in centiMorgan distance and are also well dispersed in physical distance.
MAF values were calculated for the designed SNP locus combination and plotted as a histogram. As shown in FIG. 3, most loci have MAF values close to 50% and therefore carry more frequency information, which helps to distinguish related sample pairs.
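For readers unfamiliar with MAF, the quantity plotted in FIG. 3 can be illustrated with a short sketch. It assumes a biallelic SNP and genotypes given as two-character strings; the source does not state how its MAF values were computed (they may come directly from a population frequency database), so this only illustrates the definition. The function name is hypothetical.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency of a biallelic SNP.

    genotypes: iterable of two-character strings such as "AG".
    A monomorphic locus (only one allele observed) has MAF 0.
    """
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) < 2:
        return 0.0
    return min(counts.values()) / sum(counts.values())


balanced = minor_allele_frequency(["AA", "AG", "GG", "AG"])   # 4 A, 4 G
rare = minor_allele_frequency(["AA", "AA", "AG", "AA"])       # 7 A, 1 G
```

A locus like `balanced`, with both alleles near 50%, carries the most frequency information, which is why the screening favors high-MAF loci.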
TABLE 1 SNP locus information allocated per chromosome
Chromosome   Length (cM)    Proportion   Allocated loci   Final loci
1            292.77157      0.080864     744              717
2            274.21712      0.07574      697              689
3            227.849039     0.062933     579              577
4            219.797643     0.060709     559              557
5            208.955208     0.057714     531              531
6            198.241916     0.054755     504              504
7            190.378338     0.052583     484              483
8            178.147677     0.049205     453              435
9            180.275973     0.049793     458              422
10           182.455785     0.050395     464              459
11           161.850521     0.044704     411              408
12           174.962081     0.048325     445              432
13           129.577583     0.03579      329              329
14           116.764941     0.032251     297              297
15           150.765135     0.041642     383              313
16           131.108411     0.036213     333              330
17           128.534337     0.035502     326              324
18           120.076088     0.033165     305              304
19           106.849993     0.029512     271              268
20           110.205396     0.030439     280              280
21           63.751435      0.017608     162              161
22           72.986753      0.020159     185              180
Total        3620.522943    1            9200             9000
TABLE 2 9k SNP locus combination
[Table 2 lists chr, pos, and id for each of the 9000 SNP loci; the table body is not reproduced in this text.]
In Table 2, chr is the number of the chromosome on which the SNP locus is located, pos is the position of the SNP locus on that chromosome, and id is the identifier of the SNP locus.
Example 2 Evaluation of the accuracy of kinship degree calculation based on the 9k SNPs
2.1 DNA extraction and genotyping of the real kinship sample set
Saliva samples were collected from 304 members of 7 volunteer families in China. The samples comprise 4,525 pairs of relatives of degrees 1 to 7 (244 parent-offspring (PO) pairs, 131 full-sibling (FS) pairs, 333 second-degree (2nd) pairs, 439 third-degree (3rd) pairs, 602 fourth-degree (4th) pairs, 915 fifth-degree (5th) pairs, 976 sixth-degree (6th) pairs, and 885 seventh-degree (7th) pairs) and 25,270 unrelated (UN) pairs. All participants signed informed consent, and the study was approved by the ethics committee of the Appraisal Center of the Ministry of Public Security (No. 2022-017).
Saliva DNA was extracted, and its concentration and purity were measured with a NanoDrop 2000c ultra-micro spectrophotometer. All DNA samples were then genotyped at the 9k SNP loci, and kinship degrees were calculated after quality control of the data.
2.2 Kinship degree calculation
Assuming a kinship degree of b, the probability of the gene flow represented by the inheritance vector v at the 1st SNP locus is determined as $2^{-2b}$. Taking a parent-child relationship as an example, the two individuals are lineal relatives separated by 1 meiosis, so the kinship degree is 1 and the probability of the gene flow represented by the inheritance vector v at the 1st SNP locus is $2^{-2}$.
The frequency of the genotype corresponding to the 1st SNP locus in the East Asian population, denoted $f_n$, is determined from the East Asian locus frequencies in phase 3 of the 1000 Genomes Project database (database source: http://ftp.1000genome.ebi.ac.uk/vol1/ftp/data_collections/gambian_genome_variation_project/data/). For example, for a first individual and a second individual in a parent-child relationship, where the genotype of the first individual is {A, T} and that of the second individual is {A, C}, the population frequencies of A, T, and C are $f_A$, $f_T$, and $f_C$, respectively.
The probability corresponding to the 1st SNP locus is calculated according to formula 3: $P(S_1) = 2^{-2} \times f_A^2 \times f_T \times f_C$.
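The worked example for the 1st SNP locus can be reproduced numerically. The sketch below mirrors that example literally: it multiplies the gene-flow probability 2^(-2b) by the population frequencies of the four observed alleles. It is not a full implementation of formula 3, which sums over inheritance vectors. The function name and the allele frequencies (0.5, 0.3, 0.2) are made up for illustration.

```python
import math


def first_locus_probability(degree, genotype_first, genotype_second, allele_freq):
    """P(S_1) as in the worked example: 2^(-2b) times the product of the
    population frequencies of all four observed alleles."""
    flow_prob = 2.0 ** (-2 * degree)
    freq_product = math.prod(
        allele_freq[a] for a in genotype_first + genotype_second
    )
    return flow_prob * freq_product


# Parent-child pair (degree 1) with genotypes {A, T} and {A, C}.
freqs = {"A": 0.5, "T": 0.3, "C": 0.2}  # hypothetical allele frequencies
p_s1 = first_locus_probability(1, ("A", "T"), ("A", "C"), freqs)
# Equals 2**-2 * f_A**2 * f_T * f_C for these inputs.
```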
Similarly, when the probability corresponding to the 2nd and subsequent SNP loci is calculated, the transition state of the inheritance vector at the N-th SNP locus depends on the state at the (N-1)-th SNP locus; that is, $P(v_N)$ depends on the inheritance vector $v_{N-1}$. A hidden Markov model is therefore introduced to calculate the probability of the gene flow represented by the inheritance vector v at the subsequent SNP loci. The specific input data and model parameters can be set by conventional means in the art, and the algorithm model can follow: Idury, R. M. & Elston, R. C. A faster and more general hidden Markov model algorithm for multipoint likelihood calculations. Hum. Hered. 47, 197-202 (1997).
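The dependence of each locus on the previous one is what a hidden Markov model captures. As a rough illustration, the sketch below implements the generic textbook forward algorithm, which sums over all hidden-state paths locus by locus. It is not the Idury and Elston algorithm cited above, and the two-state setup, the transition matrix, and the emission values are hypothetical. A real computation over 9000 loci would also need scaling or log-space arithmetic to avoid numerical underflow.

```python
def forward_likelihood(initial, transition, emissions):
    """Generic HMM forward algorithm.

    initial[i]       = P(v_1 = i)
    transition[i][j] = P(v_n = j | v_{n-1} = i)
    emissions[n][i]  = P(S_n | v_n = i) for locus n
    Returns the joint likelihood of the observed genotypes at all loci.
    """
    n_states = len(initial)
    alpha = [initial[i] * emissions[0][i] for i in range(n_states)]
    for emit in emissions[1:]:
        alpha = [
            emit[j] * sum(alpha[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Hypothetical two-state example with three loci.
initial = [0.5, 0.5]
transition = [[0.9, 0.1], [0.1, 0.9]]
emissions = [[0.20, 0.05], [0.30, 0.10], [0.25, 0.25]]
likelihood = forward_likelihood(initial, transition, emissions)
```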
By changing the presumed kinship degree, the genotype joint likelihood function of the two individuals under degree-m kinship and that under no kinship can be calculated, and the two are combined according to formula 1 to obtain the LR value.
The LR values for the other kinship degrees are calculated in the same way.
It should be understood that the volunteer samples collected for the invention are from the East Asian population, so the genotype frequencies of the SNP loci in the East Asian population were used. When the target population changes, the genotype frequency database must be changed accordingly, which can be done by conventional means in the art.
2.3 Accuracy of kinship degree inference when the kinship hypothesis is known
When the kinship hypothesis for two individuals is known (that is, a possible kinship degree for the first individual and the second individual has been put forward during forensic identification), the corresponding lgLR is obtained under the hypothesis given by the known kinship, and the LR distribution for that kinship degree is plotted with lgLR on the abscissa and its distribution on the ordinate, as shown in FIGS. 4a to 4g. Within 4 degrees of kinship, the lgLR distributions do not overlap, indicating that the 9k locus combination can distinguish relatives within 4 degrees from unrelated individuals. From degree 5 onward, the lgLR distributions overlap slightly; as the kinship degree increases, the lgLR values of unrelated individuals gradually increase and those of related individuals gradually decrease, producing a larger overlap region.
Based on the lgLR distributions of the different kinship degrees, sensitivity, specificity, system efficiency, positive predictive value, negative predictive value, false positive rate, false negative rate, and error rate were calculated to evaluate the kinship inference performance of the 9k locus set with the LR method. Every probability-based forensic method must set a suitable threshold t to balance the false positive rate against the false negative rate. In kinship inference, false positives bring unrelated individuals into the results, whereas false negatives exclude truly related individuals and may in turn cause unrelated individuals to be taken as relatives.
The invention uses a single-threshold method to balance the reliability of the results.
In the formulas below, as follows from the definitions, A is the number of pairs surveyed as "related" and predicted as "related", B the number surveyed as "unrelated" but predicted as "related", C the number surveyed as "related" but predicted as "unrelated", and D the number surveyed as "unrelated" and predicted as "unrelated".
Sensitivity is the proportion of all pairs surveyed as "related" that are also predicted as "related"; sensitivity = A/(A+C).
Specificity is the proportion of all pairs surveyed as "unrelated" that are also predicted as "unrelated"; specificity = D/(B+D).
False negative rate is the proportion of all pairs surveyed as "related" that are predicted as "unrelated"; false negative rate = C/(A+C).
False positive rate is the proportion of all pairs surveyed as "unrelated" that are predicted as "related"; false positive rate = B/(B+D).
Positive predictive value is the proportion of all pairs predicted as "related" that are surveyed as "related"; positive predictive value = A/(A+B).
Negative predictive value is the proportion of all pairs predicted as "unrelated" that are surveyed as "unrelated"; negative predictive value = D/(C+D).
Error rate is the proportion of all surveyed pairs that are predicted incorrectly, that is, surveyed as "related" but predicted as "unrelated" or surveyed as "unrelated" but predicted as "related"; error rate = (B+C)/(A+B+C+D).
System efficiency is the proportion of all surveyed pairs that are predicted correctly, that is, surveyed as "related" and predicted as "related" or surveyed as "unrelated" and predicted as "unrelated"; system efficiency = (A+D)/(A+B+C+D).
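The eight indices can be computed directly from the four counts. In the sketch below, A to D follow the roles implied by the formulas (A: related pairs predicted related, B: unrelated pairs predicted related, C: related pairs predicted unrelated, D: unrelated pairs predicted unrelated), and a pair is predicted as related when its lgLR is at least the threshold t. The function names and the lgLR values are hypothetical.

```python
def confusion_counts(lglr_related, lglr_unrelated, t):
    """Count A, B, C, D for a single threshold t on lgLR."""
    a = sum(x >= t for x in lglr_related)    # related, predicted related
    c = len(lglr_related) - a                # related, predicted unrelated
    b = sum(x >= t for x in lglr_unrelated)  # unrelated, predicted related
    d = len(lglr_unrelated) - b              # unrelated, predicted unrelated
    return a, b, c, d


def performance(a, b, c, d):
    """Evaluate the indices defined above from the four counts."""
    total = a + b + c + d
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "false_negative_rate": c / (a + c),
        "false_positive_rate": b / (b + d),
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
        "error_rate": (b + c) / total,
        "system_efficiency": (a + d) / total,
    }


# Hypothetical lgLR values for truly related and truly unrelated pairs.
related = [5.0, 3.0, 1.0, -0.5]
unrelated = [-4.0, -2.0, 0.5, -6.0]
counts = confusion_counts(related, unrelated, t=0.0)
metrics = performance(*counts)
```

Raising t trades false positives for false negatives, which is the balance the single-threshold method is meant to control.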
TABLE 3 Kinship inference performance of the 9k family SNPs under a known kinship hypothesis
As shown in Table 3, when t = 0, the system efficiency of kinship inference within 3 degrees reaches 100% with an error rate of 0; the system efficiency for degrees 4 to 6 is 99.90%, 97.82%, and 91.05%, with error rates of 0.10%, 2.18%, and 8.95%, respectively; degree 7 has the lowest system efficiency (82.16%) and a relatively high error rate (17.84%). From degree 4 to degree 7, the system efficiency increases and the error rate decreases as the threshold t increases. Thus, with t = 0 the 9k locus set can accurately infer kinship of degree 4 or lower with the LR method; with t = 2 it can accurately infer kinship of degree 5 or lower and can also provide a reference for inferring degree-6 kinship. The false negative rate for degree 6 is high (> 10%), so degree-6 results must be interpreted with caution.
2.4 Evaluation of AUC values
For ease of calculation, receiver operating characteristic (ROC) curves of related pairs of each kinship degree versus unrelated pairs were analyzed based on the lgLR values of all pairs. The area under the ROC curve (AUC) was used to evaluate the performance of the 9k loci for kinship degree inference.
In the AUC calculation, m is the number of positive samples and n is the number of negative samples. AUC values generally lie between 0.5 and 1, and a larger AUC indicates a better classifier.
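The AUC formula itself is not reproduced in this text. One common way to compute an AUC from m positive and n negative scores is the pairwise (Mann-Whitney) form: the fraction of positive-negative pairs in which the positive sample scores higher, with ties counted as one half. The sketch below uses that form as an assumption; it may differ in presentation from the formula used in the source. The function name and the scores are hypothetical.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Pairwise AUC: P(score of a positive > score of a negative),
    counting ties as 0.5. Runs in O(m * n), adequate for small sets."""
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical lgLR values: related pairs as positives, unrelated as negatives.
separated = auc_from_scores([3.0, 4.0], [1.0, 2.0])
overlapping = auc_from_scores([2.0, 0.0], [1.0])
```

Fully separated lgLR distributions give an AUC of 1, matching the situation reported for degrees 1 to 4.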
AUC values were calculated for the results in 2.3 and are shown in FIG. 5. The AUC values for the 1st, 2nd, 3rd, and 4th degrees are all 1, indicating that the 9k locus set achieves maximal kinship inference performance within 4 degrees; the AUC values for the 5th and 6th degrees are 0.997 and 0.961, respectively, indicating that the 9k locus set also performs well for 5th- and 6th-degree kinship.
2.5 Accuracy when the kinship degree is unknown
When the kinship between two individuals is unknown, the kinship degree at which LR reaches its maximum is taken as the final predicted kinship degree. The predicted kinship degrees of the pairwise combinations of the 304 individuals were compared with the surveyed kinship degrees to evaluate the performance of the 9k family SNPs for inferring unknown kinship. The predicted and surveyed kinship degrees of the pairs are shown in Table 4, together with the performance indices absolute accuracy (AC), confidence interval accuracy (CIA), prediction reliability (PR), false negatives (FN), and false positives (FP).
Absolute accuracy (AC) = pairs of a given surveyed degree whose predicted kinship agrees with the surveyed kinship / all pairs of that surveyed degree;
Confidence interval accuracy (CIA) = pairs of a given surveyed degree whose predicted kinship is within ±1 degree of the surveyed kinship / all pairs of that surveyed degree;
Prediction reliability (PR) = pairs of a given predicted degree whose surveyed kinship is "related" / all pairs of that predicted degree;
False negatives (FN) = pairs of a given surveyed degree whose predicted kinship is "unrelated" / all pairs of that surveyed degree;
False positives (FP) = pairs whose surveyed kinship is "unrelated" but whose predicted kinship is "related".
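The per-degree indices defined above can be computed from a list of (surveyed degree, predicted degree) pairs. The sketch below codes "unrelated" as degree 0, which is an assumption made for illustration; the function name and the example pairs are hypothetical, and the function assumes that at least one pair was surveyed at, and at least one predicted at, the degree in question.

```python
def unknown_kinship_indices(pairs, degree):
    """Compute AC, CIA, FN and PR for one kinship degree.

    pairs: list of (surveyed_degree, predicted_degree); 0 = unrelated.
    """
    # Predictions for pairs surveyed at this degree.
    preds = [p for s, p in pairs if s == degree]
    # Surveyed degrees for pairs predicted at this degree.
    truths = [s for s, p in pairs if p == degree]
    n = len(preds)
    return {
        "AC": sum(p == degree for p in preds) / n,
        "CIA": sum(p != 0 and abs(p - degree) <= 1 for p in preds) / n,
        "FN": sum(p == 0 for p in preds) / n,
        "PR": sum(s != 0 for s in truths) / len(truths),
    }


# Hypothetical outcomes for six pairs.
pairs = [(2, 2), (2, 3), (2, 0), (2, 2), (0, 2), (3, 2)]
indices = unknown_kinship_indices(pairs, degree=2)
```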
TABLE 4 Kinship inference performance of the 9k family SNPs for unknown kinship
The results show that the prediction reliability within 4 degrees of kinship is 100%, with no false negatives or false positives. The prediction reliability for degree 5 reaches 99.77%, with low false negative and false positive rates. For surveyed kinship within 4 degrees, the confidence interval accuracy is above 99.70%, and for surveyed degree-5 kinship it reaches 96%. From degree 6 onward, the false negative and false positive rates gradually increase with the kinship degree, while the absolute accuracy, confidence interval accuracy, and prediction reliability gradually decrease. The method therefore also achieves accurate prediction of kinship of degree 5 or lower when the kinship degree is unknown.
Finally, it should be noted that the above embodiments only illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in those embodiments can still be modified, and that some or all of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. An SNP locus combination for inferring the degree of human kinship, characterized by comprising 9000 SNP loci, the information of the 9000 SNP loci being as follows:
[The chr, pos, and id of the 9000 SNP loci are listed at this point in the claim; the table body is not reproduced in this text.]
wherein chr represents the number of the chromosome where the SNP site is located, pos represents the position of the chromosome where the SNP site is located, and id represents the identification number of the SNP site.
2. A primer, probe or gene chip for detecting the SNP locus combination according to claim 1.
3. Use of the SNP site combination of claim 1 in any of the following aspects:
(1) Constructing a DNA chip, capturing and sequencing or other application kits;
(2) Presuming the affinity level;
(3) Genetic analysis of genetic relationship.
4. Use of the primer, probe or gene chip of claim 2 in any of the following:
(1) Constructing a DNA chip, capturing and sequencing or other application kits;
(2) Presuming the affinity level;
(3) Genetic analysis of genetic relationship.
5. A method for estimating a relationship level, wherein the relationship level is estimated from the SNP site combination according to claim 1.
6. The method according to claim 5, comprising the steps of:
collecting DNA of a first individual and a second individual whose genetic relationship is to be determined;
acquiring typing data of the SNP locus combination for the first individual and the second individual;
when the relationship between the first and second individuals is presumed to be of level m, calculating the genotype joint likelihood function $P(s_1, s_2, \ldots, s_{9000}, v_N \mid H_m)$ of the SNP locus combination of the first and second individuals under the level-m relationship and the genotype joint likelihood function $P(s_1, s_2, \ldots, s_{9000}, v_N \mid H_0)$ of the SNP locus combination of the first and second individuals under no relationship, where m is an integer from 1 to 7, and calculating $LR_m$ according to formula 1; when $LR_m$ is greater than or equal to the threshold t, the first individual and the second individual are indicated to have a level-m genetic relationship;
when no level is presumed for the relationship between the first and second individuals, calculating the genotype joint likelihood functions $P(s_1, s_2, \ldots, s_{9000}, v_N \mid H_{1 \sim 7})$ of the SNP locus combination of the first and second individuals under each of the level 1 to level 7 relationships and the genotype joint likelihood function $P(s_1, s_2, \ldots, s_{9000}, v_N \mid H_0)$ under no relationship, calculating $LR_{1 \sim 7}$ according to formula 2, and taking the level corresponding to the maximum of $LR_{1 \sim 7}$ as the genetic relationship level of the first and second individuals.
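The two branches of claim 6 can be sketched as follows. Formulas 1 and 2 are not reproduced in this text, so the sketch assumes the usual likelihood-ratio form, LR_m = P(data | H_m) / P(data | H_0), evaluated in log space because a product over 9000 loci would underflow. The function names and the example log-likelihood values are hypothetical and are not taken from the patent.

```python
import math


def log_lr(loglik_related: float, loglik_unrelated: float) -> float:
    """Assumed form of formulas 1 and 2: log LR = log P(data|H_m) - log P(data|H_0)."""
    return loglik_related - loglik_unrelated


def has_presumed_level(loglik_m: float, loglik_0: float, threshold_t: float) -> bool:
    """Presumed-level branch: accept level m when LR_m >= t (compared in log space)."""
    return log_lr(loglik_m, loglik_0) >= math.log(threshold_t)


def infer_level(logliks_by_level: dict, loglik_0: float):
    """Unknown-level branch: return (level, log LR) for the largest LR among the levels given."""
    log_lrs = {m: log_lr(ll, loglik_0) for m, ll in logliks_by_level.items()}
    best = max(log_lrs, key=log_lrs.get)
    return best, log_lrs[best]


# Hypothetical log-likelihoods for levels 1..7 and for "unrelated" (illustration only).
example_logliks = {1: -9050.0, 2: -9010.0, 3: -9002.5, 4: -9004.0,
                   5: -9006.0, 6: -9007.5, 7: -9008.0}
example_loglik_0 = -9009.0
best_level, best_log_lr = infer_level(example_logliks, example_loglik_0)
```

With these illustrative numbers the level-3 hypothesis has the largest likelihood ratio, so `infer_level` returns level 3.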
7. The method of claim 6, wherein the genetic relationship level is defined as follows:
when the first individual and the second individual are lineal relatives, the genetic relationship level is the number of meioses separating the first individual and the second individual; when the first and second individuals are full siblings, the genetic relationship level is the sum of the numbers of meioses from the first and second individuals to their common ancestor, minus 1; when the first and second individuals are half siblings, the genetic relationship level is the sum of the numbers of meioses from the first and second individuals to their common ancestor.
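The definition in claim 7 amounts to simple arithmetic on meiosis counts. A minimal sketch follows, assuming the full-sibling rule is "sum of meioses minus 1" (so that full siblings are level 1) and that the sibling rules extend to collateral relatives descended from full or half siblings. The function name and the `kind` labels are hypothetical.

```python
def relationship_level(kind: str, meioses_a: int, meioses_b: int = 0) -> int:
    """Return the genetic relationship level from meiosis counts.

    kind = "lineal": meioses_a is the number of meioses between the two individuals.
    kind = "full" / "half": meioses_a and meioses_b are the numbers of meioses from
    each individual up to the common ancestor(s).
    """
    if kind == "lineal":
        return meioses_a + meioses_b
    total = meioses_a + meioses_b
    if kind == "full":
        return total - 1  # two shared ancestors: one level closer than the half case
    if kind == "half":
        return total
    raise ValueError(f"unknown relationship kind: {kind!r}")


parent_child = relationship_level("lineal", 1)    # 1
grandparent = relationship_level("lineal", 2)     # 2
full_siblings = relationship_level("full", 1, 1)  # 1
half_siblings = relationship_level("half", 1, 1)  # 2
first_cousins = relationship_level("full", 2, 2)  # 3
```

Under these assumptions, parent and child and full siblings are both level 1, half siblings and grandparent and grandchild are level 2, and first cousins are level 3.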
8. The method according to claim 6, wherein biological samples of the first and second individuals whose genetic relationship is to be determined are collected, and the DNA of the first and second individuals is obtained from the biological samples.
9. The method according to claim 6, wherein a primer, a probe or a gene chip for detecting the SNP site combination is designed, and the DNA of the first and second individuals is detected using the primer, the probe or the gene chip to obtain the typing data of the SNP site combination for the first and second individuals.
10. The method according to claim 6, wherein the calculation of the genotype joint likelihood function of the SNP locus combination comprises the steps of:
acquiring the population frequencies of the genotypes corresponding to the 9000 SNP loci;
obtaining, according to the presumed genetic relationship level and a hidden Markov model, the probability of the gene flow represented by the genetic vector v corresponding to the 9000 SNP loci;
calculating the genotype joint likelihood function of the 9000-SNP locus combination according to formula 3:
$$P(s_1, s_2, \ldots, s_{9000}, v_N) = \sum P(v_N) \cdot P(S_N \mid v_N) \qquad (3)$$
In formula 3, $P(v_N)$ represents the probability of the gene flow represented by the genetic vector v corresponding to the N-th SNP site, and $P(S_N \mid v_N)$ represents the product of the population frequencies of the genotypes corresponding to the N-th SNP locus.
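Summing over every hidden gene-flow path directly is infeasible for 9000 loci, and a hidden Markov model is normally evaluated with the forward algorithm. The sketch below is a generic scaled forward pass, not the patent's implementation. It assumes a small set of hidden states, a single transition matrix shared by all adjacent loci (a real model would make it depend on the genetic distance between loci), and precomputed emission probabilities P(S_n | state). All names and toy numbers are hypothetical.

```python
import math
from itertools import product


def forward_log_likelihood(initial, transition, emissions):
    """Return log of the sum over hidden paths of P(path) * prod_n P(S_n | state_n).

    initial[k]       : probability of hidden state k at the first locus
    transition[j][k] : probability of moving from state j to state k between loci
    emissions[n][k]  : probability of the observed genotypes at locus n given state k
    """
    n_states = len(initial)
    alpha = [initial[k] * emissions[0][k] for k in range(n_states)]
    log_lik = 0.0
    for n in range(len(emissions)):
        if n > 0:
            alpha = [
                sum(alpha[j] * transition[j][k] for j in range(n_states)) * emissions[n][k]
                for k in range(n_states)
            ]
        scale = sum(alpha)  # rescale at every locus to avoid numerical underflow
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def brute_force_likelihood(initial, transition, emissions):
    """Same quantity by explicit enumeration of all paths; only usable for tiny inputs."""
    total = 0.0
    for path in product(range(len(initial)), repeat=len(emissions)):
        p = initial[path[0]] * emissions[0][path[0]]
        for n in range(1, len(path)):
            p *= transition[path[n - 1]][path[n]] * emissions[n][path[n]]
        total += p
    return total


# Toy model with 2 hidden states and 3 loci (illustrative numbers only).
toy_initial = [0.75, 0.25]
toy_transition = [[0.9, 0.1], [0.2, 0.8]]
toy_emissions = [[0.1, 0.3], [0.2, 0.2], [0.05, 0.4]]
toy_log_lik = forward_log_likelihood(toy_initial, toy_transition, toy_emissions)
```

The forward pass costs time proportional to the number of loci times the square of the number of states, whereas the enumeration grows exponentially with the number of loci; the brute-force function is included only to check the forward pass on the toy input.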
CN202310586302.0A 2023-05-23 SNP locus combination for presuming human genetic relationship grade and application thereof Active CN117524308B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310586302.0A CN117524308B (en) 2023-05-23 SNP locus combination for presuming human genetic relationship grade and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310586302.0A CN117524308B (en) 2023-05-23 SNP locus combination for presuming human genetic relationship grade and application thereof

Publications (2)

Publication Number Publication Date
CN117524308A true CN117524308A (en) 2024-02-06
CN117524308B CN117524308B (en) 2024-08-02

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108998507A (en) * 2018-07-24 2018-12-14 陈梦麟 A kind of noninvasive high-flux detection method applied to crowd's complexity Relationship identification
CN111091869A (en) * 2020-01-13 2020-05-01 北京奇云诺德信息科技有限公司 Genetic relationship identification method using SNP as genetic marker
CN111748637A (en) * 2020-07-23 2020-10-09 中国人民解放军军事科学院军事医学研究院 SNP molecular marker combination, multiplex composite amplification primer set, kit and method for genetic relationship analysis and identification
CN113584178A (en) * 2020-04-30 2021-11-02 深圳华大法医科技有限公司 Noninvasive paternity testing analysis method and device
WO2022087839A1 (en) * 2020-10-27 2022-05-05 深圳华大基因股份有限公司 Non-invasive prenatal genetic testing data-based kinship determining method and apparatus
CN115565604A (en) * 2022-08-18 2023-01-03 武汉蓝沙医学检验实验室有限公司 SNP-based genetic relationship identification method
CN116052766A (en) * 2023-01-09 2023-05-02 北京交通大学 Detection method and system for chromosome homozygous region and electronic equipment
CN116083605A (en) * 2023-03-09 2023-05-09 四川大学 Genetic marker system containing 67 high-efficiency autosomal micro haplotypes and detection primer and application thereof
CN116769924A (en) * 2023-04-24 2023-09-19 河南省人民医院 Detection kit comprising 90 short fragment micro-haplotype sites and application thereof
CN116926208A (en) * 2023-08-09 2023-10-24 中国人民解放军军事科学院军事医学研究院 Molecular marker combination, primer group, kit and analysis method for complex genetic relationship analysis

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
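CHO et al.: "Set up of cutoff thresholds for kinship determination using SNP loci", FORENSIC SCIENCE INTERNATIONAL: GENETICS, vol. 29, 8 March 2017 (2017-03-08), pages 1 - 8, XP085071964, DOI: 10.1016/j.fsigen.2017.03.009 *
Zhang Wenjie (张文杰) et al.: "Automatic analysis *** for pedigree inference of individuals with unknown relationships", Life Science Research (生命科学研究), vol. 26, no. 6, 30 December 2022 (2022-12-30), pages 515 - 521 *
Yang Lan (杨澜) et al.: "Study on pedigree inference from SNP chip data for trace DNA detection", Forensic Science and Technology (刑事技术), 10 April 2024 (2024-04-10), pages 1 - 10 *
Xie Quan (谢全) et al.: "Study on forensic SNP pedigree inference based on whole-genome resequencing", Progress in Biochemistry and Biophysics (生物化学与生物物理进展), 29 March 2023 (2023-03-29), pages 1 - 14 *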

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant