CN116209777A - Genetic relationship judging method and device based on noninvasive prenatal gene detection data - Google Patents

Info

Publication number
CN116209777A
Authority
CN
China
Prior art keywords
child
genetic
affinity
mother
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080104999.8A
Other languages
Chinese (zh)
Inventor
黄树嘉
李志超
蒋晓森
金鑫
尹烨
王洪琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huada Forensic Technology Co ltd
BGI Shenzhen Co Ltd
Original Assignee
Shenzhen Huada Forensic Technology Co ltd
BGI Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huada Forensic Technology Co ltd and BGI Shenzhen Co Ltd

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Molecular Biology (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

A genetic relationship judging method and device based on noninvasive prenatal gene detection data. The method comprises the following steps: comparing the whole genome sequencing data of a child to be tested against a maternal noninvasive prenatal gene detection sequencing database; extracting, for each potential mother and for the child to be tested, the trusted base set on a designated site set; calculating the genetic similarity between the child to be tested and each potential mother based on the trusted base sets; calculating the affinity probability between the child to be tested and each potential mother from the genetic similarity to form an affinity probability matrix; and determining the exact genetic relationship between the child to be tested and the potential mothers according to the affinity probability matrix. The genetic relationship information in noninvasive prenatal gene detection data is mined by comparing the noninvasive prenatal gene detection data set with the gene sequences obtained by whole genome sequencing of children.

Description

Genetic relationship judging method and device based on noninvasive prenatal gene detection data
Technical Field
The invention relates to the technical field of paternity testing, and in particular to a genetic relationship judging method and device based on noninvasive prenatal gene detection data.
Background
At present, forensic kinship determination is applied mainly in two scenarios: paternity testing and the recovery of lost children. The same method is used in both, namely determining kinship by typing short tandem repeats (STR).
Specifically, the STR typing method first detects 13 or more specific autosomal STR loci and, if necessary, goes on to detect loci on the Y chromosome, the X chromosome, and mitochondrial DNA. The cumulative paternity index (CPI) is then calculated to reach a test conclusion. Although STR typing is widely used for paternity testing, it has drawbacks in practice. First, the high mutation rate of STRs during inheritance can leave paternity undeterminable or lead to an erroneous determination. Second, for highly degraded samples, STR detection based on capillary electrophoresis may fail to yield complete typing results for all loci; even with miniSTR kits, the sample DNA fragments must be longer than 150 bp. Finally, STR typing is usually limited to paternity testing of duos or trios and is not suited to searching for relatives across a population.
Determining kinship across a population is the most important step in returning lost children to their families, but the number and range of people covered by the STR databases of judicial institutions are limited. For families whose STR data have not been entered with the judicial authorities, a lost child cannot be matched to his or her family for lack of data, even if the child is eventually recovered by the public security authorities.
Besides the short tandem repeat method, there are newer identification techniques that detect single nucleotide polymorphism (SNP) sites in samples using next-generation high-throughput sequencing (NGS) and then establish paternity by comparison. Although this approach is more comprehensive, its higher cost has kept it from being widely adopted, so too little data has accumulated for it to play a role in reuniting parents and children.
Noninvasive prenatal testing (NIPT) is becoming increasingly mature in clinical practice and is being widely adopted in large cities and regions across the country. NIPT, also known as NIPS (noninvasive prenatal screening), is a method for assessing the likelihood that a fetus has a genetic disorder. Peripheral blood is collected from the pregnant woman, cell-free DNA is extracted, and high-throughput sequencing combined with bioinformatic analysis is used to detect whether the fetus carries a chromosomal aneuploidy. More than nine million pregnant women in China have undergone the test, and population coverage is wide and growing. The data contain both maternal DNA information and a small amount of fetal DNA information. In principle, such data could help the judicial authorities find the families of lost children. The data have two clear advantages. First, the population is well targeted: every woman tested is pregnant, so the data come from families with children and are more useful for recovering lost children than data from unrelated groups. Second, the volume of test data keeps growing nationwide, providing a sustained data foundation for this application. However, there is as yet no clear method for applying the data in this way.
Whole genome sequencing (WGS) uses a high-throughput sequencing platform to sequence the entire genome of an individual organism and determine its DNA base sequence. The technique can detect variants such as single nucleotide variations (SNV), insertions and deletions (InDel), copy number variations (CNV), and structural variations (SV) at the whole-genome level.
Disclosure of Invention
The invention aims to provide a genetic relationship judging method and device based on noninvasive prenatal gene detection (NIPT) data. By comparing an NIPT data set with the gene sequences obtained by whole genome sequencing of children, the method fully mines the genetic relationship information in the NIPT data so that genetic relationships can be determined efficiently within the NIPT-tested population.
According to a first aspect of the present invention, there is provided a genetic relationship determination method based on noninvasive prenatal gene detection data, comprising:
comparing the whole genome sequencing data of the child to be tested against a maternal noninvasive prenatal gene detection sequencing database, wherein the database contains the gene sequencing data of a plurality of potential mothers;
extracting, for each potential mother and for the child to be tested, the trusted base set on the designated site set;
calculating the genetic similarity between the child to be tested and each potential mother based on the trusted base sets;
calculating the affinity probability between the child to be tested and each potential mother from the genetic similarity to form an affinity probability matrix;
and determining the exact genetic relationship between the child to be tested and the potential mothers according to the affinity probability matrix.
In a preferred embodiment, the whole genome sequencing data has a sequencing depth of 3X.
In a preferred embodiment, the maternal gene sequencing data has a sequencing depth of 0.08X.
In a preferred embodiment, the designated site set includes two-base (biallelic) polymorphic sites that have an alignment quality value higher than a first preset value and a base quality value higher than a second preset value and that are located in a gene polymorphism database.
In a preferred embodiment, the first preset value is 30, the second preset value is 20, and the gene polymorphism database is the million-Chinese gene polymorphism database.
In a preferred embodiment, the trusted base set comprises a plurality of trusted bases on the designated site set, each trusted base being the base supported by the largest number of sequencing reads covering the designated site.
In a preferred embodiment, the above genetic similarity is calculated by the following formula:
Figure PCTCN2020124079-APPB-000001
wherein n represents the total number of two-base polymorphic sites, i represents the number of the child to be tested, j represents the number of the mother in the maternal noninvasive prenatal gene detection sequencing database, d_s represents the genetic distance at site s, PE_s represents the exclusion probability when the mother and child bases differ at two-base polymorphic site s, and p_s represents the population frequency of one of the two genotypes at two-base polymorphic site s.
In the above formula, "identical" means that child i and the jth mother have the same base at site s, "different" means that child i and the jth mother have different bases at site s, and "uncovered" means that no sequence information is detected at site s in the sequencing data of child i and the jth mother.
In a preferred embodiment, the above-mentioned affinity probability is calculated by the following formula:
Figure PCTCN2020124079-APPB-000002
wherein p represents the affinity probability of the child to the mother, g_mean is the mean of the genetic similarities between the child and all potential mothers, std is the standard deviation of the genetic similarities between the child and all potential mothers, N(0, 1) is the standard normal distribution with mean 0 and standard deviation 1, Z_g represents the genetic similarity after normalization, and N(0, 1).cdf(Z_g) represents the probability value of Z_g on the standard normal distribution.
In a preferred embodiment, the above method further comprises:
removing mother samples with low mother-child relationship specificity from the affinity probability matrix to obtain an adjusted affinity probability matrix, and determining the exact genetic relationship between the child to be tested and the potential mothers according to the adjusted affinity probability matrix, wherein low mother-child relationship specificity means that the genetic similarity between the mother sample and every child to be tested is higher than a similarity threshold.
In a preferred embodiment, the similarity threshold is 0.9 or more.
In a preferred embodiment, determining the exact genetic relationship between the child to be tested and the potential mothers with the adjusted affinity probability matrix includes: determining that a mother-child combination whose affinity probability is greater than an affinity probability threshold has a genetic relationship.
In a preferred embodiment, the threshold of affinity probability is 0.99 or more.
According to a second aspect of the present invention, there is provided a genetic relationship determination apparatus based on noninvasive prenatal gene detection data, comprising:
a data acquisition unit for comparing the whole genome sequencing data of the child to be tested against a maternal noninvasive prenatal gene detection sequencing database, wherein the database contains the gene sequencing data of a plurality of potential mothers;
a trusted base extraction unit for extracting, for each potential mother and for the child to be tested, the trusted base set on the designated site set;
a genetic similarity calculation unit for calculating the genetic similarity between the child to be tested and each potential mother based on the trusted base sets;
an affinity probability calculation unit for calculating the affinity probability between the child to be tested and each potential mother from the genetic similarity to form an affinity probability matrix;
and an affinity judging unit for determining the exact genetic relationship between the child to be tested and the potential mothers according to the affinity probability matrix.
According to a third aspect of the present invention there is provided a computer readable storage medium comprising a program executable by a processor to implement a method as in the first aspect.
According to the genetic relationship judging method of the invention, the genetic relationship information in NIPT data is fully mined by comparing the NIPT data set with the gene sequences obtained by whole genome sequencing of children. This helps missing children find their parents, addresses the problem that lost children currently cannot be matched to their families after being found by the public security authorities, and extends the value of NIPT data to judicial applications.
Drawings
FIG. 1 is a flow chart of a genetic relationship determination method based on noninvasive prenatal gene detection data in an embodiment of the present invention;
FIG. 2 is an exemplary flow chart of a genetic relationship determination method based on noninvasive prenatal gene detection data in an embodiment of the present invention;
FIG. 3 is a block diagram showing the construction of a genetic relationship determination device based on noninvasive prenatal gene detection data according to an embodiment of the present invention;
FIG. 4 is a graph of the results of the calculated affinity probability matrix in one embodiment of the invention;
FIG. 5 is a graph of the results of the calculated affinity probability matrix in another embodiment of the present invention;
FIG. 6 is a graph of the results of the adjusted affinity probability matrix according to another embodiment of the present invention.
Detailed Description
The invention will be described in further detail below with reference to the drawings by means of specific embodiments. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the present invention. However, one skilled in the art will readily recognize that some of the features may be omitted in various situations, or replaced by other materials, methods.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
As shown in fig. 1, the embodiment of the invention provides a genetic relationship determination method based on noninvasive prenatal gene detection data, which comprises the following steps:
S110: Comparing the whole genome sequencing data of the child to be tested against a maternal noninvasive prenatal gene detection sequencing database, wherein the database contains the gene sequencing data of a plurality of potential mothers.
The genetic relationship judging method of the invention draws on an existing maternal NIPT database. By comparing the noninvasive prenatal gene detection (NIPT) data with the gene sequences obtained by whole genome sequencing (WGS) of children, it fully mines the genetic relationship information in the NIPT data and determines the genetic relationship between the NIPT data and a lost child (that is, the child to be tested). This narrows the screening range of the family search and helps the judicial authorities complete the search on behalf of the lost child.
The method can determine genetic relationships using NIPT data of ultra-low sequencing depth (for example, as low as 0.08X) together with whole genome sequencing (WGS) data whose sequencing depth may be as low as 3X.
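To give a feel for what these depths imply, the following sketch estimates the fraction of sites covered by at least one read. It assumes reads fall uniformly at random, so that per-site coverage is Poisson-distributed with mean equal to the sequencing depth, and that maternal and child coverage are independent. The Poisson model and the name `covered_fraction` are illustrative assumptions and are not part of the described method.

```python
import math


def covered_fraction(depth: float) -> float:
    """Expected fraction of sites with at least one read.

    Assumption: per-site read count ~ Poisson(depth), so
    P(count >= 1) = 1 - exp(-depth).
    """
    return 1.0 - math.exp(-depth)


mother = covered_fraction(0.08)  # maternal NIPT data at 0.08X
child = covered_fraction(3.0)    # child WGS data at 3X
both = mother * child            # assumes independent coverage

print(f"mother: {mother:.3f}, child: {child:.3f}, both: {both:.3f}")
```

Under these assumptions only a small fraction of sites is covered in both samples for any one mother-child pair, which is why the comparison has to pool evidence over a very large number of polymorphic sites.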
In the embodiment of the invention, the maternal noninvasive prenatal gene detection sequencing database contains the gene sequencing data of a plurality of potential mothers. Each potential mother underwent noninvasive prenatal gene detection during pregnancy, and her sequencing data were stored in the maternal database.
As shown in FIG. 2, the maternal NIPT data and the 3X whole genome sequencing data of the child undergo data quality control (e.g., filtering with SOAPnuke software) and alignment (e.g., with BWA, Edico, etc.) to obtain alignment files for the maternal NIPT data and for the whole genome sequencing data of the child.
S120: the credible base sets of each potential mother and child to be detected on the appointed site set are extracted respectively.
In the embodiment of the invention, the designated site set comprises dibasic multiple polymorphism sites, wherein the base comparison quality value is higher than a first preset value, the base quality value is higher than a second preset value and the dibasic multiple polymorphism sites are positioned in a gene polymorphism database. For example, in one embodiment of the invention, the set of designated sites includes dibasic multiple polymorphic sites with base alignment quality values above 30, base quality values above 20, and located in the million chinese gene polymorphism database (CMDB).
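The site criteria above can be sketched as a simple filter. This is a minimal illustration, assuming each observed base has already been paired with the alignment quality of its read and its own base quality; the record type `BaseObservation`, the function `passes_site_filter`, and the tiny `panel` set standing in for the polymorphism database are all hypothetical names introduced here.

```python
from typing import NamedTuple, Set, Tuple


class BaseObservation(NamedTuple):
    chrom: str
    pos: int
    base: str
    mapq: int   # alignment quality value of the read
    baseq: int  # quality value of this base


def passes_site_filter(
    obs: BaseObservation,
    panel_sites: Set[Tuple[str, int]],
    min_mapq: int = 30,   # first preset value from the text
    min_baseq: int = 20,  # second preset value from the text
) -> bool:
    """Keep an observation only if it lies on a panel site and both
    quality values are strictly higher than the preset values."""
    return (
        (obs.chrom, obs.pos) in panel_sites
        and obs.mapq > min_mapq
        and obs.baseq > min_baseq
    )


# Hypothetical panel of two-base polymorphic sites and a few observations.
panel = {("chr1", 1000), ("chr2", 2500)}
observations = [
    BaseObservation("chr1", 1000, "A", 60, 35),  # kept
    BaseObservation("chr1", 1000, "G", 20, 35),  # alignment quality too low
    BaseObservation("chr2", 2500, "T", 60, 10),  # base quality too low
    BaseObservation("chr3", 42, "C", 60, 35),    # not a panel site
]
kept = [o for o in observations if passes_site_filter(o, panel)]
print(kept)
```

The strict comparison reflects the wording "higher than"; whether the boundary values themselves are accepted is not stated in the text.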
In the embodiment of the invention, the designated site set is first extracted according to the above criteria, and the highly trusted base at each site is then extracted to form a trusted base set (called Germbase). In one embodiment of the invention, the trusted base set is extracted according to the number of sequencing reads covering each designated site. In detail, the different bases observed at a designated site are ranked by the number of sequencing reads supporting them, and the base supported by the most reads is the most trusted base. However, if the bases covered by the reads include an ALT (a base that differs from the reference sequence base) and the number of reads supporting the ALT is greater than 3, the ALT is recorded. If there are several ALTs, the ALT supported by the most reads is taken.
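One possible reading of this selection rule is sketched below. It assumes the per-base read counts at a site are already available as a dictionary and that the reference base is known; the function name `trusted_base` and the tie-breaking behaviour (the first base encountered wins) are assumptions made for illustration, since the text does not specify how ties are resolved.

```python
from typing import Dict, Optional


def trusted_base(
    read_counts: Dict[str, int],
    ref_base: str,
    alt_min_reads: int = 3,
) -> Optional[str]:
    """Pick the trusted base at one designated site.

    read_counts maps each observed base to the number of reads supporting it.
    Returns None when no read covers the site.
    """
    observed = {b: n for b, n in read_counts.items() if n > 0}
    if not observed:
        return None  # uncovered site

    # ALT bases supported by more than `alt_min_reads` reads take precedence.
    strong_alts = {
        b: n for b, n in observed.items()
        if b != ref_base and n > alt_min_reads
    }
    if strong_alts:
        return max(strong_alts, key=strong_alts.get)

    # Otherwise take the base supported by the most reads.
    return max(observed, key=observed.get)


print(trusted_base({"A": 5, "G": 4}, ref_base="A"))  # ALT G has more than 3 reads
print(trusted_base({"A": 5, "G": 2}, ref_base="A"))  # falls back to the top base
```

At 0.08X most covered sites carry a single read, so in practice the fallback branch is the one that decides the maternal base; the ALT branch matters mainly for the deeper child data.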
As shown in FIG. 2, this step yields the trusted base set of the child to be tested on the designated site set (the child Germbase in the figure) and the trusted base sets of the mothers in the NIPT database on the designated site set (the mother Germbase database in the figure).
S130: the genetic similarity between the child to be examined and each potential mother is calculated based on the set of trusted bases.
The method defines the genetic similarity between mother and child, and the way it is calculated, to describe quantitatively the relationship between each potential mother sample and the sample of the child to be tested. The genetic similarity between the child and every mother in the database is calculated and converted into an affinity probability, and the most probable genetic relationship is then determined.
According to the laws of inheritance, a base site at which a mother and child differ arises mainly from one of two causes: a random mutation in the child's genome, or the absence of a genetic relationship. From the Hardy-Weinberg law it follows that, for any two-base polymorphic site i, if the two genotypes are denoted A and a and their frequencies in the population are p and q respectively, then q = 1 - p. If the mother and child bases differ at two-base polymorphic site i, the exclusion probability is:
Figure PCTCN2020124079-APPB-000003
In calculating the genetic similarity, the cumulative exclusion probability over all sites must be computed, which finally gives the genetic similarity between the child and each mother in the database.
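As background for the Hardy-Weinberg step, the sketch below computes the standard Hardy-Weinberg genotype frequencies at a site with two alleles, written here as A and a, with frequencies p and q = 1 - p. It illustrates only the population-genetic premise; the patent's own exclusion probability formula is given in the figure and is not reproduced here. The function name `hardy_weinberg` is illustrative.

```python
from typing import Dict


def hardy_weinberg(p: float) -> Dict[str, float]:
    """Genotype frequencies at a two-allele site under Hardy-Weinberg equilibrium.

    p is the frequency of allele A; q = 1 - p is the frequency of allele a.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


freqs = hardy_weinberg(0.3)
print(freqs)
print(sum(freqs.values()))  # the three genotype frequencies sum to 1
```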
In one embodiment of the invention, the genetic similarity is calculated by the following formula:
Figure PCTCN2020124079-APPB-000004
wherein n represents the total number of two-base polymorphic sites, i represents the number of the child to be tested, j represents the number of the mother in the maternal noninvasive prenatal gene detection sequencing database, d_s represents the genetic distance at site s, PE_s represents the exclusion probability when the mother and child bases differ at two-base polymorphic site s, and p_s represents the population frequency of one of the two genotypes at two-base polymorphic site s.
S140: Calculating the affinity probability between the child to be tested and each potential mother from the genetic similarity to form an affinity probability matrix.
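The similarity formula itself is given only in the figure, but the disclosure states that it distinguishes three cases per site: the child and mother bases are identical, different, or the site is uncovered. The sketch below shows only that per-site classification, not the formula. It assumes each Germbase is represented as a dictionary from site to trusted base, and it treats a site missing from either sample as uncovered, which is one reading of the text; the names `classify_sites`, `child_germbase`, and `mother_germbase` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Site = Tuple[str, int]  # (chromosome, position)


def classify_sites(
    child_germbase: Dict[Site, str],
    mother_germbase: Dict[Site, str],
    sites: Iterable[Site],
) -> Counter:
    """Count identical, different, and uncovered sites for one child-mother pair."""
    counts: Counter = Counter(identical=0, different=0, uncovered=0)
    for site in sites:
        child_base = child_germbase.get(site)
        mother_base = mother_germbase.get(site)
        if child_base is None or mother_base is None:
            counts["uncovered"] += 1  # assumption: missing in either sample
        elif child_base == mother_base:
            counts["identical"] += 1
        else:
            counts["different"] += 1
    return counts


# Hypothetical toy data.
sites = [("chr1", 100), ("chr1", 200), ("chr2", 300), ("chr2", 400)]
child = {("chr1", 100): "A", ("chr1", 200): "G", ("chr2", 300): "T"}
mother = {("chr1", 100): "A", ("chr1", 200): "C"}
counts = classify_sites(child, mother, sites)
print(counts)
```

A full implementation would then weight the different sites by their exclusion probability PE_s and combine the cases as the figure's formula prescribes.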
In one embodiment of the invention, the affinity probability is calculated by the following formula:
Figure PCTCN2020124079-APPB-000005
wherein p represents the affinity probability of the child to the mother, g_mean is the mean of the genetic similarities between the child and all potential mothers, std is the standard deviation of the genetic similarities between the child and all potential mothers, N(0, 1) is the standard normal distribution with mean 0 and standard deviation 1, Z_g represents the genetic similarity after normalization, and N(0, 1).cdf(Z_g) represents the probability value of Z_g on the standard normal distribution, which is defined in the present invention as the "affinity probability".
After calculation, each child has a set of affinity probability data describing the probability that the child was born to each mother. Calculating the affinity probabilities between a plurality of children to be tested and a plurality of potential mothers yields an affinity probability matrix containing the affinity probability of every child-mother pair.
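The definitions above suggest a z-score followed by a standard normal CDF. The sketch below follows that reading: for each child, similarities to all mothers are normalized as Z_g = (g - g_mean) / std and mapped through the normal CDF. Because the formula is given only in the figure, this form is an assumption, as are the use of the population standard deviation (`statistics.pstdev`), the handling of a zero standard deviation, and the function names.

```python
import math
from statistics import mean, pstdev
from typing import List


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def affinity_probabilities(similarities: List[float]) -> List[float]:
    """Affinity probabilities of one child against all potential mothers."""
    g_mean = mean(similarities)
    std = pstdev(similarities)  # assumption: population standard deviation
    if std == 0.0:
        # All mothers look equally similar; no mother stands out.
        return [0.5] * len(similarities)
    return [normal_cdf((g - g_mean) / std) for g in similarities]


def affinity_matrix(similarity_matrix: List[List[float]]) -> List[List[float]]:
    """Rows are children, columns are potential mothers."""
    return [affinity_probabilities(row) for row in similarity_matrix]


# Hypothetical similarities of two children against four mothers.
sim = [
    [0.50, 0.52, 0.49, 0.90],
    [0.88, 0.51, 0.50, 0.48],
]
probs = affinity_matrix(sim)
for row in probs:
    print([round(p, 3) for p in row])
```

Normalizing per child means the probability expresses how far one mother stands out from the rest of the database for that child, so it depends on the composition of the database as well as on the pair itself.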
In some embodiments, the affinity probability matrix contains mother samples with low mother-child relationship (M-C) specificity. Such samples show very similar genetic similarity to all children to be tested, for example a genetic similarity with every child to be tested above a similarity threshold (e.g., 0.9 or more). These samples contribute little to the genetic relationship determination. Thus, in one embodiment of the invention, the method further comprises the following step:
removing the mother samples with low mother-child relationship specificity from the affinity probability matrix to obtain an adjusted affinity probability matrix, and determining the exact genetic relationship between the child to be tested and the potential mothers with the adjusted affinity probability matrix.
S150: Determining the exact genetic relationship between the child to be tested and the potential mothers according to the affinity probability matrix.
When no mother sample with low mother-child relationship specificity is present, the exact genetic relationship between the child to be tested and the potential mothers can be determined directly with the affinity probability matrix calculated in the previous step. When such mother samples are present, the exact genetic relationship is determined with the adjusted affinity probability matrix. Specifically, in one embodiment of the invention, a mother-child combination whose affinity probability is greater than an affinity probability threshold (e.g., 0.99 or more) is determined to have a genetic relationship.
For example, in one embodiment of the invention, NIPT data with low M-C specificity are moved from the affinity probability matrix to an alternative matrix, and M-C combinations in the remaining affinity probability matrix with an affinity probability greater than 0.99 are determined to have a genetic relationship. If no valid genetic relationship can be determined among the remaining samples, the NIPT sample data in the alternative matrix are used further: the abnormal NIPT samples are re-tested and matched again.
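The adjustment and thresholding steps can be sketched as follows. The sketch assumes both matrices are lists of rows (children) by columns (mothers), uses the example thresholds 0.9 and 0.99 from the text, and simply skips the flagged mother columns rather than recomputing probabilities; whether the probabilities are recomputed after removal is not stated in the text. The function names are hypothetical.

```python
from typing import List, Tuple


def low_specificity_mothers(
    similarity_matrix: List[List[float]],
    similarity_threshold: float = 0.9,
) -> List[int]:
    """Indices of mothers whose similarity to every child exceeds the threshold."""
    if not similarity_matrix:
        return []
    n_mothers = len(similarity_matrix[0])
    return [
        j for j in range(n_mothers)
        if all(row[j] > similarity_threshold for row in similarity_matrix)
    ]


def call_matches(
    probability_matrix: List[List[float]],
    excluded_mothers: List[int],
    probability_threshold: float = 0.99,
) -> List[Tuple[int, int]]:
    """(child, mother) pairs above the threshold, skipping excluded mothers."""
    skip = set(excluded_mothers)
    return [
        (i, j)
        for i, row in enumerate(probability_matrix)
        for j, p in enumerate(row)
        if j not in skip and p > probability_threshold
    ]


# Hypothetical data: mother 0 is highly similar to every child.
sim = [[0.95, 0.20, 0.30],
       [0.96, 0.80, 0.10]]
prob = [[0.999, 0.995, 0.100],
        [0.999, 0.200, 0.300]]
alternative = low_specificity_mothers(sim)  # moved to the alternative matrix
matches = call_matches(prob, alternative)
print(alternative, matches)
```

In this toy input mother 0 would otherwise be called as a match for both children; removing her leaves a single confident pair, and child 1 remains unmatched and would fall through to the re-testing path described above.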
Compared with the traditional short tandem repeat (STR) method and the SNP method, the genetic relationship judging method of the invention lends itself better to adoption at regional scale. NIPT is gradually becoming a routine clinical pregnancy screening item with an ever larger user base, and the method also gives a deeper use to NIPT data that had been considered to have little value for secondary applications. In addition, the method does not require the mother to be sampled again, which saves cost. The rapid expansion of the NIPT market favors the accumulation of data and further promotes this application, and as the data accumulate, the value of the method will grow.
Corresponding to the genetic relationship judging method of the invention, an embodiment of the invention further provides a genetic relationship judging device based on noninvasive prenatal gene detection data, as shown in FIG. 3, including: a data acquisition unit 310 for comparing the whole genome sequencing data of the child to be tested against a maternal noninvasive prenatal gene detection sequencing database, where the database contains the gene sequencing data of a plurality of potential mothers; a trusted base extraction unit 320 for extracting, for each potential mother and for the child to be tested, the trusted base set on the designated site set; a genetic similarity calculation unit 330 for calculating the genetic similarity between the child to be tested and each potential mother based on the trusted base sets; an affinity probability calculation unit 340 for calculating the affinity probability between the child to be tested and each potential mother from the genetic similarity to form an affinity probability matrix; and an affinity judging unit 350 for determining the exact genetic relationship between the child to be tested and the potential mothers according to the affinity probability matrix.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware or by a computer program. When all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, which may include read-only memory, random access memory, a magnetic disk, an optical disk, a hard disk, and the like, and the program is executed by a computer to realize the above functions. For example, the program is stored in the memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above can be realized. Alternatively, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and downloaded or copied into the memory of a local device, or used to update the system version of the local device; when the program in the memory is then executed by a processor, all or part of the functions in the above embodiments can be realized.
Accordingly, in one embodiment of the present invention, a computer-readable storage medium is provided, comprising a program executable by a processor to implement the genetic relationship determination method based on noninvasive prenatal gene detection data of the present invention.
The technical scheme and effects of the present invention are described in detail through the following examples, it being understood that the examples are only exemplary and are not to be construed as limiting the present invention.
The following examples were run in two rounds of testing on 15 families and 15 children-1000 NIFTY data.
Example 1:
The family samples in this example were randomly drawn from maternal NIFTY (noninvasive prenatal gene detection for fetal chromosomal abnormalities) data characterized as normal and from WGS data of the children sequenced after birth.
(1) First, the alignment files of all maternal NIFTY data were truncated to a depth of about 0.08X, and the children's WGS data were truncated to a depth of about 3X, to simulate the usage scenario.
(2) Germbase extraction
First, high-quality sites are extracted: base information at the CMDB two-base polymorphism sites is selected where the alignment quality value is above 30 and the base quality value is above 20.
Then, the Germbase file, a set of highly trusted bases over the sites of each sample, is extracted. The most credible base at each site is determined from the bases (read bases) of the sequencing reads covering that site.
(3) Mother-child matching (M-C). The genetic similarity between each mother and each child is calculated at the CMDB two-base polymorphism sites from their Germbase files; the resulting genetic similarities form a genetic similarity matrix between M and C, from which the affinity probability matrix is then calculated.
(4) The genetic relationship is determined according to the affinity probability matrix. As shown in fig. 4, m1-m15 represent maternal NIPT data, c1-c15 represent child WGS data, and mothers and children with the same number belong to the same family. The test results show that the genetic relationships of all 15 families were determined correctly, as expected.
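The trusted-base extraction of step (2) can be illustrated with a minimal Python sketch. It assumes the reads covering each site have already been collected into simple records; the `ReadBase` record, the `pileup` dictionary layout, the function names, and the tie-breaking rule are all hypothetical and are not taken from the patent. Only the two quality thresholds (30 and 20) and the "most supported base" rule come from the text.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional


class ReadBase(NamedTuple):
    """One base observed at a site by one sequencing read (hypothetical record)."""
    base: str
    mapq: int   # alignment quality value of the read
    baseq: int  # base quality value at this position


MIN_MAPQ = 30   # alignment quality must be above this value
MIN_BASEQ = 20  # base quality must be above this value


def trusted_base(observations: Iterable[ReadBase]) -> Optional[str]:
    """Return the base supported by the most high-quality reads, or None."""
    counts = Counter(
        o.base for o in observations
        if o.mapq > MIN_MAPQ and o.baseq > MIN_BASEQ
    )
    if not counts:
        return None
    best = max(counts.values())
    # Assumption: ties are broken alphabetically so the result is deterministic.
    return min(b for b, c in counts.items() if c == best)


def extract_germbase(pileup: Dict[str, Iterable[ReadBase]]) -> Dict[str, str]:
    """Map each site to its trusted base, skipping sites with no usable read."""
    germbase = {}
    for site, observations in pileup.items():
        base = trusted_base(observations)
        if base is not None:
            germbase[site] = base
    return germbase
```

Sites with no read passing both filters are simply left out of the result, which is one plausible way to handle the very sparse coverage expected at a depth of about 0.08X.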
Example 2:
This example expands the sample range: samples were randomly drawn from 1000 normal maternal NIFTY data sets and from the WGS data of the mothers and children of the 15 families in Example 1.
(1) First, the alignment files of all maternal NIFTY data were truncated to a depth of about 0.08X, and the children's WGS data were truncated to 3X, to simulate the usage scenario.
(2) Germbase extraction
First, high-quality sites are extracted: base information at the CMDB two-base polymorphism sites is selected where the alignment quality value is above 30 and the base quality value is above 20.
Then, the Germbase file, a set of highly trusted bases over the sites of each sample, is extracted. The most credible base at each site is determined from the bases (read bases) of the sequencing reads covering that site.
(3) Mother-child matching (M-C). The genetic similarity between each mother and each child is calculated at the CMDB two-base polymorphism sites from their Germbase files; the resulting genetic similarities form a genetic similarity matrix between M and C, from which the affinity probability matrix is then calculated.
(4) The genetic relationship is determined according to the affinity probability matrix. As shown in fig. 5, in the preliminary affinity probability matrix the affinity probabilities of the 15 true families are clearly the highest, and all determinations are correct. However, some of the 1000 samples have low specificity; for example, the genetic similarity of sample No. 810 to every child sample is higher than 0.9. After such samples are moved out to an alternative database, the one-to-one correspondence between m1-m15 and c1-c15 can still be determined correctly using the remaining database. Fig. 6 shows some representative results.
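Steps (3) and (4), together with the removal of low-specificity samples, can be sketched in Python as follows. The sketch takes an already computed genetic similarity matrix as input, because the similarity formula itself is only available as a figure. It assumes the affinity probability is obtained by standardizing each similarity as Z_g = (g - g_mean) / std and evaluating the standard normal cumulative distribution at Z_g, which is consistent with the symbol definitions in claim 8 but is not confirmed by the figure. The use of the population standard deviation, the function names, and the matrix layout `sim[i][j]` (child i, mother j) are likewise assumptions; the thresholds 0.9 and 0.99 are the lower bounds named in claims 10 and 12.

```python
import math
from typing import Dict, List

SIMILARITY_THRESHOLD = 0.9    # a mother above this for every child has low specificity
PROBABILITY_THRESHOLD = 0.99  # a pair above this affinity probability is called related


def normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def affinity_probabilities(similarities: List[float]) -> List[float]:
    """Affinity probabilities of one child against all potential mothers."""
    n = len(similarities)
    if n == 0:
        return []
    g_mean = sum(similarities) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((g - g_mean) ** 2 for g in similarities) / n)
    if std == 0.0:
        return [0.5] * n  # all similarities equal: no mother stands out
    return [normal_cdf((g - g_mean) / std) for g in similarities]


def low_specificity_mothers(sim: List[List[float]]) -> List[int]:
    """Indices of mothers whose similarity exceeds the threshold for every child."""
    n_mothers = len(sim[0])
    return [
        j for j in range(n_mothers)
        if all(row[j] > SIMILARITY_THRESHOLD for row in sim)
    ]


def determine_relationships(sim: List[List[float]]) -> Dict[int, List[int]]:
    """For each child, list the mothers whose affinity probability passes the threshold."""
    dropped = set(low_specificity_mothers(sim))
    kept = [j for j in range(len(sim[0])) if j not in dropped]
    calls = {}
    for i, row in enumerate(sim):
        probs = affinity_probabilities([row[j] for j in kept])
        calls[i] = [kept[k] for k, p in enumerate(probs) if p > PROBABILITY_THRESHOLD]
    return calls
```

Because the probability is computed relative to the spread of similarities across all candidate mothers, a mother that is highly similar to every child inflates the mean and standard deviation for every row, which is one way to read the motivation for removing such samples before the final determination.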
The foregoing description of the invention has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the invention pertains, based on the idea of the invention.

Claims (14)

  1. A genetic relationship determination method based on noninvasive prenatal gene detection data, the method comprising:
    comparing the whole genome sequencing data of the child to be tested against a maternal noninvasive prenatal gene detection sequencing database, wherein the database contains the gene sequencing data of a plurality of potential mothers;
    extracting, for each potential mother and for the child to be tested, a set of trusted bases on a designated set of sites;
    calculating the genetic similarity between the child to be tested and each potential mother based on the sets of trusted bases;
    calculating the affinity probability between the child to be tested and each potential mother according to the genetic similarity to form an affinity probability matrix;
    and determining the exact genetic relationship between the child to be tested and the potential mothers according to the affinity probability matrix.
  2. The genetic relationship determination method according to claim 1, wherein the whole genome sequencing data has a sequencing depth of 3X.
  3. The genetic relationship determination method according to claim 1, wherein the maternal gene sequencing data is sequenced to a depth of 0.08X.
  4. The genetic relationship determination method according to claim 1, wherein the designated set of sites comprises two-base polymorphism sites that are located in a gene polymorphism database and have an alignment quality value higher than a first preset value and a base quality value higher than a second preset value.
  5. The genetic relationship determination method according to claim 4, wherein the first preset value is 30, the second preset value is 20, and the gene polymorphism database is a million Chinese gene polymorphism database.
  6. The genetic relationship determination method according to claim 1, wherein the set of trusted bases comprises a plurality of trusted bases on the designated set of sites, each trusted base being the base supported by the most sequencing reads covering the corresponding designated site.
  7. The genetic relationship determination method according to claim 1, wherein the genetic similarity is calculated by the following formula:
    Figure PCTCN2020124079-APPB-100001
    wherein n represents the total number of two-base polymorphism sites, i represents the index of the child to be tested, j represents the index of the mother in the maternal noninvasive prenatal gene detection sequencing database, d_s represents the genetic distance at site s, PE_s represents the probability of excluding the difference between the mother and child bases at two-base polymorphism site s, and p_s represents the population frequency of one of the two genotypes at two-base polymorphism site s.
  8. The genetic relationship determination method according to claim 1, wherein the affinity probability is calculated by the following formula:
    Figure PCTCN2020124079-APPB-100002
    wherein p represents the affinity probability between the child and the mother, g_mean is the mean of the genetic similarities between the child and all potential mothers, std is the standard deviation of the genetic similarities between the child and all potential mothers, N(0, 1) is the standard normal distribution with mean 0 and standard deviation 1, Z_g represents the standardized value of the genetic similarity, and N(0, 1).cdf(Z_g) represents the cumulative probability of Z_g under the standard normal distribution.
  9. The genetic relationship determination method according to claim 1, characterized in that the method further comprises:
    removing mother samples with low mother-child relationship specificity from the affinity probability matrix to obtain an adjusted affinity probability matrix, and determining the exact genetic relationship between the child to be tested and the potential mothers according to the adjusted affinity probability matrix, wherein low mother-child relationship specificity means that the genetic similarity between the mother sample and every child to be tested is higher than a similarity threshold.
  10. The genetic relationship determination method according to claim 9, wherein the similarity threshold is 0.9 or more.
  11. The genetic relationship determination method according to claim 9, wherein determining the exact genetic relationship between the child to be tested and the potential mothers according to the adjusted affinity probability matrix comprises: determining that a parent-child pair whose affinity probability is greater than an affinity probability threshold has a genetic relationship.
  12. The genetic relationship determination method according to claim 11, wherein the affinity probability threshold is 0.99 or more.
  13. A genetic relationship determination apparatus based on noninvasive prenatal gene detection data, the apparatus comprising:
    a data acquisition unit, configured to compare the whole genome sequencing data of the child to be tested against a maternal noninvasive prenatal gene detection sequencing database, wherein the database contains the gene sequencing data of a plurality of potential mothers;
    a trusted base extraction unit, configured to extract, for each potential mother and for the child to be tested, a set of trusted bases on a designated set of sites;
    a genetic similarity calculation unit, configured to calculate the genetic similarity between the child to be tested and each potential mother based on the sets of trusted bases;
    an affinity probability calculation unit, configured to calculate the affinity probability between the child to be tested and each potential mother according to the genetic similarity to form an affinity probability matrix;
    and an affinity determination unit, configured to determine the exact genetic relationship between the child to be tested and the potential mothers according to the affinity probability matrix.
  14. A computer readable storage medium comprising a program executable by a processor to implement the method of any one of claims 1 to 12.
CN202080104999.8A 2020-10-27 2020-10-27 Genetic relationship judging method and device based on noninvasive prenatal gene detection data Pending CN116209777A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/124079 WO2022087839A1 (en) 2020-10-27 2020-10-27 Non-invasive prenatal genetic testing data-based kinship determining method and apparatus

Publications (1)

Publication Number Publication Date
CN116209777A true CN116209777A (en) 2023-06-02

Family

ID=81383404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080104999.8A Pending CN116209777A (en) 2020-10-27 2020-10-27 Genetic relationship judging method and device based on noninvasive prenatal gene detection data

Country Status (2)

Country Link
CN (1) CN116209777A (en)
WO (1) WO2022087839A1 (en)

Also Published As

Publication number Publication date
WO2022087839A1 (en) 2022-05-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination