CN108998507B - Noninvasive high-throughput detection method applied to crowd complex genetic relationship identification - Google Patents


Info

Publication number
CN108998507B
CN108998507B CN201810816125.XA CN201810816125A CN108998507B CN 108998507 B CN108998507 B CN 108998507B CN 201810816125 A CN201810816125 A CN 201810816125A CN 108998507 B CN108998507 B CN 108998507B
Authority
CN
China
Prior art keywords
snp
chrn
snps
sequencing
mitochondrial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810816125.XA
Other languages
Chinese (zh)
Other versions
CN108998507A (en
Inventor
黄凯铃
陈梦麟
骆颖筠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGZHOU WANDE GENE MEDICAL TECHNOLOGY Co.,Ltd.
Original Assignee
Guangzhou Wande Gene Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Wande Gene Medical Technology Co ltd
Priority to CN201810816125.XA
Publication of CN108998507A
Application granted
Publication of CN108998507B
Active
Anticipated expiration

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biotechnology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a noninvasive high-throughput detection method for identifying genetic relationships. For each chromosome, 500 SNPs with a MAF between 0.4 and 0.6 are selected: each chromosome is divided into 500 equal regions, and 1 tagSNP with a MAF in the range 0.4 to 0.6 is selected in each region. In addition, 142 mitochondrial SNPs and 37 special genes are added to the detection panel, which helps identify complex genetic relationships. The number of SNP sites is as high as 11714, which greatly increases the number of effective SNPs for paternity testing. The SNP loci cover different regions across the whole genome, which avoids losing a large amount of SNP information because of possible microdeletions and microduplications in an individual. The method of the invention is suitable for identifying genetic relationships such as those between siblings/non-siblings.

Description

Noninvasive high-throughput detection method applied to crowd complex genetic relationship identification
Technical Field
The invention belongs to the fields of genetic engineering and forensic science, and particularly relates to a noninvasive high-throughput detection method for identifying complex genetic relationships in populations. The method can be used for noninvasive paternity testing and for maternal-line tracing of siblings/non-siblings, and is therefore particularly suitable for the special noninvasive paternity testing of third-generation test-tube babies where the pregnant woman is not the egg donor (used only to provide genetic evidence in court disputes). The method also collects SNP sites that are high-frequency in the Chinese population, and is therefore better suited to genetic relationship identification in that population.
Background
1. Paternity testing
Paternity testing refers to the scientific determination of whether a biological relationship exists between subjects, by detecting genetic markers with medical, biological and anthropological methods and analyzing the results according to genetic theory. Its scope is very wide: it covers relationships between direct relatives across two generations (parent and child), as well as relationships between siblings, between direct relatives separated by a generation (grandparent and grandchild), and between collateral relatives (uncle or aunt and nephew or niece, etc.).
2. SNP and typing
SNPs (single nucleotide polymorphisms) are DNA sequence polymorphisms at the genomic level caused by variation of a single nucleotide. They are the most common type of heritable variation in humans, accounting for more than 90% of all known polymorphisms. SNPs are widespread in the human genome, occurring on average once every 500-1000 base pairs, for an estimated total of 3 million (3 x 10^6) or more. As genetic markers, SNPs arise from single-nucleotide changes in the genome, including transitions, transversions, deletions and insertions; they are numerous and highly polymorphic. In theory each SNP site could have 4 different variants, but in practice only two kinds of change occur, transitions and transversions, in a ratio of about 2:1. SNPs occur most frequently in CG sequences and are mostly C-to-T changes, because cytosines in CG are often methylated and then spontaneously deaminate to thymine. In general, a SNP is a single-nucleotide variation with a frequency greater than 1%. SNPs are therefore regarded as third-generation genetic markers, and many human phenotypic differences, as well as differences in susceptibility to drugs or diseases, may be associated with SNPs.
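The transition/transversion distinction above can be made concrete with a small sketch. The function name `classify_substitution` is hypothetical and chosen for illustration; the rule itself (a change within purines or within pyrimidines is a transition, anything else a transversion) is standard.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    # Same chemical class on both sides means a transition.
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"
```

For example, the C-to-T change produced by deamination of methylated cytosine is classified as a transition.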
3. Mitochondrial DNA and maternal inheritance
Mitochondria carry their own genetic material (mitochondrial DNA), which is transmitted by maternal inheritance. Unlike nuclear DNA, which is inherited from both parents, mitochondrial DNA comes only from the mother. The reason mitochondria can only be passed to the child by the mother lies in how sperm and eggs are produced. Both eggs and sperm are formed from germ cells by meiosis. Oogonia proliferate and differentiate into primary oocytes. A primary oocyte undergoes the first meiotic division to form a secondary oocyte and a polar body (the first polar body); the secondary oocyte undergoes the second meiotic division to form an egg cell and a polar body (the second polar body); the polar bodies eventually degenerate, leaving only the egg cell. During both divisions most of the cell's material is concentrated in the egg cell and only a small fraction goes to the polar bodies, so the polar bodies degenerate while the egg cell retains the nucleus with its genetic material, organelles such as mitochondria, and the nutrients needed for the initial development of the embryo. Sperm formation is different: it also involves two meiotic divisions, but the cell divides evenly into four sperm with no polar bodies, and the organelles and nutrients are discarded during the process, leaving only the nucleus with its genetic material. In short, the female germ cell divides into four cells but leaves only one egg, which retains mitochondria, genetic material and nutrients and is therefore large; the male germ cell divides twice to leave 4 sperm, which carry only genetic material. Eggs are therefore few and large, whereas sperm are numerous and small.
4. High throughput target region sequencing
High-throughput sequencing, also known as next-generation sequencing (NGS), is characterized by the ability to sequence hundreds of thousands to millions of DNA molecules in parallel in a single run, and by its short read length. Whole genome sequencing can detect point mutations, insertions, deletions, copy number changes and other structural variation across the whole genome. However, the data volume is enormous: at 30X coverage, a human whole genome yields more than 90 G of sequencing data. Detecting low-frequency tumor-associated mutations, or fetal cell-free DNA in the peripheral blood of a pregnant woman, requires coverage of at least 5000X, at which whole genome sequencing would generate more than 15 T of data. Because the content of tumor-associated cell-free DNA in body fluid specimens, or of fetal cell-free DNA in maternal peripheral blood, is extremely low, the required whole genome sequencing volume can exceed 200 T. Data on this scale markedly increases sequencing cost, makes data analysis very difficult, and so restricts the application of sequencing. Moreover, when NGS is performed without any enrichment of the target signal, the vast majority of the information obtained comes from the normal cell genome. With such strong background noise, both the specificity and the sensitivity of detection become problematic. In addition, 99.99% of the sequencing effort, at considerable cost in labor and money, produces useless information, which amounts to generating garbage at high throughput. To solve this problem, multiplex PCR-based capture and enrichment of target regions has been developed for high-throughput sequencing platforms.
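The data volumes quoted above follow from multiplying the size of the sequenced region by the coverage depth. The sketch below reproduces that arithmetic; the 3 x 10^9 bp genome size is an assumed round figure, and the targeted panel (12,000 amplicons of about 150 bp) is purely hypothetical and is not a specification from this document.

```python
GENOME_SIZE_BP = 3.0e9  # assumed approximate size of the human genome

def bases_required(coverage: float, target_bp: float = GENOME_SIZE_BP) -> float:
    """Bases that must be sequenced to reach a given mean coverage."""
    return target_bp * coverage

whole_genome_30x = bases_required(30)      # 9.0e10 bases, about 90 G
whole_genome_5000x = bases_required(5000)  # 1.5e13 bases, about 15 T

# Hypothetical targeted panel: 12,000 amplicons of about 150 bp each.
panel_bp = 12_000 * 150
panel_5000x = bases_required(5000, target_bp=panel_bp)  # 9.0e9 bases
```

Under these assumptions, sequencing only the target regions at 5000X needs roughly a tenth of the data of a single 30X whole genome, which is the motivation for target enrichment.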
The main disadvantages of current genetic relationship identification are: 1) traditional first-generation sequencing cannot detect many SNP sites simultaneously (the cost and the amount of DNA required are very large). 2) Existing noninvasive paternity test kits select combinations of SNP loci with high population frequency in the 1000 Genomes data, but the frequencies in the Chinese population and in foreign populations still differ substantially. 3) Current high-throughput sequencing kits for paternity testing are not suitable for identifying relationships such as those between siblings/non-siblings. 4) Third-generation test-tube babies where the pregnant woman is not the egg donor are not permitted by policy and may be illegal, yet related disputes are still frequently reported on major news platforms. Paternity testing in such cases involves a four-party relationship among the egg donor, the pregnant woman, the sperm donor and the fetus, which is more complex than a conventional trio paternity test, and current noninvasive paternity test kits are not yet able to handle it.
Disclosure of Invention
In order to solve the existing problems, the invention provides a noninvasive genetic relationship identification method capable of simultaneously detecting 11714 SNP sites and 37 special genes based on a high-throughput sequencing platform.
The invention aims to provide a noninvasive high-throughput detection method applied to the identification of the complex genetic relationship of people.
The technical scheme adopted by the invention is as follows:
A noninvasive high-throughput detection method for identifying genetic relationship comprises the following steps:
1) the frequencies (MAF) of all SNPs on the 24 human chromosomes chrN, N ∈ {1-22, X, Y}, are looked up in the corresponding genome database; the full set of SNPs on each of the 24 chromosomes is denoted S1_chrN;
2) SNPs with a MAF of 0.4-0.6 are selected on the 24 chromosomes, and the set of SNPs meeting this condition is denoted S2_chrN;
3) the positions of all SNPs in the set S2_chrN on each chromosome are determined; for each chromosome, the region between the SNP with the smallest position and the SNP with the largest position is divided into 500 equal parts, and 1 tagSNP is selected from each part, giving the SNP set S3_chrN;
4) 142 mitochondrial SNPs are selected and combined with S3_chrN, giving the SNP set S4_chrN;
5) a further 37 mitochondrial DNA regions are selected as part of the target region and designated as supplementary regions of the S4_chrN set;
6) multiplex PCR primers for high-throughput sequencing are designed to detect the SNPs of the S4_chrN set and the 37 mitochondrial DNA regions;
7) DNA is extracted from the sample to be tested, a library is constructed by multiplex PCR with the designed primers, and the library is sequenced by high-throughput sequencing;
8) the sequencing results are analyzed to obtain the specific genotypes of the sample at the SNPs of the S4_chrN set and in the 37 mitochondrial DNA regions;
9) according to the identification mode for the specific genetic relationship, the specific genotypes of the samples are compared and analyzed to judge whether the corresponding genetic relationship exists.
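Steps 1) to 3) above can be sketched for a single chromosome as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `select_per_bin` and the record layout are hypothetical, and the per-bin choice (the SNP whose MAF is closest to 0.5) is only a stand-in for the linkage-based tagSNP selection that the method performs with dedicated software.

```python
from typing import Dict, List, Tuple

Snp = Tuple[str, int, float]  # (identifier, position, MAF); layout is illustrative

def select_per_bin(snps: List[Snp], n_bins: int = 500,
                   lo: float = 0.4, hi: float = 0.6) -> List[Snp]:
    """Filter one chromosome's SNPs by MAF, split the covered span into
    n_bins equal-width bins, and keep one SNP per non-empty bin."""
    kept = [s for s in snps if lo <= s[2] <= hi]  # corresponds to S2_chrN
    if not kept:
        return []
    start = min(pos for _, pos, _ in kept)
    end = max(pos for _, pos, _ in kept)
    width = (end - start) / n_bins or 1.0  # guard against a zero-width span
    bins: Dict[int, List[Snp]] = {}
    for s in kept:
        idx = min(int((s[1] - start) / width), n_bins - 1)
        bins.setdefault(idx, []).append(s)
    # Stand-in rule (assumption): keep the SNP with MAF closest to 0.5.
    return [min(b, key=lambda s: abs(s[2] - 0.5))
            for _, b in sorted(bins.items())]  # corresponds to S3_chrN
```

Bins that contain no SNP in the MAF range contribute nothing, so a chromosome can yield fewer than 500 SNPs; this is consistent with a total below 24 x 500 + 142.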
Further, the specific positions of the 37 mitochondrial DNA regions in the mitochondrial genome are as follows:
[Table provided as images in the original publication: Figure BDA0001740368980000031, Figure BDA0001740368980000041]
Further, in step 1), the genome database is the 1000 Genomes database.
Further, in step 1), the frequency MAF is the MAF in the East Asian population.
Further, in step 3), the specific method for selecting a tagSNP is: taking each of the 500 equal parts as a subregion, computing the largest SNP cluster with software, and selecting the SNP with the largest value of Max(1-R²) in that cluster as the tagSNP of that part.
Further, R² is a numerical measure of SNP linkage.
Further, the number of SNPs in the S4_chrN set is 11714.
Further, the names of the 142 mitochondrial SNPs are respectively:
[Table provided as images in the original publication: Figure BDA0001740368980000051, Figure BDA0001740368980000061]
Further, the identification of genetic relationship comprises parent-child identification, identification of a shared maternal line, sister identification, grandparent-grandchild identification and ancestry identification.
Further, in step 8), quality control is performed on the sequencing data during analysis of the sequencing results, and low-quality sequencing reads are removed.
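The R² linkage measure used in the tagSNP selection above is, in common usage, the squared correlation between the alleles at two SNPs. The sketch below computes it from phased haplotypes coded 0/1; the function name and the input layout are hypothetical, and the document does not state how its software computes R² internally.

```python
from typing import Sequence

def r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Pairwise linkage disequilibrium r^2 between two biallelic SNPs,
    given one 0/1 allele per haplotype at each SNP."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")
    p_a = sum(hap_a) / n   # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0         # at least one SNP is monomorphic in this sample
    d = p_ab - p_a * p_b   # linkage disequilibrium coefficient D
    return d * d / denom
```

A value near 1 means the two SNPs carry almost the same information, so 1 - R² is large for SNPs that are poorly represented by another SNP.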
The invention has the beneficial effects that:
1) Principle for the number of SNPs: on the 24 chromosomes, 500 SNPs are selected per chromosome, with a MAF between 0.4 and 0.6.
2) Principle for the position of SNPs: each chromosome is divided into 500 equal parts, and 1 tagSNP with a MAF within the specified range is selected in each region.
3) Amplification primer pairs for 142 mitochondrial SNPs and for 37 special genes are specifically added to the detection, which helps identify complex genetic relationships. The number of SNP sites is as high as 11714, which greatly increases the number of effective SNPs for paternity testing.
4) The SNP loci cover different regions across the whole genome, which avoids losing a large amount of SNP information because of possible microdeletions and microduplications in an individual.
5) The method of the invention can be applied to identifying genetic relationships such as those between siblings/non-siblings.
6) Third-generation test-tube babies where the pregnant woman is not the egg donor are not permitted by policy and may be illegal, yet related disputes are still frequently reported on major news platforms. Paternity testing in such cases involves a four-party relationship among the egg donor, the pregnant woman, the sperm donor and the fetus, and is more complex than a conventional trio paternity test. The detection method can infer whether the fetus and the woman being tested come from the same maternal line, providing direct genetic evidence for possible court disputes.
Drawings
FIG. 1 shows the distribution across the chromosomes of the SNPs in the S4_chrN set; M in the figure denotes the mitochondrial genome.
Detailed Description
The present invention will be further described with reference to the following examples.
Example 1 Noninvasive high-throughput detection method for identifying genetic relationship
Step one: selecting the target regions and designing primers
1) Using the ANNOVAR software, the East Asian population frequencies (MAF) in the 1000 Genomes database (August 2015 release) are looked up for all SNPs on the 24 chromosomes chrN, N ∈ {1-22, X, Y}, of the whole genome; the full set of SNPs on the 24 chromosomes at this point is S1_chrN.
2) SNPs with a MAF between 0.4 and 0.6 are selected on each chromosome; the set of SNPs meeting this condition is S2_chrN.
3) For each chromosome chrN, the SNPs of the set S2_chrN are sorted by position from smallest to largest, the span is divided into 500 equal parts, and 1 tagSNP is selected from each of the 500 parts; the resulting SNP set is S3_chrN.
The specific operation for selecting a tagSNP is as follows: each chrN subregion (each of the 500 parts) is used as an input file; the largest SNP cluster is computed with the WCLUSTAG v2 software (http:// www.math.hkbu.edu.hk/. about ng/WCLUSTAG. html); and the SNP with the largest value of Max(1-R²) within that cluster (R² being a numerical measure of SNP linkage) is selected as the tagSNP of that subregion. This calculation is based on SNP frequency data for the East Asian population; SNP frequency data for other populations can be used instead to obtain tagSNPs for those populations (reference: Combining Functional and Linkage Disequilibrium Information in the Selection of Tag SNPs).
4) 142 mitochondrial SNPs are selected (see Table 1) and combined with S3_chrN; the resulting SNP set is S4_chrN. The number of SNPs is 11714, and their distribution on each chromosome is shown in FIG. 1.
Table 1 Detailed information on the 142 mitochondrial SNPs
[Table provided as images in the original publication: Figure BDA0001740368980000071, Figure BDA0001740368980000081, Figure BDA0001740368980000091, Figure BDA0001740368980000101, Figure BDA0001740368980000111]
5) The 37 mitochondrial DNA regions in Table 2 below are used as part of the target region, as supplementary regions of the S4_chrN set.
Table 2 Specific locations of the 37 mitochondrial DNA regions in the mitochondrial genome
[Table provided as images in the original publication: Figure BDA0001740368980000112, Figure BDA0001740368980000121]
6) The SNPs of the set S4_chrN and the supplementary mitochondrial target regions (Table 2 above) are combined into a BED file for use in the present detection scheme.
7) The BED file is used as the input file for the Ion AmpliSeq Designer website (other sequencing primer design software may be used instead) to design the multiplex PCR primers for high-throughput sequencing library construction.
8) The BED file is also used as the input file for evaluating the sequencing quality of the target regions in high-throughput sequencing.
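Combining SNP positions and regions into a BED file, as in step 6) above, mainly requires care with coordinates: BED uses 0-based, half-open intervals, whereas SNP positions are normally reported 1-based. The sketch below shows the conversion; the function names and record layouts are hypothetical, and the coordinates in the usage note are made up for illustration.

```python
from typing import Iterable, List, Tuple

def snps_to_bed(snps: Iterable[Tuple[str, int, str]]) -> List[str]:
    """Convert (chrom, 1-based position, name) records to BED lines."""
    # A single base at 1-based position p is the BED interval [p-1, p).
    return [f"{chrom}\t{pos - 1}\t{pos}\t{name}" for chrom, pos, name in snps]

def regions_to_bed(regions: Iterable[Tuple[str, int, int, str]]) -> List[str]:
    """Convert (chrom, 1-based start, 1-based inclusive end, name) to BED lines."""
    # Only the start shifts; the inclusive 1-based end equals the BED end.
    return [f"{chrom}\t{start - 1}\t{end}\t{name}"
            for chrom, start, end, name in regions]

def build_bed(snps, regions) -> str:
    """Concatenate SNP and region lines into the text of one BED file."""
    return "\n".join(snps_to_bed(snps) + regions_to_bed(regions)) + "\n"
```

For instance, `build_bed([("chr1", 100, "snp_a")], [("chrM", 16, 300, "mito_region_1")])` yields two tab-separated lines, `chr1 99 100 snp_a` and `chrM 15 300 mito_region_1`.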
Step two: extracting DNA from the sample to be tested.
Step three: constructing a multiplex PCR library from the extracted DNA with the designed primers, and performing high-throughput sequencing.
Step four: performing quality control on the sequencing data and removing low-quality sequencing reads.
1) Performing quality control on the raw data with the FastQC tool to obtain a FastQC report file;
2) according to the FastQC report file, filtering the raw reads as needed with cutadapt/Trimmomatic to obtain clean reads for subsequent analysis.
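The idea behind the read filtering in step four can be illustrated with a minimal pure-Python filter on mean base quality and read length. This is not what the named tools do internally; the thresholds, function names and the Phred+33 encoding are assumptions made for the sketch.

```python
from typing import Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)

def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield reads from FASTQ text already split into lines (4 lines per read)."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i].rstrip(), lines[i + 1].rstrip(), lines[i + 3].rstrip()

def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0

def filter_reads(reads: List[Read], min_mean_q: float = 20.0,
                 min_len: int = 30) -> List[Read]:
    """Keep reads that are long enough and have a high enough mean quality."""
    return [r for r in reads
            if len(r[1]) >= min_len and mean_phred(r[2]) >= min_mean_q]
```

Real trimming tools additionally remove adapters and trim low-quality ends rather than only discarding whole reads.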
Step five: mapping the clean reads obtained in the previous step to the genome:
1) aligning the clean reads to the reference genome hg19 with bowtie2 to obtain the alignment result as a SAM-format file;
2) converting the SAM file to BAM format with samtools, and sorting the BAM file;
3) removing PCR duplicates with the rmdup function of samtools, and building the BAM file index;
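The alignment steps above can be expressed as a sequence of command lines. The sketch below only builds the argument lists (it does not run anything); the file names and the index prefix are hypothetical, single-end input is assumed, and the exact options used by the authors are not stated in this document. Note that newer samtools releases recommend `markdup` in place of `rmdup`.

```python
from typing import List

def mapping_commands(sample: str, index_prefix: str = "hg19") -> List[List[str]]:
    """Return argv lists for alignment, BAM conversion, sorting,
    duplicate removal and indexing of one sample (names are placeholders)."""
    fq = f"{sample}.clean.fastq"
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    return [
        ["bowtie2", "-x", index_prefix, "-U", fq, "-S", sam],  # align reads
        ["samtools", "view", "-b", "-o", bam, sam],            # SAM to BAM
        ["samtools", "sort", "-o", sorted_bam, bam],           # coordinate sort
        ["samtools", "rmdup", sorted_bam, dedup_bam],          # drop PCR duplicates
        ["samtools", "index", dedup_bam],                      # build BAM index
    ]
```

Each list could be passed to `subprocess.run(cmd, check=True)` on a machine where bowtie2 and samtools are installed.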
Step six: detecting the specific genotypes of the sample to be tested at the 11714 SNPs and 37 special genes:
1) calling variant sites in the processed BAM file with samtools mpileup and/or the open-source GATK software to obtain an initial SNP information file;
2) converting the result with the open-source bcftools software into a VCF-format variant site file, thereby obtaining the specific genotypes of all samples to be tested at the 11714 SNPs;
Step seven: according to the specific form of genetic relationship to be identified, performing the identification by calculation on the corresponding SNP results.
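Once a VCF file is available, the genotype at each site can be read from the GT field. The sketch below parses one data line of a single-sample VCF; the function name is hypothetical and the example line in the usage note is invented for illustration.

```python
from typing import Tuple

def parse_vcf_genotype(line: str) -> Tuple[str, int, Tuple[str, ...]]:
    """Return (chrom, position, called alleles) for one single-sample VCF line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    alleles = [ref] + alt.split(",")          # index 0 is REF, 1.. are ALT
    format_keys = fields[8].split(":")
    sample_values = fields[9].split(":")
    gt = sample_values[format_keys.index("GT")]
    indices = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    called = tuple(alleles[int(i)] for i in indices if i != ".")
    return chrom, pos, called
```

For a line with REF `A`, ALT `G` and genotype `0/1`, the function returns the heterozygous call `("A", "G")`; a missing call (`./.`) gives an empty tuple.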
Example 2 Noninvasive high-throughput detection method for identifying genetic relationship (noninvasive paternity test)
Brief description of case:
1) Commissioned matter: to determine whether a biological paternity relationship exists between the source of the fetal DNA in the sample labeled "Ms. Xu" and the sources of the DNA in the samples labeled "Mr. Li" and "Mr. Zheng".
2) Samples and information collected are shown in Table 3:
Table 3 Samples and information collected
Sample | Ethnicity | Sample information | Sample type | Sex | Sample number
Ms. Xu | Asian | 10 weeks of gestation | Whole blood | Female | K207
Mr. Li | Asian | Alleged father | Buccal swab | Male | K208
Detection steps:
Step one: separating the plasma and the buffy coat of the blood sample K207 to be tested.
1) Pre-cool the low-speed centrifuge to 4 ℃. Once the temperature is stable, place the blood collection tube in the centrifuge and centrifuge at 1,600 g for 10 minutes. Transfer the supernatant plasma to an EP tube for use in step 2). Transfer the intermediate buffy coat (white blood cell layer) to a 2.0 mL EP tube and immediately store it in a -80 ℃ freezer.
2) Pre-cool the high-speed centrifuge to 4 ℃, load the plasma obtained in step 1), and centrifuge at 16,000 g for 10 minutes. Aspirate the supernatant plasma (avoiding the pellet), aliquot it into EP tubes, and immediately store it in a -80 ℃ freezer.
Step two: extracting maternal DNA and maternal cell-free DNA (containing fetal DNA).
1) Add 20 μl of Proteinase K to a centrifuge tube (supplied by the user).
2) Add 200 μl of sample.
3) Add 160 μl of Buffer CL, mix by inversion, and shake vigorously for at least 30 seconds.
4) Incubate at 60 ℃ for 30 minutes, inverting several times during incubation to mix. Note: for 200 μl serum/plasma samples, incubate at 60 ℃ for 10-15 minutes.
5) Add 360 μl of Buffer CB (check that isopropanol has been added before use) and shake until thoroughly mixed.
6) Place on ice for 5 minutes, then centrifuge briefly to collect the liquid on the tube wall and cap at the bottom of the tube.
7) Transfer all of the solution obtained in step 6) to an adsorption column (Spin Columns DF) seated in a collection tube; if the solution cannot be loaded at once, load it in several portions. Centrifuge at 12,000 rpm for 1 minute, discard the waste liquid in the collection tube, and place the adsorption column back in the collection tube.
8) Mu.l of Buffer GW1 (checked for absolute ethanol addition before use) was added to the adsorption column, centrifuged at 12,000rpm for 30 seconds, the waste liquid in the collection tube was discarded, and the adsorption column was replaced in the collection tube.
9) Add 750 μl of Buffer GW2 (check that absolute ethanol has been added before use) to the adsorption column, centrifuge at 12,000 rpm for 30 seconds, discard the waste liquid in the collection tube, and place the adsorption column back in the collection tube.
10) Add 750 μl of absolute ethanol to the adsorption column, centrifuge at 12,000 rpm for 30 seconds, discard the waste liquid in the collection tube, and place the adsorption column back in the collection tube.
11) Centrifuge at 12,000 rpm for 2 minutes and discard the collection tube. Leave the adsorption column at room temperature for several minutes to dry thoroughly. Note: the purpose of this step is to remove residual ethanol from the adsorption column, since residual ethanol can affect subsequent enzymatic reactions.
12) Place the adsorption column in a new centrifuge tube, add 20-100 μl of Buffer EBL or sterilized water dropwise to the center of the adsorption column membrane, leave at room temperature for 2-5 minutes, and centrifuge at 12,000 rpm for 1 minute. Collect the DNA solution and store the DNA at -20 ℃.
Step three: extracting DNA from the sample K208 to be tested.
1) Cut the cotton tip from the swab shaft with scissors, place it in a 2 mL centrifuge tube, and add 400 μl of lysis buffer ML. Then add 20 μl of proteinase K solution (20 mg/ml) and immediately vortex to mix.
2) Optional step (generally not required): incubate at 56 ℃ for 1 hour, vortexing for 10 seconds every 10 minutes.
3) Add 400 μl of binding solution CB, immediately vortex to mix thoroughly, and incubate at 70 ℃ for 10 minutes. The solution should now be clear. Centrifuge briefly to remove droplets from the inside of the tube cap, then squeeze out the swab and transfer as much lysate as possible to a new centrifuge tube.
4) After cooling, add 200 μl of absolute ethanol and immediately vortex to mix thoroughly. Centrifuge briefly to remove droplets from the inside of the tube cap and collect all liquid at the bottom of the tube.
5) Load the mixture from the previous step onto an adsorption column AC (seated in a collection tube), centrifuge at 12,000 rpm for 30-60 seconds, and discard the waste liquid in the collection tube.
6) Add 500 μl of inhibitor removal solution IR, centrifuge at 12,000 rpm for 30 seconds, and discard the waste liquid.
7) Add 500 μl of rinsing solution WB (check that absolute ethanol has been added), centrifuge at 12,000 rpm for 30 seconds, and discard the waste liquid.
8) Add 500 μl of rinsing solution WB, centrifuge at 12,000 rpm for 30 seconds, and discard the waste liquid.
9) Return the adsorption column AC to the empty collection tube and centrifuge at 13,000 rpm for 2 minutes to remove as much rinsing solution as possible, so that residual ethanol in the rinsing solution does not inhibit downstream reactions.
10) Remove the adsorption column AC and place it in a clean centrifuge tube. Add 20-50 μl of elution buffer EB (pre-heating the elution buffer in a 65-70 ℃ water bath gives better results), leave at room temperature for 1 minute, and centrifuge at 12,000 rpm for 1 minute. Load the resulting eluate onto the adsorption column again, leave at room temperature for 1 minute, and centrifuge at 12,000 rpm for 1 minute.
Step four: constructing a multiplex PCR library from the extracted DNA with the designed primers and performing high-throughput sequencing; then performing quality control on the sequencing data and removing low-quality sequencing reads.
1) Performing quality control on the raw data with the FastQC tool to obtain a FastQC report file;
2) according to the FastQC report file, filtering the raw reads as needed with cutadapt/Trimmomatic to obtain clean reads for subsequent analysis.
Step five: mapping the clean reads obtained in the previous step to the genome:
1) aligning the clean reads to the reference genome hg19 with bowtie2 to obtain the alignment result as a SAM-format file;
2) converting the SAM file to BAM format with samtools, and sorting the BAM file;
3) removing PCR duplicates with the rmdup function of samtools, and building the BAM file index;
4) the alignment statistics are shown in Table 4.
Table 4 Alignment statistics
Sample number | Clean reads | Mapped reads | Useful reads | Paired reads | SE reads
K208 | 21,927,778 | 21,359,624 | 21,219,858 | 17,983,898 | 139,766
K207 buffy coat | 21,086,226 | 20,486,144 | 20,343,456 | 17,362,826 | 142,688
K207 plasma | 71,400,538 | 64,336,752 | 63,881,342 | 52,880,508 | 455,410
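As a quick reading aid for Table 4, the mapping rate of each library can be derived as mapped reads divided by clean reads. The sketch below uses only the figures from the table; the dictionary layout and variable names are illustrative.

```python
# Clean and mapped read counts copied from Table 4.
table4 = {
    "K208": {"clean": 21_927_778, "mapped": 21_359_624},
    "K207 buffy coat": {"clean": 21_086_226, "mapped": 20_486_144},
    "K207 plasma": {"clean": 71_400_538, "mapped": 64_336_752},
}

# Fraction of clean reads that aligned to the reference genome.
mapping_rate = {name: row["mapped"] / row["clean"] for name, row in table4.items()}

for name, rate in mapping_rate.items():
    print(f"{name}: {rate:.2%} of clean reads mapped")
```

The two genomic DNA libraries map at roughly 97%, while the plasma library maps at roughly 90%.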
Step six: detecting the specific genotypes of the samples to be tested at the 11714 SNPs and 37 special genes:
1) using the BED file of this detection scheme, calling variant sites in the processed BAM files with samtools mpileup and/or the open-source GATK software to obtain an initial SNP information file;
2) converting the result with the open-source bcftools software into a VCF-format variant site file, thereby obtaining the specific genotypes of all samples to be tested at the 11714 SNPs, and obtaining all SNVs (single nucleotide variants other than the panel SNPs) in the 37 mitochondrial gene regions;
Step seven: SNP site screening and site analysis.
Through the above detection and analysis, 139 sites were identified in the target regions at which the fetal DNA does not match the DNA of the alleged father (see Table 5).
Table 5 Sites at which the fetal DNA and the alleged father's DNA do not match
[Table provided as images in the original publication: Figure BDA0001740368980000151, Figure BDA0001740368980000161, Figure BDA0001740368980000171, Figure BDA0001740368980000181, Figure BDA0001740368980000191, Figure BDA0001740368980000201]
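One way to picture how mismatching sites such as those in Table 5 could be found is sketched below. It assumes a simplified rule that the document does not spell out: at sites where the mother is homozygous, any other allele seen in maternal plasma is taken to be fetal and paternally inherited, so the alleged father must carry it. All names and the toy data are hypothetical, and a real analysis would need read-depth thresholds to separate true fetal alleles from sequencing errors.

```python
from typing import Dict, List, Set, Tuple

Genotype = Tuple[str, str]

def paternal_mismatches(mother: Dict[str, Genotype],
                        plasma_alleles: Dict[str, Set[str]],
                        alleged_father: Dict[str, Genotype]) -> List[str]:
    """List sites where a fetal-specific allele is absent from the alleged father."""
    mismatches = []
    for site, m_gt in mother.items():
        if m_gt[0] != m_gt[1]:
            continue  # only sites where the mother is homozygous are informative
        if site not in plasma_alleles or site not in alleged_father:
            continue  # skip sites without data for all three inputs
        fetal_specific = plasma_alleles[site] - set(m_gt)
        if fetal_specific and not (fetal_specific & set(alleged_father[site])):
            mismatches.append(site)
    return mismatches
```

With a mother `C/C`, plasma alleles `{C, T}` and an alleged father `C/C` at a site, the `T` allele cannot have come from that man, so the site is reported.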
Step eight: calculating the paternity index.
According to the analysis of the detection results, and on the premise of excluding genetic variation, monozygotic multiple pregnancy, close kinship, and exogenous interference (such as hematopoietic stem cell transplantation), 139 SNP loci between the fetal DNA in the sample labeled "Ms. Xu" and the DNA of "Mr. Li" do not conform to Mendel's laws of inheritance; the cumulative paternity index (CPI) is less than 0.0001, and the relative chance of paternity (RCP) is less than 0.0001. From a genetic standpoint, therefore, the results do not support a biological paternity relationship between the source of the fetal DNA in the sample labeled "Ms. Xu" and the source of the submitted sample labeled "Mr. Li".
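The document reports the CPI and RCP without giving formulas. In common forensic practice the CPI is the product of the per-locus paternity indices, and the probability of paternity is obtained from the CPI and a prior probability (conventionally 0.5). The sketch below follows that convention; the function names and the per-locus values are hypothetical.

```python
import math
from typing import Iterable

def cumulative_paternity_index(per_locus_pi: Iterable[float]) -> float:
    """Product of the per-locus paternity indices."""
    return math.prod(per_locus_pi)

def probability_of_paternity(cpi: float, prior: float = 0.5) -> float:
    """Posterior probability of paternity from the CPI and a prior probability."""
    return cpi * prior / (cpi * prior + (1.0 - prior))

# Hypothetical per-locus values: matching loci have PI > 1,
# mismatching loci contribute very small values.
example_cpi = cumulative_paternity_index([2.0, 0.5, 3.0])
example_probability = probability_of_paternity(example_cpi)
```

With a prior of 0.5 the formula reduces to CPI / (CPI + 1), so a CPI below 0.0001 gives a probability below 0.0001, in line with the values reported above.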
Example 3 non-invasive high throughput detection method for identification of genetic relationship (sister identification)
Brief introduction of the case: two elderly women who were separated in early childhood and met again many years later suspect that they are biological sisters; they are 75 and 86 years old this year. Since both parents are deceased, identification of the suspected sister relationship is required.
Detection steps:
Blood was collected from each of the two women, and DNA was extracted by the methods described in Examples 1 and 2 above. Since this example requires inferring whether the samples derive from the same maternal line, the analysis is performed on the mitochondrial genome; the informative mitochondrial DNA results are shown below.
According to the analysis of the detection results, 18 sites on the mitochondrial genome in the DNA of the two samples do not conform to the rule of maternal inheritance (Table 6). From a genetic standpoint, therefore, the results do not support a sibling/half-sibling relationship between the DNA sources of the two labeled samples.
Table 6 The 18 loci not conforming to the rule of maternal inheritance
(Table 6 is provided as page images in the original publication: Figure BDA0001740368980000211 and Figure BDA0001740368980000221.)
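The comparison behind Table 6 can be pictured as a site-by-site check of two mitochondrial genotype calls: individuals who share a maternal line are expected to carry the same mitochondrial alleles, so each differing site counts against the relationship. The sketch below is a simplified illustration only. The site names, the alleles, and the function `mito_mismatches` are hypothetical, and heteroplasmy and new mutations are ignored.

```python
def mito_mismatches(sample_a, sample_b):
    """Return the sites genotyped in both samples whose alleles differ."""
    shared = sorted(set(sample_a) & set(sample_b))
    return [site for site in shared if sample_a[site] != sample_b[site]]

# Hypothetical mitochondrial calls keyed by site name (not data from Table 6).
sample_1 = {"mt73": "A", "mt150": "C", "mt263": "G", "mt489": "T"}
sample_2 = {"mt73": "G", "mt150": "C", "mt263": "G", "mt489": "C"}

diff = mito_mismatches(sample_1, sample_2)
print(len(diff), "mismatching sites:", diff)
```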
The genetic relationship identification method can be applied to noninvasive prenatal paternity testing, maternal lineage identification, sister identification, grandparent-grandchild identification, and ancestry determination. It can therefore be widely used in fields such as paternity testing, genealogy tracing, searches for missing (abducted) persons, and suspect investigation.
The method can also be applied to maternal-line tracing of sibling/non-sibling brothers and sisters, so it is particularly suitable for the special case of noninvasive paternity testing for third-generation test-tube babies (where the pregnant woman is not the egg donor), used solely to provide genetic evidence in court disputes. In addition, the method concentrates on capturing SNP sites that occur at high frequency in the Chinese population, which makes it better suited to genetic relationship identification in that population.
The invention selects loci whose allele frequency in the Chinese population is 40-60%, which makes it better suited to paternity identification in that population. The SNP loci are spread across the whole genome, because during selection the genome is cut into equal parts and a representative SNP is then chosen in each part. This avoids losing a large amount of SNP information when an individual carries a microdeletion or microduplication.
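The site-selection idea described above (keep SNPs with a frequency of 40-60%, cut each chromosome into equal parts, and take one representative per part) can be sketched as follows. This is only an illustration under stated assumptions: the method itself selects tag SNPs with WCLUSTAG V2 using linkage (R²), whereas this sketch substitutes a simple stand-in rule that picks the SNP whose MAF is closest to 0.5. The function name, the tuple layout, and the toy data are hypothetical.

```python
def select_representatives(snps, n_bins=500, maf_min=0.4, maf_max=0.6):
    """Pick one SNP per equal-width bin on one chromosome.

    snps: list of (name, position, maf) tuples for a single chromosome.
    """
    kept = [s for s in snps if maf_min <= s[2] <= maf_max]
    if not kept:
        return []
    lo = min(pos for _, pos, _ in kept)
    hi = max(pos for _, pos, _ in kept)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if only one position
    best = {}
    for name, pos, maf in kept:
        # The SNP at the maximum position falls into the last bin.
        idx = min(int((pos - lo) / width), n_bins - 1)
        # Stand-in criterion (NOT WCLUSTAG): MAF closest to 0.5 wins.
        if idx not in best or abs(maf - 0.5) < abs(best[idx][2] - 0.5):
            best[idx] = (name, pos, maf)
    return [best[i] for i in sorted(best)]

# Toy data with hypothetical SNP names; n_bins=2 keeps the example small.
toy = [("snpA", 100, 0.45), ("snpB", 150, 0.50), ("snpC", 900, 0.41),
       ("snpD", 950, 0.10), ("snpE", 1000, 0.58)]
picked = select_representatives(toy, n_bins=2)
print([name for name, _, _ in picked])
```

Because every part contributes at most one SNP, a local microdeletion or microduplication can remove only the few representatives that fall inside it, leaving the rest of the panel informative.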
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (2)

1. A noninvasive high-throughput detection method for genetic relationship identification, characterized by comprising the following steps:
1) for all SNPs on the 24 human chromosomes chrN, N = {1-22, X, Y}, listing the minor allele frequency (MAF) in the East Asian population from the August 2015 release of the 1000 Genomes database, the total set of all SNPs on the 24 chromosomes being denoted S1chrN;
2) selecting the SNPs with MAF values of 0.4-0.6 on the 24 chromosomes, the set of SNPs meeting this condition being denoted S2chrN;
3) finding the positions on each chromosome of all SNPs in set S2chrN; for each chromosome, taking the region between the SNP with the smallest position and the SNP with the largest position and dividing it into 500 equal parts; using each of the 500 parts as a sub-region, calculating the largest SNP cluster with WCLUSTAG V2 software, and selecting from that cluster the SNP with the largest "Max(1-R²)" value as the tag SNP of that part, thereby obtaining the SNP set S3chrN, where R² is a numerical measure of SNP linkage;
4) selecting 142 mitochondrial SNPs and combining them with S3chrN to obtain the SNP set S4chrN, the number of SNPs in set S4chrN being 11714;
5) further selecting 37 mitochondrial DNA regions as part of the detection target region, as a supplement to set S4chrN;
6) designing multiplex PCR primers for high-throughput sequencing that detect the SNPs in set S4chrN and the 37 mitochondrial DNA regions;
7) extracting DNA from the sample to be tested, performing multiplex PCR with the designed primers to construct a library, and performing high-throughput sequencing on the library;
8) analyzing the sequencing results, performing quality control on the sequencing data, removing low-quality reads, and obtaining the specific genotypes of the SNPs in set S4chrN and of the 37 mitochondrial DNA regions in the sample to be tested;
9) according to the specific type of genetic relationship to be identified, comparing and analyzing the specific genotypes of the samples to be tested to determine whether the samples have the corresponding genetic relationship;
the specific locations of the 37 mitochondrial DNA regions in the mitochondrial genome are shown below:
(Table of the 37 mitochondrial DNA region positions, provided as an image in the original publication.)
the names of the 142 mitochondrial SNPs are respectively:
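(List of the 142 mitochondrial SNP names, provided as an image in the original publication.)
2. The method of claim 1, wherein the genetic relationship identification comprises paternity testing, maternal lineage identification, sister identification, grandparent-grandchild identification, and ancestry identification.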
CN201810816125.XA 2018-07-24 2018-07-24 Noninvasive high-throughput detection method applied to crowd complex genetic relationship identification Active CN108998507B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810816125.XA CN108998507B (en) 2018-07-24 2018-07-24 Noninvasive high-throughput detection method applied to crowd complex genetic relationship identification

Publications (2)

Publication Number Publication Date
CN108998507A CN108998507A (en) 2018-12-14
CN108998507B true CN108998507B (en) 2022-03-29

Family

ID=64597655

Country Status (1)

Country Link
CN (1) CN108998507B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Huang Kailing

Inventor after: Chen Menglin

Inventor after: Luo Yingjun

Inventor before: Chen Menglin

Inventor before: Huang Kailing

Inventor before: Luo Yingjun

TA01 Transfer of patent application right

Effective date of registration: 20220216

Address after: 510535 floor 6, building D, No. 188, Kaiyuan Avenue, high tech Industrial Development Zone, Guangzhou, Guangdong

Applicant after: GUANGZHOU WANDE GENE MEDICAL TECHNOLOGY Co.,Ltd.

Address before: Sixth Floor of D Building, 188 Kaiyuan Avenue, Guangzhou High-tech Industrial Development Zone, Guangzhou, Guangdong Province

Applicant before: Chen Menglin

Applicant before: Zhang Nan

Applicant before: Huang Kailing

Applicant before: Luo Yingjun

Applicant before: Liu Yanhui

GR01 Patent grant