CN112735518B - ROH data analysis system based on chromosome microarray - Google Patents
- Publication number: CN112735518B
- Application number: CN202011612380.6A
- Authority: CN (China)
- Prior art keywords: ROH, module, fragment, UPD, analyzing
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
- G16B20/30: Detection of binding sites or motifs
- G16B20/40: Population genetics; Linkage disequilibrium
- G16B30/10: Sequence alignment; Homology search
- Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a ROH data analysis system based on a chromosome microarray, comprising a ROH data acquisition and screening module, a calculation module, and retrieval and analysis modules 1 to 4, connected in sequence. The ROH data acquisition and screening module acquires the off-machine data (the raw output of the array run) and selects ROH of 5 Mb or larger; the calculation module calculates the ratio of the total length of ROH ≥ 5 Mb to the total length of all autosomes; retrieval and analysis module 1 retrieves and analyzes the ROH passed on by the calculation module; retrieval and analysis module 2 retrieves and analyzes the ROH passed on by module 1; retrieval and analysis module 3 retrieves and analyzes the ROH passed on by module 2; and retrieval and analysis module 4 retrieves and analyzes the ROH passed on by module 3. The invention is suitable not only for analyzing ROH data from chromosome microarrays but also for analyzing ROH data obtained with other detection technologies, such as STR and WGS.
Description
Technical Field
The invention relates to bioinformatics, in particular to a ROH data analysis system based on a chromosome microarray.
Background
Chromosomes are the carriers of genetic information in the nucleus. Normal human cells have 23 pairs of chromosomes: 22 pairs of autosomes and 1 pair of sex chromosomes. Chromosomal or genomic abnormalities such as triploidy, aneuploidy, microdeletion, microduplication and regions of homozygosity are important causes of miscarriage, congenital malformation, mental disorders, growth retardation, tumorigenesis and the like. The chromosome microarray is a screening technology with higher resolution and higher throughput than traditional cytogenetic methods (such as karyotyping and FISH). It can detect changes at the microscopic level of the chromosome as well as changes at the submicroscopic level, namely microdeletions and microduplications (CNV changes), and, most importantly, it can detect abnormal regions of homozygosity (ROH). ROH is a continuous stretch of the genome that shows loss of heterozygosity. Most diploid cells, such as human somatic cells, carry two copies of the genome, one from the father and one from the mother; at a given locus, if the bases inherited from the father and the mother differ, the locus is heterozygous. If, through mechanisms such as consanguinity (distant or close relatedness of the parents) or gene conversion, consecutive loci over a given range are homozygous rather than heterozygous while the copy number remains 2, the region is a region of homozygosity (ROH). ROH arises either from identity by descent (IBD) or from uniparental disomy (UPD). IBD refers to two or more individuals inheriting the same nucleotide sequence from a common ancestor.
UPD means that both homologous chromosomes, or corresponding segments of them, originate from the same parent. This does not follow Mendelian inheritance and can lead to homozygous mutations in recessive genes or to genomic imprinting disorders, producing a wide variety of clinical phenotypes. ROH caused by IBD or UPD is ubiquitous in the population: ROH longer than 0.5-1 Mb is commonly used to study the genetic characteristics of populations, ROH longer than 3-5 Mb is commonly used for clinical analysis, ROH longer than 3-5 Mb on multiple chromosomes often suggests that the parents are related, and a single ROH longer than 10 Mb suggests that UPD may be present. Recessive genetic diseases caused by UPD (through secondary homozygosity of a single-gene mutation) commonly include autism, macular degeneration, type 2 cartilage dysplasia, severe combined immunodeficiency due to CD45 deficiency, Duchenne muscular dystrophy and spinal muscular atrophy. Imprinting disorders caused by UPD commonly include Prader-Willi syndrome, Angelman syndrome, transient neonatal diabetes, Silver-Russell syndrome and Beckwith-Wiedemann syndrome. In addition, acquired UPD (aUPD) is a common molecular event in tumor cells; a large aUPD segment renders the genes it spans homozygous, with a cumulative effect, which can silence tumor suppressor genes or cause expression of proto-oncogenes and thereby drive the clonal evolution of tumor cells.
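The size ranges quoted above can be summarised in a short sketch. The function name `roh_size_hints` and the use of the lower bound of each quoted range (0.5 Mb, 3 Mb, 10 Mb) as the cut-off are illustrative assumptions, not part of the invention.

```python
def roh_size_hints(length_mb: float) -> list[str]:
    """Return the uses the Background associates with a ROH of this length.

    Cut-offs are the lower bounds of the ranges quoted in the text
    (an assumption made for illustration only).
    """
    hints = []
    if length_mb > 0.5:
        hints.append("population genetics")
    if length_mb > 3.0:
        hints.append("clinical analysis")
    if length_mb > 10.0:
        hints.append("possible UPD if isolated")
    return hints
```

For instance, a 4 Mb ROH would be tagged for population genetics and clinical analysis but not as a UPD candidate.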
Current methods for detecting ROH include short tandem repeat (STR) analysis, methylation detection (MS-PCR/MS-MLPA), whole-exome (WES) or whole-genome (WGS) sequencing, and chromosome microarray (CMA). STR detection requires highly polymorphic STR markers to be selected according to the purpose of the test and the genomic position, which limits the method. MS-PCR/MS-MLPA cannot detect ROH caused by IBD, only UPD, and cannot distinguish UPD from imprinting defects. WES/WGS can misjudge a hemizygous deletion as a ROH fragment, so follow-up testing is needed to tell the two apart, which is costly. CMA is the most suitable technology for detecting ROH. However, because CMA is a high-throughput, high-resolution screening technology, it yields a very large amount of ROH information even when data accuracy is assured. Different thresholds must be set for screening according to the purpose, and a large body of literature and databases must be consulted to annotate the screened data before a sound report can be produced, which is time-consuming and laborious. Moreover, the analysis of chromosome microarray ROH data still relies on individual experience and lacks a scientific, systematic method, which makes such analysis very challenging. A scientific, systematic method for analyzing ROH data based on chromosome microarrays is therefore urgently needed.
Disclosure of Invention
The invention aims to provide a ROH data analysis system based on chromosome microarrays.
The invention also aims to provide a ROH data analysis method based on chromosome microarrays.
To achieve these objects, in a first aspect the invention provides a ROH data analysis system based on a chromosome microarray, comprising a ROH data acquisition and screening module, a calculation module, retrieval and analysis module 1, retrieval and analysis module 2, retrieval and analysis module 3 and retrieval and analysis module 4, connected in sequence:
1) The ROH data acquisition and screening module acquires the off-machine data of the chromosome microarray and selects ROH fragments of 5 Mb or larger;
2) The calculation module calculates the ratio of the total length of ROH fragments ≥ 5 Mb to the sum of the lengths of all autosomes. If the ratio is ≥ 6.25%, the risk of autosomal recessive genetic disease is high and the ROH is reported as possibly pathogenic ROH; if the ratio is < 6.25%, the ROH fragment is passed to retrieval and analysis module 1;
3) Retrieval and analysis module 1 retrieves and analyzes the ROH fragments from the calculation module; module 1 contains a normal population ROH database;
If the normal population ROH database contains a ROH with a population frequency above 1% whose genomic coordinates overlap the target ROH fragment by ≥ 80%, the fragment is reported as benign ROH;
If this condition is not satisfied, the ROH fragment is passed to retrieval and analysis module 2;
4) Retrieval and analysis module 2 further retrieves and analyzes the ROH fragments from module 1; module 2 contains a database of known genetic syndromes related to UPD;
If the ROH fragment overlaps the genomic coordinates of a UPD fragment in the UPD-related known genetic syndrome database by ≥ 80%, UPD risk is indicated and the fragment is reported as possibly pathogenic ROH;
If this condition is not satisfied, the ROH fragment is passed to retrieval and analysis module 3;
5) Retrieval and analysis module 3 further retrieves and analyzes the ROH fragments from module 2; module 3 contains a UPD-related tumor case database;
If the ROH fragment overlaps the genomic coordinates of a UPD fragment in the UPD-related tumor case database by ≥ 80% and lies at the end of a chromosome, it is reported as pathogenic ROH;
If the ROH fragment overlaps the genomic coordinates of a UPD fragment in the UPD-related tumor case database by ≥ 80% and does not lie at the end of a chromosome, a risk of tumorigenesis is indicated and the fragment is reported as possibly pathogenic ROH;
If neither condition is satisfied, the ROH fragment is passed to retrieval and analysis module 4;
6) Retrieval and analysis module 4 further retrieves and analyzes the ROH fragments from module 3; module 4 contains the UCSC database;
If the ROH fragment contains genes related to Mendelian recessive genetic diseases or tumor-related genes recorded in the UCSC database, a sequencing test is suggested to rule out a homozygous variant in a potentially pathogenic gene, and the fragment is reported as ROH of currently uncertain clinical significance;
If the ROH fragment contains no such gene recorded in the UCSC database, it is likewise reported as ROH of currently uncertain clinical significance.
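The order in which the six modules hand a fragment on can be sketched as a simple decision cascade. This is a minimal illustration, not the patented implementation: the `Roh` record, the `classify` function, the report strings (paraphrased from the text) and the four lookup callables that stand in for the database searches are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Roh:
    chrom: str
    start: int               # bp
    end: int                 # bp
    terminal: bool = False   # True if the fragment lies at a chromosome end


# Each lookup stands in for one database search (modules 1 to 4) and is
# assumed to apply that module's own matching rule, e.g. >= 80% overlap.
Lookup = Callable[[Roh], bool]


def classify(roh: Roh, autosomal_ratio: float, in_normal_db: Lookup,
             in_syndrome_db: Lookup, in_tumor_db: Lookup,
             has_disease_gene: Lookup) -> str:
    """Walk the modules in sequence and return the first report that applies."""
    if autosomal_ratio >= 0.0625:            # calculation module
        return "possibly pathogenic ROH (autosomal recessive risk)"
    if in_normal_db(roh):                    # module 1
        return "benign ROH"
    if in_syndrome_db(roh):                  # module 2
        return "possibly pathogenic ROH (UPD risk)"
    if in_tumor_db(roh):                     # module 3
        if roh.terminal:
            return "pathogenic ROH"
        return "possibly pathogenic ROH (tumorigenesis risk)"
    if has_disease_gene(roh):                # module 4
        return "ROH of uncertain clinical significance (sequencing suggested)"
    return "ROH of uncertain clinical significance"
```

Passing the lookups in as callables keeps the cascade independent of how each database is stored; a terminal fragment that matches only the tumor case database, for example, comes back as "pathogenic ROH".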
Preferably, the normal population ROH database is as set forth in Table 1.
TABLE 1 (table not reproduced in this text)
Preferably, the UPD-related database of known genetic syndromes is as shown in Table 2.
TABLE 2 (table not reproduced in this text)
Preferably, the UPD-related tumor case database is as presented in Table 3.
TABLE 3 (table not reproduced in this text)
In a second aspect, the invention provides a method for analyzing ROH data based on a chromosome microarray, comprising:
1. Provided the data pass quality control (MAPD ≤ 0.25, SNPQC ≥ 15.0 and Waviness SD ≤ 0.12), 5 Mb is set as the reporting threshold for ROH. The 5 Mb threshold is the optimal threshold for clinical ROH analysis and was established through prior validation on a large sample set. ROH fragments of 5 Mb or larger are selected.
2. The proportion of the total length of the selected ROH fragments to the sum of the lengths of all autosomes (2881 Mb) is calculated:
If the proportion is ≥ 6.25%, consanguineous mating is inferred and the risk of autosomal recessive genetic disease is increased; the ROH is reported directly as possibly pathogenic ROH;
If the proportion is < 6.25%, the ROH fragment is searched against the normal population ROH database.
3. If the normal population ROH database contains a ROH with a population frequency above 1% that overlaps the target ROH by at least 80%, the fragment is reported as benign ROH;
If this condition is not met, the ROH fragment is searched against the UPD-related database of known genetic syndromes.
4. If the UPD-related database of known genetic syndromes contains a UPD that overlaps the target ROH fragment by at least 80%, UPD risk is indicated, testing of parental samples is recommended to verify whether the ROH is UPD, and the fragment is reported as possibly pathogenic ROH;
If this condition is not met, the ROH fragment is searched against the UPD-related tumor case database.
5. If the UPD-related tumor case database contains a UPD that overlaps the target ROH fragment by at least 80% and the ROH lies at the end of a chromosome, it is reported as pathogenic ROH;
If the UPD-related tumor case database contains a UPD that overlaps the target ROH fragment by at least 80% but the ROH does not lie at the end of a chromosome, a risk of tumorigenesis is indicated, testing of a control sample is recommended to verify whether the target ROH is an acquired ROH/UPD, and the fragment is reported as possibly pathogenic ROH;
If neither condition is met, the UCSC database is queried to determine whether the ROH fragment contains a gene related to a Mendelian recessive genetic disease, or a gene closely related to tumors, that may have serious consequences.
If the target ROH fragment contains such a gene, a sequencing test is recommended to rule out a homozygous variant in a potentially pathogenic gene, and the fragment is reported as ROH of currently uncertain clinical significance;
If it does not, the fragment is likewise reported as ROH of currently uncertain clinical significance.
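Steps 1 and 2 reduce to a filter and a ratio. The sketch below assumes the sample has already passed quality control and that the input holds the lengths, in Mb, of autosomal ROH fragments only; the function and constant names are hypothetical.

```python
AUTOSOME_TOTAL_MB = 2881.0   # sum of autosomal lengths quoted in step 2
MIN_ROH_MB = 5.0             # reporting threshold from step 1
RATIO_CUTOFF = 0.0625        # 6.25 %


def screen(roh_lengths_mb):
    """Keep only ROH fragments of 5 Mb or larger (step 1)."""
    return [length for length in roh_lengths_mb if length >= MIN_ROH_MB]


def autosomal_ratio(roh_lengths_mb):
    """Total length of the screened fragments over the autosomal total (step 2)."""
    return sum(screen(roh_lengths_mb)) / AUTOSOME_TOTAL_MB


def exceeds_cutoff(roh_lengths_mb):
    """True when the ratio reaches the 6.25 % cut-off for direct reporting."""
    return autosomal_ratio(roh_lengths_mb) >= RATIO_CUTOFF
```

With four 50 Mb fragments the ratio is 200/2881, which is above the cut-off; with three it is 150/2881, which is below it, so the fragments would go on to the database searches.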
Compared with the prior art, the invention has at least the following advantages:
(I) The invention establishes a rigorous, scientifically grounded ROH screening threshold, derived statistically from the relationship between ROH size and clinical phenotype in a large sample set.
(II) Rather than analyzing the obtained ROH data directly, the invention first calculates the segment ratio. Because 6.25% homozygosity corresponds to a third-degree relationship between the parents, if the screened ROH is distributed over multiple chromosomes and its total size accounts for 6.25% or more of the total autosomal length, consanguineous mating is indicated and the ROH can be reported directly as possibly pathogenic ROH without further analysis.
(III) The invention provides three search databases: a normal population ROH database, a UPD-related known genetic syndrome database and a UPD-related tumor case database. The normal population ROH database was built from chromosome microarray data of 381 normal individuals; ROH in the normal population is found on chromosomes 1-22 and on the X and Y chromosomes, at varying frequencies. The UPD-related known genetic syndrome database and the UPD-related tumor case database were compiled from literature reports and public databases.
(IV) More than 10 genetic syndromes of known etiology are associated with ROH/UPD, and all are recorded in the database. The UPD-related tumor database mainly contains UPD associated with hematological tumors, comprising ROH/UPD from 538 hematological tumor cases in total. The three databases greatly simplify ROH data analysis, reducing the analysis time from 0.5-1 day to 0.5-1 hour.
(V) The invention is applicable not only to ROH data from chromosome microarrays but also to ROH data obtained by other detection techniques, such as short tandem repeat (STR) analysis and whole-exome (WES) or whole-genome (WGS) sequencing.
(VI) The ROH analysis method of the invention takes into account both constitutional and acquired causes.
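Two of the figures above can be checked by hand. The sketch assumes the standard result that the offspring of third-degree relatives (for example first cousins) are expected to be homozygous by descent over 1/16 of the genome, uses the 2881 Mb autosomal total quoted in the method, and assumes that population frequency in the normal database is a simple count n of individuals carrying the ROH divided by the 381 individuals. The symbols F, L_min and n are introduced here for illustration only.

```latex
\begin{align*}
F &= \tfrac{1}{16} = 6.25\% \\
L_{\min} &= 0.0625 \times 2881\ \text{Mb} \approx 180.1\ \text{Mb} \\
\frac{n}{381} &> 1\% \iff n > 3.81 \iff n \ge 4
\end{align*}
```

Under these assumptions, roughly 180 Mb of total ROH in fragments of 5 Mb or larger triggers the direct report, and a ROH must be seen in at least 4 of the 381 normal individuals to exceed the 1% population frequency.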
Drawings
FIG. 1 is a flow chart of ROH data analysis based on a chromosome microarray in a preferred embodiment of the invention.
FIG. 2 is a roadmap of ROH data analysis based on a chromosome microarray in accordance with a preferred embodiment of the invention.
Detailed Description
The following examples are illustrative of the invention and are not intended to limit the scope of the invention. Unless otherwise indicated, the technical means used in the examples are conventional means well known to those skilled in the art, and all raw materials used are commercially available.
Example 1: Establishment of a chromosome microarray-based ROH data analysis method
The embodiment provides a ROH data analysis method based on chromosome microarray, the analysis flow is shown in FIG. 1, and the analysis route is shown in FIG. 2. The specific method comprises the following steps:
1. Provided the data pass quality control (MAPD ≤ 0.25, SNPQC ≥ 15.0 and Waviness SD ≤ 0.12), 5 Mb is set as the reporting threshold for ROH. The 5 Mb threshold is the optimal threshold for clinical ROH analysis and was established through prior validation on a large sample set. ROH fragments of 5 Mb or larger are selected.
2. The proportion of the total length of the selected ROH fragments to the sum of the lengths of all autosomes (2881 Mb) is calculated:
If the proportion is ≥ 6.25%, consanguineous mating is inferred and the risk of autosomal recessive genetic disease is increased; the ROH is reported directly as possibly pathogenic ROH;
If the proportion is < 6.25%, the ROH fragment is searched against the normal population ROH database.
3. If the normal population ROH database contains a ROH with a population frequency above 1% that overlaps the target ROH by at least 80%, the fragment is reported as benign ROH;
If this condition is not met, the ROH fragment is searched against the UPD-related database of known genetic syndromes.
4. If the UPD-related database of known genetic syndromes contains a UPD that overlaps the target ROH fragment by at least 80%, UPD risk is indicated, testing of parental samples is recommended to verify whether the ROH is UPD, and the fragment is reported as possibly pathogenic ROH;
If this condition is not met, the ROH fragment is searched against the UPD-related tumor case database.
5. If the UPD-related tumor case database contains a UPD that overlaps the target ROH fragment by at least 80% and the ROH lies at the end of a chromosome, it is reported as pathogenic ROH;
If the UPD-related tumor case database contains a UPD that overlaps the target ROH fragment by at least 80% but the ROH does not lie at the end of a chromosome, a risk of tumorigenesis is indicated, testing of a control sample is recommended to verify whether the target ROH is an acquired ROH/UPD, and the fragment is reported as possibly pathogenic ROH;
If neither condition is met, the UCSC database is queried to determine whether the ROH fragment contains a gene related to a Mendelian recessive genetic disease, or a gene closely related to tumors, that may have serious consequences.
If the target ROH fragment contains such a gene, a sequencing test is recommended to rule out a homozygous variant in a potentially pathogenic gene, and the fragment is reported as ROH of currently uncertain clinical significance;
If it does not, the fragment is likewise reported as ROH of currently uncertain clinical significance.
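The 80% overlap test and the chromosome-end test used in steps 3 to 5 can be sketched on plain genomic intervals. The text does not say which length the overlap is measured against, nor how close to the chromosome end a ROH must lie; the sketch assumes the overlap is a fraction of the target ROH length and that "at the end" means reaching the first or last base within a caller-chosen tolerance. All names are hypothetical.

```python
def overlap_fraction(target, reference):
    """Fraction of the target interval covered by the reference interval.

    Intervals are (chrom, start, end) tuples in bp, 0-based and half-open.
    Measuring against the target length is an assumption of this sketch.
    """
    t_chrom, t_start, t_end = target
    r_chrom, r_start, r_end = reference
    if t_chrom != r_chrom or t_end <= t_start:
        return 0.0
    shared = min(t_end, r_end) - max(t_start, r_start)
    return max(shared, 0) / (t_end - t_start)


def matches(target, reference, min_fraction=0.80):
    """True when the reference covers at least min_fraction of the target."""
    return overlap_fraction(target, reference) >= min_fraction


def is_terminal(target, chrom_length, tolerance=0):
    """True when the target reaches either end of its chromosome."""
    _, start, end = target
    return start <= tolerance or end >= chrom_length - tolerance
```

A database search then amounts to calling `matches` against every record on the same chromosome and, for the tumor case database, checking `is_terminal` on the target before choosing between the two report categories.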
By drawing on multiple databases, the invention allows the target ROH abnormality and its clinical significance to be understood more objectively and comprehensively. However, interpretation of ROH results depends on existing database searches and literature reports, so the clinical interpretation reflects the current state of research on related cases. In addition, because of complex factors such as genetic pleiotropy, delayed-onset dominance, incomplete penetrance and variable expressivity, the clinical presentation of a subject may differ from the interpretation.
While the invention has been described in detail in the foregoing general description and with reference to specific embodiments thereof, it will be apparent to one skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.
Claims (1)
1. The ROH data analysis system based on the chromosome microarray is characterized by comprising an ROH data acquisition and screening module, a calculation module, a retrieval and analysis module 1, a retrieval and analysis module 2, a retrieval and analysis module 3 and a retrieval and analysis module 4 which are connected in sequence:
1) The ROH data acquisition and screening module is used for acquiring the off-machine data of the chromosome microarray and screening out the data with the ROH fragment size of more than or equal to 5 Mb;
2) The calculating module is used for calculating the ratio of the total length of the ROH fragment which is more than or equal to 5Mb to the sum of the lengths of all autosomes, and if the ratio is more than or equal to 6.25%, the risk of the autosomal recessive genetic disease is high, the possible pathogenic ROH is reported; if the ratio is less than 6.25%, inputting the ROH fragment into the searching and analyzing module 1;
3) The retrieval and analysis module 1 is used for retrieving and analyzing the ROH fragments from the calculation module, wherein module 1 comprises a normal-population ROH database;
if the ROH fragment overlaps, by greater than or equal to 80% in genomic coordinates, a target ROH fragment that has a population frequency of more than 1% in the normal-population ROH database, it is reported as benign ROH;
if this condition is not satisfied, the ROH fragment is input into the retrieval and analysis module 2;
4) The retrieval and analysis module 2 is used for further retrieving and analyzing the ROH fragments from module 1, wherein module 2 comprises a database of known UPD-related genetic syndromes;
if the ROH fragment overlaps, by greater than or equal to 80% in genomic coordinates, a target UPD fragment in the database of known UPD-related genetic syndromes, UPD risk is indicated and the fragment is reported as possibly pathogenic ROH;
if this condition is not satisfied, the ROH fragment is input into the retrieval and analysis module 3;
5) The retrieval and analysis module 3 is used for further retrieving and analyzing the ROH fragments from module 2, wherein module 3 comprises a UPD-related tumor case database;
if the ROH fragment overlaps, by greater than or equal to 80% in genomic coordinates, a target UPD fragment in the UPD-related tumor case database and the ROH fragment is located at the terminal end of a chromosome, it is reported as pathogenic ROH;
if the ROH fragment overlaps, by greater than or equal to 80% in genomic coordinates, a target UPD fragment in the UPD-related tumor case database and the ROH fragment is not located at the terminal end of a chromosome, a risk of tumorigenesis is indicated and the fragment is reported as possibly pathogenic ROH;
if the above conditions are not satisfied, the ROH fragment is input into the retrieval and analysis module 4;
6) The retrieval and analysis module 4 is used for further retrieving and analyzing the ROH fragments from module 3, wherein module 4 comprises the UCSC database;
if the ROH fragment contains Mendelian recessive genetic disease-related genes or tumor-related genes recorded in the UCSC database, a sequencing test is recommended to exclude a homozygous variant in a potentially pathogenic gene, and the fragment is reported as ROH whose clinical significance is pending determination;
if the ROH fragment does not contain any Mendelian recessive genetic disease-related gene or tumor-related gene recorded in the UCSC database, the fragment is reported as ROH of currently unclear clinical significance.
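The sequential decision flow recited in the claim can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the `Region` type, the function names, the in-memory lists standing in for the four databases, the precomputed `terminal` flag, and the report strings are all hypothetical. The claim does not state which fragment length the 80% overlap is measured against; the sketch assumes it is the length of the query ROH fragment.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_ROH_BP = 5_000_000    # step 1: keep ROH fragments >= 5 Mb
RATIO_CUTOFF = 0.0625     # step 2: 6.25% of total autosome length
OVERLAP_CUTOFF = 0.80     # modules 1-3: >= 80% coordinate overlap
POP_FREQ_CUTOFF = 0.01    # module 1: > 1% of the normal population

HIGH_RATIO = "possibly pathogenic ROH (high autosomal recessive risk)"
BENIGN = "benign ROH"
UPD_RISK = "possibly pathogenic ROH (UPD risk)"
PATHOGENIC = "pathogenic ROH"
TUMOR_RISK = "possibly pathogenic ROH (tumorigenesis risk)"
PENDING = "clinical significance pending (sequencing recommended)"
UNCLEAR = "clinical significance currently unclear"


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int               # 0-based, half-open interval [start, end)
    end: int
    terminal: bool = False   # assumed to be flagged upstream for ROH fragments

    @property
    def length(self) -> int:
        return self.end - self.start


def overlap_fraction(roh: Region, ref: Region) -> float:
    """Fraction of the query ROH covered by a database region (assumption)."""
    if roh.chrom != ref.chrom or roh.length <= 0:
        return 0.0
    shared = min(roh.end, ref.end) - max(roh.start, ref.start)
    return max(0, shared) / roh.length


def contains(roh: Region, gene: Region) -> bool:
    """True if the gene lies entirely inside the ROH fragment."""
    return (roh.chrom == gene.chrom
            and roh.start <= gene.start and gene.end <= roh.end)


def classify_fragment(roh: Region,
                      normal_db: List[Tuple[Region, float]],
                      upd_syndrome_db: List[Region],
                      upd_tumor_db: List[Region],
                      disease_genes: List[Region]) -> str:
    # Module 1: common ROH in the normal population
    for ref, freq in normal_db:
        if freq > POP_FREQ_CUTOFF and overlap_fraction(roh, ref) >= OVERLAP_CUTOFF:
            return BENIGN
    # Module 2: known UPD-related genetic syndromes
    if any(overlap_fraction(roh, r) >= OVERLAP_CUTOFF for r in upd_syndrome_db):
        return UPD_RISK
    # Module 3: UPD-related tumor cases, split by chromosome-end location
    if any(overlap_fraction(roh, r) >= OVERLAP_CUTOFF for r in upd_tumor_db):
        return PATHOGENIC if roh.terminal else TUMOR_RISK
    # Module 4: recessive-disease or tumor genes inside the fragment
    if any(contains(roh, g) for g in disease_genes):
        return PENDING
    return UNCLEAR


def analyze(rohs: List[Region], autosome_total_bp: int,
            normal_db, upd_syndrome_db, upd_tumor_db, disease_genes):
    """Return (ratio, [(fragment, report label), ...]) for fragments >= 5 Mb."""
    kept = [r for r in rohs if r.length >= MIN_ROH_BP]
    ratio = sum(r.length for r in kept) / autosome_total_bp
    if ratio >= RATIO_CUTOFF:
        # High overall homozygosity is reported without entering modules 1-4.
        return ratio, [(r, HIGH_RATIO) for r in kept]
    return ratio, [(r, classify_fragment(r, normal_db, upd_syndrome_db,
                                         upd_tumor_db, disease_genes))
                   for r in kept]
```

Each module returns as soon as its condition is met, which mirrors the "connected in sequence" wording: a fragment reaches module 4 only after failing the conditions of modules 1 to 3.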
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011612380.6A CN112735518B (en) | 2020-12-30 | 2020-12-30 | ROH data analysis system based on chromosome microarray |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011612380.6A CN112735518B (en) | 2020-12-30 | 2020-12-30 | ROH data analysis system based on chromosome microarray |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112735518A (en) | 2021-04-30 |
CN112735518B (en) | 2024-04-23 |
Family
ID=75611059
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011612380.6A Active CN112735518B (en) | 2020-12-30 | 2020-12-30 | ROH data analysis system based on chromosome microarray |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112735518B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107058547A (en) * | 2017-04-28 | 2017-08-18 | 首度生物科技(苏州)有限公司 | A sperm detection method |
CN110021364A (en) * | 2017-11-24 | 2019-07-16 | 上海暖闻信息科技有限公司 | Analysis and detection system for screening pathogenic genes of monogenic genetic diseases based on patient clinical symptom data and whole-exome sequencing data |
CN110931081A (en) * | 2019-11-28 | 2020-03-27 | 广州基迪奥生物科技有限公司 | Biological information analysis method for human monogenic genetic disease detection |
CN111199773A (en) * | 2020-01-20 | 2020-05-26 | 中国农业科学院北京畜牧兽医研究所 | Evaluation method of fine positioning character associated genome homozygous fragments |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190057134A1 (en) * | 2017-08-21 | 2019-02-21 | Eitan Moshe Akirav | System and method for automated microarray information citation analysis |
Also Published As
Publication number | Publication date |
---|---|
CN112735518A (en) | 2021-04-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||