CN115948521A - Method for detecting aneuploid missing chromosome information - Google Patents
- Publication number
- CN115948521A (application number CN202211716471.3A)
- Authority
- CN
- China
- Prior art keywords
- chromosome
- sequencing
- organism
- detecting
- scatter diagram
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses a method for detecting aneuploid missing chromosome information, comprising the following steps: extracting DNA from the organism to be tested and performing whole-genome sequencing to obtain sequencing reads; aligning the sequencing reads to a reference genome and obtaining a sequencing-depth frequency scatter plot for each chromosome of the organism; fitting the frequency scatter plot of each chromosome and obtaining the sequencing depths corresponding to all Gaussian peaks in the fitted curve; and clustering the sequencing depths to obtain the chromosome ploidy of the organism. The method is based on second-generation high-throughput sequencing; for large batches of samples it can greatly shorten the time required compared with traditional detection methods, can be automated, and offers standardization and repeatability.
Description
Technical Field
The invention belongs to the field of genome sequencing and bioinformatics, and particularly relates to a method for detecting aneuploid missing chromosome information.
Background
In most cases aneuploidy is fatal to animals and humans, but plants, allopolyploid plants in particular, often tolerate it much better. Aneuploids offer unmatched advantages for determining the physical positions of genes and molecular markers, for gene transfer, and for establishing the correspondence between linkage groups and chromosomes. They are therefore of great significance for plant genetics and breeding research and have already produced many results in practical breeding.
Aneuploidy research makes it possible to clarify the genetic rules governing a plant's traits more quickly and systematically and to determine the relationships between its chromosomes and those of related plants, so that new varieties with special, superior traits can be bred more systematically. However, because such research involves a large number of hybridization experiments, the workload is heavy and the time required is long, and research on forest trees is comparatively lacking. Because forest trees have long growth cycles and the direction of a breeding program is difficult to adjust once it is under way, sufficient chromosome information must be obtained before formal experiments begin.
Karyotype analysis of individual chromosome sets, such as C-banding and G-banding, flow cytometry, and fluorescence in situ hybridization (FISH) with chromosome-specific probes are the common methods for identifying aneuploids today. However, most of these methods strongly favor particular types of experimental material and require lengthy experimental preparation, so they are laborious when large numbers of experimental materials must be screened. A ploidy analyzer can quickly determine whether individuals in a constructed population are aneuploid, but it is difficult to determine the specific chromosome composition of each individual with it.
In addition, conventional high-throughput sequencing followed by a t-test for a significant difference in coverage depth between a sample and a standard reference sample has been applied clinically to human conditions such as Down syndrome and trisomy 18 syndrome. In plant aneuploid breeding, however, this approach is difficult to apply because of the large number of hybrid varieties, large genomic variation, large gains and losses in chromosome number, and the difficulty of obtaining a standard reference. Therefore, a method for detecting aneuploid missing chromosome information is needed.
Disclosure of Invention
The invention aims to provide a method for detecting aneuploid missing chromosome information so as to solve the problems in the prior art.
In order to achieve the above object, the present invention provides a method for detecting aneuploid missing chromosome information, comprising the steps of:
extracting DNA from an organism to be tested and performing whole-genome sequencing to obtain sequencing reads;
aligning the sequencing reads to a reference genome, and obtaining a frequency scatter plot for each chromosome of the organism to be tested;
fitting the frequency scatter plot of each chromosome, and obtaining the sequencing depths corresponding to all Gaussian peaks in the fitted curve;
and clustering the sequencing depths to obtain the chromosome ploidy of the organism to be tested.
Optionally, before whole-genome sequencing of the extracted DNA, the method further comprises: checking the integrity of the DNA by agarose gel electrophoresis, and measuring the concentration of the DNA with a microplate reader.
Optionally, the reference genome is the genome of the species of the organism to be tested or of a closely related species, and has been assembled to the chromosome level.
Optionally, the process of obtaining the frequency scatter plot of each chromosome comprises: obtaining the sequencing depth of each base on each chromosome, and counting the frequency of occurrence of each sequencing depth to obtain the frequency scatter plot of each chromosome.
Optionally, fitting the frequency scatter plot of each chromosome comprises: fitting the frequency scatter plot of each chromosome to a single Gaussian curve or to a Gaussian mixture model formed by superposing x Gaussian curves, where x is the number of peaks in the frequency scatter plot of that chromosome.
Optionally, before the sequencing depths are clustered, the method further comprises: sorting the sequencing depths to obtain the Gaussian peak with the largest sequencing depth in each chromosome.
Optionally, the process of obtaining the chromosome ploidy of the organism to be tested comprises: performing one-dimensional array clustering on the sequencing depths to obtain different cluster groups; obtaining the median sequencing depth of each cluster group based on the ploidy relationship among the medians of the different cluster groups; and classifying the chromosomes based on the median sequencing depths of the cluster groups and the Gaussian peak with the largest sequencing depth in each chromosome, thereby obtaining the chromosome ploidy of the organism to be tested.
The invention has the technical effects that:
The method identifies the true sequencing depth of each chromosome by counting how often each sequencing depth occurs across the genome, decomposes the sequencing-depth frequency curve with a Gaussian mixture model so that the various factors causing unstable chromosome sequencing depth can be separated, and applies a clustering algorithm to group all sequencing-depth peaks and thereby determine the single-copy (monosomic) sequencing depth, which improves the accuracy of chromosome ploidy detection.
The method is based on second-generation high-throughput sequencing; for large batches of samples it can greatly shorten the time required compared with traditional detection methods, can be automated, and offers standardization and repeatability.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flowchart of a method for detecting aneuploidy missing chromosome information in an embodiment of the invention;
FIG. 2 is a frequency scatter plot of the 19 chromosomes in an embodiment of the invention;
FIG. 3 shows the curve obtained by Gaussian fitting for sample No. 1 in an embodiment of the invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
Example one
As shown in fig. 1, the present embodiment provides a method for detecting aneuploidy missing chromosome information, including the following steps:
DNA extraction:
A DNA extraction protocol suited to the characteristics of the animal or plant sample to be tested is selected, and the DNA content and purity are measured so that sample quality meets the official loading standard of the sequencer.
In this example, 6 aneuploid plants obtained by hybridization were selected as experimental samples and two aneuploid plants were selected as controls, and standard DNA extraction was performed using the MGIEasy universal DNA library preparation kit. After extraction, sample integrity was checked by agarose gel electrophoresis and concentration was measured with a microplate reader using the DNABR detection kit. The results show that the quality of all extracted samples met the loading standard of the sequencing platform.
Whole genome sequencing:
Based on second-generation high-throughput sequencing, sequencing libraries are prepared and run on an Illumina or BGI sequencing platform according to the official instruction manual; instrument parameters and operating procedures strictly follow the manual for the sequencing platform used.
In this example, sequencing libraries were prepared and run on an MGISEQ-2000 sequencing platform according to the official instruction manual. The library type was DNBSEQ WGS and the sequencing mode was PE150 whole-genome sequencing; instrument parameters and operating procedures strictly followed the manual for that platform.
Aligning the sequencing reads to the reference genome and counting sequencing depth:
After the raw data come off the sequencer, the paired-end reads are aligned to a reference genome. The reference genome may be that of the species itself or of a closely related species, but it must be assembled to the chromosome level. Because alleles differ between individuals within a species, the alignment method should tolerate mismatches as far as possible. After alignment, the sequencing depth of each nucleotide on the reference genome is calculated, and the frequency of occurrence of each sequencing depth is counted chromosome by chromosome. The results are presented as a scatter plot, with sequencing depth on the abscissa and the frequency of occurrence of that depth on the ordinate.
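The alignment and depth-counting step could be scripted roughly as follows. This is only a sketch under stated assumptions: BWA-MEM is the aligner named in the worked example, while the use of samtools for sorting and depth counting, the file names, and the helper `build_depth_pipeline` are illustrative choices that the source does not specify. The function only builds command strings and does not execute anything.

```python
from typing import List


def build_depth_pipeline(ref_fasta: str, reads_1: str, reads_2: str,
                         prefix: str, threads: int = 8) -> List[str]:
    """Return shell command strings for alignment and per-base depth counting.

    All file names are placeholders (assumptions); nothing is executed here.
    """
    bam = f"{prefix}.sorted.bam"
    return [
        # Index the reference once so that bwa can align against it.
        f"bwa index {ref_fasta}",
        # Align the paired-end reads and sort the alignments by coordinate.
        f"bwa mem -t {threads} {ref_fasta} {reads_1} {reads_2} "
        f"| samtools sort -@ {threads} -o {bam} -",
        f"samtools index {bam}",
        # -a reports every position, including positions with zero depth.
        f"samtools depth -a {bam} > {prefix}.depth.tsv",
    ]


commands = build_depth_pipeline("ref.fa", "s1_R1.fq.gz", "s1_R2.fq.gz", "sample1")
for command in commands:
    print(command)
```

The resulting depth table has one row per reference position (chromosome, position, depth), which is the input the frequency count needs.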
Taking sample No. 1 as an example, 266.24M paired-end reads were obtained from the sequencer. After low-quality reads were filtered out, the reads were aligned to the poplar reference genome using BWA-MEM, with an alignment rate of 92.06%. The sequencing depth of each base on each chromosome was then calculated, and frequencies were counted chromosome by chromosome. FIG. 2 shows the line-connected scatter plots of the 19 chromosomes; the abscissa is sequencing depth and the ordinate is the frequency of occurrence of that depth.
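The per-chromosome frequency count can be illustrated with a short sketch. It assumes the per-base depths are available as tab-separated lines of chromosome, position and depth; that input layout, the function name `depth_frequencies` and the toy values are assumptions for illustration, not data from the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def depth_frequencies(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count, per chromosome, how many bases have each sequencing depth.

    Each line is assumed to be tab-separated: chromosome, position, depth.
    """
    freq: Dict[str, Counter] = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chrom, _pos, depth = line.split("\t")
        freq[chrom][int(depth)] += 1
    return dict(freq)


# Toy input (hypothetical values, not taken from the patent).
example = [
    "chr1\t1\t34", "chr1\t2\t34", "chr1\t3\t68",
    "chr2\t1\t33", "chr2\t2\t35",
]
hist = depth_frequencies(example)
# Points of the scatter plot for chr1, sorted by depth: (depth, frequency).
chr1_points = sorted(hist["chr1"].items())
print(chr1_points)
```

Each `(depth, frequency)` pair corresponds to one point of the scatter plot for that chromosome.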
Fitting a curve for each chromosome and calculating the peaks:
Using Gaussian fitting, the frequency scatter plot of each chromosome is fitted to a single Gaussian curve or to a Gaussian mixture model formed by superposing x Gaussian curves, where x is the number of peaks in the frequency curve. After fitting, the sequencing depths corresponding to all Gaussian peaks are recorded and sorted in ascending order.
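One way to write the fitted model down is given here as a sketch. The symbols A_i, \mu_i, \sigma_i, d_j, n_j and \bar{n} are notation introduced for illustration and do not appear in the source; the second formula is the usual coefficient of determination, assumed to be what the example reports as R-Square.

```latex
% Frequency model for one chromosome: a sum of x Gaussian curves over depth d.
% A_i = peak height, \mu_i = peak sequencing depth, \sigma_i = peak width.
f(d) = \sum_{i=1}^{x} A_i \exp\!\left( -\frac{(d - \mu_i)^2}{2\sigma_i^2} \right)

% Goodness of fit over the observed points (d_j, n_j), with \bar{n} the mean frequency.
R^2 = 1 - \frac{\sum_j \bigl( n_j - f(d_j) \bigr)^2}{\sum_j \bigl( n_j - \bar{n} \bigr)^2}
```

Under this notation, the values recorded after fitting are the peak positions \mu_1 \le \dots \le \mu_x.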
Curve fitting was performed on each chromosome. As shown in FIG. 3, taking chromosome 1 of sample No. 1 as an example, Gaussian fitting according to the number of peaks yielded 2 normal distribution curves, with an R-Square (R²) of 0.988. The sequencing depths corresponding to the fitted curve peaks, 34X and 68X, were recorded. This was repeated for the remaining 18 chromosomes, finally giving 38 normal distribution curves.
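A Gaussian mixture can be fitted to such a depth histogram in several ways. The sketch below uses expectation-maximization (EM) on the frequency-weighted depths, which is a swapped-in technique: the source reports R-Square, which suggests a least-squares curve fit instead. The function `fit_mixture`, its parameters and the synthetic histogram are assumptions for illustration; the starting means are meant to be read off the visible peaks.

```python
import math
from typing import Dict, List, Tuple


def fit_mixture(hist: Dict[int, int], init_means: List[float],
                init_sigma: float = 5.0,
                iters: int = 200) -> List[Tuple[float, float, float]]:
    """Fit a 1-D Gaussian mixture to a depth histogram with EM.

    hist maps sequencing depth -> frequency; one component per initial mean.
    Returns (weight, mean, sigma) per component, sorted by mean.
    """
    k = len(init_means)
    means = list(init_means)
    sigmas = [init_sigma] * k
    weights = [1.0 / k] * k
    points = [(d, n) for d, n in hist.items() if n > 0]
    total = sum(n for _, n in points)
    for _ in range(iters):
        resp_sum = [0.0] * k   # frequency-weighted responsibilities
        mean_sum = [0.0] * k
        sq_sum = [0.0] * k
        for d, n in points:
            # E-step: weighted density of each component at depth d.
            dens = [weights[i] / (sigmas[i] * math.sqrt(2 * math.pi))
                    * math.exp(-0.5 * ((d - means[i]) / sigmas[i]) ** 2)
                    for i in range(k)]
            norm = sum(dens)
            if norm == 0.0:
                continue
            for i in range(k):
                r = n * dens[i] / norm
                resp_sum[i] += r
                mean_sum[i] += r * d
                sq_sum[i] += r * d * d
        # M-step: update weight, mean and width of each component.
        for i in range(k):
            if resp_sum[i] == 0.0:
                continue
            weights[i] = resp_sum[i] / total
            means[i] = mean_sum[i] / resp_sum[i]
            var = sq_sum[i] / resp_sum[i] - means[i] ** 2
            sigmas[i] = max(math.sqrt(max(var, 0.0)), 0.5)  # floor avoids collapse
    return sorted(zip(weights, means, sigmas), key=lambda c: c[1])


def bell(d: float, mu: float, sigma: float, height: float) -> float:
    """Unnormalized Gaussian used to build a synthetic histogram."""
    return height * math.exp(-0.5 * ((d - mu) / sigma) ** 2)


# Synthetic two-peak histogram (hypothetical data, peaks placed at 34 and 68).
hist = {d: round(bell(d, 34, 4, 1000) + bell(d, 68, 6, 1000)) for d in range(121)}
components = fit_mixture(hist, init_means=[30.0, 70.0])
peak_depths = [mean for _, mean, _ in components]
print([round(m, 1) for m in peak_depths])
```

The component means play the role of the recorded peak depths; a least-squares fit of the curve would be a drop-in alternative if an R-Square value is needed.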
Determining the specific ploidy of each chromosome in the organism:
All values recorded in the previous step are clustered as a one-dimensional array using DBSCAN or another clustering algorithm that does not require the number of groups to be set in advance. The medians of the different cluster groups should stand in a ploidy relationship: if the sequencing depth corresponding to the first group of peaks (i.e., the monosome, a single chromosome copy) is y, the depth corresponding to the second group should be 2y, that of the third group 3y, and that of the nth group n × y. The Gaussian peak with the largest sequencing depth in each chromosome is then determined; if its sequencing depth is clustered into the nth group, the ploidy of that chromosome in the organism is n.
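The clustering and ploidy assignment can be sketched as follows. For one-dimensional data, DBSCAN with a minimum of one sample per cluster reduces to splitting the sorted values wherever the gap exceeds the neighborhood radius, so the sketch implements that directly rather than calling a library. The names `cluster_1d` and `chromosome_ploidy`, the radius `eps = 5.0` and the peak values are assumptions for illustration; the lowest group is assumed to be the single-copy depth y.

```python
from statistics import median
from typing import Dict, List


def cluster_1d(values: List[float], eps: float) -> List[List[float]]:
    """Split sorted values into groups wherever the gap exceeds eps.

    For 1-D data this matches DBSCAN with min_samples=1 (no noise points).
    """
    groups: List[List[float]] = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= eps:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


def chromosome_ploidy(peaks: Dict[str, List[float]],
                      eps: float = 5.0) -> Dict[str, int]:
    """Assign each chromosome the copy number of its deepest Gaussian peak."""
    all_depths = [d for depths in peaks.values() for d in depths]
    medians = [median(group) for group in cluster_1d(all_depths, eps)]
    unit = medians[0]  # assumed single-copy depth y (lowest group)
    ploidy: Dict[str, int] = {}
    for chrom, depths in peaks.items():
        top = max(depths)
        # Group whose median is closest to the deepest peak of this chromosome.
        nearest = min(medians, key=lambda m: abs(m - top))
        ploidy[chrom] = round(nearest / unit)
    return ploidy


# Hypothetical peak depths per chromosome (not the patent's measured values).
peaks = {
    "chr1": [34.0, 68.0],
    "chr2": [33.0, 67.0],
    "chr5": [35.0, 101.0],
    "chr8": [34.0, 102.0],
}
result = chromosome_ploidy(peaks)
print(result)
```

Dividing each group median by y, instead of using the group's rank, keeps the assignment correct when an intermediate group is absent, at the cost of assuming that the lowest group really is the single-copy depth.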
The recorded results were clustered as a one-dimensional array using the DBSCAN algorithm, which divided the curves into three groups with median sequencing depths of 34X, 68X and 101X. The last peak of chromosome 1 (sequencing depth 68X) was clustered into the second group, meaning that the ploidy of this chromosome in vivo is 2, i.e., disomic. The last peaks of chromosomes 5, 8, 13 and 19 were clustered into the third group, meaning that these chromosomes have a ploidy of 3 in vivo, i.e., trisomic. Sample No. 1 was therefore judged to be a plant trisomic for chromosomes 5, 8, 13 and 19, with 42 chromosomes per cell. The test results for the samples of this example are shown in Table 1:
TABLE 1
As can be seen from Table 1, the detection results of this example are consistent with those of the ploidy analyzer. Because the method is based on second-generation high-throughput sequencing, it can greatly shorten the time required for large batches of samples compared with traditional detection methods, can be automated, and offers standardization and repeatability.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (7)
1. A method for detecting aneuploid missing chromosome information, comprising the steps of:
extracting DNA from an organism to be tested and performing whole-genome sequencing to obtain sequencing reads;
aligning the sequencing reads to a reference genome, and obtaining a frequency scatter plot for each chromosome of the organism to be tested;
fitting the frequency scatter plot of each chromosome, and obtaining the sequencing depths corresponding to all Gaussian peaks in the fitted curve;
and clustering the sequencing depths to obtain the chromosome ploidy of the organism to be tested.
2. The method for detecting aneuploid missing chromosome information according to claim 1,
wherein, before whole-genome sequencing of the extracted DNA, the method further comprises: checking the integrity of the DNA by agarose gel electrophoresis, and measuring the concentration of the DNA with a microplate reader.
3. The method for detecting aneuploid missing chromosome information according to claim 1,
wherein the reference genome is the genome of the species of the organism to be tested or of a closely related species, and is assembled to the chromosome level.
4. The method for detecting aneuploid missing chromosome information according to claim 1,
wherein the process of obtaining the frequency scatter plot of each chromosome comprises: obtaining the sequencing depth of each base on each chromosome, and counting the frequency of occurrence of each sequencing depth to obtain the frequency scatter plot of each chromosome.
5. The method for detecting aneuploid missing chromosome information according to claim 4,
wherein the process of fitting the frequency scatter plot of each chromosome comprises: fitting the frequency scatter plot of each chromosome to a single Gaussian curve or to a Gaussian mixture model formed by superposing x Gaussian curves, where x is the number of peaks in the frequency scatter plot of that chromosome.
6. The method for detecting aneuploid missing chromosome information according to claim 1,
wherein, before the sequencing depths are clustered, the method further comprises: sorting the sequencing depths to obtain the Gaussian peak with the largest sequencing depth in each chromosome.
7. The method for detecting aneuploid missing chromosome information according to claim 6,
wherein the process of obtaining the chromosome ploidy of the organism to be tested comprises: performing one-dimensional array clustering on the sequencing depths to obtain different cluster groups; obtaining the median sequencing depth of each cluster group based on the ploidy relationship among the medians of the different cluster groups; and classifying the chromosomes based on the median sequencing depths of the cluster groups and the Gaussian peak with the largest sequencing depth in each chromosome, thereby obtaining the chromosome ploidy of the organism to be tested.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211716471.3A CN115948521A (en) | 2022-12-29 | 2022-12-29 | Method for detecting aneuploid missing chromosome information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211716471.3A CN115948521A (en) | 2022-12-29 | 2022-12-29 | Method for detecting aneuploid missing chromosome information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115948521A true CN115948521A (en) | 2023-04-11 |
Family
ID=87287350
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211716471.3A Pending CN115948521A (en) | 2022-12-29 | 2022-12-29 | Method for detecting aneuploid missing chromosome information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115948521A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107922959A (en) * | 2015-07-02 | 2018-04-17 | 阿瑞玛基因组学公司 | The accurate molecular of blend sample deconvolutes |
CN108804876A (en) * | 2017-05-05 | 2018-11-13 | 中国科学院上海药物研究所 | Method and apparatus for calculating cancer sample purity and ploidy |
CN111091868A (en) * | 2019-12-23 | 2020-05-01 | 江苏先声医学诊断有限公司 | Method and system for analyzing chromosome aneuploidy |
Non-Patent Citations (1)
Title |
---|
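刘明铭;王满;姜盼盼;刘茜;余莎;刘靖;: "Gene analysis of *** using whole-exome sequencing technology", Shenzhen Journal of Integrated Traditional Chinese and Western Medicine, no. 15, 15 August 2018 (2018-08-15) * |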
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117012274A (en) * | 2023-10-07 | 2023-11-07 | 北京智因东方转化医学研究中心有限公司 | Device for identifying gene deletion based on high-throughput sequencing |
CN117012274B (en) * | 2023-10-07 | 2024-01-16 | 北京智因东方转化医学研究中心有限公司 | Device for identifying gene deletion based on high-throughput sequencing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115428088A (en) | Systems and methods for joint interactive visualization of gene expression and DNA chromatin accessibility | |
CN109559780A (en) | A kind of RNA data processing method of high-flux sequence | |
CN114708910B (en) | Method for calculating enrichment score of cell subpopulations in cell sequencing by using single cell sequencing data | |
CN115052994A (en) | Method for determining base type of predetermined site in chromosome of embryonic cell and application thereof | |
CN115948521A (en) | Method for detecting aneuploid missing chromosome information | |
CN111778353A (en) | SNP molecular marker for identifying common wheat variety and SNP molecular marker detection method | |
CN105925722B (en) | QTL related to soybean protein content, method for obtaining molecular marker, molecular marker and application | |
CN110970091A (en) | Label quality control method and device | |
CN111292806B (en) | Transcriptome analysis method by using nanopore sequencing | |
CN107885972A (en) | It is a kind of based on the fusion detection method of single-ended sequencing and its application | |
CN112102944A (en) | NGS-based brain tumor molecular diagnosis analysis method | |
CN104769133A (en) | Method of improving microarray performance by strand elimination | |
CN116312779A (en) | Method and apparatus for detecting sample contamination and identifying sample mismatch | |
CN113293220B (en) | Gene chip for analyzing ear size of sheep, molecular probe combination, kit and application | |
CN114400045A (en) | Method, probe set, kit and system for detecting homologous recombination repair defects based on second-generation sequencing | |
CN104573409B (en) | The multiple check method of the assignment of genes gene mapping | |
Meyer et al. | ReadZS detects developmentally regulated RNA processing programs in single cell RNA-seq and defines subpopulations independent of gene expression | |
Wainer-Katsir et al. | BIRD: identifying cell doublets via biallelic expression from single cells | |
CN110684830A (en) | RNA analysis method for paraffin section tissue | |
CN117089636B (en) | Molecular marker combination for analyzing goat meat performance and application | |
CN117089635B (en) | Molecular marker combination for analyzing goat reproductive performance and application | |
CN117106936B (en) | Molecular marker combination for analyzing goat hair color character and application | |
CN117757979B (en) | Primer group, kit and identification method for identifying soybean varieties | |
CN117089634B (en) | Molecular marker combination for analyzing goat milk performance and application | |
CN117089633B (en) | Molecular marker combination for analyzing existence of goat fluff and application |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||