CN115948521A - Method for detecting aneuploid missing chromosome information - Google Patents

Publication number
CN115948521A
Authority
CN
China
Prior art keywords
chromosome
sequencing
organism
detecting
scatter diagram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211716471.3A
Other languages
Chinese (zh)
Inventor
陈肃
陈嵩
于越
王鑫宇
周妍
何蕊含
刘文轩
刘宣晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeast Forestry University
Original Assignee
Northeast Forestry University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-12-29
Filing date
2022-12-29
Publication date
2023-04-11
Application filed by Northeast Forestry University

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for detecting aneuploid missing chromosome information, comprising the following steps: extracting DNA from an organism to be tested and performing whole genome sequencing to obtain sequencing reads; aligning the sequencing reads to a reference genome and obtaining a sequencing-depth frequency scatter plot for each chromosome of the organism to be tested; fitting the frequency scatter plot of each chromosome and obtaining the sequencing depths corresponding to all Gaussian peaks in the fitted curve; and clustering the sequencing depths to obtain the chromosome ploidy of the organism to be tested. The method is based on second-generation high-throughput sequencing; compared with traditional detection methods it greatly shortens the time needed for large batches of samples, can be automated, and offers standardization and reproducibility.

Description

Method for detecting aneuploid missing chromosome information
Technical Field
The invention belongs to the field of genome sequencing and bioinformatics, and particularly relates to a method for detecting aneuploid missing chromosome information.
Background
In most cases aneuploidy is fatal to animals and humans, but plants, particularly allopolyploid plants, often tolerate it much better. Aneuploids offer unmatched advantages for determining the physical positions of genes and molecular markers, for gene transfer, and for establishing the correspondence between linkage groups and chromosomes. They are therefore of great significance for research on plant genetics and breeding, and have already produced many results in practical breeding.
Aneuploidy research makes it possible to work out the inheritance patterns of a plant's traits more quickly and systematically and to determine the relationship between its chromosomes and those of related plants, so that new varieties with special and superior traits can be bred more systematically. However, because such research involves a large number of hybridization experiments, the workload is heavy and the time required is long, and research in the forestry field is comparatively lacking. Because forest trees have a long growth period and the direction of a breeding program is difficult to adjust once it is under way, sufficient chromosome information must be obtained before the formal experiment begins.
Karyotyping of individual chromosome sets (for example by C-banding or G-banding), flow cytometry, and fluorescence in situ hybridization (FISH) with chromosome-specific probes are the common aneuploidy identification methods today. However, most of these methods depend strongly on the type of experimental material and require lengthy experimental preparation, so they are laborious when large numbers of experimental materials must be screened. A ploidy analyzer can quickly determine whether a created population is aneuploid, but it can hardly determine the specific chromosome composition of each individual.
In addition, a method that follows conventional high-throughput sequencing with a t-test for a significant difference in coverage depth between a sample and a standard reference sample has been applied clinically to human conditions such as Down syndrome and trisomy 18. In plant aneuploid breeding, however, this approach is difficult to apply because of the large number of hybrid varieties, the large genomic variation, the wide range of chromosome gains and losses, and the difficulty of obtaining a standard reference. It is therefore desirable to provide a method for detecting aneuploid missing chromosome information.
Disclosure of Invention
The invention aims to provide a method for detecting aneuploid missing chromosome information so as to solve the problems in the prior art.
In order to achieve the above object, the present invention provides a method for detecting aneuploid missing chromosome information, comprising the steps of:
extracting DNA from an organism to be tested and performing whole genome sequencing to obtain sequencing reads;
aligning the sequencing reads to a reference genome and obtaining a frequency scatter plot for each chromosome of the organism to be tested;
fitting the frequency scatter plot of each chromosome and obtaining the sequencing depths corresponding to all Gaussian peaks in the fitted curve;
and clustering the sequencing depths to obtain the chromosome ploidy of the organism to be tested.
Optionally, before whole genome sequencing of the extracted DNA, the method further comprises: checking the integrity of the DNA by agarose gel electrophoresis and measuring the DNA concentration with a microplate reader.
Optionally, the reference genome is the genome of the test organism's own species or of a closely related species, and has been assembled to the chromosome level.
Optionally, the process of obtaining the frequency scatter plot of each chromosome comprises: obtaining the sequencing depth of each base on each chromosome and counting the frequency of occurrence of each sequencing depth, thereby obtaining the frequency scatter plot of each chromosome.
Optionally, fitting the frequency scatter plot of each chromosome comprises: fitting the frequency scatter plot of each chromosome to a single Gaussian curve or to a Gaussian mixture model formed by superposing x Gaussian curves, where x is the number of peaks in the frequency scatter plot of that chromosome.
Optionally, before clustering the sequencing depths, the method further comprises: sorting the sequencing depths to obtain the Gaussian peak with the largest sequencing depth in each chromosome.
Optionally, the process of obtaining the chromosome ploidy of the organism to be tested comprises: performing one-dimensional array clustering on the sequencing depths to obtain different cluster groups; obtaining the median sequencing depth of each cluster group based on the ploidy relationship among the medians of the different cluster groups; and clustering the different chromosomes based on the median sequencing depths of the cluster groups and the Gaussian peak with the largest sequencing depth in each chromosome, thereby obtaining the chromosome ploidy of the organism to be tested.
The invention has the technical effects that:
The method identifies the true sequencing depth of each chromosome by counting how often each sequencing depth occurs on the genome, decomposes the sequencing-depth frequency curve with a Gaussian mixture model so as to separate the various factors that make a chromosome's sequencing depth unstable, and applies a clustering algorithm to group all sequencing-depth peaks so as to determine the sequencing depth of a single chromosome copy (monosome), thereby improving the accuracy of chromosome ploidy detection.
The method is based on second-generation high-throughput sequencing; compared with traditional detection methods it greatly shortens the time needed for large batches of samples, can be automated, and offers standardization and reproducibility.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a flowchart of a method for detecting aneuploid missing chromosome information in an embodiment of the invention;
FIG. 2 shows the frequency scatter plots of the 19 chromosomes in an embodiment of the invention;
FIG. 3 shows the fitted curve for sample No. 1 after Gaussian fitting in an embodiment of the invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
Example one
As shown in FIG. 1, the present embodiment provides a method for detecting aneuploid missing chromosome information, including the following steps:
DNA extraction:
A suitable DNA extraction protocol is selected according to the characteristics of the animal or plant sample to be tested, and the DNA content and purity are measured to confirm that the sample quality meets the sequencer's official loading requirements.
In this example, 6 aneuploid plants obtained by hybridization were selected as experimental samples and two aneuploid plants were selected as controls, and standard DNA extraction was performed using the MGIEasy universal DNA library preparation kit. After extraction, sample integrity was checked by agarose gel electrophoresis and concentration was measured with a microplate reader using the DNABR detection kit. The results show that all extracted samples met the loading requirements of the sequencing platform.
Whole genome sequencing:
Based on second-generation high-throughput sequencing, sequencing libraries are prepared and run on an Illumina or BGI sequencing platform according to the official instruction manual; instrument parameters and operating procedures strictly follow the manual for the sequencing platform used.
In this example, sequencing libraries were prepared and run on an MGISEQ-2000 sequencing platform according to the official instruction manual. The library type was DNBSEQ WGS and the sequencing mode was PE150 whole genome sequencing; instrument parameters and operating procedures strictly followed the instruction manual of the platform.
Aligning the sequencing reads to a reference genome and counting the sequencing depth:
After the raw sequencing data are obtained, the paired-end reads are aligned to a reference genome. The reference genome may be that of the species itself or of a closely related species, but it must be assembled to the chromosome level. Because alleles differ between individuals within a species, the alignment method should be as tolerant of mismatches as possible. After alignment, the sequencing depth of each nucleotide on the reference genome is calculated, and the frequency of occurrence of each sequencing depth is counted chromosome by chromosome. The results are presented as a scatter plot, with sequencing depth on the abscissa and the frequency of occurrence of that depth on the ordinate.
Taking sample No. 1 as an example, 266.24 M paired-end reads were obtained from the sequencer. After low-quality reads were filtered out, the reads were aligned to the poplar reference genome using BWA-MEM, with an alignment rate of 92.06%. The sequencing depth of each base on each chromosome was then calculated, and the frequencies were counted chromosome by chromosome. FIG. 2 shows the line-connected scatter plots of the 19 chromosomes; the abscissa is the sequencing depth and the ordinate is the frequency of occurrence of that depth.
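The depth-counting step can be illustrated with a minimal Python sketch. It assumes per-base depths are available as tab-separated records of chromosome, position and depth (the three-column layout written by tools such as `samtools depth`; the patent does not name the tool used). The function names `depth_frequencies` and `scatter_points` and the chromosome labels are hypothetical.

```python
from collections import Counter, defaultdict


def depth_frequencies(lines):
    """Count how often each sequencing depth occurs on each chromosome.

    `lines` yields tab-separated records: chromosome, position, depth
    (an assumed input layout, not specified in the patent).
    """
    freq = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chrom, _pos, depth = line.split("\t")
        freq[chrom][int(depth)] += 1
    return freq


def scatter_points(counter):
    """Return sorted (depth, frequency) pairs: x = depth, y = frequency."""
    return sorted(counter.items())


# Tiny made-up input, for illustration only.
example = [
    "Chr01\t1\t34",
    "Chr01\t2\t34",
    "Chr01\t3\t68",
    "Chr02\t1\t33",
]
freq = depth_frequencies(example)
points = scatter_points(freq["Chr01"])
```

Each chromosome's `points` list corresponds to one frequency scatter plot of the kind described for FIG. 2.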
Fitting a curve for each chromosome and calculating the peaks:
Using Gaussian fitting, the frequency scatter plot of each chromosome is fitted to a single Gaussian curve or to a Gaussian mixture model formed by superposing x Gaussian curves, where x is the number of peaks appearing in the frequency curve. After fitting, the sequencing depths corresponding to all Gaussian peaks are recorded and sorted from smallest to largest.
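One common way to write such a model is given below as a sketch. Only x comes from the text; the symbols d (sequencing depth), f (frequency), A_k, μ_k and σ_k (height, peak depth and width of the k-th Gaussian) are notation assumed here, and the R² expression is the usual coefficient of determination, which the patent reports but does not define.

```latex
f(d) = \sum_{k=1}^{x} A_k \exp\!\left(-\frac{(d-\mu_k)^2}{2\sigma_k^2}\right),
\qquad
R^2 = 1 - \frac{\sum_i \bigl(f_i - f(d_i)\bigr)^2}{\sum_i \bigl(f_i - \bar{f}\bigr)^2}
```

Under this notation, the recorded "sequencing depth corresponding to a Gaussian peak" is μ_k, and (d_i, f_i) are the points of the frequency scatter plot with mean frequency \bar{f}.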
Curve fitting was performed for each chromosome. As shown in FIG. 3, taking chromosome 1 of sample No. 1 as an example, Gaussian fitting according to the number of peaks gave 2 normal distribution curves with an R-squared (R²) of 0.988. The sequencing depths corresponding to the peaks of the fitted curves, 34X and 68X, were recorded. This was repeated for the remaining 18 chromosomes, finally giving 38 normal distribution curves.
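A dependency-free sketch of the peak-extraction idea follows. It is not the patent's fitting procedure: the text implies least-squares curve fitting (it reports R²), whereas this sketch swaps in a weighted expectation-maximization fit of a Gaussian mixture to the (depth, frequency) points. The peak finder, the 5% height threshold, the initial widths and the synthetic data are all assumptions made for illustration.

```python
import math


def local_maxima(points, min_fraction=0.05):
    """Depths of local maxima at least `min_fraction` as tall as the highest point."""
    top = max(f for _, f in points)
    peaks = []
    for i in range(1, len(points) - 1):
        d, f = points[i]
        if f >= min_fraction * top and f > points[i - 1][1] and f >= points[i + 1][1]:
            peaks.append(float(d))
    return peaks


def normal_pdf(d, mu, sigma):
    z = (d - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_mixture(points, n_iter=200):
    """Weighted EM for a 1-D Gaussian mixture; returns sorted (mu, sigma, weight)."""
    mus = local_maxima(points)  # one component per detected peak (x in the text)
    if not mus:
        raise ValueError("no peak found in the frequency curve")
    k = len(mus)
    sigmas = [max(1.0, 0.1 * m) for m in mus]  # assumed starting widths
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        r_sum, d_sum, d2_sum = [0.0] * k, [0.0] * k, [0.0] * k
        for d, f in points:  # E-step, each depth weighted by its frequency
            if f == 0:
                continue
            p = [weights[j] * normal_pdf(d, mus[j], sigmas[j]) for j in range(k)]
            total = sum(p)
            if total == 0.0:
                continue
            for j in range(k):
                r = f * p[j] / total
                r_sum[j] += r
                d_sum[j] += r * d
                d2_sum[j] += r * d * d
        grand = sum(r_sum)
        for j in range(k):  # M-step
            if r_sum[j] == 0.0:
                continue
            weights[j] = r_sum[j] / grand
            mus[j] = d_sum[j] / r_sum[j]
            var = d2_sum[j] / r_sum[j] - mus[j] ** 2
            sigmas[j] = math.sqrt(max(var, 1e-6))
    return sorted(zip(mus, sigmas, weights))


# Synthetic two-peak frequency curve (made-up data with peaks near 34 and 68).
points = [
    (d, round(1000 * math.exp(-((d - 34) ** 2) / 32.0)
              + 600 * math.exp(-((d - 68) ** 2) / 50.0)))
    for d in range(1, 121)
]
components = fit_mixture(points)
peak_depths = [mu for mu, _sigma, _weight in components]
```

The fitted means in `peak_depths` play the role of the recorded peak depths. On real data, noisy curves would likely need smoothing before peak detection, and a least-squares fit would be needed to reproduce an R² value like the one reported above.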
Determining the specific ploidy of each chromosome in the organism:
All values recorded in the previous step are clustered as a one-dimensional array using DBSCAN or another clustering algorithm that does not require the number of groups to be set in advance. The medians of the different cluster groups should be related by ploidy: if the sequencing depth corresponding to the first group of peaks (i.e., a single chromosome copy) is y, then the depth of the second group should be 2y, that of the third group 3y, and that of the nth group n × y. The Gaussian peak with the largest sequencing depth in each chromosome is then identified; if its sequencing depth is clustered into the nth group, the ploidy of that chromosome in the organism is n.
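The clustering and ploidy assignment can be sketched in plain Python as follows. Instead of a DBSCAN library call, the sketch splits the sorted depths wherever the gap between neighbors exceeds `eps`, which for one-dimensional data gives the same groups as DBSCAN with a minimum cluster size of 1. The `eps` value, the function names and the peak depths are hypothetical, and ploidy is computed as the rounded ratio of a group's median to the lowest group's median, which equals the group rank n when every group from 1 to n is present.

```python
from statistics import median


def cluster_1d(values, eps):
    """Group sorted values; start a new group when the gap exceeds eps."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= eps:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


def chromosome_ploidy(peaks_by_chrom, eps=5.0):
    """Return ({chromosome: ploidy}, [group medians]) from per-chromosome peak depths."""
    all_peaks = [d for peaks in peaks_by_chrom.values() for d in peaks]
    groups = cluster_1d(all_peaks, eps)
    medians = [median(g) for g in groups]
    single_copy = medians[0]  # assumption: lowest group is the one-copy depth y
    ploidy = {}
    for chrom, peaks in peaks_by_chrom.items():
        top = max(peaks)  # Gaussian peak with the largest sequencing depth
        idx = next(i for i, g in enumerate(groups) if g[0] <= top <= g[-1])
        ploidy[chrom] = round(medians[idx] / single_copy)
    return ploidy, medians


# Made-up peak depths for three chromosomes.
peaks_by_chrom = {
    "Chr01": [34.0, 68.0],
    "Chr02": [33.5, 67.0],
    "Chr05": [34.5, 68.5, 101.0],
}
ploidy, medians = chromosome_ploidy(peaks_by_chrom)
total_chromosomes = sum(ploidy.values())
```

Summing the per-chromosome ploidies gives the chromosome count of the cell for the chromosomes included in the input.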
The recorded results were clustered as a one-dimensional array using the DBSCAN algorithm, which divided the curves into three groups with median sequencing depths of 34X, 68X and 101X. The last peak of chromosome 1 (sequencing depth 68X) was clustered into the second group, indicating that the ploidy of this chromosome in the organism is 2, i.e., disomic. The last peaks of chromosomes 5, 8, 13 and 19 were clustered into the third group, indicating a ploidy of 3, i.e., trisomic. Sample No. 1 was therefore judged to be a plant trisomic for chromosomes 5, 8, 13 and 19, with 42 chromosomes per cell. The detection results for the samples of this example are shown in Table 1:
TABLE 1
(Table 1 is provided as an image in the original publication and is not reproduced here.)
As can be seen from Table 1, the detection results of this example are consistent with those of the ploidy analyzer. Because it is based on second-generation high-throughput sequencing, the method greatly shortens the time needed to test large numbers of samples compared with traditional detection methods, can be automated, and offers standardization and reproducibility.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. A method for detecting aneuploid missing chromosome information, comprising the steps of:
extracting DNA from an organism to be tested and performing whole genome sequencing to obtain sequencing reads;
aligning the sequencing reads to a reference genome and obtaining a frequency scatter plot for each chromosome of the organism to be tested;
fitting the frequency scatter plot of each chromosome and obtaining the sequencing depths corresponding to all Gaussian peaks in the fitted curve;
and clustering the sequencing depths to obtain the chromosome ploidy of the organism to be tested.
2. The method for detecting aneuploid missing chromosome information according to claim 1, wherein
before whole genome sequencing of the extracted DNA, the method further comprises: checking the integrity of the DNA by agarose gel electrophoresis and measuring the DNA concentration with a microplate reader.
3. The method for detecting aneuploid missing chromosome information according to claim 1, wherein
the reference genome is the genome of the test organism's own species or of a closely related species, and has been assembled to the chromosome level.
4. The method for detecting aneuploid missing chromosome information according to claim 1, wherein
the process of obtaining the frequency scatter plot of each chromosome comprises: obtaining the sequencing depth of each base on each chromosome and counting the frequency of occurrence of each sequencing depth, thereby obtaining the frequency scatter plot of each chromosome.
5. The method for detecting aneuploid missing chromosome information according to claim 4, wherein
the process of fitting the frequency scatter plot of each chromosome comprises: fitting the frequency scatter plot of each chromosome to a single Gaussian curve or to a Gaussian mixture model formed by superposing x Gaussian curves, where x is the number of peaks in the frequency scatter plot of that chromosome.
6. The method for detecting aneuploid missing chromosome information according to claim 1, wherein
before clustering the sequencing depths, the method further comprises: sorting the sequencing depths to obtain the Gaussian peak with the largest sequencing depth in each chromosome.
7. The method for detecting aneuploid missing chromosome information according to claim 6, wherein
the process of obtaining the chromosome ploidy of the organism to be tested comprises: performing one-dimensional array clustering on the sequencing depths to obtain different cluster groups; obtaining the median sequencing depth of each cluster group based on the ploidy relationship among the medians of the different cluster groups; and clustering the different chromosomes based on the median sequencing depths of the cluster groups and the Gaussian peak with the largest sequencing depth in each chromosome, thereby obtaining the chromosome ploidy of the organism to be tested.
CN202211716471.3A 2022-12-29 2022-12-29 Method for detecting aneuploid missing chromosome information Pending CN115948521A (en)

Publications (1)

Publication Number Publication Date
CN115948521A (en) 2023-04-11

Family

ID=87287350

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination