Summary of the invention
One aspect of the present invention provides a composition comprising probes. The probes are immobilized on a solid-phase carrier or free in solution, and can cover at least a part of each gene region of at least 5 of the following 15 genes: HBA1, HBA2, HBB, GJB2, SLC26A4, SMN1, DMD, GALT, PAH, F8, F9, ATP7B, CYP21A2, GAA and PKHD1. These 15 genes are associated with 12 hereditary diseases that have a high incidence in China: thalassemia (α and β; HBA1, HBA2, HBB), hereditary hearing impairment (GJB2, SLC26A4), spinal muscular atrophy (SMN1), phenylketonuria (PAH), glycogen storage disease type II (GAA), galactosemia (GALT), hemophilia A (F8), hemophilia B (F9), hepatolenticular degeneration (ATP7B), congenital adrenal hyperplasia (CYP21A2), progressive muscular dystrophy (DMD) and autosomal recessive polycystic kidney disease (PKHD1). The "at least a part" is, for example, the exon regions of each gene, and/or the intron regions 30 bp upstream and downstream of each exon, and/or the whole gene region, which facilitates the subsequent detection of single-nucleotide mutations, structural variations and splicing mutations related to the gene.
Another aspect of the present invention provides use of the above composition in determining at least a part of the sequence of each of at least 5 of the following 15 genes: HBA1, HBA2, HBB, GJB2, SLC26A4, SMN1, DMD, GALT, PAH, F8, F9, ATP7B, CYP21A2, GAA and PKHD1. Once the sequences are obtained, they can be used for further sequence variation detection to determine the variation type, which facilitates detecting, or assisting in detecting, the above diseases.
Another aspect of the present invention further provides a method for detecting CNV, the method comprising: A. performing target-region capture on a sample to be tested and sequencing the target region to obtain sequencing data; B. aligning the sequencing data to a reference sequence to obtain an alignment result; C. setting a sliding length L1 and a window size L2, dividing the target region to obtain a plurality of windows, calculating the sequencing depth of each window based on the alignment result, and determining the average sequencing depth of each window according to the size of the window and/or the size of the sliding-length region in which it lies, wherein the total length of the windows covers a part of the target region, the whole target region once, or the whole target region multiple times, and the sequencing depth of each window is the ratio of the amount of sequencing data aligned to the window to the window size; D. determining the degree of difference between the average sequencing depth of the window and that of the corresponding window of a control sample, a significant difference indicating that the CNV is present; wherein A is carried out using the composition provided by one aspect of the present invention, L1 and L2 are natural numbers, and L1 and L2 are each either fixed or variable values.
Another aspect of the present invention provides a method for simultaneously detecting multiple types of gene variation, the multiple types of gene variation including at least two of point mutation, CNV and inversion, the method comprising: (i) performing target-region capture on a sample to be tested and sequencing the target region to obtain sequencing data; (ii) detecting the multiple types of gene variation simultaneously based on the sequencing data of (i); wherein (i) is carried out using the composition of one aspect of the present invention.
Another aspect of the present invention further provides a device for detecting CNV, the device comprising: A1. a sequencing unit, for performing target-region capture on a sample to be tested and sequencing the target region to obtain sequencing data, the sequencing being carried out using the composition of one aspect of the present invention; B1. an alignment unit, connected to the sequencing unit, for aligning the sequencing data to a reference sequence to obtain an alignment result; C1. a window average sequencing depth determination unit, connected to the alignment unit, for setting a sliding length L1 and a window size L2, dividing the target region to obtain a plurality of windows, calculating the sequencing depth of each window based on the alignment result, and determining the average sequencing depth of each window according to the size of the window and/or the size of the sliding-length region in which it lies, wherein the total length of the windows covers a part of the target region, the whole target region once, or the whole target region multiple times, and the sequencing depth of each window is the ratio of the amount of sequencing data aligned to the window to the window size; D1. a CNV determination unit, connected to the window average sequencing depth determination unit, for determining the degree of difference between the average sequencing depth of the window and that of the corresponding window of a control sample, a significant difference indicating that the CNV is present. The device can be used to carry out some or all of the steps of the CNV detection method provided by one aspect of the present invention.
Another aspect of the present invention further provides a device for simultaneously detecting multiple types of gene variation, the multiple types of gene variation including at least two of point mutation, CNV and inversion, the device comprising: A2. a sequencing unit, for performing target-region capture on a sample to be tested and sequencing the target region to obtain sequencing data, the sequencing being carried out using the composition provided by one aspect of the present invention; B2. a multi-variation simultaneous detection unit, connected to the sequencing unit, for simultaneously detecting the multiple types of gene variation based on the sequencing data of A2. The device can be used to carry out some or all of the steps of the method for simultaneously detecting multiple types of gene variation provided by one aspect of the present invention.
With the composition, method and/or device of the present invention, point mutations can be detected accurately, and exon deletions and/or duplications (CNV) can also be detected well. With the composition of the present invention, probes are designed from the sequence information of the candidate genes, and the DNA fragments obtained by capture and enrichment are sequenced; with the method and/or device of the present invention, point mutation, CNV and inversion information for the candidate genes can then be obtained. The method and/or device developed by the present invention can detect a set of multiple disease genes and multiple mutation types at low cost and with simple operation. Through the designed and developed detection chip and information analysis pipeline, and based on target-region capture combined with high-throughput sequencing, healthy couples of childbearing age who have a normal phenotype and no family history of hereditary disease can be tested in a single assay that detects, or assists in detecting, 15 common causative genes and screens, or assists in screening, for 12 serious hereditary diseases with a high incidence in China, including thalassemia (α and β; HBA1, HBA2, HBB), hereditary hearing impairment (GJB2, SLC26A4), spinal muscular atrophy (SMN1), phenylketonuria (PAH), glycogen storage disease type II (GAA), galactosemia (GALT), hemophilia A (F8), hemophilia B (F9), hepatolenticular degeneration (ATP7B), congenital adrenal hyperplasia (CYP21A2), progressive muscular dystrophy (DMD) and autosomal recessive polycystic kidney disease (PKHD1). This determines the couple's carrier status for the above causative genes and assesses the risk to their offspring, serving the goals of reducing birth defects, providing scientific guidance for healthy childbearing and improving the quality of the newborn population, and enabling accurate and efficient preconception genetic testing. The panel can be adapted to local conditions according to regional trends in the incidence of single-gene disorders, and this model can be extended nationwide or even worldwide, promoting scientific and technological development and industrial progress in reproductive medicine. The composition, method and/or device of the present invention can provide a preconception gene screening method for the general public that screens for 12 recessive hereditary diseases with a high incidence in China in a single assay, with comprehensive and accurate coverage of the mutation types of the screened genes. Target-region capture combined with high-throughput sequencing, together with the variation information analysis method, allows point mutations and CNV of the candidate genes to be detected simultaneously, conveniently, quickly and with high accuracy.
Specific embodiment
According to an embodiment of the present invention, a composition is provided. The composition comprises probes that are immobilized on a solid-phase carrier or free in solution, and the probes can cover at least a part of each gene region of at least 5 of the following 15 genes: HBA1, HBA2, HBB, GJB2, SLC26A4, SMN1, DMD, GALT, PAH, F8, F9, ATP7B, CYP21A2, GAA and PKHD1. These 15 genes are associated with 12 hereditary diseases that have a high incidence in China: thalassemia (α and β; HBA1, HBA2, HBB), hereditary hearing impairment (GJB2, SLC26A4), spinal muscular atrophy (SMN1), phenylketonuria (PAH), glycogen storage disease type II (GAA), galactosemia (GALT), hemophilia A (F8), hemophilia B (F9), hepatolenticular degeneration (ATP7B), congenital adrenal hyperplasia (CYP21A2), progressive muscular dystrophy (DMD) and autosomal recessive polycystic kidney disease (PKHD1). The "at least a part" is, for example, the exon regions of the included genes, and/or the intron regions 30 bp upstream and downstream of each exon, and/or the whole gene region, which facilitates the subsequent detection of single-nucleotide mutations, structural variations and splicing mutations related to the gene.
In a specific embodiment of the present invention, the probes can cover at least a part of each gene region of at least 10 of the above 15 genes. In a specific embodiment of the present invention, the probes can cover at least a part of each gene region of all of the above 15 genes. The "at least a part" is, for example, the exon regions of each gene, and/or the intron regions 30 bp upstream and downstream of each exon, and/or the whole gene region, which facilitates the subsequent detection of single-nucleotide mutations, structural variations and splicing mutations related to the gene.
In a specific embodiment of the present invention, the probes are designed as follows: according to the start and end positions of the region to be covered on the reference genome, sequences of a predetermined length are intercepted successively along the reference genome, starting from one end of that position range and continuing to the other end. The total length of the intercepted sequences is at least 1 times the size of the region to be covered; the region to be covered is at least a part of each gene region of at least 5 of the 15 genes; and the predetermined sequence length, which is the predetermined probe length, is 50~250 nt. In a specific embodiment of the present invention, the total predetermined probe length is 5 times the size of the target region (the region to be covered), that is, each designed probe is present in at least 5 copies. The probe copy number for each region can be adjusted according to the sequence composition of the target region and the need to capture part or all of the target region.
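The probe design described above can be sketched as follows. This is only an illustrative sketch: the function name tile_probes, the half-open coordinate convention, the default probe length of 120 nt and the handling of the last probe are assumptions made for the example and are not specified in the text.

```python
from typing import List, Tuple


def tile_probes(start: int, end: int, probe_len: int = 120,
                copies: int = 5) -> List[Tuple[int, int, int]]:
    """Tile the half-open region [start, end) with probes of probe_len nt.

    Returns (probe_start, probe_end, copies) tuples. The copy number stands
    in for the "at least 5 copies per probe" described in the text.
    """
    if not 50 <= probe_len <= 250:
        raise ValueError("probe length is expected to be 50~250 nt")
    if end <= start or copies < 1:
        raise ValueError("invalid region or copy number")
    probes = []
    pos = start
    while pos < end:
        probe_end = pos + probe_len
        if probe_end > end:
            # Assumption: shift the last probe back so it ends at the region
            # end (it may overlap the previous probe).
            pos = max(start, end - probe_len)
            probe_end = pos + probe_len
        probes.append((pos, probe_end, copies))
        pos = probe_end
    return probes


def total_probe_length(probes: List[Tuple[int, int, int]]) -> int:
    """Sum of probe lengths, counting every copy."""
    return sum((p_end - p_start) * n for p_start, p_end, n in probes)


probes = tile_probes(1000, 1300, probe_len=120, copies=5)
print(probes)
print(total_probe_length(probes) >= 5 * (1300 - 1000))
```

With 5 copies per probe, the summed probe length is at least 5 times the region size, matching the 5x figure given in the embodiment.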
In a specific embodiment of the present invention, the regions that the probes in the composition can cover include:
chr16:59501-60501, chr16:84266-84414, chr16:84536-84781, chr16:84831-85101,
chr16:93451-94030, chr16:97141-97648, chr16:101131-101752, chr16:103124-104124,
chr16:131785-131980, chr16:163436-163820, chr16:172301-173301, chr16:188799-
189799, chr16:193176-194176, chr16:195937-196208, chr16:197099-201899, chr16:
201949-203073, chr16:203090-203162, chr16:203170-203270, chr16:203799-206016,
chr16:206540-207072, chr16:207565-209099, chr16:209110-209857, chr16:209909-
210084, chr16:210715-210790, chr16:210835-211713, chr16:211845-211910, chr16:
212174-221360, chr16:221499-224753, chr16:225065-233910, chr16:234200-236401,
chr16:237220-237735, chr16:237800-239636, chr16:242780-243149, chr16:254625-
255199, chr16:258005-258928, chr16:303840-304852, chr16:313300-314300, chr13:
52506776-52508891, chr13:52508892-52509195, chr13:52509699-52509861, chr13:
52511382-52511559, chr13:52511582-52511845, chr13:52513157-52513359, chr13:
52515187-52515390, chr13:52516492-52516720, chr13:52518215-52518457, chr13:
52520390-52520644, chr13:52523768-52523962, chr13:52524113-52524327, chr13:
52524378-52524565, chr13:52531622-52531773, chr13:52532417-52532710, chr13:
52534156-52534157, chr13:52534254-52534488, chr13:52535943-52536079, chr13:
52538978-52539199, chr13:52542550-52542773, chr13:52544598-52544915, chr13:
52548041-52549334, chr13:52585393-52585473, chr13:52585474-52585660, chr11:
47599-48599, chr11:67836-68836, chr11:95661-96661, chr11:102658-103658, chr11:
5133611-5135542, chr11:5151625-5154197, chr11:5161020-5162526, chr11:5165060-
5166060, chr11:5167335-5168901, chr11:5169775-5171446, chr11:5173950-5174950,
chr11:5179167-5180167, chr11:5180196-5181196, chr11:5187425-5187801, chr11:
5190647-5194526, chr11:5194530-5197746, chr11:5198220-5200664, chr11:5207460-
5209762, chr11:5215188-5216188, chr11:5217440-5218440, chr11:5218705-5219705,
chr11:5220930-5223803, chr11:5223825-5224976, chr11:5227490-5229196, chr11:
5230655-5232014, chr11:5233456-5234456, chr11:5235175-5236175, chr11:5236680-
5238663, chr11:5239076-5240500, chr11:5240607-5241607, chr11:5243655-5249785,
chr11:5249788-5251275, chr11:5252088-5253088, chr11:5253894-5256137, chr11:
5256268-5257393, chr11:5257616-5260578, chr11:5261109-5262109, chr11:5263479-
5264479, chr11:5264956-5266333, chr11:5266863-5268837, chr11:5269306-5272017,
chr11:5272027-5273936, chr11:5274421-5276011, chr11:5276020-5276313, chr11:
5277190-5281370, chr11:5289580-5292060, chr11:5296973-5297773, chr11:5300302-
5301302, chr11:5301646-5302822, chr11:5305045-5309165, chr11:5316550-5319380,
chr11:5331720-5334343, chr11:5340085-5341085, chr11:5342735-5347056, chr11:
5349993-5350993, chr11:5354065-5357231, chr13:20766920-20766921, chr6:31947356-
31948352, chr6:31973287-31974926, chr6:31975247-31975249, chr6:31975263-
31976013, chr6:31976102-31976233, chr6:31976572-31976708, chr6:31978697-
31978852, chr6:31979037-31979174, chr6:31980912-31981045, chr6:31981442-
31981762, chr6:31983786-31987153, chr6:31988701-31990021, chr6:31995360-
31995913, chr6:31996044-31996977, chr6:31996987-31997123, chr6:31997174-
31997873, chr6:32001038-32002048, chr6:32002642-32003451, chr6:32003664-
32005516, chr6:32005832-32010535, chr6:32010692-32018023, chr6:32022262-
32023655, chr6:32036417-32037465, chr6:32040732-32041800, chrX:31137275-
31140117, chrX:31144689-31144860, chrX:31152149-31152381, chrX:31164338-
31164601, chrX:31165322-31165705, chrX:31187490-31187788, chrX:31190395-
31190600, chrX:31191586-31191791, chrX:31195979-31196157, chrX:31196716-
31196992, chrX:31198417-31198668, chrX:31200785-31201091, chrX:31222008-
31222305, chrX:31224629-31224854, chrX:31226399-31226400, chrX:31227545-
31227886, chrX:31241094-31241308, chrX:31279002-31279203, chrX:31341645-
31341845, chrX:31366603-31366821, chrX:31382270-31382271, chrX:31462528-
31462814, chrX:31496153-31496561, chrX:31497030-31497290, chrX:31514835-
31515131, chrX:31525328-31525640, chrX:31645720-31646049, chrX:31676037-
31676331, chrX:31697422-31697773, chrX:31747678-31747935, chrX:31792007-
31792379, chrX:31838022-31838270, chrX:31854765-31855009, chrX:31893238-
31893560, chrX:31947643-31947932, chrX:31950127-31950414, chrX:31986386-
31986701, chrX:32234963-32235250, chrX:32305576-32305888, chrX:32328129-
32328463, chrX:32360147-32360469, chrX:32361181-32361473, chrX:32363990-
32364267, chrX:32366453-32366715, chrX:32380835-32381145, chrX:32382629-
32382897, chrX:32383067-32383386, chrX:32398557-32398867, chrX:32404357-
32404652, chrX:32407548-32407861, chrX:32408118-32408368, chrX:32429799-
32430100, chrX:32456288-32456577, chrX:32459227-32459501, chrX:32466503-
32466825, chrX:32472709-32473019, chrX:32479315-32479316, chrX:32481486-
32481781, chrX:32482633-32482886, chrX:32486545-32486897, chrX:32490211-
32490496, chrX:32502966-32503286, chrX:32509324-32509705, chrX:32519802-
32520029, chrX:32536055-32536318, chrX:32563206-32563521, chrX:32583749-
32584068, chrX:32591577-32592033, chrX:32613804-32614063, chrX:32632350-
32632640, chrX:32654052-32654053, chrX:32662179-32662500, chrX:32663011-
32663339, chrX:32715917-32716185, chrX:32717159-32717480, chrX:32827540-
32827798, chrX:32834515-32834827, chrX:32841342-32841574, chrX:32862830-
32863047, chrX:32867775-32868007, chrX:33038186-33038387, chrX:33229329-
33229743, chrX:154063994-154066097, chrX:154088637-154088953, chrX:154089923-
154090211, chrX:154091288-154091572, chrX:154108920-154118775, chrX:154124282-
154124577, chrX:154128071-154128296, chrX:154129576-154129787, chrX:154130256-
154130512, chrX:154132111-154132433, chrX:154132501-154132869, chrX:154133016-
154133368, chrX:154134625-154134918, chrX:154156776-154160021, chrX:154175903-
154176252, chrX:154182097-154182387, chrX:154185162-154185516, chrX:154189280-
154189513, chrX:154194175-154194486, chrX:154194631-154195032, chrX:154197536-
154197897, chrX:154212892-154213148, chrX:154215442-154215650, chrX:154221141-
154221493, chrX:154225178-154225440, chrX:154227684-154227945, chrX:154234213-
154235593, chrX:154250615-154251068, chrX:154375887-154377267, chrX:154605985-
154615881, chrX:154684139-154694039, chrX:138612825-138613081, chrX:138619099-
138619402, chrX:138619451-138619615, chrX:138623165-138623418, chrX:138630452-
138630720, chrX:138633151-138633493, chrX:138642830-138643084, chrX:138643613-
138645687, chr17:78075325-78075719, chr17:78078303-78078304, chr17:78078324-
78078385, chr17:78078386-78078961, chr17:78079518-78079723, chr17:78081326-
78081551, chr17:78081569-78081725, chr17:78082059-78082238, chr17:78082258-
78082436, chr17:78082466-78082657, chr17:78083714-78083884, chr17:78084496-
78084669, chr17:78084710-78084854, chr17:78085752-78085929, chr17:78086347-
78086540, chr17:78086645-78086856, chr17:78086987-78087195, chr17:78090737-
78090938, chr17:78091369-78091578, chr17:78091962-78092186, chr17:78092422-
78092634, chr17:78093041-78093130, chr17:78093131-78093709, chr9:34646605-
34646701, chr9:34646702-34646813, chr9:34647056-34647285, chr9:34647459-
34647594, chr9:34647624-34647732, chr9:34647799-34647988, chr9:34648081-
34648198, chr9:34648301-34648483, chr9:34648729-34648921, chr9:34648965-
34649108, chr9:34649377-34649591, chr9:34649616-34649617, chr9:34650336-
34650446, chr9:34650447-34650603, chr13:20761574-20763039, chr13:20763040-
20763720, chr13:20763721-20763772, chr13:20766892-20767144, chr12:103232074-
103232952, chr12:103232953-103233026, chr12:103234148-103234323, chr12:
103237394-103237587, chr12:103238016-103238017, chr12:103238074-103238075,
chr12:103238084-103238239, chr12:103240643-103240759, chr12:103245435-
103245564, chr12:103246563-103246758, chr12:103248884-103249140, chr12:
103260344-103260471, chr12:103271210-103271358, chr12:103288483-103288726,
chr12:103306539-103306706, chr12:103310819-103310908, chr12:103310909-
103311411, chr6:51480115-51483878, chr6:51483879-51484348, chr6:51491765-
51491944, chr6:51497314-51497316, chr6:51497333-51497551, chr6:51503617-
51503784, chr6:51503824-51503825, chr6:51512799-51512946, chr6:51513500-
51513501, chr6:51513853-51514048, chr6:51523720-51524797, chr6:51609153-
51609370, chr6:51611343-51611344, chr6:51611489-51611717, chr6:51612514-
51612515, chr6:51612555-51613493, chr6:51617969-51618182, chr6:51618696-
51618697, chr6:51619443-51619444, chr6:51619529-51619530, chr6:51619537-
51619538, chr6:51619552-51619766, chr6:51637470-51637617, chr6:51640576-
51640749, chr6:51640750-51640751, chr6:51640786-51640787, chr6:51655962-
51655963, chr6:51656004-51656201, chr6:51695629-51695817, chr6:51701089-
51701090, chr6:51701100-51701101, chr6:51701164-51701165, chr6:51701172-
51701297, chr6:51701330-51701331, chr6:51712543-51712798, chr6:51720661-
51720898, chr6:51732597-51732598, chr6:51732617-51732618, chr6:51732627-
51732628, chr6:51732631-51732937, chr6:51735272-51735467, chr6:51747237-
51747238, chr6:51747861-51748055, chr6:51750562-51750563, chr6:51750635-
51750800, chr6:51751388-51751389, chr6:51751901-51752073, chr6:51768351-
51768352, chr6:51768360-51768361, chr6:51768365-51768555, chr6:51768754-
51768870, chr6:51770983-51771168, chr6:51774051-51774302, chr6:51776567-
51776784, chr6:51776917-51776918, chr6:51777134-51777404, chr6:51798878-
51799150, chr6:51824638-51824854, chr6:51875077-51875287, chr6:51875298-
51875299, chr6:51882178-51882457, chr6:51887569-51887772, chr6:51889342-
51891009, chr6:51891010-51891011, chr6:51892597-51892724, chr6:51892924-
51893179, chr6:51897798-51897993, chr6:51900359-51900549, chr6:51907627-
51907962, chr6:51908393-51908558, chr6:51908909-51908910, chr6:51909524-
51909525, chr6:51909734-51909916, chr6:51910772-51911016, chr6:51911406-
51911407, chr6:51912915-51912916, chr6:51913221-51913222, chr6:51913236-
51913237, chr6:51913239-51913240, chr6:51913247-51913248, chr6:51913252-
51913253, chr6:51913260-51913447, chr6:51913451-51913452, chr6:51914897-
51914898, chr6:51914925-51915123, chr6:51917844-51918079, chr6:51918531-
51918532, chr6:51918806-51918993, chr6:51920355-51920557, chr6:51920558-
51920560, chr6:51920885-51920886, chr6:51921466-51921616, chr6:51921658-
51921807, chr6:51923091-51923429, chr6:51924562-51924563, chr6:51924696-
51924870, chr6:51924885-51924886, chr6:51927287-51927488, chr6:51927665-
51927666, chr6:51929723-51929878, chr6:51930436-51930437, chr6:51930744-
51930905, chr6:51934225-51934355, chr6:51934394-51934395, chr6:51934732-
51934733, chr6:51934975-51934976, chr6:51934977-51934978, chr6:51935060-
51935061, chr6:51935174-51935273, chr6:51935702-51935703, chr6:51935746-
51935747, chr6:51935774-51935898, chr6:51936257-51936258, chr6:51936505-
51936506, chr6:51936558-51936559, chr6:51936716-51936717, chr6:51936883-
51937017, chr6:51937026-51937027, chr6:51937470-51937471, chr6:51937752-
51937753, chr6:51937976-51937977, chr6:51938066-51938067, chr6:51938209-
51938210, chr6:51938231-51938369, chr6:51938416-51938417, chr6:51941044-
51941161, chr6:51944668-51944836, chr6:51944846-51944847, chr6:51947160-
51947370, chr6:51947946-51948083, chr6:51948471-51948472, chr6:51948574-
51948575, chr6:51949650-51949731, chr6:51949732-51949845, chr6:51949861-
51949862, chr6:51952202-51952453, chr7:107301050-107301330, chr7:107302054-
107302086, chr7:107302087-107302280, chr7:107303711-107303910, chr7:107312553-
107312723, chr7:107314579-107314823, chr7:107315360-107315584, chr7:107323617-
107323829, chr7:107323870-107324012, chr7:107329468-107329675, chr7:107330539-
107330712, chr7:107334818-107334955, chr7:107335036-107335191, chr7:107336348-
107336514, chr7:107338457-107338586, chr7:107340498-107340650, chr7:107341516-
107341671, chr7:107342242-107342532, chr7:107344746-107344860, chr7:107350469-
107350674, chr7:107352954-107353097, chr7:107355838-107355891, chr7:107355892-
107358282, chr5:70220738-70248868, chr5:70255752-70255888 and chr5:70305353-70306150.
The regions covered by these probes include: the entire α-globin gene region; the entire β-globin gene region; the exons of the ATP7B gene and the intron regions 30 bp upstream and downstream of each exon; the whole CYP21A2 gene; the exons of the DMD gene and the intron regions 30 bp upstream and downstream of each exon; the exons of the F8 gene and the intron regions 70 bp upstream and downstream of each exon; the exons of F9, GAA, GALT, GJB2, PAH, PKHD1 and SLC26A4 and the intron regions 30 bp upstream and downstream of each exon; and the whole SMN1 gene region.
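As a small illustration of how a region list in the above "chrN:start-end" notation might be handled, the following sketch parses region strings and sums the covered size per chromosome. It assumes, without confirmation from the text, that coordinates are 1-based and inclusive and that the listed regions do not overlap; the names parse_region and total_size_by_chrom are hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

# Matches strings such as "chr16:59501-60501" or "chrX:31137275-31140117".
REGION_RE = re.compile(r"^(chr[0-9XY]+):(\d+)-(\d+)$")


def parse_region(text: str) -> Tuple[str, int, int]:
    """Parse one region string into (chromosome, start, end)."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if end < start:
        raise ValueError(f"end before start in {text!r}")
    return chrom, start, end


def total_size_by_chrom(regions: Iterable[str]) -> Dict[str, int]:
    """Sum region sizes per chromosome (assumes inclusive, non-overlapping)."""
    totals: Dict[str, int] = {}
    for region in regions:
        chrom, start, end = parse_region(region)
        totals[chrom] = totals.get(chrom, 0) + (end - start + 1)
    return totals


example = ["chr16:59501-60501", "chr16:84266-84414", "chr5:70255752-70255888"]
print(total_size_by_chrom(example))
```

The per-chromosome totals give the target-region size against which the total probe length can be compared.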
In a specific embodiment of the present invention, the composition further comprises the sequences shown in SEQ ID NO: 1-15 and SEQ ID NO: 22-23. These sequences were obtained through repeated design and experimental screening, and can be used for target-region library construction, sequencing of the target-region library, and detection and/or verification of variation types.
In a specific embodiment of the present invention, the composition also comprises the sequences shown in SEQ ID NO: 16-21. These sequences were obtained through repeated design and experimental screening, and can be used for target-region library construction, sequencing of the target-region library, and detection and/or verification of variation types.
According to another embodiment of the present invention, there is provided use of the composition provided by one aspect of the present invention in determining at least a part of the sequence of each of at least 5 of the following 15 genes: HBA1, HBA2, HBB, GJB2, SLC26A4, SMN1, DMD, GALT, PAH, F8, F9, ATP7B, CYP21A2, GAA and PKHD1. Once the sequences are obtained, they can be used for further sequence variation detection to determine the variation type, which facilitates detecting, or assisting in detecting, hereditary diseases associated with variations in the above genes.
In a specific embodiment of the present invention, there is provided use of the composition provided by one aspect of the present invention in determining at least a part of the sequence of each of at least 10 of the above 15 genes. In a specific embodiment of the present invention, there is provided use of the composition provided by one aspect of the present invention in determining at least a part of the sequence of each of the above 15 genes.
According to yet another embodiment of the present invention, a method for detecting CNV is provided, the method comprising: A. performing target-region capture on a sample to be tested and sequencing the target region to obtain sequencing data, the sequencing data consisting of a plurality of reads; B. aligning the sequencing data to a reference sequence to obtain an alignment result, the alignment result including information such as the position of each read of the sequencing data on the reference sequence; C. setting a sliding length L1 and a window size L2, dividing the target region to obtain a plurality of windows, calculating the sequencing depth of each window based on the alignment result, and determining the average sequencing depth of each window according to the size of the window and/or the size of the sliding-length region in which it lies, wherein the total length of the windows covers a part of the target region, the whole target region once, or the whole target region multiple times, and the sequencing depth of each window is the ratio of the amount of sequencing data aligned to the window to the window size; D. determining the degree of difference between the average sequencing depth of the window and that of the corresponding window of a control sample, a significant difference indicating that the CNV is present; wherein A is carried out using the composition provided by one aspect of the present invention, L1 and L2 are natural numbers, and L1 and L2 are each either fixed or variable values. In one embodiment of the present invention, L1 = 20 bp and L2 = 200 bp. The description of the features and advantages of the composition of one aspect of the present invention also applies to this method.
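Step C can be illustrated with a short sketch. It assumes that reads are given as half-open (start, end) alignment intervals on one reference sequence and that "the amount of sequencing data aligned to the window" means the number of aligned bases falling inside the window; both are assumptions for the example, and the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def make_windows(region_start: int, region_end: int,
                 slide: int = 20, size: int = 200) -> List[Interval]:
    """Cut [region_start, region_end) into windows of `size` bp (L2),
    moving the window start by `slide` bp (L1) each time."""
    if slide <= 0 or size <= 0:
        raise ValueError("L1 and L2 must be natural numbers")
    windows = []
    start = region_start
    while start + size <= region_end:
        windows.append((start, start + size))
        start += slide
    return windows


def window_depth(reads: List[Interval], window: Interval) -> float:
    """Sequencing depth of a window: aligned bases inside it / window size."""
    w_start, w_end = window
    bases = 0
    for r_start, r_end in reads:
        overlap = min(r_end, w_end) - max(r_start, w_start)
        if overlap > 0:
            bases += overlap
    return bases / (w_end - w_start)


reads = [(0, 100), (50, 150), (180, 280)]
windows = make_windows(0, 240, slide=20, size=200)
for window in windows:
    print(window, window_depth(reads, window))
```

With L1 = 20 bp and L2 = 200 bp as in the embodiment, neighbouring windows overlap by 180 bp, so the windows together cover the target region multiple times.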
In a specific embodiment of the present invention, the above method further comprises performing steps A-C on at least one control sample, in advance or at the same time, to obtain the average sequencing depth of the corresponding windows in the control sample. The window average sequencing depth data of the control sample can be saved for use when detecting CNV in other samples to be tested.
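The comparison against control samples in step D might look like the sketch below. The text does not specify how a "significant difference" is judged, so the depth ratio and the thresholds 0.75 and 1.25 used here are purely illustrative assumptions; the sketch also assumes that the depths of all samples have already been normalized to comparable overall coverage.

```python
from statistics import mean
from typing import List


def call_windows(sample_depths: List[float],
                 control_depths: List[List[float]],
                 low: float = 0.75, high: float = 1.25) -> List[str]:
    """Label each window of the test sample relative to the controls.

    sample_depths[i] is the average depth of window i in the test sample;
    control_depths[k][i] is the same window in control sample k.
    The thresholds are hypothetical, not taken from the text.
    """
    calls = []
    for i, depth in enumerate(sample_depths):
        reference = mean(control[i] for control in control_depths)
        if reference == 0:
            calls.append("no_call")  # no control coverage to compare with
            continue
        ratio = depth / reference
        if ratio < low:
            calls.append("loss")
        elif ratio > high:
            calls.append("gain")
        else:
            calls.append("normal")
    return calls


controls = [[100, 100, 100, 0], [100, 104, 100, 0]]
print(call_windows([100, 52, 148, 10], controls))
```

Because the control depths are plain per-window lists, they can be stored once and reused for later samples, as the paragraph above describes.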
In a specific embodiment of the present invention, the determination of the average sequencing depth of each window in step C of the method covers several cases: C1. when L1 > L2, the windows are disjoint with gaps between them, and the average sequencing depth of each window is the mean of the sequencing depth of the window and the sequencing depth of the non-window region adjacent to it; C2. when L1 < L2, the windows partially overlap, and the average sequencing depth of each window is the mean of the sequencing depths of all the L1-sized regions obtained by sliding successively in steps of L1 from one end of the window to the other; C3. when L1 = L2, the windows are joined end to end, and the average sequencing depth of each window is the mean of the sequencing depths of several consecutive windows, including the window itself, that adjoin one end of the window. Using the above average sequencing depth reduces the bias introduced during library construction and/or sequencing by differences in base composition between windows.
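Case C2 (L1 < L2) can be sketched as follows, under the assumption that a per-base depth list is available for the reference and that the window lies entirely within it. The function names are hypothetical, and cases C1 and C3 are not covered by this sketch.

```python
from typing import List


def subregion_depths(per_base_depth: List[float], start: int, end: int,
                     step: int) -> List[float]:
    """Mean depth of each `step`-sized sub-region of [start, end)."""
    depths = []
    for sub_start in range(start, end, step):
        sub_end = min(sub_start + step, end)
        chunk = per_base_depth[sub_start:sub_end]
        depths.append(sum(chunk) / (sub_end - sub_start))
    return depths


def average_window_depth(per_base_depth: List[float], w_start: int,
                         w_size: int, slide: int) -> float:
    """Case C2: average the depths of all L1-sized regions inside the window."""
    if not 0 < slide < w_size:
        raise ValueError("this sketch only covers the case L1 < L2")
    subs = subregion_depths(per_base_depth, w_start, w_start + w_size, slide)
    return sum(subs) / len(subs)


# 40 bp toy reference: first 20 bp at depth 10, last 20 bp at depth 30.
per_base = [10] * 20 + [30] * 20
print(average_window_depth(per_base, w_start=0, w_size=40, slide=20))
```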
In a specific embodiment of the present invention, the method further comprises correcting the average sequencing depth of each window in C, the correction including a correction based on genome-wide repeat sequences, and/or a correction based on GC content, and/or a correction based on chromosome copy number. The correction based on genome-wide repeat sequences can be carried out as follows: (1) set a value K and, over the entire reference sequence, count the number of times the K-base sequence starting at the i-th base occurs on the reference sequence; (2) remove from the reference sequence the K bp sequences of (1) that occur more than once, obtaining a corrected reference sequence; (3) replace the reference sequence in B with the corrected reference sequence of (2) and carry out B-C, obtaining the average sequencing depth of each window corrected for genome-wide repeat sequences; wherein i is the base index on the reference sequence, K is a natural number, and the read length in the sequencing data of A > K > half of the read length in the sequencing data of A. In one embodiment of the present invention, the read length is 101 bp and K = 63. The correction based on GC content is carried out using an established relationship between GC ratio and sequencing depth, and establishing this relationship comprises: performing target-region capture on multiple samples and sequencing the target region to obtain the sequencing data of the multiple samples; aligning the sequencing data of the multiple samples to the reference sequence to obtain the alignment results of the multiple samples; dividing the target region of each of the multiple samples so that the target region of every sample contains the same windows, calculating the sequencing depth of each window of each sample based on the alignment results, and obtaining the mean sequencing depth of each window; dividing the reference sequence so that it contains the same windows as the target region, and determining the proportion of G and C bases in each window; and establishing the relationship between GC ratio and sequencing depth from the mean sequencing depth of each window and the GC base proportion of that window. The multiple samples are normal samples, preferably more than 30 in number. In one embodiment of the present invention, the established relationship between GC ratio and sequencing depth is saved for use when testing other samples. In one embodiment of the present invention, the correction based on chromosome copy number comprises: determining the copy number of each chromosome from the consistency of each chromosome's sequencing depth across multiple samples and, where a chromosome is present in 3 copies, or for the X and Y chromosomes of a male, which are not present in 2 copies, correcting the chromosome copy number to its true value. The above correction based on repeat sequences, and/or correction based on GC content, and/or correction based on the true chromosome copy number can effectively reduce or eliminate deviations in the sequencing data caused by amplification during library construction, by sequencing, or by differing experimental conditions between samples.
In a specific embodiment of the invention, the determination of CNV in step D of the method further
includes: determining the type of the CNV according to the distance on the reference sequence between
paired reads that have a fixed distance relationship in the alignment result of step B. Let L denote
the fixed distance between the two reads of a read pair, and let L' denote the distance between the
two reads of that pair on the reference sequence; when L' > L, the CNV is determined to be of the
deletion type, and when L' < L, the CNV is determined to be of the insertion type. The paired reads
having a fixed distance relationship come from the two ends of a fragment of a sequencing library, the
construction of the sequencing library being included in the sequencing of step A; for example,
paired-end (PE) sequencing yields PE reads. In a specific embodiment of the invention, because in
actual library construction the size of the obtained library is usually not a fixed value but a range
(for example, when no precise gel excision or other purification is used to obtain a library of one
fixed size), a library intended to be 500 bp typically ends up at 300-900 bp. Therefore, more
preferably, the CNV is determined to be of the deletion type when L' ≥ 2L and of the insertion type
when L' ≤ 0.2L, which makes the detection more accurate.
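As a non-limiting illustration, the preferred thresholds above (deletion when L' ≥ 2L, insertion when L' ≤ 0.2L) can be sketched in Python as follows. The function name `classify_pair` and its parameter names are hypothetical and do not appear in the original; the sketch classifies a single read pair and leaves the aggregation of evidence over many pairs to the caller.

```python
def classify_pair(expected_insert: float, observed_span: float,
                  del_factor: float = 2.0, ins_factor: float = 0.2) -> str:
    """Classify one read pair by comparing L' with L.

    expected_insert: L, the nominal distance between the two reads of a pair.
    observed_span:   L', the distance between the two reads on the reference.
    del_factor / ins_factor: the 2L and 0.2L thresholds from the text.
    """
    if observed_span >= del_factor * expected_insert:
        return "deletion"   # reads map much farther apart than expected
    if observed_span <= ins_factor * expected_insert:
        return "insertion"  # reads map much closer together than expected
    return "normal"         # within the tolerated library size range


if __name__ == "__main__":
    for span in (1200, 80, 600):
        print(span, classify_pair(500, span))
```

With a nominal library size of 500 bp, a pair spanning 600 bp is treated as normal, which reflects the 300-900 bp spread of real libraries mentioned above.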
In a specific embodiment of the invention, the determination of CNV in step D of the method further
includes: determining the exact position and size of the CNV according to reads in the alignment
result of step B that are incompletely aligned to the reference sequence. In one embodiment of the
invention, such incompletely aligned reads are also called split reads: a split read cannot be aligned
to the reference sequence over its whole length; one end aligns and the other end does not.
Determining the exact position and size of the CNV from these split reads includes: clipping the part
of the split read that cannot be aligned and defining the clipped part as a split fragment; aligning
the split fragment to the reference sequence to obtain the position of the split fragment on the
reference sequence; and determining the exact position and size of the CNV based on the position of
the split fragment on the reference sequence, the position on the reference sequence of the read to
which the split fragment belongs, and the distance between these two positions on the reference
sequence.
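As a non-limiting illustration, the use of the two positions described above can be sketched in Python. The sketch assumes, beyond what the original states, that the aligned part of the read is its 5' portion, that the clipped split fragment is its 3' portion, and that both positions are given as 0-based reference coordinates on the same chromosome and strand. The names `Breakpoint` and `infer_breakpoint` are hypothetical.

```python
from typing import NamedTuple


class Breakpoint(NamedTuple):
    kind: str   # "deletion", "duplication" or "none"
    start: int  # 0-based start of the affected reference interval
    end: int    # exclusive end of the affected reference interval
    size: int   # end - start


def infer_breakpoint(anchor_end: int, fragment_start: int) -> Breakpoint:
    """Infer a CNV interval from one split read.

    anchor_end:     reference coordinate just past the aligned part of the read.
    fragment_start: reference coordinate where the re-aligned split fragment begins.
    """
    if fragment_start > anchor_end:
        # The read jumps forward on the reference: the skipped bases are deleted.
        return Breakpoint("deletion", anchor_end, fragment_start,
                          fragment_start - anchor_end)
    if fragment_start < anchor_end:
        # The read jumps backward: consistent with a tandem duplication junction.
        return Breakpoint("duplication", fragment_start, anchor_end,
                          anchor_end - fragment_start)
    return Breakpoint("none", anchor_end, anchor_end, 0)


if __name__ == "__main__":
    print(infer_breakpoint(1000, 1500))
    print(infer_breakpoint(2000, 1700))
```

In practice several split reads supporting the same junction would be required before a breakpoint is reported; that aggregation is omitted here.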
According to another embodiment of the invention, a method for simultaneously detecting multiple types
of genetic variation is provided, the multiple types of genetic variation including at least two of
point mutation, CNV and inversion. The method includes: (i) performing target region capture on a
sample to be tested and sequencing the target region to obtain sequencing data; and (ii) detecting the
multiple types of genetic variation simultaneously based on the sequencing data from (i); wherein step
(i) uses the composition of one aspect of the invention. The features and advantages of the
composition of one aspect of the invention also apply to this method. Being able to perform several
variation tests at once on the data from a single experiment is of great practical value, for example
for samples that are difficult to obtain, for samples with low nucleic acid content, and where several
classes of variation need to be detected at the same time.
According to yet another embodiment of the invention, a device for detecting CNV is provided, the
device including: A1. a sequencing unit, configured to perform target region capture on a sample to be
tested and sequence the target region to obtain sequencing data, the sequencing being carried out
using the composition of one aspect of the invention; B1. an alignment unit, connected to the
sequencing unit and configured to align the sequencing data with a reference sequence to obtain an
alignment result; C1. a window average sequencing depth determination unit, connected to the alignment
unit and configured to set a sliding length L1 and a window size L2 to partition the target region
into multiple windows, calculate the sequencing depth of each window based on the alignment result,
and determine the average sequencing depth of each window according to the window size and/or the size
of the sliding-length regions it contains, wherein the summed length of the multiple windows covers
part of the target region, the whole target region once, or the whole target region multiple times,
and the sequencing depth of each window is the ratio of the amount of sequencing data aligned to the
window to the window size; D1. a CNV determination unit, connected to the window average sequencing
depth determination unit and configured to judge the degree of difference between the average
sequencing depth of each window and that of the corresponding window of a control sample, a
significant difference indicating that the CNV is present. The device can be used to execute some or
all steps of the CNV detection method provided by one aspect of the invention. The description of the
advantages and technical features of the CNV detection method provided by one aspect of the invention
also applies to this device and is not repeated here.
According to a final aspect of the invention, a device for simultaneously detecting multiple types of
genetic variation is provided, the multiple types of genetic variation including at least two of point
mutation, CNV and inversion, the device including: A2. a sequencing unit, configured to perform target
region capture on a sample to be tested and sequence the target region to obtain sequencing data, the
sequencing using the composition provided by one aspect of the invention; B2. a simultaneous
multi-variation detection unit, connected to the sequencing unit and configured to detect the multiple
types of genetic variation simultaneously based on the sequencing data from A2. The device can be used
to execute some or all steps of the method for simultaneously detecting multiple types of genetic
variation provided by one aspect of the invention. The description of the advantages and technical
features of that method also applies to this device and is not repeated here.
Using the composition, method and/or device of the invention, pre-pregnancy genetic testing that is low
in cost, simple to operate and covers multiple diseases can be achieved. With a chip designed according
to one aspect of the invention, couples of childbearing age who are phenotypically normal and healthy
and have no personal or family history of genetic disease can be tested in a single assay for 15
common causative genes, screening for 12 severe monogenic recessive hereditary diseases that are
highly prevalent in China. The testing is comprehensive and accurate: multiple mutation types can be
detected in a single assay, including missense, nonsense, splice-site (Splice), insertion/deletion
(Indel) and copy number variation (CNV). The effective depth of the off-machine sequencing data after
removal of duplicate reads is greater than 200X, the effective mean coverage is greater than 99.68%,
and the accuracy is high. The concept of pre-pregnancy genetic testing can thereby be introduced in
China to test the general population, moving birth-defect prevention and control forward to the
pre-marital and pre-pregnancy stage; by determining whether both members of a couple carry pathogenic
genes, the risk to their offspring is assessed, with the aim of reducing birth defects, providing
scientific guidance for healthy childbearing, and improving the quality of the newborn population.
Embodiments of the invention are described in detail below, and examples of the embodiments are shown
in the accompanying drawings, in which the same or similar reference numerals throughout denote the
same or similar elements or elements having the same or similar functions. The embodiments described
below with reference to the drawings are exemplary, serve only to explain the invention, and are not
to be construed as limiting the invention. In the description of the invention, unless otherwise
indicated, "a plurality" means two or more.
Unless otherwise stated, the reagents, kits and instruments involved in the following examples are all
conventional commercially available products, for example purchased from Illumina.
Example IV CNV detection and analysis
Two phenotypically normal couples undergoing testing and one pair of sisters with a family history of
DMD were selected, and peripheral blood was collected. All target genes were tested, in particular
genes with a high proportion of CNV mutations such as HBA1, HBA2 and DMD.
The steps are as follows:
(1) Align the sequencing data and remove duplicate reads to obtain the depth of the target region
(capture region, whole exome or whole genome) and the minor-allele base frequency at the identified
SNP sites of the target region (B allele frequency, BAF);
(2) Divide the target region into windows and obtain the sequencing depth of each window: set the
sliding cut length L1 = 20 bp and the window size L2 = 200 bp, partition the target region to obtain
multiple windows, and calculate the sequencing depth of each window based on the above alignment
result. The sequencing depth of each window is the ratio of the amount of sequencing data aligned to
the window (for example, the number of reads or the number of bases) to the window size. The average
sequencing depth of each window is determined from the window size and/or the size of the
sliding-length regions it contains, and the summed length of all windows covers the target region at
least once. Because the windows partially overlap, the average sequencing depth of each window is the
average of the sequencing depths of all L1-sized regions encountered when a region of length L1 slides
successively from one end of the window to the other.
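As a non-limiting illustration, the windowing of step (2) can be sketched in Python. The sketch assumes that a per-base depth list for one contiguous target interval is already available, that the L1-sized regions are non-overlapping bins laid end to end, and that L2 is an integer multiple of L1 (as with L1 = 20 and L2 = 200). The function name `window_mean_depths` is hypothetical.

```python
def window_mean_depths(per_base_depth, step=20, window=200):
    """Return (start, end, mean_depth) for overlapping windows.

    per_base_depth: depth at each base of one contiguous target interval.
    step:   sliding length L1.
    window: window size L2 (assumed to be a multiple of step).
    """
    n = len(per_base_depth)
    # Depth of each L1-sized bin: aligned bases in the bin divided by the bin size.
    bins = [sum(per_base_depth[i:i + step]) / step
            for i in range(0, n - step + 1, step)]
    bins_per_window = window // step
    result = []
    for b in range(len(bins) - bins_per_window + 1):
        start = b * step
        # Window average: mean of the depths of the L1-sized bins it contains.
        mean_depth = sum(bins[b:b + bins_per_window]) / bins_per_window
        result.append((start, start + window, mean_depth))
    return result


if __name__ == "__main__":
    depth = [10] * 200 + [20] * 200
    for row in window_mean_depths(depth)[:6]:
        print(row)
```

Merging a short remainder at the end of an interval into the last window is not handled in this sketch.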
(3) Correct the average sequencing depth of each window:
A. Correct the mean depth according to the uniqueness of K-mers on the reference sequence, to avoid
the influence of repetitive sequences in the genome on CNV detection. Specifically, when the reference
sequence is cut into windows, the window size and step length are both user-settable parameters.
Windows overlap when the step length is smaller than the window size, do not overlap when it equals
the window size, and are separated by gaps when it is larger than the window size. Most windows have
the set size, but where the end of a capture interval is reached, a remainder shorter than one window
length is merged into the last window. The window size can be set according to the length of the CNV
to be detected. For the 63 bp (i.e. K = 63) sequence starting at each base of the reference (except
the last 62 bases of each chromosome), count its number of occurrences in the whole genome, and delete
the non-unique regions (i.e. K-mers occurring more than once) from the capture region (the same
applies to whole-exome and whole-genome data), so that CNVs are detected only in unique regions.
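As a non-limiting illustration, the K-mer uniqueness filter of step A can be sketched in Python. The sketch assumes a genome small enough to count K-mers in memory and, as a simplification not stated in the original, counts forward-strand K-mers only, ignoring reverse complements. The function name `unique_kmer_mask` is hypothetical; K = 4 is used in the toy call only to keep the example readable.

```python
from collections import Counter


def unique_kmer_mask(genome: dict, k: int = 63) -> dict:
    """For each chromosome, mark start positions whose K-mer occurs exactly once.

    genome: mapping of chromosome name to sequence string.
    Returns a mapping of chromosome name to a list of booleans, one per K-mer
    start position (the last k - 1 bases of each chromosome have no K-mer).
    """
    counts = Counter()
    for seq in genome.values():
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    mask = {}
    for name, seq in genome.items():
        mask[name] = [counts[seq[i:i + k]] == 1
                      for i in range(len(seq) - k + 1)]
    return mask


if __name__ == "__main__":
    toy = {"chr1": "ACGTACGTTT"}
    print(unique_kmer_mask(toy, k=4))
```

Positions flagged `False` would be removed from the capture region before depth is evaluated. A whole human genome would need a dedicated K-mer counter rather than an in-memory `Counter`.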
For low-depth data, for example 5X sequencing data, only large windows can be used, to detect long
CNVs, for example at the scale of a whole chromosome. Empirically, detecting CNVs at a resolution of
100 bp requires about 30X or more of effective data (the coverage depth of reads after removal of
duplicates caused by PCR during library construction).
B. Perform GC correction. Method: divide the reference sequence into windows; for paired-end reads,
the GC window size can be set according to the insert (fragment) length, i.e. the distance between the
PE reads. Here the window is set to 200 and the GC window step length to 20. Windows are drawn on the
reference sequence of the capture region and the GC ratio of the reference sequence in each window is
calculated, and at the same time the average sequencing depth of the sample in each window is
calculated, so that each window yields one (GC ratio, sequencing depth) data point. These points are
then fitted by lowess or loess regression to obtain a regression curve of sequencing depth against GC
ratio. The depth of each base is then corrected according to this curve, specifically: calculate the
GC ratio of the 200 bp window centered on the base, divide the depth of the base by the depth value
corresponding to this GC ratio on the GC curve, and multiply by the mean window depth.
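As a non-limiting illustration, the GC correction of step B can be sketched in Python using only the standard library. Note the substitution: instead of the lowess or loess regression named in the text, the sketch approximates the GC-depth curve by the mean depth within fixed GC bins, which is an assumption made to keep the example self-contained. The names `gc_fraction`, `fit_gc_curve` and `correct_depth` are hypothetical.

```python
from statistics import mean


def gc_fraction(seq: str) -> float:
    """Proportion of G and C bases in a reference window."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def _bin(gc: float, n_bins: int) -> int:
    return min(int(gc * n_bins), n_bins - 1)


def fit_gc_curve(gc_values, depths, n_bins=20):
    """Binned stand-in for the lowess/loess curve: GC bin -> expected depth."""
    binned = {}
    for gc, d in zip(gc_values, depths):
        binned.setdefault(_bin(gc, n_bins), []).append(d)
    return {b: mean(v) for b, v in binned.items()}


def correct_depth(depth, gc, curve, overall_mean, n_bins=20):
    """Divide by the depth expected at this GC ratio, multiply by the mean depth."""
    expected = curve.get(_bin(gc, n_bins))
    if not expected:
        return depth  # no training data for this GC bin: leave uncorrected
    return depth / expected * overall_mean


if __name__ == "__main__":
    gcs = [0.25, 0.25, 0.5, 0.5]
    depths = [80, 120, 40, 60]
    curve = fit_gc_curve(gcs, depths)
    print(correct_depth(100, 0.25, curve, mean(depths)))
    print(correct_depth(50, 0.5, curve, mean(depths)))
```

In the toy data, windows with a GC ratio of 0.5 are sequenced at half the depth of windows at 0.25; after correction both are brought to the same overall mean.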
C. Test the average sequencing depth chromosome by chromosome and record cases where copy number
variation affects the target intervals of an entire chromosome; at the same time, determine sex from
the X and Y chromosome depths and correct the X/Y depths of male samples. Because capture sequencing
is non-uniform, and the capture regions designed on different chromosomes differ in length, sequence
specificity and GC content, the actual sequencing depths of different chromosomes differ, which
affects the determination of sex and of chromosomal aneuploidy. However, this between-chromosome depth
difference is stable across samples and highly correlated; correcting the sequencing depth of each
chromosome using this correlation largely removes the effect of non-uniform capture sequencing. Sex
determination is mainly based on the following: if Y is captured, check whether the depth of Y is
sufficient; if only X is captured, examine the relationship between the X depth and the autosomal
depth. For example, the normal hidden state of X is 1 copy in males and 2 copies in females; if ploidy
is not determined first and 2 is used uniformly, a male X may be wrongly called as a heterozygous
deletion, and the XXY karyotype cannot be identified either (judging by phenotype alone, an XXY male
may be indistinguishable from an XY male).
Chromosome depth correction uses the consistent correlation of depth among samples of the same batch
to correct the mean depth of each chromosome. Where a chromosome with 3 copies is found, or for the X
and Y of a male (which are not present in 2 copies), the chromosome copy number is set to its true
value; otherwise all chromosomes are taken to have the normal 2 copies.
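As a non-limiting illustration, the chromosome-level copy number and sex determination of step C can be sketched in Python. The sketch assumes a hypothetical `two_copy_profile` giving the mean depth expected for each chromosome at 2 copies (for example, derived from normal samples of the same batch), handles only autosomes and X, and normalizes by the mean depth of a chosen set of autosomes. None of these names or choices come from the original.

```python
from statistics import mean


def estimate_chrom_copies(sample_depth, two_copy_profile, norm_autosomes):
    """Estimate the copy number of each chromosome of one sample.

    sample_depth:     chromosome -> mean depth in the sample under test.
    two_copy_profile: chromosome -> mean depth expected at 2 copies (assumed input).
    norm_autosomes:   autosomes used to normalize away overall depth differences.
    """
    s_norm = mean([sample_depth[c] for c in norm_autosomes])
    p_norm = mean([two_copy_profile[c] for c in norm_autosomes])
    copies = {}
    for chrom, depth in sample_depth.items():
        # Relative depth of the sample divided by the relative depth expected at 2 copies.
        ratio = (depth / s_norm) / (two_copy_profile[chrom] / p_norm)
        copies[chrom] = round(2 * ratio)
    return copies


def infer_sex(copies, x_name="chrX"):
    """One X copy is read as male, two as female; anything else is left open."""
    return {1: "male", 2: "female"}.get(copies[x_name], "undetermined")


if __name__ == "__main__":
    sample = {"chr1": 200, "chr2": 180, "chr21": 150, "chrX": 95}
    profile = {"chr1": 100, "chr2": 90, "chr21": 50, "chrX": 95}
    copies = estimate_chrom_copies(sample, profile, ["chr1", "chr2"])
    print(copies, infer_sex(copies))
```

The estimated copy numbers would then replace the default of 2 as the expected state for the affected chromosomes in later steps. Y-based checks and XXY detection are omitted from this sketch.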
(4) Calculate the correlation coefficients between the window depth vectors of the samples of the same
batch, and apply a batch correction to the samples that meet the high-correlation criterion, further
reducing the influence of sequencing non-uniformity on detection. Specifically, for each chromosome of
each sample, the mean depths of the windows are taken as a one-dimensional vector, and the correlation
coefficient of the depth fluctuations of each chromosome between samples can then be calculated. This
step can also serve as a basis for quality control (QC) of the assay. An individual degraded sample
will correlate very poorly in depth with the other, normal samples; a threshold is set on the
correlation coefficient and samples that correlate poorly with the normal samples, for example with a
correlation coefficient below 0.8, are filtered out, to avoid problems in subsequent calculations or
an excess of false positives in the results.
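As a non-limiting illustration, the correlation-based quality control of step (4) can be sketched in Python. The sketch assumes that Pearson correlation is the coefficient intended and that a sample is kept when its median correlation with the other samples of the batch is at least 0.8; the use of the median and the names `pearson` and `filter_samples` are assumptions, not taken from the original.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length depth vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    denom = sqrt(vx * vy)
    return cov / denom if denom else 0.0  # constant vectors carry no signal


def filter_samples(depth_vectors, threshold=0.8):
    """Keep samples whose median correlation with the rest meets the threshold.

    depth_vectors: sample name -> list of window mean depths (same windows for all).
    """
    kept = []
    for name, vec in depth_vectors.items():
        others = [pearson(vec, v) for n, v in depth_vectors.items() if n != name]
        if median(others) >= threshold:
            kept.append(name)
    return kept


if __name__ == "__main__":
    batch = {
        "s1": [10, 20, 30, 40, 50],
        "s2": [11, 19, 31, 41, 49],
        "s3": [9, 21, 29, 39, 52],
        "degraded": [50, 10, 40, 20, 30],
    }
    print(filter_samples(batch))
```

Using the median rather than the mean keeps a single degraded sample from lowering the score of the normal samples it is compared with.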
(5) Feed all corrected sequencing depth data into a hidden Markov model (HMM) to predict the copy
number of each window, evaluate the result, and calculate the posterior probability of each CNV
segment. The depth enters the HMM as the observed state, and the emission probability is assumed to
follow a negative binomial distribution. The model part follows the hidden Markov model described in
the book "Biological Sequence Analysis" as translated by Wang Jun et al.; the detection software was
written according to the method of the invention, with partial reference to the following documents,
for example the negative binomial assumption in "Quantifying copy number variations using a hidden
Markov model with inhomogeneous emission distributions", McCallum, Wang et al., Biostatistics (2013),
14, 3, pp. 600-611.
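As a non-limiting illustration, an HMM with negative binomial emissions can be decoded with the Viterbi algorithm as sketched below in Python. This is not the detection software of the invention: the copy number states 0 to 4, the transition probability `stay_prob`, the dispersion parameter `size` and the floor on the mean for 0 copies are all assumed values, and the sketch returns only the most likely state path, not the posterior probability of each segment (which would require a forward-backward pass).

```python
from math import lgamma, log


def nb_logpmf(k, mu, size):
    """Log probability of count k under a negative binomial with mean mu and dispersion size."""
    return (lgamma(k + size) - lgamma(k + 1) - lgamma(size)
            + size * log(size / (size + mu)) + k * log(mu / (size + mu)))


def viterbi_copy_number(depths, diploid_depth, states=(0, 1, 2, 3, 4),
                        stay_prob=0.99, size=50.0):
    """Most likely copy number per window given corrected window depths."""
    n_states = len(states)
    switch = (1.0 - stay_prob) / (n_states - 1)
    log_trans = [[log(stay_prob if i == j else switch) for j in range(n_states)]
                 for i in range(n_states)]

    def emit(depth, cn):
        # Expected depth scales with copy number; small floor keeps the mean positive at 0 copies.
        mu = max(diploid_depth * cn / 2.0, 0.5)
        return nb_logpmf(depth, mu, size)

    score = [log(1.0 / n_states) + emit(depths[0], cn) for cn in states]
    back = []
    for depth in depths[1:]:
        prev, ptr, score = score, [], []
        for j, cn in enumerate(states):
            best = max(range(n_states), key=lambda i: prev[i] + log_trans[i][j])
            ptr.append(best)
            score.append(prev[best] + log_trans[best][j] + emit(depth, cn))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda i: score[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [states[i] for i in path]


if __name__ == "__main__":
    depths = [100, 98, 103, 52, 49, 51, 50, 101, 99, 100]
    print(viterbi_copy_number(depths, diploid_depth=100))
```

In the toy input, four consecutive windows at about half the diploid depth are decoded as a single-copy segment flanked by two-copy windows.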
Optionally, only the corrected sequencing depth data are used above. This is because SNP BAF data,
which derive from heterozygous SNPs, are generally used to detect duplications (dup) of relatively
long segments, whereas in the capture sequencing data of this example there are few SNP sites in the
capture region and few regions with duplication-type CNVs, so the CNV detection of this example does
not use BAF information. Detecting large autosomal CNVs generally requires BAF data, which can be
done, for example, with reference to "PennCNV: An integrated hidden Markov model designed for
high-resolution copy number variation detection in whole-genome SNP genotyping data", Wang et al.,
using the SNP BAF and a distance penalty term in the HMM transition probability.
(6) Sort and organize the results, annotate the intervals in which CNVs occur, format the output, and
draw a depth plot of each region in which a CNV occurs and of its neighboring region. The detection
results are shown in Table 8.
(7) Obtain the CNV mutation status of the genes by comparing the depth plots of the sample to be
tested and of the control sample: judge the degree of difference between the average sequencing depth
of each window and that of the corresponding window of the control sample; a statistically significant
difference indicates that a CNV is present in that window of the sample to be tested. The average
sequencing depth of the corresponding window in the control sample is determined by the same procedure
as for the sample to be tested; for example, it can be obtained by carrying out, beforehand or at the
same time, target region sequencing, alignment and calculation of the average sequencing depth of each
window on at least one control sample, and preferably on the data of multiple samples, for example
more than 30 normal control samples. The window average sequencing depth data of the control samples
can be saved for use in CNV detection of other samples to be tested.
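As a non-limiting illustration, the per-window comparison with control samples in step (7) can be sketched in Python. The original does not specify which statistical test is used; the sketch assumes a simple z-score against the mean and standard deviation of the control depths, with a cutoff of 3, purely for illustration. The function name `flag_cnv_windows` is hypothetical.

```python
from statistics import mean, stdev


def flag_cnv_windows(sample, controls, z_cutoff=3.0):
    """Flag windows whose depth differs markedly from the control samples.

    sample:   list of corrected window depths for the sample under test.
    controls: list of such lists, one per control sample, same windows.
    Returns (window_index, "loss" or "gain", z_score) for each flagged window.
    """
    flagged = []
    for i, depth in enumerate(sample):
        ref = [c[i] for c in controls]
        mu, sd = mean(ref), stdev(ref)
        if sd == 0:
            continue  # controls are identical here: z-score undefined, skip
        z = (depth - mu) / sd
        if abs(z) >= z_cutoff:
            flagged.append((i, "loss" if z < 0 else "gain", z))
    return flagged


if __name__ == "__main__":
    controls = [[100, 100, 100], [110, 105, 95], [90, 95, 105]]
    print(flag_cnv_windows([102, 50, 101], controls))
```

With more than 30 control samples, as preferred above, the estimates of the mean and standard deviation per window become considerably more stable than in this three-sample toy case.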
The detection of CNV further includes: determining the type of the CNV according to the distance on
the reference sequence between paired reads that have a fixed distance relationship in the alignment
result. Let L denote the fixed distance between the two reads of a read pair, and let L' denote the
distance between the two reads of that pair on the reference sequence; when L' > L, the CNV is
determined to be of the deletion type, and when L' < L, the CNV is determined to be of the insertion
type. The paired reads having a fixed distance relationship come from the two ends of a fragment of a
sequencing library; for example, paired-end (PE) sequencing yields PE reads. Because in actual library
construction the size of the obtained library is usually not a fixed value but a range (for example,
when no precise gel excision or other purification is used to obtain a library of one fixed size), a
library intended to be 500 bp typically ends up at 300-900 bp. Therefore, more preferably, the CNV is
determined to be of the deletion type when L' ≥ 2L and of the insertion type when L' ≤ 0.2L, which
makes the detection more accurate. Fig. 3A and Fig. 3C concern the detection of deletion-type CNVs.
First, the depths of corresponding windows are compared to find windows showing a marked drop, which
locates the deletion; however, the position of the deletion is then known only to within one window at
each of its two ends, so the window sequencing depth calculation above determines only the approximate
boundaries. If, at the same time, the distance between PE reads (the reads themselves not lying on the
deleted region) is found to be abnormal compared with the normal length, this provides evidence for
the deletion that is independent of depth, avoiding false positives due to other causes, and also
delimits an approximate range in which the deletion occurred.
The determination of CNV further includes: determining the exact position and size of the CNV
according to reads in the alignment result that are incompletely aligned to the reference sequence.
Such incompletely aligned reads are also called split reads: a split read cannot be aligned to the
reference sequence over its whole length; one end aligns and the other end does not. Determining the
exact position and size of the CNV from these split reads includes: clipping the part of the split
read that cannot be aligned and defining the clipped part as a split fragment; aligning the split
fragment to the reference sequence to obtain the position of the split fragment on the reference
sequence; and determining the exact position and size of the CNV based on the position of the split
fragment on the reference sequence, the position on the reference sequence of the read to which the
split fragment belongs, and the distance between these two positions on the reference sequence. As
shown in Fig. 3C, if split reads happen to cover the position where the deletion occurs, the
breakpoint position within the split reads can be used to determine precisely where the deletion
occurs. Fig. 3B shows the detection of duplication-type or insertion-type CNVs; likewise, the
approximate region is determined from windows showing a depth difference, PE reads provide supporting
evidence, breakpoints are then found with split reads, and finally the evidence is integrated to
obtain the structural variation that actually occurred.