CN106021995A - Graphical evaluation method of DNA (Deoxyribose Nucleic Acid) targeted sequencing cover degree - Google Patents


Info

Publication number
CN106021995A
Authority
CN
China
Prior art keywords
gene
checking
order
file
site
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610318269.3A
Other languages
Chinese (zh)
Inventor
薛成海
雷文婕
侯婷婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wankangyuan (tianjin) Gene Technology Co Ltd
Original Assignee
Wankangyuan (tianjin) Gene Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wankangyuan (tianjin) Gene Technology Co Ltd
Priority to CN201610318269.3A
Publication of CN106021995A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression

Abstract

The invention provides a graphical evaluation method for DNA (deoxyribonucleic acid) targeted sequencing coverage. The method comprises the following steps: 1) data extraction: extracting the sequencing depth data of each site within the different gene regions of the genes in a gene list; 2) data merging: when a gene contains too many base sites, merging the sequencing depth data of N adjacent sites into their mean; and 3) graphical display: displaying the sequencing coverage of each site within the different gene regions of the genes in the gene list. The method evaluates not only indices such as base content but also the coverage of different gene regions and multiple statistics across multiple genes and samples, and it reports the evaluation results visually in graphical form.

Description

A graphical evaluation method for DNA targeted sequencing coverage
Technical field
The invention belongs to the field of genetic information data processing, and in particular relates to a graphical evaluation method for DNA targeted sequencing coverage.
Background
High-throughput sequencing technology has matured, and the time and cost required for sequencing have fallen sharply, so a growing number of studies use it to detect genetic variation. The technology is not perfect, however: the fragments to be sequenced must be amplified by PCR before sequencing, which introduces sequencing errors. Assessing sequencing quality is therefore particularly important once the raw sequencing data are in hand. Typically, the first step after obtaining sequencing data is quality control, for which many software tools are available. FastQC, for example, evaluates sequencing data in terms of GC content, sequence length distribution, and similar aspects. Such tools, however, only assess at a global level whether the sequencing data meet the requirements of downstream analysis.
Exome sequencing, gene chip sequencing, and similar approaches capture and sequence only the exons of genes, and each sequencing run can involve many genes. Common quality evaluation software can only assess sequencing quality at a global level. When the interest is in the sequencing quality of particular genes, or in the capture performance of a gene chip on each gene, an overall quality assessment cannot accurately reflect the sequencing quality of the genes of interest.
Summary of the invention
In view of this, the present invention proposes a graphical evaluation method for DNA targeted sequencing coverage. It assesses not only indices such as base content but also the coverage of different gene regions and multiple statistics across multiple genes and samples, and it reports the results visually in graphical form.
To achieve the above purpose, the technical solution of the present invention is as follows: a graphical evaluation method for DNA targeted sequencing coverage, comprising the following steps:
1) Data extraction: extract the sequencing depth data of each site within the different regions of each gene;
2) Data merging: when a gene contains too many base sites, merge the sequencing depth data of N adjacent sites into their mean;
3) Graphical display: display the sequencing coverage of each site within the different gene regions of the genes in the gene list.
Further, the input to step 1) is a bed file and a sequencing depth data file (depth file). The bed file contains the chromosome number, gene start site, gene end site, gene name, and gene region annotation; the depth file contains the chromosome number, chromosomal position, and sequencing depth.
Further, the bed file covers the flanking regions, exon regions, and intron regions of a gene. Site information must be extracted for the region targeted by targeted sequencing, namely the exons of the gene, and for the different regions of the gene; the sequencing depth data in the depth file are then used to annotate the corresponding sites with their sequencing depth.
Further, the input to step 3) comprises the data obtained in steps 1) and 2) and the gene list extracted from the bed file; the output is a picture file in which different parts of the picture represent different regions of a gene.
Further, the picture file shows only the exons of a gene, and different parts of the picture represent different exons.
Compared with the prior art, the graphical evaluation method for DNA targeted sequencing coverage of the present invention has the following advantages:
The present invention takes as input the output of a common exome sequencing pipeline (a bed file and a sequencing depth data file). Based on the sequencing depth data obtained by processing the sequencing data and on the annotation of the different gene regions, it extracts and integrates the data, presents the exon coverage that exome sequencing achieves for individual genes, and finally displays the sequencing depth distribution of each gene as a picture. The invention assesses not only indices such as base content but also the coverage of different gene regions and multiple statistics across multiple genes and samples, reports the results visually in graphical form, and accurately reflects the sequencing quality of the specific genes of interest.
Brief description of the drawings
The accompanying drawing, which forms part of the present invention, provides a further understanding of the invention. The illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawing:
Fig. 1 is a schematic flow chart of the present invention.
Detailed description of the embodiments
It should be noted that, provided they do not conflict, the embodiments of the present invention and the features of the embodiments may be combined with one another.
The present invention is described in detail below with reference to the drawing and in conjunction with the embodiments.
As shown in Fig. 1, the steps of the present invention are as follows:
1. Data extraction
Extract the sequencing depth data of each site within the different regions of each gene. The input comprises a bed file, which contains the chromosome number, gene start site, gene end site, gene name, and gene region annotation, and a sequencing depth data file (depth file), which contains the chromosome number, chromosomal position, and sequencing depth.
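The extraction step can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a tab-separated, five-column bed file and a three-column depth file, and it further assumes that bed intervals are 0-based and half-open while depth positions are 1-based; the patent does not state a coordinate convention. The sample data, the function names (read_bed, read_depth, annotate), and the choice to report a depth of 0 for sites missing from the depth file are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample inputs; real files would be opened from disk instead.
BED_TEXT = "chr1\t100\t105\tGENE_A\texon\nchr1\t105\t108\tGENE_A\tintron\n"
DEPTH_TEXT = "chr1\t101\t30\nchr1\t102\t32\nchr1\t106\t5\nchr2\t50\t12\n"

def read_bed(handle):
    """Yield (chrom, start, end, gene, region) from a 5-column bed-like file."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        chrom, start, end, gene, region = row[:5]
        yield chrom, int(start), int(end), gene, region

def read_depth(handle):
    """Return {(chrom, pos): depth} from a 3-column depth file."""
    depth = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        chrom, pos, value = row[:3]
        depth[(chrom, int(pos))] = int(value)
    return depth

def annotate(bed_rows, depth):
    """Map each gene to a list of (pos, region, depth) tuples.

    Assumption: bed intervals are 0-based half-open and depth positions are
    1-based, so interval [start, end) covers positions start+1 .. end.
    Sites absent from the depth file are reported with depth 0.
    """
    per_gene = defaultdict(list)
    for chrom, start, end, gene, region in bed_rows:
        for pos in range(start + 1, end + 1):
            per_gene[gene].append((pos, region, depth.get((chrom, pos), 0)))
    return dict(per_gene)

result = annotate(read_bed(io.StringIO(BED_TEXT)),
                  read_depth(io.StringIO(DEPTH_TEXT)))
```

Holding the depth file in a dictionary keeps the sketch short; for a whole-exome depth file, a streaming or indexed lookup would likely be needed instead.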
When the bed file comes from exome sequencing data, it contains only the chromosome number, gene start site, gene end site, and gene name, without gene region annotation. Exon numbers must then be added manually to distinguish the exons. The output also differs slightly: it no longer includes gene region annotation but instead includes the exon numbers of the gene.
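The text describes the exon numbers as being added manually. One way to automate that is sketched below, under the assumption that exons of a gene can simply be numbered in ascending order of their start coordinate. This ignores strand, so for a gene on the reverse strand the numbers would run opposite to transcript order. The function name number_exons and the "exonN" label format are hypothetical.

```python
from collections import defaultdict

def number_exons(intervals):
    """Attach an exon label to each (chrom, start, end, gene) interval.

    Assumption: exons are numbered per gene by ascending start coordinate,
    without regard to strand.
    """
    by_gene = defaultdict(list)
    for chrom, start, end, gene in intervals:
        by_gene[gene].append((chrom, start, end))

    labelled = []
    for gene, regions in by_gene.items():
        ordered = sorted(regions, key=lambda r: r[1])  # sort by start
        for idx, (chrom, start, end) in enumerate(ordered, start=1):
            labelled.append((chrom, start, end, gene, f"exon{idx}"))
    return labelled

# Hypothetical example: two exons of G1 listed out of order, one exon of G2.
example = [("chr1", 500, 600, "G1"), ("chr1", 100, 200, "G1"), ("chr2", 10, 20, "G2")]
labelled = number_exons(example)
```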
2. Data merging
When a gene contains too many base sites, the sequencing depth data of N adjacent sites can be merged into their mean, where N is the number of sites to merge. The value of N can be chosen according to the size of the range file. When the range file is large, N can be set correspondingly larger; otherwise the plotting speed and the quality and appearance of the final picture suffer.
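The merging step amounts to averaging over fixed-size bins of N consecutive sites. The following sketch shows one reading of it. The patent does not say how a trailing group of fewer than N sites is handled; here it is averaged over the sites it actually contains, which is an assumption. The helper choose_n, which picks N from a target number of plotted points, is a hypothetical heuristic and not taken from the source.

```python
import math

def choose_n(num_sites, max_points=200):
    """Hypothetical heuristic: smallest N that keeps the plot within max_points."""
    return max(1, math.ceil(num_sites / max_points))

def merge_depths(depths, n):
    """Merge every n consecutive depths into their mean.

    Assumption: a final group shorter than n is averaged over its own length.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    merged = []
    for i in range(0, len(depths), n):
        chunk = depths[i:i + n]
        merged.append(sum(chunk) / len(chunk))
    return merged

print(merge_depths([10, 20, 30, 40, 50], 2))  # [15.0, 35.0, 50.0]
```

If bins should not straddle region boundaries (for example, an exon and the adjacent intron), merge_depths can be applied to each region's depths separately.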
3. Graphical display
Display the sequencing coverage of each site within the different gene regions of the genes in the gene list. The output file format is pdf or svg. Each output file may contain multiple genes, so the picture may be a matrix of images. The first panel of each output file is the legend, followed by the sequencing coverage of each gene.
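The panel arrangement described here, with the legend first and one panel per gene after it, can be sketched as a small layout computation. The number of columns, the row-major filling order, and the function name panel_grid are assumptions; the source only states that the output may be an image matrix whose first panel is the legend. The gene names in the example are placeholders.

```python
import math

def panel_grid(genes, ncols=3):
    """Return (nrows, positions) for a legend panel followed by one panel per gene.

    positions is a list of (name, row, col) filled row by row; the legend
    always occupies row 0, column 0. ncols is a hypothetical layout parameter.
    """
    if ncols < 1:
        raise ValueError("ncols must be at least 1")
    panels = ["legend"] + list(genes)
    nrows = math.ceil(len(panels) / ncols)
    positions = []
    for i, name in enumerate(panels):
        row, col = divmod(i, ncols)
        positions.append((name, row, col))
    return nrows, positions

nrows, positions = panel_grid(["BRCA1", "TP53", "EGFR"], ncols=2)
```

The resulting rows and columns could then be handed to whatever plotting library produces the pdf or svg file.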
If only the exons of a gene are shown, there is no legend; two alternating colors distinguish the different exon regions.
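The exon-only display can be illustrated by writing a small svg bar chart directly, with consecutive exons alternating between two fill colors. This is a sketch under stated assumptions and not the patent's plotting code: the bar-chart form, the dimensions, the two color values, and the function name exon_svg are all hypothetical choices.

```python
def exon_svg(exon_depths, height=100, bar_width=2,
             colors=("#1f77b4", "#ff7f0e")):
    """Render per-exon depth lists as an svg string of vertical bars.

    exon_depths holds one list of (possibly merged) depths per exon.
    Bars of consecutive exons alternate between the two colors, and bar
    heights are scaled to the largest depth in the gene.
    """
    max_depth = max((d for exon in exon_depths for d in exon), default=0) or 1
    rects = []
    x = 0
    for idx, exon in enumerate(exon_depths):
        fill = colors[idx % 2]  # alternate colors exon by exon
        for d in exon:
            h = height * d / max_depth
            rects.append(
                f'<rect x="{x}" y="{height - h:.1f}" width="{bar_width}" '
                f'height="{h:.1f}" fill="{fill}"/>'
            )
            x += bar_width
    width = max(x, 1)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">' + "".join(rects) + "</svg>")

svg = exon_svg([[10, 20], [5], [20]])  # three hypothetical exons
```

Writing the returned string to a file with an .svg extension gives a picture in one of the two output formats named in the text.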
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

1. A graphical evaluation method for DNA targeted sequencing coverage, characterized in that it comprises the following steps:
1) Data extraction: extract the sequencing depth data of each site within the different regions of each gene;
2) Data merging: when a gene contains too many base sites, merge the sequencing depth data of N adjacent sites into their mean;
3) Graphical display: display the sequencing coverage of each site within the different gene regions of the genes in the gene list.
2. The graphical evaluation method for DNA targeted sequencing coverage according to claim 1, characterized in that the input to step 1) is a bed file and a sequencing depth data file (depth file); the bed file contains the chromosome number, gene start site, gene end site, gene name, and gene region annotation, and the depth file contains the chromosome number, chromosomal position, and sequencing depth.
3. The graphical evaluation method for DNA targeted sequencing coverage according to claim 2, characterized in that the bed file covers the flanking regions, exon regions, and intron regions of a gene; site information must be extracted for the region targeted by targeted sequencing, namely the exons of the gene, and for the different regions of the gene, and the sequencing depth data in the depth file are then used to annotate the corresponding sites with their sequencing depth.
4. The graphical evaluation method for DNA targeted sequencing coverage according to claim 1, characterized in that the input to step 3) comprises the data obtained in steps 1) and 2) and the gene list extracted from the bed file, and the output is a picture file in which different parts of the picture represent different regions of a gene.
5. The graphical evaluation method for DNA targeted sequencing coverage according to claim 1, characterized in that the picture file shows only the exons of a gene, and different parts of the picture represent different exons.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610318269.3A CN106021995A (en) 2016-05-13 2016-05-13 Graphical evaluation method of DNA (Deoxyribose Nucleic Acid) targeted sequencing cover degree

Publications (1)

Publication Number Publication Date
CN106021995A true CN106021995A (en) 2016-10-12

Family

ID=57099550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610318269.3A Pending CN106021995A (en) 2016-05-13 2016-05-13 Graphical evaluation method of DNA (Deoxyribose Nucleic Acid) targeted sequencing cover degree

Country Status (1)

Country Link
CN (1) CN106021995A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101914628A (en) * 2010-09-02 2010-12-15 深圳华大基因科技有限公司 Method and system for detecting polymorphism locus of genome target region
WO2014149134A2 (en) * 2013-03-15 2014-09-25 Guardant Health Inc. Systems and methods to detect rare mutations and copy number variation
CN104232649A (en) * 2013-06-10 2014-12-24 深圳华大基因科技有限公司 Genetic mutant and application of genetic mutant

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20161012
