CN108256291A - A method for generating gene mutation detection results with higher confidence

Info

Publication number
CN108256291A
Authority
CN
China
Prior art keywords
vcf
values
gene mutation
gene
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611234002.2A
Other languages
Chinese (zh)
Inventor
杨兴礼
付永全
刘小军
尹潼
陶涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bringspring Science And Technology Co Ltd
Hangzhou M Gene Technology Co Ltd
Original Assignee
Bringspring Science And Technology Co Ltd
Hangzhou M Gene Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bringspring Science And Technology Co Ltd and Hangzhou M Gene Technology Co Ltd
Priority to CN201611234002.2A
Publication of CN108256291A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Landscapes

  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for generating gene mutation detection results with higher confidence from existing gene mutation detection analysis results. The method takes the VCF-format variant site files produced by gene mutation analysis software, computes for each file the distribution of the genotype and genotype quality information recorded at the variant sites, and removes the sites with low genotype quality. It then combines the variant site files produced by two software tools, extracts the sites at which both report the same variant at the same position, and generates a gene mutation data file with higher confidence than the existing detection results, for use in downstream analysis.

Description

A method for generating gene mutation detection results with higher confidence
Technical field
The present invention relates to the field of biological gene mutation analysis, and more particularly to a method that uses existing gene mutation detection analysis results to generate gene mutation detection results with higher confidence.
Background
Whole-genome sequencing uses next-generation high-throughput sequencing technology to sequence an individual's entire genome at high depth, and then aligns the result against the standard human genome to obtain the individual's complete genome sequence. In recent years, large-scale epidemiological surveys have led scientists to find that a large number of genetic mutations are closely associated with human diseases.
With the rapid development of high-throughput sequencing, the adoption of the latest large-scale sequencers, and a sharp fall in cost, whole-genome testing can now detect most of the genetic information in an individual's genome. How to analyze and process such a huge volume of sequencing reads, however, is currently the thorniest problem. A large number of gene data analysis tools have emerged in response.
The whole-genome data analysis workflow mainly comprises quality control, sequence alignment, mutation detection, and mutation annotation. The most widely used bioinformatics tools at present include GATK, Atlas, BWA, and Samtools. Among them, the alignment software BWA is mainly used to align raw sequencing data, and the resulting alignments are usually stored in the SAM file format. Samtools can convert SAM files into binary BAM files, which greatly reduces the storage required and improves the efficiency of data analysis; BAM is therefore the most widely used storage format for sequencing data at present. After these steps, an individual human genome is compared with the standard genome to find gene mutations of various kinds, including SNPs and indels. Several gene mutation analysis tools are suited to this task; the most widely used are GATK, Atlas, and Samtools.
Each analysis tool has its own scoring criteria and algorithm, and decides from these criteria whether each site carries a mutation. Years of practice have shown, however, that every scoring system has many shortcomings, so that both the detection rate and the false-positive rate of mutations are too high, which greatly hampers subsequent experiments.
Each gene mutation analysis tool has its own scoring criteria and algorithm and decides from them whether each site carries a mutation; the call at a site is the statistically most probable outcome. Whatever the algorithm, the more reads in the raw data that cover a site, and the more consistent the alignment with the reference genome around that site, the stronger the "support" for the call and the higher its confidence. This confidence is recorded in the GQ tag of the FORMAT column of each variant site and is called the genotype quality (Genotype Quality). It is defined as GQ = -10 * log10(1 - p), where p is the probability that the genotype is present. The larger the GQ value, the more likely the call at that site is correct.
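The GQ definition above can be checked numerically. The following is a minimal sketch, assuming the logarithm is base 10 (the usual Phred convention); the function names are illustrative and do not come from the patent.

```python
import math


def gq_from_probability(p: float) -> float:
    """Phred-scaled genotype quality for a genotype whose probability is p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -10.0 * math.log10(1.0 - p)


def probability_from_gq(gq: float) -> float:
    """Inverse mapping: probability of the genotype given its GQ value."""
    return 1.0 - 10.0 ** (-gq / 10.0)


# Each extra factor of ten in certainty adds 10 to GQ.
for p in (0.9, 0.99, 0.999):
    print(p, round(gq_from_probability(p), 2))
```

Under this reading, a GQ of 20 corresponds to a 1% chance that the genotype is wrong, and a GQ of 30 to a 0.1% chance.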
By this "high probability" principle, when different tools analyzing the same sample report the same variant at the same site, each of them supports, with high probability, the presence of a variant at that site. In other words, if different gene mutation analysis tools all detect a given variant at the same position, the confidence that the variant has occurred at that position is higher. Therefore, even though the genotype quality distributions produced by different tools for the same sample differ, the results can be merged to generate gene mutation detection results with higher confidence than the existing ones.
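The intuition can be illustrated with a small calculation. This is only a sketch under a strong assumption that the patent does not state: the two tools' errors are treated as independent, which is optimistic because both tools read the same alignments. The function name is hypothetical.

```python
import math


def combined_phred(gq_a: float, gq_b: float) -> float:
    """Phred-scaled probability that both calls are wrong.

    Assumption: the two error events are independent, so their
    probabilities multiply. Real callers share input data, so this
    is best read as an upper bound on the gain from agreement.
    """
    err_a = 10.0 ** (-gq_a / 10.0)
    err_b = 10.0 ** (-gq_b / 10.0)
    return -10.0 * math.log10(err_a * err_b)


# Under independence the Phred scores simply add.
print(round(combined_phred(20, 30), 6))
```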
Summary of the invention
In view of the above background, the present invention adopts the following technical solution:
S1. Preprocess the raw gene data to be analyzed to obtain gene sample data;
S2. Analyze the gene sample data with any two gene mutation analysis tools to obtain variant site detection data in VCF file format;
S3. Compute variant site distribution statistics separately for the VCF file generated by each tool, determine GQ thresholds from the distribution of GT and GQ values at the sites, and filter the variant sites of the original VCF file against the thresholds at a specified ratio, so that each tool yields a VCF file to be merged;
S4. Merge the concordant sites in the VCF files to be merged, generating a gene mutation detection result data file with higher genotype quality than the existing detection results. "Higher confidence" is defined as higher than the confidence of the variant site detection data generated in step S2;
Preferably, step S1 comprises:
S11. Align the raw gene data to be analyzed against the standard genome, determine the position of each short read in the genome, and assemble the short reads into complete reference genome data;
S12. Re-sort the reference genome data in a predetermined order to obtain a binary-format file of the reference genome data;
S13. Add the corresponding sequence headers to the binary-format file of the reference genome data to obtain gene sample data in binary format;
Preferably, step S1 further comprises:
S14. Perform local realignment and sequencing quality recalibration on the binary-format gene data to obtain quality-optimized gene sample data.
Preferably, step S2 comprises:
S2. Use two gene mutation detection tools separately to compare the quality-optimized gene sample data with the standard genome and find the gene mutation information, including single-site changes and small insertions/deletions, each tool generating a VCF file to be optimized.
Preferably, step S3 comprises:
S31. Using the combination of the CHROM and POS fields of each VCF file to be optimized as the index, scan the valid data rows of each VCF file row by row;
S32. During the S31 scan, read the GT and GQ values in the FORMAT column of each variant site and compute, for each VCF file, the distribution of GQ values at the variant sites for each GT value, together with the total number of sites for each GT value;
S33. From the per-GT GQ distributions computed in S32, calculate the GQ threshold for the sites of each GT value in that VCF file;
S34. Again using the combination of CHROM and POS as the index, scan the valid data rows of the VCF file of each gene mutation detection tool row by row and extract the GT and GQ values; when the GQ value is greater than the threshold calculated in S33 for that GT, write the site's row to a new VCF file. When the scan ends, the VCF file to be merged has been generated;
Preferably, the threshold for each GT value in step S3 is chosen so that, among the sites with that GT value, the number of sites whose GQ value is below the threshold is a specified ratio of the total number of sites;
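Steps S32 and S33 can be sketched as follows. The patent does not give the exact formula, so this is one possible reading, stated as an assumption: for each GT value the threshold is the GQ of the k-th lowest site, with k = floor(ratio * site count), so that keeping only sites strictly above it removes roughly the lowest share given by the ratio. The function name and the sample values are illustrative.

```python
from collections import defaultdict


def gq_thresholds(records, ratio=0.01):
    """Map each GT value to a GQ threshold.

    records: iterable of (gt, gq) pairs taken from one VCF file.
    Assumption: threshold = GQ of the k-th lowest site for that GT,
    where k = floor(ratio * number of sites). With k == 0 nothing
    should be removed, so the threshold is negative infinity.
    """
    by_gt = defaultdict(list)
    for gt, gq in records:
        if gq is not None:
            by_gt[gt].append(gq)

    thresholds = {}
    for gt, values in by_gt.items():
        values.sort()
        k = int(len(values) * ratio)
        thresholds[gt] = values[k - 1] if k > 0 else float("-inf")
    return thresholds


# Made-up GQ values: eight heterozygous and two homozygous-alt sites.
records = [("0/1", q) for q in (3, 10, 20, 30, 40, 50, 60, 99)]
records += [("1/1", q) for q in (15, 80)]
print(gq_thresholds(records, ratio=0.25))
```

With tied GQ values at the cut point, the strict comparison in S34 removes somewhat more than the requested share; a production implementation would need to decide how to treat such ties.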
Preferably, step S4 comprises:
S41. Scan the generated VCF files to be merged row by row, using the combination of CHROM and POS as the index;
S42. During the scan, if the ALT values for the same index are equal in the two files, combine CHROM, POS, REF, ALT, GT, GQ, and any other required information from the two files into a row in VCF file format and write it to a new VCF file. When the scan ends, this VCF file is the gene mutation detection result data file with higher confidence. "Higher confidence" is defined as higher than the confidence of the variant site detection data generated in step S2.
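Steps S41 and S42 can be sketched as below. Two points are assumptions rather than the patent's stated method: the sketch uses an in-memory dictionary keyed by (CHROM, POS) instead of a synchronized row-by-row scan, and it keeps the GT and GQ of both tools side by side because the text does not say how they are combined. Function names and sample rows are illustrative.

```python
def load_sites(lines):
    """Index VCF data lines as (CHROM, POS) -> (REF, ALT, GT, GQ)."""
    sites = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        f = line.rstrip("\n").split("\t")
        sample = dict(zip(f[8].split(":"), f[9].split(":")))
        # A later row at the same position overwrites an earlier one.
        sites[(f[0], int(f[1]))] = (f[3], f[4], sample.get("GT"), sample.get("GQ"))
    return sites


def merge_concordant(lines_a, lines_b):
    """Return rows for sites present in both inputs with the same ALT."""
    a, b = load_sites(lines_a), load_sites(lines_b)
    merged = []
    for key, (ref, alt, gt_a, gq_a) in a.items():
        if key in b and b[key][1] == alt:
            _, _, gt_b, gq_b = b[key]
            merged.append((key[0], key[1], ref, alt, gt_a, gq_a, gt_b, gq_b))
    return merged


caller_a = [
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:48",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT:GQ\t0/1:35",
    "2\t300\t.\tG\tA\t50\tPASS\t.\tGT:GQ\t1/1:60",
]
caller_b = [
    "1\t100\t.\tA\tG\t99\tPASS\t.\tGT:GQ\t0/1:60",
    "1\t200\t.\tC\tG\t99\tPASS\t.\tGT:GQ\t0/1:40",  # same position, different ALT
]
for row in merge_concordant(caller_a, caller_b):
    print(*row, sep="\t")
```

Only the site at 1:100 survives: 1:200 has different ALT alleles in the two inputs, and 2:300 is reported by one tool only.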
The beneficial effects of the present invention are as follows:
In the technical solution of the present invention, the VCF-format variant site files generated by two different gene mutation detection tools from the same raw sequencing sample are processed as follows: the GQ distribution of the variant sites for each GT value is computed in each VCF file, sites with low GQ values are filtered out, and the remaining variant site information is merged. This generates a mutation data file with higher confidence and lays a solid foundation for subsequent research and development.
Brief description of the drawings
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings;
Fig. 1 shows a typical VCF-format file;
Fig. 2 shows the flowchart of the algorithm for removing low-quality gene mutation sites;
Fig. 3 shows the flowchart of the algorithm for merging the sites generated by the two gene mutation tools;
Fig. 4 shows the overall flowchart.
Detailed description
To describe the present invention more clearly, it is further described below with reference to preferred embodiments and the accompanying drawings. Similar components are denoted by the same reference numerals in the drawings. Those skilled in the art will appreciate that what is specifically described below is illustrative rather than restrictive and should not limit the scope of protection of the invention.
The invention discloses a method that takes the variant site data files generated by two tools, optimizes them, and merges them to generate a gene mutation detection data file with higher confidence. The specific scheme is as follows:
Step 1: using several gene analysis tools, preprocess the raw gene data to be analyzed to generate gene sample data. Specifically:
Align the raw gene data to be analyzed against the standard genome, determine the position of each short read in the genome, and assemble the short reads into a complete human reference genome file; re-sort the reference genome data in a predetermined order to obtain a binary-format file of the reference genome data; add the corresponding sequence headers to the binary-format file of the reference genome data to obtain gene sample data in binary format; perform local realignment and sequencing quality recalibration on the binary-format gene data to obtain quality-optimized gene sample data.
Step 2: use two gene mutation detection tools to obtain variant information files in VCF format. Specifically:
Using two gene mutation detection tools separately, compare the quality-optimized gene sample data with the standard genome, find the gene mutation information, and generate sorted VCF files to be optimized that contain single-site changes and small insertions/deletions.
The variant site file produced by gene mutation detection is in VCF format. Apart from the descriptive header rows that begin with "#", each valid data row consists of the columns CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, and so on (10 columns), plus several custom columns. CHROM is the chromosome, POS the position, REF the reference nucleotide, and ALT the variant nucleotide. The FORMAT column and the sample columns that follow it record the GT and GQ information, which are respectively the genotype of the current variant (Genotype) and the quality value of that genotype (Genotype Quality). The GQ value expresses how likely the genotype is to be present at the site: the higher the value, the greater the confidence of the detection result. A typical VCF-format file is shown in Fig. 1.
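Reading GT and GQ from a data row of the format described above can be sketched as follows. The sketch assumes a tab-separated, single-sample VCF whose FORMAT column lists the keys and whose sample column lists the matching values; the function name and the example row are made up for illustration.

```python
def parse_gt_gq(line, sample_index=0):
    """Return (chrom, pos, ref, alt, gt, gq) for one VCF data line.

    Returns None for header lines, empty lines, and rows without
    FORMAT/sample columns. gq is None when no usable GQ is present.
    """
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None
    chrom, pos, _id, ref, alt = fields[:5]
    keys = fields[8].split(":")                     # e.g. GT:GQ
    values = fields[9 + sample_index].split(":")    # e.g. 0/1:48
    sample = dict(zip(keys, values))
    gt = sample.get("GT")
    raw_gq = sample.get("GQ", ".")
    gq = None if raw_gq in (".", "") else float(raw_gq)
    return chrom, int(pos), ref, alt, gt, gq


example = "1\t10583\t.\tG\tA\t50\tPASS\tDP=12\tGT:GQ\t0/1:48"
print(parse_gt_gq(example))
```

Looking the keys up by name rather than by position matters because different callers may emit the FORMAT keys in different orders or with additional keys in between.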
Step 3: for each sorted VCF file generated, remove the sites with low GQ values according to the distribution of GQ for each GT in the site information. Specifically:
Using the combination of the CHROM and POS fields of each VCF file to be optimized as the index, scan the valid data rows of each VCF file row by row; read the GT and GQ values of each variant site row and compute, within a single VCF file, the distribution of GQ values at the variant sites for each GT value, together with the total number of sites for each GT value; from these per-GT GQ distributions, calculate the GQ threshold for the sites of each GT value in that VCF file; rescan the VCF file and delete the specified ratio of sites whose GQ values fall below the threshold, writing the high-GQ sites out to form the VCF file to be merged. The filtering process is shown in Fig. 2.
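The second scan, which writes out only the high-GQ sites, can be sketched as a generator over the lines of one VCF file. Assumptions in this sketch: the file has a single sample column, the per-GT thresholds have already been computed and are passed in as a dictionary, and sites with no usable GT or GQ are dropped (the patent does not say how they are handled). The function name and the example values are illustrative.

```python
def filter_vcf_lines(lines, thresholds):
    """Yield header lines unchanged, plus data lines whose GQ is
    strictly greater than the threshold for their GT value."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        gt, raw_gq = sample.get("GT"), sample.get("GQ", ".")
        if gt not in thresholds or raw_gq in (".", ""):
            continue  # assumption: sites without usable GT/GQ are dropped
        if float(raw_gq) > thresholds[gt]:
            yield line


vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tHG00731",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:48",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT:GQ\t0/1:12",
    "1\t300\t.\tG\tA\t50\tPASS\t.\tGT:GQ\t1/1:30",
]
kept = list(filter_vcf_lines(vcf_lines, {"0/1": 20, "1/1": 30}))
print("\n".join(kept))
```

The site at position 300 is dropped because its GQ equals the threshold and the comparison is strict, matching the "greater than" wording of the filtering step.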
Step 4: merge the identical entries of the VCF files to be merged. Specifically:
Scan the VCF files to be merged row by row, using the combination of CHROM and POS as the index; during the scan, if the ALT values for the same index are equal in the two files, combine CHROM, POS, REF, ALT, GT, GQ, and any other required information from the two files into a row in VCF file format and write it to a new VCF file. The merging process is shown in Fig. 3.
The present invention is further described below through an embodiment:
1. From the 1000 Genomes Project FTP site (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data), download the data files in the folder numbered HG00731, selecting the quality-optimized gene sample data obtained from the raw data to be analyzed by the same upstream processing workflow of genome alignment, genome re-sorting, addition of sequence headers, and sequencing quality recalibration: HG00731.mapped.ILLUMINA.bwa.PUR.low_coverage.20130422.bam.
2. Using the quality-optimized gene sample data, with human_g1k_v37.fasta as the reference genome, run Samtools and GATK separately to find the individual's gene mutation information and generate sorted VCF files containing single-site changes and small insertions or deletions. The software command templates are, respectively:
"samtools mpileup --skip-indels -uf %(reference)s %(local_dir)s/%(bam)s | bcftools call --skip-variants indels -mv -Ov -f GQ -o %(local_dir)s/%(vcf)s"
and
"java -Xmx2g -jar /mnt/bioinfo/GATK/GenomeAnalysisTK.jar -T HaplotypeCaller -R %(reference)s --annotation QualByDepth --annotation HaplotypeScore --annotation MappingQualityRankSumTest --annotation ReadPosRankSumTest --annotation FisherStrand --annotation AlleleBalance --annotation RMSMappingQuality -l INFO -stand_call_conf 50 -stand_emit_conf 10 -dcov 1000 -I %(local_dir)s/%(bam)s -o %(local_dir)s/%(vcf)s -nct 2".
These generate the files samtools.hg00731.vcf and gatk.hg00731.vcf, respectively.
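The %(name)s placeholders in the command templates look like Python mapping-style format strings. The following sketch shows how such a template could be filled in; it only builds and prints the command and does not run it. The working directory is a hypothetical value, and the flag spellings follow a cleaned-up reading of the Samtools command in the text, so they should be checked against the installed tool versions.

```python
# Template for the Samtools/bcftools pipeline, with named placeholders.
SAMTOOLS_TEMPLATE = (
    "samtools mpileup --skip-indels -uf %(reference)s %(local_dir)s/%(bam)s"
    " | bcftools call --skip-variants indels -mv -Ov -f GQ"
    " -o %(local_dir)s/%(vcf)s"
)

params = {
    "reference": "human_g1k_v37.fasta",
    "local_dir": "/data/hg00731",  # hypothetical working directory
    "bam": "HG00731.mapped.ILLUMINA.bwa.PUR.low_coverage.20130422.bam",
    "vcf": "samtools.hg00731.vcf",
}

# The % operator substitutes each %(key)s with the matching dict value.
command = SAMTOOLS_TEMPLATE % params
print(command)
```

Keeping the parameters in one dictionary makes it straightforward to fill the GATK template with the same reference, directory, and BAM file while changing only the output VCF name.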
3. Using the combination of the CHROM and POS fields of the VCF files to be optimized, samtools.hg00731.vcf and gatk.hg00731.vcf, as the index, scan the valid data rows of each VCF file row by row;
During the scan, read the GT and GQ values in the FORMAT column of each variant site and compute, for each VCF file, the distribution of GQ values at the variant sites for each GT value, together with the total number of sites for each GT value. The results are shown in Table 1.
From the per-GT GQ distributions, with a specified filtering ratio of 0.01, calculate the GQ threshold for the sites of each GT value in that VCF file. The results are shown in Table 2.
Again using the combination of CHROM and POS as the index, scan samtools.hg00731.vcf and gatk.hg00731.vcf row by row and extract the GT and GQ values; when the GQ value is greater than the threshold for that GT in Table 2, write the site's row to a new VCF file. When the scan ends, the VCF files to be merged have been generated: filter.samtools.hg00731.vcf and filter.gatk.hg00731.vcf.
4. Scan filter.samtools.hg00731.vcf and filter.gatk.hg00731.vcf row by row, using the combination of CHROM and POS as the index; during the scan, if the ALT values for the same index are equal in the two files, combine CHROM, POS, REF, ALT, GT, GQ, and any other required information from the two files into a row in VCF file format and write it to a new VCF file, merge.hg00731.vcf. When the scan ends, merge.hg00731.vcf is the gene mutation detection result with higher confidence.
Obviously, the above embodiments are merely examples given to illustrate the present invention clearly and do not limit its implementation. Those of ordinary skill in the art can make other variations or changes in different forms on the basis of the above description. It is not possible to enumerate all implementations here, and any obvious variation or change derived from the technical solution of the present invention remains within the scope of protection of the present invention.

Claims (7)

1. A method for generating gene mutation detection results with higher confidence from existing gene mutation detection analysis results, characterized in that the method comprises the steps of:
S1. preprocessing the raw gene data to be analyzed to obtain gene sample data;
S2. analyzing the gene sample data with any two gene mutation analysis tools to obtain variant site detection data in VCF file format;
S3. computing variant site distribution statistics separately for the VCF file generated by each tool, determining GQ thresholds from the distribution of GT and GQ values at the sites, and filtering the variant sites of the original VCF file against the thresholds at a specified ratio, so that each tool yields a VCF file to be merged;
S4. merging the concordant sites in the VCF files to be merged to generate a gene mutation detection result data file with higher confidence than the existing detection results, where "higher confidence" is defined as higher than the confidence of the variant site detection data generated in step S2.
2. The method for generating gene mutation detection results with higher confidence from existing gene mutation detection analysis results according to claim 1, characterized in that step S1 comprises:
S11. aligning the raw gene data to be analyzed against the standard genome, determining the position of each short read in the genome, and assembling the short reads into complete reference genome data;
S12. re-sorting the reference genome data in a predetermined order to obtain a binary-format file of the reference genome data;
S13. adding the corresponding sequence headers to the binary-format file of the reference genome data to obtain gene sample data in binary format.
3. The method for generating gene mutation detection results with higher confidence from existing gene mutation detection analysis results according to claim 2, characterized in that step S1 further comprises:
S14. performing local realignment and sequencing quality recalibration on the binary-format gene data to obtain quality-optimized gene sample data.
4. The method for evaluating errors in the gene mutation detection analysis workflow according to claim 1, characterized in that step S2 comprises:
S2. using two gene mutation detection tools separately to compare the quality-optimized gene sample data with the standard genome and find the gene mutation information, including single-site changes and small insertions/deletions, generating VCF files to be optimized.
5. The method for generating gene mutation detection results with higher confidence from existing gene mutation detection analysis results according to claim 1, characterized in that step S3 comprises:
S31. using the combination of the CHROM and POS fields of each VCF file to be optimized as the index, scanning the valid data rows of each VCF file row by row;
S32. during the S31 scan, reading the GT and GQ values in the FORMAT column of each variant site and computing, for each VCF file, the distribution of GQ values at the variant sites for each GT value, together with the total number of sites for each GT value;
S33. from the per-GT GQ distributions computed in S32, calculating the GQ threshold for the sites of each GT value in that VCF file;
S34. again using the combination of CHROM and POS as the index, scanning the valid data rows of the VCF file of each gene mutation detection tool row by row and extracting the GT and GQ values; when the GQ value is greater than the threshold calculated in S33 for that GT, writing the site's row to a new VCF file, the VCF file to be merged being generated when the scan ends.
6. The method for generating gene mutation detection results with higher confidence from existing gene mutation detection analysis results according to claim 5, characterized in that the threshold for each GT value is chosen so that, among the sites with that GT value, the number of sites whose GQ value is below the threshold is a specified ratio of the total number of sites.
7. The method for generating gene mutation detection results with higher confidence from existing gene mutation detection analysis results according to claim 1, characterized in that step S4 comprises:
S41. scanning the generated VCF files to be merged row by row, using the combination of CHROM and POS as the index;
S42. during the scan, if the ALT values for the same index are equal in the two files, combining CHROM, POS, REF, ALT, GT, GQ, and any other required information from the two files into a row in VCF file format and writing it to a new VCF file; when the scan ends, this VCF file is the gene mutation detection result with higher confidence, where "higher confidence" is defined as higher than the confidence of the variant site detection data generated in step S2.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201611234002.2A, CN108256291A (en) | 2016-12-28 | 2016-12-28 | A method for generating gene mutation detection results with higher confidence

Publications (1)

Publication Number | Publication Date
CN108256291A (en) | 2018-07-06

Family

ID=62719332

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201611234002.2A, Pending, CN108256291A (en) | A method for generating gene mutation detection results with higher confidence | 2016-12-28 | 2016-12-28

Country Status (1)

Country Link
CN (1) CN108256291A (en)

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180706
