CN110246544A - Biomarker selection method and system based on integration analysis - Google Patents

Biomarker selection method and system based on integration analysis

Info

Publication number
CN110246544A
Authority
CN
China
Prior art keywords
gene
importance
algorithm
data
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910409758.3A
Other languages
Chinese (zh)
Other versions
CN110246544B (en)
Inventor
刘婉婷
张弓
何庆瑜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan University
University of Jinan
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University
Priority to CN201910409758.3A
Publication of CN110246544A
Application granted
Publication of CN110246544B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Abstract

The invention discloses a biomarker selection method and system based on integration analysis. The method comprises the following steps: selecting raw sequencing data; performing mapping analysis on the raw sequencing data with the FANSe algorithm to obtain gene quantification information, and setting the original gene groups; calculating the importance ranking of genes within each original group with the GWGS algorithm, then integrating the gene importance of every group with the GWRS algorithm to obtain an integrated gene importance list in which genes are sorted from high to low importance; and performing data mining with an SVM-based Wrapper Feature Selection model to distinguish sample types and screen biomarkers from the high-importance genes. Based on the characteristics of sequencing data, the invention combines raw sequencing data from multiple centers, resolves systematic differences in platform, sample, and experimental design, and performs deep data mining with a highly robust integration analysis algorithm to find common, specific, and key biological macromolecules.

Description

Biomarker selection method and system based on integration analysis
Technical field
The present invention relates to the technical field of biomarker detection, and in particular to a biomarker selection method and system based on integration analysis.
Background art
Finding biological macromolecules (including nucleic acids and proteins) that are common, highly specific, and of key importance can improve the effect of medical treatment, but existing molecular markers rarely satisfy all three requirements. Most molecular markers are obtained by analyzing multicenter data, and the conventional way of handling multicenter data is meta-analysis, which integrates the conclusions of the individual studies. Because multicenter data often differ in study subjects, instruments, and methods, merging the raw data for analysis without accounting for these differences is inappropriate. Meta-analysis, in turn, is easily biased by factors such as the quality of the original data, the analytical skill of the original researchers, and errors or omissions in the original research tools, so a large amount of valuable data is not fully used.
Summary of the invention
To overcome the shortcomings of the prior art, the present invention provides a biomarker selection method and system based on integration analysis. It establishes an integration analysis strategy in which a highly robust integration algorithm, built on a high-precision low-level processing algorithm, performs integration analysis directly on multicenter raw sequencing data, so that massive multicenter sequencing data are fully used to find common, specific, and key biological macromolecules.
To achieve the above object, the invention adopts the following technical solution:
The present invention provides a biomarker selection method based on integration analysis, comprising the following steps:
S1: selecting raw sequencing data;
S2: performing mapping analysis on the raw sequencing data with the FANSe algorithm to obtain gene quantification information, and setting the original gene groups;
S3: calculating the importance ranking of genes within each original group with the GWGS algorithm, then integrating the gene importance of every group with the GWRS algorithm to obtain an integrated gene importance list, in which genes are sorted from high to low importance;
S4: performing data mining with an SVM-based Wrapper Feature Selection model to distinguish sample types, and screening biomarkers from the high-importance genes.
As a preferred technical solution, the raw sequencing data in step S1 are sequencing files in fastq format generated by a sequencing machine.
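As an illustration of the input to step S1, the following minimal sketch reads records from a fastq stream. It assumes the common four-line record layout (header, sequence, separator, quality); the function name `read_fastq` and the two example reads are hypothetical and do not come from the patent.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record fastq stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed fastq record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:].strip(), sequence, quality


# Hypothetical two-read example held in memory instead of a file on disk.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nFFFF\n")
records = list(read_fastq(example))
```

In practice the handle would be an opened sequencing file; compressed files would need to be opened with a suitable decompressing reader first.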
As a preferred technical solution, the mapping analysis in step S2 comprises the following specific steps:
Each short read is split into multiple non-overlapping seeds of equal length; all seeds are matched against the reference genome; the matched seeds are tallied and scored by start site, and candidate sites are ranked by score; the reference sequence is extracted at the candidate coordinates, and the short read is compared with the extracted reference genome sequence; the top-ranked position in the comparison is taken as the final position of the short read, yielding the gene quantification information.
As a preferred technical solution, in calculating the importance ranking of genes within the original groups with the GWGS algorithm in step S3, the sequencing data after mapping analysis are first evaluated with the GWRS algorithm, and different values are assigned according to the significance of expression. The specific formula of the GWRS evaluation is:
where r_ij denotes the rank of the i-th gene in the j-th microarray, i ∈ {1, ..., m}, j ∈ {1, ..., n}, and s_ij is the GWRS value; for a gene whose value in a microarray is NA, s_ij is also set to NA.
As a preferred technical solution, the importance of every group of genes is then integrated with the GWRS algorithm in step S3, with the specific formula:
where ω_j denotes the weight of the j-th microarray and s_ij is the GWRS value.
As a preferred technical solution, the data mining with the SVM-based Wrapper Feature Selection model in step S4 comprises the following specific steps:
S41: establishing a Wrapper Feature Selection model based on SVM, and training the Wrapper Feature Selection model;
S42: inputting the genes, ordered by importance, into the trained Wrapper Feature Selection model and judging whether the output can separate the sample types; if the preset condition is reached, outputting the corresponding genes; if not, repeating the data mining loop and adding genes one at a time until the preset condition is reached, then outputting the genes corresponding to the final result.
The present invention also provides a biomarker selection system based on integration analysis, comprising: a raw sequencing data selection module, a quantitative analysis module, a ranking integration module, and a data mining module;
The raw sequencing data selection module is used to select raw sequencing data, namely sequencing files in fastq format from a sequencing machine;
The quantitative analysis module performs mapping analysis on the raw sequencing data with the FANSe algorithm to obtain gene quantification information;
The ranking integration module is used to generate the gene importance list: it calculates the importance ranking of genes within each original group with the GWGS algorithm, then integrates the gene importance of every group with the GWRS algorithm to obtain the integrated gene importance list, in which genes are sorted from high to low importance;
The data mining module is used to screen biomarkers: it performs data mining with the SVM-based Wrapper Feature Selection model to distinguish sample types and screens biomarkers from the high-importance genes.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) The present invention establishes an integration analysis strategy for sequencing data. Based on the characteristics of sequencing data, it combines raw sequencing data from multiple centers, resolves systematic differences in platform, sample, and experimental design, and performs deep data mining with a highly robust integration analysis algorithm to find common, specific, and key biological macromolecules.
Brief description of the drawings
Fig. 1 is a flow diagram of the biomarker selection method based on integration analysis of the present embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
Embodiment
This embodiment provides a biomarker selection method based on integration analysis. After raw sequencing data are mapped and quantified with the FANSe series of algorithms, the importance of each gene within a single dataset is first calculated with the GWGS algorithm; multiple datasets are then integrated with the GWRS algorithm to obtain the importance ranking of genes across all datasets; genes are fed into the screening model one at a time in order of importance, and biomarkers are finally selected.
This embodiment introduces the high-precision sequencing analysis algorithm FANSe. FANSe performs sequence alignment based on hash seed matching and can align short reads to the reference genome efficiently and with high accuracy. The algorithm is highly accurate and highly error-tolerant, is very sensitive to micro-insertions and micro-deletions through the Smith-Waterman algorithm, and its results have been reliably verified by experiment.
Sequencing data require a large amount of preprocessing, such as the computationally expensive mapping step, and the precision of this step directly affects the accuracy of the integration analysis.
As shown in Fig. 1, the biomarker selection method based on integration analysis provided in this embodiment comprises the following specific steps:
S1: selecting raw sequencing data;
S2: performing mapping analysis on the raw sequencing data to obtain an accurate quantitative result, namely the gene quantification information, and setting the original gene groups:
The raw sequencing data are sequencing files in fastq format generated directly by the sequencing machine. These files must be compared with the reference sequence of the corresponding species to determine which genes are present in the sequenced sample (the qualitative part) and how strongly each gene is expressed (the quantitative part). The mapping analysis proceeds as follows: each short read is split into several non-overlapping seeds of equal length; all seeds are matched against the reference genome; the matched seeds are tallied and scored by start site, with higher scores ranked first; the reference sequence is extracted at the candidate coordinates, and the short read is precisely aligned with the extracted reference genome sequence and scored base by base. The traceback mechanism of the Smith-Waterman algorithm is omitted in this step to speed it up. The alignment results are sorted, and the top-ranked position in the precise alignment is taken as the final position of the short read, which determines the gene and completes the mapping process. Gene expression is then quantified from the number of reads mapped. The algorithm has been evaluated as highly robust and error-tolerant, so reprocessing data downloaded from different experimental platforms with it can remove or reduce the bias introduced by different platforms or experiments;
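The seed-and-vote idea described in step S2 can be sketched as follows. This is not the FANSe implementation, which is not given in this text. It is a minimal illustration under stated assumptions: seeds are looked up in a hash index of reference k-mers, each seed hit votes for the read start it implies, and candidates are verified by a plain base-by-base mismatch count standing in for the traceback-free scoring. The reference string, gene intervals, seed length `k`, and mismatch limit are all hypothetical.

```python
from collections import Counter, defaultdict


def build_index(reference: str, k: int) -> dict:
    """Hash every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: dict, k: int, max_mismatches: int = 2):
    """Return (start, mismatches) of the best candidate site, or None if unmapped."""
    votes = Counter()
    # Split the read into non-overlapping seeds of equal length k.
    for offset in range(0, len(read) - k + 1, k):
        for hit in index.get(read[offset:offset + k], ()):
            start = hit - offset  # read start implied by this seed hit
            if 0 <= start <= len(reference) - len(read):
                votes[start] += 1
    best = None
    # Visit candidates from most to fewest votes; verify each base by base.
    for start, _ in votes.most_common():
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return best


def quantify(reads, reference: str, genes: dict, k: int = 4) -> Counter:
    """Count mapped reads per gene; genes maps a name to a half-open (start, end)."""
    index = build_index(reference, k)
    counts = Counter()
    for read in reads:
        hit = map_read(read, reference, index, k)
        if hit is None:
            continue
        for name, (gene_start, gene_end) in genes.items():
            if gene_start <= hit[0] < gene_end:
                counts[name] += 1
    return counts


# Hypothetical toy reference with two gene intervals.
reference = "ACGTTGCAAGGCTTAACCGGTATCGATCGGAT"
genes = {"geneA": (0, 16), "geneB": (16, 32)}
reads = ["GTTGCAAG", "GCTATCGA", "TTTTTTTT"]  # exact, one mismatch, unmappable
counts = quantify(reads, reference, genes)
```

Because one intact seed is enough to nominate a candidate site, the second read still maps despite its mismatch, which is the kind of error tolerance the text attributes to seed-based mapping.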
S3: calculating the importance ranking of genes within each original group with the GWGS algorithm, then integrating the gene importance of every group with the GWRS algorithm to obtain an integrated gene importance list, in which genes are sorted from high to low importance:
First, the single-center sequencing data processed by FANSe are evaluated with the GWRS algorithm shown in formula (1), and different values are assigned according to the significance of expression,
where r_ij denotes the rank of the i-th gene in the j-th microarray, i ∈ {1, ..., m}, j ∈ {1, ..., n}, and s_ij is the GWRS value; for a gene whose value in a microarray is NA, s_ij is also set to NA;
The GWRS results are then integrated with the GWGS algorithm shown in formula (2), producing a set of gene expression data spanning the multicenter data:
where ω_j denotes the weight of the j-th microarray;
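The two-stage scoring in step S3 can be sketched as follows. Formulas (1) and (2) are not reproduced in this text, so the sketch does not claim to implement them. It assumes a placeholder rank score log(m / r_ij) for the per-dataset value s_ij and a weighted sum over datasets with weights ω_j for the integration; skipping NA values during integration is also an assumption. Function names and gene names are hypothetical.

```python
import math
from typing import Dict, List, Optional

Scores = Dict[str, Optional[float]]


def per_dataset_scores(ranks: Dict[str, Optional[int]]) -> Scores:
    """Turn ranks r_ij (1 = most significant) into scores s_ij; NA (None) stays NA.

    The score log(m / r) is a placeholder assumption, not the patent's formula (1).
    """
    m = len(ranks)
    return {gene: (None if r is None else math.log(m / r)) for gene, r in ranks.items()}


def integrate_scores(datasets: List[Scores], weights: List[float]) -> Dict[str, float]:
    """Weighted sum of s_ij over datasets j with weights w_j (assumed integration form)."""
    totals: Dict[str, float] = {}
    for scores, weight in zip(datasets, weights):
        for gene, score in scores.items():
            if score is not None:  # assumption: NA entries contribute nothing
                totals[gene] = totals.get(gene, 0.0) + weight * score
    return totals


# Hypothetical ranks of three genes in two datasets; geneC is NA in the second.
dataset_1 = per_dataset_scores({"geneA": 1, "geneB": 2, "geneC": 3})
dataset_2 = per_dataset_scores({"geneA": 2, "geneB": 1, "geneC": None})
integrated = integrate_scores([dataset_1, dataset_2], weights=[0.6, 0.4])

# Genes sorted from high to low integrated importance.
ranking = sorted(integrated, key=integrated.get, reverse=True)
```

With the larger weight on the first dataset, geneA ranks above geneB even though each is top-ranked in one dataset, which shows how the dataset weights steer the integrated list.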
S4: performing data mining with an SVM-based Wrapper Feature Selection model to distinguish sample types, and screening biomarkers from the high-importance genes;
In this embodiment, the model is built on a support vector machine (SVM). The genes ordered by importance in steps S2 and S3 are added to the model step by step in a loop, that is, each round adds one more gene than the previous round, and the gene set is fed into the previously trained Wrapper Feature Selection model to judge whether the output reaches the best stable accuracy, that is, whether it can truly separate the sample types. If the best stable accuracy is reached, the loop exits and the genes that produced this result are output; if not, testing continues and genes are added one at a time until the best result is reached. These steps select, from the gene importance list generated in steps S2 and S3, genes that both rank high in importance and accurately distinguish sample types, and use them as markers.
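The loop in step S4 can be sketched as a forward wrapper selection: genes are added one at a time in importance order, and the loop stops once a classifier reaches a target accuracy. To keep the sketch runnable with the standard library only, the evaluator below is a nearest-centroid classifier scored by leave-one-out accuracy, which is a stand-in and not the SVM the patent uses; an SVM-based scorer could be passed as `evaluate` instead. The sample values, labels, and target accuracy are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Dict[str, float]


def _centroids(samples, labels, genes, skip):
    """Per-class mean expression vectors over `genes`, leaving out sample `skip`."""
    sums, counts = {}, {}
    for idx, (sample, label) in enumerate(zip(samples, labels)):
        if idx == skip:
            continue
        acc = sums.setdefault(label, [0.0] * len(genes))
        for k, gene in enumerate(genes):
            acc[k] += sample[gene]
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def loo_accuracy(samples: Sequence[Sample], labels: Sequence[str], genes: List[str]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    correct = 0
    for held_out in range(len(samples)):
        centroids = _centroids(samples, labels, genes, skip=held_out)
        point = [samples[held_out][g] for g in genes]
        predicted = min(
            centroids,
            key=lambda y: sum((p - c) ** 2 for p, c in zip(point, centroids[y])),
        )
        correct += predicted == labels[held_out]
    return correct / len(samples)


def wrapper_select(
    ranked_genes: Sequence[str],
    samples: Sequence[Sample],
    labels: Sequence[str],
    evaluate: Callable[[Sequence[Sample], Sequence[str], List[str]], float],
    target: float,
) -> Tuple[List[str], float]:
    """Add genes in importance order until `evaluate` reaches `target`."""
    selected: List[str] = []
    accuracy = 0.0
    for gene in ranked_genes:
        selected.append(gene)
        accuracy = evaluate(samples, labels, selected)
        if accuracy >= target:
            break  # preset condition reached: stop adding genes
    return selected, accuracy


# Hypothetical data: g1 alone does not separate the classes, g1 plus g2 does.
samples = [
    {"g1": 1.0, "g2": 9.0, "g3": 5.0},
    {"g1": 2.0, "g2": 8.0, "g3": 5.0},
    {"g1": 1.0, "g2": 1.0, "g3": 5.0},
    {"g1": 2.0, "g2": 2.0, "g3": 5.0},
]
labels = ["tumor", "tumor", "normal", "normal"]
selected, accuracy = wrapper_select(["g1", "g2", "g3"], samples, labels, loo_accuracy, target=1.0)
```

Passing the scorer as a parameter keeps the selection loop independent of the classifier, so the stopping rule can be tested separately from the model choice.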
In this embodiment, the Wrapper Feature Selection model is trained by supervised learning: the sample labels are known, and the model tests whether the input genes can separate samples of different stages. The parameters best suited to the data type are found by randomly sampling 1000 sample data, so that under these parameters the relevant genes distinguish the samples with the highest accuracy.
In this embodiment, to adapt the model to sequencing data, sequencing data related to the samples are used for model adjustment and preliminary experiments. Modules such as GWRS, GWGS, and the SVM in Wrapper Feature Selection are adjusted according to the data characteristics, while computational efficiency, parallel computing, and distributed computing are also taken into account.
In this embodiment, the model needs to be adjusted appropriately for different clinical samples:
1. For sequencing data, the FANSe series of algorithms must be introduced to guarantee the quality of sequencing quantification; screening can only proceed on a sound quantitative result;
2. GWRS and GWGS also take the characteristics of sequencing data into account; for example, quantitative differences cannot be judged from the P value alone, and several parameters may be needed. This embodiment uses the fold difference as the basis and the P value as the weight of the fold difference to rank gene importance;
3. The sequencing data are sampled, and the screening parameters of the Wrapper Feature Selection model are set according to their characteristics, to guarantee the highest stable accuracy.
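Item 2 above says only that the fold difference is the basis and the P value is its weight; the exact combination is not given. One plausible reading, shown here purely as an assumption, multiplies the absolute log2 fold change by -log10 of the P value, so that a large but statistically weak change is down-weighted. The function name and the example statistics are hypothetical.

```python
import math
from typing import Dict, Tuple


def importance(fold_change: float, p_value: float) -> float:
    """Assumed score: |log2(fold change)| weighted by -log10(P value)."""
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return abs(math.log2(fold_change)) * -math.log10(p_value)


# Hypothetical (fold change, P value) pairs for three genes.
stats: Dict[str, Tuple[float, float]] = {
    "geneA": (4.0, 0.001),   # strong change, strong evidence
    "geneB": (8.0, 0.5),     # stronger change, weak evidence
    "geneC": (1.1, 0.0001),  # tiny change, very strong evidence
}
scores = {gene: importance(fc, p) for gene, (fc, p) in stats.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under this assumed weighting, neither a large fold change nor a small P value alone is enough to rank a gene highly, which matches the stated intent of not relying on the P value by itself.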
In this embodiment, clinical sequencing data are collected from multiple databases. Following the steps of the technical solution, all data are first mapped and quantified with the FANSe series of algorithms. After the gene quantification information is obtained, the importance ranking of genes within each original group is calculated with the GWGS algorithm, taking the original data grouping as the unit, and the GWRS algorithm is then applied to integrate the gene importance of every group into one integrated gene importance list. Genes are sorted from high to low importance, and biological macromolecules (that is, biomarkers) are screened from the important genes with the SVM-based Wrapper Feature Selection model. Calculating and screening this batch of data in this way yields common, specific, and key biological macromolecules.
This embodiment also provides a biomarker selection system based on integration analysis, comprising: a raw sequencing data selection module, a quantitative analysis module, a ranking integration module, and a data mining module;
The raw sequencing data selection module is used to select raw sequencing data, namely sequencing files in fastq format from a sequencing machine;
The quantitative analysis module performs mapping analysis on the raw sequencing data with the FANSe algorithm to obtain gene quantification information;
The ranking integration module is used to obtain the gene importance list: it calculates the importance ranking of genes within each original group with the GWGS algorithm, then integrates the gene importance of every group with the GWRS algorithm to obtain the integrated gene importance list, in which genes are sorted from high to low importance;
The data mining module is used to screen biomarkers: it performs data mining with the SVM-based Wrapper Feature Selection model to distinguish sample types and screens biomarkers from the high-importance genes.
The above embodiment is a preferred embodiment of the present invention, but embodiments of the invention are not limited to it. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principles of the invention is an equivalent replacement and falls within the scope of protection of the invention.

Claims (7)

1. A biomarker selection method based on integration analysis, characterized by comprising the following steps:
S1: selecting raw sequencing data;
S2: performing mapping analysis on the raw sequencing data with the FANSe algorithm to obtain gene quantification information, and setting the original gene groups;
S3: calculating the importance ranking of genes within each original group with the GWGS algorithm, then integrating the gene importance of every group with the GWRS algorithm to obtain an integrated gene importance list, in which genes are sorted from high to low importance;
S4: performing data mining with an SVM-based Wrapper Feature Selection model to distinguish sample types, and screening biomarkers from the high-importance genes.
2. The biomarker selection method based on integration analysis according to claim 1, characterized in that the raw sequencing data in step S1 are sequencing files in fastq format generated by a sequencing machine.
3. The biomarker selection method based on integration analysis according to claim 1, characterized in that the mapping analysis in step S2 comprises the following specific steps:
Each short read is split into multiple non-overlapping seeds of equal length; all seeds are matched against the reference genome; the matched seeds are tallied and scored by start site, and candidate sites are ranked by score; the reference sequence is extracted at the candidate coordinates, and the short read is compared with the extracted reference genome sequence; the top-ranked position in the comparison is taken as the final position of the short read, yielding the gene quantification information.
4. The biomarker selection method based on integration analysis according to claim 1, characterized in that, in calculating the importance ranking of genes within the original groups with the GWGS algorithm in step S3, the sequencing data after mapping analysis are first evaluated with the GWRS algorithm, and different values are assigned according to the significance of expression, the specific formula of the GWRS evaluation being:
where r_ij denotes the rank of the i-th gene in the j-th microarray, i ∈ {1, ..., m}, j ∈ {1, ..., n}, and s_ij is the GWRS value; for a gene whose value in a microarray is NA, s_ij is also set to NA.
5. The biomarker selection method based on integration analysis according to claim 1, characterized in that the importance of every group of genes is then integrated with the GWRS algorithm in step S3, with the specific formula:
where ω_j denotes the weight of the j-th microarray and s_ij is the GWRS value.
6. The biomarker selection method based on integration analysis according to claim 1, characterized in that the data mining with the SVM-based Wrapper Feature Selection model in step S4 comprises the following specific steps:
S41: establishing a Wrapper Feature Selection model based on SVM, and training the Wrapper Feature Selection model;
S42: inputting the genes, ordered by importance, into the trained Wrapper Feature Selection model and judging whether the output can separate the sample types; if the preset condition is reached, outputting the corresponding genes; if not, repeating the data mining loop and adding genes one at a time until the preset condition is reached, then outputting the genes corresponding to the final result.
7. A biomarker selection system based on integration analysis, characterized by comprising: a raw sequencing data selection module, a quantitative analysis module, a ranking integration module, and a data mining module;
The raw sequencing data selection module is used to select raw sequencing data, namely sequencing files in fastq format from a sequencing machine;
The quantitative analysis module performs mapping analysis on the raw sequencing data with the FANSe algorithm to obtain gene quantification information;
The ranking integration module is used to generate the gene importance list: it calculates the importance ranking of genes within each original group with the GWGS algorithm, then integrates the gene importance of every group with the GWRS algorithm to obtain the integrated gene importance list, in which genes are sorted from high to low importance;
The data mining module is used to screen biomarkers: it performs data mining with the SVM-based Wrapper Feature Selection model to distinguish sample types and screens biomarkers from the high-importance genes.
CN201910409758.3A 2019-05-17 2019-05-17 Biomarker selection method and system based on integration analysis Active CN110246544B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910409758.3A CN110246544B (en) 2019-05-17 2019-05-17 Biomarker selection method and system based on integration analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910409758.3A CN110246544B (en) 2019-05-17 2019-05-17 Biomarker selection method and system based on integration analysis

Publications (2)

Publication Number Publication Date
CN110246544A CN110246544A (en) 2019-09-17
CN110246544B CN110246544B (en) 2021-03-19

Family

ID=67884226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910409758.3A Active CN110246544B (en) 2019-05-17 2019-05-17 Biomarker selection method and system based on integration analysis

Country Status (1)

Country Link
CN (1) CN110246544B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050079524A1 (en) * 2000-01-21 2005-04-14 Shaw Sandy C. Method for identifying biomarkers using Fractal Genomics Modeling
CN104968802A (en) * 2012-11-16 2015-10-07 西门子公司 Novel miRNAs as diagnostic markers
CN105874080A (en) * 2013-09-09 2016-08-17 阿尔玛克诊断有限公司 Molecular diagnostic test for oesophageal cancer
CN105874079A (en) * 2013-09-09 2016-08-17 阿尔玛克诊断有限公司 Molecular diagnostic test for lung cancer
CN109642256A (en) * 2016-07-28 2019-04-16 阿利瑟迪亚格公司 Rna editing as the biomarker tested for emotional handicap
CN106845152A (en) * 2017-02-04 2017-06-13 北京林业大学 A kind of genome cytimidine site apparent gene type classifying method
CN109658980A (en) * 2018-03-20 2019-04-19 上海交通大学医学院附属瑞金医院 A kind of screening and application of excrement gene marker

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686580A (en) * 2021-01-31 2021-04-20 重庆渝高科技产业(集团)股份有限公司 Workflow definition method and system capable of customizing flow
CN112686580B (en) * 2021-01-31 2023-05-16 重庆渝高科技产业(集团)股份有限公司 Workflow definition method and system capable of customizing flow
CN114574582A (en) * 2022-03-21 2022-06-03 暨南大学 Transcriptomic standard and preparation method thereof
CN116543838A (en) * 2023-07-05 2023-08-04 苏州凌点生物技术有限公司 Data analysis method for biological gene selection expression probability
CN116543838B (en) * 2023-07-05 2023-09-05 苏州凌点生物技术有限公司 Data analysis method for biological gene selection expression probability

Also Published As

Publication number Publication date
CN110246544B (en) 2021-03-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant