CN102521529A - Distributed gene sequence alignment method based on Basic Local Alignment Search Tool (BLAST) - Google Patents

Info

Publication number
CN102521529A
Authority
CN
China
Prior art keywords
blast
task
mpi
thread
mpi thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011104102015A
Other languages
Chinese (zh)
Inventor
吴一雷
闫鹏程
刘充
李国锐
陈禹保
黄劲松
谢威
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING COMPUTING CENTER
Original Assignee
BEIJING COMPUTING CENTER
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING COMPUTING CENTER
Priority to CN2011104102015A
Publication of CN102521529A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical fields of computing and bioinformatics and discloses a distributed gene sequence alignment method based on the Basic Local Alignment Search Tool (BLAST). The method comprises the following steps: S1, the program parses the user parameters, determines the number of MPI threads, reads the query sequence file, and splits it according to the number of tasks; each MPI thread then reads its own MPI thread serial number; S2, according to the MPI thread serial number, the program determines whether the current MPI thread is the head node; if it is, it waits for communication requests from the other MPI threads, responds to each request by allocating the current task to the requesting thread, decrements the task number by 1, and continues allocating until all tasks are assigned; if the current MPI thread is not the head node, it requests a task serial number from the head node, reads the query sequence file fragment for that task serial number, and runs BLAST to obtain a BLAST alignment result; the task number is then decremented by 1, and once BLAST finishes the thread requests another task serial number; and S3, the program merges all BLAST alignment results. The method can reduce the hardware cost of bioinformatics research.

Description

Distributed gene sequence alignment method based on BLAST
Technical field
The present invention relates to the technical fields of computing and bioinformatics, and in particular to a distributed gene sequence alignment method based on BLAST.
Background art
In recent years, next-generation sequencing (NGS) technology has brought enormous change to biological research, with remarkable progress over the past few years in sequencing principles, operating details, and technical extensions. Compared with traditional Sanger sequencing, NGS platforms avoid the cloning step and use adapters directly for parallel PCR (polymerase chain reaction) and sequencing reactions, so data throughput is greatly increased and more DNA can be sequenced in less time. For example, producing the first human genome map with Sanger sequencing took 13 years and hundreds of sequencers in total, whereas NGS can now complete the same work within a few months. In addition, the cost of NGS has fallen sharply; if the present pace of development continues, the cost of sequencing an individual genome could drop below 1000 US dollars within a few years, at which point the research and clinical use of NGS will expand further.
Functional annotation against published gene databases is one of the basic methods of analyzing sequencing data. The BLAST (Basic Local Alignment Search Tool) [1] software suite, a sequence similarity search program released by NCBI (National Center for Biotechnology Information), is currently the open-source functional annotation software most widely used in academia. Unlike exact-matching algorithms, BLAST uses a seed-and-extend approximate matching technique to find similar segments between sequences quickly. BLAST can also run multithreaded on symmetric multiprocessor (SMP) machines to improve computational efficiency.
In general, NGS data consist of millions of short DNA reads and are large in scale and volume. Bioinformatics research therefore commonly uses high-performance computer clusters to annotate NGS data. Although BLAST is multithreaded, it still targets a single machine and is limited in scale; for example, on SMP machines with more than 4 cores, processor resources cannot be fully used. To keep up with the exponential growth of bioinformatics data, further improve the efficiency of BLAST, and speed up bioinformatics analysis and research, developers have produced several parallel BLAST versions for cluster environments, such as mpiBlast [2] and pBlast [3]. Although these parallel programs improve the scalability of the analysis and can easily extend to hundreds or even thousands of processors running simultaneously, they share some shortcomings: 1) not every parallel version produces results consistent with a single-machine NCBI BLAST run [4], because they use different database partitioning or result merging methods; 2) traditional high-performance computing usually relies on shared storage, meaning that the database, BLAST binaries, sequence files, and intermediate results all reside on the same physical storage; this is convenient for system maintenance, but at high degrees of parallelism the aggregate I/O of all nodes places a heavy load on network resources [5] and seriously degrades the overall efficiency of the software, so I/O bandwidth often becomes the bottleneck of multi-sequence alignment analysis; 3) all of these programs require tightly coupled high-performance computer clusters and high-performance cache and storage systems, so hardware costs are high.
The references cited above are as follows:
[1] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, "Basic Local Alignment Search Tool," Journal of Molecular Biology, vol. 215, pp. 403-410, 1990.
[2] A. E. Darling, L. Carey, and W. C. Feng, "The design, implementation, and evaluation of mpiBlast," in Proceedings of the 4th International Conference on Linux Clusters: The HPC Revolution 2003, 2003.
[3] D. R. Mathog, "Parallel Blast on Split Databases," in Bioinformatics Applications Note, vol. 19, no. 14, pp. 1865-1866, 2003.
[4] dBlast, http://www.cmbi.kun.nl/software/dBlast.
[5] M. C. Schatz, "BlastReduce: High Performance Short Read Mapping with MapReduce," 2008.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the invention is how to design a distributed gene sequence alignment method built on the traditional single-machine NCBI BLAST such that, on the one hand, the parallel annotation results are fully consistent with those of a single-machine run and, on the other hand, the aggregate I/O bandwidth of the whole system is raised well beyond the bandwidth of a shared storage network, relieving the I/O bottleneck of gene function annotation analysis.
(2) Technical solution
To solve the above technical problem, the invention provides a distributed gene sequence alignment method based on BLAST, comprising the following steps:
S1: the program parses the user parameters, determines the number of MPI threads, reads the query sequence file, and splits the query sequence file according to the number of tasks to obtain query sequence file fragments; each MPI thread then reads its own MPI thread serial number;
S2: whether the current MPI thread is the head node is determined from said MPI thread serial number; if the current MPI thread is the head node, it waits for communication requests from the other MPI threads; when a communication request arrives, it responds to the request, allocates the current task to the requesting thread, and decrements the task number by 1; it continues allocating tasks until all tasks have been allocated and then ends the current MPI thread; if the current MPI thread is not the head node, it first requests a task number from the head node, reads the query sequence file fragment corresponding to the task number, and runs BLAST against the database to obtain a BLAST alignment result; the task number is then decremented by 1, and when BLAST finishes the thread requests a task number again and continues running BLAST until the last task;
S3: all BLAST alignment results are merged.
Preferably, when the query sequence file is split according to the number of tasks in step S1, it is split into more pieces than there are nodes.
Preferably, in step S2, an MPI thread that is not the head node is a compute node, and both said database and the intermediate BLAST results produced while running BLAST are stored in the local storage of said compute node.
Preferably, the number of tasks is greater than or equal to the number of MPI threads.
Preferably, said query sequence file is in FASTA format.
Preferably, said database is a gene database.
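The splitting of a FASTA query file into tasks described in step S1 can be sketched as follows. This is only an illustrative sketch, not the patented program: the function names `read_fasta`, `split_into_tasks`, and `choose_num_tasks`, the contiguous near-equal split, and the default factor of 2 are all assumptions made for the example.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) records."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_tasks(records, num_tasks):
    """Split records into num_tasks contiguous fragments of near-equal size."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be positive")
    base, extra = divmod(len(records), num_tasks)
    fragments, start = [], 0
    for i in range(num_tasks):
        size = base + (1 if i < extra else 0)
        fragments.append(records[start:start + size])
        start += size
    return fragments


def choose_num_tasks(num_mpi_threads, factor=2):
    """Hypothetical helper: pick a task count >= the MPI thread count."""
    return max(num_mpi_threads, num_mpi_threads * factor)
```

Splitting whole records rather than raw bytes keeps every query sequence intact inside one fragment, so each fragment remains a valid FASTA input on its own.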
(3) Beneficial effects
Building on the traditional single-machine NCBI BLAST, the invention provides a distributed gene sequence alignment method (also called an annotation method). On the one hand, because it relies on the original BLAST program, the parallel annotation results are fully consistent with those of a single-machine run. On the other hand, by storing the gene database in a distributed manner and splitting the query sequence file, the aggregate I/O bandwidth of the whole system is greatly increased, far beyond the bandwidth of a shared storage network, which relieves the I/O bottleneck of gene function annotation analysis. Furthermore, because the distributed algorithm of the invention suits inexpensive storage and computer systems, a loosely coupled, highly heterogeneous network of ordinary servers can replace a high-performance computer cluster, so that gene function annotation can run on an ordinary PC cluster, reducing the hardware cost of bioinformatics research. In addition, load balancing is introduced into the splitting of the BLAST query sequences, which further improves resource utilization and speeds up the overall analysis.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the invention;
Fig. 2 shows the distributed BLAST network topology.
Detailed description of the embodiments
Specific embodiments of the invention are described in further detail below with reference to the drawings and examples. The following examples illustrate the invention but do not limit its scope.
The flowchart of the method of the invention is shown in Fig. 1. The method comprises the following steps:
S1: the program first parses the user parameters and determines the number of MPI (Message Passing Interface) threads, reads the query sequence file (FASTA format), and splits it according to the number of tasks (that is, of query sequence file fragments; the number of tasks is greater than or equal to the number of MPI threads) to obtain query sequence file fragments; each MPI thread then reads its own MPI thread serial number. The user parameters are mainly BLAST parameters; BLAST is open-source software, and its parameters can be looked up on the NCBI website. Parsing the user parameters here means that the program parses the parameters entered by the user and passes them on to BLAST.
S2: whether the current MPI thread is the head node is determined from said MPI thread serial number. If the MPI thread is the head node (MPI_RANK==1), it waits for communication requests from the other MPI threads; when a request arrives, it responds to the request, allocates the current task to the requesting thread, and decrements the task number by 1; it continues allocating tasks until all tasks have been allocated and then ends the current MPI thread. If the current MPI thread is not the head node (that is, it is a compute node), it first requests a task number from the head node, reads the query sequence file fragment corresponding to the task number, and runs BLAST against the gene database to obtain a BLAST alignment result; the task number is then decremented by 1, and when BLAST finishes the thread requests a task number again and continues running BLAST until the last task. Both said database and the intermediate BLAST results produced while running BLAST are stored in the local storage of said compute node. A load-balancing algorithm is used in this step.
S3: all BLAST alignment results are merged. The merge here is a simple text merge: because every node's alignment results are written to result files in text format, the final overall result is obtained simply by concatenating all the text files in processing order. On a Linux system, for example, this can be done with a few commands such as cat.
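The request-and-allocate loop of step S2 and the text merge of step S3 can be sketched with ordinary Python threads and queues standing in for MPI processes and messages. This is a hedged illustration under stated assumptions, not the patented program: `fake_blast` is a placeholder for invoking the real BLAST binary, the names `head_node`, `compute_node`, and `run_distributed` are hypothetical, and results are held in memory rather than in per-node result files.

```python
import queue
import threading


def fake_blast(fragment):
    """Placeholder for running the real BLAST program on one fragment."""
    return "".join(f"hit-for:{name}\n" for name in fragment)


def head_node(num_tasks, num_workers, requests, replies):
    """Answer each request with a task index, or None once tasks run out."""
    remaining = num_tasks
    stops_sent = 0
    while stops_sent < num_workers:
        worker_id = requests.get()            # wait for a communication request
        if remaining > 0:
            replies[worker_id].put(num_tasks - remaining)
            remaining -= 1                    # the task number is decremented by 1
        else:
            replies[worker_id].put(None)      # no tasks left: tell the worker to stop
            stops_sent += 1


def compute_node(worker_id, fragments, requests, replies, results):
    """Request a task, run the alignment, store the result, and repeat."""
    while True:
        requests.put(worker_id)
        task = replies[worker_id].get()
        if task is None:
            return
        results[task] = fake_blast(fragments[task])


def run_distributed(fragments, num_workers):
    """Run one head thread plus num_workers compute threads, then merge."""
    requests = queue.Queue()
    replies = [queue.Queue() for _ in range(num_workers)]
    results = {}
    threads = [threading.Thread(
        target=head_node, args=(len(fragments), num_workers, requests, replies))]
    threads += [threading.Thread(
        target=compute_node, args=(w, fragments, requests, replies, results))
        for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # S3: plain text concatenation in task order, like `cat` on result files.
    return "".join(results[i] for i in range(len(fragments)))
```

Because the merge follows task order rather than completion order, the merged text does not depend on which compute node happened to process which fragment.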
The method of the invention uses the following two techniques to improve program efficiency and to address the bottlenecks and shortcomings of other parallel versions:
1. For sequence alignment tasks with large data volumes and high concurrency, I/O is always the biggest bottleneck of the whole system in a shared-storage architecture. For BLAST in particular, the many database accesses, the reading of query sequences, and the generation of intermediate results all consume storage bandwidth. Yet even a high-performance disk array, such as the HDS 3080 system used in testing the invention, has a maximum (peak) storage access bandwidth of only about 2 GB/s; with a parallelism of 200 threads, and assuming an IB (InfiniBand) network with sufficient physical bandwidth, the average storage bandwidth per thread is only 10 MB/s. From the standpoint of single-machine storage, such access performance is very low; even an ordinary PC hard disk reaches an average of 70-80 MB/s. Considering that consumer-grade hard disks are now extremely cheap, that a single disk can hold 2 TB, and that the public gene databases commonly used from NCBI occupy less than 200 GB, the invention places the database and the intermediate BLAST results entirely in the local storage of the compute nodes, with each compute node keeping a complete copy. Overall, the aggregate storage bandwidth available while the program runs can then far exceed the existing shared storage bandwidth. The distributed storage network topology of the invention is shown in Fig. 2 (in Fig. 2, the gene database and the intermediate BLAST results are stored in a distributed manner under the same directory structure in the local storage of each node). Moreover, because this scheme suits an architecture of ordinary servers plus consumer-grade hard disks while keeping performance comparable to that of a high-performance computer cluster, it can greatly cut operating costs, reduce hardware investment, and lower the barrier to bioinformatics research.
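The bandwidth comparison in this paragraph can be checked with a few lines of arithmetic. The shared-storage figures (2 GB/s peak, 200 threads) and the 70 MB/s disk speed come from the text; the local-storage layout of 50 nodes with 4 threads each is purely an assumption for illustration, as are the function names.

```python
def per_thread_shared_mb(shared_peak_mb_per_s, num_threads):
    """Average bandwidth per thread when all threads share one array."""
    return shared_peak_mb_per_s / num_threads


def per_thread_local_mb(disk_mb_per_s, threads_per_node):
    """Average bandwidth per thread when each node reads its own disk."""
    return disk_mb_per_s / threads_per_node


def aggregate_local_mb(disk_mb_per_s, num_nodes):
    """Total bandwidth when every node reads its own local copy."""
    return disk_mb_per_s * num_nodes


# Figures from the text: 2 GB/s (taken as 2000 MB/s) shared by 200 threads.
shared_per_thread = per_thread_shared_mb(2000, 200)

# Assumed layout (not from the source): 50 nodes x 4 threads, 70 MB/s disks.
local_per_thread = per_thread_local_mb(70, 4)
local_total = aggregate_local_mb(70, 50)

print(shared_per_thread, local_per_thread, local_total)
```

Under these assumptions the aggregate local bandwidth (3500 MB/s) exceeds the 2000 MB/s shared peak, and it keeps growing as nodes are added, whereas the shared array's peak is fixed.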
2. To let the program run in heterogeneous computing environments, such as the above architecture of ordinary servers plus consumer-grade hard disks, an adaptive job allocation algorithm is used. The tasks (query sequences) are divided into pieces of finer granularity than the number of nodes, and load balancing then improves the resource utilization of each node. For example, with 100 nodes running in parallel, the work is divided into 200 tasks; 100 tasks run simultaneously at first, and threads that finish early are assigned further tasks. This avoids the uneven finish times that result from assigning each node its whole share in a single allocation.
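The effect of finer-grained tasks on heterogeneous nodes can be illustrated with a small simulation. This is a sketch under simplifying assumptions (known task costs, fixed node speeds, no communication overhead); the function names and the example speeds are hypothetical and do not come from the source.

```python
import heapq


def static_makespan(task_costs, speeds):
    """Each node gets its whole contiguous share up front; return the
    finish time of the slowest node."""
    base, extra = divmod(len(task_costs), len(speeds))
    finish, start = [], 0
    for i, speed in enumerate(speeds):
        size = base + (1 if i < extra else 0)
        finish.append(sum(task_costs[start:start + size]) / speed)
        start += size
    return max(finish)


def dynamic_makespan(task_costs, speeds):
    """Hand the next task to whichever node becomes free first."""
    free = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free)
    makespan = 0.0
    for cost in task_costs:
        free_at, i = heapq.heappop(free)
        done = free_at + cost / speeds[i]
        makespan = max(makespan, done)
        heapq.heappush(free, (done, i))
    return makespan


# Four nodes, one at half speed; eight equal tasks (two per node on average).
speeds = [1.0, 1.0, 1.0, 0.5]
costs = [1.0] * 8
print(static_makespan(costs, speeds), dynamic_makespan(costs, speeds))
```

In this example the up-front split finishes at time 4.0 because the slow node holds two tasks, while on-demand allocation finishes at 3.0 because the faster nodes absorb the extra work.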
Query sequence splitting, job scheduling and allocation, thread communication, and job result assembly in the method of the invention can be implemented in various programming languages (such as C); the sequence alignment itself, which runs single-threaded, directly calls BLAST. The program of the invention was tested on a distributed PC cluster of 10 nodes, each with 4 cores and 4 GB of memory, using the distributed storage scheme; alignment was against the NCBI NT database (version: 2010-10-09), and the 100,000 test sample sequences were divided into 40 tasks. For comparison, a traditional parallel BLAST method was also tested in a high-performance computing environment allocated 40 cores and 80 GB of memory in total, using a shared storage scheme based on HNAS 3080. The results show that the high-performance environment needed 893 minutes of machine time to complete this task, an average of 22.32 minutes per core, whereas the ordinary server environment used 871 minutes of machine time, an average of 21.78 minutes per core. Thus, while achieving similar computing performance, the technical solution adopted by the invention greatly reduces operating costs and offers a high economic benefit.
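The per-core averages quoted above follow directly from the reported machine time and the 40 cores used in each environment, as the short check below shows. The function name is hypothetical, and the relative difference is computed here only for illustration; it is not a figure stated in the source.

```python
def minutes_per_core(total_machine_minutes, cores):
    """Average machine time per core."""
    return total_machine_minutes / cores


hpc = minutes_per_core(893, 40)        # high-performance, shared storage
commodity = minutes_per_core(871, 40)  # 10 nodes x 4 cores, local storage

# Relative saving in machine time of the commodity cluster over the HPC run.
relative_saving = (893 - 871) / 893

print(hpc, commodity, relative_saving)
```

The quotients are 22.325 and 21.775, which agree with the reported 22.32 and 21.78 minutes up to rounding; the difference in machine time between the two environments is roughly 2.5 percent.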
The above is only one embodiment of the invention. It should be noted that those skilled in the art can make further improvements and modifications without departing from the technical principle of the invention, and such improvements and modifications should also be regarded as falling within the scope of protection of the invention.

Claims (6)

1. A distributed gene sequence alignment method based on BLAST, characterized by comprising the following steps:
S1: the program parses the user parameters, determines the number of MPI threads, reads the query sequence file, and splits the query sequence file according to the number of tasks to obtain query sequence file fragments; each MPI thread then reads its own MPI thread serial number;
S2: whether the current MPI thread is the head node is determined from said MPI thread serial number; if the current MPI thread is the head node, it waits for communication requests from the other MPI threads; when a communication request arrives, it responds to the request, allocates the current task to the requesting thread, and decrements the task number by 1; it continues allocating tasks until all tasks have been allocated and then ends the current MPI thread; if the current MPI thread is not the head node, it first requests a task number from the head node, reads the query sequence file fragment corresponding to the task number, and runs BLAST against the database to obtain a BLAST alignment result; the task number is then decremented by 1, and when BLAST finishes the thread requests a task number again and continues running BLAST until the last task;
S3: all BLAST alignment results are merged.
2. The method of claim 1, characterized in that, when the query sequence file is split according to the number of tasks in step S1, it is split into more pieces than there are nodes.
3. The method of claim 1, characterized in that, in step S2, an MPI thread that is not the head node is a compute node, and both said database and the intermediate BLAST results produced while running BLAST are stored in the local storage of said compute node.
4. The method of claim 1, characterized in that the number of tasks is greater than or equal to the number of MPI threads.
5. The method of claim 1, characterized in that said query sequence file is in FASTA format.
6. The method of any one of claims 1 to 5, characterized in that said database is a gene database.
CN2011104102015A 2011-12-09 2011-12-09 Distributed gene sequence alignment method based on Basic Local Alignment Search Tool (BLAST) Pending CN102521529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011104102015A CN102521529A (en) 2011-12-09 2011-12-09 Distributed gene sequence alignment method based on Basic Local Alignment Search Tool (BLAST)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011104102015A CN102521529A (en) 2011-12-09 2011-12-09 Distributed gene sequence alignment method based on Basic Local Alignment Search Tool (BLAST)

Publications (1)

Publication Number Publication Date
CN102521529A true CN102521529A (en) 2012-06-27

Family

ID=46292440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011104102015A Pending CN102521529A (en) 2011-12-09 2011-12-09 Distributed gene sequence alignment method based on Basic Local Alignment Search Tool (BLAST)

Country Status (1)

Country Link
CN (1) CN102521529A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317942A (en) * 2014-10-31 2015-01-28 北京思特奇信息技术股份有限公司 Massive data comparison method and system based on hadoop cloud platform
CN104462211A (en) * 2014-11-04 2015-03-25 北京诺禾致源生物信息科技有限公司 Re-sequencing data processing method and processing device
CN104854617A (en) * 2012-07-06 2015-08-19 河谷控股Ip有限责任公司 Healthcare analysis stream management
CN106484881A (en) * 2016-10-14 2017-03-08 北京百度网讯科技有限公司 Document handling method and device
CN107403076A (en) * 2016-05-18 2017-11-28 华为技术有限公司 The processing method and equipment of DNA sequence dna
CN107526943A (en) * 2016-06-22 2017-12-29 宁波数方信息技术有限公司 A kind of gene comparison method that distributed concurrent is coupled based on interior external memory
CN107798216A (en) * 2016-09-07 2018-03-13 中央研究院 The comparison method of high similitude sequence is carried out using divide and conquer
CN108537006A (en) * 2018-04-03 2018-09-14 郑州云海信息技术有限公司 A kind of gene sequence data processing method, apparatus and system
CN113241121A (en) * 2021-04-26 2021-08-10 哈尔滨理工大学 Gene sequence precise matching method based on MPI
CN117275583A (en) * 2023-09-27 2023-12-22 四川大学 Quantum technology-based gene search BLAST acceleration method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030149691A1 (en) * 2002-02-06 2003-08-07 Davin Potts Distributed blast processing architecture and associated systems and methods
US6636849B1 (en) * 1999-11-23 2003-10-21 Genmetrics, Inc. Data search employing metric spaces, multigrid indexes, and B-grid trees
US20050221353A1 (en) * 2004-03-30 2005-10-06 Hitachi Software Engineering Co., Ltd. Data processing and display method for gene expression analysis system and gene expression analysis system
CN101149743A (en) * 2007-11-09 2008-03-26 中国水产科学研究院黑龙江水产研究所 DNA sequencing polluted sequence batch treating tool
CN101158952A (en) * 2007-11-22 2008-04-09 中国人民解放军国防科学技术大学 Biological sequence data-base searching multilayered accelerating method based on flow process
WO2008090336A2 (en) * 2007-01-24 2008-07-31 Inventanet Ltd Method and system for searching for patterns in data

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6636849B1 (en) * 1999-11-23 2003-10-21 Genmetrics, Inc. Data search employing metric spaces, multigrid indexes, and B-grid trees
US20030149691A1 (en) * 2002-02-06 2003-08-07 Davin Potts Distributed blast processing architecture and associated systems and methods
US20050221353A1 (en) * 2004-03-30 2005-10-06 Hitachi Software Engineering Co., Ltd. Data processing and display method for gene expression analysis system and gene expression analysis system
WO2008090336A2 (en) * 2007-01-24 2008-07-31 Inventanet Ltd Method and system for searching for patterns in data
CN101149743A (en) * 2007-11-09 2008-03-26 中国水产科学研究院黑龙江水产研究所 DNA sequencing polluted sequence batch treating tool
CN101158952A (en) * 2007-11-22 2008-04-09 中国人民解放军国防科学技术大学 Biological sequence data-base searching multilayered accelerating method based on flow process

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
AARON E.DARLING,ET AL: "The Design,Implementation, and Evaluation of mpiBLAST", 《CLUSTERWORLD CONFERENCE & EXPO AND THE 4TH INTERNATIONAL CONFERENCE ON LINUX CLUSTERS:THE HPC REVOLUTION 2003》 *
DAVID R.MATHOG: "Parallel BLAST on split databases", 《BIOINFORMATICS APPLICATIONS NOTE》 *
HESHAN LIN,ET AL: "Efficient Data Access for Parallel BLAST", 《PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, 2005. PROCEEDINGS. 19TH IEEE INTERNATIONAL》 *
R.C,BRAUN,ET AL: "Parallelization of local BLAST service on workstation clusters", 《FUTURE GENERATION COMPUTER SYSTEMS》 *
ROGERIO,ET AL: "Database allocation strategies for parallel BLAST evaluation on Clusters", 《DISTRIBUTED AND PARALLEL DATABASES》 *
STEPHEN F.ALTSCHUL,ET AL: "Basic Local Alignment Search Tool", 《JOURNAL OF MOLECULAR BIOLOGY》 *
STEPHEN PELLICER,ET AL: "Distributed Sequence Alignment Applications for the Public Computing Architecture", 《IEEE TRANSACTION ON NANOBIOSCIENCE》 *
李俊照等: "基于Linux集群的并行计算", 《计算机测量与控制》 *
李晓梅等: "《并行与分布计算技术丛书》", 30 April 2001, 国防工业出版社 *
闫蓉: "基于并行计算负载均衡算法的研究", <中国地质大学(北京)硕士论文> *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10055546B2 (en) 2012-07-06 2018-08-21 Nant Holdings Ip, Llc Healthcare analysis stream management
CN104854617A (en) * 2012-07-06 2015-08-19 河谷控股Ip有限责任公司 Healthcare analysis stream management
CN110491449B (en) * 2012-07-06 2023-08-08 河谷控股Ip有限责任公司 Management of healthcare analytic flows
US10957429B2 (en) 2012-07-06 2021-03-23 Nant Holdings Ip, Llc Healthcare analysis stream management
US9953137B2 (en) 2012-07-06 2018-04-24 Nant Holdings Ip, Llc Healthcare analysis stream management
CN110491449A (en) * 2012-07-06 2019-11-22 河谷控股Ip有限责任公司 The management of health care analysis stream
US10580523B2 (en) 2012-07-06 2020-03-03 Nant Holdings Ip, Llc Healthcare analysis stream management
US10095835B2 (en) 2012-07-06 2018-10-09 Nant Holdings Ip, Llc Healthcare analysis stream management
CN104854617B (en) * 2012-07-06 2019-09-17 河谷控股Ip有限责任公司 The management of health care analysis stream
CN104317942A (en) * 2014-10-31 2015-01-28 北京思特奇信息技术股份有限公司 Massive data comparison method and system based on hadoop cloud platform
CN104462211A (en) * 2014-11-04 2015-03-25 北京诺禾致源生物信息科技有限公司 Re-sequencing data processing method and processing device
CN104462211B (en) * 2014-11-04 2018-01-02 北京诺禾致源科技股份有限公司 The processing method and processing unit of weight sequencing data
CN107403076A (en) * 2016-05-18 2017-11-28 华为技术有限公司 The processing method and equipment of DNA sequence dna
CN107526943A (en) * 2016-06-22 2017-12-29 宁波数方信息技术有限公司 A kind of gene comparison method that distributed concurrent is coupled based on interior external memory
CN107798216A (en) * 2016-09-07 2018-03-13 中央研究院 The comparison method of high similitude sequence is carried out using divide and conquer
CN107798216B (en) * 2016-09-07 2021-06-04 中央研究院 Method for comparing high-similarity sequences by adopting divide-and-conquer method
CN106484881A (en) * 2016-10-14 2017-03-08 北京百度网讯科技有限公司 Document handling method and device
CN108537006A (en) * 2018-04-03 2018-09-14 郑州云海信息技术有限公司 A kind of gene sequence data processing method, apparatus and system
CN113241121A (en) * 2021-04-26 2021-08-10 哈尔滨理工大学 Gene sequence precise matching method based on MPI
CN117275583A (en) * 2023-09-27 2023-12-22 四川大学 Quantum technology-based gene search BLAST acceleration method and system
CN117275583B (en) * 2023-09-27 2024-04-16 四川大学 Quantum technology-based gene search BLAST acceleration method and system

Similar Documents

Publication Publication Date Title
CN102521529A (en) Distributed gene sequence alignment method based on Basic Local Alignment Search Tool (BLAST)
Raicu et al. Many-task computing for grids and supercomputers
Tang et al. A MapReduce task scheduling algorithm for deadline constraints
Lin et al. Coordinating computation and I/O in massively parallel sequence search
Xie et al. Improving mapreduce performance through data placement in heterogeneous hadoop clusters
US10381106B2 (en) Efficient genomic read alignment in an in-memory database
EP2759953B1 (en) System and method for genomic data processing with an in-memory database system and real-time analysis
US11031097B2 (en) System for genomic data processing with an in-memory database system and real-time analysis
Ocaña et al. Parallel computing in genomic research: advances and applications
EP2759952A1 (en) Efficient genomic read alignment in an in-memory database
Senthilkumar et al. A survey on job scheduling in big data
Gupta et al. Accelerating molecular sequence analysis using distributed computing environment
Roy et al. Massively parallel processing of whole genome sequence data: an in-depth performance study
Daily et al. A work stealing based approach for enabling scalable optimal sequence homology detection
Deng et al. HiGene: A high-performance platform for genomic data analysis
Liu et al. Load balancing in MapReduce environments for data intensive applications
Huang et al. Performance evaluation of enabling logistic regression for big data with R
Mazur et al. Towards scalable one-pass analytics using mapreduce
Boulund et al. Tentacle: distributed quantification of genes in metagenomes
Oh et al. Clustom-cloud: In-memory data grid-based software for clustering 16s rrna sequence data in the cloud environment
Cohen et al. High-performance statistical modeling
Becker et al. Memory-driven computing accelerates genomic data processing
Chen et al. pmTM-align: scalable pairwise and multiple structure alignment with Apache Spark and OpenMP
Nunes et al. Towards Analyzing Computational Costs of Spark for SARS-CoV-2 Sequences Comparisons on a Commercial Cloud
Bongo et al. Data-intensive computing infrastructure systems for unmodified biological data analysis pipelines

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120627