CN110990063B - Accelerating device and method for gene similarity analysis and computer equipment - Google Patents

Info

Publication number
CN110990063B
CN110990063B CN201911191604.8A CN201911191604A CN110990063B CN 110990063 B CN110990063 B CN 110990063B CN 201911191604 A CN201911191604 A CN 201911191604A CN 110990063 B CN110990063 B CN 110990063B
Authority
CN
China
Prior art keywords
gene
task
tasks
sequence
processing unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911191604.8A
Other languages
Chinese (zh)
Other versions
CN110990063A (en)
Inventor
陈灿
臧大伟
沈华
谭光明
孙凝晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201911191604.8A priority Critical patent/CN110990063B/en
Publication of CN110990063A publication Critical patent/CN110990063A/en
Application granted granted Critical
Publication of CN110990063B publication Critical patent/CN110990063B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

Embodiments of the invention provide an acceleration device, a method, and computer equipment for gene similarity analysis. The acceleration device comprises: a high-speed communication interface for communicating with a host and receiving the tasks that the host distributes for accelerated processing; a sequence cache module for caching one or more tasks from the host, each task comprising a plurality of gene sequence data to be subjected to gene similarity analysis; an array processor provided with processing units for processing tasks, each processing unit containing a complete pipeline that processes tasks in a data-driven streaming computing mode, the pipeline being configured with a plurality of fixed-point computing components required by the tasks; a control module configured to distribute the pending tasks in the sequence cache module to the processing units; and a task cache module provided with task cache units for caching the pending tasks distributed to the processing units. The invention can improve the efficiency of gene similarity analysis and obtain analysis results quickly.

Description

Accelerating device and method for gene similarity analysis and computer equipment
Technical Field
The invention relates to the technical field of biological gene data processing, in particular to an acceleration structure oriented to a graphical gene similarity algorithm, and more particularly relates to an acceleration device, a method and computer equipment for gene similarity analysis.
Background
With the completion of the human genome project, human knowledge and mastery of genetic information has made unprecedented progress. Meanwhile, with the continuous development and improvement of molecular level gene detection technology, gene sequencing technology is developed rapidly, high-throughput and low-cost sequencing technology is widely applied, and a large amount of gene and protein data of different species are accumulated. In the face of the explosive growth of the data volume of gene sequences, how to analyze and interpret the useful information contained in the gene sequences becomes the key to the current biological research. The similarity analysis of gene sequences is one of the key techniques in bioinformatics, and is the most basic method for understanding the structural function of gene sequences and bioinformatics. For gene sequence data of an unknown organism, if the gene sequence data can be proved to be connected with certain known sequences, the species and the characters of the organism can be deduced to a certain extent, and the gene sequence data have significance for biological and medical research. In the face of massive gene sequence data, how to improve the speed and energy efficiency of similarity analysis is particularly important.
Gene sequence similarity analysis algorithms generally fall into alignment algorithms and non-alignment (alignment-free) algorithms. Alignment algorithms are accurate, but their computational complexity is high and they are slow and consume large amounts of computing resources, so they cannot keep up with the demands of gene similarity analysis as gene data grows rapidly. Non-alignment algorithms have two advantages: first, they avoid the need to select multiple genes or complete genomic sequences for analysis; second, they are computationally inexpensive and take less time. In recent years, non-alignment algorithms have developed rapidly in the chemical and industrial fields. They generally include statistical methods and graphical representation methods. The k-word based method is a classical statistical method, but statistical methods ignore the chemical structure and characteristics of biomolecules. Graphical representation methods map sequences into graphs so that the complex relationships of biological sequences can be visualized, and then describe the resulting graphs with numerical features.
If a non-alignment algorithm is implemented on a general-purpose processor, the processing time is too long to meet the requirement of rapidly analyzing the similarity of millions of base sequences, so a new computing structure is urgently needed to accelerate similarity analysis of large numbers of base sequences. In particular, although gene similarity analysis algorithms based on graphical representation have great advantages in applicability and accuracy, their enormous computational demands challenge the design of computing systems in the face of massive gene sequence data. For example, the current RNA sequence alignment throughput of a general-purpose processor is about ten thousand bases per second. When such a processor is used to align the sequence of an RNA virus, hundreds of thousands of virus sequences usually have to be aligned each time, and the average length of a virus sequence is 1000 bases, so it takes about ten hours to obtain the alignment result for one virus. This delay greatly affects the subsequent preparation of countermeasures such as drug development. In particular, when responding to a large-scale infection event caused by a virus or bacterium, every second counts in preparing measures to prevent uncontrolled spread, and the existing technology cannot meet this performance requirement.
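The ten-hour figure can be checked with a back-of-envelope calculation. The sketch below takes the throughput (ten thousand bases per second) and average length (1000 bases) from the text; the count of 360,000 sequences is an assumed concrete value for "hundreds of thousands", and the function name is hypothetical.

```python
def alignment_hours(num_sequences: int, avg_length_bases: int,
                    bases_per_second: float) -> float:
    """Hours needed to process every base once at a fixed throughput."""
    total_bases = num_sequences * avg_length_bases
    return total_bases / bases_per_second / 3600.0


# Throughput and average length come from the text.
# 360_000 sequences is an assumption chosen to stand in for
# "hundreds of thousands"; the source gives no exact count.
hours = alignment_hours(360_000, 1000, 10_000)
print(f"{hours:.1f} hours")  # prints "10.0 hours"
```

Under these assumptions the estimate scales linearly with the number of sequences, so a run with 100,000 sequences would still take close to three hours.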
The main reason for the above problem is the mismatch between the computing structure of the general-purpose processor and the algorithmic features of the non-alignment algorithm. First, the non-alignment algorithm requires a large number of 2-bit, 8-bit, and 32-bit integer multiply-add operations. The 64-bit computing components of a general-purpose processor are wide but few in number, so they cannot supply the number of integer computing units the algorithm needs, and a large amount of computing resources is wasted. Second, because gene sequences are independent of one another, the non-alignment algorithm is highly concurrent: the same sequence can be compared with many other sequences simultaneously. The concurrency of a general-purpose processor is too low to meet this requirement. Third, the general-purpose processor carries a Cache mechanism and branch control logic from which the algorithm gains little.
The Cache mechanism reduces memory access latency and improves the response speed of the general-purpose processor. A small temporary store placed between the CPU and the main memory serves as the CPU Cache (Cache Memory) and holds the data the CPU is about to access. Its capacity is smaller than that of the main memory, but its exchange speed is faster. The data in the CPU Cache is a small part of the memory contents, namely the part the CPU will access shortly; when the CPU needs that data, it can fetch it from the Cache instead of from memory, which speeds up reads. Cached data is usually hot data, that is, data the CPU accesses frequently. In gene sequence similarity analysis, however, the gene sequence data of a task is rarely reused once the analysis of that task is complete, so the Cache mechanism does little to speed up the analysis.
The branch control logic supports the pipelining of the general-purpose processor and speeds up program execution. A processor improves performance through pipelining, and the pipeline needs to know in advance which instructions will execute next so that it stays full. When the program reaches a branch statement or conditional jump, the processor cannot be sure which instruction comes next, and the branch control logic must decide which instruction enters the pipeline. If the predicted branch later turns out to be wrong, the processor rolls back the mispredicted execution and refills the pipeline with the correct instructions. Such a misprediction wastes many clock cycles and degrades program performance. Branch prediction mainly schedules at the instruction level. The non-alignment algorithm for calculating gene similarity, by contrast, is highly pipelinable and concurrent, and its processing flow is completely fixed and predictable. It can be processed in a streaming manner: as soon as data arrives, the processing unit knows which operation to perform, so neither a complex Cache mechanism nor branch control logic is needed.
In summary, the limited number of computing cores in the general-purpose processor limits the concurrency the algorithm can achieve, while the complex Cache mechanism and branch control logic cannot deliver their intended benefit yet consume a large amount of silicon area and energy. The prior art therefore needs improvement.
Disclosure of Invention
It is therefore an object of the present invention to overcome the above-mentioned drawbacks of the prior art and to provide an accelerating apparatus, method and computer device for gene similarity analysis.
According to a first aspect of the present invention, there is provided an acceleration device for gene similarity analysis, comprising: a high-speed communication interface for communicating with a host and receiving the tasks that the host distributes for accelerated processing; a sequence cache module for caching one or more tasks from the host, each task comprising a plurality of gene sequence data to be subjected to gene similarity analysis; an array processor provided with at least one processing unit for processing tasks, each processing unit containing a complete pipeline that processes tasks in a data-driven streaming computing mode, the pipeline being configured with a plurality of fixed-point computing components required by the tasks; a control module configured to distribute the pending tasks in the sequence cache module to the processing units; a task cache module provided with task cache units for caching the pending tasks distributed to the processing units; and/or a result cache module connected to the control module and the array processor through a network on chip and used for caching the analysis results of the gene similarity analysis obtained when the array processor processes the tasks. The sequence cache module and/or the control module are connected to the array processor through the network on chip, and the processing units in the array processor are interconnected in a Mesh structure through the network on chip. The sequence cache module, the task cache units, and the result cache module all use program-visible memory, that is, memory addressed uniformly with the main memory.
Preferably, the processing unit is configured as a multi-stage pipeline structure for executing tasks in parallel, and low-bit-width fixed-point computing components are arranged in the multi-stage pipeline structure; the number of low-bit-width fixed-point computing components in the multi-stage pipeline structure is greater than the number of fixed-point computing components in the processor of the host. The multi-stage pipeline structure comprises at least a first-stage pipeline, a second-stage pipeline, and a third-stage pipeline. The first-stage pipeline is provided with a plurality of sequence preprocessing modules for preprocessing the gene sequence data in a task to obtain, for each gene sequence, decimal sequence data represented with fewer digits. The second-stage pipeline is provided with a plurality of cross-correlation calculation modules for calculating, from the decimal sequence data produced by the first-stage pipeline, the cross-correlation vector between each gene sequence and the shortest gene sequence in the task. The third-stage pipeline is provided with a plurality of Euclidean distance calculation modules for calculating the Euclidean distance between every two cross-correlation vectors, based on all the cross-correlation vectors calculated by the second-stage pipeline.
Preferably, the array processor comprises a plurality of processing units for processing tasks in parallel, the task cache module comprises a plurality of task cache units, and each processing unit is provided with its own dedicated task cache unit. The control module is configured to monitor the task processing status of each processing unit and, when it detects that the current task of a processing unit is about to finish executing, to allocate a pending task from the sequence cache module to that processing unit and send it in advance to the task cache unit dedicated to that processing unit. The control module is configured, on detecting that the current task of a processing unit is about to finish executing, to randomly select one or more pending tasks from all pending tasks remaining in the sequence cache module and allocate them to that processing unit.
Preferably, the control module is configured to monitor the task execution status of each processing unit by analyzing the amount of data still to be read in the task cache unit dedicated to that processing unit: when the data still to be read in a task cache unit, as a percentage of the capacity of that task cache unit, falls to a preset threshold, the current task of the processing unit to which the task cache unit belongs is judged to be about to finish executing. The preset threshold ranges from 10% to 30%.
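The monitoring and prefetching behaviour of the control module can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the hardware design: the class `TaskCacheUnit`, the functions `nearly_done` and `dispatch`, the byte-based size accounting, and the default threshold of 0.2 (a value inside the 10% to 30% range given in the text) are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class TaskCacheUnit:
    """Software stand-in for one dedicated task cache unit (hypothetical)."""
    capacity: int    # total capacity in bytes
    unread: int = 0  # bytes not yet read by the owning processing unit


def nearly_done(cache, threshold=0.2):
    """True when unread data has fallen to the preset share of capacity."""
    return cache.unread / cache.capacity <= threshold


def dispatch(pending, caches, threshold=0.2, rng=random):
    """One monitoring pass: prefetch a randomly chosen pending task into
    every task cache unit whose processing unit is about to finish.

    Returns the (cache_index, task) assignments made in this pass.
    """
    assigned = []
    for idx, cache in enumerate(caches):
        if not pending:
            break
        if not nearly_done(cache, threshold):
            continue
        pick = rng.randrange(len(pending))
        if cache.unread + len(pending[pick]) > cache.capacity:
            continue  # no room yet; retry on a later monitoring pass
        task = pending.pop(pick)
        cache.unread += len(task)
        assigned.append((idx, task))
    return assigned


# Example: unit 0 is 10% full (about to finish), unit 1 is 80% full.
caches = [TaskCacheUnit(capacity=100, unread=10),
          TaskCacheUnit(capacity=100, unread=80)]
pending = [b"A" * 50, b"C" * 50]
made = dispatch(pending, caches, threshold=0.2, rng=random.Random(0))
```

In this example only unit 0 receives a new task, because only its unread share has fallen to the threshold. Refilling before the cache is empty is what lets the processing unit move to the next task without idling.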
According to a second aspect of the present invention, there is provided an acceleration method based on the acceleration device for gene similarity analysis according to the first aspect, comprising: S100, receiving tasks distributed by a host for accelerated processing, wherein each task comprises a plurality of gene sequence data to be subjected to gene similarity analysis; S200, processing the tasks in a data-driven streaming computing mode, using a plurality of mutually independent complete pipelines to process the tasks in parallel and a plurality of low-bit-width fixed-point computing components to accelerate the tasks in each pipeline, wherein the processing in each pipeline comprises: S210, preprocessing the gene sequence data in the task to obtain, for each gene sequence, decimal sequence data represented with fewer digits; S220, calculating the cross-correlation vector between each gene sequence and the shortest gene sequence in the task based on the decimal sequence data; and/or S230, calculating the Euclidean distance between every two cross-correlation vectors based on all the cross-correlation vectors.
According to a third aspect of the present invention, there is provided a computer apparatus for gene similarity analysis, comprising: a memory; a host; and/or an acceleration device as described in the first aspect.
Compared with the prior art, the invention has the following advantages. Each processing unit is provided with a complete pipeline that processes tasks in a data-driven streaming computing mode, so data can be processed as a stream, and the pipeline is configured with a plurality of fixed-point computing components required by the tasks. This improves the efficiency of gene similarity analysis and accelerates the analysis of the tasks distributed by the host. Moreover, the invention can allocate pending tasks in small amounts and continuously according to the task execution status of each processing unit, so that every processing unit stays busy as far as possible until all tasks are complete, which further improves the efficiency of gene similarity analysis. In particular, when a sudden large-scale infection event caused by a virus or bacterium requires countermeasures to be prepared against the clock to prevent uncontrolled spread, the gene similarity analysis results can be obtained faster, quickly providing a technical reference for subsequent countermeasures (such as drug development and epidemic prevention measures).
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of an accelerating apparatus for gene similarity analysis according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating the pipeline configuration of a processing unit of the acceleration apparatus for gene similarity analysis according to the embodiment of the present invention;
FIG. 3 is a schematic diagram of a selection process of five-dimensional cross-correlation vectors in an acceleration apparatus for gene similarity analysis according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a process of calculating a cross-correlation vector of a certain dimension of an acceleration apparatus for gene similarity analysis according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a process for calculating cross-correlation vectors in an acceleration apparatus for gene similarity analysis according to an example of the present invention;
FIG. 6 is a schematic diagram illustrating a procedure for calculating Euclidean distance in an acceleration apparatus for gene similarity analysis according to an example of the present invention;
FIG. 7 is a flow chart illustrating an accelerated method for gene similarity analysis according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail by embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As mentioned in the Background section, the complex Cache mechanism and branch control logic of a general-purpose processor bring no corresponding benefit to the non-alignment algorithm used in the present invention, and the limited number of computing cores in a general-purpose processor limits the concurrency the algorithm can achieve. Once the similarity of the corresponding gene sequences has been calculated, the gene sequence data is unlikely to be used by the processor again, so no Cache mechanism is needed. Branch prediction mainly schedules at the instruction level; because the flow of the non-alignment algorithm is completely fixed and can be processed as a stream, the processing unit knows which operation to perform as soon as sequence data arrives, so it needs no branch prediction and can adopt a pipeline structure directly. According to the invention, a complete pipeline that processes tasks in a data-driven streaming computing mode is arranged in the processing unit, and a plurality of fixed-point computing components required by the tasks are configured in the pipeline structure, so that the efficiency of gene similarity analysis is improved and the tasks distributed by the host are analyzed at an accelerated rate to obtain the analysis results.
Before describing embodiments of the present invention in detail, some of the terms used therein will be explained as follows:
A network on chip (NoC) is a communication platform that provides a communication network for the nodes of a system on chip (SoC). Multiple nodes in the network can exchange information simultaneously over different physical links, which allows multiple IP cores to communicate concurrently. Compared with bus interconnect technology, NoC technology offers higher bandwidth and higher communication efficiency.
A PCIe (Peripheral Component Interconnect Express) interface is a communication interface built to the PCIe high-speed serial computer expansion bus standard.
Scratchpad memory is also commonly referred to as a scratch pad, a buffer for caching intermediate results, or SPM.
The array processor refers to an array of processing units (Processing Element Array), also commonly called a PE array.
Cross-correlation calculation refers to calculating the degree of correlation between two sequences using a cross-correlation function. In general, the cross-correlation function measures the degree of correlation between the values of two signals x(t) and y(t) at any two different times t1 and t2. The invention applies this to gene sequences by replacing time with the position index within the gene sequence, and calculates the degree of cross-correlation between two gene sequences.
The Euclidean distance refers to the straight-line distance between two points in Euclidean space.
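As an illustration of the two definitions above, one common discrete form of each quantity is written out below. The notation is an assumption introduced here, not taken from the source: x_n and y_n are the numeric codes at position n of the two gene sequences, N is the length of the shorter sequence, k is the position offset (lag), and K is the number of dimensions of the cross-correlation vector. The 1/(N-k) normalization in particular is an assumed choice, since this section does not state the exact formula used.

```latex
% Assumed discrete cross-correlation at lag k (normalization is an assumption)
r_{xy}(k) = \frac{1}{N-k} \sum_{n=1}^{N-k} x_n \, y_{n+k}, \qquad k = 1, \dots, K

% Euclidean distance between two K-dimensional cross-correlation vectors u and v
d(\mathbf{u}, \mathbf{v}) = \sqrt{\sum_{k=1}^{K} \left(u_k - v_k\right)^2}
```

The sum in the first expression uses only integer multiply-add operations, and the single division by N-k is the only step that leaves the integer domain.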
The Mesh structure is a network topology structure of a network on chip, or a multi-core interconnection form for communication between multiple cores of the network on chip. At present, the network-on-chip topology mainly has a Mesh structure and a Torus structure.
Fixed-point computation refers to arithmetic on numbers whose radix point has a fixed position; integer arithmetic is the simplest case.
Floating-point operations are operations on numbers whose radix point position can vary; in this document, operations other than fixed-point operations are floating-point operations.
According to an embodiment of the invention, an acceleration apparatus for gene similarity analysis is provided, which may include one or more of a high-speed communication interface, a sequence caching module, an array processor, a task caching module, a result caching module, and a control module. The high-speed communication interface can be used for communicating with the host and receiving tasks to be accelerated and processed distributed by the host; the sequence caching module can be used for caching one or more tasks from the host, wherein each task comprises a plurality of gene sequence data to be subjected to gene similarity analysis; the array processor is provided with at least one processing unit for processing tasks, a complete pipeline for processing the tasks based on a data-driven streaming computing mode is arranged in each processing unit, and a plurality of fixed-point computing components required by the processing tasks are configured in the pipeline; a control module configured to allocate the tasks to be processed in the sequence buffer module to the processing units; the task cache module is provided with a task cache unit and is used for caching the tasks to be processed distributed to the processing unit; and the result caching module can be used for caching the analysis result of the gene similarity analysis obtained by the processing task of the array processor.
According to an embodiment of the present invention, as shown in FIG. 1, the acceleration apparatus for gene similarity analysis includes one or more of a high-speed communication interface 110, a sequence cache module 140, an array processor, a task cache module, a result cache module 150, and a control module 120. Preferably, the sequence cache module 140, the array processor, the task cache module, the result cache module 150, and the control module 120 may be connected to one another through a network on chip. Preferably, the sequence cache module 140 and/or the control module 120 are coupled to the array processor via the network on chip. Each processing unit 160 has direct access to its own task cache unit 170. Preferably, the acceleration apparatus may be implemented on an application-specific integrated circuit (ASIC) chip. Alternatively, the acceleration apparatus may be implemented on a semi-custom chip, such as an FPGA chip.
Preferably, the array processor may include a plurality of processing units 160 for processing tasks in parallel, and the task cache module may include a plurality of task cache units 170, with each processing unit 160 independently provided with its own dedicated task cache unit 170. The processing units 160 may be interconnected in a Mesh structure through the network on chip; see FIG. 1, where the routing nodes 130 are arranged in a mesh. Preferably, the number of processing units 160 for processing tasks in parallel in the array processor may be 2 to 100, for example the 16 shown in FIG. 1. The technical solution of this embodiment achieves at least the following beneficial technical effects: the acceleration apparatus accelerates the gene similarity analysis of the gene sequence data in the tasks distributed by the host and, once the analysis results are obtained, sends them to the host, so that the user can conveniently view the similarity analysis results for the gene sequence data on the host.
Preferably, the acceleration device can be connected to the host through the high-speed communication interface 110, for example, the acceleration device of the present invention can be connected to the host through a PCIe interface or an HSSI interface. A host refers to a computer, such as a personal computer, a server, or a cluster of servers. It should be noted that the host may employ a general-purpose processor, and the general-purpose processor may be responsible for distributing tasks to the acceleration device and obtaining an analysis result of the gene similarity analysis obtained after the acceleration processing by the acceleration device from the acceleration device. After the accelerating device is connected with the host, the accelerating device can carry out gene similarity analysis on gene sequence data in tasks distributed by the host and return an analysis result to the host.
Referring to FIG. 2, FIG. 2 is a block diagram illustrating the pipeline configuration of a processing unit 160 of the acceleration apparatus for gene similarity analysis according to an embodiment of the present invention. Each processing unit 160 may be configured as a multi-stage pipeline structure for executing tasks in parallel, with low-bit-width fixed-point computing components provided in the multi-stage pipeline structure. Preferably, the pipeline stages execute in parallel, each carrying out one part of the non-alignment algorithm for gene similarity analysis, and together produce the analysis result. For example, the multi-stage pipeline structure may include a first-stage, a second-stage, and a third-stage pipeline. The first-stage pipeline is configured with a plurality of sequence preprocessing modules 161 for preprocessing the gene sequence data in the task to obtain, for each gene sequence, decimal sequence data represented with fewer digits. The second-stage pipeline is provided with a plurality of cross-correlation calculation modules 162 for calculating, from the decimal sequence data produced by the first-stage pipeline, the cross-correlation vector between each gene sequence and the shortest gene sequence in the task. The third-stage pipeline is provided with a plurality of Euclidean distance calculation modules 163 for calculating the Euclidean distance between every two cross-correlation vectors, based on all the cross-correlation vectors calculated by the second-stage pipeline.
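The three stages can be mimicked in software to make the data flow concrete. The following is a sequential sketch under stated assumptions, not the patented hardware pipeline: the base-to-integer mapping `BASE_CODE`, the lag-based definition of the cross-correlation vector, its normalization, the default of five dimensions, and all function names are hypothetical choices made for illustration.

```python
import math
from itertools import combinations

# Assumed numeric coding of bases; the source does not give the mapping.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3, "U": 3}


def preprocess(seq):
    """Stage 1: turn a base string into a list of small integers."""
    return [BASE_CODE[base] for base in seq.upper()]


def cross_correlation_vector(x, ref, dims=5):
    """Stage 2: cross-correlate x with the shortest sequence ref.

    Uses lags 1..dims (an assumed definition). The inner sum needs only
    integer multiply-add; one division per dimension normalizes it.
    """
    m = len(ref)
    if m <= dims:
        raise ValueError("shortest sequence must be longer than dims")
    vector = []
    for lag in range(1, dims + 1):
        overlap = m - lag
        acc = sum(x[i] * ref[i + lag] for i in range(overlap))
        vector.append(acc / overlap)
    return vector


def euclidean(u, v):
    """Stage 3: straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def analyze(task, dims=5):
    """Run all three stages over one task (a list of base strings)."""
    encoded = [preprocess(seq) for seq in task]
    ref = min(encoded, key=len)  # shortest gene sequence in the task
    vectors = [cross_correlation_vector(x, ref, dims) for x in encoded]
    return {(i, j): euclidean(vectors[i], vectors[j])
            for i, j in combinations(range(len(task)), 2)}


distances = analyze(["ACGTACGTAC", "ACGTACGTAC", "TTTTTTTTTTTT"])
```

Identical sequences yield a distance of zero, and every sequence is handled by the same fixed sequence of operations with no data-dependent branching, which is the property the text relies on for streaming execution.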
Preferably, a plurality of fixed-point calculation units may be provided in the processing unit 160, or in its multi-stage pipeline structure, in a number greater than the number of fixed-point calculation units in the processor of the host. Preferably, the number of fixed-point calculation units is proportional to the average length of the gene sequences corresponding to the gene sequence data in the task. Preferably, the number of fixed-point calculation units is 1 to 2 times the average length of those gene sequences; for example, if the average length of viral RNA sequences is 1000, the number of fixed-point calculation units arranged in the second-stage pipeline can be 1000 to 2000. A low-bit-width fixed-point calculation unit may refer to a low-bit-width fixed-point multiplier and/or a low-bit-width fixed-point adder. Preferably, a low bit width may refer to a bit width of 64 bits or less, for example 4 bits, 8 bits, or 32 bits. For example, 1000 8-bit fixed-point multipliers and 1000 32-bit fixed-point adders may be used in the second-stage pipeline to perform the multiplications and additions required when calculating the cross-correlation vector. Preferably, floating-point calculation units may be provided in the pipeline structure in addition to the fixed-point calculation units. For example, a 32-bit floating-point operator may be disposed in the cross-correlation calculation module 162 for division, and a plurality of floating-point operators may be arranged in the Euclidean distance calculation module 163 to improve the parallelism of the calculation.
Since the input values of the Euclidean distance calculation are 32-bit floating-point numbers, and squaring and accumulating them may overflow a 32-bit operator, 32-bit floating-point operators may be used for the subtraction and squaring operations in the Euclidean distance calculation module, and a 64-bit floating-point operator may be used for the addition, so as to prevent overflow. In this embodiment, the non-alignment algorithm employed by the acceleration device of the present invention is a graph-oriented gene similarity algorithm. However, it should be understood that other types of non-alignment algorithms, such as statistical algorithms, may also be used; in that case it is only necessary to adapt the specific calculations performed by the pipeline structure of the processing unit 160 to the statistical algorithm. The technical scheme of the embodiment can at least realize the following beneficial technical effects: the larger number of fixed-point calculation units arranged in the acceleration device can be used to accelerate the analysis of tasks distributed by the host, which compensates for the host having fewer fixed-point calculation units than the algorithm requires, reduces the time of gene similarity analysis, and improves analysis efficiency; moreover, the algorithm of the invention is a pure streaming computation, so the processing unit does not need branch prediction during calculation, and the flow control is simple.
Preferably, a first-in first-out (FIFO) memory for temporarily storing data is arranged between the first-stage pipeline and the second-stage pipeline, and between the second-stage pipeline and the third-stage pipeline. The sequence preprocessing module 161 converts each original gene sequence into a decimal sequence and stores the decimal gene sequence data in the FIFO between the first-stage pipeline and the second-stage pipeline. The sequence preprocessing module 161 can also be used to find the shortest gene sequence. After all sequences are preprocessed, the sequence preprocessing module 161 raises the signal indicating that the decimal gene sequences are ready (i.e. sets the signal from 0 to 1), notifying the cross-correlation calculation module 162 to read the data. After reading the data, the cross-correlation calculation module 162 calculates the cross-correlation vector between each gene sequence to be calculated and the shortest gene sequence, and stores the calculated cross-correlation vectors in the FIFO between the second-stage pipeline and the third-stage pipeline; when the cross-correlation vectors of all the sequences have been calculated, the Euclidean distance calculation module 163 is notified to read the data.
According to an embodiment of the present invention, the control module 120 is configured to monitor the task execution status of each processing unit by analyzing the amount of data still to be read in the task cache unit dedicated to that processing unit. When the data to be read in a task cache unit, expressed as a percentage of the capacity of the task cache unit, falls to a preset threshold, the current task of the corresponding processing unit is judged to be about to complete. The control module 120 may be configured such that, when it detects that the current task of a processing unit 160 is about to complete and there are tasks to be processed in the sequence cache module 140, it allocates a task to be processed in the sequence cache module to that processing unit 160 and sends the task in advance to the task cache unit 170 dedicated to the processing unit 160. The technical scheme of the embodiment can at least realize the following beneficial technical effects: the influence of transmission delay on overall computing efficiency can be reduced, and tasks to be processed can be distributed in small amounts and continuously according to the task execution status of each processing unit, so that each processing unit stays busy as far as possible until all tasks are completed, thereby improving computing efficiency. This avoids the problem that arises when a large number of computing tasks are distributed evenly to the processing units up front: some processing units finish their tasks early and sit idle with no subsequent tasks, while other processing units still have many tasks to process, which lowers computing efficiency. Preferably, the preset threshold may be set in a range of 10% to 30%.
For example, assume that the capacity of the task cache unit 170 is 100 MB and the preset threshold is set to 10%. Assuming that the size of the task first allocated to the processing unit 160 is 60 MB, then while the processing unit 160 reads the task, the data to be read in the task cache unit 170 is monitored as a percentage of the capacity of the task cache unit; when the amount of data to be read falls to 10%, that is, 10 MB, the current task of the processing unit 160 is judged to be about to complete.
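The threshold check in this example can be illustrated in software. The following is only a minimal sketch under stated assumptions: the function name `is_near_completion`, its parameters, the constant `MB`, and the integer percentage comparison are hypothetical and serve only to illustrate the rule; in the device this monitoring is performed by the control module 120.

```python
MB = 1024 * 1024  # assumption: binary megabytes, used only for illustration


def is_near_completion(unread_bytes: int, capacity_bytes: int,
                       threshold_percent: int = 10) -> bool:
    """Return True when the unread data has fallen to the preset threshold.

    The comparison is done in integers to avoid floating-point rounding
    exactly at the threshold boundary.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return unread_bytes * 100 <= capacity_bytes * threshold_percent


capacity = 100 * MB
print(is_near_completion(60 * MB, capacity))  # 60 MB task has just been loaded
print(is_near_completion(10 * MB, capacity))  # unread data has fallen to 10 MB
```

With a threshold in the 10% to 30% range, a larger value triggers the next allocation earlier and so leaves more time to hide the transfer delay.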
Preferably, the control module 120 may include control logic and task distribution logic. The control logic is responsible for interaction with the host, namely receiving the gene sequences distributed by the host and returning the analysis results to the host, and includes logic such as DMA (direct memory access) and interrupt generation. For example, the control logic may be configured to move a task carrying gene sequence data that needs to be subjected to similarity analysis to the sequence cache module 140 through a DMA operation and an interrupt operation. Preferably, the task distribution logic may be configured to monitor how each processing unit 160 is executing its task and, when it detects that the current task of a processing unit 160 is about to complete and there are tasks to be processed in the sequence cache module 140, to allocate a task to be processed in the sequence cache module to that processing unit 160 and send it in advance to the task cache unit 170 dedicated to the processing unit 160. That is, the control module 120 may randomly select, from all the remaining tasks to be processed in the sequence cache module 140, one or more new tasks to assign to the processing unit 160. For example, if ten tasks remain to be processed in the sequence cache module 140, one or two new tasks are randomly selected from the ten and allocated to the corresponding processing unit 160. The selected task is thus allocated to the processing unit 160 as its new task and sent to the task cache unit 170 dedicated to that processing unit 160.
Because the tasks of the non-alignment algorithm can be executed in any order and do not require branch prediction, the invention allocates new tasks directly at random; no branch prediction is needed in the task distribution process, and the control process is simple and efficient.
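The random allocation described above can be sketched as a small software model. This is an illustration under stated assumptions, not the device's task distribution logic: the function `dispatch_tasks`, the `(task_id, size)` tuples, the `unread` dictionary, and the choice of one task per processing unit per pass are all hypothetical.

```python
import random


def dispatch_tasks(pending, unread, capacity, threshold_percent=10, rng=random):
    """Randomly assign one pending task to each unit that is about to finish.

    pending: list of (task_id, size_in_bytes) tuples, modified in place
    unread:  dict mapping unit_id -> bytes still unread in that unit's
             task buffer, modified in place
    Returns the list of (unit_id, task_id) assignments made in this pass.
    """
    assignments = []
    for unit_id in list(unread):
        if not pending:
            break  # nothing left to distribute
        if unread[unit_id] * 100 <= capacity * threshold_percent:
            # Tasks are order-independent, so any remaining one may be chosen.
            task_id, size = pending.pop(rng.randrange(len(pending)))
            unread[unit_id] += size  # task is prefetched into the unit's buffer
            assignments.append((unit_id, task_id))
    return assignments


pending = [("t1", 20), ("t2", 30), ("t3", 25)]
unread = {0: 5, 1: 50}  # unit 0 is nearly done, unit 1 is still busy
print(dispatch_tasks(pending, unread, capacity=100))
```

Only unit 0 receives a task in this run; which of the three tasks it receives depends on the random draw. The sketch does not check that the new task fits in the remaining buffer space, which a real allocator would need to do.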
In one embodiment of the present invention, the sequence cache module 140, the task cache unit 170, and the result cache module 150 all employ program-visible memory. Program-visible memory refers to memory that is addressed uniformly with main memory and is directly accessible to the control module 120, the processing unit 160, and/or the processor of the host; for example, the program-visible memory may be a scratchpad memory. The technical scheme of the embodiment can at least realize the following beneficial technical effects: in this embodiment, program-visible scratchpad memory replaces a hardware-controlled cache mechanism, which avoids the complex cache mechanism of a general-purpose processor, makes full use of the characteristics of the non-alignment algorithm, and makes the flow control of task allocation simpler and more efficient.
According to an embodiment of the present invention, the sequence preprocessing module 161 preprocesses the gene sequence data in the task to obtain decimal sequence data that represents each gene sequence with fewer bits. For example, the sequence preprocessing module uses a table lookup to replace the four bases in the gene sequence data of the task with corresponding quaternary values, each occupying two bits, to obtain quaternary sequence data; every two adjacent quaternary values, which represent a dinucleotide in the biological sense, are then converted into a decimal value to obtain decimal sequence data. Since both DNA and RNA contain four bases, the numerical conversion can be performed in base four. Taking RNA as an example, RNA contains the four bases U, C, G, and A, which can be mapped to 0, 1, 2, and 3, with each base receiving a different value. For example, the mapping may be defined as: U maps to 0, C maps to 1, G maps to 2, and A maps to 3. After the quaternary sequence data is obtained, the decimal value of every two adjacent bases is calculated. For example, if two adjacent bases are CA, the quaternary value is 13, so the decimal value is 1 times 4 plus 3, that is, 7. Proceeding in this way through the sequence finally yields the decimal sequence data. Each decimal value needs a 4-bit width to be represented. The table lookup may be implemented using a 4-bit-wide processing element. The technical scheme of the embodiment can at least realize the following beneficial technical effects: this embodiment converts gene sequence data in which bases are expressed as ASCII codes into decimal data, which saves computing resources and improves computing efficiency.
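The preprocessing described above can be sketched as follows, using the example mapping U→0, C→1, G→2, A→3. The function name `encode_sequence` and the `step` parameter are hypothetical. The text does not state whether adjacent pairs overlap, so the sketch assumes overlapping pairs by default (`step=1`) and allows non-overlapping pairs with `step=2`.

```python
# Example mapping from the description: each base becomes a 2-bit quaternary value.
BASE_TO_QUATERNARY = {"U": 0, "C": 1, "G": 2, "A": 3}


def encode_sequence(rna: str, step: int = 1) -> list[int]:
    """Convert an RNA string into decimal values of adjacent base pairs.

    Each pair (a, b) of quaternary values becomes 4 * a + b, which lies in
    the range 0..15 and therefore fits in 4 bits.
    step=1 uses overlapping pairs (assumption); step=2 uses disjoint pairs.
    """
    quaternary = [BASE_TO_QUATERNARY[base] for base in rna.upper()]
    return [4 * quaternary[i] + quaternary[i + 1]
            for i in range(0, len(quaternary) - 1, step)]


print(encode_sequence("CA"))    # [7], matching the CA example in the text
print(encode_sequence("UCGA"))  # [1, 6, 11] with overlapping pairs
```

A 4-bit value per pair replaces two 8-bit ASCII characters, which is where the saving in storage and arithmetic width comes from.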
According to an embodiment of the present invention, the cross-correlation calculation module 162 calculates the cross-correlation vector of each gene sequence with the shortest gene sequence in the task, based on the decimal sequence data produced by the first-stage pipeline. Preferably, the cross-correlation vector of two gene sequences, the x sequence and the y sequence, can be calculated by the following formula:
Figure RE-GDA0002364365140000121
wherein the x-sequence is shorter than the y-sequence; c(n) represents the value of the cross-correlation vector; n represents the dimension index, which starts from 0 and runs to 4, so that the values of the cross-correlation vector in five dimensions are calculated in total; N represents the length of the x-sequence; and m represents the count value, which starts from 0 and runs until N − n is reached.
Preferably, the cross-correlation calculation module 162 calculates the cross-correlation vector of each gene sequence with the shortest gene sequence in the task, based on the decimal sequence data produced by the first-stage pipeline, in the following manner:
aligning the first digit of the shortest decimal gene sequence with a first corresponding digit of the decimal gene sequence to be calculated; in the current alignment state, calculating the sum of the products of each digit of the shortest decimal gene sequence and the aligned digit of the decimal gene sequence to be calculated, and dividing the sum by the length of the shortest gene sequence to obtain a one-dimensional cross-correlation vector of the gene sequence, wherein the first corresponding digit starts at the first digit of the gene sequence data to be calculated and advances by one position after each one-dimensional cross-correlation vector is calculated, until the last digit of the shortest decimal gene sequence is aligned with the last digit of the gene sequence data to be calculated, so that a plurality of one-dimensional cross-correlation vectors are obtained; and taking the minimum of the plurality of one-dimensional cross-correlation vectors as the final result of the one-dimensional cross-correlation vector;
aligning the first digit of the shortest decimal gene sequence, with its last digit ignored, with a second corresponding digit of the decimal gene sequence to be calculated; in the current alignment state, calculating the sum of the products of each remaining digit of the shortest decimal gene sequence and the aligned digit of the decimal gene sequence to be calculated, and dividing the sum by the length of the shortest gene sequence to obtain a two-dimensional cross-correlation vector of the gene sequence, wherein the second corresponding digit starts at the second digit of the gene sequence data to be calculated and advances by one position after each two-dimensional cross-correlation vector is calculated, until the last remaining digit of the shortest decimal gene sequence is aligned with the last digit of the gene sequence data to be calculated, so that a plurality of two-dimensional cross-correlation vectors are obtained; and taking the minimum of the plurality of two-dimensional cross-correlation vectors as the final result of the two-dimensional cross-correlation vector;
aligning the first digit of the shortest decimal gene sequence, with its last two digits ignored, with a third corresponding digit of the decimal gene sequence to be calculated; in the current alignment state, calculating the sum of the products of each remaining digit of the shortest decimal gene sequence and the aligned digit of the decimal gene sequence to be calculated, and dividing the sum by the length of the shortest gene sequence to obtain a three-dimensional cross-correlation vector of the gene sequence, wherein the third corresponding digit starts at the third digit of the gene sequence data to be calculated and advances by one position after each three-dimensional cross-correlation vector is calculated, until the last remaining digit of the shortest decimal gene sequence is aligned with the last digit of the gene sequence data to be calculated, so that a plurality of three-dimensional cross-correlation vectors are obtained; and taking the minimum of the plurality of three-dimensional cross-correlation vectors as the final result of the three-dimensional cross-correlation vector;
aligning the first digit of the shortest decimal gene sequence, with its last three digits ignored, with a fourth corresponding digit of the decimal gene sequence to be calculated; in the current alignment state, calculating the sum of the products of each remaining digit of the shortest decimal gene sequence and the aligned digit of the decimal gene sequence to be calculated, and dividing the sum by the length of the shortest gene sequence to obtain a four-dimensional cross-correlation vector of the gene sequence, wherein the fourth corresponding digit starts at the fourth digit of the gene sequence data to be calculated and advances by one position after each four-dimensional cross-correlation vector is calculated, until the last remaining digit of the shortest decimal gene sequence is aligned with the last digit of the gene sequence data to be calculated, so that a plurality of four-dimensional cross-correlation vectors are obtained; and taking the minimum of the plurality of four-dimensional cross-correlation vectors as the final result of the four-dimensional cross-correlation vector;
aligning the first digit of the shortest decimal gene sequence, with its last four digits ignored, with a fifth corresponding digit of the decimal gene sequence to be calculated; in the current alignment state, calculating the sum of the products of each remaining digit of the shortest decimal gene sequence and the aligned digit of the decimal gene sequence to be calculated, and dividing the sum by the length of the shortest gene sequence to obtain a five-dimensional cross-correlation vector of the gene sequence, wherein the fifth corresponding digit starts at the fifth digit of the gene sequence data to be calculated and advances by one position after each five-dimensional cross-correlation vector is calculated, until the last remaining digit of the shortest decimal gene sequence is aligned with the last digit of the gene sequence data to be calculated, so that a plurality of five-dimensional cross-correlation vectors are obtained; and taking the minimum of the plurality of five-dimensional cross-correlation vectors as the final result of the five-dimensional cross-correlation vector.
It can be seen that the one-dimensional cross-correlation vector is based on the complete shortest gene sequence data: it is calculated once with the first digit of the shortest gene sequence aligned with the first digit of the gene sequence data to be calculated, and then calculated again each time the shortest gene sequence data is shifted back by one digit, until the last digit of the shortest gene sequence data is aligned with the last digit of the gene sequence data to be calculated. Thus, assuming that the gene sequence data to be calculated is k digits longer than the shortest gene sequence, k + 1 values are calculated for each dimension of the cross-correlation vector, and the minimum value is selected as the result for that dimension. Similarly, the two-dimensional, three-dimensional, four-dimensional, and five-dimensional cross-correlation vectors are calculated with the last digit, the last two digits, the last three digits, and the last four digits of the shortest gene sequence data ignored, respectively. Fig. 3 is a schematic diagram of the selection process of the five-dimensional cross-correlation vector in an acceleration device for gene similarity analysis according to an embodiment of the present invention. A plurality of cross-correlation values are calculated in each dimension (see, for example, the one-dimensional cross-correlation vectors in fig. 3), and the smallest value in each dimension is selected as the result of the cross-correlation vector in that dimension.
The technical scheme of the embodiment can at least realize the following beneficial technical effects: in the embodiment, the cross-correlation vectors of five dimensions are calculated to be used for calculating the Euclidean distance, so that the loss of excessive similarity information can be reduced, and the accuracy of gene similarity analysis is improved.
Fig. 4 is a flowchart illustrating the process of calculating the cross-correlation vector of a certain dimension in an acceleration device for gene similarity analysis according to an embodiment of the present invention. Referring to fig. 4, when calculating the cross-correlation vector of a certain dimension, the cross-correlation calculation module 162 may perform the following steps:
step T1, inputting the decimal gene sequence x and the decimal gene sequence y, setting the initial value of the minimum cross-correlation value of this dimension to the maximum value, and going to step T2;
step T2, judging whether the number of right shifts of the x sequence is smaller than the length of the y sequence minus the length of the x sequence; if yes, going to step T3, and if not, going to step T4;
step T3, resetting to zero the sum of the products of the corresponding digits of the x sequence and the y sequence from the previous calculation, and going to step T5;
step T4, returning the minimum cross-correlation value calculated in this dimension as the final result of the cross-correlation vector of this dimension;
step T5, judging whether the current calculation position m is smaller than the length of the x sequence; if yes, going to step T6, and if not, going to step T7;
step T6, calculating the product of the digit of the x sequence at the current calculation position and the corresponding digit of the y sequence, adding the product to the running sum, increasing the current calculation position m of the x sequence by one, and going to step T5;
step T7, setting the calculated cross-correlation value equal to the sum of the products of the aligned digits of the x sequence and the y sequence calculated this time divided by the length of the x sequence, increasing the number of right shifts of the x sequence by one, and going to step T8;
step T8, judging whether the calculated cross-correlation value is smaller than the minimum cross-correlation value of this dimension; if so, going to step T9, and if not, going to step T2;
step T9, setting the minimum cross-correlation value of this dimension equal to the cross-correlation value calculated this time, and going to step T2.
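Steps T1 to T9 can be traced in software as two nested loops. The sketch below is illustrative only: the function name `min_cross_correlation` is hypothetical, and it assumes that x has already been truncated for the dimension in question and that y starts at the first digit to be aligned. Step T2 is worded as a strict comparison, but the sketch uses an inclusive bound so that k + 1 alignments are evaluated when y is k digits longer than x, as stated in the description; this choice is an assumption.

```python
def min_cross_correlation(x, y):
    """Return the minimum cross-correlation value over all alignments of x in y."""
    if not x or len(x) > len(y):
        raise ValueError("x must be non-empty and no longer than y")
    best = float("inf")                   # T1: start from the maximum value
    shift = 0
    while shift <= len(y) - len(x):       # T2 (inclusive bound: assumption)
        total = 0                         # T3: reset the running sum
        m = 0                             # position counter restarts per alignment
        while m < len(x):                 # T5
            total += x[m] * y[shift + m]  # T6: multiply aligned digits, accumulate
            m += 1
        value = total / len(x)            # T7: divide by the length of x
        shift += 1                        # T7: shift x right by one digit
        if value < best:                  # T8
            best = value                  # T9
    return best                           # T4


# x = [1, 2], y = [3, 0, 1]: alignments give 3/2 and 2/2, so the minimum is 1.0
print(min_cross_correlation([1, 2], [3, 0, 1]))
```

There is no data-dependent branch inside the multiply-accumulate loop, which is consistent with the earlier statement that the computation needs no branch prediction.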
For ease of understanding, the process of calculating the five-dimensional cross-correlation vector of the present embodiment is described below by way of a specific example using two schematic gene sequence data. It should be noted that one digit of the decimal sequence is represented by four binary bits. For example, the binary number 0111 represents the decimal number 7 as one digit of the decimal sequence, and the binary number 1111 represents the decimal number 15 as one digit of the decimal sequence. For simplicity, the example uses decimal sequences containing only 0, 1, 2, and 3. The decimal gene sequence data corresponding to the shortest gene sequence is assumed to be 31202303130320320313023. The decimal gene sequence data to be calculated, for which a cross-correlation vector with the shortest gene sequence data is to be calculated, is 032132120231320232321313132. The two sequences differ in length by 4 digits.
Therefore, when the one-dimensional cross-correlation vector is calculated, five one-dimensional cross-correlation vectors are calculated in the following five states, and the minimum of the five is selected as the final result of the one-dimensional cross-correlation vector. In the first state, the shortest gene sequence data is aligned with the first digit of the gene sequence data to be calculated. The aligned digits of the two gene sequence data are multiplied pairwise, the products are summed, and the sum is divided by the length 23 of the shortest gene sequence data to obtain a one-dimensional cross-correlation vector. In this calculation, any digits at the head or tail of the gene sequence data to be calculated that extend beyond the shortest gene sequence data are ignored. The cross-correlation vectors in the states listed below are calculated on the same principle with only the alignment position changed, so the description is not repeated.
31202303130320320313023
032132120231320232321313132
In the second state, the shortest gene sequence data is aligned with the second digit of the gene sequence data to be calculated.
 31202303130320320313023
032132120231320232321313132
In the third state, the shortest gene sequence data is aligned with the third digit of the gene sequence data to be calculated.
  31202303130320320313023
032132120231320232321313132
In the fourth state, the shortest gene sequence data is aligned with the fourth digit of the gene sequence data to be calculated.
   31202303130320320313023
032132120231320232321313132
In the fifth state, the shortest gene sequence data is aligned with the fifth digit of the gene sequence data to be calculated.
    31202303130320320313023
032132120231320232321313132
Similarly, when calculating the two-dimensional cross-correlation vector, the last digit of the shortest gene sequence data is ignored, and five two-dimensional cross-correlation vectors are calculated in the following five states, shifting right by one digit each time. It is understood that once the last digit of the shortest gene sequence data is ignored, the length of the shortest gene sequence data used in the calculation decreases by one. That is, when calculating the cross-correlation vector of a given dimension, the length N of the shortest gene sequence data is based on its current actual length. For example, when the two-dimensional cross-correlation vector is calculated here, the aligned digits of the two gene sequence data are multiplied pairwise, the products are summed, and the sum is divided by the current length 22 of the shortest gene sequence data to obtain a two-dimensional cross-correlation vector.
In the first state of the two-dimensional calculation, the shortest gene sequence data with its last digit omitted is aligned with the second digit of the gene sequence data to be calculated.
 3120230313032032031302
032132120231320232321313132
In the second state of the two-dimensional calculation, the shortest gene sequence data with its last digit omitted is aligned with the third digit of the gene sequence data to be calculated.
  3120230313032032031302
032132120231320232321313132
In the third state of the two-dimensional calculation, the shortest gene sequence data with its last digit omitted is aligned with the fourth digit of the gene sequence data to be calculated.
   3120230313032032031302
032132120231320232321313132
In the fourth state of the two-dimensional calculation, the shortest gene sequence data with its last digit omitted is aligned with the fifth digit of the gene sequence data to be calculated.
    3120230313032032031302
032132120231320232321313132
In the fifth state of the two-dimensional calculation, the shortest gene sequence data with its last digit omitted is aligned with the sixth digit of the gene sequence data to be calculated.
     3120230313032032031302
032132120231320232321313132
Similarly, the three-, four-, and five-dimensional cross-correlation vectors are calculated as described above, ignoring one more digit at the end of the shortest gene sequence data each time. For example, assuming that there are only an x-sequence and a y-sequence, the first cross-correlation vector, obtained from the x-sequence and the shortest gene sequence data (itself), is X = {Cx(0), Cx(1), Cx(2), Cx(3), Cx(4)}, and the second cross-correlation vector, obtained from the y-sequence and the shortest gene sequence data, is Y = {Cy(0), Cy(1), Cy(2), Cy(3), Cy(4)}.
The calculation of the cross-correlation vector is illustrated by another specific example with reference to the figure. Fig. 5 is a schematic diagram illustrating the process of calculating a cross-correlation vector in an acceleration device for gene similarity analysis according to an example of the present invention. Assume that the one-dimensional cross-correlation vector is being calculated and that the shortest gene sequence has eight digits. The eight digits of the shortest gene sequence and the eight aligned digits of the gene sequence to be calculated are taken in order and multiplied; the numerical value in each box represents one digit of the decimal sequence data. For example, 4, 6, 15, and 1 are four of the eight digits of the shortest gene sequence data. The eight pairs of corresponding digits of the two gene sequences are multiplied simultaneously, that is, eight multiplication operations are performed in parallel; the products are then summed, and the sum is divided by the length 8 of the shortest gene sequence to obtain a one-dimensional cross-correlation vector.
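The five-dimensional procedure described above (ignore the last n digits of the shortest sequence, start the alignment at digit n + 1, shift right one digit at a time, keep the minimum) can be sketched as a single function. The names `cross_correlation_vector` and `digits` are hypothetical, and the sketch uses 0-based indexing, so dimension index n = 0 corresponds to the one-dimensional case in the text. The inputs are the 23-digit and 27-digit sequences shown in the alignment rows of the example.

```python
def cross_correlation_vector(shortest, sequence, dims=5):
    """Return [c(0), ..., c(dims-1)] for `sequence` against `shortest`."""
    n_short = len(shortest)
    if dims > n_short or n_short > len(sequence):
        raise ValueError("need dims <= len(shortest) <= len(sequence)")
    vector = []
    for n in range(dims):
        x = shortest[:n_short - n]      # ignore the last n digits
        best = float("inf")
        # Alignment starts at digit n (0-based) and moves right until the
        # last digit of x meets the last digit of the sequence.
        for start in range(n, len(sequence) - len(x) + 1):
            window = sequence[start:start + len(x)]
            value = sum(a * b for a, b in zip(x, window)) / len(x)
            best = min(best, value)     # keep the smallest value per dimension
        vector.append(best)
    return vector


def digits(text):
    """Helper: turn a digit string into a list of integers."""
    return [int(ch) for ch in text]


shortest = digits("31202303130320320313023")
other = digits("032132120231320232321313132")
X = cross_correlation_vector(shortest, shortest)  # shortest against itself
Y = cross_correlation_vector(shortest, other)
print(len(X), len(Y))  # each vector has five components
```

Each dimension evaluates the same number of alignments, namely the length difference plus one, because the truncated sequence gains one position of slack at the tail while its starting position moves one digit to the right.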
According to an embodiment of the present invention, the euclidean distance calculating module 163 calculates euclidean distances between every two cross-correlation vectors based on all the calculated cross-correlation vectors. Wherein, the smaller the Euclidean distance, the higher the similarity of the gene sequence data corresponding to the two cross-correlation vectors.
Preferably, the formula for calculating the euclidean distance between each two cross-correlation vectors is as follows:
$$d(X, Y) = \sqrt{\sum_{i=0}^{4} \bigl(C_x(i) - C_y(i)\bigr)^2}$$
where X and Y refer to two cross-correlation vectors, Cx(i) refers to the elements of the X vector, and Cy(i) refers to the elements of the Y vector; i is the index and takes values from 0 to 4. Assume that there are three sequences, one of which is the shortest; each of the three sequences is cross-correlated with the shortest sequence to obtain three cross-correlation vectors, and Cx(i) and Cy(i) take their values from these three vectors. For example, if there are three sequences x, y, and z, where z is the shortest, cross-correlation vectors are calculated for the three combinations xz, yz, and zz; the Euclidean distance between the x-sequence and the y-sequence is calculated from the two cross-correlation vectors of the combinations xz and yz; the Euclidean distance between the x-sequence and the z-sequence is calculated from those of the combinations xz and zz; and the Euclidean distance between the y-sequence and the z-sequence is calculated from those of the combinations yz and zz.
Preferably, the Euclidean distance calculating module 163 calculates the Euclidean distance between every two cross-correlation vectors, based on all the calculated cross-correlation vectors, by:
calculating, for each dimension, the square of the difference between the corresponding elements of the two cross-correlation vectors;
summing the squared differences over all dimensions of the two cross-correlation vectors to obtain the sum of squares;
taking the square root of the sum of squares to obtain the Euclidean distance between the two cross-correlation vectors; and/or
repeating the calculation for every pair of cross-correlation vectors until the Euclidean distances of all pairs have been calculated.
Fig. 6 is a schematic diagram illustrating the procedure of calculating Euclidean distances in an acceleration apparatus for gene similarity analysis according to an example of the present invention. According to one example of the present invention, the first cross-correlation vector is X = {Cx(0), Cx(1), Cx(2), Cx(3), Cx(4)} and the second cross-correlation vector is Y = {Cy(0), Cy(1), Cy(2), Cy(3), Cy(4)}. Cx(0) to Cx(4) denote the first- to fifth-dimension elements of the first cross-correlation vector, respectively. In step S231, the square of the difference between Cx(0) and Cy(0), the square of the difference between Cx(1) and Cy(1), the square of the difference between Cx(2) and Cy(2), the square of the difference between Cx(3) and Cy(3), and the square of the difference between Cx(4) and Cy(4) are calculated. In step S232, all the squared differences from step S231 are summed. In step S233, the square root of the sum from step S232 is taken to obtain the Euclidean distance between the two cross-correlation vectors. In step S234, the process returns to step S231 and calculates the Euclidean distance for the remaining pairs of cross-correlation vectors, until the Euclidean distances of all pairs of cross-correlation vectors have been calculated.
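Steps S231 to S234 can be sketched in Python as follows. This is a minimal software illustration of the arithmetic only. The function names and the sample five-dimensional vectors are hypothetical and do not come from the patent; the dictionary keys reuse the xz, yz, and zz labels from the earlier example.

```python
import math
from itertools import combinations


def euclidean_distance(x, y):
    """Euclidean distance between two cross-correlation vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of dimensions")
    squared = [(a - b) ** 2 for a, b in zip(x, y)]  # step S231
    total = sum(squared)                             # step S232
    return math.sqrt(total)                          # step S233


def pairwise_distances(vectors):
    """Step S234: repeat the calculation for every pair of vectors."""
    return {
        (p, q): euclidean_distance(vectors[p], vectors[q])
        for p, q in combinations(sorted(vectors), 2)
    }


# Hypothetical cross-correlation vectors, one per combination.
vectors = {
    "xz": [3.0, 1.0, 4.0, 1.0, 5.0],
    "yz": [3.0, 1.0, 1.0, 5.0, 5.0],
    "zz": [0.0, 1.0, 4.0, 1.0, 1.0],
}
distances = pairwise_distances(vectors)
```

Here `distances[("xz", "yz")]` corresponds to the distance between the x sequence and the y sequence; the pair with the smallest value is the most similar.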
According to an example of the present invention, the smaller the Euclidean distance, the higher the similarity of the genes. To illustrate this concretely:
the RNA sequence of the AIMV-3 virus is:
AUGCUCAUGCAAAACUGCAUGAAUGCCCCUAAGGGAUGC
the RNA sequence of the CiLRV-3 virus is:
AUGCCUAUAUUUUCUCUCCUGAGAAAAUAUAGAUGCCUCCAAAGGAGAUGC
the RNA sequence of the APMV-3 virus is as follows:
AAUGCCCACAACGUGAAGUUGUGGAUGCCCCGUUAGGGAAGC
Calculated with the alignment-free algorithm, the Euclidean distance between AIMV-3 and CiLRV-3 is 51.2248, and the Euclidean distance between AIMV-3 and APMV-3 is 27.5925. By Euclidean distance, APMV-3 is therefore much closer to AIMV-3 than CiLRV-3 is, which agrees with the RNA sequences themselves, where APMV-3 is the more similar to AIMV-3.
According to one embodiment of the present invention, there is disclosed an acceleration method for gene similarity analysis, comprising: step S100, receiving tasks to be accelerated that are distributed by a host, wherein each task comprises a plurality of gene sequence data to be subjected to gene similarity analysis; and/or step S200, processing the tasks in a data-driven streaming computing mode, processing the tasks in parallel using a plurality of mutually independent complete pipelines, and accelerating the tasks using a plurality of low-bit-width fixed-point computing components in each pipeline. The accelerated processing performed by each pipeline may include: step S210, preprocessing the gene sequence data in the task to obtain decimal sequence data that represents each gene sequence with fewer digits; step S220, calculating a cross-correlation vector of each gene sequence and the shortest gene sequence in the task based on the decimal sequence data; and/or step S230, calculating the Euclidean distance between every two cross-correlation vectors based on all the cross-correlation vectors. The steps in this embodiment correspond to the functions of the modules in the embodiment of the acceleration apparatus for gene similarity analysis described above. For details not disclosed in this method embodiment, refer to the embodiments of the acceleration apparatus for gene similarity analysis of the present invention.
According to another preferred embodiment of the present invention, it should be noted that, within the scope of the present invention, step S200 of the method can be split into finer steps depending on how the method is understood. Fig. 7 is a schematic flow chart of an acceleration method for gene similarity analysis according to an embodiment of the present invention. Taking Fig. 7 as an example, step S200 of the present invention may include one or more of the following steps:
step S201, reading character string sequences of a plurality of genes from the task cache unit 170;
step S202, converting the gene character string sequence into quaternary gene sequence data;
step S203, converting the quaternary gene sequence data into decimal gene sequence data by converting every two adjacent quaternary gene sequence data into a decimal number;
step S204, calculating a cross-correlation vector representing the correlation of every two decimal gene sequences through a cross-correlation function;
step S205, calculating the Euclidean distance between every two cross-correlation vectors;
and step S206, obtaining a similarity result of the gene sequences, wherein the Euclidean distance between two cross-correlation vectors represents the similarity between two corresponding gene sequences.
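The encoding steps S202 and S203 can be sketched in Python as follows. This is only an illustration under stated assumptions: the mapping of bases to quaternary digits (A=0, C=1, G=2, U or T=3), the use of non-overlapping pairs, and the handling of an unpaired trailing digit are all assumptions that this passage does not specify, and every function name is hypothetical.

```python
# Assumed base-to-digit mapping; this passage does not state one.
BASE_TO_QUATERNARY = {"A": 0, "C": 1, "G": 2, "U": 3, "T": 3}


def to_quaternary(sequence):
    """Step S202: map each base character to one quaternary digit (0..3)."""
    try:
        return [BASE_TO_QUATERNARY[base] for base in sequence.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported base: {err.args[0]!r}") from None


def to_decimal(digits):
    """Step S203: merge each pair of adjacent digits into one value 0..15.

    Assumption: pairs do not overlap, so the data is roughly halved.
    """
    values = [4 * high + low for high, low in zip(digits[0::2], digits[1::2])]
    if len(digits) % 2:
        # Assumption: an unpaired trailing digit is kept unchanged.
        values.append(digits[-1])
    return values


def encode(sequence):
    """Steps S202 and S203 applied to one gene character string."""
    return to_decimal(to_quaternary(sequence))


quaternary = to_quaternary("AUGC")
decimal = encode("AUGC")
```

Under the assumed mapping, two quaternary digits give values from 0 to 15, which is consistent with the element values 4, 6, 15, and 1 that appear in the Fig. 5 example.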
According to one embodiment of the present invention, a computer device for gene similarity analysis is disclosed, comprising a memory; a host; and/or the accelerating device of the previous embodiment. The acceleration device in this embodiment corresponds to the embodiment of the acceleration device for gene similarity analysis described above, and therefore, for details that are not disclosed in the embodiment of the system of the present invention, please refer to the embodiment of the acceleration device for gene similarity analysis described above in the present invention.
It should be noted that, although the steps are described in a specific order, the steps are not necessarily performed in the specific order, and in fact, some of the steps may be performed concurrently or even in a changed order as long as the required functions are achieved.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

1. An acceleration apparatus for gene similarity analysis, comprising:
the high-speed communication interface is used for communicating with the host and receiving tasks to be accelerated and processed distributed by the host;
the sequence cache module is used for caching one or more tasks from the host, and each task comprises a plurality of gene sequence data to be subjected to gene similarity analysis;
the array processor is provided with at least one processing unit for processing tasks, a complete pipeline for processing the tasks based on a data-driven streaming computing mode is arranged in each processing unit, a plurality of fixed-point computing components required for processing the tasks are configured in the pipeline, and the processing unit is configured as a multi-stage pipeline structure comprising a first-stage pipeline, a second-stage pipeline and a third-stage pipeline, wherein the first-stage pipeline is provided with a plurality of sequence preprocessing modules for preprocessing the gene sequence data in the tasks to obtain decimal sequence data that represents each gene sequence with fewer digits; the second-stage pipeline is provided with a plurality of cross-correlation calculation modules for calculating the cross-correlation vector of each gene sequence and the shortest gene sequence in the task based on the decimal sequence data obtained by the first-stage pipeline; and the third-stage pipeline is provided with a plurality of Euclidean distance calculation modules for calculating the Euclidean distance between every two cross-correlation vectors based on all the cross-correlation vectors calculated by the second-stage pipeline;
the control module is configured to be used for distributing the tasks to be processed in the sequence cache module to the processing units;
and the task cache module is provided with a task cache unit and is used for caching the to-be-processed tasks distributed to the processing unit.
2. The accelerating apparatus for gene similarity analysis according to claim 1, wherein said multi-stage pipeline structure is used for parallel task execution, and a low bit width fixed-point computing unit is provided in said multi-stage pipeline structure; the number of the fixed point computing units with low bit width in the multistage pipeline structure is more than that of the fixed point computing units in the processor of the host.
3. The accelerating device for gene similarity analysis according to claim 1,
the array processor comprises a plurality of processing units for processing tasks in parallel, the task cache module comprises a plurality of task cache units, and each processing unit is independently provided with a task cache unit exclusive to the processing unit;
the control module is configured to monitor the task processing status of each processing unit and, when the current task of a processing unit is detected to be about to complete, to allocate a to-be-processed task in the sequence cache module to that processing unit and send it in advance to the task cache unit exclusive to that processing unit.
4. The accelerating device for gene similarity analysis according to claim 3, wherein the control module is configured to monitor the task execution status of each processing unit by analyzing the amount of data remaining to be read in the task cache unit exclusive to each processing unit, wherein when the ratio of the data remaining to be read in a task cache unit to the capacity of that task cache unit falls to a preset threshold, the current task of the processing unit to which that task cache unit belongs is determined to be about to complete.
5. The accelerating device for gene similarity analysis according to claim 4, wherein the preset threshold value ranges from 10% to 30%.
6. The accelerating apparatus for gene similarity analysis according to claim 4, wherein the control module is configured to randomly select one or more to-be-processed tasks from all the to-be-processed tasks remaining in the sequence cache module and allocate them to the corresponding processing unit when the current task of that processing unit is detected to be about to complete.
7. An acceleration apparatus for gene similarity analysis according to any one of claims 3 to 6, characterized in that, the sequence buffer module and/or the control module are connected to the array processor through the network on chip, and a plurality of processing units in the array processor are connected to each other through the network on chip in a Mesh structure.
8. An acceleration apparatus for gene similarity analysis according to any one of claims 1 to 6, characterized in that, the acceleration apparatus further comprises a result buffer module, the result buffer module is connected to the control module and the array processor through the network on chip, and is used for buffering the analysis results of the gene similarity analysis obtained by the processing task of the array processor.
9. The apparatus of claim 8, wherein the sequence buffer module, the task buffer unit and the result buffer module all use program-visible memory, wherein program-visible memory refers to memory that is addressed uniformly with the main memory.
10. An acceleration method based on the acceleration apparatus for gene similarity analysis according to any one of claims 1 to 9, comprising:
S100, receiving tasks to be accelerated and distributed by a host, wherein each task comprises a plurality of gene sequence data to be subjected to gene similarity analysis;
S200, processing tasks based on a data-driven streaming computing mode, adopting a plurality of mutually independent complete pipelines to process the tasks in parallel, and adopting a plurality of fixed point computing components with low bit width to accelerate the tasks in each pipeline, wherein each pipeline comprises the following steps:
S210, preprocessing the gene sequence data in the task to obtain decimal sequence data of each gene sequence represented by fewer digits;
S220, calculating a cross-correlation vector of each gene sequence and the shortest gene sequence in the task based on the decimal sequence data;
and S230, calculating the Euclidean distance between every two cross-correlation vectors based on all the cross-correlation vectors.
11. A computer device for gene similarity analysis, comprising:
a memory;
a host; and
an accelerating device as in any one of claims 1 to 9.
CN201911191604.8A 2019-11-28 2019-11-28 Accelerating device and method for gene similarity analysis and computer equipment Active CN110990063B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911191604.8A CN110990063B (en) 2019-11-28 2019-11-28 Accelerating device and method for gene similarity analysis and computer equipment

Publications (2)

Publication Number Publication Date
CN110990063A CN110990063A (en) 2020-04-10
CN110990063B true CN110990063B (en) 2021-11-23

Family

ID=70088014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911191604.8A Active CN110990063B (en) 2019-11-28 2019-11-28 Accelerating device and method for gene similarity analysis and computer equipment

Country Status (1)

Country Link
CN (1) CN110990063B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113867793A (en) * 2020-06-30 2021-12-31 上海寒武纪信息科技有限公司 Computing device, integrated circuit chip, board card, electronic equipment and computing method
CN113254104B (en) * 2021-06-07 2022-06-21 中科计算技术西部研究院 Accelerator and acceleration method for gene analysis
CN113268270B (en) * 2021-06-07 2022-10-21 中科计算技术西部研究院 Acceleration method, system and device for paired hidden Markov models

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104375807A (en) * 2014-12-09 2015-02-25 中国人民解放军国防科学技术大学 Three-level flow sequence comparison method based on many-core co-processor
CN110427262A (en) * 2019-09-26 2019-11-08 深圳华大基因科技服务有限公司 A kind of gene data analysis method and isomery dispatching platform

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060293859A1 (en) * 2005-04-13 2006-12-28 Venture Gain L.L.C. Analysis of transcriptomic data using similarity based modeling
CN104735003B (en) * 2013-12-24 2019-05-31 锐迪科(重庆)微电子科技有限公司 Euclidean distance calculation method, module and multiple-input and multiple-output code translator
CN105631239B (en) * 2014-10-30 2018-08-17 国际商业机器公司 Method and apparatus for managing gene order
CN104375963B (en) * 2014-11-28 2019-03-15 上海兆芯集成电路有限公司 Control system and method based on buffer consistency
CN104951673B (en) * 2015-06-19 2018-03-30 中国科学院计算技术研究所 A kind of genome restriction enzyme mapping joining method and system
RU2750706C2 (en) * 2016-06-07 2021-07-01 Иллюмина, Инк. Bioinformatic systems, devices and methods for performing secondary and/or tertiary processing
CN107273209B (en) * 2017-06-09 2020-11-03 北京工业大学 Hadoop task scheduling method based on minimum spanning tree clustering improved genetic algorithm
CN109698010A (en) * 2017-10-23 2019-04-30 北京哲源科技有限责任公司 A kind of processing method for gene data
US11244761B2 (en) * 2017-11-17 2022-02-08 Accenture Global Solutions Limited Accelerated clinical biomarker prediction (ACBP) platform
CN109785905B (en) * 2018-12-18 2021-07-23 中国科学院计算技术研究所 Accelerating device for gene comparison algorithm
CN110188129A (en) * 2019-05-31 2019-08-30 北京旷视科技有限公司 Data processing method, device, system, equipment and the medium of testimony of a witness verification terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FPGA implementation of K-means algorithm for bioinformatics application: An accelerated approach to clustering Microarray data;Hanaa M. Hussain .etc;《2011 NASA/ESA Conference on Adaptive Hardware and Systems (AHS)》;20110729;1-8 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant