CN114490023A - High-energy physical calculable storage device based on ARM and FPGA - Google Patents


Info

Publication number
CN114490023A
Authority
CN
China
Prior art keywords
fpga
module
data
chip
hard disk
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111564111.1A
Other languages
Chinese (zh)
Other versions
CN114490023B (en)
Inventor
程耀东
程垚松
毕玉江
李海波
高宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of High Energy Physics of CAS
Original Assignee
Institute of High Energy Physics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of High Energy Physics of CAS filed Critical Institute of High Energy Physics of CAS
Priority to CN202111564111.1A priority Critical patent/CN114490023B/en
Publication of CN114490023A publication Critical patent/CN114490023A/en
Application granted granted Critical
Publication of CN114490023B publication Critical patent/CN114490023B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/42Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4282Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a high-energy physics computable storage device based on ARM and FPGA, comprising a main control module, an expansion module and a hard disk access module. An ARM chip is integrated in the main control module, and an FPGA chip and a PCIe-to-SATA III interface conversion module are integrated in the expansion module; the FPGA chip is connected to the ARM chip through PCIe. The main control module uses the ARM chip to pre-analyze incoming data and to control FPGA function calls online. The expansion module uses the FPGA chip to hardware-accelerate the configured algorithms: when the received data is a target algorithm, the FPGA circuit is partially reconfigured so that the FPGA chip runs that algorithm; when the received data is user data to be processed, the FPGA processes the data and sends the result to the hard disk access module.

Description

High-energy physics computable storage device based on ARM and FPGA
Technical Field
The invention belongs to the technical field of computable storage and ARM-based storage, and relates to a high-energy physics computable storage device based on ARM and FPGA (field-programmable gate array) and an application method thereof.
Background
With the development of high-energy physics experimental facilities and detector technology, experimental data volumes have grown sharply from the TB scale to the PB and even EB scale. In terms of task execution, high-energy physics computing is typical I/O (input/output) intensive computing: it involves massive data input and output, and for most of the runtime the CPU waits on reads and writes to the hard disk or memory, so CPU load remains low. In the distributed computing systems commonly used today, data storage nodes and computing nodes generally do not reside in the same device, so data must be moved frequently between hard disk and memory and between computing and storage nodes. The computing power, power consumption and time cost of this movement have become limiting factors for high-energy physics data processing when exploring more advanced algorithms.
In terms of task scheduling, in a high-energy physics computing environment a user usually submits tasks from a login node, a job scheduling system such as HTCondor or Slurm dispatches the jobs to computing nodes, and the computing nodes read data over the network from a network storage system such as Lustre or EOS for analysis. As computing node density increases, the number of CPU cores per node grows and the disk arrays store ever more data, so network bandwidth and I/O pressure become increasingly severe.
In terms of data storage, high-energy physics experimental data is usually stored in network storage systems as compressed files to save space, so decompression or compression operations are required in high-energy physics data processing flows. Traditional CPU-based compression on computing nodes competes with the data analysis programs for CPU resources, reducing computational efficiency.
In addition, the storage hardware used in high-energy physics experiments is usually a dual-CPU server based on the x86 architecture, connected to a disk array through an FC-HBA or SAS-HBA card and equipped with a 10G, 25G or 40G high-speed network to provide services. Such highly configured servers cannot provide other services while performing storage service, so hardware resources are largely wasted; they also have high power consumption and low density, which conflicts with the current national goals of energy conservation and emission reduction. The problems of power consumption and utilization are particularly evident.
Disclosure of Invention
Aiming at the problems of frequent data movement, heavy network transmission pressure, low storage density, high power consumption and single-purpose hardware in I/O-intensive data analysis and processing in the high-energy physics field, the invention provides an ARM-based high-energy physics computable storage device and an application method. The device makes full use of SoC (System on Chip) capabilities to realize a programmable, computable storage device with high density, high performance and low power consumption. With the accompanying application method, it can greatly reduce the computing power, power consumption and time cost of high-energy physics data analysis, greatly relieve the pressure on the network, I/O and computing-node CPUs, and decouple a number of fixed steps of the automated experiment workflow into the storage node, giving the device a wide application range and strong extensibility.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a high-energy physics computable storage device based on ARM and FPGA is characterized by comprising a main control module, an expansion module and a hard disk access module; an ARM chip is integrated in the main control module; the FPGA chip of the expansion module is connected to the ARM chip through a PCIe bus, and the expansion module is connected to the hard disk access module through a PCIe-to-SATA III interface conversion module;
the main control module is configured to pre-analyze incoming data using the ARM chip and to control FPGA function calls online; when data is transferred into the high-energy physics computable storage device, the ARM chip first analyzes, according to the file name or file-extension attribute of the incoming data, whether a target algorithm is to be called; after the call of the target algorithm is confirmed, the ARM chip sends the pre-stored target algorithm to the FPGA chip of the expansion module, otherwise it sends the incoming data to the expansion module;
the PCIe and SATAIII interface conversion module is used for converting data input by the FPGA chip and outputting the converted data to the hard disk expansion module, and converting data input by the hard disk expansion module and outputting the converted data to the FPGA chip;
the expansion module is configured to use the FPGA chip to hardware-accelerate configured algorithms; when the data received from the main control module is a target algorithm, the FPGA circuit is partially reconfigured by means of a soft reset so that the FPGA chip runs the target algorithm; when the data received from the main control module is data to be processed for a user, the FPGA processes the user data and sends the result to the hard disk access module for storage, or sends it to the main control module to be forwarded to the client for display;
the hard disk access module comprises a plurality of hard disk slots and is configured to connect to inserted hard disks and to store the data sent by the expansion module on the connected hard disks.
Further, the main control module receives and parses a user instruction sent by an application. If the instruction calls the ARM chip CPU or the system-on-chip (SoC), the algorithm library in the main control module is traversed, the instruction is sent to the accelerator driver subsystem in the CPU through the unified driver interface, the BIOS subsystem is called to initialize the on-chip acceleration subsystem, and acceleration of the application is then completed through register operations. If the instruction calls the FPGA chip for hardware-accelerated computation, the FPGA algorithm circuit library stored in the hard disk access module is traversed first, the target algorithm circuit is then written into the FPGA chip on the expansion module, and the FPGA circuit is partially reconfigured by means of a soft reset so that the FPGA chip runs the target algorithm circuit.
Further, the expansion module first checks the data sent by the main control module; if it is algorithm-circuit data, a dedicated PCIe channel is allocated to it and it is loaded into the working area; if it is data to be stored, the data flows into the FPGA through the other PCIe bus for processing and then flows out to the hard disk access module.
Further, the main control module also comprises a memory, a flash memory, an SD card and an Ethernet card.
Further, the expansion module is connected with the hard disk access module through an SFF-8643 interface.
Further, the high-energy physics computable storage device includes dual redundant power supplies.
A large-scale storage device is characterized by comprising a plurality of nodes deployed in a distributed manner, the nodes being the above high-energy physics computable storage devices based on ARM and FPGA, with a distributed storage system installed on each node.
The ARM- and FPGA-based high-energy physics computable storage device of the invention is 1U high (1U = 4.445 cm) and comprises a main control module, an expansion module and a hard disk access module. The main control module integrates one ARM CPU, two DDR4 memory modules (each supporting more than 32 GB), more than 8 MB of NOR Flash, a micro SD card of more than 32 GB, two 10-gigabit Ethernet cards, and so on; the CPU includes a 100 Gbit/s compression/decompression engine supporting the ZLIB, DEFLATE and GZIP data formats. The expansion module comprises one FPGA chip and a PCIe-to-SATA III interface conversion module, with the FPGA chip connected to the ARM through a PCIe bus. The hard disk access module comprises twelve 3.5-inch SATA 3.0 hard disk slots. Each computable storage device supports dual redundant power supplies.
Specifically, the SATA hard disk slots may be fully or partially populated, with either SSDs or ordinary mechanical hard disks. Assuming a capacity of 14 TB per hard disk, a single 1U node supports 168 TB of storage space, and single-node read/write bandwidth reaches 6 GB/s.
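As a quick illustration of the capacity figure, a minimal arithmetic sketch (the slot count and per-disk capacity are simply the numbers quoted above):

```python
# Quick check of the capacity figure quoted above for a single 1U node.
# The 12 slots and the 14 TB per-disk assumption are the numbers given in the text.
DISK_SLOTS = 12
DISK_CAPACITY_TB = 14

total_capacity_tb = DISK_SLOTS * DISK_CAPACITY_TB
print(f"Single 1U node raw capacity: {total_capacity_tb} TB")  # prints 168 TB
```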
The device can be used as an independent storage node, or multiple storage nodes can be combined through an Ethernet switch into a large storage device. When the storage nodes are used in combination, a distributed storage system such as EOS, Gluster, Ceph or HDFS must be installed on them to form a unified storage pool.
The invention also discloses an application method for the ARM- and FPGA-based high-energy physics computable storage device, which comprises the following parts:
a hardware computing capability calling method, an FPGA partial reconfiguration method, and a computing task scheduling method.
The hardware computing capability calling method covers calls to both the CPU/SoC and the FPGA hardware computing capability. A user instruction involving a hardware call is issued from an application; the CPU on the main control module determines from the parameter contained in the instruction (the algorithm name) whether to call the CPU/SoC acceleration algorithm on the main control module or the FPGA acceleration algorithm on the expansion module, that is, it identifies the hardware acceleration mode. If the CPU/SoC is called, the algorithm library stored on the hard disks is loaded into the memory of the main control module and traversed, the instruction is then sent to the on-chip acceleration subsystem in the CPU through the unified driver interface, and acceleration of the application is completed through register operations on the main control module. If the FPGA is called for hardware-accelerated computation, the FPGA algorithm library developed by the invention and stored on the hard disks is first searched by the algorithm name in the instruction, and the target algorithm is loaded into the Flash of the expansion module; the library includes circuits for general algorithms such as compression, decompression and erasure coding, and for special algorithms such as decode and machine-learning models. The algorithm circuit in Flash is then written into the FPGA on the expansion module through an internal bus in the form of a bitstream, and the internal circuit of the FPGA is replaced using the FPGA partial reconfiguration method. To distinguish algorithm-circuit data from the user's data to be processed, the invention allocates a dedicated PCIe channel to the algorithm-circuit bitstream and verifies the type of the input data; the user's data to be processed flows into the FPGA through the other PCIe bus as a data stream, and the result of the algorithm is, according to the definition in the algorithm, sent to the main control module and forwarded to the client for display, or sent to the hard disk access module for storage, without staying in the FPGA.
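A minimal Python sketch of this dispatch flow, with stand-in functions for the driver interface, the circuit library and the PCIe transfer; all names below are illustrative assumptions, not the device's actual API:

```python
# Hypothetical sketch of the hardware computing capability calling method described
# above. Function names, algorithm names and the library contents are assumptions.

CPU_SOC_ALGORITHMS = {"zlib", "deflate", "gzip"}        # formats the CPU/SoC engine accelerates
FPGA_ALGORITHM_LIBRARY = {                              # stand-in for the circuit library on disk
    "erasure_code": b"<bitstream: erasure coding circuit>",
    "decode":       b"<bitstream: detector data decode circuit>",
}

def cpu_soc_accelerate(algorithm: str, payload: bytes) -> bytes:
    """Stand-in for handing the request to the on-chip acceleration subsystem."""
    return payload  # a real implementation would go through the driver/register interface

def fpga_partial_reconfigure(bitstream: bytes) -> None:
    """Stand-in for writing the circuit bitstream over the dedicated PCIe channel."""
    print(f"loading {len(bitstream)} byte circuit into the FPGA working area")

def fpga_process(payload: bytes) -> bytes:
    """Stand-in for streaming user data through the reconfigured FPGA."""
    return payload

def call_hardware(algorithm: str, payload: bytes) -> bytes:
    """Dispatch an acceleration request to the CPU/SoC or to the FPGA by algorithm name."""
    if algorithm in CPU_SOC_ALGORITHMS:
        return cpu_soc_accelerate(algorithm, payload)
    bitstream = FPGA_ALGORITHM_LIBRARY[algorithm]       # search the library by algorithm name
    fpga_partial_reconfigure(bitstream)                 # replace the FPGA's internal circuit
    return fpga_process(payload)                        # results do not stay in the FPGA

print(len(call_hardware("erasure_code", b"raw event data")))
```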
The FPGA partial reconfiguration method replaces the algorithm executed inside the FPGA by means of a soft reset, according to application requirements. When the device is powered on, the FPGA first loads a pre-programmed working circuit so that it appears as a hardware device in the operating system and can be called conveniently. When a system or user instruction calls an FPGA function, the CPU first obtains the target algorithm circuit from Flash and sends it to the FPGA over the PCIe bus as a bitstream; the FPGA verifies the algorithm data stream and then loads it into the working area, preserving the hardware-device definition circuit and overwriting the previous algorithm circuit, thereby switching the FPGA function.
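A hedged sketch of this power-on and reconfiguration sequence, modelling the device-definition circuit and the working area as plain Python attributes; the class, field and check names are assumptions made for illustration:

```python
# Illustrative model of the FPGA partial reconfiguration flow described above.
class FakeFpga:
    def __init__(self) -> None:
        # Pre-programmed working circuit loaded at power-on, so the FPGA shows up
        # as a hardware device in the operating system.
        self.device_definition = "pcie-endpoint-circuit"
        self.working_area = "default-working-circuit"

    def verify(self, bitstream: bytes) -> bool:
        """Stand-in verification of the incoming algorithm data stream."""
        return bitstream.startswith(b"ALG")

    def partial_reconfigure(self, bitstream: bytes) -> None:
        """Overwrite only the working area; keep the device-definition circuit."""
        if not self.verify(bitstream):
            raise ValueError("rejected bitstream: not an algorithm circuit")
        self.working_area = bitstream.decode()

fpga = FakeFpga()
fpga.partial_reconfigure(b"ALG:compression-circuit")  # soft-reset style function switch
print(fpga.device_definition, "|", fpga.working_area)
```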
The computing task scheduling method is implemented by developing additional functional plug-ins on top of the deployed distributed storage system (for example Lustre/EOS). The invention develops a dedicated plug-in for the file system deployed on this hardware, compatible with the standard hardware access protocol, for offloading computing tasks from a local or network host to the computable storage device. When a user issues a storage-system instruction, the plug-in checks, before the instruction takes effect, whether it contains the special marker that calls the computing resources of the computable storage device. If so, it first loads the user-specified computing method onto the CPU/SoC or the FPGA of the same device through the hardware computing capability calling method; if the FPGA is involved, the plug-in triggers partial reconfiguration. Once the hardware algorithm is loaded, the target file stored on the computable storage device is sent to the FPGA or CPU/SoC of the same device to complete the required computing task, and the result is finally returned to the user's client. For example, when a file on the computable storage device is opened in a Linux system, appending the special marker "&css_alg" and an algorithm parameter to the end of the instruction calls the hardware functions of the device. Specifically, opening the runid.txt file in this way uses the CPU/SoC or FPGA hardware to accelerate the sorting, and the result is then sent to the client for the user to use.
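A minimal sketch of the plug-in check. Only the "&css_alg" marker comes from the description above; the textual instruction form and helper behaviour are assumptions for illustration:

```python
# Hypothetical sketch of the scheduling plug-in that intercepts storage instructions.
CSS_MARKER = "&css_alg"

def handle_storage_instruction(instruction: str) -> str:
    """Inspect a storage-system instruction before it takes effect."""
    if CSS_MARKER not in instruction:
        return f"pass through to the storage system: {instruction}"
    # Split off the marker and the user-specified algorithm parameter.
    plain_command, _, algorithm = instruction.partition(CSS_MARKER)
    algorithm = algorithm.strip() or "sort"
    # Offload: load the algorithm near the data, run it there, return the result.
    return (f"offload '{algorithm}' for '{plain_command.strip()}' "
            f"to the CPU/SoC or FPGA of the storage node")

print(handle_storage_instruction("open runid.txt &css_alg sort"))
print(handle_storage_instruction("open runid.txt"))
```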
Compared with the prior art, the invention has the following advantages:
1) Flexible application mode
The device can be used as an ordinary storage node, or computable-storage software can be installed to realize an integrated software-and-hardware storage device. When used as a computable storage device, it can provide external data storage service through interfaces such as the storage system's POSIX (Portable Operating System Interface) interface; at the same time, computing tasks can be supported by adding special parameter options to the POSIX interface, calling the hardware capabilities of the FPGA and the CPU SoC on the storage node to realize near-storage data processing and avoid data movement.
2) Hardware functions of ARM, FPGA and the like on storage nodes are fully utilized
ARM is a widely used reduced instruction set computer (RISC) architecture, and ARM-based CPUs are common in smartphones, notebook computers, tablets, embedded systems and the like. Compared with Intel's x86 architecture, it is characterized by small size, low power consumption and low cost, which gives ARM an advantage in high-performance computing applications where power cost matters. An FPGA is built from low-level hardware resources such as logic units, RAM and adders; these allow developers to organize the hardware resources into circuits such as multipliers, registers and address generators. An FPGA can be reprogrammed indefinitely, loading a new design takes only hundreds of milliseconds, and reconfiguration reduces hardware overhead. The invention makes full use of hardware features such as the ARM CPU's SoC functions and the FPGA on the storage node, offloads data-intensive computing tasks to the storage node for "near-storage" execution, supports functions such as hardware-level data compression, erasure coding and image differential storage, and realizes a programmable, computable storage device;
3) High storage density
The invention further extends the capability of the ARM CPU so that it can connect twelve 3.5-inch SATA hard disks, doubling the density compared with a traditional 2U 12-disk storage server;
4) Low whole-system power consumption, green and energy-saving
The core devices adopted by the invention, the ARM and FPGA chips, are low-power parts among products of the same functional class. Built on ARM CPU and FPGA computing components, the whole system has low power consumption and is green and energy-saving;
5) Solves the problem of resource waste in practical applications
Modern server chips usually have many processing cores with high single-core and aggregate computing power. As a storage node of a distributed storage system, the storage process (for example the FST process of the EOS distributed file system) needs only limited CPU processing capability, so most of the cores' computing capability goes unused and resources are wasted. Based on the design concept of computable storage, the invention targets high-energy physics mass storage systems and uses the storage node's idle CPU resources and the more efficient FPGA computing resources to realize certain data processing functions, effectively improving the utilization of storage nodes and addressing the problems of high power consumption, low density and idle CPU resources of storage servers in traditional high-energy physics computing environments.
6) Strong expandability
The programmable, computable storage device and its application method are not limited to high-energy physics but are general-purpose, and the computing functions are not limited to algorithms such as data compression, data redundancy and verification. Dynamically loadable plug-ins can be written according to user requirements and added to an existing storage system to call different computing modules, giving the device good expandability.
Drawings
FIG. 1 is a block diagram of the overall structure of the ARM- and FPGA-based high-energy physics computable storage device.
FIG. 2 is a block diagram of the main control module of the ARM- and FPGA-based high-energy physics computable storage device.
FIG. 3 is a block diagram of the expansion module of the ARM- and FPGA-based high-energy physics computable storage device.
FIG. 4 is a block diagram of the hard disk access module of the ARM- and FPGA-based high-energy physics computable storage device.
FIG. 5 is a framework diagram of the application method of the ARM- and FPGA-based high-energy physics computable storage device.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples.
The invention provides an ARM- and FPGA-based high-energy physics computable storage device and an application method thereof. Within a 1U-high chassis, the computable storage device mainly comprises a main control module, an expansion module and a hard disk access module. The main control module is connected to the expansion module through a PCIe (high-speed serial bus) connector, and the expansion module is connected to the hard disk access module through an SFF-8643 (Mini-SAS HD) interface. The application method based on this computable storage device realizes its functions through calling interfaces, the hardware design, and the developed application and algorithm libraries.
The general structure block diagram of the high-energy physical computable storage device based on ARM and FPGA is shown in FIG. 1.
Each module is described in detail below.
As shown in fig. 2, the main control module mainly includes: an ARM CPU; two DDR4 (fourth-generation) main memory slots, specifically 288-pin vertical memory slots with a 72-bit data bus supporting ECC (error-correcting code); NOR Flash with a capacity above 8 MB, which stores the system boot code and is accessed through a QSPI (quad serial peripheral interface) bus; a pluggable SD (secure digital) card of at least 32 GB; 2 USB (universal serial bus) interfaces; 2 ten-gigabit Ethernet cards; 2 gigabit Ethernet cards; and JTAG (Joint Test Action Group) debugging support, which improves production-line debugging efficiency.
The main functions of the main control module cover the overall management and control logic of the system: power management of the whole system; generation and regulation of the main control system's secondary power rails (such as the 1.0 V VCC core, 1.8 V SPI VCC, 1.2 V DDR4 interface VCC, 0.6 V DDR4 VTT source, etc.); the power-on reset circuit (POR), which controls the power-on sequencing of the CPU and peripheral devices; and per-unit reset logic, which supports resetting a single peripheral or the whole system individually. Compared with a traditional storage device, its distinctive feature is that the ARM chip pre-analyzes incoming data and controls FPGA function calls online. When data enters the device, the ARM first analyzes, according to the file name or file-extension attribute, whether an FPGA function (compression, sorting, etc.) is to be called; after the target algorithm call is confirmed, the ARM sends the algorithm bitstream stored in Flash in advance to the FPGA of the expansion module over the high-speed serial bus.
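A minimal sketch of this pre-analysis step, assuming a hypothetical mapping from file extensions to FPGA functions; the mapping and helper name are illustrative and not specified in the patent:

```python
# Illustrative sketch of the ARM-side pre-analysis: mapping the file name or
# extension of incoming data to an FPGA function. The table below is an assumption.
import os
from typing import Optional

EXTENSION_TO_FPGA_FUNCTION = {
    ".raw":  "decode",         # e.g. detector raw data -> decode circuit
    ".root": "compression",    # e.g. analysis files -> compression circuit
    ".gz":   "decompression",
}

def preanalyze(file_name: str) -> Optional[str]:
    """Return the FPGA function to call for this file, or None for plain storage."""
    _, extension = os.path.splitext(file_name)
    return EXTENSION_TO_FPGA_FUNCTION.get(extension)

for name in ("run123.raw", "histos.root", "notes.txt"):
    function = preanalyze(name)
    if function is None:
        print(f"{name}: forward the data to the expansion module unchanged")
    else:
        print(f"{name}: send the '{function}' bitstream from Flash to the FPGA first")
```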
As shown in fig. 3, the expansion module mainly includes one FPGA chip, the PCIe-to-SATA III interface conversion module, and an SFF-8643 interface. The function of the PCIe-to-SATA III interface conversion module is to convert two PCIe 3.0 high-speed serial links: each group of PCIe 3.0 x2 interfaces is converted into 5 SATA III interfaces, and four SATA III interfaces are expanded into 12 SATA III interfaces through multi-port expansion. After data flows in over the PCIe bus, it is converted by this module, flows out through the SATA III interfaces and is finally interconnected with the hard disk expansion card through the SFF-8643 connector. The expansion module also integrates an FPGA chip, which is connected directly to the ARM CPU of the main control module through PCIe. Online debugging of the full FPGA logic is performed over the JTAG interface using a logic emulator; in practical applications, the FPGA's logic circuits are realized with the on-board devices.
The expansion module is the main part that distinguishes this device from other traditional storage servers. The invention uses the characteristics of the FPGA chip to provide computing capability beyond the CPU. One application mode is to use the FPGA chip to hardware-accelerate common or special algorithms; FPGA circuits have been developed for general algorithms such as compression, decompression and erasure coding, and for special algorithms such as decode and machine-learning models. In addition, the FPGA is reconfigured online on the expansion module: it can be reprogrammed indefinitely without affecting the use of the system, and can quickly be reset into a new device. Specifically, after receiving an instruction from the main control module, the expansion module looks up the target algorithm on the hard disks according to the instruction content, and the target algorithm is sent to the FPGA over the PCIe bus as a bitstream. The FPGA judges from the leading check bits of the bitstream whether it is an algorithm configuration stream or a data stream to be processed: an algorithm configuration stream is loaded through the dedicated PCIe channel into the designated area of the FPGA, and the FPGA circuit is partially reconfigured by a soft reset, replacing part of the original circuit and thereby realizing online programming of the FPGA and switching of its function algorithms. The result of an algorithm is, according to the definition in the algorithm, sent to the main control module and forwarded to the client for display, or sent to the hard disk access module for storage; it does not stay in the FPGA.
As shown in fig. 4, the hard disk access module provides access for twelve 3.5-inch hard disks, which are interconnected with the hard disk access board through standard 22-pin SATA connectors. The 12 hard disks are arranged in 3 rows of 4; the spacing between adjacent hard disks is 4 mm and the spacing between rows is 22 mm. All hard disks support hot plugging, and the state of each hard disk is displayed on the panel through a light pipe.
As shown in FIG. 5, the technical architecture of the application method based on the computable storage device is as follows: the bottom layer is the computable storage device hardware, which provides the application resource foundation; the middle layer is the application method of the computable storage device; and the top layer is the application calling mode. The application method of the computable storage device comprises the hardware computing capability calling method, the FPGA partial reconfiguration method and the computing task scheduling method.
The hardware computing capability calling method is implemented through a hardware circuit control scheme. A user instruction involving a hardware call is issued from an application or the storage service; the CPU parses the metadata to obtain the computing task and identifies the hardware acceleration mode. If the CPU/SoC is called, the developed algorithm library is traversed first, the instruction is sent to the accelerator driver subsystem through the unified driver interface, the BIOS subsystem is called to initialize the on-chip acceleration subsystem, and acceleration of the application is then completed through register operations. If the FPGA is called for hardware-accelerated computation, the FPGA algorithm circuit library developed by the invention and stored in Flash is used; this library includes circuits for general algorithms such as compression, decompression and erasure coding and for special algorithms such as decode and machine-learning models. The algorithm circuit is written into the FPGA on the expansion module through an internal bus as a bitstream, and the internal circuit is replaced using the FPGA partial reconfiguration method. To distinguish algorithm-circuit data from user storage data, the invention allocates a dedicated PCIe channel to the algorithm-circuit bitstream and checks the type of the input data; user storage data flows into the FPGA through the other PCIe bus as a data stream for processing and then flows out to the storage device, without staying in the FPGA.
The FPGA partial reconfiguration method is implemented through instruction triggering and hardware circuit control. When the device is powered on, the FPGA loads a pre-programmed working circuit; by developing a dedicated driver module, the invention makes it appear as a hardware device in the operating system for convenient calling. When a system or user instruction triggers an FPGA function call, the CPU first obtains the target algorithm circuit from Flash and sends it to the FPGA over the PCIe bus as a bitstream; the FPGA verifies the algorithm data stream, then loads it into the working area, preserving the hardware-device definition circuit and overwriting the previous algorithm circuit, thereby switching the FPGA function.
The computing task scheduling method is implemented through program and software development, including development of the dedicated plug-in, porting and development of hardware acceleration algorithms, and FPGA-based circuit design and development of general and special algorithms. The invention develops a dedicated plug-in for the file system deployed on this hardware, compatible with the standard hardware access protocol, for offloading computing tasks from a local or network host to the computable storage device. When a user issues a storage-system instruction, the plug-in checks, before the instruction takes effect, whether it contains the special marker that calls the computing resources of the computable storage device. If so, it first loads the user-specified computing method onto the CPU/SoC or the FPGA of the same device through the hardware computing capability calling method; if the FPGA is involved, the plug-in triggers partial reconfiguration. Once the hardware algorithm is loaded, the target file stored on the computable storage device is sent to the FPGA or CPU/SoC of the same device to complete the required computing task, and the result is finally returned to the user's client.
The present invention is not limited to the embodiments described above; those skilled in the art can make various modifications to them, and such modifications remain within the scope of the invention as long as they do not depart from its spirit and intent.

Claims (7)

1. A high-energy physics computable storage device based on ARM and FPGA, characterized by comprising a main control module, an expansion module and a hard disk access module; an ARM chip is integrated in the main control module; the FPGA chip of the expansion module is connected to the ARM chip through a PCIe bus, and the expansion module is connected to the hard disk access module through a PCIe-to-SATA III interface conversion module;
the main control module is configured to pre-analyze incoming data using the ARM chip and to control FPGA function calls online; when data is transferred into the high-energy physics computable storage device, the ARM chip first analyzes, according to the file name or file-extension attribute of the incoming data, whether a target algorithm is to be called; after the call of the target algorithm is confirmed, the ARM chip sends the pre-stored target algorithm to the FPGA chip of the expansion module, otherwise it sends the incoming data to the expansion module;
the PCIe and SATAIII interface conversion module is used for converting data input by the FPGA chip and outputting the converted data to the hard disk expansion module, and converting data input by the hard disk expansion module and outputting the converted data to the FPGA chip;
the expansion module is configured to use the FPGA chip to hardware-accelerate configured algorithms; when the data received from the main control module is a target algorithm, the FPGA circuit is partially reconfigured by means of a soft reset so that the FPGA chip runs the target algorithm; when the data received from the main control module is data to be processed for a user, the FPGA processes the user data and sends the result to the hard disk access module for storage, or sends it to the main control module to be forwarded to the client for display;
the hard disk access module comprises a plurality of hard disk slots and is configured to connect to inserted hard disks and to store the data sent by the expansion module on the connected hard disks.
2. The high-energy physics computable storage device of claim 1, wherein the main control module receives and parses a user instruction sent by an application; if the instruction calls the ARM chip CPU or the system-on-chip (SoC), the algorithm library in the main control module is traversed, the instruction is sent to the accelerator driver subsystem in the CPU through the unified driver interface, the BIOS subsystem is called to initialize the on-chip acceleration subsystem, and acceleration of the application is then completed through register operations; if the instruction calls the FPGA chip for hardware-accelerated computation, the FPGA algorithm circuit library stored in the hard disk access module is traversed first, the target algorithm circuit is then written into the FPGA chip on the expansion module, and the FPGA circuit is partially reconfigured by means of a soft reset so that the FPGA chip runs the target algorithm circuit.
3. The high-energy physics computable storage device of claim 1 or claim 2, wherein the expansion module first checks the data sent by the main control module; if the data is algorithm-circuit data, a dedicated PCIe channel is allocated to it and it is loaded into the working area; if the data is storage data, it flows into the FPGA through the other PCIe bus for processing and then flows out to the hard disk access module.
4. The high-energy physics computable storage device of claim 1, wherein the main control module further comprises a memory, a flash memory, an SD card, and an Ethernet card.
5. The high-energy physics computable storage device of claim 1, wherein the expansion module is connected to the hard disk access module through an SFF-8643 interface.
6. The high-energy physics computable storage device of claim 1, wherein the device comprises dual redundant power supplies.
7. A mass storage device comprising a plurality of nodes deployed in a distributed manner, the nodes being the ARM- and FPGA-based high-energy physics computable storage devices as recited in claim 1; and a distributed storage system is installed on each node.
CN202111564111.1A 2021-12-20 2021-12-20 ARM and FPGA-based high-energy physical computable storage device Active CN114490023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111564111.1A CN114490023B (en) 2021-12-20 2021-12-20 ARM and FPGA-based high-energy physical computable storage device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111564111.1A CN114490023B (en) 2021-12-20 2021-12-20 ARM and FPGA-based high-energy physical computable storage device

Publications (2)

Publication Number Publication Date
CN114490023A true CN114490023A (en) 2022-05-13
CN114490023B CN114490023B (en) 2024-05-07

Family

ID=81494128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111564111.1A Active CN114490023B (en) 2021-12-20 2021-12-20 ARM and FPGA-based high-energy physical computable storage device

Country Status (1)

Country Link
CN (1) CN114490023B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710596A (en) * 2018-05-10 2018-10-26 中国人民解放军空军工程大学 It is a kind of to assist the desktop of processing card is super to calculate hardware platform based on DSP and FPGA more
WO2020113966A1 (en) * 2018-12-03 2020-06-11 山东浪潮人工智能研究院有限公司 High-performance fusion server architecture

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
杨亚涛; 张松涛; 李子臣; 张明舵; 曹广灿: "Design and Implementation of a PCIe High-Speed Data Interface Based on the Zynq Platform", Journal of University of Electronic Science and Technology of China, no. 03, 30 May 2017 (2017-05-30) *
程旭; 陆俊林; 易江芳; 刘姝: "Design of the PKUnity-SK*** Chip for UMPC", Chinese Journal of Computers, no. 11, 15 November 2008 (2008-11-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11822797B1 (en) 2022-07-27 2023-11-21 Beijing Superstring Academy Of Memory Technology Object computational storage system, data processing method, client and storage medium
WO2024021453A1 (en) * 2022-07-27 2024-02-01 北京超弦存储器研究院 Object computing and storage system, data processing method, and client and storage medium

Also Published As

Publication number Publication date
CN114490023B (en) 2024-05-07

Similar Documents

Publication Publication Date Title
US10877766B2 (en) Embedded scheduling of hardware resources for hardware acceleration
US9164853B2 (en) Multi-core re-initialization failure control system
US9389904B2 (en) Apparatus, system and method for heterogeneous data sharing
CN112035381B (en) Storage system and storage data processing method
CN114490023B (en) ARM and FPGA-based high-energy physical computable storage device
CN105930186B (en) The method for loading software of multi -CPU and software loading apparatus based on multi -CPU
CN114296638B (en) Storage and calculation integrated solid state disk controller and related device and method
CN112181293B (en) Solid state disk controller, solid state disk, storage system and data processing method
JP2001051959A (en) Interconnected process node capable of being constituted as at least one numa(non-uniform memory access) data processing system
CN115033188B (en) Storage hardware acceleration module system based on ZNS solid state disk
Tseng et al. Gullfoss: Accelerating and simplifying data movement among heterogeneous computing and storage resources
US11029847B2 (en) Method and system for shared direct access storage
CN113487006B (en) Portable artificial intelligence auxiliary computing equipment
US20200233588A1 (en) Efficient lightweight storage nodes
CN113656076A (en) BIOS starting method and device based on hardware multiplexing channel
CN116541317A (en) Computable memory, computable memory system, and data processing method
Liu et al. Hippogriff: Efficiently moving data in heterogeneous computing systems
CN106951268A (en) A kind of Shen prestige platform supports the implementation method of NVMe hard disk startups
CN100492299C (en) Embedded software developing method and system
CN111651382A (en) Parallelization storage implementation method of data acquisition system based on local bus
KR102457183B1 (en) Multi-core simulation system and method based on shared translation block cache
CN117852600B (en) Artificial intelligence chip, method of operating the same, and machine-readable storage medium
US20230134506A1 (en) System and method for managing vm images for high-performance virtual desktop services
Liu Rethinking the Programming Interface in Future Heterogeneous Computers
CN118193425A (en) CXL memory device, computing system and data processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant