WO2024082985A1 - Offload card with an accelerator installed - Google Patents

Offload card with an accelerator installed

Info

Publication number: WO2024082985A1
Authority: WO, WIPO (PCT)
Prior art keywords: data, accelerator, processed, cpu, data processing
Prior art date:
Application number: PCT/CN2023/123519
Other languages: English (en), French (fr)
Inventor: 张争宪
Original Assignee: 杭州阿里云飞天信息技术有限公司
Priority date: (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date:
Publication date:
Application filed by 杭州阿里云飞天信息技术有限公司
Publication of WO2024082985A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • the embodiments of the present specification relate to the field of data processing technology, and in particular to an offload card equipped with an accelerator.
  • in cloud computing scenarios, CPUs produced by various manufacturers are used, and these CPUs may be equipped with different types of accelerators or with none at all, resulting in uneven CPU performance and making it difficult to regulate CPU resources.
  • an embodiment of the present specification provides an offload card with an accelerator installed.
  • One or more embodiments of the present specification also relate to a data processing method, a data processing device, a data processing system, a computer-readable storage medium and a computer program to solve the technical defects existing in the prior art.
  • an offload card with an accelerator installed therein, wherein:
  • the offloading card is configured to receive a data processing request, wherein the data processing request carries data to be processed;
  • the data to be processed is sent to the accelerator, and the data processing result obtained by the accelerator is fed back.
  • a data processing method which is applied to an offload card equipped with an accelerator, and the method includes:
  • the data to be processed is sent to the accelerator, and the data processing result obtained by the accelerator is fed back.
  • a data processing system comprising a CPU, a memory, and an offload card with an accelerator installed, wherein:
  • the offload card is configured to receive a data processing request, wherein the data processing request carries data to be processed, and, when it is determined that the data type of the data to be processed meets the offload card processing condition, to process the data to be processed and feed back the obtained data processing result to the memory; or
  • the data to be processed is sent to the accelerator, and the data processing result obtained by the accelerator is fed back to the memory;
  • the CPU is configured to obtain the data processing result from the memory.
  • a data processing device which is applied to an offload card equipped with an accelerator, and the device includes:
  • a receiving module is configured to receive a data processing request, wherein the data processing request carries data to be processed
  • a first processing module is configured to process the data to be processed and feed back the obtained data processing result when it is determined that the data type of the data to be processed meets the offload card processing condition;
  • the second processing module is configured to send the data to be processed to the accelerator and feed back the data processing result obtained by the accelerator when it is determined that the data type of the data to be processed meets the accelerator processing condition.
  • a computer-readable storage medium which stores computer-executable instructions, and when the instructions are executed by a processor, the steps of the data processing method applied to the above-mentioned offload card are implemented.
  • a computer program is provided, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the above-mentioned data processing method applied to the above-mentioned offload card.
  • An embodiment of the present specification provides an offload card with an accelerator installed, wherein the offload card is configured to receive a data processing request, wherein the data processing request carries data to be processed; when it is determined that the data type of the data to be processed meets the processing conditions of the offload card, the data to be processed is processed and the obtained data processing result is fed back; or when it is determined that the data type of the data to be processed meets the processing conditions of the accelerator, the data to be processed is sent to the accelerator and the data processing result obtained by the accelerator is fed back.
  • by installing the accelerator on the offload card, the offload card with an accelerator installed provided in this specification avoids the uneven CPU performance and the difficulty of regulating CPU resources caused by CPUs being equipped with different types of accelerators or with none.
  • moreover, when the offload card determines that the data type of the data to be processed carried in the data processing request meets the accelerator processing conditions, the data to be processed is sent to the accelerator and the data processing result obtained by the accelerator is fed back, thereby improving CPU performance and reducing CPU load.
  • FIG. 1 is a structural diagram of a CPU solution with an accelerator provided by an embodiment of the present specification;
  • FIG. 2 is an application diagram of a CPU solution with an accelerator provided by an embodiment of the present specification;
  • FIG. 3 is a structural diagram of an offload card + CPU solution provided by an embodiment of this specification;
  • FIG. 4 is an application diagram of an offload card + CPU solution provided by an embodiment of this specification;
  • FIG. 5 is a schematic diagram of an application of an offload card equipped with an accelerator provided by an embodiment of the present specification;
  • FIG. 6 is a schematic diagram of an application of an offload card provided by an embodiment of this specification;
  • FIG. 7 is a schematic diagram of the interaction between an offload card equipped with an accelerator and a CPU provided by an embodiment of the present specification;
  • FIG. 8 is a schematic diagram of an application of an offload card equipped with an accelerator provided by an embodiment of the present specification;
  • FIG. 9 is a flow chart of a data processing method provided by an embodiment of the present specification;
  • FIG. 10 is a schematic diagram of the structure of a data processing device provided by an embodiment of this specification;
  • FIG. 11 is a schematic diagram of the structure of a data processing system provided by an embodiment of this specification.
  • although the terms first, second, etc. may be used to describe various information in one or more embodiments of this specification, this information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
  • the first may also be referred to as the second, and similarly, the second may also be referred to as the first.
  • word "if” as used herein may be interpreted as "at the time of” or "when” or "in response to determining”.
  • Offload card: a chip. This chip is a dedicated processor designed for cloud data centers, used specifically to connect server hardware with virtualized resources on the cloud; it can replace the CPU as the control and acceleration center of cloud computing.
  • the offload card can be a chip, such as a cloud native chip, or a processor.
  • Cloud native chip: from the perspective of computing chips, cloud computing has created new application scenarios and thus new requirements for CPUs. A cloud native chip is a dedicated chip used in cloud computing scenarios in place of the CPU; cloud native chips include offload cards.
  • Accelerator: a hardware accelerator for a specific application, used to replace inefficient software execution on the CPU, greatly improving the performance of that application while freeing CPU computing power for other general workloads.
  • the accelerator includes but is not limited to AMX, AI accelerators, ML engines, HPC accelerators, security coprocessors, GPUs and other accelerators.
  • AI accelerators are a class of specialized hardware accelerators or computer systems designed to accelerate artificial intelligence applications, especially artificial neural networks, machine vision, and machine learning.
  • ML engine: a Machine Learning (ML) engine.
  • AI: Artificial Intelligence.
  • AMX: Advanced Matrix Extensions, a matrix operation programming framework designed to accelerate machine learning workloads.
  • HPC accelerator: High Performance Computing accelerator; generally refers to an accelerator used to process data at high speed and perform complex calculations.
  • Security coprocessor: an independent hardware module, separate from the CPU core, that handles security key management, key generation, encryption, decryption, and similar tasks.
  • GPU: graphics processing unit.
  • CPU die: the core of the CPU and its most important component.
  • Offload card: in cloud scenarios, in order to improve the processing speed of input/output (I/O) services, operators can offload some I/O services from the server to low-cost heterogeneous hardware, which frees the server's central processing unit (CPU) resources and improves the CPU's operating efficiency.
  • The heterogeneous hardware used to offload I/O data is usually called an offload card; an offload card can be a single Peripheral Component Interconnect Express (PCIe) card, which establishes a PCIe channel with the server, and the PCIe channel is mainly used for communication of I/O services.
  • IDC: Internet Data Center; a facility with complete equipment (including high-speed Internet access bandwidth, a high-performance local area network, and a safe and reliable machine room environment), professional management, and a complete application service platform.
  • CPU socket: processor socket.
  • RAM: Random Access Memory; also known as memory or main memory.
  • DMA: Direct Memory Access.
  • IaaS: Infrastructure as a Service; a service model that provides IT infrastructure as a service over the Internet and charges users based on their actual usage or occupation of resources.
  • RoCE: RDMA over Converged Ethernet, a network protocol that allows applications to perform remote memory access over Ethernet. RoCE currently has two protocol versions, v1 and v2: RoCE v1 is a link layer protocol that allows any two hosts in the same broadcast domain to access each other directly, while RoCE v2 is an Internet layer protocol that can implement routing.
  • InfiniBand (IB): literally "infinite bandwidth" technology; a computer network communication standard for high-performance computing with extremely high throughput and extremely low latency, used for data interconnection between computers. InfiniBand is also used as a direct or switched interconnect between servers and storage systems, as well as between storage systems.
  • CPU: central processing unit.
  • Moore's Law: constrained by Moore's Law, there are currently two routes for CPU development: one is to integrate hardware accelerators inside the CPU die to keep improving single-core performance; the other is to increase core density, with single-core performance improving slowly.
  • cloud vendors will use CPUs from multiple vendors at the same time, and these CPUs have different acceleration capabilities, for example, AI engines are only available on CPUs from specific vendors, not on other CPU platforms. Therefore, after building a CPU resource pool based on CPUs with different acceleration capabilities, this heterogeneous CPU resource pool is not very friendly to cloud native services, making it difficult to regulate CPU resources.
  • the current offload card (which can be a chip or a processor) only supports general network traffic offloading, storage traffic offloading, and general encryption and decryption capabilities. Its acceleration capability for AI (Artificial Intelligence), HPC (High Performance Computing), and ML (Machine Learning) is limited, and such workloads still need to rely on CPU computing power and on accelerators installed on the CPU.
  • AI: Artificial Intelligence.
  • HPC: High Performance Computing.
  • ML: Machine Learning.
  • this specification considers four solutions to the above problems, of which the first is a CPU solution with an accelerator.
  • since the current CPU is in fact a product of the development of offline IDCs, and an offline IDC generally serves a single customer without diverse CPU requirements, a single CPU model can be selected to support data processing; for vertical services such as AI, ML, and HPC, using a CPU with a hardware accelerator is then the preferred option.
  • FIG. 1 is a schematic diagram of the structure of a CPU solution with an accelerator provided by an embodiment of this specification, wherein accelerator 1 and accelerator 2 are installed on the CPU, and the CPU is connected to the offload card via PCIe.
  • the offload card includes an offload card control panel for controlling various operations of the offload card, and the offload card also includes a hardware forwarding module for forwarding data to the CPU.
  • the CPU also includes cores (processor cores), an L3 cache (level-3 cache), and an IMC (the CPU's integrated memory controller).
  • the CPU is connected to the RAM.
  • FIG. 2 is an application diagram of a CPU solution with an accelerator provided by an embodiment of the present specification, wherein the CPU Die is installed in a CPU socket (processor slot), and an ML engine (an accelerator) is installed on the CPU.
  • the ML engine is connected to the memory, and the offload card is connected to the memory; based on this, the data obtained by the ML engine in server A will be stored in the memory connected thereto, and the data in the memory will be transmitted to the offload card of server B through the offload card.
  • after receiving the data, the offload card of server B stores the data in its memory so that the ML engine in server B can obtain the data sent by server A from that memory.
  • in this solution, traffic data is sent and received through the network card and then DMA'd directly to system memory, after which the hardware accelerator (the ML engine) in the CPU can process the data directly, freeing CPU computing power.
  • the second solution is: CPU + heterogeneous chip solution.
  • the advantage of this solution is that it has excellent performance and is suitable for complex heavy-load vertical scenario services; however, the disadvantage is that the cost of implementing this solution is high, which is not friendly to light-load vertical services.
  • the third solution is: an offload card + CPU solution.
  • since the offload card only has ordinary I/O traffic offloading capability and general encryption/decryption and compression/decompression functions, and does not have the hardware accelerators required by vertical scenarios, this type of data can only be processed in software on the CPU, with extremely low efficiency and performance.
  • the fourth solution is: a CPU solution without hardware accelerator support.
  • This solution requires the use of an external PCIe accelerator card such as a GPU. After the network card sends and receives packets, it directly DMAs the data to the system memory, and then the CPU moves the data from the system memory to the GPU memory through PCIe for processing.
  • FIG 3 is a schematic diagram of the structure of an offload card + CPU solution provided by an embodiment of this specification, wherein the CPU is not equipped with an accelerator, and the CPU and the offload card are connected via PCIe.
  • for the CPU and the offload card, please refer to the corresponding explanation of Figure 1; no further details are given here.
  • FIG 4 is an application diagram of an offload card + CPU solution provided by an embodiment of this specification.
  • the GPU in server A will store the data in the GPU memory during data processing; the CPU will move the data from the GPU memory to the system memory through PCIe, and send the data to server B through the network card (i.e., the offload card); after receiving the data packet, the network card (offload card) of server B directly DMAs the data to the system memory, and then the CPU moves the data from the system memory to the GPU memory through PCIe for processing.
  • an offload card with an accelerator installed is provided.
  • This specification also involves a data processing method, a data processing device, a data processing system, a computer-readable storage medium and a computer program, which are described in detail one by one in the following embodiments.
  • FIG. 5 shows an application schematic diagram of an offload card equipped with an accelerator according to an embodiment of the present specification, wherein the offload card is configured to receive a data processing request that carries data to be processed; when it is determined that the data type of the data to be processed meets the offload card processing conditions, the offload card processes the data and feeds back the obtained data processing result; or, when it is determined that the data type meets the accelerator processing conditions, the offload card sends the data to the accelerator and feeds back the data processing result obtained by the accelerator.
  • the offload card with an accelerator installed provided in this specification can be used in all IaaS computing products in the cloud computing field, including but not limited to: ECS (cloud server, full name Elastic Compute Service), containers, serverless (serverless computing architecture), microservices, etc.
  • the data processing request can be understood as a request that needs to be processed by the offload card.
  • the data processing request can be an AI calculation request, an image rendering request, a machine learning request, an I/O traffic offload request, or a general encryption and decryption request, etc.
  • the offload card provided in this specification can be a network card for performing packet sending and receiving operations. Based on this, the offload card can receive data processing requests.
  • the data to be processed can be understood as data that needs to be processed.
  • for example, when the data processing request is an image rendering request, the data to be processed can be an image to be rendered; when the data processing request is an I/O traffic offloading request, the data to be processed can be the I/O traffic data that needs to be offloaded.
  • the data processing result can be understood as the processing result obtained after the accelerator or the offload card processes the data to be processed.
  • the data processing result can be a rendered image, that is, an image rendering result.
  • the data type may be understood as data that uniquely identifies a type of data to be processed. For example, when the data to be processed is a picture to be rendered, the data type is a picture type.
  • An accelerator can be understood as a hardware device that reduces the CPU's computational load and accelerates computation on its behalf, including but not limited to any two of the following accelerator types: artificial intelligence accelerators, machine learning accelerators, graphics processing accelerators, data security accelerators, and computing accelerators.
  • the artificial intelligence accelerator refers to a special hardware accelerator or computer system designed to accelerate the application of artificial intelligence, for example, an AI accelerator.
  • a machine learning accelerator refers to an accelerator used to accelerate machine learning workloads or improve their processing efficiency, for example, an ML engine or AMX.
  • a graphics processing accelerator refers to a microprocessor that specializes in image and graphics-related computing work, for example, the graphics processing accelerator can be a graphics processing unit (GPU).
  • a data security accelerator refers to a device that handles tasks such as security key management, key generation, encryption and decryption, for example, a security coprocessor.
  • a computing accelerator refers to an accelerator that processes data at high speed and performs complex calculations, for example, an HPC accelerator.
  • the data types processed by the accelerator include artificial intelligence type, machine learning type, graphics type, data security type, and data computing type.
  • the artificial intelligence type can be understood as the data type corresponding to data that supports artificial intelligence applications.
  • the graphics type can be understood as the types of various graphics and images, such as jpg, png, etc.
  • the machine learning type can be understood as the training data set type, machine learning model type, etc. in the machine learning field.
  • the data computing type can be understood as the data set type that requires a large amount of data computing in the data computing field.
  • the data security type can be understood as the type of data that needs to be encrypted, decrypted, and so on.
  • the data type meeting the offload card processing condition can be understood as meaning that the data type of the data to be processed matches a data type that the offload card itself can process.
  • the offload card may be configured with a data type determination strategy for the data to be processed; the strategy is used to determine the corresponding data type for the data carried in a data processing request.
  • in practice, the offload card can receive various types of data processing requests, for example image processing requests and machine learning requests, and both kinds of requests may carry image data; in this case, deciding how to route the data to be processed becomes a problem that needs to be solved.
  • to solve it, the offload card can be preconfigured with an association relationship between data processing requests and data types, and the association relationship can be stored in a table; for example, there is an association between the image rendering request and the graphics type.
  • when the offload card receives an image rendering request, it can determine from this association that the data type of the image to be rendered carried in the request is the graphics type, and it subsequently sends the image to the graphics processor based on that type rather than to another accelerator. In other words, the corresponding data type can be determined for the data to be processed from the data processing request, or equivalently from the request type of the data processing request, as the sketch below illustrates.
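  • As a concrete illustration, the following is a minimal sketch of such an association table, assuming hypothetical request names and type labels; it only mirrors the request-type-to-data-type mapping described above and is not the patent's implementation:

```python
# Minimal sketch of the request-type -> data-type association table described
# above; all names are hypothetical illustrations, not from the patent.
REQUEST_TYPE_TO_DATA_TYPE = {
    "image_rendering": "graphics",
    "machine_learning": "machine_learning",
    "ai_inference": "artificial_intelligence",
    "io_offload": "io_traffic",
    "encryption": "data_security",
}

def data_type_of(request_type: str) -> str:
    """Look up the data type of the carried payload from the request type."""
    return REQUEST_TYPE_TO_DATA_TYPE.get(request_type, "unknown")
```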
  • the offload card with an accelerator installed provided by this specification can be configured on a server and can receive a data processing request, which may be sent by another server and which carries data to be processed. After receiving the request, the offload card determines whether to process the data itself or to have the accelerator installed on it do so. Based on this, when the offload card determines that the data type of the data to be processed meets the offload card processing conditions, it processes the data through its own processor and storage-medium hardware modules and feeds back the obtained data processing result.
  • although an accelerator is installed on the offload card, the offload card itself can still provide I/O traffic offloading, general encryption/decryption, and compression/decompression. Based on this, after receiving a data processing request carrying data to be processed, the offload card determines the data type of that data; when the type matches a type that the offload card processes itself, it determines that the data needs to be processed locally, processes it, and feeds the obtained data processing result back to the CPU, thereby reducing the CPU's processing load and improving its performance.
  • otherwise, the data to be processed is sent to the accelerator, which processes it; the offload card then obtains the accelerator's data processing result and feeds it back.
  • the data type meeting the accelerator processing conditions can be understood as meaning that the data type of the data to be processed is consistent with a data type processed by the accelerator.
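  • Combining the offload card condition, the accelerator condition, and the CPU fallback discussed later, the routing can be sketched as follows; the type sets and handler stubs are hypothetical illustrations (data_type_of is the lookup sketched above):

```python
# Hypothetical three-way dispatch on the offload card; the stubs stand in for
# the card's own engines, the installed accelerator, and the host CPU path.
OFFLOAD_CARD_TYPES = {"io_traffic", "data_security"}
ACCELERATOR_TYPES = {"graphics", "machine_learning", "artificial_intelligence"}

def process_on_card(payload: bytes) -> bytes:
    return payload  # stub: card's own I/O offloading / crypto engines

def send_to_accelerator(dtype: str, payload: bytes) -> bytes:
    return payload  # stub: hand the data to the installed accelerator

def forward_to_cpu(payload: bytes) -> bytes:
    return payload  # stub: let the host CPU process the data

def dispatch(request_type: str, payload: bytes) -> bytes:
    dtype = data_type_of(request_type)  # association table sketched earlier
    if dtype in OFFLOAD_CARD_TYPES:     # offload card processing condition
        return process_on_card(payload)
    if dtype in ACCELERATOR_TYPES:      # accelerator processing condition
        return send_to_accelerator(dtype, payload)
    return forward_to_cpu(payload)      # CPU processing condition (fallback)
```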
  • the offload card itself has IO traffic offload capability and general encryption and decryption, compression and decompression functions, which are implemented by hardware modules such as the processor and storage medium of the offload card itself.
  • the offload card with an accelerator installed provided in this specification has the accelerator installed on the offload card, so that the offload card not only has I/O traffic offloading capability but also gains a number of other capabilities, which are realized through the accelerator.
  • for example, if the accelerator is an artificial intelligence accelerator, the offload card can provide artificial intelligence acceleration based on it; if the accelerator is a graphics processing unit (GPU), the offload card equipped with the GPU can perform graphics and image processing based on it.
  • when the offload card receives an image rendering request carrying an image to be rendered, it determines from the data type of that image (i.e., the image type) that the image needs to be processed by the graphics processor installed on the offload card. The offload card therefore sends the image to the graphics processor, obtains the image rendering result produced by the graphics processor, and feeds that result back.
  • similarly, when the data to be processed is I/O traffic data, the offload card determines from its data type (i.e., the I/O traffic type) that it should process the data itself, and does not send it to the accelerator.
  • the offload card then feeds back the data processing result of the I/O traffic data.
  • the architecture diagram of the offload card in the cloud computing scenario can be seen in Figure 6, which is an application schematic diagram of an offload card provided by an embodiment of this specification.
  • the offload card can be connected to a network card, a storage medium, a heterogeneous chip, a CPU and a GPU.
  • the offload card can perform network acceleration for the network card by connecting to it, wherein the network card can be an RDMA network card; the offload card can perform storage acceleration for the storage medium by connecting to it, wherein the storage medium can be an SSD.
  • the offload card can perform computing acceleration on the computing device by connecting to computing devices such as heterogeneous chips, CPUs and GPUs.
  • server hardware can be better utilized and more virtualized resources can be obtained.
  • the operating system to which the offload card is connected can more efficiently complete the work of virtualized resource orchestration and scheduling; at the hardware level, the offload card can quickly manage the physical devices in the data center, and accelerate the network and storage hardware, avoid the waste of CPU computing power, and enhance network and storage performance.
  • the offload card can be a chip, such as a cloud native chip; it can also be a processor designed specifically for a cloud data center.
  • when accelerators such as an AMX accelerator, AI accelerator, ML engine, HPC accelerator, security coprocessor, or GPU are installed on the offload card, the target accelerator for processing a data packet is selected from among them; the data is processed only by that target accelerator on the network card (offload card), and the final processing result is returned to system memory for the CPU to do the final processing.
  • the offload card is provided with a processor and a memory, and the processor and the memory are used to realize the capabilities of the offload card itself, such as I/O traffic offloading capability, general encryption and decryption, compression and decompression functions.
  • the processor can also send the data to be processed to the accelerator or CPU when the data to be processed is data that needs to be processed by the accelerator or CPU; the processor can also obtain the data processing results of the accelerator.
  • a control unit can be installed in the offload card; the control unit determines whether the data to be processed carried in a data processing request needs to be handled by the offload card, the accelerator, or the CPU, and sends the data to the accelerator on the offload card or to the CPU for processing accordingly.
  • At least two accelerators can be installed on the offload card provided in this specification, and the data types processed by the accelerators are the same.
  • for example, at least two graphics processors may be installed on the offload card, each able to process image and graphics data, so that the accelerators can process the data to be processed, reducing the CPU's processing load and improving its performance.
  • the offload card is also configured to send the data to be processed to a target accelerator and to feed back the data processing result obtained by the target accelerator, wherein the target accelerator is one of the at least two accelerators, and the data type processed by the target accelerator is the same as the data type of the data to be processed.
  • accelerators include but are not limited to any two types of accelerators such as artificial intelligence accelerators, machine learning accelerators, graphics processing accelerators, data security accelerators, and computing accelerators.
  • Alternatively, at least two accelerators can be installed on the offload card provided in this specification, but the data types processed by the accelerators may differ.
  • for example, a graphics processor and an artificial intelligence accelerator may be installed on the offload card.
  • in this case, the offload card needs to determine the corresponding target accelerator based on the data type of the data to be processed.
  • the target accelerator is one of the at least two accelerators, and the data type processed by the target accelerator is consistent with the data type of the data to be processed, so that the accelerator processes the data, reducing the CPU's processing load and improving its performance.
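  • A minimal sketch of this target accelerator selection, assuming two hypothetical installed accelerators keyed by the data type each one processes:

```python
# Hypothetical mapping from processed data type to installed accelerator
# handle; here two accelerators are installed, as in the example above.
ACCELERATORS_BY_TYPE = {
    "graphics": "gpu0",                # graphics processor
    "artificial_intelligence": "ai0",  # artificial intelligence accelerator
}

def select_target_accelerator(dtype: str):
    """Return the accelerator whose processed data type matches dtype,
    or None when no installed accelerator handles this type."""
    return ACCELERATORS_BY_TYPE.get(dtype)
```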
  • the offloading card with the accelerator installed is further configured to determine a data storage unit corresponding to the target accelerator, wherein the data storage unit stores a data processing result obtained by the target accelerator, and the data processing result is obtained by the target accelerator processing the data to be processed;
  • the data processing result is obtained from the data storage unit, and the data processing result is fed back.
  • the data storage unit may be understood as corresponding to the target accelerator and used to store the data required by the target accelerator during data processing, as well as the data processing result of the target accelerator.
  • for example, when the target accelerator is a GPU, the data storage unit may be understood as the GPU memory.
  • after the offload card sends the image to be rendered in an image rendering request to the GPU, it can obtain the rendered image from the GPU memory corresponding to that GPU and feed the rendered image back to the CPU for subsequent processing.
  • the target accelerator installed on the offload card can share the computing pressure of the CPU and avoid the problem of scheduling difficulties caused by uneven acceleration capabilities of the CPU.
  • the offload card can store the data to be rendered in the GPU memory, and instruct the GPU to obtain the data to be rendered from the GPU memory for rendering.
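  • The round trip through the target accelerator's data storage unit can be sketched as follows; the Accelerator class is a hypothetical stand-in for a GPU and its memory, not the patent's implementation:

```python
class Accelerator:
    """Hypothetical accelerator with an attached data storage unit
    (for a GPU, the storage unit corresponds to GPU memory)."""
    def __init__(self):
        self._storage = b""
    def write_input(self, payload: bytes) -> None:
        self._storage = payload              # offload card stages the input
    def run(self) -> None:
        self._storage = self._storage[::-1]  # stand-in for real processing
    def read_result(self) -> bytes:
        return self._storage                 # result left in the storage unit

def run_on_accelerator(accel: Accelerator, payload: bytes) -> bytes:
    accel.write_input(payload)  # e.g. store the data to be rendered in GPU memory
    accel.run()                 # instruct the accelerator to process it
    return accel.read_result()  # offload card fetches the result to feed back
```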
  • when the offload card equipped with the accelerator communicates with the CPU, the offload card is further configured to feed back the data processing result to the CPU.
  • during data processing, the data is handled only by the offload card or the accelerator, and the final processing result is returned to the CPU for final processing, thereby reducing the CPU's computational load.
  • any accelerator installed on the CPU itself is left unused, in a disabled state, which keeps CPU performance uniform in cloud computing scenarios and avoids the scheduling difficulties caused by uneven CPU acceleration capabilities.
  • the offload card equipped with an accelerator is further configured to determine a memory corresponding to the CPU and store the data processing result in the memory so that the CPU obtains the data processing result from the memory.
  • the offload card processes the data in the data processing process only through the offload card or accelerator, and returns the final processing result to the system memory.
  • the CPU can obtain the data processed by the offload card or accelerator from the system memory and perform final processing on it, reducing the computing pressure of the CPU.
  • the offload card with an accelerator installed therein, wherein:
  • the offload card is further configured to determine a memory corresponding to the CPU, store the data processing result in the memory, and send storage information of the data processing result in the memory to the CPU, so that the CPU obtains the data processing result from the memory based on the storage information; or
  • the offload card is further configured to determine the memory corresponding to the CPU, and store the data processing result in a preset storage area in the memory, so that the CPU obtains the data processing result from the preset storage area in the memory.
  • the storage information can be understood as the storage location of the data processing results in the memory;
  • the preset storage area can be understood as a pre-set area in the memory, which is dedicated to storing the data processing results provided by the offload card to the CPU; the CPU can check the area periodically and obtain the newly written data processing results from the area.
  • the offload card needs to determine the memory corresponding to the CPU, store the data processing results in the memory, and send the storage information of the data processing results in the memory to the CPU; after receiving the storage information, the CPU can obtain the data processing results from the memory based on the storage information and perform subsequent processing.
  • alternatively, the offload card determines the memory corresponding to the CPU and the preset storage area in that memory used for data transmission with the CPU, and stores the data processing results in that preset storage area; the CPU can then obtain the data processing results from the preset storage area and perform subsequent processing.
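  • The two feedback variants can be sketched as follows, with a dict standing in for host memory and all names hypothetical:

```python
# Hypothetical sketches of the two result-feedback mechanisms above.
RESULT_AREA = 0x10000000  # preset storage area agreed with the CPU in advance

def feed_back_with_storage_info(memory: dict, notify_cpu, result: bytes) -> None:
    addr = max(memory, default=RESULT_AREA) + 1  # stand-in for an allocation
    memory[addr] = result                        # store result in CPU's memory
    notify_cpu(addr, len(result))                # send the storage information

def feed_back_to_preset_area(memory: dict, result: bytes) -> None:
    memory[RESULT_AREA] = result  # CPU periodically checks this preset area
```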
  • the offload card reduces the processing pressure of the CPU by providing the data processing results to the CPU.
  • when the offload card equipped with the accelerator communicates with the CPU, the offload card is further configured to send the data to be processed to the CPU when it determines that the data type of the data to be processed meets the CPU processing condition.
  • the data type meeting the CPU processing condition can be understood as the data type being consistent with a data type processed by the CPU, or as the data type differing from both the data types processed by the offload card and those processed by the accelerator.
  • the offload card and the accelerator installed on it share the CPU's processing load: image processing, I/O traffic offloading, data encryption/decryption, and other functions originally performed by the CPU are realized by the offload card and its accelerator, so that the CPU can process more important requests, such as user requests and web (World Wide Web) requests.
  • when, during packet reception, the offload card determines that the data to be processed needs to be processed by the CPU, it sends the data to the CPU, thereby ensuring the CPU's smooth operation.
  • the offload card having the accelerator installed therein is further configured to determine the data type of the data to be processed, and the data type processed by the at least two accelerators;
  • An accelerator among the at least two accelerators that processes the data type of the data to be processed is determined as a target accelerator.
  • after receiving a data processing request carrying data to be processed, the offload card determines the data type of that data as well as the data type processed by each of the at least two accelerators, and matches the two.
  • when a data type processed by one of the at least two accelerators matches the data type of the data to be processed, it is determined that the data type meets the accelerator processing conditions and that the data needs to be sent to an accelerator for processing.
  • the accelerator that processes the data type of the data to be processed is determined from the at least two accelerators, and is determined as the target accelerator. This achieves accurate allocation of the data to be processed to the corresponding accelerator for processing, thereby improving data processing efficiency.
  • by installing at least two accelerators on the offload card, the offload card with an accelerator installed avoids the uneven CPU performance and difficulty in regulating CPU resources caused by CPUs being equipped with different types of accelerators or with none; and when the offload card determines that the data type of the data to be processed carried in the data processing request meets the accelerator processing conditions, it sends the data to the target accelerator and feeds back the data processing result obtained by the target accelerator, thereby improving CPU performance and reducing CPU load.
  • Figure 7 shows a schematic diagram of the interaction between an offload card equipped with an accelerator and a CPU according to an embodiment of the present specification.
  • Figure 8 shows an application schematic diagram of an offload card equipped with an accelerator according to an embodiment of the present specification.
  • for the explanation of the CPU and the offload card in Figure 7, refer to the corresponding content in the explanation of Figure 1 above.
  • the offload card equipped with an accelerator provided in this specification can sink accelerators such as AMX, AI accelerator, ML engine, HPC accelerator, security coprocessor, GPU, etc. to the offload card.
  • server A communicates with server B through their offload cards: the offload card of server A and the offload card of server B can communicate with each other over RoCE or InfiniBand. Based on this, when the offload card sends and receives packets, the data is processed by the at least two accelerators installed on it, so that data processing can be completed entirely on the network card (i.e., the offload card), and the data processing results are finally returned to system memory for the CPU to do the final processing. This has the advantage of short data links and supports different CPU platforms.
  • this solution achieves universal cloud native chip acceleration by sinking some of the hardware accelerators found on CPUs of various architectures down to offload cards, for example AMX accelerators, AI accelerators, ML engines, HPC accelerators, security coprocessors, and GPUs (CPU manufacturers may embed micro GPUs in CPUs in the future).
  • cloud native CPU chips remove these hardware accelerators, thereby slimming down the HOST (host machine) CPU, letting it focus on providing high computing power, and truly achieving a data-centric computing architecture in which data is processed wherever it resides. This brings a series of benefits, including unifying the acceleration capabilities of different CPU platforms and eliminating CPU-centric traffic detours and resource consumption.
  • FIG. 9 shows a flow chart of a data processing method provided according to an embodiment of the present specification, which specifically includes the following steps.
  • Step 902: Receive a data processing request, wherein the data processing request carries data to be processed.
  • Step 904: When it is determined that the data type of the data to be processed meets the offload card processing condition, process the data to be processed and feed back the obtained data processing result.
  • Step 906: When it is determined that the data type of the data to be processed meets the accelerator processing condition, send the data to be processed to the accelerator and feed back the data processing result obtained by the accelerator.
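  • Steps 902 to 906 can be combined into a single handler; the sketch below reuses the hypothetical helpers from the earlier sketches and is illustrative only:

```python
# Hypothetical end-to-end handler for steps 902-906, reusing the helpers
# sketched earlier (data_type_of, process_on_card, select_target_accelerator,
# send_to_accelerator, forward_to_cpu).
def handle_request(request_type: str, payload: bytes) -> bytes:
    dtype = data_type_of(request_type)        # step 902: request received
    if dtype in OFFLOAD_CARD_TYPES:           # step 904: card condition met
        return process_on_card(payload)
    accel = select_target_accelerator(dtype)  # step 906: accelerator condition
    if accel is not None:
        return send_to_accelerator(dtype, payload)
    return forward_to_cpu(payload)            # otherwise hand off to the CPU
```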
  • by installing the accelerator on the offload card, the data processing method provided in this specification and applied to an offload card equipped with an accelerator avoids the uneven CPU performance and difficulty in regulating CPU resources caused by CPUs being equipped with different types of accelerators or with none; and when the offload card determines that the data type of the data to be processed carried in the data processing request meets the accelerator processing conditions, the data is sent to the accelerator and the data processing result obtained by the accelerator is fed back, thereby improving CPU performance and reducing CPU load.
  • the above is a schematic scheme of the data processing method of this embodiment. It should be noted that the technical scheme of the data processing method and the technical scheme of the above offload card equipped with an accelerator belong to the same concept; for details not described in the technical scheme of the data processing method, refer to the description of the technical scheme of the offload card equipped with an accelerator.
  • Figure 10 shows a schematic diagram of the structure of a data processing device provided by one embodiment of this specification.
  • the device is applied to an offload card equipped with an accelerator, and the device includes:
  • the receiving module 1002 is configured to receive a data processing request, wherein the data processing request carries data to be processed;
  • the first processing module 1004 is configured to, when it is determined that the data type of the data to be processed meets the offload card processing condition, process the data to be processed and feed back the obtained data processing result; or
  • the second processing module 1006 is configured to send the data to be processed to the accelerator and feed back the data processing result obtained by the accelerator when it is determined that the data type of the data to be processed meets the accelerator processing condition.
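  • Mirroring the module split above, a hypothetical object layout might look like this (the reference numerals 1002/1004/1006 follow FIG. 10, and the bodies reuse the earlier stubs; this is a sketch, not the patent's implementation):

```python
class DataProcessingDevice:
    """Hypothetical layout mirroring modules 1002, 1004, and 1006."""
    def receive(self, request_type: str, payload: bytes) -> bytes:
        # receiving module 1002: accept the request and route by data type
        dtype = data_type_of(request_type)
        if dtype in OFFLOAD_CARD_TYPES:
            return self.process_locally(payload)
        return self.process_on_accelerator(dtype, payload)

    def process_locally(self, payload: bytes) -> bytes:
        # first processing module 1004: offload card condition met
        return process_on_card(payload)

    def process_on_accelerator(self, dtype: str, payload: bytes) -> bytes:
        # second processing module 1006: accelerator condition met
        return send_to_accelerator(dtype, payload)
```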
  • the data processing device provided in the present specification is applied to an offload card installed with an accelerator.
  • when the offload card determines that the data type of the data to be processed carried in the data processing request meets the accelerator processing conditions, the data to be processed is sent to the accelerator and the data processing result obtained by the accelerator is fed back, thereby improving CPU performance and reducing CPU load.
  • the above is a schematic scheme of the data processing device of this embodiment. It should be noted that the technical scheme of the data processing device and the technical scheme of the above offload card with an accelerator belong to the same concept; for details not described in the technical scheme of the data processing device, refer to the description of the technical scheme of the offload card with an accelerator.
  • FIG11 shows a schematic diagram of the structure of a data processing system provided by an embodiment of this specification.
  • the system includes a CPU 1102, a memory 1104, and an offload card 1108 equipped with an accelerator 1106, wherein:
  • the offload card 1108 is configured to receive a data processing request, wherein the data processing request carries data to be processed, and, when it is determined that the data type of the data to be processed meets the offload card processing condition, to process the data to be processed and feed back the obtained data processing result to the memory 1104; or
  • the data to be processed is sent to the accelerator 1106, and the data processing result obtained by the accelerator 1106 is fed back to the memory 1104;
  • the CPU 1102 is configured to obtain the data processing results from the memory 1104.
  • by installing an accelerator on the offload card, the data processing system avoids the uneven CPU performance and difficulty in regulating CPU resources caused by CPUs being equipped with different types of accelerators or with none; and when the offload card determines that the data type of the data to be processed carried in the data processing request meets the accelerator processing conditions, the data is sent to the accelerator and the data processing results obtained by the accelerator are fed back to the memory, so that the CPU can obtain them from the memory, thereby improving CPU performance and reducing CPU load.
  • the above is a schematic scheme of the data processing system of this embodiment. It should be noted that the technical scheme of the data processing system and the technical scheme of the above offload card with an accelerator belong to the same concept; for details not described in the technical scheme of the data processing system, refer to the description of the technical scheme of the offload card with an accelerator.
  • An embodiment of the present specification also provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the data processing method applied to the offload card.
  • the above is a schematic scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical scheme of the storage medium and the technical scheme of the data processing method applied to the offload card belong to the same concept; for details not described in the technical scheme of the storage medium, refer to the description of the technical scheme of the data processing method applied to the offload card.
  • An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the above data processing method applied to the offload card.
  • the computer instructions include computer program codes, which may be in source code form, object code form, executable files or some intermediate forms, etc.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, USB flash drive, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electric carrier signal, telecommunication signal and software distribution medium, etc. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electric carrier signals and telecommunication signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

An embodiment of this specification provides an offload card with an accelerator installed, wherein the offload card is configured to receive a data processing request, the data processing request carrying data to be processed; when it is determined that the data type of the data to be processed meets the offload card processing condition, to process the data to be processed and feed back the obtained data processing result; or, when it is determined that the data type of the data to be processed meets the accelerator processing condition, to send the data to be processed to the accelerator and feed back the data processing result obtained by the accelerator. This avoids the uneven CPU performance and the difficulty of regulating CPU resources caused by CPUs being equipped with different types of accelerators or with none, and achieves the goals of improving CPU performance and reducing CPU load.

Description

Offload card with an accelerator installed
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 17, 2022, with application number 202211268202.5 and titled "Offload card with an accelerator installed", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this specification relate to the field of data processing technology, and in particular to an offload card with an accelerator installed.
Background
With the continuous development of computer technology, in order to further improve CPU (central processing unit) performance and reduce CPU load, some CPU manufacturers choose to install various types of accelerators on the CPU to achieve the goals of improving CPU performance and reducing CPU load.
However, in cloud computing scenarios, CPUs produced by various manufacturers are used, and these CPUs may be equipped with different types of accelerators or with none at all, resulting in uneven CPU performance and making it difficult to regulate CPU resources.
Summary
In view of this, the embodiments of this specification provide an offload card with an accelerator installed. One or more embodiments of this specification also relate to a data processing method, a data processing device, a data processing system, a computer-readable storage medium, and a computer program, to solve the technical defects existing in the prior art.
According to a first aspect of the embodiments of this specification, an offload card with an accelerator installed is provided, wherein:
the offload card is configured to receive a data processing request, wherein the data processing request carries data to be processed;
when it is determined that the data type of the data to be processed meets the offload card processing condition, to process the data to be processed and feed back the obtained data processing result; or
when it is determined that the data type of the data to be processed meets the accelerator processing condition, to send the data to be processed to the accelerator and feed back the data processing result obtained by the accelerator.
According to a second aspect of the embodiments of this specification, a data processing method is provided, applied to an offload card with an accelerator installed, the method comprising:
receiving a data processing request, wherein the data processing request carries data to be processed;
when it is determined that the data type of the data to be processed meets the offload card processing condition, processing the data to be processed and feeding back the obtained data processing result; or
when it is determined that the data type of the data to be processed meets the accelerator processing condition, sending the data to be processed to the accelerator and feeding back the data processing result obtained by the accelerator.
According to a third aspect of the embodiments of this specification, a data processing system is provided, the system comprising a CPU, a memory, and an offload card with an accelerator installed, wherein:
the offload card is configured to receive a data processing request, wherein the data processing request carries data to be processed, and, when it is determined that the data type of the data to be processed meets the offload card processing condition, to process the data to be processed and feed back the obtained data processing result to the memory; or
when it is determined that the data type of the data to be processed meets the accelerator processing condition, to send the data to be processed to the accelerator and feed back the data processing result obtained by the accelerator to the memory;
the CPU is configured to obtain the data processing result from the memory.
According to a fourth aspect of the embodiments of this specification, a data processing device is provided, applied to an offload card with an accelerator installed, the device comprising:
a receiving module, configured to receive a data processing request, wherein the data processing request carries data to be processed;
a first processing module, configured to, when it is determined that the data type of the data to be processed meets the offload card processing condition, process the data to be processed and feed back the obtained data processing result; or
a second processing module, configured to, when it is determined that the data type of the data to be processed meets the accelerator processing condition, send the data to be processed to the accelerator and feed back the data processing result obtained by the accelerator.
According to a fifth aspect of the embodiments of this specification, a computer-readable storage medium is provided, storing computer-executable instructions which, when executed by a processor, implement the steps of the data processing method applied to the above offload card.
According to a sixth aspect of the embodiments of this specification, a computer program is provided, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the data processing method applied to the above offload card.
An embodiment of this specification provides an offload card with an accelerator installed, wherein the offload card is configured to receive a data processing request, the data processing request carrying data to be processed; when it is determined that the data type of the data to be processed meets the offload card processing condition, to process the data to be processed and feed back the obtained data processing result; or, when it is determined that the data type of the data to be processed meets the accelerator processing condition, to send the data to be processed to the accelerator and feed back the data processing result obtained by the accelerator.
By installing the accelerator on the offload card, the offload card with an accelerator installed provided in this specification avoids the uneven CPU performance and the difficulty of regulating CPU resources caused by CPUs being equipped with different types of accelerators or with none; moreover, when the offload card determines that the data type of the data to be processed carried in the data processing request meets the accelerator processing condition, it sends the data to be processed to the accelerator and feeds back the data processing result obtained by the accelerator, thereby improving CPU performance and reducing CPU load.
Brief Description of the Drawings
FIG. 1 is a structural diagram of a CPU solution with an accelerator provided by an embodiment of this specification;
FIG. 2 is an application diagram of a CPU solution with an accelerator provided by an embodiment of this specification;
FIG. 3 is a structural diagram of an offload card + CPU solution provided by an embodiment of this specification;
FIG. 4 is an application diagram of an offload card + CPU solution provided by an embodiment of this specification;
FIG. 5 is a schematic diagram of an application of an offload card with an accelerator installed provided by an embodiment of this specification;
FIG. 6 is a schematic diagram of an application of an offload card provided by an embodiment of this specification;
FIG. 7 is a schematic diagram of the interaction between an offload card with an accelerator installed and a CPU provided by an embodiment of this specification;
FIG. 8 is a schematic diagram of an application of an offload card with an accelerator installed provided by an embodiment of this specification;
FIG. 9 is a flowchart of a data processing method provided by an embodiment of this specification;
FIG. 10 is a schematic structural diagram of a data processing device provided by an embodiment of this specification;
FIG. 11 is a schematic structural diagram of a data processing system provided by an embodiment of this specification.
Detailed Description
Many specific details are set forth in the following description to facilitate a full understanding of this specification. However, this specification can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its substance; this specification is therefore not limited by the specific implementations disclosed below.
The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit the one or more embodiments of this specification. The singular forms "a", "said", and "the" used in one or more embodiments of this specification and in the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of this specification refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of one or more embodiments of this specification, the first may also be referred to as the second, and similarly, the second may also be referred to as the first. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to determining".
First, the terms involved in one or more embodiments of this specification are explained.
Offload card: a chip. This chip is a dedicated processor designed for cloud data centers, used specifically to connect server hardware with virtualized resources on the cloud; it can replace the CPU as the control and acceleration center of cloud computing. That is, the offload card may be a chip, such as a cloud native chip, or a processor.
Cloud native chip: from the perspective of computing chips, cloud computing has created entirely new application scenarios and thus new requirements for CPUs. A cloud native chip is a dedicated chip used in cloud computing scenarios in place of the CPU. Cloud native chips include offload cards.
Accelerator: a hardware accelerator for a specific application, used to replace inefficient software execution on the CPU, greatly improving the performance of that application while freeing CPU computing power for other general workloads. Accelerators include, but are not limited to, AMX, AI accelerators, ML engines, HPC accelerators, security coprocessors, and GPUs.
AI accelerator: a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence applications, especially artificial neural networks, machine vision, and machine learning.
ML engine: a Machine Learning (ML) engine.
AI: Artificial Intelligence.
AMX: Advanced Matrix Extensions, a matrix operation programming framework designed to accelerate machine learning workloads.
HPC accelerator: High Performance Computing accelerator; generally refers to an accelerator used to process data at high speed and perform complex calculations.
Security coprocessor: an independent hardware module, separate from the CPU core, that handles security key management, key generation, encryption, decryption, and similar tasks.
GPU: graphics processing unit.
CPU die: the core of the CPU and its most important component.
Offload card: in cloud scenarios, in order to improve the processing speed of input/output (I/O) services, operators can offload some I/O services from the server to low-cost heterogeneous hardware, which frees the server's central processing unit (CPU) resources and improves the CPU's operating efficiency. The heterogeneous hardware used to offload I/O data is usually called an offload card. An offload card can be a single Peripheral Component Interconnect Express (PCIe) card that establishes a PCIe channel with the server; when the server handles I/O services offloaded to the card, it transfers the data to the offload card over this PCIe channel, which is mainly used for I/O service communication.
IDC: generally refers to an Internet Data Center, a facility with complete equipment (including high-speed Internet access bandwidth, a high-performance local area network, and a safe and reliable machine room environment), professional management, and a complete application service platform.
CPU socket: processor socket.
RAM (Random Access Memory): random access memory, also known as memory or main memory.
DMA (Direct Memory Access): an important feature of all modern computers that allows hardware devices of different speeds to communicate without relying on heavy CPU interrupt loads; otherwise, the CPU would have to copy each piece of data from the source to a register and write it back to the new location, and would be unavailable for other work during that time.
IaaS (Infrastructure as a Service): a service model in which IT infrastructure is provided as a service over the network and users are billed according to their actual usage or occupation of resources.
RoCE: RDMA over Converged Ethernet, a network protocol that allows applications to perform remote memory access over Ethernet. RoCE currently has two protocol versions, v1 and v2. RoCE v1 is a link layer protocol that allows any two hosts in the same broadcast domain to access each other directly, while RoCE v2 is an Internet layer protocol that can implement routing.
InfiniBand (IB): literally "infinite bandwidth" technology; a computer network communication standard for high-performance computing with extremely high throughput and extremely low latency, used for data interconnection between computers. InfiniBand is also used as a direct or switched interconnect between servers and storage systems, as well as between storage systems.
With the continuous development of computer technology, the CPU keeps advancing, but constrained by Moore's law, current CPU development follows two routes: one integrates hardware accelerators inside the CPU die to keep raising single-core performance; the other raises core density while single-core performance improves slowly. Against this background, cloud vendors use CPUs from multiple manufacturers at the same time, and the acceleration capabilities of these CPUs differ; for example, an AI engine exists only on certain manufacturers' CPUs and is absent on other CPU platforms. Consequently, once a CPU resource pool is built from CPUs with differing acceleration capabilities, such a heterogeneous pool is unfriendly to cloud-native services and makes CPU resources difficult to regulate.
In addition, in cloud-native scenarios, current offload cards (which may be chips or processors) support only generic network-traffic and storage-traffic offloading and generic encryption/decryption; their acceleration capability for workloads such as AI (Artificial Intelligence), HPC (High Performance Computing) and ML (Machine Learning) is limited, so these workloads still depend on CPU computing power and on accelerators installed on the CPU.
To address the above problems, this specification considers four schemes, the first being a CPU-with-accelerator scheme.
Current CPUs are in fact products of the offline IDC era. An offline IDC usually serves a single customer without diversified CPU requirements, so a single CPU model can be chosen to support data processing; for vertical services such as AI, ML and HPC, a CPU with built-in hardware accelerators is then the preferred option.
Referring to FIG. 1, FIG. 1 is a structural diagram of the CPU-with-accelerator scheme provided by an embodiment of the present specification, in which accelerator 1 and accelerator 2 are installed on the CPU, and the CPU is connected to the offload card through PCIe. The offload card includes an offload card control plane for controlling the various operations of the offload card, and a hardware forwarding module for forwarding data to the CPU. It should be noted that the CPU further includes cores, an L3 cache and an IMC (the CPU's integrated memory controller), and the CPU is connected to RAM.
Referring to FIG. 2, FIG. 2 is an application diagram of the CPU-with-accelerator scheme provided by an embodiment of the present specification, in which the CPU die sits in a CPU socket and an ML engine (a type of accelerator) is installed on the CPU. The ML engine is connected to the memory, and the offload card is connected to the memory; on this basis, data obtained by the ML engine in server A is stored in the memory connected to it and is transmitted through the offload card to the offload card of server B. After receiving the data, the offload card of server B stores it in memory so that the ML engine in server B can obtain the data sent by server A from that memory.
As can be seen from FIG. 1 and FIG. 2, in this CPU scheme with hardware-accelerator support, traffic is sent and received through the network card and then DMA-ed directly into system memory, after which the hardware accelerator (ML engine) in the CPU can process the data directly, releasing CPU computing power.
However, in the cloud-native era diverse customers have a very strong demand for diverse CPUs, including the x86, ARM and RISC-V architectures. These CPUs each have their own characteristics, differ greatly in architecture, and do not support exactly the same capabilities, hardware accelerators in particular: CPUs from manufacturer A may carry rich accelerators while CPUs from manufacturers B and C carry almost none. RISC-V in particular is still in its infancy, so acceleration capability varies widely across CPU platforms. As a result, the performance of such vertical workloads differs enormously across CPU platforms, and a unified cloud-native service capability cannot be provided to customers.
The second scheme is a CPU + heterogeneous chip scheme. Its advantage is excellent performance, suitable for complex, heavily loaded vertical services; its disadvantage is persistently high cost, which is very unfriendly to lightly loaded vertical services.
The third scheme is an offload card + CPU scheme.
Here, because the offload card offers only ordinary I/O traffic offloading and generic encryption/decryption and compression/decompression, without the hardware accelerators required by vertical scenarios, such data can only be processed in software on the CPU, with extremely low efficiency and performance.
The fourth scheme is a CPU scheme without hardware-accelerator support.
This scheme relies on external PCIe accelerator cards such as GPUs: after the network card sends and receives packets, data is DMA-ed directly into system memory, and the CPU then moves the data over PCIe from system memory into GPU memory for processing.
Referring to FIG. 3, FIG. 3 is a structural diagram of the offload card + CPU scheme provided by an embodiment of the present specification, in which no accelerator is installed on the CPU and the CPU is connected to the offload card through PCIe. For the explanation of the CPU and the offload card, refer to the explanation corresponding to FIG. 1, which is not repeated here.
Referring to FIG. 4, FIG. 4 is an application diagram of the offload card + CPU scheme provided by an embodiment of the present specification. While the GPU in server A processes data, it stores the data in GPU memory; the CPU moves the data over PCIe from GPU memory into system memory and sends it to server B through the network card (i.e., the offload card); after receiving the packets, the network card (offload card) of server B DMAs the data directly into system memory, and the CPU then moves the data over PCIe from system memory into GPU memory for processing.
Given the defects of the above four schemes, none of them fully solves the technical problems described above. Therefore, to avoid a heterogeneous CPU resource pool being unfriendly to cloud-native services and making CPU resources difficult to regulate, a general cloud-native infrastructure solution is urgently needed to solve the acceleration problem of vertical services.
The present specification provides an offload card equipped with an accelerator, and also relates to a data processing method, a data processing apparatus, a data processing system, a computer-readable storage medium and a computer program, which are described in detail one by one in the following embodiments.
Referring to FIG. 5, FIG. 5 shows a schematic application diagram of an offload card equipped with an accelerator according to an embodiment of the present specification, wherein the offload card is configured to receive a data processing request, the data processing request carrying data to be processed; to process the data to be processed and feed back the obtained data processing result when it is determined that the data type of the data to be processed meets the offload card processing condition; or to send the data to be processed to the accelerator and feed back the data processing result obtained by the accelerator when it is determined that the data type of the data to be processed meets the accelerator processing condition.
It should be noted that the offload card equipped with an accelerator provided in this specification can be applied to all IaaS-class computing products in the cloud computing field, including but not limited to ECS (Elastic Compute Service cloud servers), containers, serverless computing, microservices, and the like.
The data processing request can be understood as a request that requires processing by the offload card; for example, it may be an AI inference request, an image rendering request, a machine learning request, an I/O traffic offload request, or a generic encryption/decryption request, which is not specifically limited in this specification. It should be noted that the offload card provided in this specification may be a network card used for sending and receiving packets, on which basis it can receive data processing requests.
The data to be processed can be understood as data that needs to be processed. For example, when the data processing request is an image rendering request, the data to be processed may be an image to be rendered; as another example, when the data processing request is an I/O traffic offload request, the data to be processed may be the I/O traffic data to be offloaded.
The data processing result can be understood as the result obtained after the accelerator or the offload card processes the data to be processed; for example, it may be the rendered image, i.e., the image rendering result.
The data type can be understood as information that uniquely identifies a kind of data to be processed; for example, when the data to be processed is an image to be rendered, the data type is an image type.
An accelerator can be understood as a hardware device that reduces the CPU's computational load and accelerates computation on behalf of the CPU, including but not limited to artificial intelligence accelerators, machine learning accelerators, graphics processing accelerators, data security accelerators and computing accelerators. An artificial intelligence accelerator is a specialized hardware accelerator or computer system designed to accelerate artificial intelligence applications, e.g., an AI accelerator. A machine learning accelerator is an accelerator for speeding up machine learning workloads or processing efficiency, e.g., an ML engine or AMX. A graphics processing accelerator is a microprocessor dedicated to image- and graphics-related computation, e.g., a graphics processing unit (GPU). A data security accelerator is a device that handles tasks such as security key management, key generation, and encryption and decryption, e.g., a security coprocessor. A computing accelerator is an accelerator that processes data at high speed and performs complex computations, e.g., an HPC accelerator.
It should be noted that, in an embodiment provided in this specification, the data types processed by the accelerator include an artificial intelligence type, a machine learning type, a graphics type, a data security type and a data computation type. The artificial intelligence type can be understood as the data type of artificial intelligence data supporting AI applications; the graphics type as the types of various graphics and images, e.g., jpg and png; the machine learning type as types such as training data sets and machine learning models in the machine learning field; the data computation type as data set types requiring large amounts of computation in the data computing field; and the data security type as data types requiring decryption or encryption.
The data type meeting the offload card processing condition can be understood as the data type of the data to be processed matching the data types the offload card itself can process.
In addition, in the embodiments provided in this specification, a data-type determination policy for the data to be processed can be configured on the offload card; upon receiving a data processing request, the offload card can determine, based on this policy, the data type of the data carried in the request. Specifically, the offload card can receive data processing requests of various kinds, e.g., image processing requests and machine learning requests, all of which may carry image data; in that case, how to process the data to be processed becomes the question to be solved. For this purpose, an association between data processing requests and data types can be preconfigured on the offload card and stored in the form of a table; for example, an image rendering request is associated with the graphics type. On this basis, after receiving an image rendering request, the offload card can determine from the association that the data type of the image to be rendered carried in the request is the graphics type, and subsequently send the image to the graphics processor based on that type rather than to another accelerator. In other words, the data type of the data carried in a data processing request can be determined from the request, or equivalently from the request type of the request, as sketched below.
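For illustration only, the following minimal sketch (in Python; the table contents and all names are hypothetical, not prescribed by this specification) models the preconfigured association table as a dictionary lookup:

# Minimal sketch, assuming a hypothetical request-type -> data-type table.
REQUEST_TYPE_TO_DATA_TYPE = {
    "image_render": "graphics",
    "machine_learning": "machine_learning",
    "io_offload": "io_traffic",
    "encryption": "data_security",
}

def resolve_data_type(request_type: str) -> str:
    # Request types with no configured association are left to the CPU.
    return REQUEST_TYPE_TO_DATA_TYPE.get(request_type, "cpu")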
Specifically, the offload card equipped with an accelerator provided in this specification can be deployed on a server and can receive a data processing request, which may be sent by another server and carries data to be processed. After receiving the request, the offload card determines whether to process the data itself or through the accelerator installed on it. When the offload card determines that the data type of the data to be processed meets the offload card processing condition, it processes the data through its own hardware modules, such as its processor and storage medium, and feeds back the obtained data processing result. That is, although an accelerator is installed on the offload card, the card itself still provides capabilities such as I/O traffic offloading and generic encryption/decryption and compression/decompression; accordingly, after receiving a data processing request carrying data to be processed, the offload card determines the data type of that data, and when the type matches the data types it processes itself, it treats the data as its own to process, processes it, and feeds the obtained data processing result back to the CPU, thereby reducing CPU load and improving CPU performance.
When, however, it determines that the data type of the data to be processed meets the accelerator processing condition, it sends the data to the accelerator, which processes it; the offload card obtains the accelerator's data processing result and feeds it back. Here, the data type meeting the accelerator processing condition can be understood as the data type of the data to be processed being identical to the data types processed by the accelerator.
For example, the offload card itself provides I/O traffic offloading and generic encryption/decryption and compression/decompression, implemented through its own hardware modules such as its processor and storage medium. The offload card equipped with an accelerator provided in this specification installs the accelerator on the card, so that besides these basic capabilities the card can also provide other capabilities through the accelerator: for instance, if the accelerator is an artificial intelligence accelerator, the card can provide AI acceleration; if the accelerator is a graphics processing unit (GPU), the card on which it is installed can perform graphics and image processing.
On this basis, when the offload card receives an image rendering request carrying an image to be rendered, it determines from the image's data type (i.e., the image type) that the image needs to be processed by the graphics processor installed on the card; it therefore sends the image to the graphics processor, obtains the image rendering result produced by the graphics processor, and feeds that result back.
Alternatively, when the offload card receives an I/O processing request carrying I/O traffic data, it determines from the data type (i.e., the I/O traffic type) that it should process the I/O traffic data itself, processes it without sending it to the accelerator, and then feeds back the data processing result. A minimal dispatch sketch follows.
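For illustration only, the following minimal sketch shows the two-way decision just described; the handler bodies and type sets are hypothetical stand-ins, since the actual card implements these paths in its own hardware and firmware:

# Minimal sketch of the two-way dispatch described above.
OFFLOAD_CARD_TYPES = {"io_traffic", "data_security"}
ACCELERATOR_TYPES = {"graphics", "artificial_intelligence"}

def process_on_offload_card(payload: bytes) -> bytes:
    return payload  # stand-in for I/O offload / generic crypto on the card

def process_on_accelerator(payload: bytes) -> bytes:
    return payload  # stand-in for forwarding to the on-card accelerator

def dispatch(data_type: str, payload: bytes) -> bytes:
    if data_type in OFFLOAD_CARD_TYPES:   # offload card processing condition
        return process_on_offload_card(payload)
    if data_type in ACCELERATOR_TYPES:    # accelerator processing condition
        return process_on_accelerator(payload)
    raise ValueError("no processing condition met for " + data_type)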
In practical applications, for the architecture of this offload card in cloud computing scenarios, refer to FIG. 6, which is a schematic application diagram of an offload card provided by an embodiment of the present specification. As FIG. 6 shows, the offload card can connect to a network card, a storage medium, heterogeneous chips, a CPU and a GPU. By connecting to the network card, it can provide network acceleration for the network card, where the network card may be an RDMA network card; by connecting to the storage medium, it can provide storage acceleration, where the storage medium may be an SSD; and by connecting to computing devices such as heterogeneous chips, the CPU and the GPU, it can provide computation acceleration for those devices. That is, by replacing the CPU-centric architecture with the offload card, server hardware can be used more effectively and more virtualized resources obtained; at the software level, the operating system connected to the offload card performs orchestration and scheduling of virtualized resources more efficiently; at the hardware level, the offload card can rapidly manage the physical devices of the data center and accelerate the network and storage hardware, avoiding waste of CPU computing power while enhancing network and storage performance.
It should be noted that the offload card provided in this specification may be a chip, such as a cloud-native chip, or a processor designed specifically for cloud data centers.
On this basis, by installing accelerators such as AMX accelerators, AI accelerators, ML engines, HPC accelerators, security coprocessors or GPUs on the offload card, after the card receives a packet it selects, from the accelerators installed on it, the target accelerator to process the packet; the data in this process is handled entirely by the target accelerator installed on the network card (offload card), and the final processing result is returned to system memory for the CPU's final handling.
It should be noted that a processor and a memory are installed on the offload card to implement the card's own capabilities, such as I/O traffic offloading and generic encryption/decryption and compression/decompression. The processor can also send the data to be processed to the accelerator or the CPU when that data needs to be processed by the accelerator or the CPU, and can obtain the accelerator's data processing result. Alternatively, in an embodiment provided in this specification, a control unit can be installed in the offload card; the control unit can determine whether the data carried in a data processing request should be processed by the offload card, the accelerator or the CPU, and send the data accordingly to the card's processor, the accelerator or the CPU for processing.
In an embodiment provided in this specification, there are at least two accelerators, and the at least two accelerators process the same data type.
That is, at least two accelerators can be installed on the offload card provided in this specification, all processing the same data type: for example, at least two graphics processors can be installed on the card, each able to process image and graphics data, so that the accelerators process the data to be processed, reducing CPU load and improving CPU performance. A sketch of one possible way to spread requests across such a pool follows.
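For illustration only, the following sketch distributes requests across a pool of same-type accelerators round-robin; the specification does not prescribe any selection policy, and the accelerator objects here are stand-ins:

import itertools

# Minimal sketch: a pool of same-type accelerators served round-robin.
class AcceleratorPool:
    def __init__(self, accelerators):
        self._next = itertools.cycle(accelerators)

    def submit(self, payload: bytes) -> bytes:
        accelerator = next(self._next)  # pick the next accelerator in turn
        return accelerator(payload)

# Two stand-in GPUs that simply echo their input.
pool = AcceleratorPool([lambda data: data, lambda data: data])
result = pool.submit(b"image-to-render")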
In an embodiment provided in this specification, there are at least two accelerators, and the at least two accelerators process different data types;
correspondingly, the offload card is further configured to send the data to be processed to a target accelerator and feed back the data processing result obtained by the target accelerator, wherein the target accelerator is one of the at least two accelerators and the data type it processes is the same as the data type of the data to be processed.
The at least two types of accelerators include, but are not limited to, any two of artificial intelligence accelerators, machine learning accelerators, graphics processing accelerators, data security accelerators, computing accelerators and the like.
Specifically, at least two accelerators can be installed on the offload card provided in this specification while processing different data types; for example, a graphics processor and an artificial intelligence accelerator may both be installed on the card. On this basis, when the offload card needs an accelerator to process the data to be processed, it must determine the corresponding target accelerator based on the data type of that data, the target accelerator being one of the at least two accelerators whose processed data type is identical to the data type of the data to be processed; the accelerator then processes the data, reducing CPU load and improving CPU performance.
In an embodiment of this specification, the offload card equipped with an accelerator is further configured to determine the data storage unit corresponding to the target accelerator, wherein the data storage unit stores the data processing result obtained by the target accelerator, the result being obtained by the target accelerator processing the data to be processed; and
to obtain the data processing result from the data storage unit and feed it back.
The data storage unit can be understood as storage corresponding to the target accelerator that holds the data the target accelerator needs during processing as well as its data processing result; for example, when the target accelerator is a GPU, the data storage unit is the GPU memory.
For example, after sending the image to be rendered carried in an image rendering request to the GPU, the offload card can obtain the rendered image from the GPU's memory and feed the rendered image back to the CPU for subsequent processing, so that the target accelerator installed on the offload card shares the CPU's computational load and the scheduling difficulties caused by uneven CPU acceleration capabilities are avoided.
In practical applications, the offload card can store the data to be rendered in GPU memory and instruct the GPU to fetch that data from GPU memory for rendering, as sketched below.
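For illustration only, the following sketch models the data storage unit as a plain dictionary standing in for GPU memory; the staging steps and the stand-in "kernel" are hypothetical:

# Minimal sketch of the data-storage-unit flow described above; the
# dictionary stands in for GPU memory, the byte reversal for GPU work.
gpu_memory = {}

def render_via_gpu(image: bytes) -> bytes:
    gpu_memory["input"] = image                        # stage input in the unit
    gpu_memory["output"] = gpu_memory["input"][::-1]   # stand-in for rendering
    return gpu_memory["output"]                        # card fetches the result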
In an embodiment of this specification, the offload card equipped with an accelerator communicates with a CPU;
the offload card is further configured to feed the data processing result back to the CPU.
For example, the offload card handles the data in the whole procedure only through the offload card or the accelerator and returns the final processing result to the CPU for final handling, reducing the CPU's computational load.
It should be noted that once the accelerator is installed on the offload card, the accelerator originally installed on the CPU is no longer used and remains disabled, ensuring uniform CPU performance in cloud computing scenarios and avoiding the scheduling difficulties caused by uneven CPU acceleration capabilities.
In an embodiment of this specification, the offload card equipped with an accelerator is further configured to determine the memory corresponding to the CPU and store the data processing result into that memory, so that the CPU obtains the data processing result from the memory.
Specifically, the offload card handles the data in the whole procedure only through the offload card or the accelerator and returns the final processing result to system memory; the CPU can obtain the processed data from system memory and perform the final handling, reducing the CPU's computational load.
In an embodiment of this specification, for the offload card equipped with an accelerator,
the offload card is further configured to determine the memory corresponding to the CPU, store the data processing result into the memory, and send to the CPU the storage information of the data processing result in the memory, so that the CPU obtains the result from the memory based on that storage information; or
the offload card is further configured to determine the memory corresponding to the CPU and store the data processing result into a preset storage region in the memory, so that the CPU obtains the result from that preset storage region.
The storage information can be understood as the storage location of the data processing result in memory; the preset storage region can be understood as a predefined region of memory dedicated to storing the data processing results the offload card provides to the CPU, which the CPU can subsequently check periodically to obtain newly written results.
Specifically, when feeding the data processing result back to the CPU, the offload card determines the memory corresponding to the CPU, stores the result into the memory, and sends the result's storage information in memory to the CPU; upon receiving the storage information, the CPU can obtain the result from memory based on it and continue processing.
Alternatively, when feeding the result back, the offload card determines the memory corresponding to the CPU and the preset storage region in that memory used for data exchange with the CPU, and stores the result into that region; the CPU can obtain the result from the preset region and continue processing.
On this basis, by providing the data processing result to the CPU, the offload card reduces the CPU's processing load; both feedback variants are sketched below.
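For illustration only, the following sketch models system memory as a bytearray and shows both hypothetical feedback variants; the offsets, sizes and names are assumptions, not values from this specification:

# Minimal sketch of the two feedback variants described above.
system_memory = bytearray(4096)
PRESET_REGION_OFFSET = 0          # fixed region the CPU polls
PRESET_REGION_SIZE = 256

def feed_back_with_storage_info(result: bytes, offset: int) -> int:
    system_memory[offset:offset + len(result)] = result
    return offset                  # "storage information" sent on to the CPU

def feed_back_to_preset_region(result: bytes) -> None:
    assert len(result) <= PRESET_REGION_SIZE
    system_memory[PRESET_REGION_OFFSET:PRESET_REGION_OFFSET + len(result)] = result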
In an embodiment of this specification, the offload card equipped with an accelerator communicates with a CPU;
the offload card is further configured to send the data to be processed to the CPU when it is determined that the data type of the data to be processed meets a CPU processing condition.
In practical applications, the data type meeting the CPU processing condition can be understood as the data type being identical to the data types the CPU processes, or as the data type differing from both the data types the offload card processes and those the accelerator processes.
Specifically, the offload card and the accelerator installed on it both serve to share the CPU's processing load: functions originally on the CPU, such as image processing, I/O traffic offloading and data encryption/decryption, are implemented through the offload card and its accelerator, so that the CPU can handle more important requests, e.g., user requests and web (World Wide Web) requests.
On this basis, when the offload card determines while receiving packets that the data to be processed needs CPU processing, it sends the data to the CPU, ensuring that the CPU's work proceeds smoothly; the earlier dispatch sketch extends naturally with this third branch, as shown below.
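For illustration only, the earlier two-way dispatch sketch extends with the CPU branch as follows, reusing the stand-in handlers and type sets defined there; forward_to_cpu is likewise a hypothetical stand-in:

# Minimal sketch of the three-way decision including the CPU path.
def forward_to_cpu(payload: bytes) -> bytes:
    return payload  # stand-in for handing the data to the host CPU

def dispatch_with_cpu(data_type: str, payload: bytes) -> bytes:
    if data_type in OFFLOAD_CARD_TYPES:
        return process_on_offload_card(payload)
    if data_type in ACCELERATOR_TYPES:
        return process_on_accelerator(payload)
    return forward_to_cpu(payload)  # CPU processing condition, e.g. web requests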
In an embodiment of this specification, the offload card equipped with an accelerator is further configured to determine the data type of the data to be processed and the data types processed by the at least two accelerators;
to determine, when a data type processed by the at least two accelerators matches the data type of the data to be processed, that the data type of the data to be processed meets the accelerator processing condition; and
to determine, as the target accelerator, the accelerator among the at least two that processes the data type of the data to be processed.
Specifically, after receiving a data processing request carrying data to be processed, the offload card determines the data type of that data and the data type processed by each of the at least two accelerators, and matches the two; when a data type processed by the at least two accelerators matches the data type of the data to be processed, it determines that the accelerator processing condition is met and that the data should be sent to an accelerator for processing. On this basis, the accelerator that processes the data type of the data to be processed is identified among the at least two accelerators and taken as the target accelerator, so that the data to be processed is accurately assigned to the corresponding accelerator and processing efficiency is improved. A matching sketch follows.
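For illustration only, the following sketch keeps a hypothetical registry of installed accelerators keyed by the data type each one processes; the registry contents are example assumptions:

# Minimal sketch of target-accelerator matching; the entries are
# hypothetical examples, not a prescribed configuration.
installed_accelerators = {
    "graphics": "gpu0",
    "artificial_intelligence": "ai_accel0",
    "machine_learning": "ml_engine0",
}

def select_target_accelerator(data_type: str):
    # The accelerator processing condition is met exactly when some
    # installed accelerator processes this data type; None means unmet.
    return installed_accelerators.get(data_type)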
The offload card equipped with an accelerator provided in this specification installs at least two accelerators on the offload card, avoiding the problem that CPUs fitted with different types of accelerators, or with none, exhibit uneven performance and make CPU resources difficult to regulate; moreover, when the offload card determines that the data type of the data carried in a data processing request meets the accelerator processing condition, it sends the data to the target accelerator and feeds back the target accelerator's data processing result, thereby improving CPU performance and reducing CPU load.
Referring to FIG. 7 and FIG. 8, FIG. 7 shows a schematic diagram of interaction between an offload card equipped with an accelerator and a CPU according to an embodiment of the present specification, and FIG. 8 shows a schematic application diagram of an offload card equipped with an accelerator according to an embodiment of the present specification; for the explanation of the CPU and the offload card in FIG. 7, refer to the corresponding content in the explanation of FIG. 1 above. As FIG. 7 shows, the offload card equipped with an accelerator provided in this specification can sink accelerators such as AMX, AI accelerators, ML engines, HPC accelerators, security coprocessors and GPUs down into the offload card. As also shown in FIG. 7, server A and server B communicate through their offload cards, and the offload cards of server A and server B can communicate via RoCE or InfiniBand. On this basis, while the offload card sends and receives packets, the data is processed by the at least two accelerators installed on the card, so that data processing is completed entirely on the network card (i.e., the offload card) and the final data processing result is returned to system memory for the CPU's final handling; this yields short data paths and supports different CPU platforms.
From the above it can be seen that this scheme is a method for general cloud-native chip acceleration: some of the hardware accelerators on CPUs of various architectures, such as AMX accelerators, AI accelerators, ML engines, HPC accelerators, security coprocessors and GPUs (CPU vendors may in the future consider embedding micro GPUs in CPUs), are sunk into offload cards, while the cloud-native CPU chip drops these hardware accelerators. This subtracts from the HOST (host machine) CPU so that it focuses on delivering high computing power, realizing a truly data-centric computing architecture in which data is processed wherever it resides. This brings a series of benefits, including unifying the acceleration capabilities of different CPU platforms and eliminating CPU-centric traffic detours and resource consumption.
Referring to FIG. 9, FIG. 9 shows a flowchart of a data processing method according to an embodiment of the present specification, which specifically includes the following steps.
Step 902: receive a data processing request, wherein the data processing request carries data to be processed.
Step 904: when it is determined that the data type of the data to be processed meets the offload card processing condition, process the data to be processed and feed back the obtained data processing result.
Step 906: when it is determined that the data type of the data to be processed meets the accelerator processing condition, send the data to be processed to the accelerator and feed back the data processing result obtained by the accelerator.
For the explanation of this data processing method, refer to the corresponding content in the above explanation of the offload card equipped with an accelerator, which is not repeated here.
With the data processing method applied to an offload card equipped with an accelerator provided in this specification, installing the accelerator on the offload card avoids the problem that CPUs fitted with different types of accelerators, or with none, exhibit uneven performance and make CPU resources difficult to regulate; moreover, when the offload card determines that the data type of the data carried in a data processing request meets the accelerator processing condition, it sends the data to the accelerator and feeds back the accelerator's data processing result, thereby improving CPU performance and reducing CPU load.
The above is a schematic solution of the data processing method of this embodiment. It should be noted that the technical solution of the data processing method and that of the above offload card equipped with an accelerator belong to the same concept; for details not described in detail in the method's technical solution, refer to the description of the technical solution of the offload card equipped with an accelerator above.
Corresponding to the above method embodiment, this specification further provides a data processing apparatus embodiment, and FIG. 10 shows a schematic structural diagram of a data processing apparatus according to an embodiment of the present specification. As shown in FIG. 10, the apparatus is applied to an offload card equipped with an accelerator and includes:
a receiving module 1002, configured to receive a data processing request, wherein the data processing request carries data to be processed;
a first processing module 1004, configured to process the data to be processed and feed back the obtained data processing result when it is determined that the data type of the data to be processed meets the offload card processing condition; or
a second processing module 1006, configured to send the data to be processed to the accelerator and feed back the data processing result obtained by the accelerator when it is determined that the data type of the data to be processed meets the accelerator processing condition.
For the explanation of this data processing apparatus, refer to the corresponding content in the above explanation of the offload card equipped with an accelerator, which is not repeated here.
With the data processing apparatus applied to an offload card equipped with an accelerator provided in this specification, installing at least two types of accelerators on the offload card avoids the problem that CPUs fitted with different types of accelerators, or with none, exhibit uneven performance and make CPU resources difficult to regulate; moreover, when the offload card determines that the data type of the data carried in a data processing request meets the accelerator processing condition, it sends the data to the accelerator and feeds back the accelerator's data processing result, thereby improving CPU performance and reducing CPU load.
The above is a schematic solution of the data processing apparatus of this embodiment. It should be noted that the technical solution of the data processing apparatus and that of the above offload card equipped with an accelerator belong to the same concept; for details not described in detail in the apparatus's technical solution, refer to the description of the technical solution of the offload card equipped with an accelerator above.
Corresponding to the above method embodiment, this specification further provides a data processing system embodiment, and FIG. 11 shows a schematic structural diagram of a data processing system according to an embodiment of the present specification. As shown in FIG. 11, the system includes a CPU 1102, a memory 1104, and an offload card 1108 equipped with an accelerator 1106, wherein
the offload card 1108 is configured to receive a data processing request, the data processing request carrying data to be processed, and, when it is determined that the data type of the data to be processed meets the offload card processing condition, to process the data to be processed and feed back the obtained data processing result to the memory 1104; or
when it is determined that the data type of the data to be processed meets the accelerator processing condition, to send the data to be processed to the accelerator 1106 and feed the data processing result obtained by the accelerator 1106 back to the memory 1104;
the CPU 1102 is configured to obtain the data processing result from the memory 1104.
For the explanation of this data processing system, refer to the corresponding content in the above explanation of the offload card equipped with an accelerator, which is not repeated here.
With the data processing system provided in this specification, installing the accelerator on the offload card avoids the problem that CPUs fitted with different types of accelerators, or with none, exhibit uneven performance and make CPU resources difficult to regulate; moreover, when the offload card determines that the data type of the data carried in a data processing request meets the accelerator processing condition, it sends the data to the accelerator and feeds the accelerator's data processing result back to the memory, so that the CPU can obtain the result from the memory, thereby improving CPU performance and reducing CPU load.
The above is a schematic solution of the data processing system of this embodiment. It should be noted that the technical solution of the data processing system and that of the above offload card equipped with an accelerator belong to the same concept; for details not described in detail in the system's technical solution, refer to the description of the technical solution of the offload card equipped with an accelerator above.
An embodiment of this specification further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the data processing method applied to the offload card.
The above is a schematic solution of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and that of the above data processing method applied to the offload card belong to the same concept; for details not described in detail in the storage medium's technical solution, refer to the description of the technical solution of the data processing method applied to the offload card above.
An embodiment of this specification further provides a computer program which, when executed in a computer, causes the computer to perform the steps of the above data processing method applied to the offload card.
The above is a schematic solution of the computer program of this embodiment. It should be noted that the technical solution of the computer program and that of the above data processing method applied to the offload card belong to the same concept; for details not described in detail in the computer program's technical solution, refer to the description of the technical solution of the data processing method applied to the offload card above.
Specific embodiments of the present specification are described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitask processing and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media exclude electrical carrier signals and telecommunications signals.
It should be noted that, for ease of description, the foregoing method embodiments are all expressed as combinations of a series of actions, but those skilled in the art should know that the embodiments of this specification are not limited by the described order of actions, because according to these embodiments some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily all required by the embodiments of this specification.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of this specification disclosed above are intended only to help explain this specification. The optional embodiments do not describe all details exhaustively, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations can be made in light of the content of the embodiments of this specification. These embodiments are selected and specifically described in order to better explain the principles and practical applications of the embodiments of this specification, so that those skilled in the art can well understand and use this specification. This specification is limited only by the claims and their full scope and equivalents.

Claims (12)

  1. An offload card equipped with an accelerator, wherein
    the offload card is configured to receive a data processing request, wherein the data processing request carries data to be processed;
    process the data to be processed and feed back the obtained data processing result when it is determined that the data type of the data to be processed meets an offload card processing condition; or
    send the data to be processed to the accelerator and feed back the data processing result obtained by the accelerator when it is determined that the data type of the data to be processed meets an accelerator processing condition.
  2. The offload card equipped with an accelerator according to claim 1, wherein there are at least two accelerators, and the at least two accelerators process the same data type.
  3. The offload card equipped with an accelerator according to claim 1, wherein there are at least two accelerators, and the at least two accelerators process different data types;
    correspondingly, the offload card is further configured to send the data to be processed to a target accelerator and feed back the data processing result obtained by the target accelerator, wherein the target accelerator is one of the at least two accelerators, and the data type processed by the target accelerator is the same as the data type of the data to be processed.
  4. The offload card equipped with an accelerator according to claim 3, wherein
    the offload card is further configured to determine a data storage unit corresponding to the target accelerator, wherein the data storage unit stores the data processing result obtained by the target accelerator, and the data processing result is obtained by the target accelerator processing the data to be processed; and
    obtain the data processing result from the data storage unit and feed back the data processing result.
  5. The offload card equipped with an accelerator according to claim 1, wherein the offload card communicates with a CPU;
    the offload card is further configured to feed the data processing result back to the CPU.
  6. The offload card equipped with an accelerator according to claim 5, wherein
    the offload card is further configured to determine a memory corresponding to the CPU and store the data processing result into the memory, so that the CPU obtains the data processing result from the memory.
  7. The offload card equipped with an accelerator according to claim 6, wherein
    the offload card is further configured to determine the memory corresponding to the CPU, store the data processing result into the memory, and send storage information of the data processing result in the memory to the CPU, so that the CPU obtains the data processing result from the memory based on the storage information; or
    the offload card is further configured to determine the memory corresponding to the CPU and store the data processing result into a preset storage region in the memory, so that the CPU obtains the data processing result from the preset storage region in the memory.
  8. The offload card equipped with an accelerator according to claim 1, wherein the offload card communicates with a CPU;
    the offload card is further configured to send the data to be processed to the CPU when it is determined that the data type of the data to be processed meets a CPU processing condition.
  9. The offload card equipped with an accelerator according to claim 3, wherein
    the offload card is further configured to determine the data type of the data to be processed and the data types processed by the at least two accelerators;
    determine, when a data type processed by the at least two accelerators matches the data type of the data to be processed, that the data type of the data to be processed meets the accelerator processing condition; and
    determine, as the target accelerator, the accelerator among the at least two accelerators that processes the data type of the data to be processed.
  10. The offload card equipped with an accelerator according to claim 2 or 3, wherein the data types processed by the accelerators include an artificial intelligence type, a machine learning type, a graphics type, a data security type and a data computation type.
  11. A data processing method, applied to an offload card equipped with an accelerator, the method comprising:
    receiving a data processing request, wherein the data processing request carries data to be processed;
    processing the data to be processed and feeding back the obtained data processing result when it is determined that the data type of the data to be processed meets an offload card processing condition; or
    sending the data to be processed to the accelerator and feeding back the data processing result obtained by the accelerator when it is determined that the data type of the data to be processed meets an accelerator processing condition.
  12. A data processing system, the system comprising a CPU, a memory, and an offload card equipped with an accelerator, wherein
    the offload card is configured to receive a data processing request, wherein the data processing request carries data to be processed, and, when it is determined that the data type of the data to be processed meets an offload card processing condition, process the data to be processed and feed back the obtained data processing result to the memory; or
    when it is determined that the data type of the data to be processed meets an accelerator processing condition, send the data to be processed to the accelerator and feed the data processing result obtained by the accelerator back to the memory;
    the CPU is configured to obtain the data processing result from the memory.