WO2023082609A1 - Data processing method and related device - Google Patents

Data processing method and related device

Info

Publication number
WO2023082609A1
WO2023082609A1 (PCT/CN2022/095908)
Authority
WO
WIPO (PCT)
Prior art keywords
processor
execution information
computing device
data processing
network card
Prior art date
Application number
PCT/CN2022/095908
Other languages
English (en)
French (fr)
Inventor
张蔚
周敏均
周辉
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023082609A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306Intercommunication techniques
    • G06F15/17331Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Definitions

  • The present application relates to the field of communications, and in particular to a data processing method and related device.
  • Remote direct memory access (RDMA) technology is a communication technology for directly accessing remote memory: data can be migrated quickly and directly from a client into remote memory without the operating systems (OS) on either side intervening.
  • Based on current RDMA technology, when a source device needs to instruct a remote device to process a task, the source device first sends the task processing request to the remote device through an RDMA send command. After the RDMA network interface card (RNIC) of the remote device receives the task processing request, it first stores the request in the memory of the remote device; the operating system of the remote device then parses the request and calls the central processing unit (CPU) of the remote device to execute it.
  • However, the operating system schedules the CPU relatively inefficiently, which reduces the efficiency of task processing.
  • The embodiments of the present application provide a data processing method and related device, used to improve task execution efficiency when a processor must be called remotely to execute a task.
  • The method in the embodiments of the present application is executed by a computing device.
  • The computing device includes a network card, a processor, and a processor scheduling engine.
  • The network card of the computing device receives a data processing request that includes first execution information of the data to be processed, and the network card obtains the first execution information directly from the request. After acquiring the first execution information, the network card converts it into second execution information, where the second execution information includes the first execution information.
  • The network card sends the second execution information to the processor scheduling engine, and the processor scheduling engine invokes the processor. After the processor obtains the second execution information, it processes the data to be processed according to that information.
  • After the network card receives the data processing request, it parses the request directly, obtains the first execution information from it, converts the first execution information into the second execution information, and sends it to the processor scheduling engine, which directly invokes the processor to process the data to be processed.
  • There is no need to write the data processing request into the memory of the computing device and have the OS call the processor to parse it; avoiding the OS call improves execution efficiency.
  • The data processing request includes the first execution information of the data to be processed, where the data to be processed is code to be executed and the first execution information includes the first storage address of the code.
  • After obtaining the first execution information, the network card converts it into the second execution information and sends the second execution information to the processor scheduling engine, which calls the processor to process it.
  • The processor obtains the code from the computing device according to the first storage address in the second execution information, and executes the code.
  • Because the network card obtains the first execution information directly from the request and carries the first storage address over into the second execution information, the processor, once invoked, executes the corresponding code directly from that address. The data processing request does not have to be written into the memory of the computing device and then fetched by a processor scheduled by the OS, which saves acquisition time and improves the efficiency of processing the data.
  • The computing device also includes a memory. The data processing request includes the first execution information, the data to be processed is code to be executed, and the first execution information includes the first storage address of the code and the context required for executing the code. The network card parses the data processing request to obtain the context from the first execution information, stores the context in the memory, and obtains the second storage address of the context in the memory.
  • The network card can store the context in memory through direct memory access (DMA), so the operating system neither needs to call the processor to parse the request and extract the context, nor to call the processor to copy the context into memory, which improves the efficiency of processing the pending data.
  • The network card converts the first storage address and the second storage address into the second execution information.
  • When the processor is invoked by the processor scheduling engine, it reads the first and second storage addresses from the second execution information, obtains the code from the first storage address, obtains the context from the second storage address, and executes the code using the context.
  • When executing the code requires a context, the network card can encapsulate the context's storage address into the second execution information, so the processor never has to parse the data processing request itself, which improves processing efficiency.
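As a concrete illustration of this two-address descriptor, below is a minimal C sketch; the struct name and fields (exec_descriptor, code_addr, context_addr, context_len, flags) are assumptions made for this example, not the patent's actual wire format.

```c
#include <stdint.h>

/* Hypothetical layout of the "second execution information" descriptor
 * that the network card hands to the processor scheduling engine.
 * Field names and sizes are illustrative assumptions. */
struct exec_descriptor {
    uint64_t code_addr;    /* first storage address: where the code lives  */
    uint64_t context_addr; /* second storage address: where the network
                              card placed the context (0 if stateless)     */
    uint32_t context_len;  /* length of the stored context in bytes        */
    uint32_t flags;        /* e.g. a bit marking the request as stateful   */
};
```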
  • The data processing request carries an execution command defined in the RDMA protocol.
  • The data processing request includes a basic transmission header (BTH) and an extension header: the BTH includes the execution command, and the extension header includes the first execution information.
  • The execution command instructs the network card to process the data processing request itself.
  • Because the request carries the execution command, it does not need to be transferred to memory and then parsed by a processor called by the operating system. This avoids repeatedly copying the request, which would waste memory resources, and improves efficiency because no processor has to parse the request.
  • The execution command is a command defined in a custom field of the transport layer of the RDMA protocol, and it instructs the network card to parse the data processing request directly.
  • Writing the execution command into a custom field of the RDMA transport layer means that each transmitted data processing request includes the execution command, so that after the network card receives such a request it parses it directly, without a processor having to do so, improving efficiency.
  • The processor scheduling engine is a packet order enforcer (POE).
  • Scheduling the processor through the processor scheduling engine avoids calling it through the OS and improves processing efficiency.
  • The computing device includes a host connected to the network card, and the host includes the processor and the processor scheduling engine.
  • When the network card obtains a data processing request, it directly obtains the first execution information from the request and, after acquiring it, converts the first execution information into the second execution information.
  • The network card sends the second execution information to the processor scheduling engine of the host, and the processor scheduling engine calls the processor to process it.
  • The processor processes the data to be processed according to the second execution information.
  • After the network card receives the data processing request, it parses the request directly, obtains the first execution information, converts it into the second execution information, and sends it to the processor scheduling engine, which directly calls the processor to execute the data to be processed. The data processing request never has to be transferred to memory with the processor then being called through the OS, which improves execution efficiency.
  • The computing device is a data processing unit that includes the network card, the processor, and the processor scheduling engine.
  • The data to be processed is stored in another computing device.
  • When the network card obtains a data processing request, it directly obtains the first execution information from the request and converts it into the second execution information.
  • The network card sends the second execution information to the processor scheduling engine, which calls the processor in the computing device to process it.
  • The processor processes the data to be processed in the other computing device according to the second execution information.
  • The network card includes a first processor and a storage device. The storage device stores program instructions, and the first processor runs the program instructions to perform the following: receive a data processing request that includes first execution information of the data to be processed; obtain the first execution information; convert the first execution information into second execution information; and send the second execution information to the processor scheduling engine, where the second execution information instructs the processor scheduling engine to schedule the second processor so that the second processor processes the data to be processed according to the second execution information.
  • The network card obtains the first execution information directly from the data processing request, where the first execution information includes the address of the code to be processed within the computing device, and carries the first storage address over into the second execution information, so that the processor, once invoked, executes the corresponding code directly from that address. The request does not have to be sent to the memory of the computing device and then fetched by an OS-scheduled processor, which saves acquisition time and improves the efficiency of processing the data.
  • The data processing request is carried in an execution command of the remote direct memory access (RDMA) protocol, and the execution command instructs the network card to process the data processing request.
  • The execution command is a command defined in a custom field of the transport layer of the RDMA protocol.
  • The computing device includes the network card of the second aspect, a second processor, and a processor scheduling engine.
  • The network card is configured to receive a data processing request, where the data processing request includes first execution information of the data to be processed.
  • The network card is further configured to obtain the first execution information from the data processing request.
  • The network card is further configured to convert the first execution information into second execution information, where the second execution information includes the first execution information.
  • The network card is further configured to send the second execution information to the processor scheduling engine.
  • The processor scheduling engine is configured to call the second processor to process the second execution information.
  • The second processor is configured to process the data to be processed according to the second execution information.
  • The data to be processed is code to be executed, and the first execution information includes the first storage address of the code.
  • The second processor is specifically configured to acquire the code according to the first storage address in the second execution information, and to execute the code.
  • The computing device further includes a memory. The data processing request includes the first execution information, the data to be processed is code to be executed, and the first execution information includes the first storage address of the code and the context required for executing it.
  • The network card parses the data processing request, obtains the context from the first execution information, stores the context in the memory, and obtains the second storage address of the context in the memory.
  • The network card may store the context in the memory through DMA, so the operating system neither needs to call the second processor to parse the data processing request and extract the context, nor to call the second processor to copy the context to the memory, which improves the efficiency of processing the pending data.
  • The network card converts the first storage address and the second storage address into the second execution information.
  • When the second processor is invoked by the processor scheduling engine, it reads the first and second storage addresses from the second execution information, obtains the code according to the first storage address, obtains the context according to the second storage address, and executes the code using the context.
  • When executing the code requires a context, the network card can encapsulate the context's storage address into the second execution information without the processor having to parse the data processing request, which improves processing efficiency.
  • The data processing request is carried in an execution command of the remote direct memory access (RDMA) protocol.
  • The execution command is a command defined in a custom field of the transport layer of the RDMA protocol.
  • The processor scheduling engine is a packet order enforcer (POE).
  • The computing device includes a host connected to the network card, and the host includes the second processor and the processor scheduling engine.
  • The processor is a data processor (DPU), the computing device is connected to another computing device, and the data to be processed is stored in the other computing device.
  • FIG. 1 is a schematic structural diagram of an embodiment of a data processing system provided by the present application;
  • FIG. 2 is a schematic diagram of the commands defined by the RDMA protocol in an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of an embodiment of the second computing device in FIG. 1;
  • FIG. 4 is a schematic structural diagram of an embodiment of a network card provided by the present application;
  • FIG. 5 is a schematic flowchart of the first embodiment of the data processing method provided by the present application;
  • FIG. 6 is a schematic diagram of one format of the data processing request provided by the present application;
  • FIG. 7 is a schematic diagram of another format of the data processing request provided by the present application;
  • FIG. 8 is a schematic diagram of an embodiment in which the second computing device of the present application processes data;
  • FIG. 9 is a schematic structural diagram of another embodiment of a data processing system provided by the present application;
  • FIG. 10 is a schematic structural diagram of an embodiment of the second computing device in FIG. 9;
  • FIG. 11 is a schematic flowchart of the second embodiment of the data processing method provided by the present application;
  • FIG. 12 is a schematic diagram of another embodiment in which the second computing device of the present application processes data.
  • RDMA technology can transfer data directly into the storage area of a remote computing device, letting data move quickly from the storage area of one computing device to that of another; during access to the remote storage area the processor need not be called, which improves the efficiency of data access on the remote computing device.
  • In other application scenarios, besides accessing the storage area of a remote computing device, RDMA technology can also be used to call on the computing power of the remote computing device.
  • Specifically, the scheduling device that issues the scheduling request (the source device) sends a request message to the scheduled remote computing device. The request message carries specific task information, such as a function computation or information processing; the remote computing device executes the corresponding task according to the request message and returns the result of the task to the source device.
  • The remote computing device needs its operating system to schedule a processor to obtain the request message through polling, interrupts, or similar operations, and while executing the task the operating system must schedule the processor to parse and copy the request message. These operations increase the latency of the task and keep the processor running under high load, increasing its power consumption.
  • FIG. 1 is a schematic structural diagram of an embodiment of a data processing system provided by the present application.
  • The system 10 includes a first computing device 11 and a second computing device 12.
  • The first computing device 11 in FIG. 1 serves as the source device that sends a data processing request.
  • The data processing request instructs the second computing device 12 to process the data to be processed; for example, the data to be processed is a computing task.
  • The second computing device 12 is a remote device that stores the data to be processed, such as a computing task.
  • The computing task may be, for example, a section of program code, such as a function, and the second computing device 12 is also used to execute the task indicated by the data processing request.
  • The second computing device 12 may be a device such as a server, a computer, or a tablet computer.
  • For ease of description, the following takes a computing task as the example of the data to be processed.
  • The first computing device 11 and the second computing device 12 communicate through the RDMA protocol.
  • In addition to the basic operation commands executed by the RDMA protocol, the present application adds a new command, namely the execute (Excute) command.
  • This command is defined in a custom field of the transport layer of the RDMA protocol.
  • FIG. 2 is a schematic diagram of the commands defined by the RDMA protocol in an embodiment of the present application.
  • The commands currently defined in the RDMA protocol include a send (Send) command, a write (Write) command, a read (Read) command, and an atomic (Atomic) command.
  • In an embodiment of the present application, a new command, the execute (Excute) command, is defined in the OPCode field. The execute command instructs the network card in the second computing device 12, upon receiving the command, to directly parse the carried data, such as a data processing request, without transferring the carried data to the memory of the second computing device.
  • It should be noted that "execution command" is merely a general term for commands that instruct the network card to parse a data processing request directly; it does not refer to one particular command or set of commands. In practical applications, such a command might go by another name, which is not limited here; the embodiments of this application simply use "execution command" as an example.
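For illustration, reserving one more OPCode value alongside the existing RDMA commands could be sketched as below; the numeric values are placeholders for this example and are not taken from the InfiniBand/RDMA specification or from the patent.

```c
/* Sketch: extending the RDMA transport-layer OPCode set with an
 * execute (Excute) command. All values are illustrative placeholders. */
enum rdma_opcode {
    OP_SEND    = 0x00,  /* existing basic operation commands */
    OP_WRITE   = 0x01,
    OP_READ    = 0x02,
    OP_ATOMIC  = 0x03,
    OP_EXECUTE = 0x1f,  /* assumed custom-field value: tells the network
                           card to parse the request itself instead of
                           writing it to host memory */
};
```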
  • Specifically, FIG. 3 is a schematic structural diagram of an embodiment of a computing device provided by the present application, and FIG. 4 is a schematic structural diagram of an embodiment of a network card provided by the present application.
  • In this embodiment, the second computing device 12 includes the network card 121 shown in FIG. 4, a second processor 122, and a processor scheduling engine 123.
  • The network card 121 is connected to the processor scheduling engine 123, and the processor scheduling engine 123 is connected to the second processor 122.
  • The second computing device 12 may include a host 124 connected to the network card 121, with the second processor 122 and the processor scheduling engine 123 arranged in the host 124.
  • The second processor 122 may be a central processing unit (CPU) or a graphics processing unit (GPU), which is not specifically limited here.
  • The processor scheduling engine 123 may be a packet order enforcer (POE).
  • The network card 121 includes a first processor 1211 and a storage device 1212.
  • The network card 121 is configured to receive a data processing request from the first computing device 11 and to parse the request.
  • Specifically, program instructions are stored in the storage device 1212.
  • In response to the execution command, the first processor 1211 runs the program instructions to perform the following operations: receive and parse the data processing request; after parsing the computing task information out of the request, convert the computing task information into scheduling information, where the scheduling information can be used to schedule the second processor 122; and then send the scheduling information to the processor scheduling engine 123.
  • The processor scheduling engine 123 can, directly according to the scheduling information, schedule the second processor 122 to process the computing task indicated by that information.
  • Thus, when the second computing device 12 executes a computing task, the operating system does not need to schedule the second processor 122 to do so, which improves the execution efficiency of the task.
  • In the following embodiments, for ease of description, the information of the computing task is called the first execution information, and the scheduling information is called the second execution information.
  • The second computing device may also include a memory 125 connected to the second processor 122.
  • The memory 125 is used to store computing tasks, so that the second processor 122 obtains and processes a computing task from the memory 125 as indicated by the second execution information.
  • The memory 125 may be a nonvolatile memory such as a read-only memory (ROM), a flash memory, or a disk.
  • Optionally, the network card 121 further includes a direct memory access (DMA) controller 1213, which is connected to the first processor 1211.
  • The second computing device 12 also includes a memory 126 connected to the DMA controller 1213 and to the second processor 122.
  • The memory 126 is a random access memory (RAM).
  • When the network card 121 finds that the first execution information also includes state information needed to execute the computing task, such as a context, the DMA controller 1213 transfers the state information to the memory 126 by DMA, so that the second processor 122 can obtain the state information directly from the memory 126 and process the computing task.
  • For ease of description, the following takes a context as the example of the state information.
  • FIG. 5 is a schematic flowchart of the first embodiment of the data processing method provided by the present application. This embodiment includes the following steps.
  • 501. The network card of the second computing device receives a data processing request sent by the first computing device.
  • In this embodiment of the application, the first computing device sends a data processing request to the second computing device, and the network card of the second computing device receives that request.
  • The data processing request includes the first execution information of a computing task.
  • The computing task is code to be executed; the code to be executed is, for example, algorithm code or function code.
  • The first execution information includes the first storage address of the code in the second computing device. Before the first computing device sends the data processing request to the second computing device, it needs to obtain the first execution information.
  • To improve task execution efficiency, the computing task may be pre-stored in the memory of the second computing device, and the first computing device stores the mapping between the computing task and the first storage address.
  • When the first computing device needs the second computing device to perform a task such as a function computation based on the computing task, the first computing device obtains the first storage address from the mapping and encapsulates it into the data processing request.
  • As shown in FIG. 6, FIG. 6 is a schematic diagram of one format of the data processing request provided by this application.
  • The data processing request includes a basic transmission header (BTH) and an extension header.
  • The operation code (opcode, OP) in the BTH includes the execution command, and the extension header includes the first execution information.
  • A data processing request of this kind, which carries no context, is a stateless execution request: the computing task can be run directly by the processor and output its result without additional input data.
  • As shown in FIG. 7, FIG. 7 is a schematic diagram of another format of the data processing request provided by the present application.
  • The data processing request includes a BTH and an extension header; the OP in the BTH includes the execution command, and the extension header includes the first execution information.
  • Unlike the format in FIG. 6, in the format of FIG. 7 the first execution information carried in the extension header also includes a context.
  • The context is, for example, data required while running the computing task, such as parameters, initial values, or value ranges.
  • A data processing request of this kind, which carries a context, is a stateful execution request: the context must be supplied before the processor can process the computing task.
  • In FIG. 6 and FIG. 7, the cyclic redundancy check (CRC) is a hash function that produces a short, fixed-length check code from data such as network packets or computer files. It is used to detect or verify errors that may occur after data is transmitted or stored, using division and remainders to detect the errors.
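To make the two formats concrete, here is a rough C view of the request layouts of FIG. 6 and FIG. 7, assuming fixed-width fields for readability; the real BTH carries additional standard fields, and the extension-header layout is not specified at this level of detail in the text.

```c
#include <stdint.h>

/* Sketch of the data processing request of FIG. 6/7.
 * Field order and sizes are assumptions for illustration. */
struct bth {
    uint8_t opcode;         /* OP field; OP_EXECUTE marks an execution request */
    /* ... other standard BTH fields (partition key, PSN, ...) omitted ...     */
};

struct exec_ext_header {
    uint64_t function_addr; /* first storage address (FA) of the code          */
    uint32_t context_len;   /* 0 => stateless request (FIG. 6)                 */
    uint8_t  context[];     /* inline context for stateful requests (FIG. 7),
                               e.g. parameters, initial values, value ranges   */
};
/* The packet ends with the CRC check code described above. */
```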
  • 502. The network card of the second computing device obtains the first execution information from the data processing request.
  • Specifically, after receiving the data processing request, the network card of the second computing device parses it and obtains the execution command from the transport header of the request.
  • When the command obtained from the transport header is the execute command, the network card further obtains the first execution information from the extension header of the request and performs step 503.
  • That is, parsing of the data processing request is completed by the network card, rather than the request first being stored in memory and the operating system then scheduling a processor to parse it there. Since neither the operating system nor the processor participates, and the operation of writing the request into memory is eliminated, parsing efficiency improves and processor power consumption drops.
  • 503. The network card of the second computing device converts the first execution information into the second execution information.
  • Specifically, the network card of the second computing device encapsulates the parsed first execution information into the second execution information.
  • The second execution information is a descriptor, so that the processor scheduling engine can recognize the second execution information and carry out the subsequent processor-scheduling operations.
  • When the data processing request is a stateless execution request, the first execution information includes the first storage address, and the network card encapsulates the first storage address into a descriptor to obtain the second execution information.
  • When the data processing request is a stateful execution request, the first execution information includes the first storage address and the context, and the network card encapsulates the first storage address and the second storage address into a descriptor to obtain the second execution information.
  • When the first execution information includes a context, the network card, after parsing out the context, stores it in the memory of the second computing device and acquires the second storage address of the context in that memory.
  • The network card writes the context into the memory of the second computing device, for example, by DMA.
  • Specifically, the DMA controller in the network card requests a DMA transfer of the context from the second processor; the second processor permits the DMA transfer and configures the main-memory start address at which the DMA controller is to write the context; and the DMA controller writes the context into memory starting at that address.
  • The second storage address of the context in memory thus runs from the main-memory start address to the start address plus the length of the context.
  • Because the second storage address is determined by the start address configured by the second processor and the length of the context, the second processor can later obtain the context directly from memory, without the operating system calling it to extract the context from the data processing request and copy it into memory. This reduces processor power consumption and improves the efficiency with which the processor handles computing tasks.
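In software terms, the bookkeeping described above is just base-plus-length arithmetic; the sketch below models it with memcpy standing in for the actual DMA engine transfer, and the function name is an assumption for this example.

```c
#include <stdint.h>
#include <string.h>

/* Sketch: store the parsed context at the main-memory start address
 * configured by the second processor, and report where it was placed.
 * memcpy stands in for the DMA transfer a real NIC would issue. */
static uint64_t dma_store_context(uint8_t *host_mem_base, uint64_t start_off,
                                  const uint8_t *ctx, uint32_t ctx_len)
{
    memcpy(host_mem_base + start_off, ctx, ctx_len);
    /* The second storage address runs from start_off to
     * start_off + ctx_len; the caller records that range. */
    return start_off;
}
```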
  • The first storage address in the first execution information may be a function address (FA) in virtual form.
  • After the network card of the second computing device parses out the first storage address in virtual-address form, it consults an address table (AT).
  • The address table contains the mapping between the virtual address and the actual physical address at which the code is stored.
  • The network card of the second computing device looks up, in the address table, the actual physical address corresponding to the virtual-form first storage address.
  • The first storage address in the second execution information is then an actual physical address.
  • Replacing the code's address with a virtual one prevents the address of the code from leaking during transmission and improves the security of the transmission process.
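A toy version of that address-table lookup might look as follows; the table shape and the linear scan are assumptions to keep the illustration short, where a real network card would more likely use a hash table or CAM.

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of the address table (AT): virtual function address ->
 * actual physical address of the stored code. */
struct at_entry {
    uint64_t virt_fa;   /* virtual function address carried in the request */
    uint64_t phys_addr; /* actual physical address of the code             */
};

static uint64_t at_translate(const struct at_entry *at, size_t n,
                             uint64_t virt_fa)
{
    for (size_t i = 0; i < n; i++)
        if (at[i].virt_fa == virt_fa)
            return at[i].phys_addr;
    return 0; /* unmapped address: the request can be rejected */
}
```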
  • 504. The network card of the second computing device sends the second execution information to the processor scheduling engine of the second computing device.
  • After obtaining the second execution information, the network card of the second computing device sends it to the processor scheduling engine so as to notify the engine to schedule the second processor.
  • The processor scheduling engine of the second computing device then calls the second processor of the second computing device.
  • A processor scheduling engine is hardware that can call a processor directly.
  • After obtaining the second execution information, the processor scheduling engine of the second computing device calls an idle second processor.
  • The network card generates the second execution information used to call the processor and sends it to the processor scheduling engine, and the engine schedules the second processor to execute the computing task; the operating system never schedules the second processor, which improves execution efficiency.
  • The second processor of the second computing device processes the computing task according to the second execution information.
  • Specifically, when the second execution information includes the first storage address, the second processor obtains the code from the memory of the second computing device according to the first storage address in the second execution information, and processes it.
  • The processor scheduling engine of the second computing device directly calls an idle second processor; after receiving the code address in the second execution information, the second processor modifies the value of its program counter (PC) so that it fetches the code from the address the PC points to (that is, the first storage address).
  • The PC is a control register in the second processor that stores the address of an instruction; it contains the address (location) of the instruction currently being executed.
  • After each instruction is fetched, the program counter is updated to point to the next instruction in sequence. Since most instructions execute sequentially, modifying the PC usually just means adding the instruction's byte length to it.
  • When a transfer (branch) instruction executes, its final result is to change the value of the PC, and that PC value is the target address being transferred to.
  • The second processor always fetches, decodes, and executes instructions according to the PC, which is how program transfer is realized.
  • Because the address of the subsequent instruction (that is, the content of the PC) may be formed either by incrementing or by loading a branch target, the program counter is structured to serve two functions: registering information and counting.
  • Modifying the value of the PC may be done sequentially or through a program transfer, which is not limited in this application.
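In software terms, pointing the PC at the first storage address is what a call through a function pointer does; below is a minimal sketch under the assumption that the stored code is a function taking the context as its argument (all names are illustrative).

```c
#include <stdint.h>

/* Sketch: enter the code at the first storage address. Setting the PC to
 * code_addr is modeled as a call through a function pointer; context is
 * NULL for a stateless request. */
typedef uint64_t (*task_fn)(void *context);

static uint64_t run_task(uint64_t code_addr, void *context)
{
    task_fn fn = (task_fn)(uintptr_t)code_addr; /* PC := first storage address */
    return fn(context);                         /* execute; return the result  */
}
```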
  • When the second execution information includes the first storage address and the second storage address of the context, the second processor obtains the code from the second computing device according to the first storage address, acquires the context from the memory of the second computing device according to the second storage address, and executes the code using the context; that is, it brings the context into the code and then executes the code.
  • Specifically, the processor scheduling engine of the second computing device directly calls an idle second processor. After obtaining the first storage address and the second storage address, the second processor modifies the value of the PC, obtains the code according to the first storage address and the context according to the second storage address, and executes the code using the context to generate result information.
  • The network card of the second computing device receives the execution result.
  • The network card of the second computing device sends the execution result to the first computing device.
  • Specifically, after receiving the result information, the network card of the second computing device sends the execution result to the first computing device.
  • FIG. 8 is a schematic diagram of an embodiment in which the second computing device of the present application processes data.
  • The network card parses the data processing request and, when the opcode in the basic transmission header of the request is Excute, further obtains the first execution information from the extension header of the request.
  • When the data processing request is a stateless execution request, the network card encapsulates the first storage address (function address) in the first execution information as the second execution information and sends the second execution information to the processor scheduling engine, and the engine calls an idle second processor to execute the code at the first storage address.
  • When the data processing request is a stateful execution request, the network card on the one hand stores the context in the memory of the second computing device, namely the stack, and obtains the second storage address (context address) of the context in memory, and on the other hand encapsulates the first storage address and the second storage address as the second execution information.
  • The network card sends the second execution information to the processor scheduling engine, which calls an idle second processor to obtain the code according to the first storage address and the context according to the second storage address; the processor then executes the code with the context.
  • By contrast, in the conventional approach, after receiving a data processing request the network card stores the request directly in memory, and the operating system schedules a processor to obtain the request by polling and to parse it, including any context the request carries.
  • Compared with that approach, this application has the following effects. The network card does not have to write the entire data processing request into memory, which reduces the latency of parsing the request. When the network card determines that there is a task to execute, it notifies the processor scheduling engine to actively call the second processor, so the operating system does not have to schedule the second processor through polling, interrupts, or similar mechanisms, which improves task-processing efficiency. When there is a context, the network card stores it directly at the location of the second storage address, and the second processor simply reads it from memory at that address, without having to parse the data processing request and copy the context there itself, which reduces the bandwidth and other overhead of the second processor and improves its execution efficiency. And from the handling of the data processing request through the execution of the task, the second processor never needs to be invoked through the OS, which improves the execution efficiency of computing tasks.
  • FIG. 9 is a schematic structural diagram of another embodiment of a data processing system provided by the present application.
  • The system 90 includes a first computing device 91, a second computing device 92, and a third computing device 93.
  • The first computing device 91 in FIG. 9 serves as the source device that sends a data processing request.
  • The second computing device 92 is a remote device configured to execute tasks upon request.
  • Specifically, the second computing device 92 is a DPU.
  • The third computing device 93 is a device that stores computing tasks; specifically, the third computing device 93 is a host that includes a central processing unit and a memory.
  • The first computing device 91 and the second computing device 92 communicate through a network supporting the RDMA protocol.
  • Information can be transferred between the second computing device 92 and the third computing device 93 through a system bus.
  • FIG. 10 is a schematic structural diagram of an embodiment of the second computing device in FIG. 9.
  • The second computing device 92 includes a network card 921, a processor 922, and a processor scheduling engine 923.
  • The network card 921 in this embodiment is the same as the network card 121 in FIG. 4, so the details are not repeated here.
  • The network card 921 is connected to the processor scheduling engine 923, and the processor scheduling engine 923 is connected to the processor 922.
  • The processor scheduling engine may be a packet order enforcer (POE).
  • The processor 922 may be a data processor.
  • The network card 921 is configured to receive a data processing request from the first computing device 91 and to parse it. After parsing the corresponding task information out of the request, the network card 921 sends the task information to the processor scheduling engine 923.
  • The processor scheduling engine 923 is configured to, after receiving the task information from the network card 921, schedule the processor 922 to process the computing task corresponding to that information. The processor 922 can thus be scheduled directly to execute a specific task, without the operating system scheduling it to parse the data processing request, which improves task execution efficiency.
  • FIG. 11 is a schematic flowchart of the second embodiment of the data processing method provided by the present application. This embodiment includes the following steps.
  • The network card of the second computing device receives a data processing request sent by the first computing device.
  • This step is similar to step 501 and is not repeated here.
  • The network card of the second computing device obtains the first execution information from the data processing request.
  • This step is similar to step 502 and is not repeated here.
  • The network card of the second computing device converts the first execution information into the second execution information.
  • This step is similar to step 503 and is not repeated here.
  • The network card of the second computing device sends the second execution information to the processor scheduling engine of the second computing device.
  • This step is similar to step 504 and is not repeated here.
  • The processor scheduling engine of the second computing device calls the second processor of the second computing device.
  • The second processor of the second computing device acquires the computing task from the third computing device according to the second execution information.
  • In this embodiment, the computing task is stored in the memory of the host (that is, the third computing device). Therefore, before the processor of the second computing device processes the computing task, it needs to obtain the task from the third computing device according to the first storage address in the second execution information.
  • The second processor of the second computing device processes the computing task.
  • When the data processing request is a stateless execution request, the processor of the second computing device processes the computing task directly.
  • When the data processing request is a stateful execution request, the processor of the second computing device processes the computing task according to the context.
  • The network card of the second computing device receives the execution result.
  • The network card of the second computing device sends the execution result to the first computing device.
  • FIG. 12 is a schematic diagram of another embodiment in which the second computing device of the present application processes data.
  • The network card parses the data processing request and, when the opcode in the basic transmission header of the request is Excute, further obtains the first execution information from the extension header of the request.
  • When the data processing request is a stateless execution request, the network card encapsulates the first storage address in the first execution information as the second execution information and sends the second execution information to the processor scheduling engine, and the engine calls an idle processor to execute the code at the first storage address.
  • When the data processing request is a stateful execution request, the network card on the one hand transfers the context by DMA into the memory of the second computing device and obtains the second storage address of the context in that memory, and on the other hand encapsulates the first storage address and the second storage address as the second execution information.
  • The network card sends the second execution information to the processor scheduling engine, which calls an idle second processor to obtain the code from the memory of the third computing device according to the first storage address and the context according to the second storage address; the processor then executes the code with the context.
  • After the network card receives the data processing request, it parses the request directly, obtains the first execution information, converts it into the second execution information, and sends it to the processor scheduling engine, which directly calls the second processor to process the data to be processed. The data processing request never has to be sent to the memory of the computing device for the OS to call a processor to parse it, which avoids invoking the second processor through the OS and improves execution efficiency.
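Tying the pieces together, the receive path of these embodiments could be summarized in one hedged sketch that reuses the illustrative types from the earlier snippets (struct bth, struct exec_ext_header, struct exec_descriptor, OP_EXECUTE, dma_store_context); all names are assumptions rather than the patent's API, and error handling is elided.

```c
#include <stdint.h>

/* Provided by the scheduling-engine driver in this sketch (assumed). */
void poe_enqueue(const struct exec_descriptor *d);

/* End-to-end sketch of the network card's receive path: parse the request
 * (steps 501/502), build the descriptor (503), and hand it to the processor
 * scheduling engine (504), which then calls an idle processor. */
void nic_on_receive(const struct bth *hdr, const struct exec_ext_header *ext,
                    uint8_t *host_mem, uint64_t ctx_off)
{
    if (hdr->opcode != OP_EXECUTE)
        return;                        /* only Execute requests are parsed here */

    struct exec_descriptor d = {0};
    d.code_addr = ext->function_addr;  /* carry over the first storage address */

    if (ext->context_len > 0) {        /* stateful request: place the context
                                          in host memory via DMA first         */
        d.context_addr = dma_store_context(host_mem, ctx_off,
                                           ext->context, ext->context_len);
        d.context_len  = ext->context_len;
    }
    poe_enqueue(&d);                   /* the engine schedules an idle
                                          processor to run the code */
}
```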

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Computer And Data Communications (AREA)

Abstract

A data processing method and related device. The method is used in the field of communications, in particular in scenarios where a remote processor needs to be called to execute a task. The method is executed by a computing device that includes a network card, a processor, and a processor scheduling engine. The network card obtains, from a received data processing request, first execution information of the data to be processed and converts the first execution information into second execution information; the network card sends the second execution information to the processor scheduling engine, and the processor is invoked through the processor scheduling engine to perform the processing, without the data processing request having to be transferred to memory and the processor being called through the operating system.

Description

Data processing method and related device
This application claims priority to Chinese Patent Application No. 202111320330.5, filed with the Chinese Patent Office on November 9, 2021 and entitled "Data processing method and related device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communications, and in particular to a data processing method and related device.
Background
Remote direct memory access (RDMA) technology is a communication technology for directly accessing remote memory: data can be migrated quickly and directly from a client into remote memory without the operating systems (OS) on either side intervening.
Based on current RDMA technology, when a source device needs to instruct a remote device to process a task, the source device first sends the task processing request to the remote device through an RDMA send command. After the RDMA network interface card (RNIC) of the remote device receives the request, it first stores the request in the memory of the remote device; the operating system of the remote device then parses the request and calls the central processing unit (CPU) of the remote device to execute it. However, the operating system schedules the CPU relatively inefficiently, which reduces the efficiency of task processing.
Summary
Embodiments of this application provide a data processing method and related device, used to improve task execution efficiency when a remote processor needs to be called to execute a task.
A first aspect of the embodiments of this application provides a data processing method.
The method in the embodiments of this application is executed by a computing device. The computing device includes a network card, a processor, and a processor scheduling engine. The network card of the computing device receives a data processing request that includes first execution information of data to be processed, and obtains the first execution information directly from the request. After acquiring the first execution information, the network card converts it into second execution information, where the second execution information includes the first execution information. The network card sends the second execution information to the processor scheduling engine, the processor is invoked through the processor scheduling engine, and after the processor obtains the second execution information it processes the data to be processed according to that information.
After the network card receives the data processing request, it parses the request directly, obtains the first execution information from it, converts the first execution information into the second execution information, and sends it to the processor scheduling engine, which directly invokes the processor to process the data to be processed. There is no need to send the data processing request to the memory of the computing device and have the OS call the processor to parse it; avoiding the OS call improves execution efficiency.
Based on the first aspect, in an optional implementation:
The data processing request includes the first execution information of the data to be processed, where the data to be processed is code to be executed and the first execution information includes the first storage address of the code. After obtaining the first execution information, the network card converts it into the second execution information and sends the second execution information to the processor scheduling engine, which calls the processor to process it. The processor obtains the code from the computing device according to the first storage address in the second execution information, and executes the code.
The network card obtains the first execution information directly from the data processing request, where the first execution information includes the address of the code to be processed within the computing device, and carries the first storage address over into the second execution information, so that the processor, once invoked, executes the corresponding code directly from that address. The request does not have to be sent to the memory of the computing device and then fetched by an OS-scheduled processor, which saves acquisition time and improves the efficiency of processing the data.
Based on the first aspect, in an optional implementation:
The computing device also includes a memory. The data processing request includes the first execution information, the data to be processed is code to be executed, and the first execution information includes the first storage address of the code and the context required for executing the code. The network card parses the data processing request to obtain the context from the first execution information, stores the context in the memory, and obtains the second storage address of the context in the memory. Specifically, the network card may store the context in memory through direct memory access (DMA), so the operating system neither needs to call the processor to parse the request and extract the context, nor to call the processor to copy the context into memory, which improves the efficiency of processing the pending data. The network card converts the first storage address and the second storage address into the second execution information. The processor, invoked through the processor scheduling engine, reads the first and second storage addresses from the second execution information, obtains the code according to the first storage address, obtains the context according to the second storage address, and executes the code using the context.
When executing the code requires a context, the network card can encapsulate the context's storage address into the second execution information, so the processor never has to parse the data processing request itself, which improves processing efficiency.
Based on the first aspect, in an optional implementation:
The data processing request carries an execution command defined in the RDMA protocol. The request includes a basic transmission header (BTH) and an extension header: the BTH includes the execution command, and the extension header includes the first execution information. The execution command instructs the network card to process the data processing request.
Because the request contains the execution command, it does not need to be transferred to memory and then parsed by a processor called by the operating system. This avoids repeatedly copying the request, which would waste memory resources, and improves efficiency because no processor has to parse the request.
Based on the first aspect, in an optional implementation:
The execution command is a command defined in a custom field of the transport layer of the RDMA protocol, and it instructs the network card to parse the data processing request directly.
Writing the execution command into a custom field of the RDMA transport layer means that the transmitted data processing request includes the execution command, so that after the network card receives such a request it parses it directly, without a processor having to do so, improving efficiency.
Based on the first aspect, in an optional implementation:
The processor scheduling engine is a packet order enforcer.
Scheduling the processor through the processor scheduling engine avoids calling it through the OS and improves processing efficiency.
In an optional implementation based on the first aspect, the relationship among the network card, the processor, and the processor scheduling engine is as described in either of the following:
1. The computing device includes a host connected to the network card, and the host includes the processor and the processor scheduling engine. When the network card obtains a data processing request, it directly obtains the first execution information from the request and, after acquiring it, converts the first execution information into the second execution information. The network card sends the second execution information to the processor scheduling engine of the host, which calls the processor to process it. The processor processes the data to be processed according to the second execution information.
After the network card receives the data processing request, it parses the request directly, obtains the first execution information, converts it into the second execution information, and sends it to the processor scheduling engine, which directly calls the processor to execute the data to be processed. The request never has to be transferred to memory with the processor then being called through the OS, which improves execution efficiency.
2. The computing device is a data processing unit that includes the network card, the processor, and the processor scheduling engine. The data to be processed is stored in another computing device. When the network card obtains a data processing request, it directly obtains the first execution information from the request and converts it into the second execution information. The network card sends the second execution information to the processor scheduling engine, which calls the processor in the computing device to process it. The processor processes the data to be processed in the other computing device according to the second execution information.
A second aspect of the embodiments of this application provides a network card.
The network card includes a first processor and a storage device. The storage device stores program instructions, and the first processor runs the program instructions to perform the following: receive a data processing request that includes first execution information of data to be processed; obtain the first execution information from the request; convert the first execution information into second execution information; and send the second execution information to a processor scheduling engine, where the second execution information instructs the processor scheduling engine to schedule a second processor so that the second processor processes the data to be processed according to the second execution information.
The network card obtains the first execution information directly from the data processing request, where the first execution information includes the address of the code to be processed within the computing device, and carries the first storage address over into the second execution information, so that the processor, once invoked, executes the corresponding code directly from that address. The request does not have to be sent to the memory of the computing device and then fetched by an OS-scheduled processor, which saves acquisition time and improves the efficiency of processing the data.
Based on the second aspect, in a possible implementation:
The data processing request is carried in an execution command of the remote direct memory access (RDMA) protocol, and the execution command instructs the network card to process the data processing request.
Based on the second aspect, in a possible implementation:
The execution command is a command defined in a custom field of the transport layer of the RDMA protocol.
A third aspect of the embodiments of this application provides a computing device.
The computing device includes the network card of the second aspect, a second processor, and a processor scheduling engine. The network card is configured to receive a data processing request that includes first execution information of data to be processed; to obtain the first execution information from the request; to convert the first execution information into second execution information, where the second execution information includes the first execution information; and to send the second execution information to the processor scheduling engine. The processor scheduling engine is configured to call the second processor to process the second execution information. The second processor is configured to process the data to be processed according to the second execution information.
Based on the third aspect, in an optional implementation, the data to be processed is code to be executed, and the first execution information includes the first storage address of the code. The second processor is specifically configured to obtain the code according to the first storage address in the second execution information, and to execute the code.
Based on the third aspect, in an optional implementation, the computing device further includes a memory. The data processing request includes the first execution information, the data to be processed is code to be executed, and the first execution information includes the first storage address of the code and the context required for executing it. The network card parses the data processing request, obtains the context from the first execution information, stores the context in the memory, and obtains the second storage address of the context in the memory. Specifically, the network card may store the context in the memory through DMA, so the operating system neither needs to call the second processor to parse the request and extract the context, nor to call the second processor to copy the context into the memory, which improves the efficiency of processing the pending data. The network card converts the first storage address and the second storage address into the second execution information. The second processor, invoked through the processor scheduling engine, reads the first and second storage addresses from the second execution information, obtains the code according to the first storage address, obtains the context according to the second storage address, and executes the code using the context.
When executing the code requires a context, the network card can encapsulate the context's storage address into the second execution information, so the processor never has to parse the data processing request itself, which improves processing efficiency.
Based on the third aspect, in an optional implementation, the data processing request is carried in an execution command of the remote direct memory access (RDMA) protocol.
Based on the third aspect, in an optional implementation, the execution command is a command defined in a custom field of the transport layer of the RDMA protocol.
Based on the third aspect, in an optional implementation, the processor scheduling engine is a packet order enforcer.
Based on the third aspect, in an optional implementation, the computing device includes a host connected to the network card, and the host includes the second processor and the processor scheduling engine.
Based on the third aspect, in an optional implementation, the processor is a data processor (DPU), the computing device is connected to another computing device, and the data to be processed is stored in the other computing device.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the following briefly introduces the accompanying drawings needed for describing the embodiments or the prior art. Evidently, the drawings described below are merely embodiments of this application, and a person of ordinary skill in the art may derive other drawings from the provided drawings without creative effort.
FIG. 1 is a schematic structural diagram of an embodiment of a data processing system provided by this application;
FIG. 2 is a schematic diagram of the commands defined by the RDMA protocol in an embodiment of this application;
FIG. 3 is a schematic structural diagram of an embodiment of the second computing device in FIG. 1;
FIG. 4 is a schematic structural diagram of an embodiment of a network card provided by this application;
FIG. 5 is a schematic flowchart of the first embodiment of the data processing method provided by this application;
FIG. 6 is a schematic diagram of one format of the data processing request provided by this application;
FIG. 7 is a schematic diagram of another format of the data processing request provided by this application;
FIG. 8 is a schematic diagram of an embodiment in which the second computing device of this application processes data;
FIG. 9 is a schematic structural diagram of another embodiment of a data processing system provided by this application;
FIG. 10 is a schematic structural diagram of an embodiment of the second computing device in FIG. 9;
FIG. 11 is a schematic flowchart of the second embodiment of the data processing method provided by this application;
FIG. 12 is a schematic diagram of another embodiment in which the second computing device of this application processes data.
具体实施方式
下面结合本发明实施例中的附图对本发明实施例进行描述。本发明的实施方式部分使用的术语仅用于对本发明的具体实施例进行解释,而非旨在限定本发明。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本发明的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、***、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
RDMA is a technology that transfers data directly into the storage area of a remote computing device, allowing data to move quickly from the storage area of one computing device to that of another; in the course of accessing the remote device's storage area, no processor needs to be invoked, which improves the efficiency of data access on the remote device.
In other application scenarios, besides accessing the storage area of a remote computing device, RDMA can also be used to invoke the computing power of the remote device. Specifically, the scheduling device that issues the request (the source device) sends a request message to the scheduled remote computing device; the request message carries task information such as a specific function computation or information processing task; the remote computing device executes the corresponding task according to the request message and returns the result of the task to the source device. The remote computing device needs its operating system to schedule a processor to obtain the request message by polling or interrupts and, while executing the task, to schedule the processor to parse and copy the request message. These operations increase task-processing latency, keep the processor running under high load, and increase its power consumption.
To solve the problem of inefficient processor invocation through the operating system in RDMA-based transmission scenarios, especially in scenarios where tasks must be executed remotely, this application provides the following embodiments. As shown in FIG. 1, FIG. 1 is a schematic structural diagram of an embodiment of the data processing system provided by this application. The system 10 includes a first computing device 11 and a second computing device 12. The first computing device 11 in FIG. 1 acts as the source device and sends a data processing request. The data processing request instructs the second computing device 12 to process data to be processed, such as a computing task. The second computing device 12 is the remote device; it stores the data to be processed, for example a computing task, which may be a piece of program code such as a function, and it executes the task indicated by the data processing request. The second computing device 12 may be a server, a computer, a tablet, or a similar device. For ease of description, the data to be processed is described below using a computing task as an example.
The first computing device 11 and the second computing device 12 communicate over the RDMA protocol. In the embodiments of this application, beyond the basic operation commands of the RDMA protocol, a new command is added: the Execute command. It is a command defined in a custom field of the transport layer of the RDMA protocol. Referring to FIG. 2, FIG. 2 is a schematic diagram of the commands defined by the RDMA protocol in an embodiment of this application. The commands currently defined by the RDMA protocol are the Send command, the Write command, the Read command, and the Atomic command. In the embodiments of this application, a new command, the Execute command, is defined in the OPCode field; the Execute command instructs the network card in the second computing device 12, upon receiving it, to directly parse the carried data, such as the data processing request, without transferring the carried data into the memory of the second computing device. Note that "execution command" is only a general term for a command that instructs the network card to parse the data processing request directly, and does not refer to any particular command; in practice, a command that instructs the network card to parse the data processing request directly may go by another name, which is not limited here. The embodiments of this application use "execution command" only as an example. An illustrative opcode enumeration is sketched below.
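For illustration only, the relationship between the standard commands and the new Execute command can be pictured as a C enumeration. The numeric values below are assumptions made for this sketch; they are prescribed neither by the RDMA specification nor by this application.

    enum rdma_opcode {
        RDMA_OP_SEND    = 0x00,  /* Send */
        RDMA_OP_WRITE   = 0x01,  /* Write */
        RDMA_OP_READ    = 0x02,  /* Read */
        RDMA_OP_ATOMIC  = 0x03,  /* Atomic */
        RDMA_OP_EXECUTE = 0x1f   /* new Execute command carried in the BTH OPCode field */
    };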
Specifically, as shown in FIG. 3 and FIG. 4, FIG. 3 is a schematic structural diagram of an embodiment of the computing device provided by this application, and FIG. 4 is a schematic structural diagram of an embodiment of the network card provided by this application. In this embodiment, the second computing device 12 includes the network card 121 of FIG. 4, a second processor 122, and a processor scheduling engine 123. The network card 121 is connected to the processor scheduling engine 123, and the processor scheduling engine 123 is connected to the second processor 122. The second computing device 12 may include a host 124 connected to the network card 121, with the second processor 122 and the processor scheduling engine 123 disposed in the host 124. The second processor 122 may be a central processing unit (CPU) or a graphics processing unit (GPU), which is not limited here. The processor scheduling engine 123 may be a packet order enforcer (POE). The network card 121 includes a first processor 1211 and a storage device 1212.
The network card 121 is configured to receive the data processing request from the first computing device 11 and parse it. Specifically, the storage device 1212 stores program instructions. In response to the execution command, the first processor 1211 runs the program instructions to perform the following operations: receive and parse the data processing request; after parsing the information of the computing task from the request, convert the information of the computing task into scheduling information, which can be used to schedule the second processor 122; and then send the scheduling information to the processor scheduling engine 123. The processor scheduling engine 123 can schedule the second processor 122 directly according to the scheduling information to process the computing task it indicates. Thus, when the second computing device 12 executes a computing task, the operating system does not need to schedule the second processor 122 to execute it, which improves task-execution efficiency. In the following embodiments, for ease of description, the information of the computing task is called the first execution information, and the scheduling information is called the second execution information.
The second computing device may further include a storage 125 connected to the second processor 122. The storage 125 stores the computing task, so that the second processor 122 obtains and processes the computing task from the storage 125 as indicated by the second execution information. The storage 125 may be a non-volatile storage such as a read-only memory (ROM), a flash memory, or a magnetic disk.
Optionally, the network card 121 further includes a direct memory access (DMA) controller 1213 connected to the first processor 1211. The second computing device 12 further includes a memory 126 connected to the DMA controller 1213 and the second processor 122, where the memory 126 is a random access memory (RAM). When the network card 121 finds in parsing that the first execution information also includes state information required to execute the computing task, such as a context, the DMA controller 1213 transfers the state information into the memory 126 by DMA, so that the second processor 122 can obtain the state information directly from the memory 126 and thereby process the computing task. For ease of description, the state information is described below using a context as an example.
Based on the second computing device 12 described above, this application provides the following data processing method embodiment. As shown in FIG. 5, FIG. 5 is a schematic flowchart of the first embodiment of the data processing method provided by this application. This embodiment includes the following steps:
501. The network card of the second computing device receives the data processing request sent by the first computing device.
In this embodiment of the application, the first computing device sends a data processing request to the second computing device, and the network card of the second computing device receives the data processing request. The data processing request includes the first execution information of a computing task. The computing task is code to be executed, for example algorithm code or function code.
The first execution information includes the first storage address of the code in the second computing device. Before sending the data processing request to the second computing device, the first computing device needs to obtain the first execution information.
To improve task-execution efficiency, the computing task may be stored in advance in the storage of the second computing device, with the first computing device storing the mapping between computing tasks and first storage addresses. When the first computing device needs the second computing device to perform a task such as a function computation based on the computing task, the first computing device obtains the first storage address from the mapping and encapsulates the first storage address into the data processing request. A minimal sketch of such a lookup is given below.
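As a sketch only, the task-to-address mapping kept on the first computing device might be a simple table in C; the entry layout and the use of a numeric task identifier are assumptions made for illustration, not part of the application.

    #include <stdint.h>
    #include <stddef.h>

    struct task_map_entry {
        uint32_t task_id;      /* identifier the caller uses for the computing task */
        uint64_t first_addr;   /* first storage address on the second computing device */
    };

    /* Returns the first storage address to encapsulate into the data
     * processing request, or 0 if the task is unknown. */
    uint64_t lookup_first_addr(const struct task_map_entry *map, size_t n,
                               uint32_t task_id) {
        for (size_t i = 0; i < n; i++)
            if (map[i].task_id == task_id)
                return map[i].first_addr;
        return 0;
    }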
As shown in FIG. 6, FIG. 6 is a schematic diagram of one format of the data processing request provided by this application. The data processing request includes a basic transmission header (BTH) and an extension header. The opcode (OP) in the BTH carries the execution command, and the extension header carries the first execution information. A data processing request of this kind, which carries no context, is a stateless execution request: the computing task can be run by the processor directly and produce its result, without additional input data.
As shown in FIG. 7, FIG. 7 is a schematic diagram of another format of the data processing request provided by this application. The data processing request includes a BTH and an extension header; the OP in the BTH carries the execution command, and the extension header carries the first execution information. Unlike the format of the data processing request in FIG. 6, in the format of FIG. 7 the first execution information carried by the extension header also includes a context. The context is, for example, data needed while running the computing task, such as parameters, initial values, or value ranges. A data processing request of this kind, which carries a context, is a stateful execution request: the computing task can be processed by the processor only after the context is supplied to it.
In FIG. 6 and FIG. 7, the cyclic redundancy check (CRC) is a hash function that produces a short, fixed-length check code from data such as a network packet or a computer file; it is used to detect or verify errors that may occur during transmission or storage, using division and remainders for error detection. An illustrative layout of the two request formats is sketched below.
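As a sketch only, the two formats of FIG. 6 and FIG. 7 might be laid out as the following C structures. The field names, widths, and ordering are assumptions for illustration: the application fixes only the BTH, the extension header, and the CRC trailer, not a byte-level layout.

    #include <stdint.h>

    struct bth {                  /* basic transmission header */
        uint8_t  opcode;          /* OP field; RDMA_OP_EXECUTE for these requests */
        uint8_t  flags;
        uint16_t pkey;
        uint32_t dest_qp;
        uint32_t psn;
    };

    struct exec_ext_hdr {         /* extension header: first execution information */
        uint64_t function_addr;   /* first storage address of the code */
        uint32_t context_len;     /* 0 for a stateless request (FIG. 6) */
        uint8_t  context[];       /* inline context for a stateful request (FIG. 7) */
    };
    /* a CRC trailer follows the payload on the wire */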
502. The network card of the second computing device obtains the first execution information in the data processing request.
Specifically, after receiving the data processing request, the network card of the second computing device parses it and obtains the execution command from the transport header of the request. When the execution command the network card reads from the transport header is Execute, the network card further obtains the first execution information from the extension header of the request and performs step 503.
That is, parsing of the data processing request is done by the network card, rather than by first storing the request into memory and having the operating system schedule a processor to parse the request there. No operating system or processor involvement is required, and the operation of writing the request into memory is also avoided, which improves parsing efficiency and reduces processor power consumption. The sketch below illustrates this dispatch.
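A minimal sketch of that dispatch in network-card firmware, reusing the layouts sketched above. handle_standard_rdma() and nic_build_descriptor() are assumed helper routines introduced for this sketch, not names from the application.

    #include <stddef.h>

    /* assumed helpers, sketched separately */
    void handle_standard_rdma(const struct bth *hdr, const void *payload, size_t len);
    void nic_build_descriptor(const struct exec_ext_hdr *info);

    void nic_on_packet(const struct bth *hdr, const void *payload, size_t len) {
        if (hdr->opcode != RDMA_OP_EXECUTE) {
            /* normal Send/Write/Read/Atomic path */
            handle_standard_rdma(hdr, payload, len);
            return;
        }
        /* Execute: parse the extension header directly on the network card (step 502) */
        const struct exec_ext_hdr *info = (const struct exec_ext_hdr *)payload;
        nic_build_descriptor(info);   /* convert to second execution information (step 503) */
    }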
503. The network card of the second computing device converts the first execution information into the second execution information.
The network card of the second computing device encapsulates the parsed first execution information as the second execution information. The second execution information is a descriptor, so that the processor scheduling engine can recognize the second execution information and carry out the subsequent operations of scheduling the processor.
When the data processing request is a stateless execution request, the first execution information includes the first storage address, and the network card encapsulates the first storage address as a descriptor to obtain the second execution information.
When the data processing request is a stateful execution request, the first execution information includes the first storage address and the context, and the network card encapsulates the first storage address and the second storage address as a descriptor to obtain the second execution information.
When the first execution information includes a context, the network card, after parsing out the context, stores it into the memory of the second computing device and obtains the second storage address of the context in memory, for example by writing the context into the memory of the second computing device via DMA. Specifically, the DMA controller in the network card requests a DMA transfer of the context from the second processor; the second processor grants the DMA transfer and configures for the DMA controller the main-memory start address at which to write the context, and the DMA controller writes the context into memory according to that start address. The second storage address of the context in memory runs from the main-memory start address to the start address plus the length of the context; that is, the second storage address is determined from the start address configured by the second processor and the length of the context. The second processor can then obtain the context directly from memory, without the operating system invoking the second processor to extract the context from the data processing request and copy it into memory, which reduces processor power consumption and improves the processor's efficiency in processing computing tasks. A sketch of this conversion follows.
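Continuing the sketch, the conversion of step 503 might look as follows. dma_alloc_host_region(), dma_write(), and poe_enqueue() stand in for the DMA-controller interaction and the hand-off to the packet order enforcer; they, and the descriptor layout, are assumptions of this sketch.

    struct exec_descriptor {       /* second execution information (descriptor) */
        uint64_t code_addr;        /* first storage address */
        uint64_t context_addr;     /* second storage address; 0 for stateless requests */
        uint32_t context_len;
    };

    void nic_build_descriptor(const struct exec_ext_hdr *info) {
        struct exec_descriptor d = { .code_addr = info->function_addr };
        if (info->context_len > 0) {
            /* main-memory start address configured by the second processor */
            uint64_t start = dma_alloc_host_region(info->context_len);
            dma_write(start, info->context, info->context_len);
            d.context_addr = start;              /* second storage address: [start, start + len) */
            d.context_len  = info->context_len;
        }
        poe_enqueue(&d);   /* notify the processor scheduling engine (step 504) */
    }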
In other implementations, to prevent the data processing request from being hijacked in transit, which would leak the first storage address and expose the second computing device to malicious attack, and to improve transmission security, the first storage address in the first execution information may be a function address (FA), i.e. a virtual address. After parsing out the first storage address in virtual-address form, the network card of the second computing device obtains an address table (AT), which records the mapping between virtual addresses and the actual physical addresses where the code is stored, and looks up in the address table the actual physical address corresponding to that virtual first storage address. In this case, the first storage address in the second execution information is the actual physical address. Substituting a virtual address for the code's real address prevents the code's address from leaking in transit and improves transmission security. A minimal lookup sketch is given below.
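A minimal sketch of the address-table translation, under the assumption of a simple linear table; the entry layout and the behavior on a miss are illustrative choices, not specified by the application.

    struct at_entry {
        uint64_t virt;   /* function address (FA) carried in the request */
        uint64_t phys;   /* actual physical address where the code is stored */
    };

    uint64_t at_translate(const struct at_entry *at, size_t n, uint64_t fa) {
        for (size_t i = 0; i < n; i++)
            if (at[i].virt == fa)
                return at[i].phys;   /* placed into the second execution information */
        return 0;                    /* unknown FA: the request can be rejected */
    }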
504. The network card of the second computing device sends the second execution information to the processor scheduling engine of the second computing device.
After obtaining the second execution information, the network card of the second computing device sends it to the processor scheduling engine of the second computing device to notify the processor scheduling engine to schedule the second processor.
505. The processor scheduling engine of the second computing device invokes the second processor of the second computing device.
The processor scheduling engine is hardware capable of invoking a processor directly. After obtaining the second execution information, the processor scheduling engine of the second computing device invokes an idle second processor. In this embodiment of the application, the network card generates the second execution information for invoking the processor and sends the second execution information to the processor scheduling engine, which schedules the second processor to execute the computing task; the operating system does not need to schedule the second processor, which improves execution efficiency.
506. The second processor of the second computing device processes the computing task according to the second execution information.
In this embodiment of the application, when the second execution information includes the first storage address, the second processor obtains the code from the storage of the second computing device according to the first storage address in the second execution information and processes it. The processor scheduling engine of the second computing device directly invokes an idle second processor; after the second processor obtains the address of the code from the second execution information, the value of the second processor's program counter (PC) is modified so that the second processor fetches the code from the address the PC points to (i.e. the first storage address). Having fetched the code, the second processor executes it.
The PC is the control register in the second processor that stores the address of an instruction; it contains the address (location) of the instruction currently being executed. As each instruction is fetched, the address stored in the program counter is incremented, so that after each fetch the program counter points to the next instruction in sequence. Since most instructions execute in order, updating the PC is usually just a matter of adding the instruction size to it. On a program transfer, the final result of executing the transfer instruction is to change the PC's value, and that new PC value is the target address of the transfer. The second processor always fetches, decodes, and executes according to the PC, which is how program transfer is realized. For a transfer instruction such as a jump, the address of the next instruction (i.e. the content of the PC) must be taken from the address field in the instruction register; in that case the next instruction fetched from memory is determined by the transfer instruction rather than taken in the usual sequential way. The program counter therefore needs a structure with both a latching function and a counting function. In this embodiment, the PC's value may be modified sequentially or by a program transfer, which this application does not restrict. Conceptually, redirecting the PC to the first storage address amounts to calling the code at that address, as the sketch below shows.
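Viewed from C rather than from the register level, pointing the PC at the first storage address and executing is equivalent to an indirect call through a function pointer. The signature below is an assumption of this sketch, since the application does not define the code's calling convention; the descriptor type comes from the earlier sketch.

    #include <stdint.h>

    typedef uint64_t (*task_fn)(void);   /* assumed signature of a stateless task */

    uint64_t run_stateless(const struct exec_descriptor *d) {
        task_fn task = (task_fn)(uintptr_t)d->code_addr;  /* PC := first storage address */
        return task();                                    /* execute the code, yield the result */
    }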
In other embodiments, when the second execution information includes the first storage address and the second storage address of the context, the second processor obtains the code from the second computing device according to the first storage address, obtains the context from the memory of the second computing device according to the second storage address, and executes the code with the context, i.e. feeds the context into the code and then executes the code. Specifically, the processor scheduling engine of the second computing device directly invokes an idle second processor; after obtaining the first storage address and the second storage address, the second processor modifies the PC's value, obtains the code from the storage according to the first storage address and the context from memory according to the second storage address, and executes the code with the context to generate result information. A stateful variant of the earlier sketch follows.
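The stateful case extends the same sketch: the context is read from the second storage address and handed to the code, again under an assumed calling convention.

    typedef uint64_t (*ctx_task_fn)(const void *ctx, uint32_t len);  /* assumed signature */

    uint64_t run_stateful(const struct exec_descriptor *d) {
        /* the context was already DMA'd into memory by the network card */
        const void *ctx = (const void *)(uintptr_t)d->context_addr;
        ctx_task_fn task = (ctx_task_fn)(uintptr_t)d->code_addr;
        return task(ctx, d->context_len);   /* execute the code with the context */
    }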
507. The network card of the second computing device receives the execution result.
After executing the code, the second processor of the second computing device generates an execution result, and the network card of the second computing device receives the execution result sent by the second processor of the second computing device.
508. The network card of the second computing device sends the execution result to the first computing device.
After receiving the result information, the network card of the second computing device sends the execution result to the first computing device.
Steps 502 to 506 above are carried out cooperatively by the components in the second computing device; see FIG. 8, which is a schematic diagram of an embodiment of the second computing device of this application processing data. The network card parses the data processing request; when the basic transmission header of the request carries Execute, the network card further obtains the first execution information from the extension header of the request.
If the first execution information does not include a context, the network card encapsulates the first storage address (function address) in the first execution information as the second execution information and sends the second execution information to the processor scheduling engine, which invokes an idle second processor to execute the code corresponding to the first storage address.
If the first execution information includes the first storage address and a context, the network card on the one hand stores the context into the memory of the second computing device, i.e. the stack, and obtains the second storage address (context address) of the context in memory, and on the other hand encapsulates the first storage address and the second storage address as the second execution information. The network card sends the second execution information to the processor scheduling engine, which invokes an idle second processor to obtain the code according to the first storage address and the context according to the second storage address; the processor executes the code with the context.
It can be seen that in the related art, after receiving a data processing request the network card stores the request directly into memory, the operating system schedules a processor to poll for the request and parse it, and, when the request carries a context, the second processor must additionally copy the context into memory. Compared with that, this application has the following effects. The network card does not need to write the whole data processing request into memory, which reduces the latency of parsing the request. When the network card determines that there is a task to execute, it notifies the processor scheduling engine to actively invoke the second processor, so the operating system need not schedule the second processor to poll or take interrupts to determine whether an execution task has arrived, which reduces the second processor's bandwidth and other overheads and improves the efficiency of processing computing tasks. When a context exists, the network card stores the context directly at the location of the second storage address, and the second processor simply fetches it from memory according to the second storage address, without parsing the data processing request or copying the context into place, which again reduces the second processor's overheads and improves its execution efficiency. From the handling of the data processing request through the execution of the task, the second processor is never invoked through the OS, which improves the efficiency of executing computing tasks.
The above is an embodiment in which the second processor in the host processes the computing task. In other embodiments, the second computing device is a data processing unit (DPU), and a second processor in the data processing unit processes the computing task. Referring to FIG. 9, FIG. 9 is a schematic structural diagram of another embodiment of the data processing system provided by this application.
The system 90 includes a first computing device 91, a second computing device 92, and a third computing device 93. The first computing device 91 in FIG. 9 acts as the source device and sends the data processing request. The second computing device 92 is the remote device and executes tasks according to requests; the second computing device 92 is a DPU. The third computing device 93 is the device that stores the computing task; specifically, the third computing device 93 is a host that includes a central processing unit and memory.
The first computing device 91 and the second computing device 92 communicate over a network supporting the RDMA protocol. The second computing device 92 and the third computing device 93 can exchange information over a system bus.
As shown in FIG. 10, FIG. 10 is a schematic structural diagram of an embodiment of the second computing device in FIG. 9. In this embodiment, the second computing device 92 includes a network card 921, a processor 922, and a processor scheduling engine 923. The network card 921 in this embodiment is the same as the network card 121 in FIG. 4 and is not described again here. The network card 921 is connected to the processor scheduling engine 923, and the processor scheduling engine 923 is connected to the processor 922. The processor scheduling engine may be a packet order enforcer (POE). In this embodiment, the processor 922 may be a data processor.
The network card 921 is configured to receive the data processing request from the first computing device 91 and parse it. After parsing the corresponding task information from the data processing request, the network card 921 sends the task information to the processor scheduling engine 923. The processor scheduling engine 923 is configured to, upon receiving the task information from the network card 921, schedule the processor 922 to process the computing task and so execute the task corresponding to the task information. Thus, the operating system need not schedule the processor 922 to parse the data processing request; the processor 922 can be scheduled directly to execute the specific task, which improves task-execution efficiency.
Based on the second computing device 92 described above, this application provides the following data processing method embodiment. As shown in FIG. 11, FIG. 11 is a schematic flowchart of the second embodiment of the data processing method provided by this application. This embodiment includes the following steps:
1101. The network card of the second computing device receives the data processing request sent by the first computing device.
This step is similar to step 501 and is not described again here.
1102. The network card of the second computing device obtains the first execution information in the data processing request.
This step is similar to step 502 and is not described again here.
1103. The network card of the second computing device converts the first execution information into the second execution information.
This step is similar to step 503 and is not described again here.
1104. The network card of the second computing device sends the second execution information to the processor scheduling engine of the second computing device.
This step is similar to step 504 and is not described again here.
1105. The processor scheduling engine of the second computing device invokes the second processor of the second computing device.
1106. The second processor of the second computing device obtains the computing task from the third computing device according to the second execution information.
The computing task is stored in the storage of the host (i.e. the third computing device); therefore, before processing the computing task, the processor of the second computing device needs to obtain the computing task from the third computing device according to the first storage address in the second execution information.
1107. The second processor of the second computing device processes the computing task.
When the second execution information includes the first storage address but no context, the processor of the second computing device processes the computing task directly.
When the second execution information includes the first storage address and the second storage address, the processor of the second computing device processes the computing task according to the context obtained from memory via the second storage address.
1108. The network card of the second computing device receives the execution result.
After the processor of the second computing device executes the code, it generates an execution result, and the network card of the second computing device receives the execution result sent by the processor of the second computing device.
1109. The network card of the second computing device sends the execution result to the first computing device.
Steps 1102 to 1107 above are carried out cooperatively by the components in the second computing device; see FIG. 12, which is a schematic diagram of another embodiment of the second computing device of this application processing data. The network card parses the data processing request; when the basic transmission header of the request carries Execute, the network card further obtains the first execution information from the extension header of the request.
If the first execution information does not include a context, the network card encapsulates the first storage address in the first execution information as the second execution information and sends the second execution information to the processor scheduling engine, which invokes an idle processor to execute the code corresponding to the first storage address.
If the first execution information includes the first storage address and a context, the network card on the one hand DMAs the context into the memory of the second computing device and obtains the context's second storage address in memory, and on the other hand encapsulates the first storage address and the second storage address as the second execution information. The network card sends the second execution information to the processor scheduling engine, which invokes an idle second processor to obtain the code from the storage of the third computing device according to the first storage address and the context according to the second storage address; the processor executes the code with the context.
It can be seen that after receiving the data processing request the network card parses it directly, obtains the first execution information in the request, converts the first execution information into the second execution information, and sends it to the processor scheduling engine, which directly invokes the second processor to process the data to be processed. There is no need to send the data processing request into the memory of the computing device and then have the OS invoke a processor to parse the request; invoking the second processor through the OS is thereby avoided, which improves execution efficiency.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (15)

  1. A data processing method, executed by a computing device, the computing device comprising a network card, a processor, and a processor scheduling engine, wherein the method comprises:
    the network card receiving a data processing request, the data processing request comprising first execution information of data to be processed;
    the network card obtaining the first execution information from the data processing request;
    the network card converting the first execution information into second execution information;
    the network card sending the second execution information to the processor scheduling engine;
    the processor scheduling engine invoking the processor to process the second execution information;
    the processor processing the data to be processed according to the second execution information.
  2. The method according to claim 1, wherein the data to be processed is code to be executed, and the first execution information comprises a first storage address of the code;
    the processor processing the data to be processed according to the second execution information comprises:
    the processor obtaining the code according to the first storage address in the second execution information;
    the processor executing the code.
  3. The method according to claim 1, wherein the data to be processed is code to be executed, and the first execution information comprises a first storage address of the code and a context required to execute the code;
    after obtaining the first execution information from the data processing request, the network card stores the context into a memory of the computing device;
    the second execution information comprises the first storage address and a second storage address at which the context is stored;
    the processor processing the data to be processed according to the second execution information comprises:
    the processor obtaining the first storage address and the second storage address from the second execution information, obtaining the code according to the first storage address, obtaining the context according to the second storage address, and executing the code according to the context.
  4. The method according to any one of claims 1-3, wherein the data processing request is carried in an execution command of a remote direct memory access (RDMA) protocol.
  5. The method according to claim 4, wherein the execution command is a command defined in a custom field of a transport layer of the RDMA protocol.
  6. The method according to any one of claims 1-5, wherein the computing device comprises a host connected to the network card, the host comprising the processor and the processor scheduling engine.
  7. The method according to claim 2 or 3, wherein the computing device is a data processing unit (DPU), the computing device is connected to another computing device, the data to be processed is stored in the other computing device, and the processor obtaining the code according to the first storage address comprises:
    the processor obtaining the code from the other computing device according to the first storage address.
  8. A network card, comprising a first processor and a storage device, the storage device storing program instructions, the first processor running the program instructions to:
    receive a data processing request, the data processing request comprising first execution information of data to be processed;
    obtain the first execution information from the data processing request;
    convert the first execution information into second execution information, and send the second execution information to a processor scheduling engine, the second execution information being used to instruct the processor scheduling engine to schedule a second processor so that the second processor processes the data to be processed according to the second execution information.
  9. The network card according to claim 8, wherein the data processing request is carried in an execution command of a remote direct memory access (RDMA) protocol, the execution command being used to instruct the network card to process the data processing request.
  10. The network card according to claim 9, wherein the execution command is a command defined in a custom field of a transport layer of the RDMA protocol.
  11. A computing device, comprising the network card according to any one of claims 8-10, a second processor, and a processor scheduling engine;
    the processor scheduling engine being configured to invoke the second processor to process the second execution information;
    the second processor being configured to process the data to be processed according to the second execution information.
  12. The computing device according to claim 11, wherein the data to be processed is code to be executed, and the first execution information comprises a first storage address of the code;
    the second processor being specifically configured to obtain the code according to the first storage address in the second execution information, and to execute the code.
  13. The computing device according to claim 12, wherein the data to be processed is code to be executed, and the first execution information comprises a first storage address of the code and a context required to execute the code;
    after obtaining the first execution information from the data processing request, the network card is further configured to store the context into a memory of the computing device;
    the second execution information comprises the first storage address and a second storage address at which the context is stored;
    the second processor being specifically configured to obtain the first storage address and the second storage address from the second execution information, obtain the code according to the first storage address, obtain the context according to the second storage address, and execute the code according to the context.
  14. The computing device according to claim 11 or 12, wherein the computing device comprises a host connected to the network card, the host comprising the second processor and the processor scheduling engine.
  15. The computing device according to claim 13, wherein the computing device is a data processing unit (DPU), the computing device is connected to another computing device, and the data to be processed is stored in the other computing device.
PCT/CN2022/095908 2021-11-09 2022-05-30 Data processing method and related device WO2023082609A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111320330.5 2021-11-09
CN202111320330.5A CN116107954A (zh) Data processing method and related device

Publications (1)

Publication Number Publication Date
WO2023082609A1 (zh)

Family

ID=86264272

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/095908 WO2023082609A1 (zh) 2021-11-09 2022-05-30 一种数据处理方法以及相关设备

Country Status (2)

Country Link
CN (1) CN116107954A (zh)
WO (1) WO2023082609A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107690622A (zh) * 2016-08-26 2018-02-13 华为技术有限公司 实现硬件加速处理的方法、设备和***
US20200326868A1 (en) * 2019-04-11 2020-10-15 Samsung Electronics Co., Ltd. Intelligent path selection and load balancing
CN113296718A (zh) * 2021-07-27 2021-08-24 阿里云计算有限公司 数据处理方法以及装置
CN113312092A (zh) * 2020-07-27 2021-08-27 阿里巴巴集团控股有限公司 启动方法、***以及装置


Also Published As

Publication number Publication date
CN116107954A (zh) 2023-05-12


Legal Events

Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22891419; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2022891419; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2022891419; Country of ref document: EP; Effective date: 20240515)