CN118250175A - Distributed computing implementation method and device, network card chip and medium - Google Patents

Distributed computing implementation method and device, network card chip and medium

Info

Publication number
CN118250175A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410326855.7A
Other languages
Chinese (zh)
Inventor
周永财
张亚林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Suiyuan Technology Co ltd
Original Assignee
Shanghai Suiyuan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Suiyuan Technology Co., Ltd.
Priority to CN202410326855.7A
Publication of CN118250175A
Legal status: Pending


Landscapes

  • Computer And Data Communications (AREA)

Abstract

The invention discloses a distributed computing implementation method and apparatus, a network card chip, and a medium. The method comprises: acquiring the transmission bandwidth corresponding to the PCIE interface in a distributed computing system, and adjusting the network port bandwidth of the network card chip according to that transmission bandwidth; in response to a user-triggered distributed computing request, executing the distributed computing operation through the network card chip according to the adjusted network port bandwidth; and feeding back the execution result corresponding to the distributed computing operation to the artificial intelligence chip and the remote computing device through the network card chip. The technical scheme of the embodiments of the invention can resolve the communication bottleneck caused by PCIE interface bandwidth in the prior art and improve distributed computing performance.

Description

Distributed computing implementation method and device, network card chip and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular to a distributed computing implementation method and apparatus, a network card chip, and a medium.
Background
Fig. 1a is a schematic diagram of a scenario in which an artificial intelligence chip interacts with a network card chip (NIC). As shown in fig. 1a, the artificial intelligence chip and the network card chip are completely independent and communicate with each other over a Peripheral Component Interconnect Express (PCIE) interface. Here the graphics processing unit (GPU) serves as the artificial intelligence chip, and the ratio of artificial intelligence chips to network card chips may be 1:1 or 2:1.
In the prior art, all computation related to artificial intelligence (AI) is performed by the GPU, including distributed Allreduce computation and the like, and the available communication bandwidth can significantly affect its efficiency.
Taking Ring Allreduce computation as an example, in the currently mainstream implementation the ratio of computation data volume to communication data volume is about 1:2. Because the communication data volume is borne by the PCIE interface, distributed computing performance is seriously affected, creating a communication bottleneck in the distributed computing process.
Disclosure of Invention
The invention provides a distributed computing implementation method, a distributed computing implementation device, a network card chip and a medium, which can solve the problem of communication bottleneck caused by PCIE interface bandwidth in the prior art and improve distributed computing performance.
According to one aspect of the invention, a distributed computing implementation method is provided and applied to a distributed computing system, where the distributed computing system comprises at least one artificial intelligence chip and a network card chip, and the artificial intelligence chip communicates with the network card chip over a PCIE interface; the method comprises the following steps:
acquiring transmission bandwidths corresponding to PCIE interfaces in a distributed computing system, and adjusting the network port bandwidth of a network card chip according to the transmission bandwidths corresponding to the PCIE interfaces;
responding to a distributed computing request triggered by a user, and executing distributed computing operation according to the adjusted network port bandwidth through the network card chip;
and feeding back an execution result corresponding to the distributed computing operation to the artificial intelligence chip and the remote computing device through the network card chip.
Optionally, executing, by the network card chip, a distributed computing operation according to the adjusted network port bandwidth, including:
acquiring, through the network card chip according to the adjusted network port bandwidth, the target data to be calculated corresponding respectively to the artificial intelligence chip and the remote computing device;
and executing distributed computing operation according to the target data through the network card chip.
Optionally, after the distributed computing operation is executed by the network card chip according to the adjusted network port bandwidth, the method further includes:
storing the execution result corresponding to the distributed computing operation into a local buffer through the network card chip.
Optionally, feeding back the execution result corresponding to the distributed computing operation to the artificial intelligence chip through the network card chip includes:
acquiring the execution result from the local buffer through the network card chip, and feeding back the execution result to the artificial intelligence chip over the PCIE interface.
Optionally, feeding back the execution result corresponding to the distributed computing operation to the remote computing device through the network card chip includes:
acquiring the execution result from the local buffer through the network card chip, and feeding back the execution result to the remote computing device according to the adjusted network port bandwidth.
Optionally, after feeding back the execution result to the remote computing device, the method further includes:
acquiring, through a network port in the network card chip, response data fed back by the remote computing device for the execution result, and performing information interaction with the remote computing device according to the response data.
According to another aspect of the present invention, a distributed computing implementation apparatus is provided, which is applied to a distributed computing system, where the distributed computing system includes at least one artificial intelligence chip and a network card chip, and the artificial intelligence chip uses a PCIE interface to communicate with the network card chip; the device comprises:
the bandwidth adjustment module is configured to acquire the transmission bandwidth corresponding to the PCIE interface in the distributed computing system, and to adjust the network port bandwidth of the network card chip according to that transmission bandwidth;
the distributed computing module is configured to respond to a user-triggered distributed computing request and execute the distributed computing operation through the network card chip according to the adjusted network port bandwidth;
and the result feedback module is configured to feed back the execution result corresponding to the distributed computing operation to the artificial intelligence chip and the remote computing device through the network card chip.
According to another aspect of the present invention, there is provided a network card chip including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the distributed computing implementation method of any embodiment of the present invention.
According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement a distributed computing implementation method according to any embodiment of the present invention when executed.
According to the technical scheme provided by the embodiments of the invention, the transmission bandwidth corresponding to the PCIE interface in the distributed computing system is acquired, and the network port bandwidth of the network card chip is adjusted according to that transmission bandwidth; in response to a user-triggered distributed computing request, the distributed computing operation is executed through the network card chip according to the adjusted network port bandwidth; and the execution result corresponding to the distributed computing operation is fed back to the artificial intelligence chip and the remote computing device through the network card chip. This can resolve the communication bottleneck caused by PCIE interface bandwidth in the prior art and improve distributed computing performance.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1a is a schematic diagram of a scenario in which an artificial intelligence chip interacts with a network card chip in the prior art;
FIG. 1b is a flow chart of a distributed computing implementation method provided in accordance with an embodiment of the present invention;
FIG. 1c is a schematic diagram of a distributed computing system according to an embodiment of the present invention;
FIG. 2a is a flow chart of another distributed computing implementation method provided in accordance with an embodiment of the present invention;
Fig. 2b is a schematic diagram of a scenario corresponding to a distributed computing implementation method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a distributed computing implementation apparatus according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a network card chip for implementing a distributed computing implementation method according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the prior art, all computation related to artificial intelligence is completed by the GPU, including distributed Allreduce computation and the like, and the communication bandwidth significantly influences efficiency. Taking Ring Allreduce computation as an example, assume there are currently N GPUs and the Allreduce data size of each GPU is D; the communication data volume per GPU for the Allreduce operation is then 2(N-1)D/N. When N is sufficiently large, the communication data volume is approximately 2 times the computation data volume. In the current design, because the Allreduce operation is completed by the GPU, the PCIE interface must bear all of this communication traffic, seriously affecting distributed computing performance and creating a communication bottleneck in the distributed computing process.
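The 1:2 computation-to-communication ratio above can be checked numerically. The sketch below is illustrative only (the function name is not from the patent); it evaluates the per-GPU Ring Allreduce traffic 2(N-1)D/N and shows it approaching 2D as N grows:

```python
def ring_allreduce_traffic(n_gpus: int, data_size: float) -> float:
    """Per-GPU communication volume for Ring Allreduce:
    (N-1)/N * D sent in the reduce-scatter phase plus
    (N-1)/N * D sent in the all-gather phase."""
    return 2 * (n_gpus - 1) * data_size / n_gpus

# The traffic approaches 2*D as the number of GPUs grows:
for n in (2, 8, 64):
    print(n, ring_allreduce_traffic(n, 1.0))  # 1.0, 1.75, 1.96875
```

For N = 8, each GPU already moves 1.75D over the PCIE link in the prior-art design, which is why the patent treats that link as the bottleneck.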
To this end, the present embodiment provides a distributed computing implementation method; fig. 1b is a flowchart of the method. This embodiment is applicable to implementing distributed computing with a network card chip. The method may be performed by a distributed computing implementation apparatus, which may be implemented in hardware and/or software and configured in a distributed computing system. The distributed computing system comprises at least one artificial intelligence chip and a network card chip, and the artificial intelligence chip communicates with the network card chip over a PCIE interface. As shown in fig. 1b, the method comprises:
step 110, acquiring a transmission bandwidth corresponding to a PCIE interface in the distributed computing system, and adjusting a network port bandwidth of the network card chip according to the transmission bandwidth corresponding to the PCIE interface.
In this embodiment, fig. 1c is a schematic structural diagram of the distributed computing system, taking two artificial intelligence chips and two network card chips as an example. Each artificial intelligence chip may communicate with its corresponding network card chip over a PCIE interface, and each network card chip may include a plurality of network ports (fig. 1c illustrates two).
In practical application, in the distributed computing system shown in fig. 1c, the bandwidth of the PCIE interface between the artificial intelligence chip and the network card chip is usually a fixed value, while the network port bandwidth of the network card chip can be updated; therefore, in this embodiment the ratio of the network port bandwidth to the transmission bandwidth of the PCIE interface may be configured as 2:1.
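As an arithmetic illustration of the 2:1 port-to-PCIE bandwidth configuration described above (the function name and the `ratio` parameter are hypothetical, not from the patent):

```python
def adjusted_port_bandwidth(pcie_bandwidth: float, ratio: float = 2.0) -> float:
    """Network-port bandwidth to configure on the NIC, given the fixed
    PCIE transmission bandwidth and the desired port-to-PCIE ratio (2:1 here)."""
    return ratio * pcie_bandwidth

# With a 64 GB/s unidirectional PCIE link, a 2:1 configuration
# would target a 128 GB/s aggregate network-port bandwidth.
print(adjusted_port_bandwidth(64.0))  # 128.0
```

The 2:1 target mirrors the communication-to-computation ratio of Ring Allreduce, so that the network ports, rather than the PCIE interface, absorb the communication traffic.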
Step 120, in response to a user-triggered distributed computing request, executing the distributed computing operation through the network card chip according to the adjusted network port bandwidth.
In this embodiment, after it is detected that the user has triggered a distributed computing request, the corresponding distributed computing process may be offloaded from the artificial intelligence chip, and the network card chip executes it to complete the distributed computing operation. In particular, distributed computing operations include, but are not limited to, distributed Allreduce operations.
Step 130, feeding back the execution result corresponding to the distributed computing operation to the artificial intelligence chip and the remote computing device through the network card chip.
In this embodiment, the network port bandwidth of the network card chip is adjusted and the distributed computing operation is executed on the network card chip according to the adjusted bandwidth. In contrast to the prior art, where the network card chip only transmits data and does not bear the distributed computing operation, here the network card chip bears the communication data volume of the distributed computation while the PCIE interface transmits only the computation data volume. This resolves the communication bottleneck caused by PCIE interface bandwidth in the prior art and thereby improves distributed computing performance.
According to the technical scheme provided by the embodiments of the invention, the transmission bandwidth corresponding to the PCIE interface in the distributed computing system is acquired, and the network port bandwidth of the network card chip is adjusted according to that transmission bandwidth; in response to a user-triggered distributed computing request, the distributed computing operation is executed through the network card chip according to the adjusted network port bandwidth; and the execution result corresponding to the distributed computing operation is fed back to the artificial intelligence chip and the remote computing device through the network card chip. This can resolve the communication bottleneck caused by PCIE interface bandwidth in the prior art and improve distributed computing performance.
Fig. 2a is a flowchart of a distributed computing implementation method according to a second embodiment of the present invention, where the embodiment is further refined. As shown in fig. 2a, the method comprises:
Step 210, acquiring a transmission bandwidth corresponding to a PCIE interface in the distributed computing system, and adjusting a network port bandwidth of the network card chip according to the transmission bandwidth corresponding to the PCIE interface.
Step 220, in response to a user-triggered distributed computing request, acquiring, through the network card chip according to the adjusted network port bandwidth, the target data to be computed corresponding respectively to the artificial intelligence chip and the remote computing device.
In this embodiment, fig. 2b is a schematic view of a scenario corresponding to the distributed computing implementation method, taking the distributed computing operation Allreduce as an example.
As shown in fig. 2b, after a user-triggered distributed computing request is detected, the artificial intelligence chip and the remote computing device may aggregate the target data to be computed in the Allreduce computing unit of the network card chip. Assuming there are N artificial intelligence chips and the Allreduce data size of each is D, the data volume corresponding to this step is 2D(N-1)/N.
Step 230, executing the distributed computing operation according to the target data through the network card chip.
In this step, specifically, as shown in fig. 2b, distributed computing operation may be performed according to the obtained target data by an Allreduce computing unit in the network card chip.
Step 240, storing the execution result corresponding to the distributed computing operation into a local buffer through the network card chip.
In this step, as shown in fig. 2b, the Allreduce computing unit in the network card chip may store the execution result corresponding to the distributed computing operation into the local buffer of the network card chip; the data volume is D(N-1)/N.
Step 250, acquiring the execution result from the local buffer through the network card chip, and feeding it back to the artificial intelligence chip over the PCIE interface.
Step 260, acquiring the execution result from the local buffer through the network card chip, and feeding it back to the remote computing device according to the adjusted network port bandwidth.
In this step, as shown in fig. 2b, the network card chip may obtain the execution result from the local buffer through the Allreduce control unit and feed it back to the remote computing device; the data volume is D(N-1)/N.
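The per-stage data volumes quoted in steps 220 through 260 can be collected in one sketch (the helper name and dictionary keys are hypothetical; the volumes are those stated in the description):

```python
def offloaded_allreduce_volumes(n_chips: int, d: float) -> dict:
    """Data moved at each stage when the NIC's Allreduce unit performs
    the reduction (N artificial intelligence chips, data size D each)."""
    frac = (n_chips - 1) / n_chips
    return {
        "gather_inputs": 2 * d * frac,   # step 220: operands into the NIC
        "buffer_result": d * frac,       # step 240: result into local buffer
        "feedback_remote": d * frac,     # step 260: result to remote device
        "response_data": d * frac,       # reply from the remote device
    }

print(offloaded_allreduce_volumes(8, 1.0))
```

Note that only the gathering stage carries the full 2D(N-1)/N volume; every later stage moves at most D(N-1)/N.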
In one implementation of this embodiment, after feeding back the execution result to the remote computing device, the method further includes: acquiring, through a network port in the network card chip, response data fed back by the remote computing device for the execution result, and performing information interaction with the remote computing device according to the response data.
In this embodiment, assume the network card chip includes two network ports (network port 1 and network port 2). The remote computing device may feed response data back to the Allreduce computing unit or the local buffer through network port 1, network port 2 and the Allreduce control unit; the data volume is D(N-1)/N, and the data volume in the reverse direction is the same.
In a specific embodiment, assume the PCIE interface bandwidth is 64 GB/s unidirectional and the network port bandwidth is 50 GB/s unidirectional. When implementing distributed computing in the prior art, the communication duration of the PCIE interface is 2D(N-1)/(64N) and the communication duration of the network card chip is 2D(N-1)/(50N). By contrast, with the distributed computing implementation method of this embodiment, the communication duration of the PCIE interface is D/64 and the communication duration of the network card chip is D(N-1)/(50N).
Assuming the number N of artificial intelligence chips is 8, the communication duration corresponding to the distributed computing operation in the prior art is 7D/200, while that of the present application is 7D/400; it can therefore be determined that the performance of the distributed computing implementation method of the present application is about 2 times that of the prior art.
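The roughly 2x figure can be reproduced arithmetically. The sketch below is illustrative, assuming the 64 GB/s PCIE and 50 GB/s port bandwidths stated above, and taking the slower of the two links as the bottleneck in each design:

```python
# Assumed unidirectional bandwidths from the embodiment (GB/s).
PCIE_BW, PORT_BW = 64.0, 50.0
N, D = 8, 1.0
frac = (N - 1) / N

# Prior art: the NIC only forwards data, so the PCIE interface and the
# network port each carry the full 2*D*(N-1)/N communication volume.
prior = max(2 * D * frac / PCIE_BW, 2 * D * frac / PORT_BW)

# Proposed: the PCIE interface carries only D, while the network port
# carries D*(N-1)/N in each direction.
proposed = max(D / PCIE_BW, D * frac / PORT_BW)

print(prior, proposed, prior / proposed)  # 0.035 0.0175 2.0
```

These bottleneck durations are exactly the 7D/200 and 7D/400 values quoted above (with D = 1), giving the stated factor-of-two speedup.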
According to the technical scheme provided by the embodiments of the invention, the transmission bandwidth corresponding to the PCIE interface in the distributed computing system is acquired and the network port bandwidth of the network card chip is adjusted accordingly; the target data to be computed corresponding respectively to the artificial intelligence chip and the remote computing device are acquired through the network card chip according to the adjusted network port bandwidth; the distributed computing operation is executed on the target data by the network card chip; the execution result is stored into the local buffer by the network card chip; and the execution result is then fetched from the local buffer and fed back to the artificial intelligence chip over the PCIE interface and to the remote computing device according to the adjusted network port bandwidth. This resolves the communication bottleneck caused by PCIE interface bandwidth in the prior art and improves distributed computing performance.
Fig. 3 is a schematic structural diagram of a distributed computing implementation apparatus according to a third embodiment of the present invention. The apparatus is applied to a distributed computing system comprising at least one artificial intelligence chip and a network card chip, where the artificial intelligence chip communicates with the network card chip over a PCIE interface. The apparatus includes: a bandwidth adjustment module 310, a distributed computing module 320, and a result feedback module 330.
The bandwidth adjustment module 310 is configured to obtain a transmission bandwidth corresponding to a PCIE interface in the distributed computing system, and adjust a network port bandwidth of the network card chip according to the transmission bandwidth corresponding to the PCIE interface;
the distributed computing module 320 is configured to respond to a distributed computing request triggered by a user, and execute a distributed computing operation according to the adjusted network port bandwidth through the network card chip;
and the result feedback module 330 is configured to feed back, through the network card chip, the execution result corresponding to the distributed computing operation to the artificial intelligence chip and the remote computing device.
According to the technical scheme provided by the embodiments of the invention, the transmission bandwidth corresponding to the PCIE interface in the distributed computing system is acquired, and the network port bandwidth of the network card chip is adjusted according to that transmission bandwidth; in response to a user-triggered distributed computing request, the distributed computing operation is executed through the network card chip according to the adjusted network port bandwidth; and the execution result corresponding to the distributed computing operation is fed back to the artificial intelligence chip and the remote computing device through the network card chip. This can resolve the communication bottleneck caused by PCIE interface bandwidth in the prior art and improve distributed computing performance.
Based on the above embodiment, the distributed computing module 320 includes:
the data acquisition unit is configured to acquire, through the network card chip according to the adjusted network port bandwidth, the target data to be calculated corresponding respectively to the artificial intelligence chip and the remote computing device;
the data processing unit is configured to execute the distributed computing operation according to the target data through the network card chip;
and the result caching unit is configured to store the execution result corresponding to the distributed computing operation into a local buffer through the network card chip.
The result feedback module 330 includes:
the artificial intelligence chip feedback unit is configured to acquire the execution result from the local buffer through the network card chip and feed it back to the artificial intelligence chip over the PCIE interface;
the remote device feedback unit is configured to acquire the execution result from the local buffer through the network card chip and feed it back to the remote computing device according to the adjusted network port bandwidth;
and the response data acquisition unit is configured to acquire, through the network port in the network card chip, the response data fed back by the remote computing device for the execution result, and to perform information interaction with the remote computing device according to the response data.
The apparatus can execute the method provided by any embodiment of the invention and has the corresponding functional modules and beneficial effects. For technical details not described in detail here, reference may be made to the methods provided in all the foregoing embodiments of the present invention.
Fig. 4 shows a schematic diagram of the structure of a network card chip 10 that may be used to implement an embodiment of the invention. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the network card chip 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a read-only memory (ROM) 12 and a random access memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the network card chip 10. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
The various components in the network card chip 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the network card chip 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as a distributed computing implementation.
In some embodiments, the distributed computing implementation may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the network card chip 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the distributed computing implementation method described above may be performed. Alternatively, in other embodiments, processor 11 may be configured to perform the distributed computing implementation method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. A computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here may be implemented on a network card chip having: a display device (e.g., a CRT (cathode-ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user may provide input to the network card chip. Other kinds of devices may also be used to provide for interaction with the user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual, auditory, or tactile feedback), and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or a middleware component (e.g., an application server), or a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of difficult management and weak service scalability found in traditional physical hosts and VPS (Virtual Private Server) services.
It should be appreciated that the various flows shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations, and substitutions are possible depending on design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall be included in the scope of the present invention.

Claims (10)

1. A distributed computing implementation method, characterized in that the method is applied to a distributed computing system, wherein the distributed computing system comprises at least one artificial intelligence chip and a network card chip, and the artificial intelligence chip communicates with the network card chip through a PCIE interface; the method comprises:
acquiring a transmission bandwidth corresponding to the PCIE interface in the distributed computing system, and adjusting a network port bandwidth of the network card chip according to the transmission bandwidth corresponding to the PCIE interface;
in response to a distributed computing request triggered by a user, executing, by the network card chip, a distributed computing operation according to the adjusted network port bandwidth; and
feeding back, by the network card chip, an execution result corresponding to the distributed computing operation to the artificial intelligence chip and a remote computing device.
2. The method of claim 1, wherein executing, by the network card chip, the distributed computing operation according to the adjusted network port bandwidth comprises:
acquiring, by the network card chip according to the adjusted network port bandwidth, target data to be computed corresponding respectively to the artificial intelligence chip and the remote computing device; and
executing, by the network card chip, the distributed computing operation according to the target data.
3. The method of claim 1, further comprising, after executing, by the network card chip, the distributed computing operation according to the adjusted network port bandwidth:
storing, by the network card chip, an execution result corresponding to the distributed computing operation into a local buffer.
4. The method of claim 3, wherein feeding back, by the network card chip, the execution result corresponding to the distributed computing operation to the artificial intelligence chip comprises:
acquiring, by the network card chip, the execution result from the local buffer, and feeding back the execution result to the artificial intelligence chip through the PCIE interface.
5. The method of claim 3, wherein feeding back, by the network card chip, the execution result corresponding to the distributed computing operation to the remote computing device comprises:
acquiring, by the network card chip, the execution result from the local buffer, and feeding back the execution result to the remote computing device according to the adjusted network port bandwidth.
6. The method of claim 5, further comprising, after feeding back the execution result to the remote computing device:
acquiring, through a network port of the network card chip, response data fed back by the remote computing device for the execution result, and performing information interaction with the remote computing device according to the response data.
7. A distributed computing implementation apparatus, characterized in that the apparatus is applied to a distributed computing system, wherein the distributed computing system comprises at least one artificial intelligence chip and a network card chip, and the artificial intelligence chip communicates with the network card chip through a PCIE interface; the apparatus comprises:
a bandwidth adjustment module, configured to acquire a transmission bandwidth corresponding to the PCIE interface in the distributed computing system, and adjust a network port bandwidth of the network card chip according to the transmission bandwidth corresponding to the PCIE interface;
a distributed computing module, configured to execute, by the network card chip, a distributed computing operation according to the adjusted network port bandwidth in response to a distributed computing request triggered by a user; and
a result feedback module, configured to feed back, by the network card chip, an execution result corresponding to the distributed computing operation to the artificial intelligence chip and a remote computing device.
8. A network card chip, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the distributed computing implementation method of any one of claims 1-6.
9. A computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the distributed computing implementation method of any one of claims 1-6.
10. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the distributed computing implementation method of any one of claims 1-6.
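The claimed method attacks the communication bottleneck by matching the network card chip's port bandwidth to the PCIE transmission bandwidth before running the distributed operation, then fanning the result out to the AI chip (over PCIE) and the remote computing device. The patent publishes no source code, so the sketch below is purely illustrative: every class, method, and parameter name is hypothetical, and the distributed computing operation is stood in for by a simple element-wise sum.

```python
# Hypothetical model of the claimed flow; names and units are assumptions,
# not taken from the patent.
class NetworkCardChip:
    def __init__(self, pcie_bandwidth_gbps: float):
        self.pcie_bandwidth_gbps = pcie_bandwidth_gbps  # measured PCIE link bandwidth
        self.port_bandwidth_gbps = 0.0                  # network port, set during adjust
        self.local_buffer = {}                          # on-chip result cache (claim 3)

    def adjust_port_bandwidth(self) -> float:
        # Match the network port bandwidth to the PCIE transmission bandwidth
        # so the PCIE link no longer caps end-to-end throughput (claim 1).
        self.port_bandwidth_gbps = self.pcie_bandwidth_gbps
        return self.port_bandwidth_gbps

    def execute_distributed_op(self, local_data, remote_data):
        # Stand-in for the distributed computing operation over target data from
        # the AI chip and the remote device (claim 2); here an element-wise sum.
        result = [a + b for a, b in zip(local_data, remote_data)]
        # Cache the execution result in the local buffer (claim 3).
        self.local_buffer["result"] = result
        return result

    def feed_back(self):
        # Fetch the cached result and return the copies destined for the AI chip
        # via PCIE and the remote device via the network port (claims 4-5).
        result = self.local_buffer["result"]
        return {"ai_chip": result, "remote_device": result}


nic = NetworkCardChip(pcie_bandwidth_gbps=64.0)
nic.adjust_port_bandwidth()
nic.execute_distributed_op([1, 2, 3], [4, 5, 6])
fed = nic.feed_back()
```

The key design point the sketch mirrors is that bandwidth adjustment happens once, up front, so every subsequent transfer in the distributed operation already runs at a port rate the PCIE link can sustain.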
CN202410326855.7A 2024-03-21 2024-03-21 Distributed computing implementation method and device, network card chip and medium Pending CN118250175A (en)

Priority Applications (1)

Application Number: CN202410326855.7A · Priority/Filing Date: 2024-03-21 · Title: Distributed computing implementation method and device, network card chip and medium


Publications (1)

Publication Number: CN118250175A · Publication Date: 2024-06-25

Family

ID=91561577


Country Status (1)

Country Link
CN (1) CN118250175A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination