CN115062329B - Pipelined computing device and method for privacy computing, private data, and federated learning


Info

Publication number: CN115062329B
Application number: CN202210950232.8A
Authority: CN (China)
Prior art keywords: stream processing unit, stream, pipeline
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN115062329A
Inventor: 戴蒙 (Dai Meng)
Current assignee: Shenzhen Zhixing Technology Co Ltd
Original assignee: Shenzhen Zhixing Technology Co Ltd
Events: application filed by Shenzhen Zhixing Technology Co Ltd; priority to CN202210950232.8A; publication of CN115062329A; application granted; publication of CN115062329B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60: Protecting data
    • G06F 21/602: Providing cryptographic facilities or services
    • G06F 21/62: Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218: Protecting access to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245: Protecting personal data, e.g. for financial or medical purposes
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/505: Allocation of resources to service a request, considering the load
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Abstract

The application relates to a pipelined computing device and method for privacy computing, private data, and federated learning. The pipelined computing device includes a plurality of stream processing units, among which is at least one stream processing unit of a first type. Each first-type stream processing unit is at least partially reconfigurable. The stream processing units correspond one-to-one to the steps of a reference computation flow, and each first-type stream processing unit can be reconfigured to optimize its processing time for the step of the reference computation flow that corresponds to it. The stream processing units pipeline one or more tasks belonging to the same task batch, where the computation flow of any such task can be expanded, according to the reference computation flow, into steps in one-to-one correspondence with the steps of the reference computation flow. Each stream processing unit processes, for every task, the expanded step that corresponds to that unit. Overall pipeline processing efficiency is thereby improved.

Description

Pipelined computing device and method for privacy computing, private data, and federated learning
Technical Field
The application relates to the technical fields of privacy computing, private data, and federated learning, in particular to the technical field of chips and processors, and more particularly to a pipelined computing device and method for privacy computing, private data, and federated learning.
Background
Privacy computing refers to a family of techniques for analyzing and computing on data while guaranteeing that the data provider's original data is not disclosed, keeping the data "available but invisible" as it circulates and is fused. In the broad sense, privacy computing covers a wide range of techniques that aim to accomplish a computing task while protecting data privacy. Drawing on advances in computer science, artificial intelligence, and cloud computing, privacy computing has made significant progress in data querying and machine learning, and provides secure data access and data privacy protection in many application scenarios. Common privacy computing techniques include, for example, Federated Learning (FL), Secure Multi-Party Computation (SMPC), Secret Sharing, Trusted Execution Environments (TEE), Differential Privacy (DP), and Homomorphic Encryption (HE). Federated learning means that multiple parties cooperatively build a federated learning model while guaranteeing that data never leaves a controlled security boundary, for example that data is never transmitted to outside parties. Meanwhile, with the growing importance of data security and the protection of private information, and with the introduction of laws and regulations such as the Data Security Law and the Personal Information Protection Law, private data, such as personal data involving personal or sensitive information, is subject to ever higher privacy-protection and data-security requirements in data processing, data communication, and data interaction.
In the practice of privacy computing, private data processing, and federated learning, intensive operations must be performed on massive amounts of encrypted data with large integer bit widths. General-purpose computers, computing devices, and computing systems exhibit low computational efficiency when facing such workloads, and struggle to cope with complex and changing application environments and computational requirements.
Therefore, there is a need for a pipelined computing device and method for privacy computing, private data, and federated learning that can meet the computational requirements of privacy computing, private data processing, and federated learning with better computational efficiency.
Disclosure of Invention
In a first aspect, embodiments of the present application provide a pipelined computing device for privacy computing, private data, and federated learning. The pipelined computing device includes a plurality of stream processing units, among which is at least one stream processing unit of a first type; each first-type stream processing unit is at least partially reconfigurable. The stream processing units correspond one-to-one to the steps of a reference computation flow, and each first-type stream processing unit is configured to be reconfigurable so as to optimize, for the step of the reference computation flow corresponding to that unit, the processing time with which the unit executes that step. The stream processing units are configured to pipeline one or more tasks belonging to the same task batch, where the computation flow of any task belonging to the batch can be expanded, according to the reference computation flow, into steps in one-to-one correspondence with the steps of the reference computation flow; each stream processing unit processes, for each of the one or more tasks, the expanded step corresponding to that unit.
In the technical scheme of the first aspect, the commonality that tasks of the same batch share in their logic flow or computation process is captured by the reference computation flow and its expanded steps, and the one-to-one correspondence between stream processing units and the steps of the reference computation flow configures the units according to that commonality. Further, because some divergence inevitably exists between the reference computation flow and the individual tasks of the batch, each first-type stream processing unit is made reconfigurable so that its processing time for its corresponding step can be optimized, overcoming that divergence. The computing resources of the stream processing units are thereby utilized to the greatest extent in a pipelined manner, improving overall pipeline processing efficiency and resource utilization.
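As a concrete illustration of this correspondence, the following minimal Python sketch (not from the patent; all names and step functions are hypothetical placeholders) models a chain of stream processing units mapped one-to-one onto the steps of a reference computation flow, with a reconfigurable flag marking first-type units:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class StreamProcessingUnit:
    name: str
    step: Callable[[int], int]    # the reference-flow step this unit executes
    reconfigurable: bool = False  # True for first-type (e.g. FPGA/CGRA) units

def run_task(units: List[StreamProcessingUnit], task_input: int) -> int:
    """Expand one task along the reference flow: unit i executes step i and
    passes its output to unit i+1 (unidirectional data flow)."""
    data = task_input
    for unit in units:
        data = unit.step(data)
    return data

# A 5-stage reference flow matching the five units of Fig. 1 below;
# the lambdas are placeholder step functions, not real cryptographic steps.
units = [
    StreamProcessingUnit("SPU102", lambda x: x + 1),
    StreamProcessingUnit("SPU104", lambda x: x * 2, reconfigurable=True),
    StreamProcessingUnit("SPU106", lambda x: x - 3, reconfigurable=True),
    StreamProcessingUnit("SPU108", lambda x: x * x, reconfigurable=True),
    StreamProcessingUnit("SPU110", lambda x: x % 97),
]
print(run_task(units, 7))  # one task flowing through all five steps
```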
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that each of the at least one first-class stream processing unit is reconfigured before the pipelined computing device leaves a factory to optimize, for a step corresponding to the first-class stream processing unit in the reference computing flow, a processing time for the first-class stream processing unit to execute the step corresponding to the first-class stream processing unit in the reference computing flow.
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the reference computation flow is determined based on a computation scenario associated with the same task batch, and each of the at least one first-type stream processing unit is reconfigured before the pipelined computing device processes the same task batch to optimize, for a step corresponding to the first-type stream processing unit in the reference computation flow, a processing time for the first-type stream processing unit to execute the step corresponding to the first-type stream processing unit in the reference computation flow.
According to a possible implementation of the first aspect, embodiments of the present application further provide that pipelining one or more tasks belonging to the same task batch includes: following the processing order of the one or more tasks, each stream processing unit, upon finishing its corresponding step of the current task (as expanded from the reference computation flow), begins processing its corresponding step of the next task.
According to a possible implementation of the first aspect, embodiments of the present application further provide that the stream processing units, in one-to-one correspondence with the steps, are arranged according to the order of the steps of the reference computation flow to obtain a stream processing unit sequence. A given stream processing unit in the sequence is configured to obtain input data from the stream processing unit immediately preceding it in the sequence and to provide output data to the stream processing unit immediately following it, where the given stream processing unit is any stream processing unit in the sequence.
According to a possible implementation of the first aspect, embodiments of the present application further provide that the stream processing units are arranged according to the order of the steps of the reference computation flow to obtain a stream processing unit sequence, and that pipelining one or more tasks belonging to the same task batch further includes: each stream processing unit obtains, from the stream processing unit immediately preceding it in the sequence, the input data for its corresponding step of the current task's expanded computation flow, executes that step, and provides the computation result as output data to the stream processing unit immediately following it, to serve as the input data for the following unit's corresponding step of the current task.
According to a possible implementation of the first aspect, embodiments of the present application further provide that the processing times with which the stream processing units each execute their corresponding step of the reference computation flow form a reference processing time array, and that a first gap between the maximum value and the minimum value in the reference processing time array is smaller than a first preset threshold.
According to a possible implementation of the first aspect, embodiments of the present application further provide that, while the stream processing units pipeline the one or more tasks, the pipelined computing device monitors the actual processing time with which each stream processing unit executes its corresponding step, compares that actual processing time against the unit's processing time in the reference processing time array, and selectively reconstructs one or more of the at least one first-type stream processing units according to the comparison result.
According to a possible implementation of the first aspect, embodiments of the present application further provide that selectively reconstructing one or more of the at least one first-type stream processing units according to the comparison result includes: increasing the processing time of the steps corresponding to the one or more first-type stream processing units by reducing the executable resources with which those units perform their corresponding steps.
According to a possible implementation of the first aspect, embodiments of the present application further provide that, while the stream processing units pipeline the one or more tasks, the pipelined computing device monitors a second gap between the maximum value and the minimum value among the actual processing times with which the stream processing units each execute their corresponding step, and selectively reconstructs one or more of the at least one first-type stream processing units when the second gap is greater than a second preset threshold, so as to reduce the second gap.
According to a possible implementation of the first aspect, embodiments of the present application further provide that, while the stream processing units pipeline the one or more tasks, the pipelined computing device monitors which stream processing unit has the maximum actual processing time for its corresponding step, compares that maximum actual processing time against that unit's processing time in the reference processing time array, and selectively reconstructs that unit according to the comparison result.
According to a possible implementation of the first aspect, embodiments of the present application further provide that the stream processing units, in one-to-one correspondence with the steps, are arranged according to the order of the steps of the reference computation flow to obtain a stream processing unit sequence; while the stream processing units pipeline the one or more tasks, the pipelined computing device monitors a third gap between the actual processing times of any two adjacent stream processing units in the sequence, and when the third gap exceeds a third preset threshold, reconstructs the one of the two adjacent units with the smaller actual processing time so as to reduce the executable resources with which it executes its corresponding step, thereby increasing its actual processing time.
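The three monitoring policies above can be summarized in one place. The following Python sketch is an illustrative rendering only, assuming a hypothetical per-unit reconfigurable attribute; the patent does not prescribe this code, and thresh1, thresh2, and thresh3 stand for the first, second, and third preset thresholds:

```python
def select_units_to_reconstruct(units, reference_times, actual_times,
                                thresh1, thresh2, thresh3):
    """units[i] executes step i; reference_times / actual_times are the
    reference processing time array and the monitored actual times."""
    chosen = set()

    # Policy 1: a unit's actual time drifts from the reference time array.
    for u, ref, act in zip(units, reference_times, actual_times):
        if u.reconfigurable and abs(act - ref) > thresh1:
            chosen.add(u.name)

    # Policy 2: the spread between the slowest and fastest stages (the
    # "second gap") exceeds its threshold; target the slowest unit.
    if max(actual_times) - min(actual_times) > thresh2:
        slow = units[actual_times.index(max(actual_times))]
        if slow.reconfigurable:
            chosen.add(slow.name)

    # Policy 3: two adjacent stages differ by more than the "third gap"
    # threshold; slow the faster unit by shrinking its executable
    # resources so its processing time rises toward its neighbour's.
    for a, b, ta, tb in zip(units, units[1:], actual_times, actual_times[1:]):
        if abs(ta - tb) > thresh3:
            faster = a if ta < tb else b
            if faster.reconfigurable:
                chosen.add(faster.name)

    return chosen
```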
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the plurality of stream processing units are configured to be directly interconnected or connected through an internal bus.
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the at least one first-type stream processing unit includes at least one FPGA and/or at least one CGRA, the plurality of stream processing units further includes at least one second-type stream processing unit, none of the at least one second-type stream processing unit is reconfigurable, and the at least one second-type stream processing unit includes at least one CPU and/or at least one GPU.
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the at least one second-type stream processing unit corresponds to an initial step and an end step of the reference calculation flow.
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that each of the at least one first-class stream processing unit includes a configurable logic block, a programmable interconnection resource, and a programmable input/output unit.
According to a possible implementation manner of the technical solution of the first aspect, an embodiment of the present application further provides that the reference calculation process is a Paillier decryption operation process or a Paillier encryption operation process.
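To make the Paillier example concrete: textbook Paillier decryption computes m = L(c^λ mod n²) · μ mod n with L(u) = (u − 1)/n, which splits naturally into a modular exponentiation, an L-function evaluation, and a modular multiplication. The Python sketch below uses toy parameters and shows only one plausible expansion of the reference computation flow into steps; the patent does not fix this particular decomposition, and real deployments use keys of 2048 bits or more:

```python
from math import lcm

# Toy Paillier parameters (illustrative only).
p, q = 17, 19
n, n2 = p * q, (p * q) ** 2
g = n + 1                   # standard simple choice of generator
lam = lcm(p - 1, q - 1)     # lambda = lcm(p-1, q-1)
mu = pow(lam, -1, n)        # with g = n + 1, mu = lambda^{-1} mod n

def L(u):
    return (u - 1) // n

# Each step below could map onto one stream processing unit.
step1 = lambda c: pow(c, lam, n2)  # modular exponentiation c^lambda mod n^2
step2 = lambda u: L(u)             # the L function: subtract 1, divide by n
step3 = lambda l: (l * mu) % n     # modular multiplication by mu

def decrypt(c):
    return step3(step2(step1(c)))

# Round-trip check against textbook encryption c = g^m * r^n mod n^2.
m, r = 42, 23
c = (pow(g, m, n2) * pow(r, n, n2)) % n2
assert decrypt(c) == m
```

Each of the three steps is dominated by a different large-integer operation, which is why mapping them onto separate, individually tunable stream processing units is a natural fit for this flow.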
In a second aspect, embodiments of the present application provide a pipelined computing method for privacy computing, private data, and federated learning. The pipelined computing method includes: providing a plurality of stream processing units, wherein the stream processing units include at least one stream processing unit of a first type, each first-type stream processing unit is at least partially reconfigurable, the stream processing units correspond one-to-one to the steps of a reference computation flow, and each first-type stream processing unit is configured to be reconfigurable so as to optimize, for the step of the reference computation flow corresponding to that unit, the processing time with which the unit executes that step; obtaining a plurality of tasks of the same task batch, wherein the computation flow of any task belonging to the batch can be expanded, according to the reference computation flow, into steps in one-to-one correspondence with the steps of the reference computation flow; and, for each of the plurality of tasks, pipelining the task through the stream processing units, each stream processing unit processing the expanded step corresponding to that unit.
In the technical scheme of the second aspect, the commonality that tasks of the same batch share in their logic flow or computation process is captured by the reference computation flow and its expanded steps, and the one-to-one correspondence between stream processing units and the steps of the reference computation flow configures the units according to that commonality. Further, because some divergence inevitably exists between the reference computation flow and the individual tasks of the batch, each first-type stream processing unit is made reconfigurable so that its processing time for its corresponding step can be optimized, overcoming that divergence. The computing resources of the stream processing units are thereby utilized to the greatest extent in a pipelined manner, improving overall pipeline processing efficiency and resource utilization.
According to a possible implementation of the second aspect, embodiments of the present application further provide that the processing times with which the stream processing units each execute their corresponding step of the reference computation flow form a reference processing time array, and that a first gap between the maximum value and the minimum value in the reference processing time array is smaller than a first preset threshold.
According to a possible implementation of the second aspect, embodiments of the present application further provide that the pipelined computing method further includes: while the stream processing units pipeline the plurality of tasks, monitoring the actual processing time with which each stream processing unit executes its corresponding step, comparing that actual processing time against the unit's processing time in the reference processing time array, and selectively reconstructing one or more of the at least one first-type stream processing units according to the comparison result.
Drawings
To explain the technical solutions in the embodiments of the present application or in the background art, the drawings used in the embodiments or the background art are described below.
FIG. 1 shows a block diagram of a pipelined computing device according to one implementation provided by an embodiment of the present application.
FIG. 2 shows a block diagram of a pipelined computing device according to another implementation provided by an embodiment of the present application.
FIG. 3 shows a block diagram of a pipelined computing device according to another implementation provided by an embodiment of the present application.
FIG. 4 shows a block diagram of a stream processing unit according to an implementation provided by an embodiment of the present application.
FIG. 5 shows a flowchart of a pipelined computing method provided by an embodiment of the present application.
Detailed Description
To address the computational requirements arising in the practice of privacy computing, private data processing, and federated learning, embodiments of the present application provide a pipelined computing device and method for privacy computing, private data, and federated learning. The pipelined computing device includes a plurality of stream processing units, among which is at least one stream processing unit of a first type; each first-type stream processing unit is at least partially reconfigurable. The stream processing units correspond one-to-one to the steps of a reference computation flow, and each first-type stream processing unit is configured to be reconfigurable so as to optimize, for the step of the reference computation flow corresponding to that unit, the processing time with which the unit executes that step. The stream processing units are configured to pipeline one or more tasks belonging to the same task batch; the computation flow of any task belonging to the batch can be expanded, according to the reference computation flow, into steps in one-to-one correspondence with the steps of the reference computation flow, and each stream processing unit processes, for each task, the expanded step corresponding to that unit. The embodiments of the present application thus have the following beneficial technical effects: the commonality that tasks of the same batch share in their logic flow or computation process is captured by the reference computation flow and its expanded steps; the one-to-one correspondence between stream processing units and the steps of the reference computation flow configures the units according to that commonality; and, because some divergence inevitably exists between the reference computation flow and the individual tasks of the batch, each first-type stream processing unit is made reconfigurable so that its processing time for its corresponding step can be optimized, overcoming that divergence, so that the computing resources of the stream processing units are utilized to the greatest extent in a pipelined manner, improving overall pipeline processing efficiency and resource utilization.
Embodiments of the application may be used in application scenarios including, but not limited to, privacy computing, processing of private data, multi-party secure computation, machine learning model training for federated learning, data security, privacy protection, and other application scenarios applying a privacy computing framework or algorithm.
The embodiments of the present application may be modified and improved according to specific application environments, and are not limited herein.
To help those skilled in the art better understand the present application, embodiments of the present application are described below with reference to the accompanying drawings.
FIG. 1 shows a block diagram of a pipelined computing device according to one implementation provided by an embodiment of the present application. As shown in fig. 1, the pipelined computing device includes five stream processing units: stream processing unit 102, stream processing unit 104, stream processing unit 106, stream processing unit 108, and stream processing unit 110. Data flows among the stream processing units unidirectionally, passing from one stream processing unit to the next in a specific order; equivalently, the stream processing units acquire input data, process it, and provide output data in that specific order. Taking fig. 1 as an example, the stream processing unit 102 is the first unit in the specific order, i.e. in the data flow direction of the device: the stream processing unit 102 processes the acquired input data and provides its output data to the stream processing unit 104; the stream processing unit 104 processes it and provides its output to the stream processing unit 106; the stream processing unit 106 likewise to the stream processing unit 108; and the stream processing unit 108 to the stream processing unit 110. It should be understood that the number of stream processing units shown in fig. 1 is merely exemplary; the pipelined computing device may include any number of stream processing units, which is not limited here. Likewise, the data flow direction shown in fig. 1 is only an example; provided the data flow remains unidirectional, other flow directions may be adopted, for example taking the stream processing unit 110 as the first unit, with the specific order running from the stream processing unit 110 to 108, to 106, to 104, and to 102.
With continued reference to fig. 1, the stream processing units of fig. 1 are connected in series according to the specific order, i.e. directly interconnected in that order, so that the data stream passes unidirectionally from one stream processing unit to the next. It should be understood that other connections or interconnections between the stream processing units may be used, as long as data can be transmitted along the data flow direction described above. Here, directly interconnected means connected in series in the specific order. Specifically, the stream processing unit 102, the first unit in the order, is connected to the stream processing unit 104 and provides output data to it; the stream processing unit 104 is connected to the stream processing unit 106 and provides output data to it; the stream processing unit 106 is connected to the stream processing unit 108 and provides output data to it; and the stream processing unit 108 is connected to the stream processing unit 110 and provides output data to it.
With continued reference to fig. 1, the pipelined computing device is used for privacy computing, private data, and federated learning. It includes a plurality of stream processing units (fig. 1 exemplarily shows the stream processing units 102, 104, 106, 108, and 110). The stream processing units include at least one stream processing unit of a first type, each of which is at least partially reconfigurable. The stream processing units correspond one-to-one to the steps of a reference computation flow, and each first-type stream processing unit is configured to be reconfigurable so as to optimize, for the step of the reference computation flow corresponding to that unit, the processing time with which the unit executes that step. The stream processing units are configured to pipeline one or more tasks belonging to the same task batch; the computation flow of any task belonging to the batch can be expanded, according to the reference computation flow, into steps in one-to-one correspondence with the steps of the reference computation flow, and each stream processing unit processes, for each task, the expanded step corresponding to that unit.
With continued reference to fig. 1, the pipelined computing device of fig. 1 includes the stream processing units 102, 104, 106, 108, and 110, among which is at least one first-type stream processing unit, each at least partially reconfigurable. In some embodiments, the first-type stream processing units are Field-Programmable Gate Arrays (FPGAs), which provide fine-grained programmable hardware logic, computation and storage units, and custom computation path structures through static, global reconfiguration according to the requirements of the algorithm. In other embodiments, the first-type stream processing unit is a Coarse-Grained Reconfigurable Architecture (CGRA); a CGRA interconnects functionally configured hardware resources into a configurable computing fabric and rebuilds that fabric into different computation paths through configuration information, thereby achieving dynamic configuration of the hardware structure with simplified interconnect configuration. In other embodiments, the first-type stream processing unit is another type of programmable device, as long as its hardware architecture can be changed as requirements change, i.e. it is reconfigurable or configurable. In other embodiments, the at least one first-type stream processing unit may include one or more types of programmable devices: the first-type stream processing units may all be FPGAs, may all be CGRAs, may mix FPGAs and CGRAs, or may further include other types of programmable devices. In addition, each first-type stream processing unit being at least partially reconfigurable should be understood to mean that at least a part, or all, of the hardware resources in each first-type stream processing unit are reconfigurable. In some embodiments, one part of the hardware resources of a first-type stream processing unit is reconfigurable and another part is not; such a unit may be, for example, a combination of a Central Processing Unit (CPU) and an FPGA, or a processor with a partially reconfigurable function.
With continued reference to fig. 1, when the pipelined computing device is applied to application scenarios of privacy computing, private data processing, and federated learning, it often faces intensive computing tasks over massive amounts of data with large integer bit widths, and may need to perform relatively complex operations such as encryption and decryption, modular exponentiation, modular multiplication, and ciphertext operations. These intensive computing tasks often share commonality in their logic flow or computation process, and that commonality can be embodied by the reference computation flow. Further, the stream processing units correspond one-to-one to the steps of the reference computation flow, and each first-type stream processing unit is configured to be reconfigurable so as to optimize, for its corresponding step of the reference computation flow, the processing time with which it executes that step. The reference computation flow is thus expanded into a plurality of steps, the stream processing units correspond one-to-one to those steps, and the first-type stream processing units can be reconfigured to optimize the processing time of their corresponding steps. This means that the computing tasks represented by the reference computation flow can be pipelined across the stream processing units, and the reconfigurability of the first-type stream processing units can be selectively exploited to optimize processing times, thereby improving overall pipeline processing efficiency. Table 1 below illustrates in detail how the pipelined computing device of fig. 1 processes a plurality of tasks of the same task batch.
TABLE 1

| Stream processing unit 102 | Stream processing unit 104 | Stream processing unit 106 | Stream processing unit 108 | Stream processing unit 110 |
| Task 1 step 1 | idle | idle | idle | idle |
| Task 2 step 1 | Task 1 step 2 | idle | idle | idle |
| Task 3 step 1 | Task 2 step 2 | Task 1 step 3 | idle | idle |
| Task 4 step 1 | Task 3 step 2 | Task 2 step 3 | Task 1 step 4 | idle |
| Task 5 step 1 | Task 4 step 2 | Task 3 step 3 | Task 2 step 4 | Task 1 step 5 |
| Task 6 step 1 | Task 5 step 2 | Task 4 step 3 | Task 3 step 4 | Task 2 step 5 |
| idle | Task 6 step 2 | Task 5 step 3 | Task 4 step 4 | Task 3 step 5 |
| idle | idle | Task 6 step 3 | Task 5 step 4 | Task 4 step 5 |
| idle | idle | idle | Task 6 step 4 | Task 5 step 5 |
| idle | idle | idle | idle | Task 6 step 5 |

(Each row is one pipeline time slot; "idle" marks a unit with no step to process in that slot.)
As shown in table 1, the tasks of the same task batch are the 1st through 6th tasks, and the computation flow of any of these six tasks can be expanded into 5 steps, i.e. into steps in one-to-one correspondence, according to the reference computation flow, with the steps of the reference computation flow. The stream processing units likewise correspond one-to-one to the steps of the reference computation flow: each stream processing unit of the pipelined computing device of fig. 1 corresponds to one step of the reference computation flow and therefore processes the corresponding expanded step of each of the six tasks. Specifically, the stream processing unit 102 corresponds to the first step of the reference computation flow and is therefore used to process step 1 of each of the 1st through 6th tasks; similarly, the stream processing unit 104 corresponds to the second step and processes step 2 of each task; the stream processing unit 106 corresponds to the third step and processes step 3 of each task; the stream processing unit 108 corresponds to the fourth step and processes step 4 of each task; and the stream processing unit 110 corresponds to the fifth step and processes step 5 of each task. In this way, the reference computation flow is expanded into 5 steps, and each of the 1st through 6th tasks of the batch is expanded into 5 steps in one-to-one correspondence with them; for example, the 1st task expands into step 1 through step 5 of the 1st task. The correspondence between the stream processing units, the steps of the reference computation flow, and the steps of a specific task's computation flow is consistent with the unidirectional, ordered data flow among the stream processing units described above. In other words, once the steps corresponding to the stream processing units are determined, the data flow direction among them follows from the precedence relationships among the steps. For example, as shown in table 1, the stream processing unit 102 corresponds to step 1, the stream processing unit 104 to step 2, the stream processing unit 106 to step 3, the stream processing unit 108 to step 4, and the stream processing unit 110 to step 5, which means that data flows from the stream processing unit 102 through to the stream processing unit 110 in that specific order: the stream processing unit 102 corresponding to step 1 processes its data and provides the output data to the stream processing unit 104 corresponding to step 2, and so on.
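The diagonal structure of table 1 follows from simple index arithmetic: at time slot t (1-based), unit u processes step u of task t − u + 1, whenever that task exists. The short, purely illustrative Python loop below reproduces the schedule (the labels T&lt;i&gt;.S&lt;j&gt; are hypothetical shorthand for "task i, step j"):

```python
TASKS, STEPS = 6, 5  # six tasks in the batch, five steps per reference flow

for slot in range(1, TASKS + STEPS):        # 10 time slots in total
    cells = []
    for unit in range(1, STEPS + 1):
        task = slot - unit + 1              # task index handled by this unit
        cells.append(f"T{task}.S{unit}" if 1 <= task <= TASKS else "idle")
    print(f"slot {slot:2d}: " + "  ".join(f"{c:>6}" for c in cells))
```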
It should be understood that the correspondence shown in table 1 between the stream processing units of fig. 1 and the steps of the reference computation flow is merely exemplary. In some embodiments, a different correspondence may be used; for example, the stream processing unit 102 might correspond not to step 1 as shown in table 1 but to step 3. The specific order and data flow direction follow from the correspondence between each stream processing unit and the steps of the reference computation flow, and the connection relationships among the stream processing units can be adjusted accordingly. As mentioned above, the stream processing units are directly interconnected in the specific order so that data passes unidirectionally from one stream processing unit to the next in that order. Accordingly, if the units corresponding to step 1 and step 2 differ from the assignment of table 1 (unit 102 to step 1, unit 104 to step 2), the unit corresponding to step 1 would be directly interconnected with the unit corresponding to step 2 instead.
With continued reference to table 1, it was mentioned above that the stream processing units (102, 104, 106, 108, and 110) pipeline a plurality of tasks belonging to the same task batch, such as the 1st through 6th tasks of table 1. The computation flow of any task belonging to the batch can be expanded, according to the reference computation flow, into steps in one-to-one correspondence with the steps of the reference computation flow; for example, the computation flow of the 1st task expands into step 1 through step 5 of the 1st task. Each stream processing unit processes, for each task, the expanded step corresponding to that unit; thus the stream processing unit 102 processes step 1 of each of the 1st through 6th tasks, meaning that step 1 of every task in the batch is processed by the stream processing unit 102. Because tasks of the same batch share a common logic flow or computation process, and may share the same configuration and parameters, and because every task expands according to the same reference computation flow into steps in one-to-one correspondence with its steps, the specific stream processing unit corresponding to a specific step necessarily processes that step of every task in the batch. By arranging both the one-to-one correspondence between stream processing units and the steps of the reference computation flow, and the one-to-one correspondence between stream processing units and the expanded steps of any task in the batch, each stream processing unit processes the same step of different tasks and therefore faces essentially the same operational requirements. Furthermore, as mentioned above, each first-type stream processing unit is configured to be reconfigurable so as to optimize, for its corresponding step of the reference computation flow, the processing time with which it executes that step. The first-type stream processing units of the pipelined computing device of fig. 1 can therefore exploit their reconfigurability to optimize the processing time of their corresponding steps, improving overall pipeline processing efficiency.
For example, if the stream processing unit 104 is a first-type stream processing unit, the overall pipeline processing efficiency can be improved by reconfiguring the stream processing unit 104 to change its processing time for its corresponding step 2. Note that the computation flow of task 1 in table 1, and its data flow direction, starts at the stream processing unit 102, then passes to the stream processing unit 104, and then to the stream processing unit 106. To improve overall pipeline efficiency, the processing times with which the individual stream processing units process their corresponding steps need to be approximately equal, so that the pipeline can make maximal use of the units' computing resources. Taking table 1 as an example, until the stream processing unit 102 processes step 1 of the 4th task, some of the other stream processing units have not yet started running, or at least one or more of them is idle, which works against overall resource utilization. Moreover, if the processing time of the stream processing unit 102 for step 1 is much longer or much shorter than that of the stream processing unit 104 for step 2, overall pipeline efficiency suffers or extra resources and control costs are incurred. When the stream processing unit 102 takes far longer on step 1 than the stream processing unit 104 takes on step 2 (which can be caused by many factors, such as available executable resources or the computational complexity of the steps), the stream processing unit 104 can only begin step 2 of a task after the stream processing unit 102 has finished step 1 of that same task; the stream processing unit 104 therefore sits idle waiting for the stream processing unit 102 to finish step 1 of the current task, negatively affecting overall pipeline efficiency. Conversely, when the stream processing unit 102 takes far less time on step 1 than the stream processing unit 104 takes on step 2, the stream processing unit 102 may have finished step 1 of several tasks, say the 1st, 2nd, and 3rd tasks, by the time the stream processing unit 104 finishes step 2 of the 1st task and can begin step 2 of the 2nd task; a buffer is then needed to store the stream processing unit 102's pending results, adding resources and control cost. When the processing times of the individual stream processing units for their corresponding steps are approximately equal, each unit can begin its step of the next task as soon as it finishes its step of the previous task, as in the example of table 1, which maximizes use of the computing resources of the stream processing units.
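A back-of-the-envelope calculation illustrates why balance matters. For a linear pipeline with sufficient buffering, the time to finish a batch is approximately the fill time of the pipeline plus one bottleneck period per remaining task; the stage times below are hypothetical numbers, not measurements from the patent:

```python
def batch_time(stage_times, n_tasks):
    # Fill the pipeline once, then one bottleneck period per remaining task
    # (assumes enough buffering between stages).
    return sum(stage_times) + (n_tasks - 1) * max(stage_times)

balanced   = [10, 10, 10, 10, 10]  # equal stage times, total work 50
unbalanced = [18, 10, 8, 10, 4]    # same total work of 50, one slow stage

print(batch_time(balanced, 6))     # 50 + 5 * 10 = 100 time units
print(batch_time(unbalanced, 6))   # 50 + 5 * 18 = 140 time units
```

With the same total work per task, the unbalanced pipeline is 40% slower, because every task must wait out the 18-unit bottleneck stage; this is exactly the idle time and buffering cost described above.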
To make the processing times of the individual stream processing units approximately equal, the pipelined computing device of fig. 1 captures the commonality that tasks of the same batch share in their logic flow or computation process through the reference computation flow and its expanded steps, and configures the stream processing units according to that commonality through their one-to-one correspondence with the steps of the reference computation flow; this helps keep the processing time of each unit's corresponding step of the reference computation flow approximately equal. Further, some divergence inevitably exists between the reference computation flow and the actual tasks of the batch, caused for example by the specific input data, data bit widths, parameter configuration, or the actual state of the hardware. The pipelined computing device of fig. 1 therefore also exploits the defining property of the first-type stream processing units, namely that each is configured to be reconfigurable so that its processing time for its corresponding step of the reference computation flow can be optimized. By reconfiguring one or more of the first-type stream processing units, that divergence can be overcome, so that the processing times of the individual units remain approximately equal when the device faces the multiple tasks of the same batch.
In summary, the pipelined computing device of fig. 1 captures the commonality of tasks of the same batch through the reference computation flow and its expanded steps, configures the stream processing units according to that commonality through their one-to-one correspondence with the steps of the reference computation flow, and, because some divergence exists between the reference computation flow and the actual tasks, makes each first-type stream processing unit reconfigurable so that its processing time for its corresponding step can be optimized and the divergence overcome. The computing resources of the stream processing units are thereby utilized to the greatest extent in a pipelined manner, improving overall pipeline processing efficiency and resource utilization.
In one possible implementation, each of the at least one first-type stream processing unit is reconfigured before the pipelined computing device leaves the factory, so as to optimize, for its corresponding step of the reference computation flow, the processing time with which it executes that step. The first-type stream processing units are thus reconstructed before shipment to improve the overall pipeline processing efficiency.
In a possible implementation manner, the reference computation flow is determined based on the computing scenario associated with the same task batch, and each of the at least one first-type stream processing unit is reconfigured before the pipelined computing device processes that task batch, so as to optimize, for its corresponding step of the reference computation flow, the processing time with which it executes that step. The reference computation flow is thus determined by the computing scenario, and the first-type stream processing units are reconstructed on that basis, so that the overall pipeline processing efficiency is improved for the specific computing scenario. Moreover, because the reference computation flow is determined from a particular computing scenario, it captures the commonality of the tasks performed under that scenario and can follow the requirements of different scenarios. Examples of computing scenarios may be industry based, such as banking, e-commerce, security, government, traffic, securities, medical services, pharmaceutical, and aviation scenarios. Computing scenarios may also be based on factors such as geography, population, or socioeconomic conditions. The tasks of the same task batch generally correspond or belong to the same computing scenario, for example a task batch generated in a banking scenario. Determining the reference computation flow from the computing scenario associated with the task batch therefore further improves the overall pipeline processing efficiency for the tasks at hand.
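As a loose illustration of this scenario-to-flow mapping, the sketch below keys candidate reference computation flows by scenario label; the scenario names, step names, and function are hypothetical and not taken from this application.

```python
# Hedged sketch: choose a reference computation flow from the computing scenario
# associated with a task batch. Scenario keys and step names are illustrative only.
REFERENCE_FLOWS = {
    "banking":    ["preprocess", "paillier_encrypt", "secure_aggregate", "postprocess"],
    "e-commerce": ["preprocess", "feature_align", "secure_sum", "postprocess"],
}

def reference_flow_for(scenario: str) -> list[str]:
    """Return the ordered steps of the reference computation flow for a scenario."""
    return REFERENCE_FLOWS[scenario]

# Each returned step would then be assigned one-to-one to a stream processing unit.
steps = reference_flow_for("banking")
```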
In a possible embodiment, configuring the multiple stream processing units to pipeline one or more tasks belonging to the same task batch includes: following the processing order of the one or more tasks, each of the plurality of stream processing units starts processing its corresponding step, among the steps obtained by expanding the current task's computation flow according to the reference computation flow, only after it has finished processing its corresponding step of the previous task. In this way, the plurality of stream processing units pipeline one or more tasks of the same task batch in the processing order. In some embodiments, the processing order is the order in which the one or more tasks are received by the pipelined computing device, so that the processing order is determined by arrival order. In other embodiments, another order may be used. In some embodiments, the plurality of stream processing units are arranged according to the sequence of the steps of the reference computation flow to obtain a stream processing unit sequence, and pipelining further includes: each stream processing unit obtains, from the previous unit in the sequence, the input data for its corresponding step of the current task's expanded computation flow, executes that step, and supplies the result as output data to the next unit in the sequence, where it serves as the input data of that unit's corresponding step of the same task. The acquisition of input data, the execution of the corresponding step, and the output of the result are thus determined by the processing order and the stream processing unit sequence, which facilitates the pipelined processing of the tasks of the same task batch.
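The following minimal sketch illustrates this sequencing in software; the class and function names are hypothetical, and the loop serializes for clarity what the hardware units perform concurrently.

```python
# Minimal sketch (assumed names, not from this application) of the sequencing
# described above: units arranged in the order of the reference computation
# flow, each taking input from the previous unit and feeding the next.
from collections import deque

class StreamProcessingUnit:
    def __init__(self, name, step_fn):
        self.name = name        # e.g. a hypothetical "SPU-102"
        self.step_fn = step_fn  # the unit's step of the reference computation flow
        self.inbox = deque()    # input buffer fed by the previous unit

    def process_one(self):
        """Process one waiting item; return (task_id, result) or None if idle."""
        if not self.inbox:
            return None
        task_id, data = self.inbox.popleft()
        return task_id, self.step_fn(data)

def run_pipeline(units, tasks):
    """Pipeline tasks in arrival order; serializes what hardware runs concurrently."""
    for task_id, data in enumerate(tasks):
        units[0].inbox.append((task_id, data))
    results, pending = {}, len(tasks)
    while pending:
        for i, unit in enumerate(units):
            out = unit.process_one()
            if out is None:
                continue
            if i + 1 < len(units):
                units[i + 1].inbox.append(out)  # output becomes next unit's input
            else:
                results[out[0]] = out[1]
                pending -= 1
    return results

# Usage with three toy steps standing in for table 1's steps 1-3:
units = [StreamProcessingUnit(f"SPU-{n}", fn) for n, fn in
         [(102, lambda x: x + 1), (104, lambda x: x * 2), (106, lambda x: x - 3)]]
print(run_pipeline(units, [10, 20, 30]))  # {0: 19, 1: 39, 2: 59}
```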
In a possible implementation manner, the plurality of stream processing units that correspond one-to-one to the plurality of steps are arranged, according to the order of the steps of the reference computation flow, into a stream processing unit sequence. A given stream processing unit in the sequence is configured to obtain input data from the previous unit in the sequence relative to it and to provide output data to the next unit in the sequence relative to it, where the given stream processing unit is any unit in the sequence. One or more tasks of the same task batch are thereby pipelined through the stream processing unit sequence. In some embodiments, a given stream processing unit in the sequence is connected to the previous and the next stream processing unit via a high-speed serial link such as SERDES (Serializer/Deserializer). It should be understood that any other suitable connection means may also be used.
In one possible implementation, the processing times with which the stream processing units execute their corresponding steps of the reference computation flow together constitute a reference processing time array, and a first gap, between the maximum value and the minimum value in that array, is smaller than a first preset threshold. The reference processing time array thus gives a handle on the differences between the per-unit processing times, and keeping the first gap below the first preset threshold bounds how far those times may drift apart. This effectively prevents any single stream processing unit from taking markedly longer than the others and becoming a bottleneck, keeps the per-step processing times in the reference computation flow approximately equal, and improves the overall pipeline processing efficiency.
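A minimal sketch of the first-gap check follows; the times and threshold are made-up illustrations.

```python
# Hedged sketch of the first-gap check; the values below are illustrative only.
def first_gap_ok(reference_times, first_threshold):
    """True when (max - min) of the reference processing times is below threshold."""
    return max(reference_times) - min(reference_times) < first_threshold

reference_times = [10.2, 9.8, 10.5, 10.0]  # per-step times of the reference flow
assert first_gap_ok(reference_times, first_threshold=1.0)  # gap 0.7 < 1.0
```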
As mentioned above, the differences between the processing times with which the stream processing units process their steps are evaluated and controlled through the reference processing time array. For high overall pipeline efficiency these processing times need to be approximately equal, so that the pipeline fully exploits the computing resources of the stream processing units. Monitoring the first gap between the maximum and the minimum of the reference processing time array and comparing it against the first preset threshold helps keep the overall spread of processing times within the desired or designed range. That is, the reference computation flow and its expanded steps capture the commonality of the tasks of the same task batch in logic flow or computing process, and the one-to-one correspondence between stream processing units and steps configures the units according to that commonality, which helps make the per-step processing times approximately equal. In practice, certain differences exist between the reference computation flow and the actual tasks of the same task batch, caused for example by the specific input data, the data bit width, the parameter configuration, or the actual state of the hardware devices, and the reconfigurability of the first-type stream processing units is used to cope with them. In some embodiments, while the plurality of stream processing units pipeline the one or more tasks, the pipelined computing device monitors the actual processing time with which each stream processing unit executes its corresponding step, compares it with that unit's entry in the reference processing time array, and selectively reconstructs one or more of the at least one first-type stream processing unit according to the comparison result. Monitoring actual processing times against the reference processing time array in this way absorbs the differences between the reference computation flow and the actual tasks of the same task batch. In some embodiments, selectively reconstructing one or more of the at least one first-type stream processing unit according to the comparison result includes: reducing the executable resources with which the one or more first-type stream processing units execute their corresponding steps, thereby increasing the processing time of those steps. The executable resources directly affect the processing time.
When the reference computation flow is set up and expanded, the steps can be allocated according to the executable resources of the individual stream processing units, for example by assigning a step with a higher demand for computational resources to a stream processing unit with more executable resources, which helps keep the per-step processing times in the reference computation flow approximately equal. The reference processing time array then serves as the object of comparison. In practice, when the actual processing time deviates substantially from the reference processing time array, this may be caused by the specific input data, the data bit width, the parameter configuration, or the actual state of the hardware devices; such deviations are usually temporary and fluctuating and can be fine-tuned through a dynamic adjustment mechanism. For example, a stream processing unit whose processing time is too long, or whose actual processing time clearly exceeds its reference value, can be allocated more executable resources, that is, more computational resources; a stream processing unit whose processing time is too short, or whose actual processing time is clearly below its reference value, can be allocated fewer executable resources. In general, allocating fewer executable resources to the unit that is too fast is simpler to control, because the design need not consider the maximum amount of executable resources that could be allocated. Reducing the executable resources with which one or more first-type stream processing units execute their corresponding steps, and thereby lengthening those steps, therefore effectively load-balances the stream processing units: the per-step processing times in the reference computation flow become approximately equal, which improves the overall pipeline processing efficiency.
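The sketch below illustrates this policy under the simplifying assumption that processing time scales roughly inversely with executable resources; the function name and tolerance parameter are hypothetical.

```python
# Illustrative sketch of the adjustment policy, assuming processing time scales
# roughly inversely with executable resources; names and tolerance are made up.
def rebalance_resources(resources, actual_time, reference_time, tolerance=0.1):
    """Return a reduced resource budget for a unit running well below its reference time."""
    if actual_time < reference_time * (1 - tolerance):
        # Fewer parallel resources -> proportionally longer processing time
        # under the assumed inverse model; never drop below one resource unit.
        return max(1, int(resources * actual_time / reference_time))
    return resources  # within tolerance: leave the unit as configured

print(rebalance_resources(resources=8, actual_time=5.0, reference_time=10.0))  # 4
```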
In some embodiments, while the plurality of stream processing units pipeline the one or more tasks, the pipelined computing device monitors a second gap, between the maximum value and the minimum value of the actual processing times with which the stream processing units execute their corresponding steps, and selectively reconstructs one or more of the at least one first-type stream processing unit to reduce the second gap whenever it exceeds a second preset threshold. As mentioned above, keeping the first gap of the reference processing time array below the first preset threshold controls the spread of processing times as embodied in that array. Since differences exist between the reference computation flow and the actual tasks of the same task batch, adjustment must also follow the actual processing times; hence the second gap is monitored and, when it exceeds the second preset threshold, one or more first-type stream processing units are selectively reconstructed to shrink it. This effectively load-balances the stream processing units, keeping the per-step processing times approximately equal and improving the overall pipeline processing efficiency. The first gap and its first preset threshold represent a design-time requirement based on the reference computation flow, while the second gap and its second preset threshold represent a requirement on the actual performance of the pipelined computing device. Both may be imposed at once, that is, the first gap may be required to stay below the first preset threshold and the second gap below the second preset threshold. The first and/or second preset thresholds may be fixed in advance or set per computing scenario, so as to respect the different design and performance requirements of different scenarios. As mentioned above, in some embodiments the reference computation flow is determined from the computing scenario and the first-type stream processing units are reconstructed on that basis; in some embodiments the first and/or second preset thresholds are likewise determined from the computing scenario, which further improves the overall pipeline processing efficiency for that scenario.
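A sketch of the second-gap check follows; which units are reconfigurable first-type units must be checked by the caller, and the heuristic of slowing down the fastest unit is illustrative rather than prescribed by this application.

```python
# Sketch of the second-gap monitor; the selection heuristic is illustrative.
def units_to_reconstruct(actual_times, second_threshold):
    """Return indices of units to reconstruct when the actual spread is too wide."""
    if max(actual_times) - min(actual_times) <= second_threshold:
        return []
    fastest = min(range(len(actual_times)), key=actual_times.__getitem__)
    return [fastest]  # reducing its resources lengthens its step, shrinking the gap

print(units_to_reconstruct([10.1, 7.2, 10.4], second_threshold=1.5))  # [1]
```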
Specifically, in some embodiments, the second preset threshold is set according to the computing scenario of the pipelined computing device.
In one possible embodiment, while the plurality of stream processing units pipeline the one or more tasks, the pipelined computing device monitors the maximum among the actual processing times with which the stream processing units execute their corresponding steps, compares that maximum actual processing time with the entry, in the reference processing time array, of the unit that produced it, and selectively reconstructs that unit according to the comparison result. Since differences exist between the reference computation flow and the actual tasks of the same task batch, monitoring the unit with the maximum actual processing time against its reference value effectively load-balances the stream processing units, keeping the per-step processing times approximately equal and improving the overall pipeline processing efficiency.
In a possible implementation manner, the plurality of stream processing units corresponding one-to-one to the plurality of steps are arranged, according to the order of the steps of the reference computation flow, into a stream processing unit sequence. While the plurality of stream processing units pipeline the one or more tasks, the pipelined computing device monitors a third gap between the actual processing times of any two adjacent stream processing units in the sequence and, when the third gap exceeds a third preset threshold, reconstructs the one of the two adjacent units with the smaller actual processing time so as to reduce its executable resources for its corresponding step and thereby increase its actual processing time. Since differences exist between the reference computation flow and the actual tasks of the same task batch, monitoring the third gap between adjacent units against the third preset threshold effectively load-balances the stream processing units, keeping the per-step processing times approximately equal and improving the overall pipeline processing efficiency.
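A sketch of this adjacent-pair check follows; the generator only identifies violations, leaving the actual reconstruction to the device.

```python
# Sketch of the adjacent-pair (third-gap) check over the stream processing unit
# sequence; names and values are illustrative.
def adjacent_gap_violations(actual_times, third_threshold):
    """Yield (index_of_faster_unit, gap) for adjacent pairs over the threshold."""
    for i in range(len(actual_times) - 1):
        gap = abs(actual_times[i] - actual_times[i + 1])
        if gap > third_threshold:
            # The faster (shorter-time) unit is the one to slow down.
            yield (i if actual_times[i] < actual_times[i + 1] else i + 1), gap

print(list(adjacent_gap_violations([10.0, 7.0, 10.2], third_threshold=2.0)))
# [(1, 3.0), (1, 3.2)]
```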
In one possible embodiment, the plurality of stream processing units are configured to be directly interconnected or to be connected through an internal bus. Fig. 1 schematically shows the stream processing units directly interconnected; for the case where they are connected via an internal bus, reference may be made to fig. 2.
In a possible embodiment, the at least one first-type stream processing unit comprises at least one FPGA and/or at least one CGRA, the plurality of stream processing units further comprises at least one second-type stream processing unit none of which is reconfigurable, and the at least one second-type stream processing unit comprises at least one CPU and/or at least one GPU. A second-type stream processing unit is not programmable or reconfigurable; examples include CPUs and GPUs, and other non-programmable devices are possible. Because it is not programmable, a second-type stream processing unit is suited to fixed steps of the reference computation flow, in particular steps whose computational demand can be predicted well, such as initial and end steps that meet certain requirements. First-type and second-type stream processing units together then form a pipelined computing device able to meet complex and variable computing demands. In some embodiments, the at least one second-type stream processing unit corresponds to the initial step and the end step of the reference computation flow. In some embodiments, each of the at least one first-type stream processing unit comprises configurable logic blocks, programmable interconnect resources, and programmable input/output units. In some embodiments, the reference computation flow is a Paillier decryption operation or a Paillier encryption operation. The Paillier decryption and encryption operations follow well-known algorithms, and their expanded steps have relatively fixed computational demands; in particular, the initial step and the end step each correspond to operations with clearly defined demands. For example, the initial step of a Paillier decryption operation performs a specific mathematical operation on a Paillier ciphertext that meets specific requirements. Thus, for a reference computation flow that satisfies such requirements, for example a task in a particular industry scenario whose initial and end steps have clearly predictable mathematical operations, second-type stream processing units can be arranged for the initial and end steps while first-type stream processing units are arranged for the intermediate steps. In this way the first-type and second-type stream processing units together form a pipelined computing device able to meet complex and variable computing demands.
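As a hedged illustration, Paillier decryption computes m = L(c^λ mod n²) · μ mod n with L(x) = (x - 1) / n, and the sketch below splits it into three pipeline steps; this particular three-way split is an assumption for illustration, not this application's actual expansion.

```python
# Hedged sketch: Paillier decryption m = L(c^lam mod n^2) * mu mod n, with
# L(x) = (x - 1) // n, split into three pipeline steps for illustration.
def step1_validate(c, n):
    """Initial step (fixed, predictable demand): range-check and reduce the ciphertext."""
    assert 0 < c < n * n, "ciphertext out of range"
    return c % (n * n)

def step2_modexp(c, lam, n):
    """Intermediate step (heavy modular exponentiation, suited to an FPGA/CGRA unit)."""
    return pow(c, lam, n * n)

def step3_finalize(u, mu, n):
    """End step (fixed, predictable demand): apply L and the final modular multiply."""
    return ((u - 1) // n) * mu % n

# Toy parameters: p=5, q=7 -> n=35, lam=lcm(4,6)=12, g=n+1, mu=12^-1 mod 35 = 3.
n, lam, mu = 35, 12, 3
c = pow(n + 1, 9, n * n) * pow(2, n, n * n) % (n * n)  # encrypt m=9 with r=2
print(step3_finalize(step2_modexp(step1_validate(c, n), lam, n), mu, n))  # 9
```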
FIG. 2 is a block diagram of a pipelined computing device according to another implementation provided by an embodiment of the present application. As shown in fig. 2, the pipelined computing device includes five stream processing units: stream processing unit 202, stream processing unit 204, stream processing unit 206, stream processing unit 208, and stream processing unit 210. The data flow among these stream processing units is transmitted unidirectionally from one unit to the next in a specific order; equivalently, the units acquire input data, process it, and provide output data in that order. The stream processing units are connected through the internal bus 220, over which the data flow passes unidirectionally from one unit to the next. Taking fig. 2 as an example, stream processing unit 202 is the first unit in the specific order, that is, the start of the data flow: stream processing unit 202 processes its acquired input data and provides the output to stream processing unit 204 over the internal bus 220, unit 204 likewise passes its output to unit 206, unit 206 to unit 208, and unit 208 to unit 210, in each case over the internal bus 220. The internal bus 220 may employ any suitable structure, circuitry, composition, or standard; for example, it may follow the PCIe (Peripheral Component Interconnect Express) high-speed serial expansion bus standard. It should be understood that the number of stream processing units shown in fig. 2 is merely an example, and the pipelined computing device may include any number of them. Likewise, the data flow direction shown in fig. 2 is only an example: as long as the data flow remains unidirectional, other directions are possible, for example with stream processing unit 210 as the first unit and the specific order running from unit 210 to unit 208, then 206, then 204, then 202.
Referring to figs. 1 and 2, the pipelined computing device of fig. 1 comprises a plurality of stream processing units directly interconnected in a specific order, so that the data flow between them is transmitted unidirectionally from one unit to the next in that order; in contrast, the stream processing units of the pipelined computing device of fig. 2 are connected by the internal bus 220, over which the data flow is likewise transmitted unidirectionally in a specific order. It should be understood that the pipelined computing device of the embodiments of the present application may include any number of stream processing units, which may be directly interconnected as in fig. 1, connected by an internal bus as in fig. 2, or combined in any possible way, for example with one part of the stream processing units directly interconnected and another part connected by an internal bus.
The pipelined computing device shown in fig. 2 differs from that of fig. 1 mainly in the connection between the stream processing units and is otherwise essentially the same, so the common details are not repeated here. The pipelined computing device shown in fig. 2 is used for private computation, private data, and federated learning. Its plurality of stream processing units includes at least one first-type stream processing unit, each at least partially reconfigurable. The stream processing units correspond one-to-one to the steps of a reference computation flow, and each first-type stream processing unit is configured to be reconfigurable so that, for its corresponding step of the reference computation flow, its processing time for executing that step can be optimized. The stream processing units are configured to pipeline one or more tasks belonging to the same task batch; the computation flow of any such task may be expanded, according to the reference computation flow, into steps corresponding one-to-one to the steps of the reference computation flow, and each stream processing unit processes its corresponding step of each task's expanded computation flow.
As mentioned above, the stream processing units of the pipelined computing device shown in fig. 2 are connected via the internal bus 220 so that the data flow between them is transmitted unidirectionally from one unit to the next in a specific order. It should be understood that the specific order and data flow direction follow from the correspondence between the stream processing units and the steps of the reference computation flow, and the connection relationships over the internal bus 220 may be adjusted accordingly; for example, the stream processing unit corresponding to the 1st step may be connected, over the internal bus 220, to the stream processing unit corresponding to the 2nd step.
FIG. 3 illustrates a block diagram of a pipelined computing device of another implementation provided by an embodiment of the present application. As shown in FIG. 3, the pipelined computing device includes five stream processing units: CPU 302, FPGA 304, FPGA 306, FPGA 308, and CPU 310, where CPU abbreviates Central Processing Unit and FPGA abbreviates Field-Programmable Gate Array; the common abbreviations are used for brevity. The data flow among these stream processing units is transmitted unidirectionally from one unit to the next in a specific order; equivalently, the units acquire input data, process it, and provide output data in that order. Taking fig. 3 as an example, CPU 302 is the first unit in the specific order, that is, the start of the data flow: CPU 302 processes its acquired input data and provides the output to FPGA 304, FPGA 304 to FPGA 306, FPGA 306 to FPGA 308, and FPGA 308 to CPU 310. The stream processing units of fig. 3 are connected in series, that is, directly interconnected, in the specific order, so that the data flow between them is transmitted unidirectionally from one unit to the next.
With continued reference to FIG. 3, CPU 302 and CPU 310 of the pipelined computing device of fig. 3 are not reconfigurable, that is, they are non-programmable devices, whereas FPGA 304, FPGA 306, and FPGA 308 are reconfigurable, that is, programmable devices. FPGA 304, FPGA 306, and FPGA 308 are first-type stream processing units; CPU 302 and CPU 310 are second-type stream processing units.
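An illustrative mapping of fig. 3's units to pipeline roles follows; the step labels are hypothetical stand-ins for steps of a reference computation flow.

```python
# Illustrative mapping of fig. 3's chain to pipeline roles; step labels made up.
pipeline = [
    ("CPU 302",  "second-type (fixed)",         "initial step"),
    ("FPGA 304", "first-type (reconfigurable)", "intermediate step 2"),
    ("FPGA 306", "first-type (reconfigurable)", "intermediate step 3"),
    ("FPGA 308", "first-type (reconfigurable)", "intermediate step 4"),
    ("CPU 310",  "second-type (fixed)",         "end step"),
]
for unit, kind, step in pipeline:
    print(f"{unit}: {kind} -> {step}")
```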
Fig. 4 shows a block diagram of a stream processing unit according to an implementation provided in an embodiment of the present application. As shown in fig. 4, the stream processing unit includes a receiving module 402, a sending module 404, a task management module 406, a memory 408, a data distribution and merging module 410, and a computing engine 420. The receiving module 402 and the sending module 404 may be combined into one module or provided separately as in fig. 4. The receiving module 402 receives the data to be processed, for example the computation result of the previous stream processing unit, and the sending module 404 transmits the processing result, for example to the next stream processing unit. The receiving module 402 may also convert the received data from its transmission format, for example the SERDES wire format, into an internal data format, and the sending module 404 may convert the internal format of the computation result back into the SERDES transmission format. The task management module 406 receives and handles the issued computation commands, reads data from the memory 408 and writes computation results into it, obtains the result of the previous step from the cache, and sends the data to be computed to the data distribution and merging module 410. The data distribution and merging module 410 performs data distribution and data merging: distribution processes the data sent by the task management module 406, for example by bit-width conversion, and sends it to the computing engine 420; merging receives the results of the computing engine 420, processes them, for example by bit-width conversion, and returns them to the task management module 406. The computing engine 420 may include a plurality of sub-engines, each operating independently, which obtain data and return computation results through the data distribution and merging module 410.
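A rough software analogue of fig. 4's structure follows; the module and method names mirror the figure, while the bodies are placeholders, since the actual formats and engines are hardware-specific.

```python
# Rough software analogue of fig. 4 (names mirror the figure; bodies are stubs).
class ComputeEngine:
    def run(self, chunk):
        return chunk  # stands in for the sub-engines' parallel computation

class StreamUnit:
    def __init__(self):
        self.memory = {}               # memory 408
        self.engine = ComputeEngine()  # computing engine 420

    def receive(self, wire_data):
        """Receiving module 402: convert the wire (e.g. SERDES) format to internal format."""
        return bytes(wire_data)

    def manage_task(self, data):
        """Task management module 406: stage data, then hand it to distribution/merging."""
        self.memory["staged"] = data
        return self.distribute_and_merge(data)

    def distribute_and_merge(self, data):
        """Module 410: bit-width-convert, fan out to sub-engines, merge the results."""
        return self.engine.run(data)

    def send(self, result):
        """Sending module 404: convert the internal format back to the wire format."""
        return list(result)
```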
It should be understood that the stream processing unit shown in fig. 4 is only an example, and a stream processing unit having any structure and functional composition may be adopted as long as the basic principle and the operation mechanism of the pipelined computing device mentioned in the embodiment of the present application are satisfied, and is not limited specifically herein.
Fig. 5 shows a flowchart of a pipelined computing method provided in an embodiment of the present application. As shown in fig. 5, the pipelined computing method includes the following steps.
Step S510: provide a plurality of stream processing units.
In step S510, the plurality of stream processing units includes at least one first-type stream processing unit, each at least partially reconfigurable; the stream processing units correspond one-to-one to the steps of a reference computation flow; and each first-type stream processing unit is configured to be reconfigurable so that, for its corresponding step of the reference computation flow, its processing time for executing that step can be optimized.
Step S520: obtain a plurality of tasks of the same task batch.
In step S520, the computation flow of each task belonging to the same task batch may be expanded, according to the reference computation flow, into a plurality of steps corresponding one-to-one to the steps of the reference computation flow.
Step S530: pipeline each of the tasks through the plurality of stream processing units, with each stream processing unit processing its corresponding step among the steps obtained by expanding the task's computation flow according to the reference computation flow.
In the pipelined computing method shown in fig. 5, the commonality of the tasks of the same task batch in logic flow or computing process is captured by the reference computation flow and its expanded steps, and the stream processing units are configured according to that commonality through the one-to-one correspondence between the stream processing units and the steps of the reference computation flow. Further, since certain differences may exist between the reference computation flow and the actual tasks of the same task batch, each of the at least one first-type stream processing unit is configured to be reconfigurable so that its processing time for its corresponding step can be optimized, overcoming those differences. This maximizes the use of the computational resources of the stream processing units in a pipelined manner and improves the overall pipeline processing efficiency and resource utilization.
In one possible implementation, the processing times with which the stream processing units execute their corresponding steps of the reference computation flow together constitute a reference processing time array, and a first gap between the maximum value and the minimum value in that array is smaller than a first preset threshold.
In one possible implementation, the pipelined computing method further includes: while the plurality of stream processing units pipeline the one or more tasks, monitoring the actual processing time with which each stream processing unit executes its corresponding step, comparing it with that unit's entry in the reference processing time array, and selectively reconstructing one or more of the at least one first-type stream processing unit according to the comparison result.
It is to be understood that the above-described method may be implemented by a corresponding execution body or carrier. In some exemplary embodiments, a non-transitory computer readable storage medium stores computer instructions that, when executed by a processor, implement the above-described method and any of the above-described embodiments, implementations, or combinations thereof. In some example embodiments, an electronic device includes: a processor; a memory for storing processor-executable instructions; wherein the processor executes the executable instructions to implement the method described above and any of the embodiments, implementations, or combinations thereof described above.
The embodiments provided herein may be implemented in any one or combination of hardware, software, firmware, or solid-state logic circuitry, and may be implemented in connection with signal processing, control, and/or application-specific circuitry. Particular embodiments of the present application provide an apparatus or device that may include one or more processors, such as microprocessors, controllers, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and the like, which process various computer-executable instructions to control the operation of the apparatus or device. Particular embodiments of the present application provide an apparatus or device that can include a system bus or data transfer system that couples the various components together. A system bus can include any of a variety of different bus structures or combinations of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus utilizing any of a variety of bus architectures. The devices or apparatuses provided in the embodiments of the present application may be provided separately, may be part of a system, or may be part of other devices or apparatuses.
Particular embodiments provided herein may include or be combined with computer-readable storage media, such as one or more storage devices capable of providing non-transitory data storage. The computer-readable storage medium/storage device may be configured to store data, programs, and/or instructions that, when executed by a processor of an apparatus or device provided by embodiments of the present application, cause the apparatus or device to perform the operations associated therewith. The computer-readable storage medium/storage device may include one or more of the following features: volatile, non-volatile, dynamic, static, read/write, read-only, random access, sequential access, location addressability, file addressability, and content addressability. In one or more exemplary embodiments, the computer-readable storage medium/storage device may be integrated into a device or apparatus provided in the embodiments of the present application or belong to a common system. The computer-readable storage medium/storage device may include optical, semiconductor, and/or magnetic memory devices, and may also include random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a recordable and/or rewritable compact disc (CD), a digital versatile disc (DVD), a mass storage media device, or any other form of suitable storage medium.
The above is an implementation manner of the embodiments of the present application, and it should be noted that the steps in the method described in the embodiments of the present application may be sequentially adjusted, combined, and deleted according to actual needs. In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments. It is to be understood that the embodiments of the present application and the structures shown in the drawings are not to be construed as particularly limiting the devices or systems concerned. In other embodiments of the present application, an apparatus or system may include more or fewer components than the specific embodiments and figures, or may combine certain components, or may separate certain components, or may have a different arrangement of components. Those skilled in the art will understand that various modifications and changes may be made in the arrangement, operation, and details of the methods and apparatus described in the specific embodiments without departing from the spirit and scope of the embodiments herein; without departing from the principles of embodiments of the present application, several improvements and modifications may be made, and such improvements and modifications are also considered to be within the scope of the present application.

Claims (16)

1. A pipelined computing device for use in private computing, private data, and federated learning, the pipelined computing device comprising:
a plurality of stream processing units, wherein the plurality of stream processing units comprises at least one first class stream processing unit, each of the at least one first class stream processing units being at least partially reconfigurable, wherein the at least one first class stream processing unit comprises at least one FPGA and at least one CGRA, the plurality of stream processing units are configured to be directly interconnected or connected via an internal bus, each of the at least one first class stream processing units comprises configurable logic blocks, programmable interconnection resources, and programmable input output units, respective executable resources of the at least one first class stream processing units are independent of each other and reconfiguration of each of the first class stream processing units is independent of each other,
wherein the plurality of stream processing units correspond to a plurality of steps of a reference computing flow in a one-to-one manner, each of the at least one first type of stream processing unit is configured to be reconfigurable to optimize a processing time for the first type of stream processing unit to execute the step corresponding to the first type of stream processing unit in the reference computing flow with respect to the step corresponding to the first type of stream processing unit in the reference computing flow,
wherein, the plurality of stream processing units are configured to pipeline one or more tasks belonging to the same task batch, a calculation flow of any task belonging to the same task batch may be expanded into a plurality of steps corresponding to the plurality of steps of the reference calculation flow in a one-to-one manner according to the reference calculation flow, each of the plurality of stream processing units is configured to process a step corresponding to the stream processing unit in the plurality of steps after the calculation flow of each of the one or more tasks is expanded according to the reference calculation flow,
each stream processing unit in the plurality of stream processing units respectively executes the processing time of the step corresponding to the stream processing unit in the reference calculation flow to form a reference processing time array, wherein a first difference between a maximum value and a minimum value in the reference processing time array is smaller than a first preset threshold value,
the reconstructing of the first-type stream processing unit is to increase a processing time of a step corresponding to the first-type stream processing unit by reducing an executable resource of the first-type stream processing unit for executing the step corresponding to the first-type stream processing unit.
2. The pipelined computing device of claim 1, wherein each of the at least one first-type stream processing unit is reconfigured before the pipelined computing device is shipped out of the factory to optimize a processing time for the first-type stream processing unit to perform a step in the reference computing flow corresponding to the first-type stream processing unit for a step in the reference computing flow corresponding to the first-type stream processing unit.
3. The pipelined computing device of claim 1, wherein the reference computing process is determined based on a computing scenario associated with the same task batch, each of the at least one first-type stream processing units being reconfigured before the pipelined computing device processes the same task batch to optimize a processing time for the first-type stream processing unit to perform a step in the reference computing process corresponding to the first-type stream processing unit for the step in the reference computing process corresponding to the first-type stream processing unit.
4. The pipelined computing device of claim 1, wherein the plurality of stream processing units are configured to pipeline one or more tasks belonging to a same task batch, comprising: according to the processing order of the one or more tasks, each of the plurality of stream processing units starts to process the step corresponding to the stream processing unit, among the plurality of steps obtained by expanding the calculation flow of the current task according to the reference calculation flow, after having processed the step corresponding to the stream processing unit among the plurality of steps obtained by expanding the calculation flow of the previous task according to the reference calculation flow.
5. The pipelined computing device of claim 1, wherein the plurality of stream processing units in one-to-one correspondence with the plurality of steps of the reference computing flow are arranged in a sequence of stream processing units according to a precedence order of the plurality of steps, a given stream processing unit in the sequence of stream processing units is configured to obtain input data from a previous stream processing unit in the sequence of stream processing units relative to the given stream processing unit and provide output data to a next stream processing unit in the sequence of stream processing units relative to the given stream processing unit, and the given stream processing unit is any stream processing unit in the sequence of stream processing units.
6. The pipelined computing device of claim 4, wherein the plurality of stream processing units are arranged in a sequence of the plurality of steps of the reference computing flow to obtain a sequence of stream processing units, and the plurality of stream processing units are configured to pipeline one or more tasks belonging to a same task batch, further comprising:
each of the plurality of stream processing units obtains, from a previous stream processing unit in the sequence of stream processing units with respect to the stream processing unit, input data for a step corresponding to the stream processing unit in a plurality of steps in which the calculation flow of the current task is expanded in accordance with the reference calculation flow, executes the step corresponding to the stream processing unit in the calculation flow of the current task, and supplies a calculation result as output data to a next stream processing unit in the sequence of stream processing units with respect to the stream processing unit so as to be input data for the step corresponding to the next stream processing unit in the calculation flow of the current task.
7. The pipelined computing device of claim 1, wherein during pipelined processing of the one or more tasks by the plurality of stream processing units, the pipelined computing device monitors an actual processing time for each of the plurality of stream processing units to each perform the step corresponding to that stream processing unit, compares the actual processing time for that stream processing unit to the processing time for that stream processing unit in the reference processing time array, and selectively reconstructs one or more of the at least one first class stream processing units based on the comparison.
8. The pipelined computing device of claim 7, wherein selectively reconfiguring one or more of the at least one first class of stream processing units based on the comparison comprises: the processing time of the steps corresponding to the one or more first type stream processing units is increased by reducing the executable resources of the one or more first type stream processing units for executing the steps corresponding to the one or more first type stream processing units.
9. The pipelined computing device of claim 1, wherein during pipelined processing of the one or more tasks by the plurality of stream processing units, the pipelined computing device monitors a second gap between a maximum value and a minimum value of the actual processing time for each of the plurality of stream processing units to perform the step corresponding to that stream processing unit, and selectively reconstructs one or more of the at least one first class stream processing units to reduce the second gap when the second gap is greater than a second preset threshold.
10. The pipelined computation device of claim 1, wherein during pipelined processing of the one or more tasks by the plurality of stream processing units, the pipelined computation device monitors a maximum value of actual processing times of each of the plurality of stream processing units that each perform a step corresponding to the stream processing unit, compares the actual processing time of the maximum value with the processing time of the stream processing unit having the actual processing time of the maximum value in the reference processing time array, and selectively reconstructs the stream processing unit having the actual processing time of the maximum value according to the comparison result.
11. The pipelined calculation apparatus according to claim 1, wherein, in an order of a plurality of steps of the reference calculation flow, the plurality of stream processing units corresponding to the plurality of steps in a one-to-one correspondence are arranged to obtain a stream processing unit sequence, and during the pipelined processing of the one or more tasks by the plurality of stream processing units, the pipelined calculation apparatus monitors a third gap between respective actual processing times of any two adjacent stream processing units in the stream processing unit sequence, and reconstructs a stream processing unit having a smaller actual processing time of the two adjacent stream processing units when the third gap exceeds a third preset threshold value so as to reduce an executable resource of the stream processing unit for executing the step corresponding to the stream processing unit and thereby increase the actual processing time of the stream processing unit.
12. The pipelined computing device of claim 1, wherein the plurality of stream processing units further comprises at least one second type of stream processing unit and none of the at least one second type of stream processing unit is reconfigurable, the at least one second type of stream processing unit comprising at least one CPU and/or at least one GPU.
13. The pipelined computing device of claim 12, wherein the at least one second-class stream processing unit corresponds to an initial step and an end step of the reference computing flow.
14. The pipelined computing device of claim 1, wherein the reference computing process is a Paillier decryption operation process or a Paillier encryption operation process.
15. A pipelined computing method for private computing, private data, and federated learning, the pipelined computing method comprising:
providing a plurality of stream processing units, wherein the plurality of stream processing units comprise at least one first-class stream processing unit, each of the at least one first-class stream processing units is at least partially reconfigurable, the plurality of stream processing units correspond to a plurality of steps of a reference computing flow in a one-to-one manner, each of the at least one first-class stream processing units is configured to be reconfigurable to optimize a processing time for the first-class stream processing unit to execute the step corresponding to the first-class stream processing unit in the reference computing flow with respect to the step corresponding to the first-class stream processing unit in the reference computing flow, wherein the at least one first-class stream processing unit comprises at least one FPGA and at least one CGRA, the plurality of stream processing units are configured to be directly interconnected or connected through an internal bus, each of the at least one first-class stream processing units comprises a configurable logic block, a programmable interconnection resource, and a programmable input/output unit, and the respective executable resources of the at least one first-class stream processing units are independent of each other and the reconfiguration of each of the first-class stream processing units is independent of the others;
obtaining a plurality of tasks of the same task batch, wherein the calculation process of the tasks belonging to the same task batch can be expanded into a plurality of steps corresponding to the steps of the reference calculation process in a one-to-one manner according to the reference calculation process;
for each task in the plurality of tasks, the task is pipelined through the plurality of stream processing units, and each stream processing unit in the plurality of stream processing units is used for processing a step corresponding to the stream processing unit in a plurality of steps after the calculation flow of the task is expanded according to the reference calculation flow,
wherein the processing time of each of the plurality of stream processing units executing the step corresponding to the stream processing unit in the reference calculation flow together constitute a reference processing time array, a first difference between a maximum value and a minimum value in the reference processing time array is smaller than a first preset threshold value,
the reconstructing of the first-type stream processing unit is to increase a processing time of a step corresponding to the first-type stream processing unit by reducing an executable resource of the first-type stream processing unit for executing the step corresponding to the first-type stream processing unit.
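The first-difference bound on the reference processing time array reflects the usual pipeline-balance argument: steady-state throughput is set entirely by the slowest stage, so spare speed in a fast stage is wasted. A small numerical illustration (arbitrary time units, not from the patent):

```python
def pipeline_makespan(num_tasks, stage_times):
    """Ideal linear pipeline: the first task pays the full sum of stage
    times (the fill), and each further task completes one slowest-stage
    interval later."""
    return sum(stage_times) + (num_tasks - 1) * max(stage_times)

# Same total work per task (15 units), very different batch times:
print(pipeline_makespan(100, [5, 5, 5]))  # 510 -- balanced stages
print(pipeline_makespan(100, [2, 5, 8]))  # 807 -- unbalanced stages
```

This is why the reconfiguration in claims 10 and 11 shifts executable resources away from fast stages: equalizing stage times improves throughput for the whole task batch while freeing resources for the stages that need them.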
16. The pipelined computing method of claim 15, wherein the pipelined computing method further comprises:
during the pipelined processing of the plurality of tasks by the plurality of stream processing units, monitoring the actual processing time with which each of the plurality of stream processing units executes the step corresponding to that stream processing unit, comparing that actual processing time with the processing time of the stream processing unit in the reference processing time array, and selectively reconfiguring the at least one first-class stream processing unit according to the comparison result.
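As a closing sketch, claims 12 and 16 combine into a simple runtime loop; unit.reconfigurable, unit.actual_time, and unit.reconfigure() are assumed interfaces rather than the patent's API:

```python
def monitor_and_reconfigure(units, reference_times, tolerance=0.10):
    """Compare each unit's measured stage time with its reference processing
    time and selectively reconfigure the drifting first-class units;
    second-class units (CPU/GPU) are skipped as non-reconfigurable."""
    for i, unit in enumerate(units):
        if not unit.reconfigurable:
            continue
        if abs(unit.actual_time - reference_times[i]) > tolerance * reference_times[i]:
            unit.reconfigure()
```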
CN202210950232.8A 2022-08-09 2022-08-09 Running water computing device and method for private computation, private data and federal learning Active CN115062329B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210950232.8A CN115062329B (en) 2022-08-09 2022-08-09 Running water computing device and method for private computation, private data and federal learning

Publications (2)

Publication Number Publication Date
CN115062329A (en) 2022-09-16
CN115062329B (en) 2022-12-20

Family

ID=83207296

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210950232.8A Active CN115062329B (en) 2022-08-09 2022-08-09 Running water computing device and method for private computation, private data and federal learning

Country Status (1)

Country Link
CN (1) CN115062329B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102194210A (en) * 2010-03-19 2011-09-21 Fuji Xerox Co., Ltd. Image processing apparatus and image forming system
JP2016110499A (en) * 2014-12-09 2016-06-20 Ricoh Co., Ltd. Data processor and data processing method
CN112685159A (en) * 2020-12-30 2021-04-20 Shenzhen Zhixing Technology Co., Ltd. Federal learning calculation task processing scheme based on FPGA heterogeneous processing system
CN113468099A (en) * 2021-05-31 2021-10-01 Shenzhen Zhixing Technology Co., Ltd. Reconfigurable computing device, processor and method
CN114416182A (en) * 2022-03-31 2022-04-29 Shenzhen Zhixing Technology Co., Ltd. FPGA accelerator and chip for federal learning and privacy computation

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8671371B1 (en) * 2012-11-21 2014-03-11 Maxeler Technologies Ltd. Systems and methods for configuration of control logic in parallel pipelined hardware
CN113918221A (en) * 2020-07-08 2022-01-11 Shanghai Cambricon Information Technology Co., Ltd. Operation module, flow optimization method and related product
CN111813526A (en) * 2020-07-10 2020-10-23 Shenzhen Zhixing Technology Co., Ltd. Heterogeneous processing system, processor and task processing method for federal learning

Also Published As

Publication number Publication date
CN115062329A (en) 2022-09-16

Similar Documents

Publication Publication Date Title
US20190213029A1 (en) Fpga-based method for network function accelerating and system thereof
US11080209B2 (en) Server systems and methods for decrypting data packets with computation modules insertable into servers that operate independent of server processors
US11687375B2 (en) Technologies for hybrid field-programmable gate array application-specific integrated circuit code acceleration
CN111783971A (en) Data post-processor capable of being configured flexibly for deep neural network
CN113468099B (en) Reconfigurable computing device, processor and method
US20240127054A1 (en) Remote artificial intelligence (ai) acceleration system
US20220201103A1 (en) Metadata compaction in packet coalescing
Choquette et al. 3.2 The A100 Datacenter GPU and Ampere Architecture
CN112152782A (en) Post-quantum public key signature operation for reconfigurable circuit devices
CN110059493B (en) SKINNY-128-128 encryption algorithm implementation method and system based on coarse-grained reconfigurable computing unit
US11687498B2 (en) Storage appliance for processing of functions as a service (FaaS)
CN114416182B (en) FPGA accelerator and chip for federal learning and privacy computation
CN112000598A (en) Processor for federal learning, heterogeneous processing system and private data transmission method
CN116185599A (en) Heterogeneous server system and method of use thereof
CN112883408A (en) Encryption and decryption system and chip for private calculation
CN114021734A (en) Parameter calculation device, system and method for federal learning and privacy calculation
CN115062329B (en) Running water computing device and method for private computation, private data and federal learning
CN114124389B (en) Reconfigurable computing-based FPGA (field programmable Gate array) deployment method and system
US7607032B1 (en) Power manageable scalable distributed multiple independent levels of security (MILS) computing platform
CN115061825B (en) Heterogeneous computing system and method for private computing, private data and federal learning
US20210243137A1 (en) Multiplexed resource allocation architecture
CN113342719B (en) Operation acceleration unit and operation method thereof
US10261817B2 (en) System on a chip and method for a controller supported virtual machine monitor
Manjith et al. Enabling self-adaptability of small scale and large scale security systems using dynamic partial reconfiguration
US20230185624A1 (en) Adaptive framework to manage workload execution by computing device including one or more accelerators

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant