CN112506470B

CN112506470B - Chip and computing system

Info

Publication number: CN112506470B
Application number: CN202011518541.5A
Authority: CN
Inventors: 范志军; 刘建波; 杨作兴
Original assignee: Shenzhen MicroBT Electronics Technology Co Ltd
Current assignee: Shenzhen MicroBT Electronics Technology Co Ltd
Priority date: 2020-12-21
Filing date: 2020-12-21
Publication date: 2024-07-02
Anticipated expiration: 2040-12-21
Also published as: CN112506470A

Abstract

Chips and computing systems are disclosed. The chip includes fault tolerant logic comprising: a plurality of registers for storing data; a logic operation module for performing operations; and a fault tolerant operation module associated with the registers and/or the logic operation module to receive a plurality of inputs. The fault tolerant operation module comprises: a plurality of preprocessing units, each for receiving corresponding bits of the plurality of inputs for operation; a carry network comprising a plurality of stages; and a plurality of post-processing units. The fault tolerant operation module is configured to intentionally cause its operation result to be erroneous for a part of the input values of the plurality of inputs.

Description

Chip and computing system

Technical Field

The present disclosure relates to chips and computing systems.

Background

Chips based on the Proof of Work (POW) mechanism currently typically use hash operations as the computational content. Typically, such chips use the SHA256 algorithm.

The following 3 aspects are often of concern: the size of the chip, which determines the production cost; the operating speed of the chip, which determines the computing power; and the power consumption of the chip, which determines the electricity consumed, i.e., the operating cost. The power consumption per unit of computing power (or the ratio of power consumption to computing power) is an important indicator for measuring performance.

Therefore, increasing the computing power of the chip, reducing its production cost, and reducing its power consumption are the main directions of chip development in the industry.

Here, a new chip and computing system are presented.

Disclosure of Invention

According to one aspect of the present disclosure, a chip is provided that includes fault tolerant logic that may include: a plurality of registers for storing data; a logic operation module for performing operations; and a fault tolerant operation module associated with the registers and/or the logic operation module to receive a plurality of inputs, the plurality of inputs including a first input and a second input, the first input and the second input including the same number of bits. The fault tolerant operation module comprises: a plurality of preprocessing units, each for receiving corresponding bits of the plurality of inputs for operation; a carry network comprising a plurality of stages, each stage comprising at least one carry operation unit, each of the carry operation units being configured to receive a carry input, to perform an operation, and to generate a carry output, wherein the carry operation units of the first stage are configured to receive an output from a corresponding pre-processing unit as a carry input; and a plurality of post-processing units, at least a portion of which are configured to operate on a corresponding carry output received from the carry network and an input received from a corresponding pre-processing unit. The fault tolerant operation module is configured to intentionally cause its operation result to be erroneous for a part of the input values of the plurality of inputs.

In some embodiments, the radix (base) of the carry operation units in the carry network is four or less.

In some embodiments, the carry network is configured such that the operation result of the fault tolerant operation module is subject to error for a portion of the input values of the plurality of inputs.

In some embodiments, the carry network is configured such that the operation result of the fault tolerant operation module is subject to error for a portion of the input values of the plurality of inputs that relate to the carry network.

In some embodiments, the carry network is configured to omit a portion of the logic operations for a portion of its multiple input values.

In some embodiments, where the first and second inputs each comprise a number of bits greater than or equal to 2ⁿ and less than 2ⁿ⁺¹, the plurality of stages comprises n stages or less, where n is a natural number.

In some embodiments, each of the pre-processing units includes an AND operation unit, and/or an OR operation or an XOR operation unit. In some embodiments, each of the post-processing units comprises an exclusive-or operation unit.

In some embodiments, the logic operation module is configured to obtain data from one or more of the plurality of registers to perform a logic operation thereon.

In some embodiments, the logic operation module comprises a module adapted to perform one or more of the Maj, Ch, and Σ operations specified by the secure hash algorithm SHA-256.

In some embodiments, the plurality of inputs further comprises a carry input.

In some embodiments, the plurality of inputs further includes a third input having the same number of bits as the first and second inputs.

In some embodiments, the chip includes a plurality of cores, each core including the fault tolerant logic, at least a portion of the plurality of cores being serially connected to each other.

In some embodiments, the chip further includes a top module that communicates with the plurality of cores and performs a check on the computation results of the plurality of cores.

According to one aspect of the present disclosure, there is provided a computing system comprising a chip according to any embodiment.

In some embodiments, the computing system further includes a control board in communication with the chip and checking the results of the chip.

Other features of the present disclosure and its advantages will become apparent from the following detailed description of exemplary embodiments of the disclosure, which proceeds with reference to the accompanying drawings.

Drawings

FIG. 1 schematically shows a computational block diagram of a SHA256 algorithm for operations;

FIG. 2 schematically shows a computational block diagram of a modified SHA256 algorithm for operation;

FIG. 3 illustrates a schematic diagram of a chip according to an exemplary embodiment of the present disclosure;

FIG. 4 schematically illustrates a view of one example of a carry-lookahead adder;

FIG. 5 schematically illustrates a view of another example of a carry-lookahead adder;

FIG. 6 schematically illustrates a view of yet another example of a carry-lookahead adder;

FIG. 7 is a diagram illustrating a fault tolerant operation module according to an exemplary embodiment of the present disclosure;

Fig. 8 is a circuit diagram illustrating a logic operation performed by a certain carry operation unit according to an exemplary embodiment of the present disclosure;

Fig. 9 is a circuit diagram illustrating another logic operation performed by a certain carry operation unit according to an exemplary embodiment of the present disclosure;

FIG. 10 is a truth table illustrating logical operations performed by a certain carry-handling unit according to an exemplary embodiment of the present disclosure;

FIG. 11 is a diagram illustrating a fault tolerant operation module according to another exemplary embodiment of the present disclosure; and

Fig. 12 is a view illustrating a fault tolerant operation module according to still another exemplary embodiment of the present disclosure.

Detailed Description

Hereinafter, various exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. It should be appreciated that the following description of at least one exemplary embodiment is merely illustrative, and is not intended to limit the present disclosure, application or uses thereof. It should also be appreciated that any implementation illustratively described herein is not necessarily indicative of a preferred or advantageous embodiment over other implementations. The disclosure is not to be limited by any expressed or implied theory presented in the preceding technical field, background, brief summary or the detailed description.

In addition, certain terminology may be used in the following description for the purpose of reference only and is therefore not intended to be limiting. For example, the terms "first," "second," and other such numerical terms referring to structures or elements do not imply a sequence or order unless clearly indicated by the context.

It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, and/or components, and/or groups thereof.

Fig. 1 schematically shows a computational block diagram of a SHA256 algorithm for operation. A to H shown in fig. 1 respectively represent registers (e.g., 32 bits), W_t and K_t respectively represent data (e.g., also 32 bits), and Maj, Ch, Σ1, and Σ0 respectively represent, for example, the following logical operations:

For a more detailed description of the logical operations described above, see, for example, the description of SHA256 in the secure hash algorithm document, which may be obtained from the following address: https://tools.ietf.org/pdf/rfc4634.pdf. Some related descriptions are exemplarily listed below.

Fig. 2 schematically shows a computational block diagram of a modified SHA256 algorithm for operation. A to H and R shown in fig. 2 respectively represent registers (e.g., 32 bits); FAA (full adder array) represents a full adder array to be described later, which may also be referred to as a carry save adder; and CLA (carry-lookahead adder) represents a carry-lookahead adder (which is a fast implementation of a full adder). Similarly to fig. 1, Maj, Ch, Σ1, Σ0, σ₀, and σ₁ represent, for example, the following logical operations, respectively:

For a more detailed description of the logical operations described above, see also, for example, the description of SHA256 in the secure hash algorithm document, which may be obtained from the following address: https://tools.ietf.org/pdf/rfc4634.pdf.
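The expressions for these logical operations are not reproduced in this text. As a reference sketch only, the standard SHA-256 definitions given in RFC 4634 can be written as follows. The Python function names (`rotr`, `ch`, `maj`, `big_sigma0`, and so on) are illustrative choices and do not come from the disclosure.

```python
MASK32 = 0xFFFFFFFF  # SHA-256 works on 32-bit words


def rotr(x: int, n: int) -> int:
    """Rotate the 32-bit word x right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def ch(x: int, y: int, z: int) -> int:
    """Ch: each bit of x selects the corresponding bit of y (1) or z (0)."""
    return ((x & y) ^ (~x & z)) & MASK32


def maj(x: int, y: int, z: int) -> int:
    """Maj: bitwise majority vote of x, y, and z."""
    return (x & y) ^ (x & z) ^ (y & z)


def big_sigma0(x: int) -> int:   # Σ0
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x: int) -> int:   # Σ1
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def small_sigma0(x: int) -> int:  # σ0
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x: int) -> int:  # σ1
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)
```

All six operations are purely bitwise; the additions that combine their results are what the adder structures discussed in the following paragraphs implement.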

Fig. 3 illustrates a schematic structure of a chip according to an exemplary embodiment of the present disclosure. The chip shown in fig. 3 includes a top module and a plurality of cores. The plurality of cores may be arranged in a matrix form; but the present disclosure is not limited thereto. It should be understood that the number of these cores is merely exemplary, and that a chip according to the present disclosure may include more or fewer cores.

In some embodiments of the present disclosure, the plurality of cores in a chip may be identical. For example, the plurality of cores may be configured to perform the same operation on the same received signal. Alternatively, the cores may be configured such that the computation (or operation) on their respective received signals is based on the same algorithm. In some embodiments, the cores may process their respective received data based on the same algorithm, whether known or developed in the future, such as the SHA256 algorithm or the modified SHA256 algorithm described above, and feed back their respective valid calculation results to the top module. The top module may check the valid calculation results from all cores and then feed back the correct valid calculation results to a higher-level checking module (e.g., a control board, not shown). Further, the upper layers may also have multiple checking points.

During the calculation, the top module or the control board checks the results. If a core makes an error and misreports a calculation result to the top module or the control board, the top module or the control board can check the result and discard the erroneous result. This does not have a significant impact on the accuracy of the result obtained or on the probability of obtaining the final result of the operation. As shown in fig. 3, the cores are numerous and perform intensive computation, and the top module performs a checking calculation after receiving a calculation result reported by a core. Therefore, it would be desirable if the structure of the core could be simplified, thereby reducing the production cost and operating power consumption of the chip, while having little or no impact on the probability of achieving the desired result, thereby improving the ratio of computing power to power consumption.

To this end, the inventors of the present application propose the chip for operation disclosed herein. The chip in the present disclosure adopts a fault tolerance mechanism, so that the chip structure is simplified and the power consumption and size of the chip can be greatly reduced, while the performance of the chip is not excessively reduced, so that the ratio of computing power to power consumption of the chip is improved. Through the fault tolerant design, substantial area savings and reduced power consumption are achieved, and the ratio of computing power to cost is maintained or improved.
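The checking step described above can be illustrated with a minimal sketch. The disclosure does not specify how the top module or control board performs the check; the version below assumes, purely for illustration, a double SHA-256 over a header plus a 4-byte nonce compared against a numeric target, and the names `check_result` and `filter_reports` are hypothetical.

```python
import hashlib


def check_result(header: bytes, nonce: int, target: int) -> bool:
    """Recompute the hash for a reported nonce and compare it with the target.

    Assumption: double SHA-256 and little-endian encoding, chosen only as an
    example of a proof-of-work check; the disclosure does not fix these details.
    """
    data = header + nonce.to_bytes(4, "little")
    digest = hashlib.sha256(hashlib.sha256(data).digest()).digest()
    return int.from_bytes(digest, "little") < target


def filter_reports(header: bytes, reported_nonces, target: int):
    """Keep only the reported nonces that pass the check; misreports are dropped."""
    return [n for n in reported_nonces if check_result(header, n, target)]
```

Because every reported result is recomputed exactly by the checker, an occasional wrong answer from a simplified core costs only the missed or discarded candidate and never reaches the final output.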

Next, a chip for operation of the present disclosure will be described in detail with reference to specific examples.

A chip for operation according to an embodiment of the present disclosure includes fault tolerant logic. The fault tolerant logic may include: a plurality of registers for storing data, a logic operation module for performing operations, and a fault tolerant operation module. In some implementations, the fault tolerant logic may be contained in one or more cores.

According to some embodiments of the present disclosure, the plurality of registers may be registers A-H as described above in connection with FIG. 1, or registers A-H as well as register R as described above in connection with FIG. 2. The registers A-H may store data, such as data received from outside or intermediate data used internally.

The logic operation module can perform logic operations. According to some embodiments of the present disclosure, the logic operation module may obtain data from one or more registers of the plurality of registers to perform a logic operation thereon. For example, the logic operation module may perform one or more of the Maj, Ch, Σ1, Σ0, σ₀, and σ₁ operations described above in connection with fig. 1 and 2; but the present disclosure is not limited thereto. In some embodiments, a plurality of logic operation modules may be included.

According to some embodiments of the disclosure, the fault tolerant operation module is configured to intentionally cause its operation result to be erroneous for a portion of the input values of the plurality of inputs. Unlike the related art, which generally seeks to perform operations comprehensively and accurately and to avoid errors, the inventors of the present application propose a fault tolerant technique that intentionally introduces errors. With the techniques of this disclosure, the circuit structure of the chip may be simplified, the chip area reduced, the power consumption reduced, and/or the cost per unit of computing power reduced, or the ratio of computing power to cost (e.g., the ratio of computing power to power consumption) improved. Furthermore, according to the techniques of the present disclosure, costs are significantly reduced while there is little impact on the probability of achieving the desired result.

Next, the fault tolerant operation module according to the present disclosure will be described in detail.

Fig. 4 is a diagram schematically showing one example of a carry-lookahead adder. Those skilled in the art will readily appreciate that FIG. 4 shows a parallel prefix graph of a radix-2 Ladner-Fischer adder.

In fig. 4, a 16-bit input is shown, wherein each small open circle indicates an operation unit (also called a preprocessing unit, or node) corresponding to the input. The operation unit may receive the two corresponding bits of the two 16-bit inputs and perform an operation (e.g., without limitation, summation). Fig. 4 also shows a carry network comprising 4 stages (stage 1 to stage 4), each node of which indicates an operation unit (also referred to herein as a carry operation unit). In the case where the input includes 2ⁿ or more and fewer than 2ⁿ⁺¹ bits, the number of stages of the carry network may be n or less, where n is a natural number.

As shown in fig. 4, each stage of the carry network may include 8 operation units. For example, the carry output produced by the node (preprocessing unit) corresponding to bit 0 is supplied to the rightmost node of the first stage (stage 1), which corresponds to bit 1, as indicated by the broken line in the figure. The operation result of the rightmost node (carry operation unit) of the first stage (stage 1) is input to the two rightmost nodes of the second stage (stage 2), which correspond to bit 2 and bit 3, respectively, as indicated by the broken lines in the figure. The remaining nodes and lines represent similar operations and are not described one by one.

Fig. 5 is a diagram schematically showing another example of a carry-lookahead adder. FIG. 5 is a parallel prefix graph of a radix-2 Kogge-Stone adder. Fig. 6 is a diagram schematically showing still another example of a carry-lookahead adder.
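The stage-by-stage behaviour described for fig. 4 can be sketched in software. The following is a minimal sketch, assuming the topology is the one the text describes (8 carry nodes in each of 4 stages for 16 bits, with the stage 1 node at bit 1 fed from bit 0 and the stage 2 nodes at bits 2 and 3 fed from bit 1). The names `combine` and `prefix_add` are illustrative, and the figure itself is not reproduced here.

```python
def combine(hi, lo):
    """Prefix operator on (G, P) pairs: merge a higher bit group with a lower one."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def prefix_add(a: int, b: int, width: int = 16):
    """Add two width-bit values with a radix-2 prefix carry network.

    Returns (sum modulo 2**width, list with the number of carry nodes per stage).
    """
    # Preprocessing: per-bit generate (AND) and propagate (XOR).
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
          for i in range(width)]
    p_bit = [p for _, p in gp]

    nodes_per_stage = []
    stage = 0
    while (1 << stage) < width:
        nodes = 0
        for i in range(width):
            if (i >> stage) & 1:                    # bit i has a node in this stage
                j = ((i >> stage) << stage) - 1     # top bit of the block below
                gp[i] = combine(gp[i], gp[j])       # gp[j] is not modified in this stage
                nodes += 1
        nodes_per_stage.append(nodes)
        stage += 1

    # Post-processing: sum bit i = P_i XOR (carry out of bits i-1..0).
    s = p_bit[0]
    for i in range(1, width):
        s |= (p_bit[i] ^ gp[i - 1][0]) << i
    return s, nodes_per_stage
```

For 16-bit inputs this sketch uses 4 stages of 8 nodes each, which agrees with the stage and node counts stated for fig. 4.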

The diagonal lines shown in fig. 4 and 5 represent the carry network. As can be seen from fig. 4 and 5, the Ladner-Fischer adder has a greater fan-out (fanout), and therefore a simpler structure than that shown in fig. 5.
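The fan-out comparison can be made concrete with a small counting sketch. It assumes the fig. 4 network follows the divide-and-conquer pattern described in the text and uses the usual Kogge-Stone pattern for fig. 5. Fan-out is measured here, as a simplifying assumption, as the number of carry nodes within one stage that share the same lower-group source signal. All function names are hypothetical.

```python
from collections import Counter


def divide_and_conquer_inputs(width: int):
    """Per stage, the source bit of each carry node's lower input (fig. 4 style)."""
    stages, s = [], 0
    while (1 << s) < width:
        stages.append([((i >> s) << s) - 1 for i in range(width) if (i >> s) & 1])
        s += 1
    return stages


def kogge_stone_inputs(width: int):
    """Per stage, the source bit of each carry node's lower input (Kogge-Stone)."""
    stages, s = [], 0
    while (1 << s) < width:
        stages.append([i - (1 << s) for i in range(1 << s, width)])
        s += 1
    return stages


def summarize(stages):
    """Return (total carry nodes, largest number of nodes in one stage sharing a source)."""
    total = sum(len(srcs) for srcs in stages)
    max_fanout = max(max(Counter(srcs).values()) for srcs in stages)
    return total, max_fanout
```

For a width of 16, `summarize(divide_and_conquer_inputs(16))` gives `(32, 8)` and `summarize(kogge_stone_inputs(16))` gives `(49, 1)`: the first structure needs fewer nodes but drives up to 8 nodes from one signal, while the second keeps fan-out low at the cost of more nodes and wiring.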

FIG. 6 is a parallel prefix graph of a radix-4 32-bit Ladner-Fischer adder. Compared to the structure shown in fig. 4 or 5, the structure shown in fig. 6 is more compact, but the carry operation units in the carry network are more complex.

As shown in fig. 6, the first stage (stage 1) includes 8 nodes (or units) that respectively receive the outputs of the nodes of the corresponding 4-bit previous stage (here, the preprocessing unit). Here the four-bit input corresponds to one group (group 1). The nodes of the first stage perform corresponding operations G and P, respectively. As an example, G may be configured as an and operation, and P may be an or operation or an exclusive or operation. The second stage (stage 2) may include 6 nodes, divided into two groups (group 3). For example, the nodes of group 3 on the right receive 1, 2, 3 outputs from the nodes of stage 1 of the lowest 12 bits, respectively. As exemplarily shown in the figures, the nodes of group 3 may also be selectively configured to perform G and/or P operations. The output of the node of stage 2 is input to the node of stage 3. As shown in the figure, the nodes of stage 3 are divided into a plurality of groups (group 4). The node of each stage 3 receives the outputs from the nodes of the two stages 2 for operation, thereby obtaining corresponding carry outputs.

It should be appreciated that the parallel prefix graphs of various prior art adders are readily understood by those of ordinary skill in the art. Thus, only the critical paths required for the carry network are shown here, and not all paths or traces may be shown. Furthermore, the parallel prefix graphs shown here have the meaning generally understood in the art and will not be described in further detail.

The fault tolerant operation module according to an embodiment of the present disclosure will be described in detail below. Fault tolerant operation modules according to embodiments of the present disclosure may be implemented based on, but not limited to, various full adders of the prior art, and in particular carry-lookahead full adders.

Fig. 7 is a diagram illustrating a fault tolerant operation module according to an exemplary embodiment of the present disclosure. As shown in fig. 7, the fault tolerant operation module includes a plurality of pre-processing units (P0/G0, P1/G1, …, P8/G8), a carry network, and a plurality of post-processing units (S0, …, S8). A fault tolerant operation module receiving inputs having 9 bits is shown here to illustrate the principles of the present disclosure. Those skilled in the art will readily apply these principles to inputs with more or fewer bits.

First, the preprocessing unit of the fault-tolerant operation module according to the present embodiment will be described in detail.

The fault tolerant operation module according to the present embodiment may receive a plurality of inputs. The fault tolerant operation module may be associated with the registers and/or the logic operation module to receive the plurality of inputs, which may include a first input and a second input having the same number of bits, for example the first input A[8:0] and the second input B[8:0] shown in fig. 7. This is merely exemplary; according to some embodiments of the present disclosure, the plurality of inputs may further include a carry input. According to further embodiments of the present disclosure, the plurality of inputs may further include a third input having the same number of bits as the first input and the second input.

Each pre-processing unit receives corresponding bits of the first input and the second input. For example, the P0/G0 unit receives the 0th bits A0 and B0 of the first and second inputs, the P1/G1 unit receives the 1st bits A1 and B1 of the first and second inputs, and so on. Here, as an example, G denotes an AND operation unit and P denotes an OR operation unit or an XOR operation unit; that is, each of the preprocessing units G0 to G8 is configured as a unit that performs an AND operation, and each of the preprocessing units P0 to P8 is configured as a unit that performs an OR operation or an XOR operation; but the present disclosure is not limited thereto. Preferably, P represents an XOR operation unit. In some embodiments, Px/Gx may be present in pairs, and thus a Px/Gx pair may also be regarded as one pre-processing unit in some cases; in other words, a pre-processing unit may include both Px and Gx.

In a specific implementation according to an embodiment of the present disclosure, the pre-processing units P0-P8 and G0-G8 are configured to perform the following logical operations for the corresponding bits of the first input and the second input:

G0=A0&B0

G1=A1&B1

G2=A2&B2

G3=A3&B3

G4=A4&B4

G5=A5&B5

G6=A6&B6

G7=A7&B7

P0=A0^B0

P1=A1^B1

P2=A2^B2

P3=A3^B3

P4=A4^B4

P5=A5^B5

P6=A6^B6

P7=A7^B7

Here, the left side of the equal sign "=" represents the output of the corresponding unit, and on the right side of the equal sign "=", Ax or Bx represents the corresponding input bit. Here, "&" represents an AND operation and "^" represents an XOR operation. For example, the equation "P0=A0^B0" means: the output of unit P0 is the exclusive or of bit 0 of input A (A0) and bit 0 of input B (B0).
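The pre-processing equations above amount to a bitwise AND and a bitwise XOR of the two inputs. A minimal sketch, with the hypothetical helper name `preprocess` and list indices standing in for the unit numbers:

```python
def preprocess(a: int, b: int, width: int = 9):
    """Return per-bit lists G (generate, AND) and P (propagate, XOR).

    G[i] corresponds to Gi = Ai & Bi and P[i] to Pi = Ai ^ Bi.
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    return g, p
```

For example, with A = 0b101 and B = 0b110 restricted to 3 bits, `preprocess(0b101, 0b110, 3)` returns `([0, 0, 1], [1, 1, 0])`: only bit 2 generates a carry, and bits 0 and 1 would propagate one.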

Next, the fault tolerant operation module including the carry network according to the present embodiment will be described in detail.

Fig. 7 illustrates a block diagram of a fault tolerant operation module and a carry network according to an exemplary embodiment of the present disclosure. The block diagram shown in fig. 7 is somewhat similar to a parallel prefix graph. The critical paths of some of the nodes of the carry network according to the present embodiment are exemplarily shown in fig. 7. Here, xy in Pxy or Gxy indicates that the operation unit, or its operation, involves the x-th through y-th bits of the input. For example, the operation unit P74 indicates that the operation P of that unit involves the 7th through 4th bits, G30 indicates that the operation G of that unit involves the 3rd through 0th bits, and so on. In general, Pxy/Gxy in the carry operation units may be present in pairs. In some embodiments, however, where Gx0 is available, the operation unit Px0 (e.g., P10, P20, P30, …, P70, etc.) may not be needed in the carry operation, and these units may be omitted. Further, the paths shown in fig. 7 indicate input-output relationships between units, for example, as indicated by arrows in the figure.

The carry network shown in fig. 7 comprises 2 stages (stage 1 and stage 2), wherein each stage comprises at least one carry operation unit, e.g., P74/G74, G30, G70, etc. Each carry operation unit may receive a carry input, perform an operation, and generate a carry output. For example, the carry operation units of the first stage (P74/G74, G30, etc.) receive as carry inputs the outputs from the corresponding pre-processing units. For example, G10 receives the outputs of the preprocessing units G1, P1, and G0 as carry inputs. Similarly, G20 receives the outputs of the pre-processing units G2, P2, G1, P1, and G0 as carry inputs. It should be noted that the terms carry input and carry output, when used here in connection with a carry operation unit, are only intended to indicate the inputs and outputs related to that carry operation unit.

In some embodiments according to the present disclosure, each carry-handling unit in the carry network is configured to perform a logical operation as follows:

G10=G1|(P1&G0)

G20=G2|(P2&(G1|(P1&G0)))

G30=G3|(P3&(G2|(P2&(G1|(P1&G0)))))

G74=G7|(P7&(G6|(P6&(G5|(P5&G4)))))

P74=P7&P6&P5&P4

G40=G4|(P4&G30)

G50=G5|(P5&(G4|(P4&G30)))

G60=G6|(P6&(G5|(P5&(G4|(P4&G30)))))

G70=G74|(P74&G30)

Here, the left side of the equal sign "=" represents the output of the corresponding unit, while on the right side of the equal sign "=", Ax or Bx represents the corresponding input bit, and Px/Gx or Pxy/Gxy represents the operation result or output of the corresponding unit. Here, "|" represents an OR operation and "&" represents an AND operation.

As can be seen from the logical operations of the carry operation units described above, the radix of each carry operation unit is four or less. Those skilled in the art will readily appreciate that the radix may be used to indicate the number of operands of the AND operation or OR operation that the corresponding unit performs. Those skilled in the art will also readily appreciate that in some cases it is desirable that the radix not be too high, since an AND operation appears as a series connection of transistors in the circuit implementation. For example, in this embodiment, the radix of P74/G74, G30, and G60 is 4 (which may be implemented as 4 P-type transistors or 4 N-type transistors in series); the radix of G20 and G50 is 3 (which similarly may be implemented as 3 P-type transistors or 3 N-type transistors in series); and the radix of G10, G40, and G70 is 2 (which similarly may be implemented as 2 P-type transistors or 2 N-type transistors in series). Further, herein, the maximum radix among the units in a circuit is referred to as the radix of the circuit.
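The carry equations above can be transcribed directly into code. The sketch below takes per-bit lists `g` and `p` (with `g[i]` = Gi and `p[i]` = Pi) and returns each unit's output; the function name `carry_network` and the dictionary keys are illustrative, and the grouping of units into stages in the comments is inferred from which units consume G30.

```python
def carry_network(g, p):
    """Evaluate the carry operation units of fig. 7 from per-bit G and P lists."""
    # Units fed only by pre-processing outputs (stage 1 as inferred here).
    g10 = g[1] | (p[1] & g[0])
    g20 = g[2] | (p[2] & (g[1] | (p[1] & g[0])))
    g30 = g[3] | (p[3] & (g[2] | (p[2] & (g[1] | (p[1] & g[0])))))
    g74 = g[7] | (p[7] & (g[6] | (p[6] & (g[5] | (p[5] & g[4])))))
    p74 = p[7] & p[6] & p[5] & p[4]
    # Units that additionally consume G30 (and G74/P74).
    g40 = g[4] | (p[4] & g30)
    g50 = g[5] | (p[5] & (g[4] | (p[4] & g30)))
    g60 = g[6] | (p[6] & (g[5] | (p[5] & (g[4] | (p[4] & g30)))))
    g70 = g74 | (p74 & g30)
    return {"G10": g10, "G20": g20, "G30": g30, "G40": g40, "G50": g50,
            "G60": g60, "G70": g70, "G74": g74, "P74": p74}
```

With Gi = Ai & Bi and Pi = Ai ^ Bi, each output Gx0 equals the carry out of bits x..0 of A + B, which is why the post-processing units can use these outputs directly as the carry into the next bit.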

Here, it should be noted that although a carry network is schematically indicated in fig. 7 with an ellipse, it will be understood by those skilled in the art that while the ellipse schematically covers or intersects the operational elements and at least a majority of the paths of the carry network shown in the figure, this is merely illustrative and not limiting. For example, there may be some of the operation units not shown, or some paths may be defined that do not belong to the carry network, etc.

As shown in fig. 7, the fault tolerant operation module further includes a plurality of post-processing units (S0, …, S8). A post-processing unit may be configured to receive input from a preceding stage (e.g., a pre-processing unit or a unit in the carry network). At least a portion of the plurality of post-processing units may be configured to operate, e.g., sum, on a carry output received from the carry network and an input received from the corresponding pre-processing unit. In one embodiment, the post-processing unit may include an exclusive-or operation unit. For example, the post-processing units may receive inputs from the corresponding pre-processing units and carry inputs from carry operation units of the carry network and exclusive-or them. In one particular implementation, for example, S6=P6^G50; here, P6=A6^B6, as previously described, while G50 is the corresponding carry into bit 6 from the carry network.

In a specific embodiment according to the present disclosure, the post-processing unit may be configured to perform the following logical operations:

S0=P0

S1=P1^G0

S2=P2^G10

S3=P3^G20

S4=P4^G30

S5=P5^G40

S6=P6^G50

S7=P7^G60

S8=P8^G70

Here, "^" represents an exclusive OR operation. Similarly, the left side of the equal sign "=" represents the output of the corresponding unit, while on the right side of the equal sign "=", Px/Gx or Pxy/Gxy represents the operation result or output of the corresponding unit.
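Putting the three groups of equations together gives a complete 9-bit adder before any simplification. The following is a sketch under one stated assumption: P8 = A8 ^ B8 is not listed among the pre-processing equations above but is needed by S8, so it is filled in here by analogy. The function name `fig7_add` is hypothetical.

```python
def fig7_add(a: int, b: int) -> int:
    """Return S[8:0] for inputs A[8:0] and B[8:0] using the unit equations above."""
    A = [(a >> i) & 1 for i in range(9)]
    B = [(b >> i) & 1 for i in range(9)]

    # Pre-processing units.
    G = [A[i] & B[i] for i in range(8)]
    P = [A[i] ^ B[i] for i in range(9)]   # P[8] is an assumption (see lead-in)

    # Carry network.
    G10 = G[1] | (P[1] & G[0])
    G20 = G[2] | (P[2] & (G[1] | (P[1] & G[0])))
    G30 = G[3] | (P[3] & (G[2] | (P[2] & (G[1] | (P[1] & G[0])))))
    G74 = G[7] | (P[7] & (G[6] | (P[6] & (G[5] | (P[5] & G[4])))))
    P74 = P[7] & P[6] & P[5] & P[4]
    G40 = G[4] | (P[4] & G30)
    G50 = G[5] | (P[5] & (G[4] | (P[4] & G30)))
    G60 = G[6] | (P[6] & (G[5] | (P[5] & (G[4] | (P[4] & G30)))))
    G70 = G74 | (P74 & G30)

    # Post-processing units: S0 = P0, Si = Pi ^ (carry into bit i).
    S = [P[0], P[1] ^ G[0], P[2] ^ G10, P[3] ^ G20, P[4] ^ G30,
         P[5] ^ G40, P[6] ^ G50, P[7] ^ G60, P[8] ^ G70]
    return sum(bit << i for i, bit in enumerate(S))
```

With these unmodified equations the result matches (A + B) modulo 512 for every pair of 9-bit inputs, which provides a baseline for judging the effect of the simplifications discussed next.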

As can be seen from the logical operations in the pre-processing units, the carry operation units, and the post-processing units described above, the logical operations performed in the pre-processing units and the post-processing units are simpler, whereas the logical operations performed in the carry operation units, in particular those performed in the carry operation units G30, G74, and G60, are more complex. For example, the logical operation performed in G30 involves 7 parameters and has a radix of 4, i.e., it includes an AND operation with 4 operands, as follows:

G30=G3|(P3&(G2|(P2&(G1|(P1&G0)))))

Here, although 3 &'s are shown in the above logical expression, those skilled in the art will understand that after expansion the expression may be written as G3+P3&G2+P3&P2&G1+P3&P2&P1&G0, involving an AND (&) operation with up to 4 operands (and an OR ("|"/"+") operation with 4 operands).
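The equivalence between the nested form of G30 and its expanded sum-of-products form, G3 | P3&G2 | P3&P2&G1 | P3&P2&P1&G0, can be checked exhaustively. A minimal sketch with illustrative function names:

```python
from itertools import product


def g30_nested(g3, p3, g2, p2, g1, p1, g0):
    """G30 as written in the carry network equations (nested form)."""
    return g3 | (p3 & (g2 | (p2 & (g1 | (p1 & g0)))))


def g30_expanded(g3, p3, g2, p2, g1, p1, g0):
    """G30 as a 4-input OR of product terms; the widest AND has 4 operands."""
    terms = [g3, p3 & g2, p3 & p2 & g1, p3 & p2 & p1 & g0]
    return terms[0] | terms[1] | terms[2] | terms[3]


# Compare both forms over all 2**7 combinations of the 7 parameters.
all_equal = all(g30_nested(*v) == g30_expanded(*v)
                for v in product((0, 1), repeat=7))
```

Here `all_equal` evaluates to `True`. The widest product term, P3&P2&P1&G0, is the one that corresponds to a stack of 4 series transistors.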

Fig. 8 shows a circuit diagram of a logic operation performed by the carry operation unit G30 according to one embodiment of the present disclosure. The output OUT in FIG. 8 is the inverse of G30, written as G30 with an overbar, where the overbar represents the inverse (NOT) operation.

As can be seen from fig. 8, the logical operation performed in G30 includes a 4-stage operation. The inventors of the present application conceived of simplifying the circuit structure of this unit. In one example, the circuit structure shown in fig. 8 may be reduced to the structure of a modified unit G30' as shown in fig. 9. As can be seen from fig. 9, the transistors receiving P1 and G0 are omitted compared to fig. 8, and thus the corresponding logic operations are omitted. The logical operation performed in G30' then includes a 3-stage operation, and G30' performs the following logical operation:

G30'=G3|(P3&(G2|(P2&G1)))

Here, the radix of G30' is reduced to 3.

Fig. 10 shows a schematic partial truth table of the logic operations performed according to the embodiments shown in fig. 7 and 9. As described above,

G1=A1&B1

G2=A2&B2

G3=A3&B3

P1=A1^B1

G10=G1|(P1&G0)

G20=G2|(P2&(G1|(P1&G0)))

G30'=G3|(P3&(G2|(P2&G1)))

The true values for G0, G10, G20, and G30 'for the 3 rd-0 th bits referred to by G30' are shown in FIG. 10. In fig. 10, x in the column corresponding to A2, A3, B2, B3 indicates that A2, A3, B2, or B3 corresponding to the x is an arbitrary input value, ok in the column G20 or G30' indicates that the output value is correct, and fail indicates that the output value is wrong. Therefore, the error rate of the logical operation performed by G30' is:

2/(4*16) = 1/32 ≈ 3%。
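The 1/32 figure can be reproduced by exhaustive enumeration. The sketch below is an illustration only: it evaluates the exact G30 and the simplified G30' for all 256 combinations of A[3:0] and B[3:0] and counts the disagreements. The helper name `carries` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def carries(a, b):
    """a and b are 4-tuples of bits (bit 0 first). Returns (G30, G30')."""
    g = [x & y for x, y in zip(a, b)]  # Gi = Ai & Bi
    p = [x ^ y for x, y in zip(a, b)]  # Pi = Ai ^ Bi
    g10 = g[1] | (p[1] & g[0])
    g20 = g[2] | (p[2] & g10)
    g30 = g[3] | (p[3] & g20)                             # exact carry
    g30_simplified = g[3] | (p[3] & (g[2] | (p[2] & g[1])))  # P1&G0 term omitted
    return g30, g30_simplified

nibbles = list(product((0, 1), repeat=4))
mismatches = [
    (a, b) for a in nibbles for b in nibbles
    if carries(a, b)[0] != carries(a, b)[1]
]
error_rate = Fraction(len(mismatches), len(nibbles) ** 2)
print(len(mismatches), error_rate)  # 8 1/32
```

Every mismatch occurs when P3 = P2 = P1 = 1 and G0 = 1, that is, when a carry generated at bit 0 would have to ripple through bits 1 to 3; in those cases G30' outputs 0 where G30 outputs 1.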

As described above, in the present embodiment, the carry operation unit G30, which includes a 4-stage operation (its base is 4), is simplified into the carry operation unit G30', which includes a 3-stage operation. Although this makes the operation result of the fault-tolerant logic module erroneous for part of the input values of the plurality of inputs, the number of transistors and related wiring can be reduced, so that the chip area can be effectively reduced.

Furthermore, since the circuit structure of the unit changes from 4 series-connected transistors to 3, chip operation is more robust in the low voltage threshold case. In addition, a circuit configuration that uses fewer series-connected transistors while still functionally producing outputs for more bits (e.g., 4 bits) gives the corresponding fault tolerant operation module less delay and better overall performance. At the same time, the number of cells in the circuit can also be reduced.

It should be appreciated that the above simplified embodiment for cell G30 is merely exemplary and the present disclosure is not limited thereto. For example, the carry operation unit G74 may be simplified in a manner similar to the carry operation unit G30. Although simplifying the cells with a larger base is preferred, other cells in the circuit may also be simplified.

In addition, circuits such as a fault-tolerant operation module, its carry network, or parts thereof may be provided as multi-stage circuits in which each stage has a base of a predetermined value or less (for example, 4 here), so that the circuit structure can be simplified. The circuit structure can be further simplified by omitting part of the multi-stage logic operations performed by at least some of the units (e.g., carry operation units G30, G74, etc.). For such a unit (e.g., one whose base has the highest value), certain logical operations on a portion of its multiple input values may be omitted, or simply not performed. In the prior art, by contrast, these logical operations would be performed.

It will also be appreciated that while the above is specifically described with respect to the case of two inputs (first input A [8:0] and second input B [8:0]), those skilled in the art will readily appreciate that the principles taught herein may be applied to more inputs or to cases where the inputs include a greater or lesser number of bits.

Fig. 11 is a view illustrating a fault tolerant operation module according to another exemplary embodiment of the present disclosure. In this example, the fault tolerant operation module receives a 17 bit input.

Here, similarly, xy in Pxy or Gxy indicates that the operation unit, or its operation, involves the x-th to y-th bits of the input. For example, P74 indicates that the operation P of that operation unit involves the 7th to 4th bits, G30 indicates that the operation G of that operation unit involves the 3rd to 0th bits, and so on. In general, Pxy/Gxy in the carry operation units may be present in pairs. In some embodiments, however, where Gx0 is available, the operation unit Px0 (e.g., P10, P20, P30, …, P70, etc.) may not be required in the carry operation. In addition, the paths shown in Fig. 11 indicate the input-output relationships between the units, as shown by the arrows in the figure. As shown in Fig. 11, the fault tolerant operation module according to this embodiment may include pre-processing units (P0/G0, P1/G1, …, P16/G16), a carry network, and post-processing units (S0, …, S16). For simplicity of illustration, only the critical paths of some nodes are shown in Fig. 11.

First, a preprocessing unit of the fault-tolerant operation module according to the embodiment will be described. The preprocessing units receive a first input A [16:0] and a second input B [16:0], wherein each preprocessing unit receives corresponding bits of the first input and the second input respectively. The preprocessing unit according to this embodiment is similar to that of the previous embodiment, and a detailed description thereof will be omitted. In the plurality of preprocessing units as shown in fig. 11, the following logical operations are performed for the corresponding bits of the first input and the second input:

G0=A0&B0

G1=A1&B1

G2=A2&B2

G3=A3&B3

G4=A4&B4

G5=A5&B5

G6=A6&B6

G7=A7&B7

G8=A8&B8

G9=A9&B9

G10=A10&B10

G11=A11&B11

G12=A12&B12

G13=A13&B13

G14=A14&B14

G15=A15&B15

P0=A0^B0

P1=A1^B1

P2=A2^B2

P3=A3^B3

P4=A4^B4

P5=A5^B5

P6=A6^B6

P7=A7^B7

P8=A8^B8

P9=A9^B9

P10=A10^B10

P11=A11^B11

P12=A12^B12

P13=A13^B13

P14=A14^B14

P15=A15^B15。

Next, a post-processing unit of the fault-tolerant operation module according to the embodiment will be described. The post-processing unit according to this embodiment is similar to the post-processing unit of the foregoing embodiment, and a detailed description thereof will be omitted. In fig. 11, only part of the paths of the nodes of the post-processing unit are exemplarily shown. In the present embodiment, for each post-processing unit, the following logical operation is performed:

S0=P0

S1=P1^G0

S2=P2^G10

S3=P3^G20

S4=P4^G30

S5=P5^G40

S6=P6^G50

S7=P7^G60

S8=P8^G70

S9=P9^G80

S10=P10^G90

S11=P11^G10_0

S12=P12^G11_0

S13=P13^G12_0

S14=P14^G13_0

S15=P15^G14_0

S16=P16^G15_0。

Here, similarly, Gxx_y represents an operation unit involving bits xx to y of the input, or its operation (or operation result). For example, G10_0 represents the operation (or operation result) of an operation unit involving bits 10 to 0 of the input.

Next, the carry network according to this embodiment will be described in detail. In this embodiment, the base of the carry network is 4. Similar to Fig. 7, Pxy or Pxx_y (e.g., P30, P70, P11_0, P15_0, etc.) that are not needed in the carry operation are not shown for simplicity. As shown in Fig. 11, the carry network includes 2 stages and thus has a 2-stage delay. The first-stage carry network comprises 4 pairs of carry operation units (P15_12/G15_12, P11_8/G11_8, P74/G74, P30 (if present, not shown)/G30), each having a base of 4. The first-stage carry network further comprises 3 pairs of auxiliary nodes (P98/G98, P13_12/G13_12, P14_12/G14_12), which do not affect the number of calculation stages. Each node in the first-stage carry network may be configured to perform a logical operation as follows:

G10=G1|(P1&G0)

G20=G2|【P2&{G1|(P1&G0)}】

G30=G3|(P3&(G2|【P2&{G1|(P1&G0)}】))

G74=G7|[P7&{G6|【P6&{G5|(P5&G4)}】}]

P74=P7&P6&P5&P4

G11_8=G11|[P11&{G10|【P10&{G9|(P9&G8)}】}]

P11_8=P11&P10&P9&P8

G15_12=G15|[P15&{G14|【P14&{G13|(P13&G12)}】}]

P15_12=P15&P14&P13&P12

G98=G9|(P9&G8)

P98=P9&P8

G13_12=G13|(P13&G12)

P13_12=P13&P12

G14_12=G14|（P14&【G13|(P13&G12)】）

P14_12=P14&P13&P12，

Here, the base of each of G30, G74, G11_8, and G15_12 is 4, and their circuit structure is complex. Strictly speaking, the base of each of P74, P11_8, and P15_12 is also 4, but their logic circuit structure is simpler because of their small number of inputs.

In addition, as shown in fig. 11, the carry network also includes a second stage. The critical paths of some of the nodes in the second level carry network are only exemplarily shown in fig. 11. Each node in the second level carry network may be configured to perform a logical operation as follows:

G70=G74|(P74&G30)

G11_0=G11_8|【P11_8&{G74|(P74&G30)}】

G15_0=G15_12|[P15_12&{G11_8|【P11_8&{G74| (P74&G30)}】}]

G40=G4|(P4&G30)

G50=G5|【P5&[G4|(P4&G30)]】

G60=G6|(P6&(G5|【P5&{G4|(P4&G30)}】))

G80=G8|【P8&[G74|(P74&G30)]】

G90=G98|【P98&[G74|(P74&G30)]】

G10_0=G10|(P10&(G98|【P98&{G74|(P74&G30)}】))

G12_0=G12|[P12&{G11_8|【P11_8&{G74|(P74&G30)}】}]

G13_0=G13_12|[P13_12&{G11_8|【P11_8&{G74| (P74&G30)}】}]

G14_0=G14_12|[P14_12&{G11_8|【P11_8&{G74| (P74&G30)}】}]，

Here, similarly, Pxx_y or Gxx_y represents an operation unit involving bits xx to y of the input, or its operation (or operation result). For example, P11_8 represents the operation (or operation result) of an operation unit involving bits 11 to 8 of the input.

In addition, the base of the units G12_0, G13_0, G14_0, and G15_0 is 4 here, and their circuit structures are complicated.

Similar to the foregoing embodiments, the circuit simplification may be applied to the carry operation units of base 4 (e.g., G30, G74, etc.) in the carry network of the present embodiment, intentionally causing the operation result of the fault tolerant operation module to be erroneous for part of the input values of the plurality of inputs. In particular, errors may be introduced for those input values that involve a carry operation unit of base 4 (e.g., G30, G74, etc.). The circuit structure can thereby be simplified. Furthermore, since the circuit structure of the unit changes from 4 series-connected transistors to 3 or fewer, chip operation is more robust in the low voltage threshold case. In addition, a circuit configuration that uses fewer series-connected transistors while still functionally producing outputs for more bits (e.g., 4 bits) gives the corresponding fault tolerant operation module less delay and better overall performance. At the same time, the number of cells in the circuit can also be reduced.

In addition, similar to the foregoing embodiment, circuits such as a fault tolerant operation module or a carry network thereof or parts thereof may be provided as multi-stage circuits each having a base of a predetermined value (for example, 4 here) or less, so that simplification of the circuit structure can be achieved. Further, simplification of the circuit structure can be further achieved by omitting a part of the logic operations among the multi-stage logic operations performed by at least part of the units (e.g., carry operation units G30, G74, etc.).

Fig. 12 is a further view illustrating a fault tolerant operation module according to an exemplary embodiment of the present disclosure. In this example, the fault tolerant operation module receives a 16-bit input. Here, similarly, x (or xx) and y in Pxy, Gxy, Pxx_y, or Gxx_y indicate that the operation unit, or its operation, involves bits x (or xx) to y of the input. For example, P86 indicates that the operation P of that operation unit involves bits 8 to 6, G20 indicates that the operation G of that operation unit involves bits 2 to 0, and so on. In general, Pxy/Gxy in the carry operation units may be present in pairs. In some cases, however, operation units Pxy that are not needed in the carry operation (e.g., P10, P20, P30, …, P70, etc.) may not be shown for the sake of brevity. Further, the paths shown in Fig. 12 indicate the input-output relationships between units, as indicated by the arrows in the figure. To simplify the description, portions similar to the above-described embodiments will not be described again.

In fig. 12, the fault tolerant operation module according to this embodiment includes a pre-processing unit (P0/G0, P1/G1, …, P15/G15), a carry network, and a post-processing unit (S0, …, S15). For simplicity of illustration, only the critical paths of some nodes are exemplarily shown in fig. 12.

First, a preprocessing unit of the fault-tolerant operation module according to the embodiment will be described. The preprocessing units receive a first input A [15:0] and a second input B [15:0], wherein each preprocessing unit receives corresponding bits of the first input and the second input respectively. The preprocessing unit according to this embodiment is similar to that of the previous embodiment, and a detailed description thereof will be omitted. In the plurality of preprocessing units as shown in fig. 12, the following logical operations are performed for the corresponding bits of the first input and the second input:

G0=A0&B0

G1=A1&B1

G2=A2&B2

G3=A3&B3

G4=A4&B4

G5=A5&B5

G6=A6&B6

G7=A7&B7

G8=A8&B8

G9=A9&B9

G10=A10&B10

G11=A11&B11

G12=A12&B12

G13=A13&B13

G14=A14&B14

P0=A0^B0

P1=A1^B1

P2=A2^B2

P3=A3^B3

P4=A4^B4

P5=A5^B5

P6=A6^B6

P7=A7^B7

P8=A8^B8

P9=A9^B9

P10=A10^B10

P11=A11^B11

P12=A12^B12

P13=A13^B13

P14=A14^B14。

Next, a post-processing unit of the fault-tolerant operation module according to the embodiment will be described. The post-processing unit according to this embodiment is similar to the post-processing unit of the foregoing embodiment, and a detailed description thereof will be omitted. In Fig. 12, only part of the paths of the nodes of the post-processing unit are exemplarily shown. In the present embodiment, for each post-processing unit, the following logical operation is performed:

S0=P0

S1=P1^G0

S2=P2^G10

S3=P3^G20

S4=P4^G30

S5=P5^G40

S6=P6^G50

S7=P7^G60

S8=P8^G70

S9=P9^G80

S10=P10^G90

S11=P11^G10_0

S12=P12^G11_0

S13=P13^G12_0

S14=P14^G13_0

S15=P15^G14_0。

Next, the carry network according to this embodiment will be described in detail. In this embodiment, the base of the carry network is 3. Similar to Fig. 7 or Fig. 11, Pxy (e.g., P20, P80, P14_0, etc.) that are not needed in the carry operation are not shown for simplicity. As shown in Fig. 12, the carry network includes 3 stages and thus has a 3-stage delay. The first-stage carry network comprises 5 pairs of carry operation units (P14_12/G14_12, P11_9/G11_9, P86/G86, P53/G53, P20 (if present, not shown)/G20), each of which has a base of 3. The first-stage carry network further comprises 2 pairs of auxiliary nodes (P13_12/G13_12, P76/G76), which do not affect the number of calculation stages. Each node in the first-stage carry network may be configured to perform a logical operation as follows:

G10=G1|(P1&G0)

G20=G2|【P2&{G1|(P1&G0)}】

G53=G5|【P5&{G4|(P4&G3)}】

P53=P5&P4&P3

G86=G8|【P8&{G7|(P7&G6)}】

P86=P8&P7&P6

G11_9=G11|【P11&{G10|(P10&G9)}】

P11_9=P11&P10&P9

G14_12=G14|【P14&{G13|(P13&G12)}】

P14_12=P14&P13&P12

G13_12=G13|(P13&G12)

P13_12=P13&P12

G76=G7|(P7&G6)

P76=P7&P6。

In addition, as shown in fig. 12, the carry network also includes a second stage. The critical paths of some of the nodes in the second level carry network are only exemplarily shown in fig. 12. Each node in the second level carry network may be configured to perform a logical operation as follows:

G30=G3|(P3&G20)

G40=G4|【P4&[G3|(P3&G20)]】

G50=G53|(P53&G20)

G60=G6|【P6&[G53|(P53&G20)]】

G70=G76|【P76&[G53|(P53&G20)]】

G80=G86|【P86&[G53|(P53&G20)]】。

Note that P14_9/G14_9, indicated by a dashed box in Fig. 12, is not essential. If P14_9/G14_9 is present, it performs the following logical operations, respectively:

G14_9=G14_12|(P14_12&G11_9)

P14_9=P14_12&P11_9。

Here, similarly, Pxx_y or Gxx_y represents an operation unit involving bits xx to y of the input, or its operation (or operation result). For example, P11_9 represents the operation (or operation result) of an operation unit involving bits 11 to 9 of the input.

In addition, as shown in fig. 12, the carry network further includes a third stage. The critical paths of some of the nodes in the third stage carry network are only exemplarily shown in fig. 12. Each node in the third stage carry network may be configured to perform a logical operation as follows:

G90=G9|(P9&G80)

G10_0=G10|【P10&[G9|(P9&G80)]】

G11_0=G11_9|(P11_9&G80)

G12_0=G12|【P12&[G11_9|(P11_9&G80)]】

G13_0=G13_12|【P13_12&[G11_9|(P11_9&G80)]】。

Note that when P14_9/G14_9 is not present in the second-stage carry network, G14_0 performs the following logical operation, as indicated by the solid arrows in Fig. 12:

G14_0=G14_12|【P14_12&[G11_9|(P11_9&G80)]】。

When P14_9/G14_9 is present in the second-stage carry network, G14_0 performs the following logical operation (the path of node G14_0 in this case is not shown in Fig. 12):

G14_0=G14_9|(P14_9&G80)。

Similar to the foregoing embodiments, for the fault tolerant operation module in this embodiment, simplification of the circuit structure can be achieved by reducing the base of the carry operation unit in the carry network. That is, circuits such as a fault-tolerant operation module or a carry network thereof or parts thereof may be provided as multi-stage circuits each of which has a base of a predetermined value or less (for example, 3 here), so that simplification of the circuit structure may be achieved. Further, simplification of the circuit structure can be further achieved by omitting a part of the logic operations among the multi-stage logic operations performed by at least part of the units (e.g., carry operation unit, etc.).
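The radix-3, three-stage network of Fig. 12 can be modeled in the same way. The Python sketch below is illustrative only and rests on assumptions: `add_fig12`, `group`, and `use_g14_9` are hypothetical names, P15 is assumed to be A15^B15 like the listed P0 to P14, and G80 is taken as G86|(P86&G50) with G50=G53|(P53&G20), the form implied by the G70 equation. Shared sub-terms are named once instead of being re-expanded.

```python
import random

MASK16 = (1 << 16) - 1

def add_fig12(a, b, use_g14_9=False):
    """Add two 16-bit integers with the radix-3, three-stage carry network."""
    G = [(a >> i) & (b >> i) & 1 for i in range(16)]    # Gi = Ai & Bi
    P = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]  # Pi = Ai ^ Bi

    def group(hi, lo):
        # Group generate/propagate over bits hi..lo, e.g. group(5, 3) -> (G53, P53).
        g, p = G[lo], P[lo]
        for i in range(lo + 1, hi + 1):
            g, p = G[i] | (P[i] & g), P[i] & p
        return g, p

    # First stage.
    G10 = G[1] | (P[1] & G[0])
    G20 = G[2] | (P[2] & G10)
    G53, P53 = group(5, 3)
    G86, P86 = group(8, 6)
    G11_9, P11_9 = group(11, 9)
    G14_12, P14_12 = group(14, 12)
    G13_12, P13_12 = group(13, 12)
    G76, P76 = group(7, 6)

    # Second stage.
    G30 = G[3] | (P[3] & G20)
    G40 = G[4] | (P[4] & G30)
    G50 = G53 | (P53 & G20)
    G60 = G[6] | (P[6] & G50)
    G70 = G76 | (P76 & G50)
    G80 = G86 | (P86 & G50)
    G14_9 = G14_12 | (P14_12 & G11_9)  # optional node (dashed box in Fig. 12)
    P14_9 = P14_12 & P11_9

    # Third stage.
    G90 = G[9] | (P[9] & G80)
    G10_0 = G[10] | (P[10] & G90)
    G11_0 = G11_9 | (P11_9 & G80)
    G12_0 = G[12] | (P[12] & G11_0)
    G13_0 = G13_12 | (P13_12 & G11_0)
    if use_g14_9:
        G14_0 = G14_9 | (P14_9 & G80)
    else:
        G14_0 = G14_12 | (P14_12 & G11_0)

    # Post-processing: S0 = P0, Si = Pi ^ (carry out of bits i-1..0).
    carry_in = [0, G[0], G10, G20, G30, G40, G50, G60, G70, G80, G90,
                G10_0, G11_0, G12_0, G13_0, G14_0]
    return sum((P[i] ^ carry_in[i]) << i for i in range(16))

rng = random.Random(0)  # fixed seed so the check is repeatable
pairs = [(rng.getrandbits(16), rng.getrandbits(16)) for _ in range(2000)]
ok_without = all(add_fig12(a, b) == (a + b) & MASK16 for a, b in pairs)
ok_with = all(add_fig12(a, b, use_g14_9=True) == (a + b) & MASK16 for a, b in pairs)
print(ok_without, ok_with)  # True True
```

Both ways of forming G14_0 produce the same sums in this model, which is consistent with the statement that P14_9/G14_9 is not essential.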

According to the fault tolerant operation module described in some of the above embodiments, the original carry operation unit may be replaced by the simplified carry operation unit having an error rate of 1/32, so that the circuit structure is simplified while the calculation speed of the chip is improved. As the circuit structure is simplified (i.e., the circuit area is reduced), the corresponding power consumption is reduced. Therefore, the ratio of computing power to power consumption of the chip (i.e., operation speed/power consumption) can be improved.

According to the fault-tolerant operation module described in some of the above embodiments, the circuit structure can be simplified by reducing the base of the carry operation units in the carry network, while improving the calculation speed of the chip and reducing the power consumption. Therefore, the ratio of computing power to power consumption of the chip can be improved.

Furthermore, since the circuit structure of the unit changes from, for example, 4 or more series-connected transistors to fewer, chip operation is more robust in the low voltage threshold case. In addition, a circuit configuration that uses fewer series-connected transistors while still functionally producing outputs for more bits (e.g., 4 bits) gives the corresponding fault tolerant operation module less delay and better overall performance. At the same time, the number of cells in the circuit can also be reduced.

In addition, while the principles of some embodiments of the present disclosure are described herein in terms of carry-lookahead adders, this is merely exemplary and the present disclosure is not limited thereto. For example, the principles of the present disclosure may also be applied to other logic operation components in a circuit.

Further, while a two-input embodiment is shown and described above, the present disclosure is not limited thereto. According to some embodiments of the disclosure, the plurality of inputs further includes a third input having the same number of bits as the first input and the second input. According to some embodiments of the disclosure, the plurality of inputs further includes a carry input.

Consider now certain application scenarios of embodiments of the present application. For example, when applied to the computation of the SHA256 algorithm, the embodiments can reduce the computational cost even further, since SHA256 itself is fault tolerant to some extent. Consider, for example, the CH operation module in the SHA256 algorithm: CH(x, y, z) = (x AND y) XOR ((NOT x) AND z), where x, y, and z are the input data, XOR denotes exclusive or, AND denotes logical and, and NOT denotes logical not. When y = z, the output is the same regardless of the value of x; that is, even if x is in error, the accuracy of the calculation is not affected. As another example, consider MAJ(x, y, z) = (x AND y) XOR (x AND z) XOR (y AND z). When x and y are the same, the output is the same regardless of the value of z; that is, even if z is in error, the accuracy of the calculation is not affected. Therefore, when the fault tolerant module of the embodiments of the present application is employed, even if some of its output results contain errors, in some cases those errors do not affect the accuracy of the CH or MAJ operation when the results are provided to the CH module or the MAJ module as inputs. The influence on the final operation result is therefore limited.
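The masking behaviour of CH and MAJ can be illustrated with a short Python sketch on 32-bit words. The constants below are arbitrary example values and the function names `ch` and `maj` are chosen for this illustration; only the two formulas come from the text above.

```python
MASK32 = 0xFFFFFFFF

def ch(x, y, z):
    # CH(x, y, z) = (x AND y) XOR ((NOT x) AND z), applied bitwise.
    return ((x & y) ^ (~x & z)) & MASK32

def maj(x, y, z):
    # MAJ(x, y, z) = (x AND y) XOR (x AND z) XOR (y AND z), applied bitwise.
    return ((x & y) ^ (x & z) ^ (y & z)) & MASK32

good = 0x12345678
bad = good ^ 0x00000010   # the same word with one bit in error
w = 0x6A09E667            # arbitrary word shared by the other two inputs

# CH with y == z: an error in x does not change the output.
ch_masked = ch(good, w, w) == ch(bad, w, w) == w

# MAJ with x == y: an error in z does not change the output.
maj_masked = maj(w, w, good) == maj(w, w, bad) == w

# When the other two inputs differ at the erroneous bit, the error shows through.
ch_visible = ch(good, w, w ^ 0x10) != ch(bad, w, w ^ 0x10)

print(ch_masked, maj_masked, ch_visible)  # True True True
```

Because both functions act bit by bit, the masking applies per bit position: an erroneous bit in x is hidden by CH wherever y and z agree at that position, and an erroneous bit in z is hidden by MAJ wherever x and y agree.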

In addition, as previously described, in some applications an upper-level module may also check whether the final operation result is valid. For example, when a fault tolerant logic circuit in the chip (which may be included in a core of the chip) makes a calculation error, such as turning an originally correct calculation result into an incorrect one or vice versa, the core feeds the erroneous calculation result back to the top module or control board. Because the top module or control board checks the calculation results fed back by the cores in the chip, the wrong calculation result is discarded, so the correctness of the final operation of the chip is not affected. For example, the control board may communicate with the chip and check the calculation results of the chip.

Therefore, according to the technology of the present application, the probability of obtaining the expected operation result is only slightly affected, while cost and power consumption are significantly reduced, thereby greatly reducing the cost of computing power and improving the ratio of computing power to power consumption.

Those skilled in the art will recognize that the boundaries between the operations (or steps) described in the above embodiments are merely illustrative. The operations may be combined into a single operation, the single operation may be distributed among additional operations, and the operations may be performed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in other various embodiments. Other modifications, variations, and alternatives are also possible. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. The embodiments disclosed herein may be combined in any desired manner without departing from the spirit and scope of the present disclosure. Those skilled in the art will also appreciate that various modifications might be made to the embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims

1. A chip comprising fault tolerant logic, the fault tolerant logic comprising:

A plurality of registers for storing data;

A logic operation module for performing operations;

A fault tolerant operation module associated with the register and/or the logic operation module to receive a plurality of inputs, the plurality of inputs including a first input and a second input, the first input and the second input including the same number of bits, the fault tolerant operation module comprising:

A plurality of preprocessing units, each for receiving corresponding bits of the plurality of inputs for operation,

A carry network comprising a plurality of stages, each stage comprising at least one carry-handling unit, each of the carry-handling units being configured to receive a carry input, to perform an operation, and to generate a carry output, wherein the carry-handling units of a first stage are configured to receive an output from a corresponding pre-processing unit as a carry input, and

A plurality of post-processing units, at least a portion of which are configured to operate on the basis of corresponding carry outputs received from the carry network and inputs received from corresponding pre-processing units,

Wherein the fault tolerant operation module is configured to intentionally cause an error in an operation result of the fault tolerant operation module for a part of input values of the plurality of inputs.

2. The chip of claim 1, wherein a base of a carry operation unit in the carry network is four or less.

3. The chip of claim 1, wherein the carry network is configured such that the operation result of the fault tolerant operation module is subject to errors for a portion of the input values of the plurality of inputs.

4. The chip of claim 2, wherein carry operation units of base four in the carry network are configured such that the operation result of the fault tolerant operation module is erroneous for part of the input values of the plurality of inputs that relate to the base-four carry operation units.

5. The chip of claim 2, wherein a carry operation unit of base four in the carry network is configured to omit part of the logic operations on a portion of its plurality of input values.

6. The chip of claim 1, wherein the plurality of stages comprises n stages or fewer, n being a natural number, and wherein the first and second inputs each comprise 2ⁿ or more and 2ⁿ⁺¹ or fewer bits.

7. The chip of claim 1,

Wherein each of the preprocessing units includes:

An AND operation unit, and/or

An OR operation unit or an exclusive-OR operation unit,

Wherein each of the post-processing units comprises an exclusive-or operation unit.

8. The chip of claim 1, wherein the logic operation module is configured to obtain data from one or more of the plurality of registers to perform a logic operation thereon.

9. The chip of claim 1, wherein the logic operation module comprises a module adapted to perform one or more of the Maj, Ch, Σ0 and Σ1 operations specified by the secure hash algorithm SHA-256,

The Maj, Ch, Σ0 and Σ1 operations are as follows:

where x, y and z are operands, respectively.

10. The chip of claim 1, wherein the plurality of inputs further comprises a carry input.

11. The chip of claim 1,

Wherein the plurality of inputs further includes a third input having the same number of bits as the first and second inputs.

12. The chip of claim 1, wherein the chip comprises a plurality of cores, each core comprising the fault tolerant logic, at least a portion of the plurality of cores being serially connected to each other.

13. The chip of claim 12, further comprising a top module in communication with the plurality of cores and checking the results of the computation of the plurality of cores.

14. A computing system comprising a chip according to any one of claims 1-13.

15. The computing system of claim 14, further comprising a control board in communication with the chip and checking the computation results of the chip.