Disclosure of Invention
The embodiment of the invention aims to solve the technical problem that: a clock signal transmission method, a clock signal transmission device, a multiplexing chip and an electronic device are provided.
The clock signal transmission method provided by the embodiment of the invention comprises the following steps:
acquiring the forward transmission direction of data based on the transmission of the data from the first core computing unit to the last core computing unit in the multiplexing chip; the multiplexing chip comprises a plurality of core computing units, wherein output data of a previous core computing unit is used as input data of a next core computing unit;
inputting a clock signal from the last core computing unit of the multiplexing chip, and reversely transmitting the clock signal to the first core computing unit; the reverse direction transfer is opposite to the forward direction transfer.
In another embodiment of the foregoing method according to the present invention, the inversely transferring the clock signal from the last core computing unit to the first core computing unit of the multiplexing chip includes:
taking the last core computing unit as the current core computing unit, and inputting, by a clock generator, a generated current clock signal into the current core computing unit and a group of buffer units, wherein the group of buffer units comprises at least one buffer unit;
performing iteration: taking the clock signal processed by the buffer units as the current clock signal and taking the previous core computing unit as the current core computing unit, and inputting the current clock signal into the current core computing unit and a group of buffer units, until the current core computing unit is the first core computing unit.
In another embodiment of the foregoing method according to the present invention, the method further includes: forming a clock tree from all the buffer units.
In another embodiment based on the foregoing method of the present invention, a group of buffer units in the clock tree corresponding to the current core computing unit is a first group of buffer units, and a group of buffer units corresponding to the previous core computing unit is a second group of buffer units; the number of buffer units in the second group exceeds the number of buffer units in the first group by a preset number.
In another embodiment of the above method according to the invention, the clock tree is a trapezoidal clock tree with increasing size.
In another embodiment based on the foregoing method of the present invention, the buffer unit repairs the received clock signal, and once the clock signal has been repaired to a preset standard, it is transmitted to the current core computing unit and the group of buffer units.
In another embodiment of the method according to the present invention, the core computing unit includes a plurality of basic computing units connected in series, and each of the basic computing units performs the same operation on the input data.
According to another aspect of the embodiments of the present invention, there is provided a clock signal transmission apparatus, including:
the direction obtaining unit is used for obtaining the forward transmission direction of the data based on the transmission of the data from the first core computing unit to the last core computing unit in the multiplexing chip; the multiplexing chip comprises a plurality of core computing units, wherein output data of a previous core computing unit is used as input data of a next core computing unit;
the clock transmission unit is used for inputting a clock signal from the last core calculation unit of the multiplexing chip and reversely transmitting the clock signal to the first core calculation unit; the reverse direction transfer is opposite to the forward direction transfer.
In another embodiment of the above apparatus according to the present invention, the clock transmission unit includes:
the signal transmission module is used for taking the last core computing unit as the current core computing unit, wherein a clock generator inputs the generated current clock signal into the current core computing unit and a group of buffer units, and the group of buffer units comprises at least one buffer unit;
the iteration module is used for performing iteration: the clock signal processed by the buffer units is taken as the current clock signal and the previous core computing unit is taken as the current core computing unit, and the current clock signal is input into the current core computing unit and a group of buffer units, until the current core computing unit is the first core computing unit.
In another embodiment of the above apparatus according to the present invention, the clock transmission unit further includes: a tree construction module, used for constructing all the buffer units into a clock tree.
In another embodiment of the above apparatus according to the present invention, a group of buffer units corresponding to the current core computing unit in the clock tree is a first group of buffer units, and a group of buffer units corresponding to the previous core computing unit is a second group of buffer units; the number of buffer units in the second group exceeds the number of buffer units in the first group by a preset number.
In another embodiment of the above apparatus according to the present invention, the clock tree is a trapezoidal clock tree including a plurality of groups of buffer units.
In another embodiment of the above apparatus according to the present invention, the buffer unit is configured to repair the received clock signal, and transmit the repaired clock signal to the current core computing unit and the group of buffer units after the repaired clock signal meets a preset criterion.
In another embodiment of the above apparatus according to the present invention, the core computing unit includes a plurality of basic computing units connected in series, and each of the basic computing units performs the same operation on the input data.
According to another aspect of the embodiments of the present invention, there is provided a multiplexing chip, including:
a plurality of core computing units for receiving data and transmitting the data sequentially; the sequential transmission is from a first core computing unit to a last core computing unit; wherein the output data of a previous core computing unit is used as the input data of the next core computing unit;
and the clock tree is used for transmitting the clock signal in the direction opposite to the data flow direction in the core computing unit.
In another embodiment of the multiplexing chip according to the present invention, the clock tree includes a plurality of groups of buffer units, each group of buffer units includes at least one buffer unit, and the buffer units are configured to repair the received clock signal.
In another embodiment of the multiplexing chip according to the invention, the number of the buffer units included in the next group of buffer units in the clock tree exceeds the number of the buffer units included in the previous group of buffer units by a preset number.
According to another aspect of the embodiments of the present invention, there is provided an electronic device including the clock signal transmission apparatus as described above or the multiplexing chip as described above.
According to another aspect of the embodiments of the present invention, there is provided an electronic device including: a memory for storing executable instructions;
and a processor for communicating with the memory to execute the executable instructions to perform the operations of the clock signal transmission method as described above.
Based on the clock signal transmission method, the clock signal transmission device, the multiplexing chip and the electronic device provided by the embodiments of the present invention, the forward transmission direction of data is obtained based on the transmission of the data from the first core computing unit to the last core computing unit in the multiplexing chip; the clock signal is input at the last core computing unit of the multiplexing chip and transmitted in reverse to the first core computing unit. Because the clock signal is transmitted in the direction opposite to the data flow, the timing check between adjacent operation cores is naturally satisfied, no additional buffer units need to be added, and a large amount of chip area and power consumption is saved.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the computer system/server include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
The computer system/server may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
In the prior art, a conventional clock tree grows in the same direction as the data flow (from the first core computing unit to the last core computing unit). The portion of the clock tree that reaches each subsequent core computing unit is larger and longer than that of the previous core computing unit, so a large number of buffers have to be inserted to avoid serious timing violations.
FIG. 1 is a flowchart illustrating a clock signal transmission method according to an embodiment of the present invention. As shown in FIG. 1, the method of this embodiment includes:
Step 101, obtaining the forward transmission direction of data based on the transmission of data from the first core computing unit to the last core computing unit in the multiplexing chip.
The multiplexing chip comprises a plurality of core computing units, wherein output data of a previous core computing unit is used as input data of a next core computing unit.
Step 102, inputting a clock signal from the last core computing unit of the multiplexing chip, and reversely transmitting the clock signal to the first core computing unit; the reverse transmission direction is opposite to the forward transmission direction.
Based on the clock signal transmission method provided by the above embodiment of the present invention, the forward transmission direction of data is obtained based on the transmission of data from the first core computing unit to the last core computing unit in the multiplexing chip; the clock signal is input at the last core computing unit of the multiplexing chip and transmitted in reverse to the first core computing unit. Because the clock signal is transmitted in the direction opposite to the data flow, the timing check between adjacent operation cores is naturally satisfied, no additional buffer units need to be added, and a large amount of chip area and power consumption is saved.
All clock tree structures are in the operation core, and space and resources of a top-level design are not occupied.
Both the branching portion of the clock tree and the on-chip skew cost are determined. The timing problem can be solved in advance inside the operation core, so the top level does not need to do extra work.
The difference in clock tree length between neighboring operation cores is small, which makes the timing requirements easier to meet. The difference in the length of the clock tree is equal to the height of each step of the ladder structure.
The clock tree structure of the inverse data flow enables the time sequence checking of the adjacent operation core to be naturally satisfied, and a large amount of chip area and power consumption can be saved.
In another embodiment of the clock signal transmission method according to the present invention, based on the above embodiments, operation 102 includes:
taking the last core computing unit as the current core computing unit, and inputting, by a clock generator, a generated current clock signal into the current core computing unit and a group of buffer units, wherein the group of buffer units comprises at least one buffer unit;
performing iteration: taking the clock signal processed by the buffer units as the current clock signal and taking the previous core computing unit as the current core computing unit, and inputting the current clock signal into the current core computing unit and a group of buffer units, until the current core computing unit is the first core computing unit.
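The iteration above can be illustrated with a minimal behavioral sketch. It is not the implementation of the embodiment: the `CoreUnit` class, the `propagate_clock_reverse` function, and the uniform `group_delay` value are hypothetical names and simplifications introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CoreUnit:
    """Hypothetical model of one core computing unit."""
    name: str
    clock_arrival: float = 0.0  # time at which the clock reaches this unit


def propagate_clock_reverse(cores: List[CoreUnit], group_delay: float) -> List[str]:
    """Feed the clock in at the last unit and pass it back to the first.

    `cores` is listed in data-flow order (first unit at index 0).
    `group_delay` is an assumed, uniform delay of one group of buffer units.
    Returns the unit names in the order in which the clock reaches them.
    """
    visit_order = []
    current_clock_time = 0.0  # output of the clock generator
    for core in reversed(cores):           # last unit first, first unit last
        core.clock_arrival = current_clock_time
        visit_order.append(core.name)
        current_clock_time += group_delay  # clock passes one group of buffer units
    return visit_order


cores = [CoreUnit(f"core{i}") for i in range(4)]
print(propagate_clock_reverse(cores, group_delay=0.5))
print([c.clock_arrival for c in cores])
```

In this sketch the clock reaches the last unit first and the first unit last, which is the reverse of the order in which data passes through the units.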
The clock signal is generated by a clock generator on the mainboard and is transmitted in sequence from the last core computing unit toward the first core computing unit. A clock tree for transmitting the clock signal is built from the buffer units through which the signal passes. Compared with the forward-transmitted clock tree of the prior art, the reverse clock tree grows in the direction opposite to the data flow, so serious timing violations do not occur and no buffers need to be inserted to repair them.
FIG. 2 is a schematic diagram of timing check between core computing units. As shown in FIG. 2, there is a complicated timing check between the core computing units. Specifically, a signal needs to propagate from one timing unit to another, and the timing check requires that arrival time 1 + arrival time 2 > arrival time 3 for the chip to operate normally. If the clock tree and the data stream both flow in the forward direction, the time for the clock tree and the data stream to reach timing unit 21 is short and the time to reach timing unit 22 is long, so buffers have to be inserted at timing unit 22 to avoid serious timing violations, that is, to make arrival time 1 + arrival time 2 greater than arrival time 3. When the clock tree flows in the direction opposite to the data stream, the time for the clock tree to reach timing unit 22 is short and the time to reach timing unit 21 is long, so the condition arrival time 1 + arrival time 2 > arrival time 3 is met naturally.
The reverse-data-flow trapezoidal clock tree therefore saves resources, reduces cost, and satisfies the timing check.
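The timing condition can be checked numerically with a small sketch. FIG. 2 is not reproduced here, so the mapping is an assumption: arrival time 1 is taken as the clock arrival at the launching timing unit, arrival time 2 as the data propagation delay to the next timing unit, and arrival time 3 as the clock arrival at that next unit. The function names and the numeric values are hypothetical.

```python
from typing import List


def timing_check_ok(arrival_1: float, arrival_2: float, arrival_3: float) -> bool:
    """Condition from the description: arrival 1 + arrival 2 > arrival 3."""
    return arrival_1 + arrival_2 > arrival_3


def check_adjacent_units(clock_arrivals: List[float], data_delay: float) -> List[bool]:
    """Check every adjacent pair of timing units.

    `clock_arrivals` is listed in data-flow order; `data_delay` is an assumed
    uniform data propagation delay between adjacent units.
    """
    return [
        timing_check_ok(clock_arrivals[i], data_delay, clock_arrivals[i + 1])
        for i in range(len(clock_arrivals) - 1)
    ]


data_delay = 0.2
forward_clock = [0.0, 0.5, 1.0]   # clock reaches downstream units later
reverse_clock = [1.0, 0.5, 0.0]   # clock reaches downstream units earlier

print(check_adjacent_units(forward_clock, data_delay))
print(check_adjacent_units(reverse_clock, data_delay))
```

Under these assumed numbers the forward clock fails the condition for every adjacent pair unless extra delay is inserted, while the reverse clock passes it for every pair.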
In a specific example of the above embodiments of the clock signal transfer method of the present invention, all the buffer units form a clock tree.
The clock tree is a mesh structure built from a number of buffer units (buffer cells) in a balanced manner. It has a source point, generally the clock input port, from which buffer units are built up stage by stage; the specific number of stages depends on the design settings and the cells used. The purpose is to make the clock skew (generally the main concern), the insertion delay, and the transition at the endpoints meet design requirements.
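As a rough illustration of two of these quantities, the sketch below computes clock skew and insertion delay from a list of clock arrival times at the endpoints. It assumes that arrival times are measured from the clock source, that insertion delay is taken as the longest source-to-endpoint delay, and that skew is the spread between the latest and earliest arrival; transition is not modeled. The function names and limits are hypothetical.

```python
from typing import List, Tuple


def clock_tree_metrics(sink_arrivals: List[float]) -> Tuple[float, float]:
    """Return (insertion_delay, skew) for clock arrivals measured from the source."""
    insertion_delay = max(sink_arrivals)               # longest source-to-endpoint delay
    skew = max(sink_arrivals) - min(sink_arrivals)     # spread between endpoints
    return insertion_delay, skew


def meets_requirements(sink_arrivals: List[float],
                       max_insertion_delay: float,
                       max_skew: float) -> bool:
    """Compare the metrics against assumed design limits."""
    insertion_delay, skew = clock_tree_metrics(sink_arrivals)
    return insertion_delay <= max_insertion_delay and skew <= max_skew


arrivals = [1.0, 1.25, 1.5]
print(clock_tree_metrics(arrivals))
print(meets_requirements(arrivals, max_insertion_delay=2.0, max_skew=0.75))
```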
In a specific example of the above embodiments of the clock signal transmission method of the present invention, a group of buffer units corresponding to a current core computing unit in a clock tree is a first group of buffer units, and a group of buffer units corresponding to a previous core computing unit is a second group of buffer units; the number of buffer units in the second group exceeds the number of buffer units in the first group by a preset number.
In this embodiment, the clock tree is transmitted in reverse, and each time the clock passes a core computing unit, a certain number of buffer units are added to the clock tree to enlarge it, so that the timing check can be satisfied without inserting additional buffers to repair timing violations.
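The growth of the buffer groups can be sketched as simple arithmetic. The function name and parameters below are hypothetical; the sketch only assumes what the text states, namely that the group for each previous core computing unit contains a preset number of buffer units more than the group for the unit after it.

```python
from typing import List


def buffer_group_sizes(num_cores: int, last_group_size: int, preset_increment: int) -> List[int]:
    """Return the buffer-group size for each core unit, in data-flow order.

    The group at the last core unit (where the clock enters) has
    `last_group_size` buffer units; each step toward the first core unit
    adds `preset_increment` buffer units.
    """
    if num_cores < 1 or last_group_size < 1:
        raise ValueError("need at least one core unit and one buffer unit per group")
    return [
        last_group_size + (num_cores - 1 - index) * preset_increment
        for index in range(num_cores)
    ]


sizes = buffer_group_sizes(num_cores=4, last_group_size=1, preset_increment=2)
print(sizes)        # group sizes from the first core unit to the last
print(sum(sizes))   # total number of buffer units in the clock tree
```

Listed in data-flow order the sizes decrease toward the last core computing unit, which gives the stepped, trapezoidal shape of the tree.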
In a specific example of the above embodiments of the clock signal transfer method of the present invention, the clock tree is a trapezoidal clock tree that gradually increases.
In the embodiment, one trapezoid in the trapezoid clock tree represents one core, so that resources are saved, the chip cost is reduced, and meanwhile, the set-up time check and the hold time check in the time sequence check are met.
In another embodiment of the clock signal transmission method according to the present invention, based on the above embodiments, the buffer unit repairs the received clock signal, and transmits the repaired clock signal to the current core computing unit and the group of buffer units after the repaired clock signal reaches the preset standard.
In this embodiment, the buffer unit performs no logic function; it is only responsible for repairing the signal and passing it on with high quality, without changing its function. Specifically, a distorted signal may be repaired by a filter and the repaired signal transmitted to the previous core computing unit; for example, a distorted square wave may be restored to a square wave before being transmitted.
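A minimal sketch of restoring a distorted square wave is given below. The text mentions a filter without specifying it, so this sketch swaps in a simple threshold comparator with hysteresis as a stand-in; the function name, signal levels, and hysteresis width are all assumptions made for illustration.

```python
from typing import List


def repair_square_wave(samples: List[float],
                       low: float = 0.0,
                       high: float = 1.0,
                       hysteresis: float = 0.2) -> List[float]:
    """Snap each sample to `low` or `high` using a comparator with hysteresis."""
    mid = (low + high) / 2
    upper = mid + hysteresis / 2   # must rise above this to switch to high
    lower = mid - hysteresis / 2   # must fall below this to switch to low
    state = high if samples and samples[0] >= mid else low
    repaired = []
    for sample in samples:
        if sample > upper:
            state = high
        elif sample < lower:
            state = low
        # samples inside the hysteresis band keep the previous state
        repaired.append(state)
    return repaired


distorted = [0.1, 0.05, 0.8, 0.95, 0.55, 0.9, 0.2, 0.45, 0.0]
print(repair_square_wave(distorted))
```

The hysteresis band keeps small dips and bumps near the midpoint from toggling the output, so the repaired waveform has clean edges.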
In another embodiment of the clock signal transmission method according to the present invention, based on the above embodiments, the core computing unit includes a plurality of basic computing units connected in series, and each basic computing unit performs the same operation on the input data.
In this embodiment, since many chips have complex functions, a large number of basic computing units need to be integrated in the chips, and in order to improve the processing efficiency of the chips, this embodiment proposes that a core computing unit is formed by a plurality of basic computing units, and adjacent core computing units are tightly attached to each other.
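The structure described above can be modeled with a short sketch: a core computing unit applies the same basic operation several times in series, and the core computing units are themselves chained so that each output feeds the next input. The names `make_core_unit` and `run_chip` and the increment operation are hypothetical and stand in for whatever operation the basic computing units actually perform.

```python
from typing import Callable, List


def make_core_unit(basic_op: Callable[[int], int], num_basic_units: int) -> Callable[[int], int]:
    """Build a core unit from `num_basic_units` identical basic units in series."""
    def core_unit(value: int) -> int:
        for _ in range(num_basic_units):
            value = basic_op(value)   # every basic unit performs the same operation
        return value
    return core_unit


def run_chip(core_units: List[Callable[[int], int]], data: int) -> int:
    """Pass data forward: the output of one core unit is the input of the next."""
    for core_unit in core_units:
        data = core_unit(data)
    return data


def increment(value: int) -> int:
    """Placeholder basic operation used only for this example."""
    return value + 1


core = make_core_unit(increment, num_basic_units=3)
print(core(0))
print(run_chip([core, core], 10))
```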
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
FIG. 3 is a schematic structural diagram of a clock signal transmission apparatus according to an embodiment of the present invention. The apparatus of this embodiment may be used to implement the method embodiments of the present invention described above. As shown in fig. 3, the apparatus of this embodiment includes:
a direction obtaining unit 31, configured to obtain a forward transfer direction of the data based on transfer of the data from the first core computing unit to the last core computing unit in the multiplexing chip.
The multiplexing chip comprises a plurality of core computing units, wherein output data of a previous core computing unit is used as input data of a next core computing unit;
the clock transmission unit 32 is used for inputting a clock signal from the last core calculation unit of the multiplexing chip and reversely transmitting the clock signal to the first core calculation unit; reverse direction transfers are opposite to forward direction transfers.
Based on the clock signal transmission apparatus provided by the above embodiment of the present invention, data flows from the first core computing unit to the last core computing unit in the multiplexing chip; the clock signal is transmitted from the last core computing unit of the multiplexing chip to the first core computing unit, in the direction opposite to the data flow, so that the timing check between adjacent operation cores is naturally satisfied, no additional buffer units need to be added, and a large amount of chip area and power consumption is saved.
In another embodiment of the clock signal transmission apparatus of the present invention, based on the above embodiments, the clock transmission unit 32 includes:
the signal transmission module is used for taking the last core computing unit as the current core computing unit, wherein a clock generator inputs the generated current clock signal into the current core computing unit and a group of buffer units, and the group of buffer units comprises at least one buffer unit;
the iteration module is used for performing iteration: the clock signal processed by the buffer units is taken as the current clock signal and the previous core computing unit is taken as the current core computing unit, and the current clock signal is input into the current core computing unit and a group of buffer units, until the current core computing unit is the first core computing unit.
The clock signal is generated by a clock generator on the mainboard and is transmitted in sequence from the last core computing unit toward the first core computing unit. A clock tree for transmitting the clock signal is built from the buffer units through which the signal passes. Compared with the forward-transmitted clock tree of the prior art, the reverse clock tree grows in the direction opposite to the data flow, so serious timing violations do not occur and no buffers need to be inserted to repair them. The reverse-data-flow trapezoidal clock tree saves resources, reduces cost, and satisfies the timing check.
In a specific example of the above embodiments of the clock signal transmission apparatus of the present invention, the clock transmission unit 32 further includes: a tree construction module, used for constructing all the buffer units into a clock tree.
In a specific example of each of the above embodiments of the clock signal transmission apparatus of the present invention, a group of buffer units corresponding to a current core computing unit in a clock tree is a first group of buffer units, and a group of buffer units corresponding to a previous core computing unit is a second group of buffer units; the number of buffer units in the second group exceeds the number of buffer units in the first group by a preset number.
In a specific example of the above embodiments of the clock signal transmission apparatus of the present invention, the clock tree is a trapezoidal clock tree including a plurality of groups of buffer units.
In another embodiment of the clock signal transmission apparatus of the present invention, based on the above embodiments, the buffer unit is configured to repair the received clock signal, and transmit the repaired clock signal to the current core computing unit and the group of buffer units after the repaired clock signal reaches the preset standard.
In this embodiment, the buffer unit performs no logic function; it is only responsible for repairing the signal and passing it on with high quality, without changing its function. Specifically, a distorted signal may be repaired by a filter and the repaired signal transmitted to the previous core computing unit; for example, a distorted square wave may be restored to a square wave before being transmitted.
In another embodiment of the clock signal transmission apparatus of the present invention, based on the above embodiments, the core computing unit includes a plurality of basic computing units connected in series, and each basic computing unit performs the same operation on the input data.
In this embodiment, since many chips have complex functions, a large number of basic computing units need to be integrated in the chips, and in order to improve the processing efficiency of the chips, this embodiment proposes that a core computing unit is formed by a plurality of basic computing units, and adjacent core computing units are tightly attached to each other.
In another aspect of the embodiments of the present invention, an embodiment of a multiplexing chip is provided, including:
a plurality of core computing units for receiving data and transmitting the data sequentially; the sequential transmission is from a first core computing unit to a last core computing unit; wherein the output data of a previous core computing unit is used as the input data of the next core computing unit;
and the clock tree is used for transmitting the clock signal in the direction opposite to the data flow direction in the core computing unit.
Fig. 4 is a schematic diagram of a data flow of a specific example of the multiplexing chip of the present invention. As shown in FIG. 4, the data flow is forward from the left into the plurality of core compute units, while the clock signal is backward from the right into the plurality of core compute units.
In a specific example of the foregoing embodiments of the multiplexing chip of the present invention, the clock tree includes a plurality of groups of buffer units, each group of buffer units includes at least one buffer unit, and the buffer units are configured to repair the received clock signal.
In a specific example of the foregoing embodiments of the multiplexing chip of the present invention, the number of the buffer units included in the next group of buffer units in the clock tree exceeds the number of the buffer units included in the previous group of buffer units by a preset number.
In another aspect of the embodiments of the present invention, an embodiment of an electronic device is provided, which includes any one of the above embodiments of the clock signal transfer apparatus of the present invention or any one of the above embodiments of the multiplexing chip of the present invention.
In another aspect of an embodiment of the present invention, another embodiment of an electronic device is provided, including: a memory for storing executable instructions;
and a processor in communication with the memory for executing the executable instructions to perform the operations of any of the above embodiments of the clock signal transmission method of the present invention.
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be understood by reference to one another. Since the apparatus embodiments basically correspond to the method embodiments, their description is relatively brief, and for the relevant points reference may be made to the corresponding description of the method embodiments.
The method and apparatus of the present invention may be implemented in a number of ways. For example, the methods and apparatus of the present invention may be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustrative purposes only, and the steps of the method of the present invention are not limited to the order specifically described above unless specifically indicated otherwise. Furthermore, in some embodiments, the present invention may also be embodied as a program recorded in a recording medium, the program including machine-readable instructions for implementing a method according to the present invention. Thus, the present invention also covers a recording medium storing a program for executing the method according to the present invention. The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.