CN112948322B - Virtual channel based on elastic cache and implementation method - Google Patents

Info

Publication number
CN112948322B
Authority
CN
China
Prior art keywords
channel
write
data
clock
read
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110218606.2A
Other languages
Chinese (zh)
Other versions
CN112948322A (en)
Inventor
于飞
尹莉
巨新刚
贾一鸣
刘彩苗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Microelectronics Technology Institute
Original Assignee
Xian Microelectronics Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Microelectronics Technology Institute filed Critical Xian Microelectronics Technology Institute
Priority to CN202110218606.2A priority Critical patent/CN112948322B/en
Publication of CN112948322A publication Critical patent/CN112948322A/en
Application granted granted Critical
Publication of CN112948322B publication Critical patent/CN112948322B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/06 Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Communication Control (AREA)

Abstract

The invention discloses a virtual channel based on an elastic buffer and a method for implementing it, which enable efficient and flexible data transmission between nodes in an on-chip asynchronous interconnection structure. On the one hand, the data transmitting end and the data receiving end communicate in a source-synchronous mode, which effectively reduces the number of interconnection signals between nodes and eases back-end layout implementation. On the other hand, an elastic-buffer dynamic adjustment mechanism switches the depth of each FIFO of the dual-channel FIFO between 1 and N, which greatly reduces buffer resources and improves buffer space utilization. The invention thereby overcomes the high buffer resource consumption and the difficult back-end implementation of the conventional virtual channel mechanism.

Description

Virtual channel based on elastic cache and implementation method
Technical Field
The invention belongs to the technical field of digital circuits, and in particular relates to a virtual channel based on an elastic buffer and a method for implementing it.
Background
In on-chip interconnection networks for heterogeneous multi-core/many-core processors, asynchronous interconnection is the most commonly used way to transmit data efficiently between network nodes. Communication between a source node and a destination node usually involves two types of packets, command packets and reply packets. If both share a single-channel transmission path, data can block and cause deadlock. Conventional designs therefore split the transmission path between the source node and the destination node into two channels, a command channel and a reply channel. The command channel carries write commands, write data and read commands; the reply channel carries read data and write responses; each channel is unidirectional. In actual data transmission, the utilization of the command channel and of the reply channel often differs, sometimes greatly. With two physical channels, a buffer of the same size is maintained on each of the command channel and the reply channel. This avoids the blocking caused by a single channel but wastes a large amount of buffer resources; in addition, the increased number of data lines between nodes complicates back-end layout. A conventional virtual channel design maintains only one physical channel between nodes, which effectively reduces the number of interconnect wires and eases layout. However, two buffers of the same size are still maintained inside the transmission node, so the high resource usage remains.
To address these problems of data transmission in on-chip interconnection systems, the invention adds to the conventional virtual channel method a handshake mechanism based on a source-synchronous distributed asynchronous FIFO, and dynamically allocates the virtual dual-channel buffer space within the network node. This reduces the difficulty of back-end layout, greatly improves the effective utilization of the buffer space, and avoids wasting resources.
Disclosure of Invention
The invention aims to overcome the above defects by providing a virtual channel based on an elastic buffer and a method for implementing it, which enable efficient and flexible data transmission between nodes of an on-chip asynchronous interconnection structure in a multi-core system, reduce the difficulty of layout implementation, and are easy to design and extend.
To achieve the above purpose, a method for implementing a virtual channel based on an elastic buffer comprises the following steps:
the data transmitting end, based on its local write clock, generates the write operation and maintains the write pointer of channel 0 and of channel 1 according to each channel's non-full signal and write request;
the data transmitting end outputs the corresponding write enable signals and the arbitrated write data to the data receiving end in a source-synchronous mode, together with the write clock shared by the channels and the data channel priority occupation flag signal;
the data transmitting end maintains the read pointer of each channel according to the source-synchronously input read enable signals of channel 0 and channel 1 and the read clock signal shared by the channels, and synchronizes the read pointers into the local write clock domain to maintain the full/non-full signals of channel 0 and channel 1;
the data receiving end, based on its local read clock, generates the read operation and maintains the read pointer of channel 0 and of channel 1 according to each channel's non-empty signal and read request;
the data receiving end outputs the corresponding read enable signals and the read clock signal shared by the channels to the data transmitting end in a source-synchronous mode;
the data receiving end maintains the write pointer and buffer space of each channel, in the write clock domain, according to the source-synchronously input write enable signals of channel 0 and channel 1, the write clock shared by the channels, the data channel priority occupation flag signal and the arbitrated write data; data of the channel that holds priority occupation of the data transmission path is written into the FIFO space of depth N, data of the channel that does not hold it is written into the FIFO space of depth 1, and the write pointer of each channel is synchronized into the local read clock domain to maintain the empty/non-empty signals of channel 0 and channel 1.
All read pointers and write pointers at the data transmitting end and the data receiving end are Gray-coded.
For source-synchronous output control at the data transmitting end, the write enables of channel 0 and channel 1 and the arbitrated write data are output from rising-edge registers, and the write clock is output as an inverted, gated clock. This reserves a setup margin and a hold margin of half a period each for source-synchronous input sampling at the data receiving end, and reduces clock and register toggling when no data is being transmitted.
For source-synchronous input control at the data transmitting end, the read enables of channel 0 and channel 1 are registered on the rising edge of the source-synchronous input read clock, the read pointers are maintained on the falling edge of that clock, and the read pointers of channel 0 and channel 1 are synchronized into the local write clock domain in two stages, first on the falling edge and then on the rising edge.
For source-synchronous output control at the data receiving end, the read enables of channel 0 and channel 1 are output from rising-edge registers, and the read clock is output as an inverted, gated clock. This reserves a setup margin and a hold margin of half a period each for source-synchronous input sampling at the data transmitting end, and reduces clock and register toggling when no data is being transmitted.
For source-synchronous input control at the data receiving end, the write enables of channel 0 and channel 1, the arbitrated write data and the data channel priority occupation flag signal are registered on the rising edge of the source-synchronous input write clock, the write pointers and the buffer writes are maintained on the falling edge of that clock, and the write pointers of channel 0 and channel 1 and the data channel priority occupation flag signal are synchronized into the local read clock domain in two stages, first on the falling edge and then on the rising edge.
By default, the buffer space depth of channel 0 is N and the buffer space depth of channel 1 is 1;
when channel 0 and channel 1 generate data transmission requests at the same time, channel 0 has priority over channel 1;
when the buffer spaces of channel 0 and channel 1 are both empty and channel 1 initiates a data transmission request before channel 0, the buffer space depth of channel 1 is switched to N and that of channel 0 is switched to 1; from then on, when channel 0 and channel 1 generate data transmission requests at the same time, channel 1 has priority over channel 0;
when the buffer spaces of channel 0 and channel 1 are both empty and channel 0 initiates a data transmission request before channel 1, the buffer space depth of channel 0 is switched to N and that of channel 1 is switched to 1; from then on, when channel 0 and channel 1 generate data transmission requests at the same time, channel 0 has priority over channel 1.
A virtual channel based on an elastic buffer comprises an independent dual-channel FIFO write module and an independent dual-channel FIFO read module, each of which maintains two sets of FIFO control logic (write control logic in the write module, read control logic in the read module);
in the dual-channel FIFO write module, the internal-facing input interface signals comprise a write clock WClk, two write data inputs WData0 and WData1, and two write enables Wen0 and Wen1, and the internal-facing output interface signals mainly comprise two full flags Full0 and Full1; the external-facing output interface signals mainly comprise a write output clock oClk, an output data channel priority occupation flag oSwitch, the arbitrated write output data oWData, and two write output enables oWen0 and oWen1, and the external-facing input interface signals mainly comprise a read input clock iClk and two read input enables iRen0 and iRen1;
in the dual-channel FIFO read module, the internal-facing input interface signals comprise a read clock RClk and two read enables Ren0 and Ren1, and the internal-facing output interface signals comprise two read data outputs RData0 and RData1 and two empty flags Empty0 and Empty1; the external-facing output interface signals comprise a read output clock oClk and two read output enables oRen0 and oRen1, and the external-facing input interface signals comprise a write input clock iClk, an input data channel priority occupation flag iSwitch, the arbitrated write input data iWData, and two write input enables iWen0 and iWen1.
The dual-channel FIFO read module contains a buffer space of depth N+1, where the depth parameter N is configurable. According to the iSwitch indication signal and the empty flags Empty0 and Empty1, this buffer space is dynamically divided into a buffer of depth N and a buffer of depth 1, which are allocated to channel 0 and channel 1.
In the dual-channel FIFO write module, the write output clock oClk is the inverted, gated output of the local write clock WClk, with write output enable oWen0 or oWen1 as the gating signal; the write output enables oWen0 and oWen1 and the arbitrated write data oWData are the registered outputs, in the write clock domain, of Wen0, Wen1, and WData0 or WData1, respectively;
in the dual-channel FIFO read module, the read output clock oClk is the inverted, gated output of the local read clock RClk, with read output enable oRen0 or oRen1 as the gating signal; the read output enables oRen0 and oRen1 are the registered outputs, in the read clock domain, of the read enables Ren0 and Ren1, respectively;
between the dual-channel FIFO write module and the dual-channel FIFO read module, the write output clock oClk is connected to the write input clock iClk, the write output enables oWen0 and oWen1 are connected to the write input enables iWen0 and iWen1, respectively, the data channel priority occupation flag oSwitch is connected to iSwitch, the arbitrated write data oWData is connected to iWData, the read output clock oClk is connected to the read input clock iClk, and the read output enables oRen0 and oRen1 are connected to the read input enables iRen0 and iRen1, respectively.
Compared with the prior art, the invention enables efficient and flexible data transmission between nodes in an on-chip asynchronous interconnection structure. On the one hand, the data transmitting end and the data receiving end communicate in a source-synchronous mode, which effectively reduces the number of interconnection signals between nodes and eases back-end layout implementation. On the other hand, an elastic-buffer dynamic adjustment mechanism switches the depth of each FIFO of the dual-channel FIFO between 1 and N, which greatly reduces buffer resources and improves buffer space utilization. The invention thereby overcomes the high buffer resource consumption and the difficult back-end implementation of the conventional virtual channel mechanism.
In the method, within the nodes of the on-chip asynchronous interconnection structure, the data transmitting end instantiates the asynchronous FIFO write module and the data receiving end instantiates the asynchronous FIFO read module, and each maintains two independent sets of FIFO control logic. The data transmitting end and the data receiving end share one set of data transmission paths, which reduces the number of interconnect wires between nodes. The elastic buffer space resides on the data receiving side and is shared by the two channels, which reduces buffer usage. Priority occupation of the data transmission path is initially assigned to CH0 and can be switched only when both FIFO spaces of the dual-channel FIFO (the one of depth 1 and the one of depth N) are empty. At that point, the channel that initiates a write operation first obtains the FIFO buffer space of depth N and priority occupation of the transmission path. The other channel, which has not initiated a write operation, obtains only the FIFO buffer space of depth 1; its transmission efficiency is reduced, but the FIFO mechanism ensures that the data it sends is not lost. In this way the depth of each FIFO of the dual-channel FIFO is switched dynamically between 1 and N, making data transmission more efficient and flexible.
Drawings
FIG. 1 is a system block diagram of the present invention;
FIG. 2 is a timing diagram of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
To suit an on-chip interconnection architecture for heterogeneous multi-core/many-core processors, achieve efficient transmission between a source node and a destination node, and reduce the difficulty of back-end layout implementation, a virtual channel structure based on an elastic buffer is designed, as shown in FIG. 1.
The elastic-buffer-based virtual channel structure is implemented as a source-synchronous distributed asynchronous FIFO. It consists of an independent dual-channel FIFO write module and an independent dual-channel FIFO read module, each of which maintains two sets of FIFO control logic (write control logic in the write module, read control logic in the read module). In the dual-channel FIFO write module, the internal-facing input interface signals mainly comprise one write clock WClk, two write data inputs WData0 and WData1, and two write enables Wen0 and Wen1; the internal-facing output interface signals mainly comprise two full flags Full0 and Full1; the external-facing output interface signals mainly comprise one write output clock oClk, one output data channel priority occupation flag oSwitch, one arbitrated write output data bus oWData, and two write output enables oWen0 and oWen1; and the external-facing input interface signals mainly comprise one read input clock iClk and two read input enables iRen0 and iRen1. In the dual-channel FIFO read module, the internal-facing input interface signals mainly comprise one read clock RClk and two read enables Ren0 and Ren1; the internal-facing output interface signals mainly comprise two read data outputs RData0 and RData1 and two empty flags Empty0 and Empty1; the external-facing output interface signals mainly comprise one read output clock oClk and two read output enables oRen0 and oRen1; and the external-facing input interface signals mainly comprise one write input clock iClk, one input data channel priority occupation flag iSwitch, one arbitrated write input data bus iWData, and two write input enables iWen0 and iWen1.
The buffer space of depth N+1 resides in the dual-channel FIFO read module, and the depth parameter N is configurable. According to the iSwitch indication signal and the empty flags Empty0 and Empty1, this buffer space is dynamically divided into a buffer of depth N and a buffer of depth 1, which are allocated to channel 0 and channel 1. If the buffer space of depth N+1 is split into small buffer stripes labeled 0, 1, 2, ..., N (as shown in FIG. 1), then under dynamic allocation stripe 0 always belongs to channel 0, stripe N always belongs to channel 1, and the remaining stripes 1 to N-1 are assigned to either channel 0 or channel 1 according to the allocation rule.
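The stripe ownership described above can be sketched as a small behavioral model. This is only an illustration in Python, not the patented circuit: the function name `stripes_for` is hypothetical, and the mapping of the flag value (0 means channel 0 holds priority, 1 means channel 1 holds priority) follows the iSwitch description given later in the text.

```python
def stripes_for(channel: int, switch: int, n: int) -> range:
    """Return the stripe labels (0..n) owned by `channel` for a given iSwitch value.

    Assumption: switch == 0 means channel 0 holds priority occupation,
    switch == 1 means channel 1 holds it.
    """
    if channel not in (0, 1) or switch not in (0, 1):
        raise ValueError("channel and switch must be 0 or 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    if switch == 0:
        # Channel 0 owns the depth-N region (stripes 0..N-1); channel 1 keeps stripe N.
        return range(0, n) if channel == 0 else range(n, n + 1)
    # Channel 1 owns the depth-N region (stripes 1..N); channel 0 keeps stripe 0.
    return range(0, 1) if channel == 0 else range(1, n + 1)


if __name__ == "__main__":
    for sw in (0, 1):
        print(sw, list(stripes_for(0, sw, 4)), list(stripes_for(1, sw, 4)))
```

In both settings stripe 0 stays with channel 0 and stripe N stays with channel 1, so only stripes 1 to N-1 change owner when the flag toggles.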
The virtual channel design with the elastic buffer mechanism is implemented on a source-synchronous asynchronous FIFO structure. For the dual-channel FIFO write module, the write output clock oClk is the inverted, gated output of the local write clock WClk, with write output enable oWen0 or oWen1 as the gating signal; the write output enables oWen0 and oWen1 and the arbitrated write data oWData are the registered outputs, in the write clock domain, of Wen0, Wen1, and WData0 or WData1, respectively. Similarly, for the dual-channel FIFO read module, the read output clock oClk is the inverted, gated output of the local read clock RClk, with read output enable oRen0 or oRen1 as the gating signal; the read output enables oRen0 and oRen1 are the registered outputs, in the read clock domain, of the read enables Ren0 and Ren1, respectively. Between the write and read modules, the write output clock oClk is connected to the write input clock iClk, the write output enables oWen0 and oWen1 are connected to the write input enables iWen0 and iWen1, respectively, the data channel priority occupation flag oSwitch is connected to iSwitch, and the arbitrated write data oWData is connected to iWData; the read output clock oClk is connected to the read input clock iClk, and the read output enables oRen0 and oRen1 are connected to the read input enables iRen0 and iRen1, respectively. No source-synchronous signal between the modules requires bidirectional interaction.
Within a node of the on-chip asynchronous interconnection structure, the data transmitting end instantiates the asynchronous FIFO write module and maintains two independent sets of FIFO write control logic, for CH0 and CH1; the data receiving end instantiates the asynchronous FIFO read module and maintains two independent sets of FIFO read control logic, for CH0 and CH1. The data transmitting end and the data receiving end share one set of data transmission paths, which reduces the number of interconnect wires between nodes. The elastic buffer space resides on the data receiving side and is shared by the two channels CH0 and CH1, which reduces buffer usage. Priority occupation of the data transmission path is initially assigned to CH0 and can be switched only when both FIFO spaces of the dual-channel FIFO (the one of depth 1 and the one of depth N) are empty. At that point, the channel that initiates a write operation first obtains the FIFO buffer space of depth N and priority occupation of the transmission path. The other channel, which has not initiated a write operation, obtains only the FIFO buffer space of depth 1; its transmission efficiency is reduced, but the FIFO mechanism ensures that the data it sends is not lost. In this way the depth of each FIFO of the dual-channel FIFO is switched dynamically between 1 and N, making data transmission more efficient and flexible.
Based on its local clock (the write clock), the data transmitting end generates the write operation and maintains the write pointer of channel 0 and of channel 1 according to each channel's non-full signal and write request. It outputs the corresponding write enable signals and the arbitrated write data to the data receiving end in a source-synchronous mode, together with the write clock shared by the channels and the data channel priority occupation flag signal. The data transmitting end also maintains the read pointer of each channel (in the read clock domain) according to the source-synchronously input read enable signals of channel 0 and channel 1 and the read clock signal shared by the channels, synchronizes the read pointers into the local write clock domain, and uses them to maintain the full/non-full signals of channel 0 and channel 1.
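The write-side arbitration can be illustrated with a minimal behavioral sketch. It assumes that at most one channel drives the shared oWData path per write clock cycle and that the priority holder wins when both channels are ready; the text states the priority rule but does not give the arbiter circuit, so the function `arbitrate` and its argument names are hypothetical.

```python
from typing import Optional


def arbitrate(switch: int, wen0: bool, wen1: bool,
              full0: bool, full1: bool) -> Optional[int]:
    """Return the channel (0 or 1) granted the shared data path, or None.

    Assumption: a channel is ready when it requests a write and its FIFO
    is not full; `switch` names the channel that currently holds priority.
    """
    ready = (wen0 and not full0, wen1 and not full1)
    first, second = (0, 1) if switch == 0 else (1, 0)
    if ready[first]:
        return first       # priority holder wins a simultaneous request
    if ready[second]:
        return second      # otherwise the other channel may use the path
    return None            # nothing to send this cycle


if __name__ == "__main__":
    print(arbitrate(0, True, True, False, False))
    print(arbitrate(0, True, True, True, False))
```

When the priority holder is blocked by a full FIFO, the other channel can still use the shared path, which is what keeps a single physical channel from deadlocking in this model.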
Based on its local clock (the read clock), the data receiving end generates the read operation and maintains the read pointer of channel 0 and of channel 1 according to each channel's non-empty signal and read request. It outputs the corresponding read enable signals and the read clock signal shared by the channels to the data transmitting end in a source-synchronous mode. The data receiving end also maintains the write pointer and buffer space of each channel (in the write clock domain) according to the source-synchronously input write enable signals of channel 0 and channel 1, the write clock shared by the channels, the data channel priority occupation flag signal and the arbitrated write data: data of the channel that holds priority occupation of the data transmission path is written into the FIFO space of depth N, and data of the channel that does not hold it is written into the FIFO space of depth 1. Each channel synchronizes its write pointer into the local read clock domain to maintain the empty/non-empty signals of channel 0 and channel 1.
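The full and empty flags can be sketched as pointer comparisons. The following is a simplified model under stated assumptions, not the circuit from the patent: pointers are plain binary counters that wrap at a hypothetical `modulus` larger than the depth, and the helper names `channel_depth`, `is_full` and `is_empty` are illustrative.

```python
def channel_depth(channel: int, switch: int, n: int) -> int:
    """Depth currently assigned to `channel`: N for the priority holder, else 1."""
    return n if channel == switch else 1


def is_full(wptr: int, rptr_synced: int, depth: int, modulus: int) -> bool:
    """Write side: compare the local write pointer with the synchronized read pointer."""
    return (wptr - rptr_synced) % modulus >= depth


def is_empty(wptr_synced: int, rptr: int, modulus: int) -> bool:
    """Read side: compare the synchronized write pointer with the local read pointer."""
    return (wptr_synced - rptr) % modulus == 0


if __name__ == "__main__":
    n, modulus = 8, 16  # modulus is an assumption; it only needs to exceed the depth
    depth0 = channel_depth(0, switch=0, n=n)
    print(is_full(wptr=9, rptr_synced=1, depth=depth0, modulus=modulus))
    print(is_empty(wptr_synced=3, rptr=3, modulus=modulus))
```

Because each side compares its own pointer against a synchronized, possibly stale copy of the other side's pointer, the flags err on the safe side: the writer may see "full" slightly longer than necessary and the reader may see "empty" slightly longer than necessary, but neither overruns the buffer.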
All read and write pointers at the data transmitting end and the data receiving end are Gray-coded.
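The text does not specify the encoder, so the sketch below uses the standard binary-reflected Gray code as an assumption; the function names are illustrative. The property that matters for clock-domain crossing is that consecutive pointer values differ in exactly one bit, so a pointer sampled in the other clock domain is either the old value or the new one.

```python
def bin_to_gray(value: int) -> int:
    """Convert a non-negative binary count to binary-reflected Gray code."""
    return value ^ (value >> 1)


def gray_to_bin(gray: int) -> int:
    """Convert a binary-reflected Gray code word back to a binary count."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


if __name__ == "__main__":
    for count in range(8):
        print(count, format(bin_to_gray(count), "03b"))
```

The single-bit-change property also holds across the wrap-around when the counter length is a power of two; how the design handles other values of N is not stated in the text.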
The virtual channel structure that supports dynamic adjustment of the buffer space is implemented on a source-synchronous distributed asynchronous FIFO. To improve the transmission efficiency of the asynchronous FIFO and ease back-end implementation, the invention optimizes the clocking of the data transmitting end and the data receiving end as follows:
For source-synchronous output control at the data transmitting end, the write enables of channel 0 and channel 1 and the arbitrated write data are output from rising-edge registers, and the write clock is output as an inverted, gated clock. This reserves a setup margin and a hold margin of half a period each for source-synchronous input sampling at the data receiving end, and reduces clock and register toggling when no data is being transmitted, which lowers dynamic power consumption. For source-synchronous input control at the data transmitting end, the read enables of channel 0 and channel 1 are registered on the rising edge of the source-synchronous input read clock, and the Gray-coded read pointers are maintained on the falling edge of that clock. The read pointers of channel 0 and channel 1 are then synchronized into the local write clock domain in two stages, first on the falling edge and then on the rising edge, which removes metastability and shortens the synchronization latency.
For source-synchronous output control at the data receiving end, the read enables of channel 0 and channel 1 are output from rising-edge registers, and the read clock is output as an inverted, gated clock. This reserves a setup margin and a hold margin of half a period each for source-synchronous input sampling at the data transmitting end, and reduces clock and register toggling when no data is being transmitted, which lowers dynamic power consumption. For source-synchronous input control at the data receiving end, the write enables of channel 0 and channel 1, the arbitrated write data and the data channel priority occupation flag signal are registered on the rising edge of the source-synchronous input write clock, and the Gray-coded write pointers and the buffer writes are maintained on the falling edge of that clock. The write pointers of channel 0 and channel 1 and the data channel priority occupation flag signal are then synchronized into the local read clock domain in two stages, first on the falling edge and then on the rising edge, which removes metastability and shortens the synchronization latency.
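The two-stage synchronization, falling edge first and rising edge second, can be modeled behaviorally as two registers clocked on opposite edges of the local clock. The class below is a hypothetical Python sketch of that ordering only; it does not model metastability itself, and the class and method names are illustrative.

```python
class NegPosSynchronizer:
    """Two-stage synchronizer: stage 1 on the falling edge, stage 2 on the rising edge."""

    def __init__(self, reset_value: int = 0) -> None:
        self.stage1 = reset_value  # captured on the falling edge of the local clock
        self.stage2 = reset_value  # captured on the rising edge; synchronized output

    def falling_edge(self, async_in: int) -> None:
        """Sample the value arriving from the other clock domain."""
        self.stage1 = async_in

    def rising_edge(self) -> int:
        """Transfer stage 1 to the output register and return the output."""
        self.stage2 = self.stage1
        return self.stage2


if __name__ == "__main__":
    sync = NegPosSynchronizer()
    sync.falling_edge(0b0110)   # Gray-coded pointer arrives from the other domain
    print(sync.stage2)          # output still holds the reset value
    print(sync.rising_edge())   # half a local clock period later the value is visible
```

In this model a value captured on a falling edge reaches the output on the very next rising edge, half a local clock period later, whereas two registers on the same edge would need a full period between the stages.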
To reduce the buffer resources used and further improve buffer space utilization, and thereby achieve flexible data transmission, the invention provides a dynamic allocation mechanism for the buffer space. The control logic of this mechanism is generated by the data transmitting end:
By default, the buffer space depth of channel 0 is N and the buffer space depth of channel 1 is 1, and when channel 0 and channel 1 generate data transmission requests at the same time, channel 0 has priority over channel 1. When the buffer spaces of channel 0 and channel 1 are both empty and channel 1 initiates a data transmission request before channel 0, the buffer space depth of channel 1 is switched to N and that of channel 0 is switched to 1; from then on, when both channels generate data transmission requests at the same time, channel 1 has priority over channel 0. When the buffer spaces of channel 0 and channel 1 are both empty and channel 0 initiates a data transmission request before channel 1, the buffer space depth of channel 0 is switched to N and that of channel 1 is switched to 1; from then on, when both channels generate data transmission requests at the same time, channel 0 has priority over channel 1.
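The switching rule can be written as a next-state function for the priority flag. This is a minimal sketch under assumptions: the flag value 0 means channel 0 holds priority and 1 means channel 1 holds it, a request "before" the other channel is modeled as one channel requesting while the other does not, and simultaneous requests leave the current holder unchanged. The name `next_switch` is hypothetical.

```python
def next_switch(switch: int, empty0: bool, empty1: bool,
                req0: bool, req1: bool) -> int:
    """Return the next value of the priority flag (0: channel 0, 1: channel 1)."""
    if not (empty0 and empty1):
        return switch   # ownership may change only while both FIFOs are empty
    if req1 and not req0:
        return 1        # channel 1 asked first: it gets depth N and priority
    if req0 and not req1:
        return 0        # channel 0 asked first: it gets depth N and priority
    return switch       # no request, or simultaneous requests: holder keeps priority


if __name__ == "__main__":
    switch = 0  # default: channel 0 owns the depth-N buffer
    switch = next_switch(switch, True, True, req0=False, req1=True)
    print(switch)
```

Tying the switch to the both-empty condition means no stored entry ever has to move between stripes when ownership changes.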
The invention is applied to on-chip asynchronous interconnection nodes of a heterogeneous multi-core processor system, where a dual-channel FIFO write module is instantiated at the data transmitting end and a dual-channel FIFO read module is instantiated at the data receiving end, as shown in FIG. 1. Assume that node A is the data transmitting end (left interface signals in FIG. 1) and node B is the data receiving end (right interface signals in FIG. 1):
Node A generates the output signals shown in the upper left half of FIG. 1 based on the local clock WClk: oClk is the inverted, clock-gated output of WClk; oWen0 and oWen1 are the register outputs of Wen0 and Wen1, respectively; oWData is the register output of the arbitrated WData0 and WData1; and oSwitch is the register output of the channel priority occupation signal generated by the local control logic, which changes only when the FIFO buffers of channel 0 and channel 1 are both empty. Channel 0 and channel 1 generate their write operations only when their respective FIFOs are not full.
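The effect of sending an inverted, gated copy of WClk alongside data registered on the rising edge of WClk can be shown with a short, idealized Python sketch. The function names are hypothetical, the AND-style gating with an idle-low output is an assumption (the text does not state the idle level), and all path and register delays are ignored.

```python
from typing import Tuple


def gated_inverted_clock(wclk_level: int, enable: bool) -> int:
    """oClk level for a given WClk level.

    Assumption: while no write enable is active the output is held low,
    so neither the clock line nor the receiving registers toggle.
    """
    return (1 - wclk_level) & int(enable)


def ideal_margins(period: float) -> Tuple[float, float]:
    """Setup and hold margins seen by the receiver, ignoring all delays."""
    launch = 0.0                   # data registered on a WClk rising edge
    capture = launch + period / 2  # oClk rises when WClk falls, mid-cycle
    next_launch = launch + period  # earliest time the data can change again
    setup = capture - launch
    hold = next_launch - capture
    return setup, hold
```

With a 10 ns period, `ideal_margins(10.0)` gives 5 ns of setup and 5 ns of hold, which is the "half period each" margin the description refers to. Real margins would be reduced by skew between the clock and data paths.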
After the output signals oWen0, oWen1, oWData and oSwitch of the data transmitting end of node A reach the data receiving end of node B, they are sampled as iWen0, iWen1, iWData and iSwitch using the source-synchronous clock iClk. All data has half a clock period of setup margin before the rising edge of iClk arrives and, likewise, half a clock period of hold margin after it. The data signals are registered once they have been sampled on the rising edge of iClk. Node B maintains the receiving end's local write pointers on the falling edge of iClk and stores iWData in the FIFO buffer space of depth N or depth 1 according to iSwitch. iSwitch=0 indicates that priority on the data transmission channel belongs to channel 0: iWData from channel 0 is stored in the depth-N FIFO space marked 0 to N-1 in FIG. 1, while iWData from channel 1 is stored in the depth-1 FIFO space marked N. Conversely, iSwitch=1 indicates that priority belongs to channel 1: iWData from channel 1 is stored in the depth-N FIFO space marked 1 to N in FIG. 1, and iWData from channel 0 is stored in the depth-1 FIFO space marked 0. The receiving end then synchronizes the write pointers of the two channels and the shared iSwitch signal into the local read clock domain through two-stage synchronization (negative edge first, positive edge second) to eliminate metastability; the synchronized values are used to determine the FIFO empty state and the channel priority occupation.
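The slot assignment described above can be written down as a small Python sketch. The function names `slot_range` and `slot_for_write` are hypothetical, and the way a per-channel write offset wraps inside its region is an assumption; the text only gives the slot ranges themselves.

```python
def slot_range(channel: int, i_switch: int, n: int) -> range:
    """Slots of the (N+1)-entry shared buffer assigned to one channel."""
    if channel not in (0, 1) or i_switch not in (0, 1):
        raise ValueError("channel and i_switch must be 0 or 1")
    if i_switch == 0:
        # Channel 0 owns slots 0..N-1; channel 1 gets the single slot N.
        return range(0, n) if channel == 0 else range(n, n + 1)
    # Channel 1 owns slots 1..N; channel 0 gets the single slot 0.
    return range(1, n + 1) if channel == 1 else range(0, 1)


def slot_for_write(channel: int, i_switch: int, n: int, offset: int) -> int:
    """Map a per-channel write offset to a physical slot.

    Assumption: the offset wraps within the channel's current region.
    """
    region = slot_range(channel, i_switch, n)
    return region[offset % len(region)]
```

In both settings of iSwitch the two regions are disjoint and together cover all N+1 slots, so the two channels never share a storage location.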
Node B generates the output signals shown in the lower right half of FIG. 1 based on the local clock RClk: oClk is the inverted, clock-gated output of RClk, and oRen0 and oRen1 are the register outputs of Ren0 and Ren1, respectively. Channel 0 and channel 1 generate their read operations only when their respective FIFOs are not empty.
After the output signals oRen0 and oRen1 of the data receiving end of node B reach the data transmitting end of node A, they are sampled as iRen0 and iRen1 using the source-synchronous clock iClk. All data has half a clock period of setup margin before the rising edge of iClk arrives and, likewise, half a clock period of hold margin after it. The data signals are registered once they have been sampled on the rising edge of iClk, and node A maintains the transmitting end's local read pointers on the falling edge of iClk. The transmitting end then synchronizes the read pointers of the two channels into the local write clock domain through two-stage synchronization (negative edge first, positive edge second) to eliminate metastability; the synchronized values are used to determine the full/non-full state of the FIFOs.
FIG. 2 shows a timing diagram of data transmission over the virtual channels based on the elastic buffer space. At time T0, channel 0 at the transmitting end of node A initiates a write operation (the TX0 FIFO is not full). Because channel 0 holds priority on the data transmission channel, the oWData path continuously outputs TX0 Data0 to TX0 Data3, and the data is written into the slots marked 0 to 3 of the depth-N FIFO buffer space (N > 5) at the receiving end of node B. At time T4, channel 1 at the transmitting end of node A initiates a write operation (the TX1 FIFO is not full). Because channel 0 has no new write request at this time, the oWData path outputs TX1 Data0, and the data is written into the depth-1 FIFO buffer space marked N at the receiving end of node B.
Ignoring the delay on the data transmission path between nodes and the register input/output delay of the signals, and assuming that write data is read out after being stored in the FIFO buffer of the receiving end, the FIFO spaces corresponding to channel 0 and channel 1 are both empty at time T6. Channel 1 at the transmitting end of node A then initiates a write request, so priority on the data transmission channel passes from channel 0 to channel 1. The oWData path then continuously outputs TX1 Data1 and TX1 Data2, and the data is written into the slots marked 1 to 2 of the depth-N FIFO buffer space at the receiving end of node B.
This realizes dynamic adjustment of the buffer space of the virtual channels based on a source-synchronous distributed asynchronous FIFO.
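The FIG. 2 sequence can be replayed with a small Python model to check where each data word lands. This is a sketch under stated assumptions: N is set to 6 (the text only requires N > 5), the `"drain"` event stands in for the receiver reading everything out before T6, write offsets are assumed to restart from zero when ownership switches, and full-flag back-pressure is not modeled. The names `region` and `replay` are hypothetical.

```python
N = 6  # assumed depth of the large region; the text only requires N > 5


def region(channel, switch, n=N):
    """Slots owned by a channel for a given priority switch value."""
    if switch == 0:
        return list(range(0, n)) if channel == 0 else [n]
    return list(range(1, n + 1)) if channel == 1 else [0]


def replay(events, n=N):
    """Replay write/drain events; return the final switch and slot placements."""
    switch = 0
    offsets = [0, 0]    # per-channel write offsets (assumed to reset on a switch)
    occupancy = [0, 0]  # entries currently held for each channel
    placements = []
    for kind, channel, label in events:
        if kind == "drain":
            occupancy = [0, 0]  # the receiver has read everything out
            continue
        # Ownership may change only while both FIFOs are empty.
        if occupancy == [0, 0] and channel != switch:
            switch = channel
            offsets = [0, 0]
        slots = region(channel, switch, n)
        placements.append((label, slots[offsets[channel] % len(slots)]))
        offsets[channel] += 1
        occupancy[channel] += 1
    return switch, placements


events = (
    [("write", 0, f"TX0 Data{i}") for i in range(4)]  # T0..T3
    + [("write", 1, "TX1 Data0")]                      # T4
    + [("drain", None, None)]                          # both FIFOs empty by T6
    + [("write", 1, "TX1 Data1"), ("write", 1, "TX1 Data2")]
)
final_switch, placements = replay(events)
```

Under these assumptions the model places TX0 Data0 to Data3 in slots 0 to 3, TX1 Data0 in slot N, and, after the switch at T6, TX1 Data1 and TX1 Data2 in slots 1 and 2, matching the description.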

Claims (10)

1. The method for realizing the virtual channel based on the elastic cache is characterized by comprising the following steps:
the data transmitting end generates respective write operation and maintains write pointers based on the local write clock according to the non-full signals of the channel 0 and the channel 1 and the write request respectively;
outputting the write data signals after corresponding write enabling and arbitration to a data receiving end in a source synchronous mode, and synchronously outputting a write clock shared by channels and a data channel priority occupation mark signal to the data receiving end;
the data transmitting end maintains the read pointers of the respective channels according to the read enabling signals of the channel 0 and the channel 1 which are input by the source synchronous clock and the read clock signal shared by the channels, and synchronizes the read pointers to the local write clock domain to maintain the full/non-full signals of the channel 0 and the channel 1;
the data receiving end generates the respective read operations and maintains the read pointers based on the local read clock, according to the non-empty signals and the read requests of channel 0 and channel 1, respectively;
outputting corresponding read enabling signals and channel shared read clock signals to a data transmitting end in a source synchronous mode;
the data receiving end maintains the respective write pointers in the write clock domain according to the source-synchronously input write enable signals of channel 0 and channel 1, the write clock shared by the channels, the data channel priority occupation mark signal and the arbitrated write data; the data of the channel that holds priority occupation of the data transmission channel is written into the FIFO space of depth N, and the data of the channel that does not hold it is written into the FIFO space of depth 1; and the write pointers of the channels are synchronized into the local read clock domain to maintain the empty/non-empty signals of channel 0 and channel 1.
2. The method for implementing virtual channels based on elastic buffer memory according to claim 1, wherein all read pointers and write pointers of the data transmitting end and the receiving end are encoded by gray code.
3. The method of claim 1, wherein the source synchronous output control of the data transmitting end uses positive edge register output for write enable of channel 0 and channel 1 and write data after arbitration, uses inverse clock gating output for write clock, reserves setup/hold margin of each half period for source synchronous input sampling of the data receiving end, and reduces clock and register flip when no data is transmitted.
4. The method of claim 1, wherein the source synchronous input control of the data transmitting end performs register input on the read enables of channel 0 and channel 1 based on the rising edge of the source synchronous input read clock, maintains the read pointers based on the falling edge of the source synchronous input read clock, and performs two-stage synchronization of negative edge first and positive edge second on the read pointers based on the local write clock domain.
5. The method of claim 1, wherein the source synchronous output control of the data receiving end uses positive edge register output for the read enable of channel 0 and channel 1, uses inverse clock gating output for the read clock, reserves setup/hold margins of each half period for the source synchronous input sample of the data transmitting end, and reduces clock and register flip without data transmission.
6. The method of claim 1, wherein the source synchronous input control of the data receiving end performs register input on the write enables of channel 0 and channel 1, the arbitrated write data and the data channel priority occupation mark signal based on the rising edge of the source synchronous input write clock, maintains the write pointers and the buffer writes of the data based on the falling edge of the source synchronous input write clock, and performs two-stage synchronization of negative edge first and positive edge second on the write pointers of channel 0 and channel 1 and the data channel priority occupation mark signal based on the local read clock domain.
7. The method for implementing virtual channels based on elastic buffering as claimed in claim 1, wherein the buffer space depth of channel 0 is N, and the buffer space depth of channel 1 is 1;
when the channel 0 and the channel 1 generate data transmission requests at the same time, the priority of the channel 0 is higher than that of the channel 1;
the buffer memory space of the channel 0 and the buffer memory space of the channel 1 are in an empty state, meanwhile, the channel 1 initiates a data transmission request before the channel 0, at the moment, the buffer memory space depth of the channel 1 is switched to N, the buffer memory space depth of the channel 0 is switched to 1, and when the channel 0 and the channel 1 simultaneously generate a data transmission request, the priority of the channel 1 is higher than that of the channel 0;
the buffer space of the channel 0 and the buffer space of the channel 1 are in an empty state, meanwhile, the channel 0 initiates a data transmission request before the channel 1, at this time, the buffer space depth of the channel 0 is switched to N, the buffer space depth of the channel 1 is switched to 1, and when the channel 0 and the channel 1 generate data transmission requests at the same time, the priority of the channel 0 is higher than that of the channel 1.
8. The virtual channel system based on the elastic buffer is characterized by comprising an independent two-channel FIFO writing module and a two-channel FIFO reading module, wherein the two-channel FIFO writing module and the two-channel FIFO reading module respectively maintain read or write control logic of two sets of FIFOs;
the dual-channel FIFO writing module comprises internal input interface signals, which include a write clock WClk, two write data paths WData0 and WData1 and two write enables Wen0 and Wen1, and internal output interface signals, which mainly include two full flags Full0 and Full1; the dual-channel FIFO writing module further comprises external output interface signals, which mainly include a write output clock oClk, an output data channel priority occupation mark oSwitch, arbitrated write output data oWData and two write output enables oWen0 and oWen1, and external input interface signals, which mainly include a read input clock iClk and two read input enables iRen0 and iRen1;
the dual-channel FIFO reading module comprises internal input interface signals, which include a read clock RClk and two read enables Ren0 and Ren1, and internal output interface signals, which include two read data paths RData0 and RData1 and two empty flags Empty0 and Empty1; the dual-channel FIFO reading module further comprises external output interface signals, which include a read output clock oClk and two read output enables oRen0 and oRen1, and external input interface signals, which include a write input clock iClk, an input data channel priority occupation mark iSwitch, arbitrated write input data iWData and two write input enables iWen0 and iWen1.
9. The virtual channel system based on elastic buffering according to claim 8, wherein the dual-channel FIFO reading module has a buffer space with a depth of N+1, the depth parameter N supports parameterized configuration, and the buffer space with a depth of N+1 is dynamically allocated as two buffers, one of depth N and one of depth 1, to channel 0 and channel 1 according to the iSwitch indication signal and the empty flag signals Empty0 and Empty1.
10. The virtual channel system based on elastic buffering according to claim 8, wherein in the dual-channel FIFO writing module, the write output clock oClk is the inverted, clock-gated output of the local write clock WClk, and the gating signal is the write output enable oWen0 or oWen1; the write output enables oWen0 and oWen1 and the arbitrated write data oWData are the register outputs, based on the write clock domain, of Wen0, Wen1, and WData0 or WData1, respectively;
in the dual-channel FIFO reading module, the read output clock oClk is the inverted, clock-gated output of the local read clock RClk, and the gating signal is the read output enable oRen0 or oRen1; the read output enables oRen0 and oRen1 are the register outputs of the read enables Ren0 and Ren1, respectively, based on the read clock domain;
between the two-channel FIFO writing module and the two-channel FIFO reading module, a writing output clock oClk is connected with a writing input clock iClk, writing output enabling oWen0 and oWen1 are respectively connected with writing input enabling iWen0 and iWen1, a data channel priority occupation mark oSwitch is connected with iSwitch, writing data oWData after arbitration is connected with iWData, a reading output clock oClk is connected with a reading input clock iClk, and reading output enabling oRen0 and oRen1 are respectively connected with reading input enabling iRen0 and iRen1.
CN202110218606.2A 2021-02-26 2021-02-26 Virtual channel based on elastic cache and implementation method Active CN112948322B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110218606.2A CN112948322B (en) 2021-02-26 2021-02-26 Virtual channel based on elastic cache and implementation method

Publications (2)

Publication Number Publication Date
CN112948322A CN112948322A (en) 2021-06-11
CN112948322B true CN112948322B (en) 2023-05-16

Family

ID=76246512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110218606.2A Active CN112948322B (en) 2021-02-26 2021-02-26 Virtual channel based on elastic cache and implementation method

Country Status (1)

Country Link
CN (1) CN112948322B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113900978B (en) * 2021-10-27 2024-05-10 海光信息技术股份有限公司 Data transmission method, device and chip
CN114968861B (en) * 2022-05-25 2024-03-08 中国科学院计算技术研究所 Two-write two-read data transmission structure and on-chip multichannel interaction network

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104407997A (en) * 2014-12-18 2015-03-11 中国人民解放军国防科学技术大学 NAND flash memory single-channel synchronous controller with dynamic instruction scheduling function
CN104407809A (en) * 2014-11-04 2015-03-11 盛科网络(苏州)有限公司 Multi-channel FIFO (First In First Out) buffer and control method thereof
CN106095722A (en) * 2016-06-29 2016-11-09 合肥工业大学 A kind of Virtual Channel low consumption circuit being applied to network-on-chip
CN108829373A (en) * 2018-05-25 2018-11-16 西安微电子技术研究所 A kind of asynchronous fifo realization circuit
CN110188059A (en) * 2019-05-17 2019-08-30 西安微电子技术研究所 The flow control type FIFO buffer structure and method of the unified configuration of data valid bit
CN111124961A (en) * 2019-12-30 2020-05-08 武汉先同科技有限公司 Method for realizing conversion from single-port RAM to pseudo-dual-port RAM in continuous read-write mode
CN111324564A (en) * 2020-02-28 2020-06-23 西安微电子技术研究所 Elastic caching method
CN112100097A (en) * 2020-11-17 2020-12-18 杭州长川科技股份有限公司 Multi-test channel priority adaptive arbitration method and memory access controller

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1645967B1 (en) * 2004-10-11 2008-02-27 Texas Instruments Incorporated Multi-channel DMA with shared FIFO buffer
US7769956B2 (en) * 2005-09-07 2010-08-03 Intel Corporation Pre-coherence channel
US20100191814A1 (en) * 2008-12-23 2010-07-29 Marco Heddes System-On-A-Chip Employing A Network Of Nodes That Utilize Receive Side Flow Control Over Channels For Messages Communicated Therebetween
US9275704B2 (en) * 2014-07-31 2016-03-01 Texas Instruments Incorporated Method and apparatus for asynchronous FIFO circuit
US11036660B2 (en) * 2019-03-28 2021-06-15 Intel Corporation Network-on-chip for inter-die and intra-die communication in modularized integrated circuit devices

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An arbitration method for multi-channel shared read/write SDRAM; Zhang Sizheng; 《电子制作》 (No. 19); pp. 20-24 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant