WO2001004770A2 - Method and architecture for optimizing data throughput in a multi-processor environment using a ram-based shared index fifo linked list - Google Patents


Info

Publication number
WO2001004770A2
WO2001004770A2 PCT/US2000/018939
Authority
WO
WIPO (PCT)
Prior art keywords
entry
linked list
index
fifo
entries
Prior art date
Application number
PCT/US2000/018939
Other languages
French (fr)
Other versions
WO2001004770A3 (en)
Inventor
Keith Lee
Dean Schmaltz
Original Assignee
Alteon Web Systems, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alteon Web Systems, Inc. filed Critical Alteon Web Systems, Inc.
Priority to AU59297/00A priority Critical patent/AU5929700A/en
Publication of WO2001004770A2 publication Critical patent/WO2001004770A2/en
Publication of WO2001004770A3 publication Critical patent/WO2001004770A3/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

A method and architecture for optimizing data throughput in a multiprocessor environment makes use of a RAM-based, shared index FIFO linked list, in which data to be processed is written to a central buffer and the index FIFO, constituting a linked list of indexes to the buffered data, is passed between processing units within the system, providing a substantial reduction in the gate count required for processing the data. Messages are written to a central buffer; a linked list of indexes to the messages is created, and then pipelined to a processing unit as an index FIFO, so that the processor reads the entries of the linked list in sequence; as the entries are read, a message indicated by the entry is processed. Entries are enqueued and dequeued in an index FIFO RAM, so that enqueuing and dequeuing are performed in a single cycle with a single write operation.

Description

METHOD AND ARCHITECTURE FOR OPTIMIZING DATA
THROUGHPUT IN A MULTI-PROCESSOR ENVIRONMENT
USING A RAM-BASED SHARED INDEX FIFO LINKED LIST
BACKGROUND OF THE INVENTION
FIELD OF THE INVENTION
The present invention relates to data transfer in a computer system. More particularly, the invention relates to a method and architecture for optimizing data throughput in a multi-processor environment by writing data to be processed to a central buffer and passing the various processors a FIFO-like data structure constituting a linked list of indexes to the buffered data.
DESCRIPTION OF PRIOR ART
In the data processing art, it is an exceedingly common operation to pass data from one processing system to another. The data may be passed from process to process within a single processor, or between processing units in a multiple processor environment. Passing data between processing systems requires that the data be repeatedly copied to each processing system. The art provides various systems and methods for accomplishing data transfer in this manner.
For example, P. Chambers, S. Harrow, Virtual contiguous FIFO for combining multiple data packets into a single contiguous stream, U.S. Patent No. 6,016,315 (January 18, 2000) describe an arrangement in which data packets are supplied to a DSP from a PCI bus through a plurality of FIFO RAM units operating in parallel. R. Panwar, System for efficient implementation of multi-ported logic structures in a processor, U.S. Patent No. 6,055,616 (April 25, 2000) describes a system and method for efficient implementation of a multi-port logic first-in, first- out structure that provides for reduced on-chip area requirements. A common feature of both disclosed systems is that data must be transferred between units by copying, creating a potential bottleneck, and wasting I/O bandwidth and memory bandwidth. Accordingly, it would be advantageous to provide a means for avoiding copying of data between processors.
R. Fishler, B. Zargham, System for transferring a data stream to a requestor without copying data segments to each one of multiple data source/sinks during data stream building, U.S. Patent No. 5,941,959 (August 24, 1999) describe a method for getting descriptors to data and passing the descriptors to data sources and sinks, thereby avoiding copying the data among the data sources and sinks. The data descriptors are organized into a queued I/O data structure comprising a doubly linked list. R. Baumert, A. Seaman, S. Steves, Method and apparatus for optimizing the transfer of data packets between local area networks, U.S. Patent No. 6,067,300 (May 23, 2000) describe a switch apparatus having a packet memory, a packet descriptor memory that stores pointers to the stored data packets, and buffered data paths employing FIFO buffers. The FIFO buffers utilize conventional queued data structures. While the described methods effectively avoid copying of data, both inter- and intra-processor, conventional methods of adding data descriptors to and removing them from the queues are employed, requiring the allocation of an entry and a pointer, and a subsequent read-modify-write operation. It would be highly desirable to further reduce processing overhead by streamlining enqueue and dequeue operations.
SUMMARY OF THE INVENTION
The invention provides a method and architecture for optimizing data throughput in a multiprocessor environment through the use of a RAM-based, shared index FIFO linked list, in which data to be processed is written to a central buffer and the index FIFO, constituting a linked list of indexes to the buffered data, is passed between processing units within the system. The invention advantageously reduces the overhead required to process a data stream in a variety of ways. First, because the FIFO is composed of indexes, rather than the actual data, a significant reduction in the gate count required for processing is achieved. Second, the use of a FIFO-like structure, rather than a conventional pipeline, greatly reduces pipeline interlock. Third, the use of a FIFO-like linked list, instead of a FIFO, frees the system of the requirement, imposed by a conventional FIFO, of processing data frames in sequence. A novel method of dequeuing and enqueuing linked list entries enables entries to be enqueued and dequeued in a single cycle, with a single read, rather than the conventional read-modify-write method in common use. In general, the invented method involves the steps of: providing messages to be processed; writing the data messages to a central buffer; creating a linked list of indexes to the messages, where an index constitutes a pointer to a buffer address occupied by a specific message, and where each index constitutes an entry in said linked list, with each entry also including an index pointer to a next entry in said linked list; pipelining the linked list to a processing unit as an index FIFO, so that the processor reads the entries of the linked list in sequence; as the entries are read, processing a message indicated by said entry; and enqueuing and dequeuing the entries in an index FIFO RAM, so that enqueuing and dequeuing are performed in a single cycle with a single write operation.
The invention is also embodied as an architecture, the architecture including one or more processing units, the aforementioned central buffer and a RAM-based, shared index FIFO linked list; one or more pipelines for feeding the linked list to the processing units; and the afore-mentioned index FIFO RAM, wherein the linked lists are stored and entries dequeued and enqueued.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 provides a block diagram of an architecture for optimizing data throughput in a multiprocessor environment, according to the invention;
Figure 2 provides a diagram of a linked list of indexes, according to the invention; and
Figure 3 provides an index FIFO RAM memory map, according to the invention.
DETAILED DESCRIPTION
The invention provides a method and architecture for optimizing data throughput in a multiprocessor environment through the use of a RAM-based, shared index FIFO linked list, in which data to be processed is written to a central buffer and the index FIFO, constituting a linked list of indexes to the buffered data, is passed between processing units within the system. Several noteworthy advantages are provided by the invention:
• Since the FIFO is composed of indexes, rather than the actual data, a significant reduction in gate count required for processing is achieved;
• The use of a FIFO-like structure, rather than a conventional pipeline, greatly reduces pipeline interlock;
• The use of a FIFO-like linked list, instead of a FIFO, frees the system of the requirement, imposed by a conventional FIFO, of processing data frames in sequence;
• A novel method of dequeuing and enqueuing linked list entries enables entries to be enqueued and dequeued in a single cycle, with a single read, rather than the conventional read-modify-write method in common use.
An example is provided to illustrate the dramatic reduction in gate count achievable with the invention: In the preferred embodiment, the invention is implemented in a network switch, for forwarding data frames over an IP network. Incoming data frames are matched with records of previous forwarding results to determine a next hop for each of the data frames. A typical forwarding result record is approximately sixty-four bits; however an index pointer to that forwarding result record is only six bits. Therefore, the pronounced increase in throughput through the use of FIFO-like linked list of indexes, instead of a FIFO of actual data frames or results will be apparent to those skilled in the art. While the invention as described herein is implemented in a network switch, other implementations are possible. The invention finds application in any data processing environment in which a data stream is passed between processes or processing units.
Referring now to Figure 1, shown is an architecture for optimizing data throughput in a multiprocessor environment through the use of a RAM-based, shared index FIFO linked list. A network switch 10 receives incoming data frames at an ingress port (not shown). The first sixty-four bits of each frame constitute the header. The headers are stored in a header buffer RAM 12. In the preferred embodiment, the header buffer RAM is implemented as a 256 x 64 bit dual port RAM, organized as 8 x 64 bits per frame, which allows for a total of 32 frame header buffers. The write port is designed as a thirty-two data frame buffer FIFO, and the read port is designed to be randomly accessed by the various processing units. This description of the header buffer RAM is exemplary only, and is not intended to limit the invention. Other schemes for organizing the buffer RAM will be apparent to those skilled in the art. The remainder of the frame and records of previous forwarding results are stored in a working RAM 11. From the time that a data frame is received at an ingress port to the time that it is routed to a next hop, the data frame is processed in a serial fashion by one or more processing units 14 within the network switch. Typically, processing will include reading a source address and a destination address from the frame header, searching a variety of data structures to find a forwarding result with a destination address matching that of the frame header, and possibly modifying the frame header. In order to pipeline data frames and results from one processing unit to another, each header buffer has a set of associated bytes reserved in the working RAM 11 that are used to pass information between the various processing units 14 of the switch 10.
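The 8 x 64-bit-per-frame organization of the header buffer RAM implies a simple word-addressing scheme. The sketch below is illustrative only; the function name and bounds checks are our assumptions, not part of the patent:

```python
def header_word_address(frame_index, word_offset):
    """Address of one 64-bit word in a 256 x 64 header buffer RAM,
    organized as 8 words per frame, for 32 frame header buffers total."""
    assert 0 <= frame_index < 32   # 32 frame header buffers
    assert 0 <= word_offset < 8    # 8 x 64 bits per frame header
    return frame_index * 8 + word_offset
```

Under this layout, frame 0 occupies words 0 through 7, and the last word of frame 31 occupies address 255, the top of the 256-word RAM.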
Conventionally, when a data stream is pipelined, the actual data, or result, is passed from pipeline stage to pipeline stage. However, if the results and the data are housed in a central location, and an index of pointers instead is passed from pipeline stage to pipeline stage, the required gate count for processing is substantially reduced, as previously illustrated. Thus, the invention provides a linked list to index frame data stored in the header buffer RAM. The implementation of linked lists is well known to those skilled in the art of computer programming and software design. An entry in the linked list of the invention, shown in Figure 2, includes a pointer to a specific header buffer 20 plus an index pointer to the next entry in the linked list 21. The linked list also includes a head pointer 22 to designate the first entry in the list and a tail pointer 23 to designate the final entry of the list. In the preferred embodiment of the invention, each linked list includes thirty-two entries to correspond to the thirty-two locations of the header buffer RAM. In addition, each list also includes an empty entry at the tail as a placeholder, for a total of thirty-three entries per linked list. Other implementations of the linked list consistent with the spirit and scope of the invention will be apparent to those skilled in the art. The function of the placeholder entry will be described in detail further below. Thus, each processing unit may operate from a pipeline of these indexes, significantly reducing the overhead of processing the data stream by reducing gate count.
A further gate reduction is achieved, as compared to a conventional FIFO of indexes, by sharing the linked list entries among processing queues. In a multi-processor system that does not share entries among processing queues, a conventional FIFO that allows up to a maximum number of frames, x, requires x entries for each processing unit. Therefore, an exemplary system having eight processing units, where x = 32, would require a total of 256 (32 x 8) entries. Utilizing a shared linked list requires only forty entries: x + the number of processing units, or 32 + 8.
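The entry-count comparison above reduces to a small formula: entries are either replicated per queue, or shared with one placeholder entry added per queue. A sketch, with names of our own choosing:

```python
def entries_required(max_frames, num_units, shared):
    """Total linked-list entries needed across all processing queues.

    Without sharing, each of the num_units queues holds its own copy of
    up to max_frames entries. With a shared linked list, the entries are
    held in common and each queue adds only its empty placeholder entry
    at the tail."""
    if shared:
        return max_frames + num_units
    return max_frames * num_units
```

For the exemplary system of eight processing units and thirty-two frames, this gives 256 entries unshared versus 40 shared.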
However, the invention includes an additional enhancement. In the interest of maximizing data throughput, it is desirable to free up header buffers as quickly as possible. Due to the interlocking nature of a pipeline, however, data frames may not be processed out of the sequence imposed by the various stages of the pipeline. Thus, a frame may not proceed to the next stage of the pipeline, until the frame preceding it has cleared that stage. In the present invention, the linked list is passed between processing units in the manner of a FIFO. Processing the linked list as a FIFO allows the processing of the entries of the list to proceed independently of the processing of the corresponding frame. Therefore, if processing of an earlier frame takes longer due to the size of the frame, subsequent frames may still be processed, because processing of the corresponding index in the linked list is allowed to proceed, unhampered by a delay imposed by the processing of the larger frame. Furthermore, processing of the linked list is allowed to proceed, unimpeded by bottlenecks that may be created due to memory latency, for example when a processing unit fetches a data frame from the working RAM for processing. Processing the linked list as a FIFO yields yet another advantage. The characteristics of the FIFO allow all stages of the pipeline to be decoupled from each other, so that a delay in processing of a later frame does not create a bottleneck that prevents preceding frames from moving forward.
In the preferred embodiment of the invention, a linked list is provided for each processing unit. All linked lists are stored in an index FIFO RAM unit 13.
Figure 3 provides a map of an exemplary index FIFO RAM. As previously indicated, the shared index FIFO linked list is RAM-based, meaning that all operations to the linked lists occur in RAM. Operations on the linked list include dequeuing and enqueuing. As shown in Figure 2, the serial arrangement of the various processing units creates a data flow in which a head entry from a linked list for a first processing unit 24 is dequeued and enqueued to the tail of a second processing unit 25. In conventional implementations of linked lists, dequeue and enqueue operations may be register-based or RAM based. The empty entry allocated at the tail of each linked list, previously described, allows the current invention to enqueue by writing an entry dequeued from the head of another list to the empty record, and reusing the index pointer as the new tail pointer. Enqueue and dequeue operations are performed in the index FIFO RAM by enqueue and dequeue units (not shown), respectively. Listed below are the steps involved in dequeuing from a first linked list to a second linked list.
• {HeaderBufIdxA, FifoANxtIdxFifo} = IdxFifo[FifoAHeadPtr]. Load the BufIdx and the next-entry pointer from the entry designated by the head pointer. When done with processing, continue with the next step for the dequeue operation.
• IdxFifo[FifoBTailPtr] <= {HeaderBufIdxA, FifoAHeadPtr}. Copying the BufIdx instead of relinking avoids a race condition between enqueue and dequeue operations to the same linked list.
• FifoBTailPtr <= FifoAHeadPtr. This allows the old IdxFifo entry used by FifoAHeadPtr to be reused as the new FifoBTailPtr.
• FifoAHeadPtr <= FifoANxtldxFifo.
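The four register-transfer steps above can be modeled in software. The sketch below is a simplified Python model of the single-write transfer, not the hardware implementation; the names `Queue` and `move_head`, and the list-of-pairs encoding of the RAM, are our assumptions:

```python
class Queue:
    """Head and tail pointers of one processing unit's list, indexing
    into the shared entry RAM. The tail always points at the reserved
    empty placeholder entry; head == tail means the queue is empty."""
    def __init__(self, head, tail):
        self.head = head
        self.tail = tail

def move_head(ram, src, dst):
    """Dequeue the head entry of src and enqueue it at the tail of dst.

    ram is a list of [buf_idx, next_idx] pairs shared by all queues.
    Only one RAM write occurs: the dequeued entry is copied into dst's
    placeholder slot, and the slot freed at src's head is reused as
    dst's new placeholder, so no new entry is ever allocated."""
    buf_idx, src_next = ram[src.head]      # read head entry of src
    ram[dst.tail] = [buf_idx, src.head]    # single write fills dst's placeholder
    dst.tail = src.head                    # freed slot becomes dst's new placeholder
    src.head = src_next                    # advance src's head pointer
    return buf_idx

# Demo: queue A holds header buffer indexes 5 and 6; queue B is empty.
# Entry 2 is A's placeholder; entry 3 is B's (head == tail == 3).
ram = [[5, 1], [6, 2], [None, None], [None, None]]
a, b = Queue(0, 2), Queue(3, 3)
moved = move_head(ram, a, b)   # transfers buffer index 5 from A to B
```

After the call, B's placeholder at entry 3 holds buffer index 5, entry 0 (A's old head) has become B's new placeholder, and A's head has advanced, all with a single write to the entry RAM.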
Thus, enqueue and dequeue operations are entirely RAM-based, requiring only one write and one cycle, unlike conventional implementations, which require at least a read-modify-write of the RAM contents.
As previously mentioned, the invention is embodied as an architecture and a method. While the method has been described incident to the foregoing description of the invented architecture, for clarity, the general steps of the invented method are provided herein below:
• Providing data frames to be processed, where a portion of the frame constitutes a frame header. The provided data frames are received at the ingress port of a network switch;
• Storing each of the frame headers to a RAM buffer;
• Creating a linked list of indexes to the frame headers, where an index includes a pointer to a buffer occupied by a specific frame, each index constitutes an entry in the linked list, and each entry further includes an index pointer to the next entry in the linked list;
• Pipelining the linked list to a processing unit within said system as an index FIFO, so that the processing unit reads the entries in sequence;
• As entries are read, processing the corresponding data frame; and
• Enqueueing and dequeuing the entries in an index FIFO RAM, so that enqueuing and dequeuing are performed in a single cycle with a single write operation.
Although the invention has been described herein with reference to certain preferred embodiments, one skilled in the art will readily appreciate that other applications may be substituted for those set forth herein without departing from the spirit and scope of the present invention. Accordingly, the invention should only be limited by the Claims included below.

Claims

CLAIMS
What is claimed is:
1. A method of optimizing data throughput in a data processing system, wherein data to be processed is written to a central buffer and an index FIFO constituting a linked list of indexes to said buffered data is passed to a processing unit within said system, said method comprising the steps of: providing a plurality of messages to be processed; writing said data messages to a central buffer; creating a linked list of indexes to said messages, an index comprising a pointer to a buffer address occupied by a specific message, each index comprising an entry in said linked list, each entry further comprising an index pointer to a next entry in said linked list; pipelining said linked list to a processing unit within said system as an index FIFO, so that said processor reads entries in sequence; as entries are read, processing a message indicated by said entry; and enqueuing and dequeuing said entries in an index FIFO RAM, so that enqueuing and dequeuing are performed in a single cycle with a single write operation.
2. The method of Claim 1, wherein each of said messages comprises a header and wherein said buffer comprises a header buffer, so that said headers are written to said header buffer.
3. The method of Claim 2, wherein said data processing system comprises a plurality of processing units, wherein an index FIFO, said index FIFO comprising said linked list, is processed in series by at least two of said processing units, so that messages corresponding to entries in said index FIFO's are processed accordingly, and wherein said linked list is shared between processing units.
4. The method of Claim 3, wherein said data processing unit comprises a network switch, and wherein said messages constitute data frames.
5. The method of Claim 3, wherein entries in said index FIFO are processed independently of said corresponding messages, so that subsequent entries in said index FIFO may be read while messages corresponding to previous entries are still being processed.
6. The method of Claim 3, wherein pipeline stages are decoupled by allowing entries in said FIFO to be processed independently of each other, so that processing of one entry is unaffected by processing of another.
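Outside the claim language, a minimal software model (our own names and structures, purely illustrative) of the series processing of Claims 3, 5 and 6: only small index entries move through the shared FIFO, while the message bodies stay put in the central buffer that every unit can reach.

```python
from collections import deque

def run_pipeline(central_buffer, indexes, stages):
    """Process a shared index FIFO in series through several units
    (Claim 3): each stage touches a message only through its index,
    so the message bodies never move between stages."""
    fifo = deque(indexes)        # shared index FIFO
    order = []
    while fifo:
        entry = fifo.popleft()   # dequeue the next index entry
        for stage in stages:     # units process the same entry in series
            stage(central_buffer, entry)
        order.append(entry)
    return order
```

Each stage here is a callable that reads and updates the buffered message in place, standing in for one processing unit.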
7. The method of Claim 3, wherein said message processing step comprises the steps of:
reading a message from a location in said header buffer specified by its corresponding entry;
copying said read message into a register; and
using said copy of said message to perform a specified operation.
8. The method of Claim 7, said message processing step further comprising one or both of the steps of:
writing a modified copy of said message to said header buffer location, wherein said read message is modified as a result of said specified operation; and
writing a result of said operation to a reserved location in a working RAM.
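As an illustration only (the names and data structures below are ours, not the claims'), the read, copy, operate and write-back sequence of Claims 7 and 8 can be sketched as:

```python
def process_entry(header_buffer, working_ram, entry, buffer_index, operation):
    """Sketch of Claims 7-8: read the header addressed by a FIFO entry,
    operate on a register copy, then optionally write back a modified
    header and store a result word in a reserved working-RAM slot."""
    register = list(header_buffer[buffer_index])   # copy into a register
    modified, result = operation(register)         # the specified operation
    if modified is not None:
        header_buffer[buffer_index] = modified     # write-back per Claim 8
    if result is not None:
        working_ram[entry] = result                # reserved result location
    return result
```

The operation returns `(modified_header, result)`, either of which may be `None`, matching the "one or both" write-back steps of Claim 8.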
9. The method of Claim 3, wherein said linked list further comprises a head pointer and a tail pointer, said head pointer designating a first entry in said linked list, said tail pointer designating a last entry in said linked list, said last entry comprising a reserved entry for enqueuing an additional entry.
10. The method of Claim 9, wherein said dequeuing step comprises updating said head pointer to point from a current first entry to a current second entry, so that said current second entry becomes a new first entry.
11. The method of Claim 10, wherein said enqueuing step comprises the steps of:
writing a previously dequeued entry to said reserved entry, wherein an index pointer associated with said previously dequeued entry is reused as a new tail pointer, so that allocation of a new index pointer is unnecessary; and
updating said new tail pointer.
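A minimal software model of the mechanism of Claims 1 and 9-11 (again illustrative, with names of our choosing): each RAM entry packs a message's buffer index together with the index pointer to the next entry, so an enqueue is a single RAM write plus a tail-register update, and a dequeue is only a head-register update.

```python
class IndexFifo:
    """Toy model of the RAM-based index FIFO linked list.

    head names the first entry; tail names the reserved last entry
    that the next enqueue will fill (Claim 9)."""

    def __init__(self, num_entries):
        self.ram = [(0, 0)] * num_entries  # (buffer index, next pointer)
        self.head = 0   # first entry in the linked list
        self.tail = 0   # reserved entry for the next enqueue

    def enqueue(self, buffer_index, recycled_entry):
        # Single write: fill the reserved tail entry with the message's
        # buffer index and a link to the recycled entry, which becomes
        # the new reserved tail (Claim 11's pointer reuse).
        self.ram[self.tail] = (buffer_index, recycled_entry)
        self.tail = recycled_entry

    def dequeue(self):
        # Claim 10: advance the head pointer. Return the message's
        # buffer index and the freed entry for reuse by a later enqueue.
        buffer_index, next_entry = self.ram[self.head]
        freed = self.head
        self.head = next_entry
        return buffer_index, freed
```

The freed entry returned by `dequeue` is handed back to a later `enqueue` as `recycled_entry`, so no new index pointer ever has to be allocated.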
12. An architecture for optimizing data throughput in a data processing system, wherein data to be processed is written to a central buffer, and an index FIFO constituting a linked list of indexes to said buffered data is passed to a processing unit within said system, said architecture comprising:
a plurality of processing units;
a central buffer, wherein data messages to be processed by said processing units are written;
one or more index FIFOs, each of said index FIFOs comprising a linked list of indexes to said messages, an index comprising a pointer to a buffer address occupied by a specific message, each index comprising an entry in said linked list, each entry further comprising an index pointer to a next entry in said linked list;
means for pipelining said linked list to said processing unit within said system as an index FIFO, so that said processing unit reads entries in sequence, wherein, as entries are read, a message indicated by said entry is processed;
an index FIFO RAM; and
means for enqueuing and dequeuing said entries in said index FIFO RAM, so that enqueuing and dequeuing are performed in a single cycle with a single write operation.
13. The architecture of Claim 12, wherein each of said messages comprises a header and wherein said buffer comprises a header buffer, so that said headers are written to said header buffer.
14. The architecture of Claim 13, wherein said data processing system comprises a plurality of processing units, wherein an index FIFO, said index FIFO comprising said linked list, is processed in series by at least two of said processing units, so that messages corresponding to entries in said index FIFO are processed accordingly, and wherein said linked list is shared between processing units.
15. The architecture of Claim 14, wherein said data processing system comprises a network switch, and wherein said messages constitute data frames.
16. The architecture of Claim 14, wherein entries in said index FIFO are processed independently of said corresponding messages, so that subsequent entries in said index FIFO may be read while messages corresponding to previous entries are still being processed.
17. The architecture of Claim 14, wherein pipeline stages are decoupled by allowing entries in said FIFO to be processed independently of each other, so that processing of one entry is unaffected by processing of another.
18. The architecture of Claim 14, wherein messages are processed by:
reading a message from a location in said header buffer specified by its corresponding entry;
copying said read message into a register; and
using said copy of said message to perform a specified operation.
19. The architecture of Claim 18, wherein said read message is modified as a result of said specified operation, so that said modified message is written to said header buffer location.
20. The architecture of Claim 18, further comprising a working RAM, wherein a result of said operation is written to a reserved location in said working RAM.
21. The architecture of Claim 14, wherein said linked list further comprises a head pointer and a tail pointer, said head pointer designating a first entry in said linked list, said tail pointer designating a last entry in said linked list, said last entry comprising a reserved entry for enqueuing an additional entry.
22. The architecture of Claim 21, wherein said dequeuing means updates said head pointer to point from a current first entry to a current second entry, so that said current second entry becomes a new first entry.
23. The architecture of Claim 21, wherein said enqueuing means writes a previously dequeued entry to said reserved entry, wherein an index pointer associated with said previously dequeued entry is reused as a new tail pointer, so that allocation of a new index pointer is unnecessary, and updates said new tail pointer.
PCT/US2000/018939 1999-07-13 2000-07-11 Method and architecture for optimizing data throughput in a multi-processor environment using a ram-based shared index fifo linked list WO2001004770A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU59297/00A AU5929700A (en) 1999-07-13 2000-07-11 Method and architecture for optimizing data throughput in a multi-processor environment using a ram-based shared index fifo linked list

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14344599P 1999-07-13 1999-07-13
US60/143,445 1999-07-13

Publications (2)

Publication Number Publication Date
WO2001004770A2 true WO2001004770A2 (en) 2001-01-18
WO2001004770A3 WO2001004770A3 (en) 2001-08-30

Family

ID=22504112

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/018939 WO2001004770A2 (en) 1999-07-13 2000-07-11 Method and architecture for optimizing data throughput in a multi-processor environment using a ram-based shared index fifo linked list

Country Status (2)

Country Link
AU (1) AU5929700A (en)
WO (1) WO2001004770A2 (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5339418A (en) * 1989-06-29 1994-08-16 Digital Equipment Corporation Message passing method
EP0551242A2 (en) * 1992-01-10 1993-07-14 Digital Equipment Corporation Multiprocessor buffer system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
IAN M. LESLIE, DEREK MCAULEY, RICHARD BLACK, TIMOTHY ROSCOE, PAUL BARHAM, DAVID EVERS, ROBIN FAIRBAIRNS, EOIN HYDEN: "The Design and Implementation of an Operating System to Support Distributed Multimedia Applications" IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, vol. 14, no. 7, September 1996 (1996-09), pages 1280-1296, XP000626277 *
PHILIP BUONADONNA <philipb@cs.berkeley.edu>, ANDREW GEWEKE <geweke@cs.berkeley.edu>, DAVID CULLER <culler@cs.berkeley.edu>: "An Implementation and Analysis of the Virtual Interface Architecture" INTERNET DOCUMENT, [Online] November 1998 (1998-11), XP002156218 Retrieved from the Internet: <URL:http://www.cs.berkeley.edu/~philipb/papers/SC98/sc98_html/index.htm> [retrieved on 2000-12-19] *
THORSTEN VON EICKEN, ANINDYA BASU, VINEET BUCH, WERNER VOGELS: "U-NET: A USER-LEVEL NETWORK INTERFACE FOR PARALLEL AND DISTRIBUTED COMPUTING" OPERATING SYSTEMS REVIEW (SIGOPS),US,ACM HEADQUARTER. NEW YORK, vol. 29, no. 5, 1 December 1995 (1995-12-01), pages 40-53, XP000584816 *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8856379B2 (en) 1997-10-14 2014-10-07 A-Tech Llc Intelligent network interface system and method for protocol processing
US9009223B2 (en) 1997-10-14 2015-04-14 Alacritech, Inc. Method and apparatus for processing received network packets on a network interface for a computer
US6427173B1 (en) 1997-10-14 2002-07-30 Alacritech, Inc. Intelligent network interfaced device and system for accelerated communication
US8447803B2 (en) 1997-10-14 2013-05-21 Alacritech, Inc. Method and apparatus for distributing network traffic processing on a multiprocessor computer
US6658480B2 (en) 1997-10-14 2003-12-02 Alacritech, Inc. Intelligent network interface system and method for accelerated protocol processing
US8131880B2 (en) 1997-10-14 2012-03-06 Alacritech, Inc. Intelligent network interface device and system for accelerated communication
US8631140B2 (en) 1997-10-14 2014-01-14 Alacritech, Inc. Intelligent network interface system and method for accelerated protocol processing
US7284070B2 (en) 1997-10-14 2007-10-16 Alacritech, Inc. TCP offload network interface device
US6757746B2 (en) 1997-10-14 2004-06-29 Alacritech, Inc. Obtaining a destination address so that a network interface device can write network data without headers directly into host memory
US7042898B2 (en) 1997-10-14 2006-05-09 Alacritech, Inc. Reducing delays associated with inserting a checksum into a network message
US6389479B1 (en) 1997-10-14 2002-05-14 Alacritech, Inc. Intelligent network interface device and system for accelerated communication
US6427171B1 (en) 1997-10-14 2002-07-30 Alacritech, Inc. Protocol processing stack for use with intelligent network interface device
US6965941B2 (en) 1997-10-14 2005-11-15 Alacritech, Inc. Transmit fast-path processing on TCP/IP offload network interface device
US8782199B2 (en) 1997-10-14 2014-07-15 A-Tech Llc Parsing a packet header
US7237036B2 (en) 1997-10-14 2007-06-26 Alacritech, Inc. Fast-path apparatus for receiving data corresponding a TCP connection
US6434620B1 (en) 1998-08-27 2002-08-13 Alacritech, Inc. TCP/IP offload network interface device
US6697868B2 (en) 2000-02-28 2004-02-24 Alacritech, Inc. Protocol processing stack for use with intelligent network interface device
US6807581B1 (en) 2000-09-29 2004-10-19 Alacritech, Inc. Intelligent network storage interface system
US8019901B2 (en) 2000-09-29 2011-09-13 Alacritech, Inc. Intelligent network storage interface system
US6938092B2 (en) 2001-03-07 2005-08-30 Alacritech, Inc. TCP offload device that load balances and fails-over between aggregated ports having different MAC addresses
US6687758B2 (en) 2001-03-07 2004-02-03 Alacritech, Inc. Port aggregation for network connections that are offloaded to network interface devices
US9055104B2 (en) 2002-04-22 2015-06-09 Alacritech, Inc. Freeing transmit memory on a network interface device prior to receiving an acknowledgment that transmit data has been received by a remote device
US6751665B2 (en) 2002-10-18 2004-06-15 Alacritech, Inc. Providing window updates from a computer to a network interface device
DE10360637A1 (en) * 2003-12-19 2005-07-21 Infineon Technologies Ag Program controlled microcontroller system uses buffer memory arrangement for data transmission and reception
DE10360637B4 (en) * 2003-12-19 2010-10-07 Infineon Technologies Ag Program controlled unit
US8893159B1 (en) 2008-04-01 2014-11-18 Alacritech, Inc. Accelerating data transfer in a virtual computer system with tightly coupled TCP connections
US9413788B1 (en) 2008-07-31 2016-08-09 Alacritech, Inc. TCP offload send optimization
US9667729B1 (en) 2008-07-31 2017-05-30 Alacritech, Inc. TCP offload send optimization
US9306793B1 (en) 2008-10-22 2016-04-05 Alacritech, Inc. TCP offload device that batches session layer headers to reduce interrupts as well as CPU copies
EP2282264A1 (en) * 2009-07-24 2011-02-09 ProximusDA GmbH Scheduling and communication in computing systems
WO2011009638A1 (en) * 2009-07-24 2011-01-27 Proximusda Gmbh Scheduling and communication in computing systems
US9009711B2 (en) 2009-07-24 2015-04-14 Enno Wein Grouping and parallel execution of tasks based on functional dependencies and immediate transmission of data results upon availability
CN109558107A (en) * 2018-12-04 2019-04-02 中国航空工业集团公司西安航空计算技术研究所 A kind of FC message sink management method of shared buffer

Also Published As

Publication number Publication date
AU5929700A (en) 2001-01-30
WO2001004770A3 (en) 2001-08-30

Similar Documents

Publication Publication Date Title
WO2001004770A2 (en) Method and architecture for optimizing data throughput in a multi-processor environment using a ram-based shared index fifo linked list
KR100690418B1 (en) Efficient processing of multicast transmissions
US8761204B2 (en) Packet assembly module for multi-core, multi-thread network processors
US8499137B2 (en) Memory manager for a network communications processor architecture
US5561807A (en) Method and device of multicasting data in a communications system
US7304999B2 (en) Methods and apparatus for processing packets including distributing packets across multiple packet processing engines and gathering the processed packets from the processing engines
US7546399B2 (en) Store and forward device utilizing cache to store status information for active queues
US20020156908A1 (en) Data structures for efficient processing of IP fragmentation and reassembly
KR20160117108A (en) Method and apparatus for using multiple linked memory lists
US10146468B2 (en) Addressless merge command with data item identifier
US7293158B2 (en) Systems and methods for implementing counters in a network processor with cost effective memory
US7404015B2 (en) Methods and apparatus for processing packets including accessing one or more resources shared among processing engines
US9274586B2 (en) Intelligent memory interface
US7039054B2 (en) Method and apparatus for header splitting/splicing and automating recovery of transmit resources on a per-transmit granularity
US7042889B2 (en) Network switch with parallel working of look-up engine and network processor
US7477641B2 (en) Providing access to data shared by packet processing threads
US9846662B2 (en) Chained CPP command
US20060187963A1 (en) Method for sharing single data buffer by several packets
US9804959B2 (en) In-flight packet processing
US7756131B2 (en) Packet forwarding system capable of transferring packets fast through interfaces by reading out information beforehand for packet forwarding and method thereof
CA2494579C (en) Packet processing engine
US20040221066A1 (en) Method and apparatus for implementing packet command instructions for network processing
US20060140203A1 (en) System and method for packet queuing
US20060268868A1 (en) Method and system for processing multicast packets
US7468985B2 (en) System independent and scalable packet buffer management architecture for network processors

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP