CN115842872A - Method for data processing for frame reception of interconnect protocol and storage device - Google Patents


Info

Publication number: CN115842872A
Application number: CN202111095857.2A
Authority: CN (China)
Prior art keywords: frame, symbols, data, layer, protocol engine
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 林富雄
Current assignee: SK Hynix Inc
Original assignee: SK Hynix Inc
Application filed by SK Hynix Inc
Priority to CN202111095857.2A
Publication of CN115842872A

Landscapes

  • Communication Control (AREA)

Abstract

The present disclosure relates to a method of data processing for frame reception of an interconnect protocol, and to a storage device, suitable for a first device capable of linking to a second device according to the interconnect protocol. In the method, while the first device receives frames from the second device, the symbols of a second frame are prefetched while the data carried by a first frame is being transferred from the data link layer to the network layer; after the data carried by the first frame has been transferred to the network layer and the symbols of the second frame have been prefetched, the data carried by the second frame is transferred to the network layer. The method thereby improves the frame-reception efficiency of the data link layer when multiple back-to-back frames are received.

Description

Method for data processing for frame reception of interconnect protocol and storage device
Technical Field
The present invention relates to an electronic device, and more particularly, to a method of data processing for frame reception of an interconnect protocol and a storage device.
Background
Nowadays, the amount of data generated and processed by mobile devices (computing devices such as smartphones, tablet computers, multimedia devices, and wearable devices) keeps increasing, so the chip-to-chip and device-to-device interconnection interface technologies inside and between mobile devices need further development to achieve higher transmission speed, low-power operation, scalability, multitasking support, ease of adoption, and the like.
To this end, the Mobile Industry Processor Interface (MIPI) Alliance has developed interconnection interface technologies that meet these objectives, such as the MIPI M-PHY specification for the physical layer and the MIPI UniPro specification for the Unified Protocol (UniPro). In turn, the Joint Electron Device Engineering Council (JEDEC) builds on the MIPI M-PHY and MIPI UniPro specifications in its next-generation high-performance non-volatile memory standard, called Universal Flash Storage (UFS), which achieves gigabit-per-second transmission and low-power operation and offers the features and scalability required by advanced mobile systems, facilitating rapid adoption in the industry.
When products developed according to these interconnection technologies are chips, electronic modules, or electronic devices, engineers need to ensure that the products' functions and operations meet the specifications. For example, a system implemented according to the UFS standard may include a computing device and a storage device containing non-volatile memory, where the computing device and the storage device take the roles of local host and remote device, respectively. According to the UniPro specification, a bidirectional link is established between the host and the device, and the link can be configured with multiple lanes (up to four) in each transmission direction. Accordingly, the host and the device each include an interconnect-protocol processing circuit conforming to the UniPro specification, which must be capable of handling multiple lanes.
The UFS standard uses the UniPro specification to define multiple protocol layers in the link layer, including a physical adapter layer, a data link layer, a network layer, and a transport layer. The data link layer sits between the network layer and the physical adapter layer and is responsible for data flow control and error handling. Because the UniPro specification mainly defines the functions of each protocol layer and a conceptual service access point model specifying the interfaces of the services each layer provides, developers must devise their own technical solutions, implemented in hardware, firmware, or software, while remaining compliant with the UniPro specification. In a multi-lane application scenario, the number of symbols in a frame that the data link layer must process per clock cycle may reach 4, 8, or more. How to make the data link layer receive the symbols of multiple frames efficiently and transfer the data carried by those frames to the network layer effectively therefore poses a major challenge to the overall data-transmission throughput.
Disclosure of Invention
Embodiments provide a technique of data processing for frame reception of an interconnect protocol, suitable for a first device capable of linking to a second device according to the interconnect protocol. While the first device receives frames from the second device, the technique prefetches the symbols of a second frame during the transfer of the data carried by a first frame from the data link layer to the network layer; after the data carried by the first frame has been transferred to the network layer and the symbols of the second frame have been prefetched, the data carried by the second frame is transferred to the network layer. The technique thereby improves the frame-reception performance of the data link layer when multiple back-to-back frames are received.
Various embodiments are presented below in terms of techniques such as methods and storage devices for data processing for frame reception for interconnect protocols.
Embodiments provide a method for data processing for frame reception of an interconnect protocol, adapted in a first device capable of linking a second device according to the interconnect protocol, the method comprising: in a process in which the first device receives a frame from the second device: a) Extracting symbols of a first frame of a data link layer by a hardware protocol engine of the first device for realizing the interconnection protocol, and transmitting data carried by the first frame to a network layer; b) Prefetching symbols of a second frame of the data link layer by the hardware protocol engine in the process of transmitting the data carried by the first frame to the network layer; and c) after the data carried by the first frame is transmitted to the network layer and the symbols of the second frame are prefetched, transmitting the data carried by the second frame to the network layer.
Embodiments provide a storage device capable of linking hosts according to an interconnect protocol, the storage device comprising: interface circuit, equipment controller and hardware protocol engine. The interface circuit is to implement a physical layer of the interconnect protocol to link the hosts. The device controller for coupling to the interface circuit and the memory module, the device controller comprising: a hardware protocol engine to implement the interconnect protocol, wherein the hardware protocol engine performs a plurality of operations during processing of the storage device to receive frames from the host. The plurality of operations comprises: a) The hardware protocol engine extracts symbols of a first frame of a data link layer and transmits data carried by the first frame to a network layer; b) In the process of transmitting the data carried by the first frame to the network layer, the hardware protocol engine prefetches symbols of a second frame of the data link layer; and c) after the data carried by the first frame is transmitted to the network layer and the symbols of the second frame are prefetched, the hardware protocol engine transmits the data carried by the second frame to the network layer.
In some embodiments of the above method or the storage device, when the hardware protocol engine receives a plurality of back-to-back frames, the hardware protocol engine performs the steps a) to c) or the operations a) to c) on the plurality of back-to-back frames until the plurality of back-to-back frames are transmitted to the network layer, thereby improving the efficiency of frame reception of the data link layer.
In some embodiments of the above method or storage device, in the step a) or the operation a), the hardware protocol engine extracts the symbols of the first frame from a receive memory buffer and repeatedly registers the symbols of the first frame in a first register and a second register.
In some embodiments of the above method or storage device, in the step b) or the operation b), the hardware protocol engine prefetches the symbols of the second frame from the receive memory buffer and repeatedly registers the symbols of the second frame in the first and second registers.
In some embodiments of the above method or the storage device, wherein in the step b) or the operation b), the hardware protocol engine decomposes symbols of the first frame in the first and second registers and aligns a frame end mark in the symbols of the first frame with a previous frame end mark, thereby transmitting data carried by the first frame to the network layer.
In some embodiments of the above method or the above memory device, in the step c) or the operation c), the hardware protocol engine decomposes the symbols of the second frame in the first and second registers and aligns the start of frame mark in the symbols of the second frame with the next start of frame mark, so as to transfer the data carried by the second frame to the network layer.
In some embodiments of the above method or storage device, the interconnect protocol is the Universal Flash Storage (UFS) standard.
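As a behavioral illustration of steps a) to c) above, the following Python sketch models a DL receiver that prefetches the next frame while handing the current frame's payload to the network layer. All class and method names here are invented for illustration; they are not part of the UniPro specification.

```python
class DataLinkRx:
    """Toy model of the DL receiver performing steps a) to c)."""

    def __init__(self, rx_frames):
        self.rx_frames = list(rx_frames)  # frames as received from the PA layer
        self.network_layer = []           # payloads handed to the upper layer

    def _fetch(self):
        # Extract the symbols of the next frame from the receive buffer.
        return self.rx_frames.pop(0) if self.rx_frames else None

    def receive_all(self):
        current = self._fetch()                        # step a): extract the first frame
        while current is not None:
            prefetched = self._fetch()                 # step b): prefetch the next frame,
            # conceptually overlapped with delivering `current` below
            self.network_layer.append(current[1:-2])   # strip SOF header and EOF+CRC trailer
            current = prefetched                       # step c): prefetched frame is ready
        return self.network_layer
```

In hardware the fetch of frame N+1 and the delivery of frame N run in the same clock cycles; the sequential model above only shows the ordering of the three steps.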
Drawings
FIG. 1 is a schematic block diagram of a storage system according to one embodiment of the present invention;
FIG. 2 is a flow diagram of one embodiment of a method for data processing for frame reception of an interconnect protocol;
FIG. 3 is a diagram illustrating a layered architecture of the storage system of FIG. 1 according to the UFS standard;
FIG. 4 is a diagram of a data frame format for a data link layer according to the UniPro standard;
FIG. 5 is a schematic diagram of one embodiment of a circuit architecture implementing the above-described method for frame-received data processing for interconnect protocols;
FIG. 6 is a schematic diagram of another embodiment of the DL RX header FIFO buffer of FIG. 5;
FIG. 7 is a diagram illustrating one embodiment of data processing for frame reception implemented in accordance with the method of FIG. 2;
FIG. 8 is a diagram illustrating one embodiment of a state machine implemented in accordance with the method of FIG. 2;
FIG. 9 is a schematic diagram of one embodiment of the timing of the state machine of FIG. 8; and
FIG. 10 is a diagram illustrating an embodiment of data processing for frame reception implemented according to the method of FIG. 2.
Reference numerals
1. Storage system
10. Host
11. Host interface
12. Host controller
13. Hardware protocol engine
14. Processing unit
16. Application processor
20. Storage device
21. Device interface
22. Device controller
23. Hardware protocol engine
24. Processing unit
26. Memory module
110 MIPI physical (M-PHY) layer
111. Transmitter
112. Receiver
130 MIPI unified protocol (UniPro) layer
131. Physical adapter layer
132. Data link layer
133. Network layer
134. Transport layer
135. Device management entity (DME)
210 MIPI physical (M-PHY) layer
211. Transmitter
212. Receiver
230 MIPI unified protocol (UniPro) layer
231. Physical adapter layer
232. Data link layer
233. Network layer
234. Transport layer
235. Device management entity (DME)
310 DL RX data buffer
320 DL RX header FIFO buffer
S10 to S30 Steps
CD1 First clock domain
CD2 Second clock domain
CLK Clock line
Din, Dout Data lines
RST Reset line
SL1 Data lane
SL2 Data lane
Detailed Description
For a fuller understanding of the objects, features and advantages of the present invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings.
The following embodiments provide various implementations of a technique of data processing for frame reception of an interconnect protocol, suitable for a first device capable of linking to a second device according to the interconnect protocol. While the first device receives frames from the second device, the technique prefetches the symbols of a second frame during the transfer of the data carried by a first frame from the data link layer to the network layer; after the data carried by the first frame has been transferred to the network layer and the symbols of the second frame have been prefetched, the data carried by the second frame is transferred to the network layer. The technique thereby improves the frame-reception performance of the data link layer when multiple back-to-back frames are received.
To facilitate understanding, an embodiment of a circuit architecture according to the described technique is provided first; it is flexible and can be configured efficiently to meet the requirements of different products and the designs of various manufacturers, facilitating product development. As shown in FIG. 1, when the circuit architecture is applied to the storage system 1, the controller of the host 10 (e.g., the host controller 12) and the controller of the storage device 20 (e.g., the device controller 22) may each be implemented as a circuit architecture including a hardware protocol engine and a processing unit, where the processing unit of the controller is optional. A method of data processing for frame reception of the interconnect protocol in accordance with the technique is disclosed in FIG. 2.
Please refer to FIG. 1, which is a schematic block diagram of a storage system according to an embodiment of the present invention. As shown in FIG. 1, the storage system 1 includes a host 10 and a storage device 20. The host 10 and the storage device 20 communicate via an interconnect protocol, allowing the host 10 to access data in the storage device 20. The interconnect protocol is, for example, the Universal Flash Storage (UFS) standard. The host 10 is a computing device such as a smartphone, a tablet computer, or a multimedia device. The storage device 20 is, for example, a storage device internal or external to the computing device, such as one based on non-volatile memory. The storage device 20 may write data under the control of the host 10 or provide the written data to the host 10. The storage device 20 may be implemented as a solid-state drive (SSD), a multimedia card (MMC), an embedded MMC (eMMC), a Secure Digital (SD) card, or a Universal Flash Storage (UFS) device, although implementations of the present disclosure are not limited to these examples.
The host 10 includes a host interface 11, a host controller 12, and an application processor 16.
The host interface 11 is used to implement the physical layer of the interconnect protocol to link the storage devices 20. For example, the host interface 11 is used to implement the physical (M-PHY) layer of the UFS standard.
The host controller 12 is coupled between the host interface 11 and the application processor 16. When the application processor 16 needs to access the storage device 20, it sends a command representing a corresponding access action to the host controller 12, and communicates with the storage device 20 through the interconnection protocol, so as to access the storage device 20.
The host controller 12 includes a hardware protocol engine 13 and a processing unit 14. Wherein the processing unit 14 is optional.
The hardware protocol engine 13 is used to implement the protocol layer of the interconnect protocol. Taking the UFS standard as the interconnect protocol, the protocol layer is the Unified Protocol (UniPro) layer. The hardware protocol engine 13 communicates with the host interface 11 and the processing unit 14 and converts information according to the specification of the protocol layer.
The processing unit 14 is coupled to the hardware protocol engine 13 and communicates with the application processor 16. The processing unit 14 may execute one or more firmware programs. For example, access commands issued by an operating system, driver, or application executed by the application processor 16 are converted by firmware executed by the processing unit 14 into instructions conforming to the protocol layer of the interconnect protocol, and then sent to the hardware protocol engine 13 for processing in accordance with the specification of the protocol layer. The firmware may be stored, for example, in an internal memory of the processing unit 14 or of the host controller 12, where the internal memory may include volatile and non-volatile memory.
The storage device 20 includes a device interface 21, a device controller 22, and a storage module 26.
The device interface 21 is used to implement the physical layer of the interconnect protocol to link the host 10. For example, the device interface 21 is used to implement the physical (M-PHY) layer of the UFS standard.
The device controller 22 is coupled between the device interface 21 and the memory module 26. The device controller 22 may control a write operation, a read operation, or an erase operation of the memory module 26. The device controller 22 may exchange data with the memory module 26 via an address bus or a data bus. The memory module 26 is, for example, a memory chip containing one or more non-volatile memories.
The device controller 22 includes a hardware protocol engine 23 and a processing unit 24. Wherein the processing unit 24 is optional.
The hardware protocol engine 23 is used to implement the protocol layer of the interconnect protocol. Taking the UFS standard as the interconnect protocol, the protocol layer is the UniPro layer. The hardware protocol engine 23 communicates with the device interface 21 and the processing unit 24 and converts information according to the specification of the protocol layer.
The processing unit 24 is coupled to the hardware protocol engine 23 and communicates with the host 10 through the device interface 21. The processing unit 24 may execute one or more firmware programs. For example, the processing unit 24 executes firmware to control or direct write, read, or erase operations of the memory module 26, to process messages from the hardware protocol engine 23, or to send messages to the hardware protocol engine 23. The firmware may be stored, for example, in an internal memory of the processing unit 24, in an internal memory of the device controller 22, or in a particular storage area of the memory module 26, where the internal memory may include volatile and non-volatile memory.
As shown in FIG. 1, the host interface 11 can be coupled to the device interface 21 through data lines Din and Dout for transmitting/receiving data, a reset line RST for transmitting a hardware reset signal, and a clock line CLK for transmitting a clock signal. The data lines Din and Dout may be implemented as a plurality of pairs, where a pair of data lines Din or Dout may be referred to as a lane. The host interface 11 may communicate with the device interface 21 using at least one interface protocol, such as Mobile Industry Processor Interface (MIPI), Universal Flash Storage (UFS), Small Computer System Interface (SCSI), or Serial Attached SCSI (SAS), although implementations of the present disclosure are not limited to these examples. Under the UFS standard, multiple lanes can be configured between the host 10 and the storage device 20 to improve transmission efficiency; currently at most two lanes are supported in each direction, from the host 10 to the storage device 20 or from the storage device 20 to the host 10, and each lane can be selectively enabled or disabled.
The following illustrates a method of implementing data processing for frame reception of an interconnect protocol based on a circuit architecture in which a controller (e.g., host controller 12 or device controller 22) shown in fig. 1 may be implemented as including a hardware protocol engine and a processing unit, respectively.
Please refer to FIG. 2, which is a flowchart of an embodiment of a method of data processing for frame reception of an interconnect protocol. The method may be used in a first device (e.g., storage device 20) capable of linking to a second device (e.g., host 10) according to the interconnect protocol. For convenience, the following description takes the storage device 20 as the first device and the host 10 as the second device. As shown in FIG. 2, the method includes steps S10 to S30, which are performed by a hardware protocol engine (e.g., hardware protocol engine 23) of the first device that implements a protocol layer of the interconnect protocol, during the process in which the first device receives frames from the second device.
In step S10, a hardware protocol engine of the first device for implementing the interconnection protocol extracts a symbol of a first frame of a data link layer, and transmits data carried by the first frame to a network layer.
In step S20, the hardware protocol engine prefetches symbols of a second frame of the data link layer during the process of transmitting the data carried by the first frame to the network layer.
In step S30, after the data carried by the first frame is transmitted to the network layer and the symbol of the second frame is prefetched, the data carried by the second frame is transmitted to the network layer by the hardware protocol engine.
In some embodiments, when the hardware protocol engine receives a plurality of back-to-back frames, the hardware protocol engine performs the steps S10 to S30 on the plurality of back-to-back frames until the plurality of back-to-back frames are transmitted to the network layer, thereby improving the performance of frame reception of the data link layer.
In some embodiments, in the step S10, the hardware protocol engine extracts the symbols of the first frame from a receive memory buffer and repeatedly registers the symbols of the first frame in a first register and a second register.
In some embodiments, in the step S20, the hardware protocol engine prefetches the symbols of the second frame from the receive memory buffer and repeatedly registers the symbols of the second frame in the first and second registers.
In some embodiments, in the step S20, the hardware protocol engine decomposes the symbols of the first frame in the first and second registers and aligns the end-of-frame marker in the symbols of the first frame with the previous end-of-frame marker, so as to transfer the data carried by the first frame to the network layer.
In some embodiments, in the step S30, the hardware protocol engine decomposes the symbols of the second frame in the first and second registers and aligns the start of frame mark in the symbols of the second frame with the next start of frame mark, so as to transmit the data carried by the second frame to the network layer.
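The frame decomposition in steps S10 to S30 can be sketched behaviorally as follows. The register width, marker names, and parsing logic are illustrative assumptions; in this simplified model, the partial frame carried across clock cycles plays the role of the second register, so a frame boundary falling at a cycle edge is still handled.

```python
def decompose(stream, reg_width=4):
    """Split a registered symbol stream into per-frame payloads.

    Symbols arrive `reg_width` per clock cycle (one register's worth);
    the partial frame held in `current` across cycles stands in for the
    second register, so an EOF landing in one cycle and the next frame's
    SOF in the same or the following cycle are both handled."""
    frames, current = [], None
    for i in range(0, len(stream), reg_width):
        for sym in stream[i:i + reg_width]:   # walk the newly latched register
            if sym == "SOF":
                current = []                  # start-of-frame marker seen
            elif sym == "EOF":
                frames.append(current)        # end of frame: hand payload upward
                current = None
            elif current is not None and sym != "CRC":
                current.append(sym)           # payload symbol (CRC trailer dropped)
    return frames
```

With an 8-symbol register (two lanes), two minimal back-to-back frames can even fall entirely within one cycle and are still decomposed correctly.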
Although the first device is the storage device 20 and the second device is the host 10 in the above embodiment of the method in fig. 2, the method is also applicable to the case where the first device is the host 10 and the second device is the storage device 20.
The following description takes the Universal Flash Storage (UFS) standard as the interconnect protocol. The UFS standard includes a UFS Command Set layer (UCS), a UFS Transport Protocol layer (UTP), and a UFS Interconnect layer (UIC). The UIC further comprises a link layer and a physical layer; the link layer of the UIC is defined according to the UniPro specification, and the physical layer of the UIC is defined according to the M-PHY specification.
Please refer to fig. 3, which is a schematic diagram of a layered architecture of the storage system of fig. 1 according to the UFS standard. Since the UFS standard is based on a MIPI unified protocol (UniPro) layer and a MIPI physical (M-PHY) layer, the host interface 11 and the hardware protocol engine 13 of the host 10 shown in fig. 1 are respectively used to implement the M-PHY layer 110 and the UniPro layer 130 in fig. 3; the device interface 21 and the hardware protocol engine 23 of the storage device 20 shown in fig. 1 are respectively used to implement the M-PHY layer 210 and the UniPro layer 230 in fig. 3.
As shown in fig. 3, the UniPro layer 130 (or 230) may include a physical adapter layer (PA) 131 (or 231), a data link layer (DL) 132 (or 232), a network layer (network layer) 133 (or 233), and a transport layer (transport layer) 134 (or 234). The various layers in the UniPro layer 230 of the storage device 20 may also operate and be implemented similarly.
The physical adapter layer (131 or 231) is used to couple the M-PHY layer (110 or 210) to the data link layer (132 or 232). The physical adapter layer (131 or 231) may perform bandwidth control, power management, etc. between the M-PHY layer (110 or 210) and the data link layer (132 or 232). When implemented, the M-PHY layer 110 of the host 10 includes a transmitter 111 and a receiver 112, and the M-PHY layer 210 of the storage device 20 includes a transmitter 211 and a receiver 212, thereby enabling the establishment of data lanes SL1 and SL2 for full duplex communication. The UniPro specification supports multiple data lanes on each link in each transmission direction (e.g., forward or reverse).
The data link layer (132 or 232) may perform flow control for data transfer between the host 10 and the storage device 20. That is, the data link layer (132 or 232) may monitor data transmission or control the data transmission rate. In addition, the data link layer (132 or 232) may perform error control based on a cyclic redundancy check (CRC). The data link layer (132 or 232) may generate frames using packets received from the network layer (133 or 233), or generate packets using frames received from the physical adapter layer (131 or 231).
The network layer (133 or 233) performs a routing function, selecting a transmission path for packets received from the transport layer (134 or 234).
The transport layer (134 or 234) may configure a data segment (segment) suitable for a protocol using a command received from the UFS application layer and send the data segment to the network layer (133 or 233), or may extract a command from a packet received by the network layer (133 or 233) and send the command to the UFS application layer. The transport layer (134 or 234) may use a sequence-based error control scheme to ensure the effectiveness of data transfer.
Furthermore, a Device Management Entity (DME) (135 or 235) is defined in the UniPro layer (130 or 230), which can communicate with the M-PHY layer (110 or 210) and with each layer in the UniPro layer (130 or 230), such as the physical adapter layer (131 or 231), the data link layer (132 or 232), the network layer (133 or 233), and the transport layer (134 or 234), as well as the UFS application layer, so as to implement control and configuration functions spanning the whole unified protocol (UniPro), such as power-on, power-off, reset, and power-mode change.
Please refer to FIG. 4, which is a diagram illustrating the format of a data frame of the data link layer according to the UniPro standard. As shown in FIG. 4, when the payload of a data frame (which may be called the L2 payload) carries 0 bytes of data, the data frame still contains at least 4 protocol data units (e.g., four 16-bit units), one of which contains a start-of-frame flag (SOF). The protocol data unit containing the start-of-frame flag (SOF) may also contain a Traffic Class (TC) flag, e.g., TC0 or TC1, to represent a priority level. In addition, the ESC_DL flag indicates that the frame is a data link layer frame, EOF_EVEN (or EOF_ODD) represents an end-of-frame flag (EOF), and CRC-16 represents a cyclic redundancy check code. A frame such as that of FIG. 4 may thus be considered to include at least a plurality of symbols, also known as protocol data units (PDUs). In the following figures and description, a frame comprising a plurality of symbols is represented, for example, by SOF, Dx0, …, Dxy, EOF, CRC, where x is the frame number and y indexes the y-th symbol of frame x. For example, D00 represents the 1st symbol of the 1st frame, D01 represents the 2nd symbol of the 1st frame, and so on. Of course, there may be more than one L2 payload unit, so a frame may contain 4 or more symbols.
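A frame in the shape of FIG. 4 can be assembled as below. The CRC-16 parameters used (CCITT/XMODEM polynomial 0x1021, initial value 0) and the EVEN/ODD choice for the EOF symbol are assumptions for illustration; the exact generator, initial value, and EOF semantics are defined by the UniPro specification.

```python
def crc16(data: bytes, poly: int = 0x1021, init: int = 0x0000) -> int:
    """Bitwise CRC-16. CCITT/XMODEM parameters, assumed for illustration."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def build_frame(x: int, n_payload: int, tc: int = 0):
    """Assemble a symbol list shaped like FIG. 4:
    SOF (+TC flag) header, payload symbols Dx0..Dxy, EOF, CRC-16 trailer."""
    payload = [f"D{x}{y}" for y in range(n_payload)]
    eof = "EOF_EVEN" if n_payload % 2 == 0 else "EOF_ODD"  # EVEN/ODD choice simplified
    crc = crc16(" ".join(payload).encode())                # CRC over the payload symbols
    return [f"SOF_TC{tc}", *payload, eof, f"CRC_{crc:04X}"]
```

For example, `build_frame(0, 2)` yields a 5-symbol frame whose payload is D00 and D01.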
When multiple lanes are active, the framing transmission method transmits the symbols of one frame synchronously across the lanes (each symbol represents 16 bits of valid data). For example, in the UFS standard, per the MIPI M-PHY v4.x specification, the data width from the M-PHY to the PA layer is 32 bits for 1 lane and 64 bits for 2 lanes.
To increase data throughput, in some implementations the inventors propose that the M-PHY be implemented with 64 bits for 1 lane and 128 bits for 2 lanes, which goes beyond the current limits of the M-PHY specification in the UFS standard. In other words, the data width from the PA layer to the DL layer is 4 symbols for 1 lane and 8 symbols for 2 lanes. The data width per clock cycle is therefore at most 8 symbols, and in any one clock cycle of the receiver (RX) of the DL layer, a DL data frame may be back-to-back with the next data frame (SOF + TC0 Data #0 + EOF + CRC + SOF + TC0 Data #1 + EOF + CRC).
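The width arithmetic above can be checked with a short calculation; note that the 64-bit-per-lane figure is the proposed extension, not the current M-PHY limit.

```python
SYMBOL_BITS = 16  # one UniPro symbol carries 16 bits of valid data

def symbols_per_cycle(lanes: int, bits_per_lane: int = 64) -> int:
    """PA-to-DL symbols delivered per clock cycle at the proposed widths."""
    return lanes * bits_per_lane // SYMBOL_BITS

# A minimal DL frame occupies 4 symbols (SOF + one data symbol + EOF + CRC),
# so at 8 symbols per cycle the tail of one frame and the head of the next
# can arrive back-to-back within a single clock cycle.
MIN_FRAME_SYMBOLS = 4
```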
In practice, the DL layer receives the symbols of a frame from the PA layer into a buffer (e.g., implemented with volatile memory such as SRAM), which requires data-write operations to the buffer. On the other hand, the DL layer processes these frames, e.g., removing the header (the SOF symbol) and the trailer (the EOF_EVEN or EOF_ODD and CRC symbols) after CRC validation, and passes the user data to the upper network layer, which requires data-read operations from the buffer. Because the processing speeds of these two types of operations are unbalanced (e.g., writes are faster while reads or other internal operations are slower), if DL-layer frame reception is not implemented according to the method of FIG. 2, there is a time gap between the delivery of each frame to the upper layer and the start of the next frame; this gap is referred to as idle time. This limits the frame-reception performance of the DL layer. Notably, even when the received frames are back-to-back, if the DL layer is not implemented according to the method of FIG. 2, idle time still appears between the frames delivered to the upper layer.
In contrast, according to steps S10, S20, and S30 of fig. 2, the symbols of the second frame are prefetched while the data carried by the first frame is being transmitted from the data link layer to the network layer; and after the data carried by the first frame has been transmitted to the network layer and the symbols of the second frame have been prefetched, the data carried by the second frame is transmitted to the network layer. Thus, when multiple back-to-back frames are received, the idle time between frames delivered to the upper layer can be reduced or avoided, improving the frame-reception performance of the DL layer. When implementing a product, it is only necessary for the transmitting end to send frames back-to-back and for the receiving end to adopt the method of fig. 2 in order to obtain this improvement in frame-reception performance.
Please refer to fig. 5, which is a schematic diagram of an embodiment of a circuit architecture for implementing the method of fig. 2. As shown in fig. 5, the data link layer (132 or 232) may be implemented in the hardware protocol engine of the host and of the storage device according to the method of fig. 2.
In one embodiment, a data buffer 310 and a header first-in first-out (FIFO) buffer 320 of the data link layer receiver (DL RX) are implemented in the data link layer (132 or 232). Data frames from the physical adapter layer (PA layer) are stored in the data buffer 310. The header FIFO buffer 320 stores information about each data frame, such as the position of the header (e.g., SOF), an offset, a sequence number, and a byte count. When the upper layer (e.g., the network layer) is ready, the DL RX retrieves the received data from the data buffer 310 according to the information in the header FIFO buffer 320. In fig. 5, the data buffer 310 and the header FIFO buffer 320 transfer data between two asynchronous clock domains, such as the two different clock domains of the physical adapter layer and the network layer. Together they form an asynchronous FIFO architecture, which uses the usual asynchronous-FIFO control flags (or signals) fifo_empty, fifo_full, read_en, and write_en, and adds two prefetch read control flags, read_en_prefetch and fifo_empty_prefetch, according to the prefetching scheme in the method of fig. 2. For frames of different traffic classes (TC), for example TC0 and TC1, corresponding circuits may be implemented according to the asynchronous FIFO architecture of fig. 5.
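The pointer bookkeeping behind the added prefetch flags can be sketched as follows. This is a minimal single-threaded behavioral model, not the patent's hardware: the class name `HeaderFifo`, the depth, and the entry contents are illustrative assumptions, and real asynchronous-FIFO clock-domain synchronization is omitted.

```python
# Behavioral sketch: alongside the usual fifo_empty/fifo_full flags driven by
# wr_ptr/rd_ptr, a second read pointer rd_ptr_prefetch drives
# fifo_empty_prefetch, so the next frame's entry can be prefetched before the
# normal read pointer has retired the previous frame's entry.

class HeaderFifo:
    def __init__(self, depth=8):
        self.depth = depth
        self.entries = [None] * depth
        self.wr_ptr = 0
        self.rd_ptr = 0           # advanced when a frame is fully sent upward
        self.rd_ptr_prefetch = 0  # advanced as soon as a frame is prefetched

    @property
    def fifo_empty(self):
        return self.wr_ptr == self.rd_ptr

    @property
    def fifo_empty_prefetch(self):
        # nothing left to prefetch once rd_ptr_prefetch meets wr_ptr
        return self.wr_ptr == self.rd_ptr_prefetch

    @property
    def fifo_full(self):
        return (self.wr_ptr + 1) % self.depth == self.rd_ptr

    def write(self, header_info):
        assert not self.fifo_full
        self.entries[self.wr_ptr] = header_info
        self.wr_ptr = (self.wr_ptr + 1) % self.depth

    def prefetch_read(self):
        # read_en_prefetch: take the next frame's header early
        assert not self.fifo_empty_prefetch
        info = self.entries[self.rd_ptr_prefetch]
        self.rd_ptr_prefetch = (self.rd_ptr_prefetch + 1) % self.depth
        return info

    def read_done(self):
        # read_en: retire the entry after the frame reached the upper layer
        assert not self.fifo_empty
        self.rd_ptr = (self.rd_ptr + 1) % self.depth

fifo = HeaderFifo()
fifo.write({"sof": 0, "seq": 0})
fifo.write({"sof": 8, "seq": 1})
assert fifo.prefetch_read()["seq"] == 0
assert fifo.prefetch_read()["seq"] == 1  # frame #1 prefetched early
assert fifo.fifo_empty_prefetch          # nothing more to prefetch
assert not fifo.fifo_empty               # frame #0 not yet retired
fifo.read_done()
fifo.read_done()
assert fifo.fifo_empty
```

The design point this illustrates: because `rd_ptr_prefetch` runs ahead of `rd_ptr`, `fifo_empty_prefetch` can deassert ("next frame available") while the previous frame is still being delivered, which is exactly the back-to-back window the method exploits.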
FIG. 6 is a diagram illustrating an embodiment of the header FIFO buffer 320 of fig. 5. For example, fig. 6 illustrates the header FIFO buffer 320 of fig. 5 implemented for frames of traffic class TC0, so some of the flags of fig. 5 carry a "tc0" prefix in fig. 6 for distinction. To reduce idle time and increase throughput, the header FIFO buffer 320 in fig. 6 uses a pair of dedicated prefetch read control flags, tc0_fifo_read_en_prefetch and fifo_empty_prefetch, in addition to read control flags such as fifo_empty, fifo_read_data_head_tc0, tc0_fifo_read_en, tc0_fifo_head_wr_data, and fifo_full. In fig. 6, for example, the lower dashed box represents a first clock domain CD1 such as the receiver (RX) clock domain of the "Reference M-PHY Module Interface" (RMMI) (or RMMI RX clock domain for short), also called the RX symbol clock domain, and the upper dashed box represents a second clock domain CD2 such as the CPort clock domain. The RMMI RX clock domain and the CPort clock domain may be implemented according to the UFS standard and the UniPro specification. The CPort clock domain refers to circuitry between the DL layer and the upper layer operating on one clock, and the RMMI RX clock domain refers to circuitry between the DL layer and the PA layer operating on another clock. In fig. 6, the header FIFO buffer 320 informs the second clock domain CD2 whether prefetching is currently available through the prefetch read control flag fifo_empty_prefetch; the second clock domain CD2 informs the header FIFO buffer 320 whether the upper layer is ready through the prefetch read control flag read_en_prefetch. The following examples illustrate the detailed operation.
To implement the prefetching scheme in the method of fig. 2, the following embodiment uses the prefetch read control flags and the associated DL RX implementation described below; the detailed timing information is shown in fig. 7. This embodiment of the method of fig. 2 comprises the following schemes (1) to (5).
Scheme (1): according to step S10, when the DL RX starts to fetch a data frame, a Read Enable (Read Enable) flag Read _ en _ prefetch related to the prefetch (pre-prefetch) manner is preset, for example, rd _ ptr _ prefetch is added by 1. Note that after the DL RX finishes sending each data frame to the upper layer in fig. 7, the read _ en flag used in the normal fetch (relative to the prefetch) is set, and the DL RX updates the credit value (credit) of the flow control in parallel. Each time data is fetched from the data buffer 310, the fetched data is repeatedly stored in two registers, referred to as a "current register" and a "delay register", which may be used to implement an embodiment of the method of fig. 2, serving as the first and second register regions, respectively, mentioned in the method of fig. 2. The above operation is performed in the same way as normal fetching or prefetching, i.e., every time data is prefetched or fetched from the data buffer 310, the prefetched or fetched data is repeatedly stored in two registers.
Scheme (2): when each data frame is almost completely transmitted to the upper layer (e.g. less remaining symbols need to be left for the next or two clock cycles to be transmitted to the upper layer) according to step S20, the DL RX checks whether the next data frame is to be processed (relative to wr _ ptr _ prefetch and rd _ ptr _ prefetch) by means of the fifo _ empty _ prefetch flag, and also determines in parallel whether a frame back-to-back condition occurs and therefore a prefetch is to be performed. If the DL RX determines that a frame back-to-back condition occurs, the DL RX prefetches data and seamlessly (two actions involved in this process-resolve/align) sends the data back-to-back to the upper layer and the previous data frame in the next clock cycle, and sets the read _ en _ prefetch flag again in the prefetching mode (same as flow (1)). As shown in the timing diagram of fig. 7, after the end flag (or EOP) of the previous frame (dl 2nt _ rx _ EOP) is set to be enabled (e.g., high) in a certain clock cycle, the start flag (or SOP) of the next frame (dl 2nt _ rx _ SOP) is set to be enabled (e.g., high) in the following clock cycle. If the DL RX determines that frame back-to-back conditions do not occur, then processing of the current data frame is completed and the idle state is returned.
Scheme (3): in flow (2), when the prefetch mode condition is satisfied, the DL RX decomposes the symbol of the current register into the end symbol of the current frame and the start symbol of the next frame. In the same clock cycle, the DL RX aligns the end symbol of the current frame with the symbol in the delay register to the complete "end of current frame" (DL 2nt _ RX _ eop = 1) and transfers the complete frame to the upper layer, as shown in fig. 10. Note that: in this example, each data frame is stored back-to-back into a Receiver (RX) side buffer. Therefore, the starting symbol of each data frame may have an address offset (e.g., four symbols).
Scheme (4): according to step S30, after the DL RX finishes transmitting each frame to the upper layer, the DL RX updates the credit value for flow control. In the next clock cycle, DL RX aligns the delay register (the starting symbol of the next frame) with the symbol in the current register to the complete "start of next frame" (DL 2nt _ RX _ sop = 1) to reduce or avoid idle time between back-to-back frames, as shown in fig. 10. For the remaining data frames, the same decomposition and alignment operations may be utilized for processing the remaining data frames.
Scheme (5): schemes (2) through (4) are repeated; when the last data has been prefetched and no data frame remains in the data buffer 310, i.e., when wr_ptr_prefetch equals rd_ptr_prefetch, the fifo_empty_prefetch flag is deasserted (e.g., driven low).
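The decompose and align operations of schemes (3) and (4) can be sketched on lists of symbol names. This is an illustrative data-level model only: the symbol labels, the 8-symbol beat width, and the `decompose` helper are assumptions for the example, not the patent's register-transfer logic.

```python
# Sketch of schemes (3)/(4): each beat is held in both a "current" and a
# "delay" register; when a beat mixes the tail of frame N and the head of
# frame N+1, it is decomposed at the frame boundary (after EOF+CRC) and each
# half is aligned with the neighboring register's symbols, so the end of
# frame N and the start of frame N+1 go upward in adjacent clock cycles.

def decompose(beat, eof_index):
    """Split one beat at the frame boundary; eof_index points at the CRC."""
    tail = beat[:eof_index + 1]   # end symbols of the current frame
    head = beat[eof_index + 1:]   # start symbols of the next frame
    return tail, head

# Illustrative beat layout (8 symbols/cycle): frame #0 ends mid-beat.
delay   = ["D0n-7", "D0n-6", "D0n-5", "D0n-4"]          # previous beat
current = ["D0n-3", "D0n-2", "EOF", "CRC", "SOF", "D10", "D11", "D12"]

tail, head = decompose(current, eof_index=3)            # CRC at index 3

# Same cycle: align tail with the delay register -> complete end of frame #0.
end_of_frame0 = delay + tail
assert end_of_frame0[-2:] == ["EOF", "CRC"]             # dl2nt_rx_eop = 1

# Next cycle: the head (start of frame #1) is aligned with the following
# beat's symbols -> complete start of frame #1, with no idle cycle between.
next_beat = ["D13", "D14", "D15", "D16", "D17", "D18", "D19", "D1A"]
start_of_frame1 = head + next_beat[:4]
assert start_of_frame1[0] == "SOF"                      # dl2nt_rx_sop = 1
```

The redundant current/delay pair is what makes the alignment possible: the symbols needed to complete either half of a split beat are always still available in the other register one cycle later.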
Please refer to fig. 7, which is a diagram illustrating an embodiment of frame-reception data processing implemented according to the method of fig. 2. In the example of fig. 7, Tn denotes the clock cycle number in the first clock domain (e.g., the RMMI RX clock domain or RX symbol clock domain). The upper grid in fig. 7 represents symbols from the physical adapter layer, arranged in the receiver's data buffer (e.g., 310 of fig. 5) of the DL RX according to the order and timing of reception, where the symbols in each cell follow the shorthand notation described for fig. 4. The stacked cells of the "data buffer" below the time axis indicate the frames received into the data buffer over time, where for convenience of illustration frames of traffic class TC0 are denoted TC or TC #m, m being the frame number.
In clock cycles T0/T1: the host starts to transmit downlink frames. The DL RX stores TC #0 in its data buffer.
When fifo_empty is low, indicating that the data buffer is not empty, the DL RX starts to fetch the received data from the data buffer and transmits it to the network layer once the TC #0 data in the data buffer is ready.
In clock cycle T2: once the next data frame TC #1 is already in the DL RX data buffer, the DL RX asserts the fifo_empty_prefetch flag.
In clock cycle T4: as TC #0 is about to complete transmission, the DL RX prefetches TC #1 (fifo_empty_prefetch is high).
In clock cycle T5: after TC #0 completes transmission, the DL RX sends TC #1 seamlessly back-to-back with TC #0.
In clock cycle T8: as TC #1 is about to complete transmission, the DL RX prefetches TC #2 (fifo_empty_prefetch is high).
In clock cycle T9: after TC #1 completes transmission, the DL RX sends TC #2 seamlessly back-to-back with TC #1.
In clock cycle TN-1: when the DL RX prefetches TC #3 and no further data frame in the DL RX data buffer needs processing, the DL RX deasserts the fifo_empty_prefetch flag (at this point wr_ptr_prefetch equals rd_ptr_prefetch).
In fig. 7, the pulses of the waveform of the start flag dl2nt_rx_sop mark the start of each frame (e.g., TC0#0, TC0#1, TC0#2, and TC0#3), and the pulses of the waveform of the end flag dl2nt_rx_eop mark the end of each frame. As shown in the timing diagram of fig. 7, after the end flag dl2nt_rx_eop of the previous frame is asserted (e.g., driven high) in a certain clock cycle, the start flag dl2nt_rx_sop of the next frame is asserted (e.g., driven high) in the following clock cycle. Therefore, while the DL RX transmits the symbols of back-to-back frames from the physical adapter layer to the network layer, idle time can be effectively reduced or avoided and transmission efficiency is enhanced.
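The back-to-back delivery pattern of fig. 7 can be reproduced with a toy timeline. The frame lengths below are made up for illustration, and the function name `deliver_back_to_back` is an assumption; the only property taken from the text is that each frame's SOP pulse lands one cycle after the previous frame's EOP pulse.

```python
# Toy reconstruction of the fig. 7 behavior: with prefetching, no idle cycles
# separate back-to-back frames, so dl2nt_rx_sop of frame m+1 follows
# dl2nt_rx_eop of frame m in the very next clock cycle.

def deliver_back_to_back(frame_lengths):
    """Return per-cycle (cycle, sop, eop) pulses for frames sent without idle."""
    pulses, cycle = [], 0
    for length in frame_lengths:
        for i in range(length):
            sop = (i == 0)            # first delivery cycle of the frame
            eop = (i == length - 1)   # last delivery cycle of the frame
            pulses.append((cycle, sop, eop))
            cycle += 1
    return pulses

pulses = deliver_back_to_back([4, 4, 4, 4])   # e.g., TC0#0 .. TC0#3
eop_cycles = [c for c, s, e in pulses if e]
sop_cycles = [c for c, s, e in pulses if s]
# every next-frame sop lands exactly one cycle after the previous frame's eop
assert all(s == e + 1 for e, s in zip(eop_cycles, sop_cycles[1:]))
```

Without prefetching, each `sop` would be delayed by the idle time described earlier, and the `s == e + 1` property would not hold.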
Please refer to fig. 8, which is a diagram illustrating an embodiment of a state machine implemented according to the method of fig. 2, and to fig. 9, which is a diagram of one embodiment of the timing of the state machine of fig. 8. The state machine of fig. 8 can be implemented in the second clock domain CD2 of the DL RX of fig. 6, so that frame reception proceeds according to the method of fig. 2 and frames are transmitted to the upper network layer; in the case of back-to-back frame transmission, the occurrence of idle time is effectively reduced or avoided, thereby improving frame-reception performance. Referring to fig. 8, the states of the state machine are described as follows.
ST _IDLE: waiting in this state until the FIFO is not empty. Then the next state, ST LOAD LEN, is entered.
ST _LOAD _LEN: one cycle is used to load the data frame length and then the next state, ST _ SET _ VLD, is entered.
ST _SET _VLD: dl2nt _ rx _ valid is set from 0 to 1 with one cycle. And then enters the next state ST _ DATA.
ST _LOAD _LEN _PREF: prefetch the next data frame length with one cycle and then enter the next state, ST _ SET _ VLD _ PREF
ST _SET _VLD _PREF: dl2nt _ rx _ valid is set from 0 to 1 with one cycle. Handover back-to-back frames. Then the next state, ST _ DATA, is entered.
ST_DATA: and continuing to transmit data until the length of the data frame is met. (a) If a back-to-back frame is forthcoming and there is time to LOAD the next frame to achieve back-to-back, go to ST LOAD LEN PREF. (b) ST _ DATA _ BREAK _ B2B is entered if a back-to-back frame is coming and there is no time available to load the next frame DATA to achieve back-to-back. (c) If no data remains, the next state ST READ FIFO is entered.
ST _DATA _BREAK _B2B: if FIFO is not empty, enter next state, ST _ LOAD _ LEN; otherwise, the next state ST _ READ _ FIFO is entered.
ST _READ _FIFO: the FIFO read burst is sent to pop data. The frame information has been transmitted and then the next state, ST _ IDLE, is entered.
In fig. 8, when the condition of a state is not satisfied, the flag indicating leaving the state (nt2dl_rx_abort) is asserted, and the state machine accordingly returns to the initial state ST_IDLE.
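The transition rules above can be written as a small next-state function. This is a behavioral sketch of the fig. 8 transitions only; the boolean condition names (`fifo_empty`, `data_done`, `b2b_coming`, `can_load_next`) are illustrative, and the abort path and output signals are omitted.

```python
# Sketch of the fig. 8 state machine: state names follow the description
# above; conditions are abstracted to four booleans.

def next_state(state, *, fifo_empty, data_done, b2b_coming, can_load_next):
    if state == "ST_IDLE":
        return "ST_LOAD_LEN" if not fifo_empty else "ST_IDLE"
    if state == "ST_LOAD_LEN":
        return "ST_SET_VLD"
    if state == "ST_SET_VLD":
        return "ST_DATA"
    if state == "ST_LOAD_LEN_PREF":
        return "ST_SET_VLD_PREF"
    if state == "ST_SET_VLD_PREF":
        return "ST_DATA"            # hand over between back-to-back frames
    if state == "ST_DATA":
        if b2b_coming and can_load_next:
            return "ST_LOAD_LEN_PREF"   # case (a)
        if b2b_coming:
            return "ST_DATA_BREAK_B2B"  # case (b)
        if data_done:
            return "ST_READ_FIFO"       # case (c)
        return "ST_DATA"
    if state == "ST_DATA_BREAK_B2B":
        return "ST_LOAD_LEN" if not fifo_empty else "ST_READ_FIFO"
    if state == "ST_READ_FIFO":
        return "ST_IDLE"
    raise ValueError(f"unknown state {state}")

# Back-to-back path: ST_DATA -> ST_LOAD_LEN_PREF -> ST_SET_VLD_PREF -> ST_DATA
s = next_state("ST_DATA", fifo_empty=False, data_done=False,
               b2b_coming=True, can_load_next=True)
assert s == "ST_LOAD_LEN_PREF"
s = next_state(s, fifo_empty=False, data_done=False,
               b2b_coming=False, can_load_next=False)
assert s == "ST_SET_VLD_PREF"
s = next_state(s, fifo_empty=False, data_done=False,
               b2b_coming=False, can_load_next=False)
assert s == "ST_DATA"
```

The PREF branch bypasses ST_READ_FIFO and ST_IDLE entirely, which is how the machine avoids spending idle cycles between back-to-back frames.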
Please refer to fig. 10, which is a diagram illustrating an embodiment of frame-reception data processing implemented according to the method of fig. 2.
As shown in fig. 10, the data is processed according to the operations illustrated for fig. 7 (store, prefetch, decompose, align). In fig. 10, the two squares indicated by arrow 1010 represent: as data (e.g., symbols D0n-7 to D0n-4) is fetched from the data buffer 310, the fetched data is stored redundantly in two registers, e.g., the current register and the delay register. The two blocks indicated by arrow 1020 represent: as data (e.g., symbols D13-D10) is prefetched from the data buffer 310, the prefetched data is stored redundantly in the current register and the delay register. The two blocks indicated by arrow 1030 represent: the DL RX decomposes and aligns the symbols of the current and delay registers (e.g., symbols D13-D10, D17-D14). Thereby, TC data frames (e.g., symbols D0n-7 to D0n, D17-D10) can be sent back-to-back to the upper layer in the DL RX based on the SOP-EOP frame structure, and performance is improved without idle time (e.g., dl2nt_rx_sop and dl2nt_rx_eop are asserted in two adjacent clock cycles, respectively).
Furthermore, in the above embodiments of the host and the storage device, the hardware protocol engine in the host controller or the device controller may be designed using a hardware description language (HDL) such as Verilog, or any other digital-circuit design method familiar to those skilled in the art, and may be implemented with one or more circuits based on a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or a complex programmable logic device (CPLD), or may be implemented with dedicated circuits or modules. The host controller or device controller (or a processing unit or hardware protocol engine therein) may also be implemented based on a microcontroller, a processor, or a digital signal processor.
While the invention has been described in terms of preferred embodiments, it will be understood by those skilled in the art that the examples are illustrative only and should not be taken as limiting the scope of the invention. It is noted that equivalent variations and substitutions for the illustrated embodiments are intended to be included within the scope of the present invention. Therefore, the protection scope of the present invention is defined by the claims.

Claims (14)

1. A method of data processing for frame reception of an interconnect protocol, the method being adapted for use in a first device capable of linking to a second device according to the interconnect protocol, the method comprising:
in a process of the first device receiving a frame from the second device:
a) extracting, by a hardware protocol engine of the first device that implements the interconnect protocol, symbols of a first frame of a data link layer, and transmitting data carried by the first frame to a network layer;
b) prefetching, by the hardware protocol engine, symbols of a second frame of the data link layer while the data carried by the first frame is being transmitted to the network layer; and
c) after the data carried by the first frame has been transmitted to the network layer and the symbols of the second frame have been prefetched, transmitting the data carried by the second frame to the network layer.
2. The method as claimed in claim 1, wherein when the hardware protocol engine receives a plurality of back-to-back frames, the hardware protocol engine performs the steps a) to c) on the plurality of back-to-back frames until the plurality of back-to-back frames are transmitted to the network layer, thereby improving the performance of frame reception of the data link layer.
3. The method as claimed in claim 1, wherein in step a), the hardware protocol engine extracts the symbols of the first frame from a receiving memory buffer and registers the symbols of the first frame redundantly in a first register area and a second register area.
4. The method as claimed in claim 3, wherein in step b), the hardware protocol engine prefetches the symbols of the second frame from the receiving memory buffer and registers the symbols of the second frame redundantly in the first and second register areas.
5. The method as claimed in claim 4, wherein in step b), the hardware protocol engine decomposes the symbols of the first frame in the first and second register areas and aligns the end-of-frame marker in the symbols of the first frame with the previous end-of-frame marker, thereby transmitting the data carried by the first frame to the network layer.
6. The method as claimed in claim 5, wherein in step c), the hardware protocol engine decomposes the symbols of the second frame in the first and second register areas and aligns the start-of-frame marker in the symbols of the second frame with the next start-of-frame marker, thereby transmitting the data carried by the second frame to the network layer.
7. The method of claim 1, wherein the interconnect protocol is a Universal Flash Storage (UFS) standard.
8. A storage device capable of linking to a host according to an interconnect protocol, the storage device comprising:
an interface circuit to implement a physical layer of the interconnect protocol to link to the host; and
a device controller coupled to the interface circuit and a memory module, the device controller comprising:
a hardware protocol engine to implement the interconnect protocol, wherein during the process of the storage device receiving frames from the host, the hardware protocol engine performs a plurality of operations comprising:
a) the hardware protocol engine extracts symbols of a first frame of a data link layer and transmits data carried by the first frame to a network layer;
b) while the data carried by the first frame is being transmitted to the network layer, the hardware protocol engine prefetches symbols of a second frame of the data link layer; and
c) after the data carried by the first frame has been transmitted to the network layer and the symbols of the second frame have been prefetched, the hardware protocol engine transmits the data carried by the second frame to the network layer.
9. The storage device of claim 8, wherein when the hardware protocol engine receives a plurality of back-to-back frames, the hardware protocol engine performs the operations a) through c) on the plurality of back-to-back frames until the plurality of back-to-back frames are transmitted to the network layer, thereby improving the performance of frame reception at the data link layer.
10. The storage device according to claim 8, wherein in operation a), the hardware protocol engine extracts the symbols of the first frame from a receiving memory buffer and registers the symbols of the first frame redundantly in a first register area and a second register area.
11. The storage device according to claim 10, wherein in operation b), the hardware protocol engine prefetches the symbols of the second frame from the receiving memory buffer and registers the symbols of the second frame redundantly in the first and second register areas.
12. The storage device of claim 11, wherein in operation b), the hardware protocol engine decomposes the symbols of the first frame in the first and second register areas and aligns the end-of-frame marker in the symbols of the first frame with the previous end-of-frame marker, thereby transmitting the data carried by the first frame to the network layer.
13. The storage device of claim 12, wherein in operation c), the hardware protocol engine decomposes the symbols of the second frame in the first and second register areas and aligns the start-of-frame marker in the symbols of the second frame with the next start-of-frame marker, thereby transmitting the data carried by the second frame to the network layer.
14. The storage device of claim 8, wherein the interconnect protocol is a Universal Flash Storage (UFS) standard.
CN202111095857.2A 2021-09-18 2021-09-18 Method for data processing for frame reception of interconnect protocol and storage device Pending CN115842872A (en)
Publications (1)

Publication Number Publication Date
CN115842872A true CN115842872A (en) 2023-03-24


Legal Events: PB01 (Publication)