CN114397999A - Communication method, device and equipment based on non-volatile memory interface-remote processor messaging (NVMe-over-RPMsg)


Info

Publication number
CN114397999A
Authority
CN
China
Prior art keywords
rpmsg
nvme
memory interface
over
operating system
Prior art date
Legal status
Pending
Application number
CN202111415577.5A
Other languages
Chinese (zh)
Inventor
张广
李德建
肖堃
王于波
杨立新
白志华
Current Assignee
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
State Grid Jiangsu Electric Power Co Ltd
Beijing Smartchip Microelectronics Technology Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Information and Telecommunication Co Ltd
State Grid Jiangsu Electric Power Co Ltd
Beijing Smartchip Microelectronics Technology Co Ltd
Priority date
Filing date: 2021-11-25
Publication date: 2022-04-26
Application filed by State Grid Corp of China SGCC, State Grid Information and Telecommunication Co Ltd, State Grid Jiangsu Electric Power Co Ltd, and Beijing Smartchip Microelectronics Technology Co Ltd
Priority application: CN202111415577.5A
Publication: CN114397999A
Legal status: Pending

Classifications

    • G06F3/0613: Improving I/O performance in relation to throughput
    • G06F3/064: Management of blocks
    • G06F3/0656: Data buffering arrangements
    • G06F3/0659: Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F3/0679: Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • G06F9/544: Buffers; Shared memory; Pipes
    • G06F9/546: Message passing systems or structures, e.g. queues
    • H04L12/02: Data switching networks, details
    • H04L67/141: Setup of application sessions
    • H04L69/22: Parsing or analysis of headers
    • H04L69/26: Special purpose or proprietary protocols or architectures


Abstract

An embodiment of the invention provides a communication method, apparatus and device based on non-volatile memory interface-remote processor messaging (NVMe-over-RPMsg), belonging to the field of computer technology. Used for virtualizing a remote storage system on a heterogeneous multi-core system-on-chip, the NVMe-over-RPMsg comprises a guest operating system and a remote operating system, and the communication method comprises the following steps: enabling the guest operating system to identify the remote operating system as the destination end of the NVMe-over-RPMsg; encapsulating the NVMe-over-RPMsg destination end as an NVMe SSD on the guest operating system through a customized non-volatile memory interface driver, wherein the guest operating system comprises the front end of the NVMe-over-RPMsg; sending non-volatile memory interface commands issued by the guest operating system to an emulated NVMe SSD controller at the destination end of the NVMe-over-RPMsg; and enabling the front end and the destination end of the NVMe-over-RPMsg to communicate with each other through an RPMsg tunnel. The communication method eliminates high-overhead system calls and shortens the long I/O stack while improving random read/write throughput.

Description

Communication method, device and equipment based on non-volatile memory interface-remote processor messaging
Technical Field
The invention relates to the field of computer technology, and in particular to a communication method, apparatus and device based on non-volatile memory interface-remote processor messaging (NVMe-over-RPMsg).
Background
As hardware systems have become more complex and diverse, efforts have been made to find a uniform interface for accessing different types of storage devices. Storage virtualization hides the complexity of physical devices by abstracting them and adding a management layer to control resources. It simplifies infrastructure management and improves the utilization and capacity of storage resources. Conventional storage virtualization solutions mainly include software virtualization, hardware virtualization (e.g., VT-d or SR-IOV) and paravirtualization.
The above-described conventional storage virtualization solutions have some limitations:
1) they depend heavily on other development tools, and the virtual machines introduce high-overhead system calls that result in very long I/O stacks;
2) it is difficult to migrate these methods to embedded platforms.
Disclosure of Invention
The application provides a virtual storage framework called NVMe-over-Remote-Processor-Messaging (NVMe-over-RPMsg), which can emulate a remote storage system as a generic NVMe SSD in an embedded environment. This can be achieved by modifying the local NVMe driver rather than porting a complex virtual machine to the SoC. To emulate traditional PCIe configuration read/write operations between a host and a device, messages are passed between the guest OS and the remote OS using RPMsg (an efficient inter-processor communication protocol).
An object of an embodiment of the present invention is to provide a communication method based on non-volatile memory interface-remote processor messaging (NVMe-over-RPMsg), used for virtualizing a remote storage system on a heterogeneous multi-core system-on-chip, wherein the NVMe-over-RPMsg comprises a guest operating system and a remote operating system, and the communication method comprises the following steps: enabling the guest operating system to identify the remote operating system as the destination end of the NVMe-over-RPMsg; encapsulating the NVMe-over-RPMsg destination end as a non-volatile memory interface solid state disk (NVMe SSD) on the guest operating system through a customized non-volatile memory interface driver, wherein the guest operating system comprises the front end of the NVMe-over-RPMsg; sending a non-volatile memory interface command issued by the guest operating system to an emulated NVMe SSD controller at the destination end of the NVMe-over-RPMsg; and enabling the front end of the NVMe-over-RPMsg and the destination end of the NVMe-over-RPMsg to communicate with each other through a remote processor messaging (RPMsg) tunnel.
Optionally, the front end of NVMe-over-RPMsg comprises: a customized non-volatile memory interface driver and a local RPMsg driver.
Optionally, the local RPMsg driver enumerates the NVMe-over-RPMsg destination on a para-virtualized Bus RPMsg-Virtio-Bus and provides an access interface to the customized non-volatile memory interface driver.
Optionally, the customized non-volatile memory interface driver parses an input/output I/O request and converts the I/O request into the non-volatile memory interface command.
Optionally, the RPMsg tunnel is established by: loading the local RPMsg driver after the guest operating system is started; creating, by the local RPMsg driver, an abstract RPMsg device and registering a callback function for the corresponding RPMsg tunnel; suspending the local RPMsg driver until a name service notification is received from the remote operating system; and sending, by the guest operating system, a name service confirmation message to the remote operating system so as to establish the RPMsg tunnel.
Optionally, sending the non-volatile memory interface command issued by the guest operating system to the emulated NVMe SSD controller at the destination end of the NVMe-over-RPMsg comprises: receiving an inter-core interrupt sent by the guest operating system and invoking the callback function; causing the emulated NVMe SSD controller to process RPMsg data packets; processing the non-volatile memory interface command received from the guest operating system by using a non-volatile memory interface protocol parser; and enabling the user-space non-volatile memory interface driver to implement read/write operations on the solid state disk by importing the source address, destination address and data size provided by the non-volatile memory interface protocol parser.
Optionally, the RPMsg data packets comprise the following two types and are processed as follows: for a first type, a register of the emulated NVMe SSD controller is modified according to the contents of the RPMsg data packet; and for a second type, the non-volatile memory interface command is cached and passed to the non-volatile memory interface protocol parser.
By the above communication method, a remote storage system can be emulated as a generic NVMe SSD in an embedded environment. This can be achieved by modifying the local NVMe driver rather than porting a complex virtual machine to the SoC. To emulate traditional PCIe configuration read/write operations between a host and a device, messages are passed between the guest OS and the remote OS using RPMsg.
In another aspect, the present invention provides a communication apparatus based on non-volatile memory interface-remote processor messaging (NVMe-over-RPMsg) for virtualizing a remote storage system on a heterogeneous multi-core system-on-chip, the communication apparatus comprising: a guest operating system and a remote operating system, the remote operating system being identified by the guest operating system as the NVMe-over-RPMsg destination end; a customized non-volatile memory interface driver configured to encapsulate the NVMe-over-RPMsg destination end as an NVMe SSD on the guest operating system, wherein the guest operating system comprises the front end of the NVMe-over-RPMsg; an emulated NVMe SSD controller for enabling the destination end of the NVMe-over-RPMsg to process non-volatile memory interface commands received from the guest operating system, wherein the destination end of the NVMe-over-RPMsg is implemented on the remote operating system; and an RPMsg tunnel for enabling the front end of the NVMe-over-RPMsg and the destination end of the NVMe-over-RPMsg to communicate with each other.
Optionally, the front end of NVMe-over-RPMsg comprises: the customized nonvolatile memory interface driver and a local RPMsg driver, wherein the local RPMsg driver enumerates the NVMe-over-RPMsg destination terminal on a para-virtualized Bus RPMsg-Virtio-Bus and provides an access interface to the customized nonvolatile memory interface driver; and the customized non-volatile memory interface driver parses an input/output I/O request and converts the I/O request into the non-volatile memory interface command.
Optionally, the communication device further comprises: the non-volatile memory interface protocol parser is used for processing the non-volatile memory interface command received from the guest operating system module; and the user space nonvolatile memory interface driver is used for importing the source address, the destination address and the data size provided by the nonvolatile memory interface protocol parser and realizing read/write operation on the SSD.
Optionally, the emulated NVMe SSD controller is configured to: receive an inter-core interrupt sent by the guest operating system and invoke a callback function; process RPMsg data packets; process a non-volatile memory interface command received from the guest operating system by using the non-volatile memory interface protocol parser; and enable the user-space non-volatile memory interface driver to implement read/write operations on the SSD by importing the source address, destination address and data size provided by the non-volatile memory interface protocol parser, wherein the RPMsg data packets comprise the following two types and are processed as follows: for a first type, a register of the emulated NVMe SSD controller is modified according to the contents of the RPMsg data packet; and for a second type, the non-volatile memory interface command is cached and passed to the non-volatile memory interface protocol parser.
In another aspect, the present invention provides an apparatus comprising a processor and a memory, wherein the processor is configured to execute the NVMe-over-RPMsg-based communication method.
Optionally, the device is a chip.
Through the above technical solution, a remote storage system can be emulated as a generic NVMe SSD in an embedded environment. This can be achieved by modifying the local NVMe driver rather than porting a complex virtual machine to the SoC. The communication method eliminates high-overhead system calls and shortens the long I/O stack while improving random read/write throughput.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the embodiments of the invention without limiting the embodiments of the invention. In the drawings:
FIG. 1 illustrates the layer model of the RPMsg protocol;
FIG. 2 shows a flowchart of the NVMe-over-RPMsg-based communication method;
FIG. 3 shows a structural diagram of NVMe-over-RPMsg communication;
FIG. 4 shows the initialization logic after system startup;
FIG. 5 illustrates the submission and completion flow of an NVMe I/O command;
FIG. 6 shows a block diagram of the NVMe-over-RPMsg-based communication apparatus;
FIG. 7A illustrates a configuration information diagram for the NVMe-over-RPMsg-based communication apparatus;
FIG. 7B shows the data for result 1;
FIG. 7C shows the data for result 2;
FIG. 7D shows the data for result 3.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating embodiments of the invention, are given by way of illustration and explanation only, not limitation.
Fig. 1 shows a layer model of the RPMsg protocol.
Referring to fig. 1, NVMe herein is understood as the non-volatile memory interface, and NVMe-over-RPMsg as non-volatile memory interface over remote processor messaging, where RPMsg is short for Remote Processor Messaging, a protocol that defines a standard binary interface for inter-core communication in heterogeneous multi-core processing systems with asymmetric multiprocessing (AMP). Under the NVMe protocol, a non-volatile memory interface solid state disk (NVMe SSD) is typically connected to the host over a physical PCIe bus. To retain the advantages of the NVMe protocol (e.g., low latency, high bandwidth) while virtualizing (also referred to as emulating) the storage device, the native NVMe-over-PCIe driver is modified to support the virtual NVMe controller.
According to an embodiment, RPMsg takes over the role of PCIe. The RPMsg protocol requires shared memory to enable communication between two heterogeneous processor cores. The overall communication implementation can be divided into three OSI protocol layers: a transport layer, a medium access control (MAC) layer, and a physical layer. Each RPMsg message is contained in a buffer in shared memory, and the buffer is pointed to by the address field of a buffer descriptor taken from Vring's descriptor pool.
According to an embodiment, the first 16 bytes of the buffer, called the RPMsg header, are used internally by the transport layer and carry the source address, destination address and data size. Upon receiving an inter-core interrupt (i.e., inter-processor interrupt, IPI) from the sender, the receiver enters a callback function according to the content of the packet to complete the host's request.
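For reference, the following C sketch shows the two shared-memory structures described above as they are laid out in the virtio specification and the Linux virtio-rpmsg transport; it is illustrative only and not reproduced from the patent.

```c
#include <stdint.h>

/* Vring buffer descriptor (virtio layout): its addr field points to the
 * shared-memory buffer that carries one RPMsg message. */
struct vring_desc {
    uint64_t addr;   /* physical address of the message buffer */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* e.g. chained or device-writable */
    uint16_t next;   /* index of the next descriptor in a chain */
};

/* 16-byte RPMsg header at the start of each buffer, used internally by
 * the transport layer; the payload starts at data[]. */
struct rpmsg_hdr {
    uint32_t src;       /* source endpoint address */
    uint32_t dst;       /* destination endpoint address */
    uint32_t reserved;
    uint16_t len;       /* payload (data) size in bytes */
    uint16_t flags;
    uint8_t  data[];    /* payload follows immediately */
} __attribute__((packed));
```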
In the NVMe protocol, the NVMe BAR space holds the NVMe controller registers, which can be mapped into host memory; the host then manages the NVMe controller using PCIe configuration reads/writes (e.g., MMIO). The system is designed with two kinds of RPMsg packets: Type-I relates to state switching of the virtual controller, such as enable, disable and shutdown notifications; Type-II is used to notify that there is a new NVMe command to process. These design policies inform the device side that the controller state has recently changed or that an I/O request has been issued, so the doorbell register no longer needs to be written as before.
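A minimal sketch of what the two packet payloads might look like is given below. The type values and field names are assumptions made for illustration; the patent does not specify the wire format.

```c
#include <stdint.h>

enum nvme_rpmsg_pkt_type {
    NVME_RPMSG_TYPE_I  = 1,  /* virtual-controller state switching */
    NVME_RPMSG_TYPE_II = 2,  /* a new NVMe command is ready */
};

/* Illustrative RPMsg payload carried after the rpmsg_hdr. */
struct nvme_rpmsg_pkt {
    uint32_t type;  /* NVME_RPMSG_TYPE_I or NVME_RPMSG_TYPE_II */
    union {
        struct {                  /* Type-I: register change on the      */
            uint32_t reg_offset;  /* emulated controller, e.g. CC, CSTS  */
            uint64_t value;
        } state;
        struct {                  /* Type-II: locate the new command     */
            uint16_t sq_id;       /* virtual submission queue id         */
            uint16_t sq_tail;     /* new tail index in shared memory     */
        } doorbell;
    };
};
```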
Fig. 2 shows a flow chart of a NVMe-over-RPMsg-based communication method.
Referring to fig. 2, the NVMe-over-RPMsg architecture defines a communication framework for virtualizing a remote storage system on a heterogeneous multi-core system-on-chip (SoC), with a separate NVMe SSD as the actual storage device. NVMe-over-RPMsg consists of two parts: a guest operating system (guest OS) running native applications and a remote operating system (remote OS) performing independent storage management.
S201, enabling the guest OS to identify that the remote OS is a destination end of RPMsg.
According to an embodiment, the remote OS can be recognized by the guest OS as an RPMsg endpoint; that is, the RPMsg endpoint can be understood as the destination end. The RPMsg back end is contained in the remote OS, thereby enabling communication between the guest OS and the remote OS.
S203, encapsulating the RPMsg destination end as an NVMe SSD on the guest OS through a customized non-volatile memory interface driver (Tailored NVMe Driver), where the guest OS serves as the front end of the NVMe-over-RPMsg.
According to the embodiment, building on S201, an NVMe driver is customized on the guest OS, and an ordinary RPMsg endpoint is encapsulated as an NVMe SSD to serve as the front end of the NVMe-over-RPMsg.
S205, sending NVMe commands issued by the guest OS to the emulated NVMe SSD controller at the destination end of the NVMe-over-RPMsg.
According to an embodiment, since the back end of the NVMe-over-RPMsg is implemented on the remote OS, the back end emulates an NVMe SSD controller and is responsible for processing NVMe commands received from the guest OS.
S207, enabling the front end of the NVMe-over-RPMsg and the destination end of the NVMe-over-RPMsg to communicate with each other through the RPMsg tunnel.
According to an embodiment, the front and back ends of the NVMe-over-RPMsg communicate with each other via an RPMsg tunnel (i.e., a transport-layer entity identified by a source address and a destination address). Finally, the back end of the NVMe-over-RPMsg performs the actual disk I/O operations on the NVMe SSD over PCIe.
This communication method emulates a remote storage system as a generic NVMe SSD in an embedded environment. By using RPMsg, an efficient inter-processor communication protocol, to pass messages between the guest OS and the remote OS, virtualization is achieved by merely modifying the local NVMe driver rather than porting a complex virtual machine to the SoC.
Fig. 3 shows the structure of NVMe-over-RPMsg communication.
Referring to fig. 3, according to the embodiment, the front end 301 of the NVMe-over-RPMsg is responsible for enumerating the back end 303 of the NVMe-over-RPMsg as an NVMe SSD device 305; the front end 301 then registers a gendisk structure (the Linux kernel's representation of a standalone disk device) with the upper block layer 300. The front end 301 of the NVMe-over-RPMsg comprises a customized NVMe driver and a local RPMsg driver.
According to an embodiment, the customized NVMe driver (Tailored NVMe Driver) 3011 is responsible for parsing I/O requests, which may include block I/O requests, and converting them into generic NVMe commands. An RPMsg device, rather than a local PCIe device, is provided to the customized NVMe driver; for example, the customized NVMe driver defines a proprietary structure named nvme-rpmsg-device, an abstraction of a specific virtual SSD device that adds RPMsg interface support on top of the native NVMe device characteristics. The difference is that an RPMsg packet must be sent to notify the remote OS whenever the controller state changes or a new NVMe command is enqueued, and the completion of NVMe commands is checked by polling.
According to an embodiment, the local RPMsg driver (Native RPMsg Driver) 3013 enumerates RPMsg endpoints on the paravirtualized bus RPMsg-Virtio-Bus (the logical bus for RPMsg devices in the kernel) and provides an access interface to the customized NVMe driver, through which various RPMsg APIs such as send and poll can be called. For example, when the RPMsg send function is called, the local RPMsg driver appends an RPMsg header to the packet and routes it to the correct endpoint.
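The sketch below illustrates how the customized NVMe driver might hand a Type-II notification to the local RPMsg driver using the standard Linux rpmsg_send() API. The function name and the nvme_rpmsg_pkt payload are assumptions carried over from the earlier sketch, not code from the patent.

```c
#include <linux/rpmsg.h>

/* Notify the remote OS that a new NVMe command was enqueued: build a
 * Type-II packet and pass it to the native RPMsg driver, which prepends
 * the RPMsg header and routes the message to the registered endpoint. */
static int nvme_rpmsg_ring_doorbell(struct rpmsg_endpoint *ept,
                                    u16 sq_id, u16 sq_tail)
{
    struct nvme_rpmsg_pkt pkt = {
        .type = NVME_RPMSG_TYPE_II,
        .doorbell = { .sq_id = sq_id, .sq_tail = sq_tail },
    };

    /* rpmsg_send() blocks until a TX buffer is free, then kicks the
     * remote core with an inter-processor interrupt (IPI). */
    return rpmsg_send(ept, &pkt, sizeof(pkt));
}
```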
According to an embodiment, the remote OS manages various storage devices and provides a unified interface for the guest OS. An NVMe-over-RPMsg back end is designed on the remote OS and comprises four parts: a customized RPMsg driver, an emulated NVMe controller 3035, an NVMe protocol parser 3037 and a user-space NVMe driver 3039.
According to an embodiment, an RPMsg endpoint (destination end) represents a particular RPMsg device identified by the local RPMsg driver 3013. As the recipient, the RPMsg endpoint is responsible for extracting the message payload via the customized RPMsg driver and calling the callback function upon receiving an IPI from the sender (the guest OS). The callback function defines the functionality of the rest of the back end.
According to an embodiment, the emulated NVMe controller 3035, which may also be referred to as the emulated NVMe SSD controller, is responsible for handling the two types of RPMsg packets. For Type-I, the emulated NVMe SSD controller registers are modified according to the contents of the RPMsg packet; for Type-II, the NVMe command is cached and passed to the NVMe protocol parser 3037.
The emulated NVMe controller defines a storage subsystem that manages various storage devices and provides a generic logical interface (e.g., NVMe) to the guest OS, which reduces the complexity of the underlying system and enhances scalability.
According to an embodiment, the NVMe protocol parser 3037 uses a software model to process NVMe commands received from the guest OS. According to the NVMe specification, there are two types of NVMe commands: admin commands and I/O commands. Admin commands are typically used to identify the emulated NVMe controller 3035 and the namespace. Taking namespace identification as an example, the initialization data of the virtual namespace is transferred directly to the guest OS by passing a pointer, which is one of the zero-copy techniques. For I/O commands, since the data buffer is located in shared memory, the data need not be moved immediately when the command is parsed. The physical region page (PRP) field is extracted from the NVMe command, and the memory segments are then reorganized according to address continuity, as sketched below. The reassembled address information is provided to the user-space NVMe driver 3039 to generate new NVMe commands for the actual NVMe SSD.
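A minimal sketch of the reorganization step, assuming 4 KB pages and an already-decoded PRP list; the struct and function names are illustrative, not taken from the patent.

```c
#include <stddef.h>
#include <stdint.h>

#define PRP_PAGE_SIZE 4096u

/* Illustrative segment descriptor handed to the user-space NVMe driver. */
struct io_segment {
    uint64_t addr;  /* start address in shared memory */
    uint32_t len;   /* segment length in bytes */
};

/* Coalesce the page pointers of a PRP list into the fewest contiguous
 * segments: consecutive pages whose addresses are exactly one page apart
 * are merged, i.e. "reorganized according to address continuity".
 * Returns the number of segments written to out[]. */
static size_t coalesce_prp_list(const uint64_t *prp, size_t nr_pages,
                                struct io_segment *out)
{
    size_t n = 0;

    for (size_t i = 0; i < nr_pages; i++) {
        if (n > 0 && out[n - 1].addr + out[n - 1].len == prp[i]) {
            out[n - 1].len += PRP_PAGE_SIZE;  /* contiguous: extend */
        } else {
            out[n].addr = prp[i];             /* gap: start new segment */
            out[n].len  = PRP_PAGE_SIZE;
            n++;
        }
    }
    return n;
}
```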
According to an embodiment, the user-space NVMe driver 3039 operates in a bare-metal environment, understood here as the remote OS, and provides a simple read/write interface for upper-layer applications. To implement read/write operations on the SSD, it imports the source address, destination address and data size provided by the NVMe protocol parser 3037.
Fig. 4 shows the initialization logic diagram after system startup.
Referring to fig. 4, the initialization process after system startup is as follows (a guest-side registration sketch in C follows these steps):
S401, after the guest OS starts, the local RPMsg driver is loaded first; it creates an abstract (virtual) RPMsg device and registers the callback function for the corresponding RPMsg tunnel.
S403, the driver is then suspended until it receives a name service notification (i.e., a handshake request) from the remote OS, which means the remote OS has created an RPMsg endpoint and registered a callback function.
S405, the guest OS driver then sends a name service confirmation message to the remote OS.
S407, the RPMsg tunnel is now established. All RPMsg APIs can then be used on both sides for real-time communication between the guest OS and the remote OS.
S409, after the local RPMsg driver is initialized, the customized NVMe driver starts working. First, it registers the RPMsg device based on the created local NVMe-RPMsg driver and assigns it corresponding attributes, such as direct memory access (DMA) masks; note that the legacy PCIe SSD driver calls a large number of DMA-related APIs, whereas the local RPMsg device does not involve device DMA.
S410, before managing the virtual NVMe controller, the virtual NVMe SSD controller registers (e.g., the controller capability, configuration and status registers) are mapped into the guest OS memory space.
S411, the subsequent steps are the same as for a traditional PCIe NVMe driver: creating the admin and I/O queue pairs (qpairs).
S413, the virtual NVMe SSD controller and namespace are identified, and the gendisk is registered with the block layer. After initialization, the NVMe device is emulated in the guest OS, and normal I/O operations can be performed on it.
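The following kernel-module sketch shows how steps S401-S407 map onto the standard Linux RPMsg driver interface: the driver registers for a service name, stays idle until the remote OS's name service notification creates a matching channel (at which point probe() runs), and receives messages through the registered callback. The service and function names are assumptions for illustration.

```c
#include <linux/module.h>
#include <linux/rpmsg.h>

/* S407 onward: messages arriving over the established tunnel (e.g.
 * completions or Type-I/Type-II replies) land in this callback. */
static int nvme_rpmsg_cb(struct rpmsg_device *rpdev, void *data, int len,
                         void *priv, u32 src)
{
    return 0;
}

/* Called once the remote OS announces the matching endpoint via the
 * name service (S403); communicating over rpdev's default endpoint
 * then doubles as the confirmation of S405. */
static int nvme_rpmsg_probe(struct rpmsg_device *rpdev)
{
    return 0;
}

static const struct rpmsg_device_id nvme_rpmsg_ids[] = {
    { .name = "nvme-rpmsg" },  /* assumed service name */
    { },
};
MODULE_DEVICE_TABLE(rpmsg, nvme_rpmsg_ids);

static struct rpmsg_driver nvme_rpmsg_driver = {
    .drv      = { .name = "nvme_over_rpmsg" },
    .id_table = nvme_rpmsg_ids,
    .probe    = nvme_rpmsg_probe,
    .callback = nvme_rpmsg_cb,   /* registered at load time (S401) */
};
module_rpmsg_driver(nvme_rpmsg_driver);
MODULE_LICENSE("GPL");
```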
FIG. 5 shows a flow diagram of the submission and completion of an NVMe I/O command.
Referring to fig. 5, the submission and completion of an NVMe I/O command is understood as the process of command submission and completion handling from the guest OS to the NVMe SSD.
① According to an embodiment, the guest OS places one or more commands in the next free virtual submission queue slot in shared memory for execution.
② The guest OS sends an RPMsg Type-II packet indicating to the remote OS that a new command submission needs processing.
③ The emulated NVMe SSD controller in the remote OS fetches the command directly from the virtual submission queue slot without extra data copying.
④ The emulated NVMe SSD controller executes the NVMe command.
⑤ The emulated NVMe SSD controller submits the regenerated command to the actual NVMe SSD and polls for its completion (the virtual-to-physical step).
⑥ After the emulated NVMe SSD controller receives a completion entry from the underlying SSD, acting as a virtual SSD it places a completion queue entry in the next free slot of the associated virtual completion queue and advances the virtual submission queue head pointer in the completion entry to indicate the most recently consumed submission queue entry.
⑦ The guest driver continuously polls the virtual completion queue. Once the phase tag of a completion entry is inverted, the guest driver processes the new completion entry, as sketched below.
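A minimal sketch of the phase-tag polling in step ⑦, following the standard NVMe completion queue entry layout (16 bytes, phase tag in bit 0 of the status field); the function and variable names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified NVMe completion queue entry (per the NVMe specification). */
struct nvme_cqe {
    uint32_t result;
    uint32_t reserved;
    uint16_t sq_head;   /* step ⑥: most recently consumed SQ entry */
    uint16_t sq_id;
    uint16_t cid;       /* command identifier */
    uint16_t status;    /* bit 0 is the phase tag */
};

/* A new entry is valid when its phase tag equals the phase the driver
 * currently expects; the expected phase flips each time the queue wraps,
 * which is why "inverted" entries signal new completions. */
static bool poll_virtual_cq(volatile struct nvme_cqe *cq, uint16_t *head,
                            uint16_t depth, uint8_t *phase,
                            struct nvme_cqe *out)
{
    volatile struct nvme_cqe *cqe = &cq[*head];

    if ((cqe->status & 1) != *phase)
        return false;                     /* nothing new yet */

    *out = *(struct nvme_cqe *)cqe;       /* copy the entry out */
    if (++(*head) == depth) {             /* wrap around, invert phase */
        *head = 0;
        *phase ^= 1;
    }
    return true;
}
```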
FIG. 6 shows a block diagram of a communication device based on NVMe-over-RPMsg.
Referring to fig. 6 in conjunction with fig. 3, an NVMe-over-RPMsg-based communication apparatus 600 for virtualizing a remote storage system on a heterogeneous multi-core SoC includes:
a guest OS 601 and a remote OS 603, the remote OS 603 being recognized by the guest OS as the RPMsg destination end 6030.
According to an embodiment, the customized NVMe driver 6011 is configured to encapsulate the RPMsg destination into an NVMe SSD on the guest OS, where the guest OS includes the front end 6010 of the NVMe-over-RPMsg.
An emulated NVMe SSD controller 6033 to cause a destination of NVMe-over-RPMsg to process NVMe commands received from the guest OS, wherein the destination of NVMe-over-RPMsg is implemented on the remote OS.
An RPMsg tunnel 602 for enabling the front end 6010 of the NVMe-over-RPMsg and the destination end 6030 of the NVMe-over-RPMsg to communicate with each other.
According to an embodiment, the NVMe-over-RPMsg front end 6010 includes:
a customized NVMe driver 6011 and a local RPMsg driver 6013, wherein the local RPMsg driver 6013 enumerates the RPMsg destination 6030 on an RPMsg-Virtio-Bus and provides an access interface to the customized NVMe driver 6011; and the customized NVMe driver 6011 parses the I/O request and converts the I/O request into the NVMe command.
According to an embodiment, the communication device further comprises:
an NVMe protocol parser 6035 for processing NVMe commands received from the guest OS 601.
And a user space NVMe driver 6037 configured to import the source address, the destination address, and the data size provided by the NVMe protocol parser 6035, and implement a read/write operation on the SSD.
According to an embodiment, the emulated NVMe SSD controller 6033 is configured to:
receiving an IPI sent by the guest OS and invoking the callback function;
causing the emulated NVMe SSD controller to process the RPMsg data packet;
causing the NVMe protocol parser to process NVMe commands received from the guest OS;
enabling the user-space NVMe driver to implement read/write operations on the SSD by importing the source address, destination address and data size provided by the NVMe protocol parser, where each RPMsg data packet is one of the following two types: for the first type (Type-I), a register of the emulated NVMe SSD controller is modified according to the contents of the RPMsg data packet; for the second type (Type-II), the NVMe command is cached and passed to the NVMe protocol parser.
According to the embodiment, the NVMe-over-RPMsg-based communication apparatus can implement the methods described in fig. 2 and fig. 3, which are not repeated here.
The technical benefits of the NVMe-over-RPMsg-based communication apparatus are reflected in the evaluation results shown in fig. 7A to 7D.
Referring to fig. 7A, to implement and evaluate NVMe-over-RPMsg, the present application uses the Xilinx Vivado 2018.2 design kit, running on a Xilinx ZCU102 FPGA board with a 256 GB Samsung 970 EVO NVMe SSD. The latency and throughput of the NVMe-over-RPMsg of the present application, the prior-art QEMU NVMe emulation solution (QEMU-NVMe) and the non-virtualized Linux native driver (Native-Drive) were measured separately. Experimental results were obtained using the popular Flexible I/O (FIO) benchmark; to eliminate the effect of the page cache, FIO was configured with the libaio engine and the direct I/O flag.
According to an embodiment, in terms of latency: the block size was configured to 4 KB, and three test sets were defined by read/write percentage (100/0 random read, 65/35 mixed random read/write, 0/100 random write), with the results shown in fig. 7B. Unlike a typical storage device, the Samsung 970 EVO SSD performs better on random writes than on random reads. Compared with QEMU-NVMe, NVMe-over-RPMsg reduced latency (in us) by 48.1%, 46.9% and 41.3%, respectively, on the three tasks. Compared with Native-Drive, the latency increase is at most 12.5% (6.9%, 8.3%, 12.5%). The reason is that NVMe-over-RPMsg-based communication emulates the SSD by redesigning the relevant drivers instead of using a virtual machine, thereby eliminating high-overhead system calls and shortening the long I/O stack. In addition, the zero-copy techniques employed in the design of the present application also keep the latency increase small.
In terms of throughput: fig. 7C shows the throughput results for the three solutions described above. To simulate an actual workload on the guest OS, throughput was evaluated while increasing the block size of the random reads/writes. As shown in fig. 7C, NVMe-over-RPMsg achieves higher throughput than the QEMU-NVMe solution as the block size increases. Specifically, the read bandwidth of NVMe-over-RPMsg is better than that of the QEMU-NVMe scheme (1.86X, 1.74X, 1.72X, 1.8X, 1.74X, 1.76X), and the write bandwidth is likewise improved (1.71X, 1.57X, 1.77X, 1.74X, 1.76X, 1.68X), where X denotes a multiple; e.g., 1.86X means 1.86 times. The bandwidth increases because the NVMe-over-RPMsg framework inherits the standard NVMe protocol and the use of shared memory reduces unnecessary data movement. Compared with Native-Drive on local storage, the performance loss for reads (0.91X, 0.9X, 0.89X, 0.88X, 0.85X, 0.79X) and writes (0.93X, 0.92X, 0.93X, 0.89X, 0.86X, 0.82X) is small, caused by the unavoidable context switching of the emulated NVMe device. The loss grows with the block size because the NVMe-over-RPMsg back end needs to split a large block into several new blocks to transfer to the SSD.
In fact, I/O requests with a block size smaller than 32 KB account for more than 75% of all requests generated by the OS. Therefore, the I/O performance loss of NVMe-over-RPMsg is acceptable, and the improvement over the native QEMU-NVMe solution is very significant.
Motivated by the need for storage virtualization in heterogeneous multi-core embedded systems and the limitations of the existing QEMU NVMe SSD emulation method, NVMe-over-RPMsg models a complex remote storage system as a local NVMe SSD. Test results of the prototype system on the ZCU102 board demonstrate the performance improvement of the model: compared with the existing QEMU-NVMe method, NVMe-over-RPMsg reduces random read/write latency by 45.4% on average while scaling to 1.74 times the random read/write throughput.
An embodiment of the present invention provides a storage medium, on which a program is stored, and when the program is executed by a processor, the NVMe-over-RPMsg-based communication method is implemented.
The embodiment of the invention provides a processor, which is used for running a program, wherein the NVMe-over-RPMsg-based communication method is executed when the program runs.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to carry out a program initialized with the steps of the NVMe-over-RPMsg-based communication method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (13)

1. A communication method based on non-volatile memory interface-remote processor messaging (NVMe-over-RPMsg), used for virtualizing a remote storage system on a heterogeneous multi-core system-on-chip, characterized in that the NVMe-over-RPMsg comprises: a guest operating system and a remote operating system, and the communication method comprises the following steps:
enabling the guest operating system to identify that the remote operating system is a destination end of NVMe-over-RPMsg;
encapsulating the NVMe-over-RPMsg destination end as a non-volatile memory interface solid state disk (NVMe SSD) on the guest operating system through a customized non-volatile memory interface driver, wherein the guest operating system comprises the front end of the NVMe-over-RPMsg;
sending a non-volatile memory interface command issued by the guest operating system to an emulated NVMe SSD controller at the destination end of the NVMe-over-RPMsg; and
enabling the front end of the NVMe-over-RPMsg and the destination end of the NVMe-over-RPMsg to communicate with each other through a remote processor messaging (RPMsg) tunnel.
2. The communication method of claim 1, wherein the front end of the NVMe-over-RPMsg comprises: a customized non-volatile memory interface driver and a local RPMsg driver.
3. The communication method of claim 2, wherein the local RPMsg driver enumerates the NVMe-over-RPMsg destination on a para-virtualized Bus RPMsg-Virtio-Bus and provides an access interface to the customized non-volatile memory interface driver.
4. The method of claim 2, wherein the customized non-volatile memory interface driver parses an input/output (I/O) request and converts the I/O request into the non-volatile memory interface command.
5. The communication method according to claim 2, wherein the RPMsg tunnel is established by:
loading the local RPMsg driver after the guest operating system is started;
the local RPMsg driver creates an abstract RPMsg device and registers a callback function for the corresponding RPMsg tunnel;
suspending the local RPMsg driver until a name service notification is received from the remote operating system;
and the guest operating system sends a name service confirmation message to the remote operating system so as to establish the RPMsg tunnel.
6. The communication method according to claim 5, wherein sending the nonvolatile memory interface command sent from the guest operating system to the emulated NVMe SSD controller of the destination of the NVMe-over-RPMsg comprises:
receiving an inter-core interrupt sent by the guest operating system, and calling the callback function;
causing the emulated NVMe SSD controller to process RPMsg data packets;
processing a nonvolatile memory interface command received from the guest operating system by using a nonvolatile memory interface protocol parser;
and enabling the user-space non-volatile memory interface driver to implement read/write operations on the solid state disk by importing the source address, destination address and data size provided by the non-volatile memory interface protocol parser.
7. The communication method according to claim 6, wherein the RPMsg data packets comprise the following two types and are processed as follows:
for a first type, modifying a register of the emulated NVMe SSD controller according to the contents of the RPMsg data packet; and
for a second type, caching the non-volatile memory interface command and transmitting it to the non-volatile memory interface protocol parser.
8. A communication device based on non-volatile memory interface-remote processor messaging (NVMe-over-RPMsg) for virtualizing a remote storage system on a heterogeneous multi-core system-on-chip, the communication device comprising:
the remote operating system is identified as an NVMe-over-RPMsg destination by the guest operating system;
a customized nonvolatile memory interface driver, configured to encapsulate the NVMe-over-RPMsg destination into an NVMe SSD on the guest operating system, wherein the guest operating system includes a front end of the NVMe-over-RPMsg;
an emulated NVMe SSD controller for enabling the destination end of the NVMe-over-RPMsg to process a non-volatile memory interface command received from the guest operating system, wherein the destination end of the NVMe-over-RPMsg is implemented on the remote operating system; and
an RPMsg tunnel for enabling the front end of the NVMe-over-RPMsg and the destination end of the NVMe-over-RPMsg to communicate with each other.
9. The communication device of claim 8, wherein the front end of the NVMe-over-RPMsg comprises:
the customized non-volatile memory interface driver and the local RPMsg driver, wherein
The local RPMsg driver enumerates the NVMe-over-RPMsg destination end on a para-virtualized Bus RPMsg-Virtio-Bus and provides an access interface for a customized non-volatile memory interface driver; and
the customized non-volatile memory interface driver parses an input/output I/O request and converts the I/O request into the non-volatile memory interface command.
10. The communications device of claim 8, further comprising:
a non-volatile memory interface protocol parser for processing the non-volatile memory interface command received from the guest operating system; and
and the user space nonvolatile memory interface driver is used for importing the source address, the destination address and the data size provided by the nonvolatile memory interface protocol parser and realizing read/write operation on the SSD.
11. The communication apparatus of claim 8, wherein the emulated NVMe SSD controller is configured to:
receiving an inter-core interrupt sent by the guest operating system, and calling a callback function;
causing the emulated NVMe SSD controller to process RPMsg data packets;
processing a non-volatile memory interface command received from the guest operating system by using the non-volatile memory interface protocol parser;
enabling a user space nonvolatile memory interface driver to realize read/write operation on the SSD by importing a source address, a destination address and a data size provided by a nonvolatile memory interface protocol parser, wherein
the RPMsg data packets comprise the following two types and are processed as follows:
for a first type, modifying a register of the emulated NVMe SSD controller according to the contents of the RPMsg data packet; and
for the second type, caching the non-volatile memory interface command and transmitting the command to the non-volatile memory interface protocol parser.
12. An apparatus comprising a processor and a memory, wherein the processor is configured to perform the method of any one of claims 1-7.
13. The apparatus of claim 12, wherein the apparatus is a chip.
CN202111415577.5A, filed 2021-11-25: Communication method, device and equipment based on non-volatile memory interface-remote processor messaging (legal status: Pending)

Priority Applications (1)

CN202111415577.5A (priority and filing date 2021-11-25): Communication method, device and equipment based on non-volatile memory interface-remote processor messaging

Publications (1)

CN114397999A, published 2022-04-26

Family ID: 81225516
Cited By (2)

• EP4280070A1 (priority 2022-05-17, published 2023-11-22, Samsung Electronics Co., Ltd.): Systems and methods for solid state device (SSD) simulation (cited by examiner)
• CN116069453A (priority 2023-04-04, published 2023-05-05): Simulation system (cited by examiner)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination