WO2016068870A1 - Media controller with coordination buffer - Google Patents

Media controller with coordination buffer

Info

Publication number
WO2016068870A1
WO2016068870A1 (application PCT/US2014/062593)
Authority
WO
WIPO (PCT)
Prior art keywords
data
memory
computing nodes
shared memory
media controller
Prior art date
Application number
PCT/US2014/062593
Other languages
French (fr)
Inventor
Fred A. SPRAGUE
Gregg B. Lesartre
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp filed Critical Hewlett Packard Enterprise Development Lp
Priority to PCT/US2014/062593
Publication of WO2016068870A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/084 Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0817 Cache consistency protocols using directory methods
    • G06F12/082 Associative directories
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer

Definitions

  • FIG. 2 illustrates an example of a media controller 210 that employs a coordination buffer 220 to control access to a segmented shared memory 230.
  • The memory 230 can include a private memory segment 240 reserved for a given node from the set of nodes described above. Each of the nodes can be assigned a separate private memory segment, where update requests to private memory do not require corroboration by the media controller 210 and can be applied directly upon request by the given node.
  • The memory 230 can also include a shared, non-controlled-access memory segment 250 that can be accessed by multiple computing nodes upon request and does not need to be validated by another computing node in the set before the media controller 210 proceeds to update it.
  • The memory 230 can also include a shared, controlled-access memory segment 260 that requires validation by the media controller 210 before it can be modified. For example, validation can include the requirement that at least two computing nodes from a set of computing nodes generate the same update request before the media controller 210 proceeds to modify the memory segment 260.
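The three access policies above amount to a small per-address lookup, sketched here in Python. This is an illustrative sketch only: the address boundaries, constant names, and function names are assumptions, not values from the disclosure.

```python
# Hypothetical policy map for the three segments of FIG. 2: a private
# segment per node (240), shared non-controlled memory (250), and shared
# controlled memory (260). Boundaries below are made-up example values.

PRIVATE, SHARED_UNCONTROLLED, SHARED_CONTROLLED = "private", "shared", "controlled"

def segment_for(address):
    """Classify an address into one of the three segment types."""
    if address < 0x1000:
        return PRIVATE               # segment 240: direct update, no corroboration
    if address < 0x2000:
        return SHARED_UNCONTROLLED   # segment 250: shared, but not validated
    return SHARED_CONTROLLED         # segment 260: requires matching requests

def needs_validation(address):
    """Only writes to the controlled segment are held for corroboration."""
    return segment_for(address) == SHARED_CONTROLLED

assert needs_validation(0x0800) is False   # private segment: no validation
assert needs_validation(0x2800) is True    # controlled segment: validated
```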
  • FIG. 3 illustrates an example of a network system 300 of computing nodes that communicate with media controllers that employ a coordination buffer to control access to a shared memory.
  • The system 300 can be deployed as a fault-tolerant system to serialize concurrent accesses by multiple redundancy controllers to fault-tolerant memory according to an example of the present disclosure. It should be understood that the system 300 may include additional components and that one or more of the components described herein may be removed and/or modified without departing from the scope of the system 300.
  • The system 300 may include multiple computing nodes 300A-N (where the number of computing nodes is greater than or equal to one), multiple redundancy controllers 302A-N, a network interconnect module 340, and memory modules 304A-M.
  • The multiple computing nodes 300A-N may be coupled to the memory modules 304A-M by the network interconnect module 340.
  • Each of the memory modules 304A-M may include a media controller 320A-M and a memory 321A-M. Each media controller, for instance, may communicate with its associated memory and control access to it.
  • The media controllers 320A-M provide access to regions of memory, with each media controller including a respective coordination buffer (not shown) as disclosed herein.
  • The regions of memory can be accessed by the multiple redundancy controllers 302A-N in the compute nodes 300A-N using access primitives such as read, write, lock, unlock, and so forth.
  • The media controllers 320A-M may be accessed by multiple redundancy controllers (e.g., acting on behalf of multiple servers).
  • The memory 321A-M may include volatile dynamic random access memory (DRAM) with battery backup, non-volatile phase change random access memory (PCRAM), spin transfer torque-magnetoresistive random access memory (STT-MRAM), resistive random access memory (reRAM), memristor, FLASH, or other types of memory devices.
  • The memory may be solid-state, persistent, dense, fast memory. Fast memory here means memory having an access time similar to that of DRAM.
  • The redundancy controllers 302A-N may maintain fault tolerance across the memory modules 304A-M.
  • A redundancy controller 302A-N may receive read or write commands from one or more processors, I/O devices, or other sources. In response to these, it generates sequences of primitive accesses to multiple media controllers 320A-M. The redundancy controller may also generate certain sequences of primitives independently, not directly resulting from processor commands; these include sequences used for scrubbing, initializing, migrating, or error-correcting memory, for example.
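As a rough illustration of how a redundancy controller might expand one processor command into a sequence of primitive accesses, the sketch below mirrors a single write across several media controllers. The primitive tuple encoding and the function name are assumptions for illustration; the disclosure does not specify this format, and a RAID layout would generate a different sequence.

```python
# Hypothetical expansion of one write command into lock/write/unlock
# primitives issued to each media controller (simple mirroring, not RAID).

def mirrored_write(address, data, media_controllers):
    """Return the primitive sequence that mirrors `data` across all modules."""
    primitives = []
    for mc in media_controllers:
        primitives.append(("lock", mc, address))          # serialize access
        primitives.append(("write", mc, address, data))   # replicate the data
        primitives.append(("unlock", mc, address))        # release the region
    return primitives

seq = mirrored_write(0x40, b"\x07", ["320A", "320B"])
assert len(seq) == 6
assert seq[1] == ("write", "320A", 0x40, b"\x07")
```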
  • FIG. 4 illustrates an example of a method 400 to control access to a shared memory.
  • The method 400 includes comparing a set of data update requests generated from a set of computing nodes (e.g., via the coordination buffer 120 of FIG. 1).
  • The method 400 includes determining whether the set of data update requests from the set of computing nodes represent the same data (e.g., via the media controller 110 of FIG. 1).
  • The method 400 includes modifying data in a shared memory if a subset of the data update requests generated from the set of computing nodes represents the same data (e.g., via the media controller 110 of FIG. 1).
  • The method 400 can also include sending a success flag to computing nodes that have succeeded with a respective update request associated with a positive vote, and sending a failure flag to computing nodes whose respective update requests were rejected and associated with a negative vote.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

A system includes a media controller to control data access from a set of computing nodes to a shared memory. A coordination buffer holds data update requests generated from the set of computing nodes to the media controller. The media controller enables data in the shared memory to be modified in accordance with the data update requests based on a comparison of the data update requests in the coordination buffer.

Description

MEDIA CONTROLLER WITH COORDINATION BUFFER
BACKGROUND
[0001] Nonstop class computing systems refer to systems that have redundant computing nodes such that if any one of the redundant computing nodes fails, the remaining nodes continue system operations. Nonstop class systems must also detect process failures and errors before data is committed to persistent memory. Increasingly, these systems are employing new non-volatile memory (NVM) technologies operating at near-main-memory latencies to improve system performance by allowing new flat memory hierarchies that permit servers to write directly to non-volatile memory as storage. These systems can also take advantage of both the shorter latencies offered by NVM technology and the reduced software management layers compared with current separate memory and storage system architectures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] FIG. 1 illustrates an example of a media controller that employs a coordination buffer to control access to a shared memory.
[0003] FIG. 2 illustrates an example of a media controller that employs a coordination buffer to control access to a segmented shared memory.
[0004] FIG. 3 illustrates an example of a network of computing nodes that communicate with media controllers that employs a coordination buffer to control access to a shared memory.
[0005] FIG. 4 illustrates an example of a method to control access to a shared memory.
DETAILED DESCRIPTION
[0006] This disclosure relates to a media controller that employs a coordination buffer to control access to a shared memory. The media controller controls data access from a set of computing nodes to the shared memory by processing data update requests from the set of computing nodes. The data update requests represent data that a given computing node desires to modify at a selected address of the shared memory. If the data updates from a subset of computing nodes correlate (e.g., data requests from at least two nodes to the same shared memory address match), the media controller enables the shared memory to be modified in accordance with the data update request. For example, if one computing node requests that data be updated to a given data value, the media controller can hold off the actual modification of memory until at least one other node requests the same update to the same shared memory address.
[0007] Different control configurations are possible in the media controller. In some examples, all computing nodes in the set of computing nodes may have to request the same data update before modification (e.g., a write to a shared memory address) of the shared memory can commence. In another example, a proper subset of computing nodes (e.g., some number less than all of the computing nodes in the set) may request an update. If the subset of nodes generates the same update request, then the media controller updates the shared memory. The coordination buffer holds data update requests generated from the set of computing nodes to the media controller, where the data update requests represent desired updates to the shared memory. The media controller enables data in the shared memory to be modified in accordance with the data update requests based on a comparison of the data update requests in the coordination buffer. In one example, the media controller determines if a subset of the set of computing nodes generates the same data update request before enabling the data in the shared memory to be modified.
[0008] FIG. 1 illustrates an example of a media controller 110 that employs a coordination buffer 120 to control access to a shared memory 130. The shared memory 130 is typically a non-volatile memory (e.g., memristor, PCRAM, spin-torque memory, and so forth), although volatile memory can also be employed. The media controller 110 controls data access from a set of computing nodes 140, shown as nodes 1 through N, with N being a positive integer, to the shared memory 130 by processing data update requests from the set of computing nodes. The coordination buffer 120 holds the data update requests, shown as update requests 1 through M, with M being a positive integer, generated from the set of computing nodes 140 to the media controller 110. The data update requests held in the coordination buffer 120 represent desired data updates to the shared memory 130 at a selected address of memory as requested from the respective computing node from the set 140. As disclosed herein, the media controller 110 and coordination buffer 120 can be provided as a circuit and/or as part of a memory bus system to control access to the shared memory 130.
[0009] The media controller 110 enables data in the shared memory 130 to be modified in accordance with the data update requests based on a comparison of the data update requests in the coordination buffer 120. In one example, the media controller 110 determines if a subset of the set of computing nodes 140 generates the same data update request before enabling the data in the shared memory 130 to be modified. The data update requests represent data that a given computing node from the set of nodes 140 desires to modify at a selected address of the shared memory 130. If the data updates from a subset of computing nodes correlate (e.g., data from at least two nodes to the same shared memory address match), the media controller 110 enables the shared memory 130 to be modified in accordance with the data update request. For example, if one computing node requests that data be updated to a given data value, the media controller 110 can hold off the actual modification of shared memory 130 until at least one other node from the set 140 requests the same update to the same shared memory address.
[00010] Different control configurations are possible in the media controller 110. In some examples, all computing nodes in the set of computing nodes 140 may have to request the same data update before modification (e.g., a write to a shared memory address) of the shared memory 130 can commence. In another example, a proper subset of computing nodes (e.g., some number less than all of the computing nodes in the set) may request an update. If the proper subset of nodes (e.g., a simple majority, or a predetermined number defining a majority) generates the same update request, then the media controller 110 updates the shared memory 130. The shared memory 130 and media controller disclosed herein can be employed in one example to serialize concurrent accesses by multiple redundancy controllers to memory (see, e.g., FIG. 3). Redundancy controllers, for instance, may access memory using redundant array of independent disks (RAID) algorithms and/or memory mirroring to provide fault tolerance in the event of a shared memory module 304A-M failure.
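The hold-off behavior described above can be sketched in a few lines of Python. This is a hedged illustration, not the disclosed implementation: the class name, method name, and the fixed corroboration count of two are assumptions.

```python
# Minimal sketch: a write to a controlled address is held in the buffer
# until the required number of nodes request the identical update to the
# same address; only then is the commit to shared memory allowed.

class CoordinationBuffer:
    def __init__(self, required_matches=2):
        self.pending = {}                 # address -> [data, set of nodes]
        self.required = required_matches

    def request_update(self, node, address, data):
        """Return True when the update is corroborated and may be committed."""
        if address not in self.pending:
            self.pending[address] = [data, {node}]
            return False                  # first request: held pending
        held_data, nodes = self.pending[address]
        if held_data != data:
            return False                  # mismatched data: not corroborated
        nodes.add(node)
        if len(nodes) >= self.required:
            del self.pending[address]     # commit to shared memory would go here
            return True
        return False

buf = CoordinationBuffer()
assert buf.request_update("node1", 0x1000, b"\x2a") is False  # held
assert buf.request_update("node2", 0x1000, b"\x2a") is True   # corroborated
```

Note that a repeated request from the same node does not count as corroboration, since the requesting nodes are tracked as a set.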
[00011] The computing nodes in the set 140 can include a central processing unit (CPU) that can include a single core or multiple cores, where each core is given similar or dissimilar permissions by the media controller to access the shared memory 130. The CPU can also be bundled with other CPUs to perform a server and/or client function, for example. Multiple servers and/or clients can be employed to access the memory 130 via the media controller 110 (or controllers). Thus, the media controller 110 can control and facilitate access to the memory 130 with respect to a single CPU core, multiple CPU cores, multiple servers, and/or multiple clients, for example.
[00012] In some examples, the media controller 110 can be provided as part of a memory bus architecture to provide access to the shared memory 130. This can also include employment of a memory controller (not shown). In some examples, the functions of the memory controller and media controller 110 can be combined into a single integrated circuit. The media controller 110 controls aspects of the memory interface that are specific to the type of medium attached (e.g., various non-volatile memory types, DRAM, flash, and so forth). These may include, for example, media-specific decoding or interleave (e.g., row/column/bank/rank), media-specific wear management (e.g., wear leveling), media-specific error management (e.g., ECC correction, CRC detection, wear-out relocation, device deletion), and/or media-specific optimization (e.g., conflict scheduling). If a memory controller is also employed, the memory controller controls aspects of the memory interface that are independent of the media but specific to the CPU or system features employed. This may include, for example, system address decoding (e.g., interleaving between multiple media controllers, if there are more than one) and the redundancy features described below with respect to FIG. 3 (e.g., RAID, mirroring, and so forth).
[00013] With a memory-based entity (e.g., no I/O directly controlled from a core), the comparison between multiple cores running the same application can be checked for proper operation by monitoring the changes they desire to make to shared memory 130. The memory subsystem including the media controller 110 (or a device between the cores and the memory subsystem) receives an update request for a change and waits for the "other" computing nodes from the set 140 to request the same operation before committing to the shared memory 130. Should the computing nodes not agree on the change, the majority can rule via the media controller 110, whereas the non-matching computing node's update may be rejected.
[00014] The media controller 110 can track new update requests (e.g., writes) to shared memory 130 in the coordination buffer 120, which holds the transaction pending until additional writes to the same memory location are received from the set of coordinated validating systems (e.g., computing nodes) in the set 140. The coordination buffer 120 can check any newly arriving writes against pending writes and, upon a match, compare the data to determine whether the update matches the pending transaction or not. Any number of validating systems may be supported in this memory configuration.
[00015] A matching data update request can be recorded as a "vote for" the update, whereas a mismatch can be recorded as a "vote against" via the coordination buffer 120. If three or more updates are considered, then subsequent updates can continue to be accumulated until all (or a subset of) cooperating systems' updates are observed, or until unresponsive systems are determined to be non-reporting and removed from the set 140. The prevalent data (e.g., that of the majority) can then be committed to shared memory 130. If only one update is received, then the updating system from the set of computing nodes 140 can be determined to be suspect, and the update is discarded (e.g., after a predetermined period of time or after a number of events have occurred). When a conclusion is reached by examining the multiple updates of the data, write responses can be returned to all (or a subset) of the updating systems, with success or failure indicated to each requesting node to trigger a suitable response to mismatched data (e.g., a success flag if the data is written, or an error flag if the update request is rejected by the media controller).
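The vote-accumulation and response flow in this paragraph can be sketched as follows. This is an illustrative sketch only; the function name, the data representation, and the strict-majority threshold are assumptions not specified by the disclosure.

```python
# Hypothetical majority-rule resolution: tally the proposed data values from
# the cooperating nodes, commit the prevalent value if it holds a strict
# majority, and return a per-node success/failure response.
from collections import Counter

def resolve_votes(votes):
    """votes: dict mapping node -> proposed data. Returns (committed, responses)."""
    if len(votes) < 2:
        # a lone updater is treated as suspect; discard the update
        return None, {node: "failure" for node in votes}
    tally = Counter(votes.values())
    winner, count = tally.most_common(1)[0]
    if count <= len(votes) // 2:
        return None, {node: "failure" for node in votes}   # no strict majority
    responses = {node: ("success" if data == winner else "failure")
                 for node, data in votes.items()}
    return winner, responses                               # commit `winner`

committed, resp = resolve_votes({"n1": b"A", "n2": b"A", "n3": b"B"})
assert committed == b"A"
assert resp == {"n1": "success", "n2": "success", "n3": "failure"}
```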
[00016] Example implementations may achieve tight consistency of data update timing, in which case a direct hardware implementation of the coordination buffer 120 can be sufficient to collect and process pending updates within the expected time window. Pending updates may be rejected in this model when the capacity of the coordination buffer 120 is overrun before the necessary corroborating updates have been received. Other example implementations may allocate a portion of shared memory 130 (see, e.g., FIG. 2) to gather pending updates still awaiting validating updates from other systems. Such a configuration can manage pending updates from a hardware front end as described herein. While some multisystem architectures may tend to have their timing drift relative to one another, the media controller 110 and coordination buffer 120 can tend to re-align (e.g., synchronize) the timing by aligning the completion indication of matching updates, for example.
[00017] Not all of the shared memory 130 needs to be considered part of the validated address space. For example, each computing node from the set 140 may be assigned address ranges that are private, to be used to accumulate data before software algorithms gather the data intended for cross-validation. Other address ranges may be shared but not validated, so that they may be used for communication between the cooperating systems to keep them in sync with one another. New cores (or systems) can be added (e.g., in a non-stop redundancy system) by temporarily stopping operation, mirroring the memory image of an existing core to the new core, and then continuing operation of all cores. As noted previously, non-volatile memories may be redundantly deployed, with mirroring and/or RAIDing of the data from the computing nodes 140 across non-volatile memory modules, with each independently validated.
[00018] In one example, the media controller 110 and coordination buffer 120 enable straightforward majority-rule (e.g., non-stop) operation within the memory media controller 110, allowing direct memory access performance without special hardware on the processor while providing the system reliability required of non-stop systems. With this non-stop model, a variety of commodity processors may be applied to this computing space. Alternatively, the management of pending transactions may be handled by a "machine-in-the-middle" that provides the described functionality before handing committed updates to standard memory modules. The media controller 110 and coordination buffer 120 provide for quick comparison and sign-off of matching writes to fulfill the expectations of processor load/store latencies.
[00019] FIG. 2 illustrates an example of a media controller 210 that employs a coordination buffer 220 to control access to a segmented shared memory 230. In this example, the memory 230 can include a private memory segment 240. The private memory segment 240 can be reserved for a given node from the set of nodes described above. In some examples, each of the nodes from the set of nodes can be assigned a separate private memory segment, where update requests to the private memory do not require corroboration by the media controller 210 and can be directly updated upon request by the given node. The memory 230 can also include shared non-controlled access memory 250. The memory 250 can be accessed by multiple computing nodes upon request yet does not need to be validated by another computing node in the set before the media controller proceeds to update the memory 250. The memory 230 can also include a shared controlled access memory segment 260. The memory segment 260 requires validation by the media controller 210 before it can be modified. For example, validation can include the requirement that at least two computing nodes from a set of computing nodes generate the same update request before the media controller 210 proceeds to modify the memory segment 260.
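The three segment types of FIG. 2 can be summarized as a small policy lookup (hypothetical Python sketch; the address ranges and helper name are illustrative assumptions, not from the disclosure):

```python
# Illustrative segment map: (start, end, policy) tuples. The policies
# mirror the three segment types of FIG. 2.
SEGMENTS = [
    (0x0000, 0x0FFF, "private"),             # reserved for one node (240)
    (0x1000, 0x1FFF, "shared-uncontrolled"),  # shared, no corroboration (250)
    (0x2000, 0x2FFF, "shared-controlled"),    # requires matching votes (260)
]

def access_policy(address):
    """Return the segment policy for an address and whether the media
    controller must collect corroborating update requests before writing."""
    for start, end, policy in SEGMENTS:
        if start <= address <= end:
            return policy, policy == "shared-controlled"
    raise ValueError("address outside mapped segments")
```

Under this sketch, a write to a private or shared non-controlled address commits immediately, while a write to a shared controlled address is held in the coordination buffer until validated.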
[00020] FIG. 3 illustrates an example of a network system 300 of computing nodes that communicate with media controllers that employ a coordination buffer to control access to a shared memory. The system 300 can be deployed as a fault-tolerant system to serialize concurrent accesses by multiple redundancy controllers to fault-tolerant memory according to an example of the present disclosure. It should be understood that the system 300 may include additional components and that one or more of the components described herein may be removed and/or modified without departing from a scope of the system 300. The system 300 may include multiple computing nodes 300A-N (where the number of computing nodes is greater than or equal to 1), multiple redundancy controllers 302A-N, a network interconnect module 340, and memory modules 304A-M.
[00021] The multiple compute nodes 300A-N may be coupled to the memory modules 304A-M by the network interconnect module 340. The memory modules 304A-M may include media controllers 320A-M and memories 321A-M. Each media controller, for instance, may communicate with its associated memory and control access to the memory. The media controllers 320A-M provide access to regions of memory, with each media controller including a respective coordination buffer (not shown) as disclosed herein. The regions of memory can be accessed by the multiple redundancy controllers 302A-N in the compute nodes 300A-N using access primitives such as read, write, lock, unlock, and so forth. In order to support aggregation or sharing of memory, the media controllers 320A-M may be accessed by multiple redundancy controllers (e.g., acting on behalf of multiple servers). Thus, there can be a many-to-many relationship between redundancy controllers and media controllers. The memories 321A-M may include volatile dynamic random access memory (DRAM) with battery backup, non-volatile phase change random access memory (PCRAM), spin transfer torque-magnetoresistive random access memory (STT-MRAM), resistive random access memory (ReRAM), memristor, flash, or other types of memory devices. For example, the memory may be solid state, persistent, dense, fast memory. Fast memory can be memory having an access time similar to DRAM memory.
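The redundancy-controller-to-media-controller relationship can be sketched for the mirroring case as follows (hypothetical Python, not part of the original disclosure; class names and the mirroring policy are illustrative assumptions, as the disclosure also contemplates RAID configurations):

```python
class MediaController:
    """Stand-in for a media controller fronting one memory module."""
    def __init__(self):
        self.memory = {}

    def write(self, address, data):
        self.memory[address] = data

    def read(self, address):
        return self.memory.get(address)

class RedundancyController:
    """Mirrors each write to every media controller it is coupled to,
    so the data survives the loss of any single memory module."""
    def __init__(self, media_controllers):
        self.mcs = list(media_controllers)

    def write(self, address, data):
        for mc in self.mcs:  # issue a primitive write to each module
            mc.write(address, data)

    def read(self, address):
        # Return the data from the first module that still holds it.
        for mc in self.mcs:
            value = mc.read(address)
            if value is not None:
                return value
        return None
```

Because several redundancy controllers may hold references to the same media controllers, this structure also reflects the many-to-many relationship noted above.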
[00022] As described in the disclosed examples, the redundancy controllers 302A-N may maintain fault tolerance across the memory modules 304A-M. A redundancy controller 302A-N may receive read or write commands from one or more processors, I/O devices, or other sources. In response to these, it generates sequences of primitive accesses to multiple media controllers 320A-M. The redundancy controllers 302A-N may also generate certain sequences of primitives independently, not directly resulting from processor commands. These include sequences used for scrubbing, initializing, migrating, or error-correcting memory, for example.
[00023] In view of the foregoing structural and functional features described above, an example method will be better appreciated with reference to FIG. 4. While, for purposes of simplicity of explanation, the method is shown and described as executing serially, it is to be understood and appreciated that the method is not limited by the illustrated order, as parts of the method could occur in different orders and/or concurrently with that shown and described herein. Such a method can be executed by various components, such as an integrated circuit, computer, or controller, for example.
[00024] FIG. 4 illustrates an example of a method 400 to control access to a shared memory. At 410, the method 400 includes comparing a set of data update requests generated from a set of computing nodes (e.g., via coordination buffer 120 of FIG. 1). At 420, the method 400 includes determining if the set of data update requests from the set of computing nodes represents the same data (e.g., via media controller 110 of FIG. 1). At 430, the method 400 includes modifying data in a shared memory if a subset of the data update requests generated from the set of computing nodes represents the same data (e.g., via media controller 110 of FIG. 1). The method 400 can also include sending a success flag to computing nodes that have succeeded with a respective update request associated with a positive vote and sending a failure flag to computing nodes that have been rejected for a respective update request associated with a negative vote.
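Steps 410-430 of method 400 can be expressed as a single decision function (hypothetical Python sketch; the `quorum` parameter is an illustrative assumption standing in for the "subset ... represents the same data" condition):

```python
from collections import Counter

def resolve_updates(update_requests, quorum=2):
    """Compare a set of update requests (node -> proposed data), commit
    the value proposed by at least `quorum` nodes, and return per-node
    success/failure flags (steps 410, 420, and 430 of method 400)."""
    tally = Counter(update_requests.values())   # step 410: compare requests
    value, count = tally.most_common(1)[0]      # step 420: find prevalent data
    committed = value if count >= quorum else None  # step 430: commit if quorum met
    flags = {node: ("success" if committed is not None and data == committed
                    else "failure")
             for node, data in update_requests.items()}
    return committed, flags
```

For instance, with requests {A: 1, B: 1, C: 2} the value 1 is committed and node C receives a failure flag; a lone unmatched request commits nothing.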
[00025] What have been described above are examples. It is, of course, not possible to describe every conceivable combination of components or methods, but one of ordinary skill in the art will recognize that many further combinations and permutations are possible. Accordingly, the invention is intended to embrace all such alterations, modifications, and variations that fall within the scope of this application, including the appended claims. Additionally, where the disclosure or claims recite "a," "an," "a first," or "another" element, or the equivalent thereof, it should be interpreted to include one or more than one such element, neither requiring nor excluding two or more such elements. As used herein, the term
"includes" means includes but not limited to, and the term "including" means including but not limited to. The term "based on" means based at least in part on.

Claims

CLAIMS What is claimed is:
1. A circuit, comprising:
a media controller to control data access from a set of computing nodes to a shared memory; and
a coordination buffer that holds data update requests generated from the set of computing nodes to the media controller, wherein the media controller enables data in the shared memory to be modified in accordance with the data update requests based on a comparison of the data update requests in the coordination buffer.
2. The circuit of claim 1, wherein the shared memory is employed by redundancy controllers in a redundant array of independent disks (RAID) or memory mirroring configuration.
3. The circuit of claim 1, wherein each data update request in the coordination buffer functions as a vote from a respective member of the set of computing nodes for updating the shared memory.
4. The circuit of claim 3, wherein update requests for data to a given address of the shared memory that match at least one other update request function as positive votes and update requests that do not match data to the given address of shared memory function as negative votes.
5. The circuit of claim 4, wherein the media controller sends a success flag to computing nodes that have succeeded with a respective update request associated with a positive vote and sends a failure flag to computing nodes that have been rejected for a respective update request associated with a negative vote.
6. The circuit of claim 4, wherein an update request that does not receive a matching positive vote is discarded by the media controller after a predetermined period of time.
7. The circuit of claim 1, wherein the media controller waits for a subset of the set of computing nodes to generate the same data update request before enabling the data in the shared memory to be modified.
8. The circuit of claim 7, wherein each of the set of computing nodes includes a redundancy controller to communicate with a plurality of media controllers.
9. The circuit of claim 1, wherein the shared memory is segmented such that a portion of memory is designated as private memory to a given computing node, a portion of memory is designated as shared non-controlled access between computing nodes, and a portion of memory is designated as shared controlled access between computing nodes.
10. The circuit of claim 1, wherein the media controller utilizes the coordination buffer to synchronize timing between multiple computing nodes from the set of computing nodes.
11. A system, comprising:
a media controller to control data access from a set of computing nodes to a shared memory; and
a coordination buffer that holds data update requests generated from the set of computing nodes for comparison by the media controller, wherein the media controller determines if a subset of the set of computing nodes generate the same data update request before enabling the data in the shared memory to be modified.
12. The system of claim 11, wherein update requests for data to a given address of the shared memory that match at least one other update request function as positive votes and update requests that do not match data to the given address of the shared memory function as negative votes.
13. The system of claim 12, wherein the media controller sends a success flag to computing nodes that have succeeded with a respective update request associated with a positive vote and sends a failure flag to computing nodes that have been rejected for a respective update request associated with a negative vote.
14. A method, comprising:
comparing, by a controller, a set of data update requests generated from a set of computing nodes;
determining, by the controller, if the set of data update requests from the set of computing nodes represent the same data; and
modifying, by the controller, data in a shared memory if a subset of the data update requests generated from the set of computing nodes represents the same data.
15. The method of claim 14, further comprising sending a success flag to computing nodes that have succeeded with a respective update request associated with a positive vote and sending a failure flag to computing nodes that have been rejected for a respective update request associated with a negative vote.
PCT/US2014/062593 2014-10-28 2014-10-28 Media controller with coordination buffer WO2016068870A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2014/062593 WO2016068870A1 (en) 2014-10-28 2014-10-28 Media controller with coordination buffer


Publications (1)

Publication Number Publication Date
WO2016068870A1 (en)

Family

ID=55857987

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/062593 WO2016068870A1 (en) 2014-10-28 2014-10-28 Media controller with coordination buffer

Country Status (1)

Country Link
WO (1) WO2016068870A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6601151B1 (en) * 1999-02-08 2003-07-29 Sun Microsystems, Inc. Apparatus and method for handling memory access requests in a data processing system
US20050108231A1 (en) * 2003-11-17 2005-05-19 Terrascale Technologies Inc. Method for retrieving and modifying data elements on a shared medium
US20070050574A1 (en) * 2005-09-01 2007-03-01 Hitachi, Ltd. Storage system and storage system management method
WO2008047070A1 (en) * 2006-10-17 2008-04-24 Arm Limited Handling of write access requests to shared memory in a data processing apparatus
US20140215159A1 (en) * 2010-09-23 2014-07-31 International Business Machines Corporation Managing concurrent accesses to a cache



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14904954

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14904954

Country of ref document: EP

Kind code of ref document: A1