WO2016068870A1 - Media controller with coordination buffer - Google Patents

Media controller with coordination buffer

Info

Publication number
WO2016068870A1
WO2016068870A1 (application PCT/US2014/062593)
Authority
WO
WIPO (PCT)
Prior art keywords
data
memory
computing nodes
shared memory
media controller
Prior art date
Application number
PCT/US2014/062593
Other languages
French (fr)
Inventor
Fred A. SPRAGUE
Gregg B. Lesartre
Original Assignee
Hewlett Packard Enterprise Development Lp
Priority date
Filing date
Publication date
Application filed by Hewlett Packard Enterprise Development Lp filed Critical Hewlett Packard Enterprise Development Lp
Priority to PCT/US2014/062593
Publication of WO2016068870A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/084 Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815 Cache consistency protocols
    • G06F12/0817 Cache consistency protocols using directory methods
    • G06F12/082 Associative directories
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer

Definitions

  • FIG. 2 illustrates an example of a media controller 210 that employs a coordination buffer 220 to control access to a segmented shared memory 230.
  • The memory 230 can include a private memory segment 240 reserved for a given node from the set of nodes described above. Each of the nodes can be assigned a separate private memory segment, where update requests to private memory do not require corroboration by the media controller 210 and can be applied directly upon request by the given node.
  • The memory 230 can also include a shared, non-controlled-access memory segment 250 that can be accessed by multiple computing nodes upon request and does not need to be validated by another computing node in the set before the media controller 210 proceeds to update it.
  • The memory 230 can also include a shared, controlled-access memory segment 260 that requires validation by the media controller 210 before it can be modified. For example, validation can include the requirement that at least two computing nodes from a set of computing nodes generate the same update request before the media controller 210 proceeds to modify the memory segment 260.
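The three access policies above amount to a small per-address lookup, sketched here in Python. This is an illustrative sketch only: the address boundaries, constant names, and function names are assumptions, not values from the disclosure.

```python
# Hypothetical policy map for the three segments of FIG. 2: a private
# segment per node (240), shared non-controlled memory (250), and shared
# controlled memory (260). Boundaries below are made-up example values.

PRIVATE, SHARED_UNCONTROLLED, SHARED_CONTROLLED = "private", "shared", "controlled"

def segment_for(address):
    """Classify an address into one of the three segment types."""
    if address < 0x1000:
        return PRIVATE               # segment 240: direct update, no corroboration
    if address < 0x2000:
        return SHARED_UNCONTROLLED   # segment 250: shared, but not validated
    return SHARED_CONTROLLED         # segment 260: requires matching requests

def needs_validation(address):
    """Only writes to the controlled segment are held for corroboration."""
    return segment_for(address) == SHARED_CONTROLLED

assert needs_validation(0x0800) is False   # private segment: no validation
assert needs_validation(0x2800) is True    # controlled segment: validated
```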
  • FIG. 3 illustrates an example of a network system 300 of computing nodes that communicate with media controllers that employ a coordination buffer to control access to a shared memory.
  • The system 300 can be deployed as a fault-tolerant system to serialize concurrent accesses by multiple redundancy controllers to fault-tolerant memory according to an example of the present disclosure. It should be understood that the system 300 may include additional components and that one or more of the components described herein may be removed and/or modified without departing from the scope of the system 300.
  • The system 300 may include multiple computing nodes 300A-N (where the number of computing nodes is greater than or equal to one), multiple redundancy controllers 302A-N, a network interconnect module 340, and memory modules 304A-M.
  • The multiple computing nodes 300A-N may be coupled to the memory modules 304A-M by the network interconnect module 340.
  • Each of the memory modules 304A-M may include a media controller 320A-M and a memory 321A-M. Each media controller, for instance, may communicate with its associated memory and control access to it.
  • The media controllers 320A-M provide access to regions of memory, with each media controller including a respective coordination buffer (not shown) as disclosed herein.
  • The regions of memory can be accessed by the multiple redundancy controllers 302A-N in the compute nodes 300A-N using access primitives such as read, write, lock, unlock, and so forth.
  • The media controllers 320A-M may be accessed by multiple redundancy controllers (e.g., acting on behalf of multiple servers).
  • The memory 321A-M may include volatile dynamic random access memory (DRAM) with battery backup, non-volatile phase change random access memory (PCRAM), spin transfer torque-magnetoresistive random access memory (STT-MRAM), resistive random access memory (reRAM), memristor, FLASH, or other types of memory devices.
  • The memory may be solid-state, persistent, dense, fast memory. Fast memory here means memory having an access time similar to that of DRAM.
  • The redundancy controllers 302A-N may maintain fault tolerance across the memory modules 304A-M.
  • A redundancy controller 302A-N may receive read or write commands from one or more processors, I/O devices, or other sources. In response to these, it generates sequences of primitive accesses to multiple media controllers 320A-M. The redundancy controller may also generate certain sequences of primitives independently, not directly resulting from processor commands; these include sequences used for scrubbing, initializing, migrating, or error-correcting memory, for example.
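As a rough illustration of how a redundancy controller might expand one processor command into a sequence of primitive accesses, the sketch below mirrors a single write across several media controllers. The primitive tuple encoding and the function name are assumptions for illustration; the disclosure does not specify this format, and a RAID layout would generate a different sequence.

```python
# Hypothetical expansion of one write command into lock/write/unlock
# primitives issued to each media controller (simple mirroring, not RAID).

def mirrored_write(address, data, media_controllers):
    """Return the primitive sequence that mirrors `data` across all modules."""
    primitives = []
    for mc in media_controllers:
        primitives.append(("lock", mc, address))          # serialize access
        primitives.append(("write", mc, address, data))   # replicate the data
        primitives.append(("unlock", mc, address))        # release the region
    return primitives

seq = mirrored_write(0x40, b"\x07", ["320A", "320B"])
assert len(seq) == 6
assert seq[1] == ("write", "320A", 0x40, b"\x07")
```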
  • FIG. 4 illustrates an example of a method 400 to control access to a shared memory.
  • The method 400 includes comparing a set of data update requests generated from a set of computing nodes (e.g., via the coordination buffer 120 of FIG. 1).
  • The method 400 includes determining whether the set of data update requests from the set of computing nodes represent the same data (e.g., via the media controller 110 of FIG. 1).
  • The method 400 includes modifying data in a shared memory if a subset of the data update requests generated from the set of computing nodes represents the same data (e.g., via the media controller 110 of FIG. 1).
  • The method 400 can also include sending a success flag to computing nodes that have succeeded with a respective update request associated with a positive vote, and sending a failure flag to computing nodes whose respective update requests were rejected and associated with a negative vote.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

A system includes a media controller to control data access from a set of computing nodes to a shared memory. A coordination buffer holds data update requests generated from the set of computing nodes to the media controller. The media controller enables data in the shared memory to be modified in accordance with the data update requests based on a comparison of the data update requests in the coordination buffer.

Description

MEDIA CONTROLLER WITH COORDINATION BUFFER
BACKGROUND
[0001] Nonstop class computing systems refer to systems that have redundant computing nodes such that if any one of the redundant computing nodes fails, the remaining nodes continue system operations. Nonstop class systems must also detect process failures and errors before data is committed to persistent memory. Increasingly, these systems are employing new non-volatile memory (NVM) technologies operating at near-main-memory latencies to improve system performance by allowing new flat memory hierarchies that permit servers to write directly to non-volatile memory as storage. These systems can also take advantage of both the shorter latencies offered by NVM technology and the reduced software management layers compared with current separate memory and storage system architectures.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] FIG. 1 illustrates an example of a media controller that employs a coordination buffer to control access to a shared memory.
[0003] FIG. 2 illustrates an example of a media controller that employs a coordination buffer to control access to a segmented shared memory.
[0004] FIG. 3 illustrates an example of a network of computing nodes that communicate with media controllers that employs a coordination buffer to control access to a shared memory.
[0005] FIG. 4 illustrates an example of a method to control access to a shared memory.
DETAILED DESCRIPTION
[0006] This disclosure relates to a media controller that employs a coordination buffer to control access to a shared memory. The media controller controls data access from a set of computing nodes to the shared memory by processing data update requests from the set of computing nodes. The data update requests represent data that a given computing node desires to modify at a selected address of the shared memory. If the data updates from a subset of computing nodes correlate (e.g., data requests from at least two nodes to the same shared memory address match), the media controller enables the shared memory to be modified in accordance with the data update request. For example, if one computing node requests that data be updated to a given data value, the media controller can hold off the actual modification of memory until at least one other node requests the same update to the same shared memory address.
[0007] Different control configurations are possible in the media controller. In some examples, all computing nodes in the set of computing nodes may have to request the same data update before modification (e.g., a write to a shared memory address) of the shared memory can commence. In another example, a proper subset of computing nodes (e.g., some number less than all of the computing nodes in the set) may request an update. If the subset of nodes generates the same update request, then the media controller updates the shared memory. The coordination buffer holds data update requests generated from the set of computing nodes to the media controller, where the data update requests represent desired updates to the shared memory. The media controller enables data in the shared memory to be modified in accordance with the data update requests based on a comparison of the data update requests in the coordination buffer. In one example, the media controller determines if a subset of the set of computing nodes generates the same data update request before enabling the data in the shared memory to be modified.
[0008] FIG. 1 illustrates an example of a media controller 110 that employs a coordination buffer 120 to control access to a shared memory 130. The shared memory 130 is typically a non-volatile memory (e.g., memristor, PCRAM, spin-torque memory, and so forth), although volatile memory can also be employed. The media controller 110 controls data access from a set of computing nodes 140, shown as nodes 1 through N, with N being a positive integer, to the shared memory 130 by processing data update requests from the set of computing nodes. The coordination buffer 120 holds the data update requests, shown as update requests 1 through M, with M being a positive integer, generated from the set of computing nodes 140 to the media controller 110. The data update requests held in the coordination buffer 120 represent desired data updates to the shared memory 130 at a selected address of memory as requested from the respective computing node from the set 140. As disclosed herein, the media controller 110 and coordination buffer 120 can be provided as a circuit and/or as part of a memory bus system to control access to the shared memory 130.
[0009] The media controller 110 enables data in the shared memory 130 to be modified in accordance with the data update requests based on a comparison of the data update requests in the coordination buffer 120. In one example, the media controller 110 determines if a subset of the set of computing nodes 140 generates the same data update request before enabling the data in the shared memory 130 to be modified. The data update requests represent data that a given computing node from the set of nodes 140 desires to modify at a selected address of the shared memory 130. If the data updates from a subset of computing nodes correlate (e.g., data from at least two nodes to the same shared memory address match), the media controller 110 enables the shared memory 130 to be modified in accordance with the data update request. For example, if one computing node requests that data be updated to a given data value, the media controller 110 can hold off the actual modification of shared memory 130 until at least one other node from the set 140 requests the same update to the same shared memory address.
[00010] Different control configurations are possible in the media controller 110. In some examples, all computing nodes in the set of computing nodes 140 may have to request the same data update before modification (e.g., a write to a shared memory address) of the shared memory 130 can commence. In another example, a proper subset of computing nodes (e.g., some number less than all of the computing nodes in the set) may request an update. If the proper subset of nodes (e.g., a simple majority, or a predetermined number defining a majority) generates the same update request, then the media controller 110 updates the shared memory 130. The shared memory 130 and media controller disclosed herein can be employed in one example to serialize concurrent accesses by multiple redundancy controllers to memory (see, e.g., FIG. 3). Redundancy controllers, for instance, may access memory using redundant array of independent disks (RAID) algorithms and/or memory mirroring to provide fault tolerance in the event of a shared memory module 304A-M failure.
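The hold-off behavior described above can be sketched in a few lines of Python. This is a hedged illustration, not the disclosed implementation: the class name, method name, and the fixed corroboration count of two are assumptions.

```python
# Minimal sketch: a write to a controlled address is held in the buffer
# until the required number of nodes request the identical update to the
# same address; only then is the commit to shared memory allowed.

class CoordinationBuffer:
    def __init__(self, required_matches=2):
        self.pending = {}                 # address -> [data, set of nodes]
        self.required = required_matches

    def request_update(self, node, address, data):
        """Return True when the update is corroborated and may be committed."""
        if address not in self.pending:
            self.pending[address] = [data, {node}]
            return False                  # first request: held pending
        held_data, nodes = self.pending[address]
        if held_data != data:
            return False                  # mismatched data: not corroborated
        nodes.add(node)
        if len(nodes) >= self.required:
            del self.pending[address]     # commit to shared memory would go here
            return True
        return False

buf = CoordinationBuffer()
assert buf.request_update("node1", 0x1000, b"\x2a") is False  # held
assert buf.request_update("node2", 0x1000, b"\x2a") is True   # corroborated
```

Note that a repeated request from the same node does not count as corroboration, since the requesting nodes are tracked as a set.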
[00011] The computing nodes in the set 140 can include a central processing unit (CPU) that can include a single core or multiple cores, where each core is given similar or dissimilar permissions by the media controller to access the shared memory 130. The CPU can also be bundled with other CPUs to perform a server and/or client function, for example. Multiple servers and/or clients can be employed to access the memory 130 via the media controller 110 (or controllers). Thus, the media controller 110 can control and facilitate access to the memory 130 with respect to a single CPU core, multiple CPU cores, multiple servers, and/or multiple clients, for example.
[00012] In some examples, the media controller 110 can be provided as part of a memory bus architecture to provide access to the shared memory 130. This can also include employment of a memory controller (not shown). In some examples, the functions of the memory controller and media controller 110 can be combined into a single integrated circuit. The media controller 110 controls aspects of the memory interface that are specific to the type of medium attached (e.g., various non-volatile memory types, DRAM, flash, and so forth). These may include, for example, media-specific decoding or interleave (e.g., row/column/bank/rank), media-specific wear management (e.g., wear leveling), media-specific error management (e.g., ECC correction, CRC detection, wear-out relocation, device deletion), and/or media-specific optimization (e.g., conflict scheduling). If a memory controller is also employed, the memory controller controls aspects of the memory interface that are independent of the media but specific to the CPU or system features employed. This may include, for example, system address decoding (e.g., interleaving between multiple media controllers, if there are more than one) and the redundancy features described below with respect to FIG. 3 (e.g., RAID, mirroring, and so forth).
[00013] With a memory-based entity (e.g., no I/O directly controlled from a core), the comparison between multiple cores running the same application can be checked for proper operation by monitoring the changes they desire to make to shared memory 130. The memory subsystem including the media controller 110 (or a device between the cores and the memory subsystem) receives an update request for a change and waits for the "other" computing nodes from the set 140 to request the same operation before committing to the shared memory 130. Should the computing nodes not agree on the change, the majority can rule via the media controller 110, whereas the non-matching computing node's update may be rejected.
[00014] The media controller 110 can track new update requests (e.g., writes) to shared memory 130 in the coordination buffer 120, which holds the transaction pending until additional writes to the same memory location are received from the set of coordinated validating systems (e.g., computing nodes) in the set 140. The coordination buffer 120 can check any newly arriving writes against pending writes and, upon a match, compare the data to determine whether the update matches the pending transaction or not. Any number of validating systems may be supported in this memory configuration.
[00015] A matching data update request can be recorded as a "vote for" the update, whereas a mismatch can be recorded as a "vote against" via the coordination buffer 120. If three or more updates are considered, then subsequent updates can continue to be accumulated until all (or a subset of) cooperating systems' updates are observed, or until unresponsive systems are determined to be non-reporting and removed from the set 140. The prevalent data (e.g., that of the majority) can then be committed to shared memory 130. If only one update is received, then the updating system from the set of computing nodes 140 can be determined to be suspect, and the update is discarded (e.g., after a predetermined period of time or after a number of events have occurred). When a conclusion is reached by examining the multiple updates of the data, write responses can be returned to all (or a subset) of the updating systems, with success or failure indicated to each requesting node to trigger a suitable response to mismatched data (e.g., a success flag if the data is written, or an error flag if the update request is rejected by the media controller).
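The vote-accumulation and response flow in this paragraph can be sketched as follows. This is an illustrative sketch only; the function name, the data representation, and the strict-majority threshold are assumptions not specified by the disclosure.

```python
# Hypothetical majority-rule resolution: tally the proposed data values from
# the cooperating nodes, commit the prevalent value if it holds a strict
# majority, and return a per-node success/failure response.
from collections import Counter

def resolve_votes(votes):
    """votes: dict mapping node -> proposed data. Returns (committed, responses)."""
    if len(votes) < 2:
        # a lone updater is treated as suspect; discard the update
        return None, {node: "failure" for node in votes}
    tally = Counter(votes.values())
    winner, count = tally.most_common(1)[0]
    if count <= len(votes) // 2:
        return None, {node: "failure" for node in votes}   # no strict majority
    responses = {node: ("success" if data == winner else "failure")
                 for node, data in votes.items()}
    return winner, responses                               # commit `winner`

committed, resp = resolve_votes({"n1": b"A", "n2": b"A", "n3": b"B"})
assert committed == b"A"
assert resp == {"n1": "success", "n2": "success", "n3": "failure"}
```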
[00016] Example implementations may achieve tight consistency of data update timing, in which case a direct hardware implementation of the coordination buffer 120 can be sufficient to collect and process pending updates within the expected time window. Pending updates may be rejected in this model when the capacity of the coordination buffer 120 is overrun before the necessary corroborating updates have been received. Other example implementations may allocate a portion of shared memory 130 (see, e.g., FIG. 2) to gather pending updates still awaiting validating updates from other systems. Such a configuration can manage pending updates from a hardware front end as described herein. While some multisystem architectures may tend to have their timing drift relative to one another, the media controller 110 and coordination buffer 120 can tend to re-align (e.g., synchronize) the timing by aligning the completion indication of matching updates, for example.
[00017] Not all of the shared memory 130 needs to be considered part of the validated address space. For example, each computing node from the set 140 may be assigned address ranges that are private, to be used to accumulate data before software algorithms gather the data intended for cross-validation. Other address ranges may be shared but not validated, so that they may be used for communication between the cooperating systems to keep them in sync with one another. New cores (or systems) can be added (e.g., in a non-stop redundancy system) by temporarily stopping operation, mirroring the memory image of an existing core to the new core, and then continuing operation of all cores. As noted previously, non-volatile memories may be redundantly deployed, with mirroring and/or RAIDing of the data from the computing nodes 140 across non-volatile memory modules, with each independently validated.
[00018] In one example, the media controller 110 and coordination buffer 120 enable straightforward majority-rule (e.g., non-stop) operation within the memory media controller 110, allowing direct memory access performance without special hardware on the processor while providing the system reliability required of non-stop systems. With this non-stop model, a variety of commodity processors may be applied to this computing space. Alternatively, the management of pending transactions may be handled by a "machine-in-the-middle" that provides the described functionality before handing committed updates to standard memory modules. The media controller 110 and coordination buffer 120 provide for quick comparison and sign-off of matching writes to fulfill the expectations of processor load/store latencies.
[00019] FIG. 2 illustrates an example of a media controller 210 that employs a coordination buffer 220 to control access to a segmented shared memory 230. In this example, the memory 230 can include a private memory segment 240. The private memory segment 240 can be reserved for a given node from the set of nodes described above. In some examples, each of the nodes from the set of nodes can be assigned a separate private memory segment, where update requests to the private memory do not require corroboration by the media controller 210 and can be directly updated upon request by the given node. The memory 230 can also include shared non-controlled access memory 250. The memory 250 can be accessed by multiple computing nodes upon request yet does not need to be validated by another computing node in the set before the media controller proceeds to update the memory 250. The memory 230 can also include a shared controlled access memory segment 260. The memory segment 260 requires validation by the media controller 210 before it can be modified. For example, validation can include the requirement that at least two computing nodes from a set of computing nodes generate the same update request before the media controller 210 proceeds to modify the memory segment 260.
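The three segment types of FIG. 2 can be summarized as a small policy lookup (hypothetical Python sketch; the address ranges and helper name are illustrative assumptions, not from the disclosure):

```python
# Illustrative segment map: (start, end, policy) tuples. The policies
# mirror the three segment types of FIG. 2.
SEGMENTS = [
    (0x0000, 0x0FFF, "private"),             # reserved for one node (240)
    (0x1000, 0x1FFF, "shared-uncontrolled"),  # shared, no corroboration (250)
    (0x2000, 0x2FFF, "shared-controlled"),    # requires matching votes (260)
]

def access_policy(address):
    """Return the segment policy for an address and whether the media
    controller must collect corroborating update requests before writing."""
    for start, end, policy in SEGMENTS:
        if start <= address <= end:
            return policy, policy == "shared-controlled"
    raise ValueError("address outside mapped segments")
```

Under this sketch, a write to a private or shared non-controlled address commits immediately, while a write to a shared controlled address is held in the coordination buffer until validated.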
[00020] FIG. 3 illustrates an example of a network system 300 of computing nodes that communicate with media controllers that employ a coordination buffer to control access to a shared memory. The system 300 can be deployed as a fault-tolerant system to serialize concurrent accesses by multiple redundancy controllers to fault-tolerant memory according to an example of the present disclosure. It should be understood that the system 300 may include additional components and that one or more of the components described herein may be removed and/or modified without departing from a scope of the system 300. The system 300 may include multiple computing nodes 300A-N (where the number of computing nodes is greater than or equal to 1), multiple redundancy controllers 302A-N, a network interconnect module 340, and memory modules 304A-M.
[00021] The multiple compute nodes 300A-N may be coupled to the memory modules 304A-M by the network interconnect module 340. The memory modules 304A-M may include media controllers 320A-M and memories 321A-M. Each media controller, for instance, may communicate with its associated memory and control access to the memory. The media controllers 320A-M provide access to regions of memory, with each media controller including a respective coordination buffer (not shown) as disclosed herein. The regions of memory can be accessed by the multiple redundancy controllers 302A-N in the compute nodes 300A-N using access primitives such as read, write, lock, unlock, and so forth. In order to support aggregation or sharing of memory, the media controllers 320A-M may be accessed by multiple redundancy controllers (e.g., acting on behalf of multiple servers). Thus, there can be a many-to-many relationship between redundancy controllers and media controllers. The memories 321A-M may include volatile dynamic random access memory (DRAM) with battery backup, non-volatile phase change random access memory (PCRAM), spin transfer torque-magnetoresistive random access memory (STT-MRAM), resistive random access memory (ReRAM), memristor, flash, or other types of memory devices. For example, the memory may be solid state, persistent, dense, fast memory. Fast memory can be memory having an access time similar to DRAM memory.
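The redundancy-controller-to-media-controller relationship can be sketched for the mirroring case as follows (hypothetical Python, not part of the original disclosure; class names and the mirroring policy are illustrative assumptions, as the disclosure also contemplates RAID configurations):

```python
class MediaController:
    """Stand-in for a media controller fronting one memory module."""
    def __init__(self):
        self.memory = {}

    def write(self, address, data):
        self.memory[address] = data

    def read(self, address):
        return self.memory.get(address)

class RedundancyController:
    """Mirrors each write to every media controller it is coupled to,
    so the data survives the loss of any single memory module."""
    def __init__(self, media_controllers):
        self.mcs = list(media_controllers)

    def write(self, address, data):
        for mc in self.mcs:  # issue a primitive write to each module
            mc.write(address, data)

    def read(self, address):
        # Return the data from the first module that still holds it.
        for mc in self.mcs:
            value = mc.read(address)
            if value is not None:
                return value
        return None
```

Because several redundancy controllers may hold references to the same media controllers, this structure also reflects the many-to-many relationship noted above.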
[00022] As described in the disclosed examples, the redundancy controllers 302A-N may maintain fault tolerance across the memory modules 304A-M. A redundancy controller 302A-N may receive read or write commands from one or more processors, I/O devices, or other sources. In response to these, it generates sequences of primitive accesses to multiple media controllers 320A-M. The redundancy controllers 302A-N may also generate certain sequences of primitives independently, not directly resulting from processor commands. These include sequences used for scrubbing, initializing, migrating, or error-correcting memory, for example.
[00023] In view of the foregoing structural and functional features described above, an example method will be better appreciated with reference to FIG. 4. While, for purposes of simplicity of explanation, the method is shown and described as executing serially, it is to be understood and appreciated that the method is not limited by the illustrated order, as parts of the method could occur in different orders and/or concurrently with that shown and described herein. Such a method can be executed by various components, such as an integrated circuit, computer, or controller, for example.
[00024] FIG. 4 illustrates an example of a method 400 to control access to a shared memory. At 410, the method 400 includes comparing a set of data update requests generated from a set of computing nodes (e.g., via coordination buffer 120 of FIG. 1). At 420, the method 400 includes determining if the set of data update requests from the set of computing nodes represents the same data (e.g., via media controller 110 of FIG. 1). At 430, the method 400 includes modifying data in a shared memory if a subset of the data update requests generated from the set of computing nodes represents the same data (e.g., via media controller 110 of FIG. 1). The method 400 can also include sending a success flag to computing nodes that have succeeded with a respective update request associated with a positive vote and sending a failure flag to computing nodes that have been rejected for a respective update request associated with a negative vote.
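Steps 410-430 of method 400 can be expressed as a single decision function (hypothetical Python sketch; the `quorum` parameter is an illustrative assumption standing in for the "subset ... represents the same data" condition):

```python
from collections import Counter

def resolve_updates(update_requests, quorum=2):
    """Compare a set of update requests (node -> proposed data), commit
    the value proposed by at least `quorum` nodes, and return per-node
    success/failure flags (steps 410, 420, and 430 of method 400)."""
    tally = Counter(update_requests.values())   # step 410: compare requests
    value, count = tally.most_common(1)[0]      # step 420: find prevalent data
    committed = value if count >= quorum else None  # step 430: commit if quorum met
    flags = {node: ("success" if committed is not None and data == committed
                    else "failure")
             for node, data in update_requests.items()}
    return committed, flags
```

For instance, with requests {A: 1, B: 1, C: 2} the value 1 is committed and node C receives a failure flag; a lone unmatched request commits nothing.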
[00025] What have been described above are examples. It is, of course, not possible to describe every conceivable combination of components or methods, but one of ordinary skill in the art will recognize that many further combinations and permutations are possible. Accordingly, the invention is intended to embrace all such alterations, modifications, and variations that fall within the scope of this application, including the appended claims. Additionally, where the disclosure or claims recite "a," "an," "a first," or "another" element, or the equivalent thereof, it should be interpreted to include one or more than one such element, neither requiring nor excluding two or more such elements. As used herein, the term
"includes" means includes but not limited to, and the term "including" means including but not limited to. The term "based on" means based at least in part on.

Claims

CLAIMS What is claimed is:
1. A circuit, comprising:
a media controller to control data access from a set of computing nodes to a shared memory; and
a coordination buffer that holds data update requests generated from the set of computing nodes to the media controller, wherein the media controller enables data in the shared memory to be modified in accordance with the data update requests based on a comparison of the data update requests in the coordination buffer.
2. The circuit of claim 1, wherein the shared memory is employed by redundancy controllers in a redundant array of independent disks (RAID) or memory mirroring configuration.
3. The circuit of claim 1, wherein each data update request in the coordination buffer functions as a vote from a respective member of the set of computing nodes for updating the shared memory.
4. The circuit of claim 3, wherein update requests for data to a given address of the shared memory that match at least one other update request function as positive votes and update requests that do not match data to the given address of shared memory function as negative votes.
5. The circuit of claim 4, wherein the media controller sends a success flag to computing nodes that have succeeded with a respective update request associated with a positive vote and sends a failure flag to computing nodes that have been rejected for a respective update request associated with a negative vote.
6. The circuit of claim 4, wherein an update request that does not receive a matching positive vote is discarded by the media controller after a predetermined period of time.
7. The circuit of claim 1, wherein the media controller waits for a subset of the set of computing nodes to generate the same data update request before enabling the data in the shared memory to be modified.
8. The circuit of claim 7, wherein each of the set of computing nodes includes a redundancy controller to communicate with a plurality of media controllers.
9. The circuit of claim 1, wherein the shared memory is segmented such that a portion of memory is designated as private memory to a given computing node, a portion of memory is designated as shared non-controlled access between computing nodes, and a portion of memory is designated as shared controlled access between computing nodes.
10. The circuit of claim 1, wherein the media controller utilizes the coordination buffer to synchronize timing between multiple computing nodes from the set of computing nodes.
11. A system, comprising:
a media controller to control data access from a set of computing nodes to a shared memory; and
a coordination buffer that holds data update requests generated from the set of computing nodes for comparison by the media controller, wherein the media controller determines if a subset of the set of computing nodes generate the same data update request before enabling the data in the shared memory to be modified.
12. The system of claim 11, wherein update requests for data to a given address of the shared memory that match at least one other update request function as positive votes and update requests that do not match data to the given address of the shared memory function as negative votes.
13. The system of claim 12, wherein the media controller sends a success flag to computing nodes that have succeeded with a respective update request associated with a positive vote and sends a failure flag to computing nodes that have been rejected for a respective update request associated with a negative vote.
14. A method, comprising:
comparing, by a controller, a set of data update requests generated from a set of computing nodes;
determining, by the controller, if the set of data update requests from the set of computing nodes represent the same data; and
modifying, by the controller, data in a shared memory if a subset of the data update requests generated from the set of computing nodes represents the same data.
15. The method of claim 14, further comprising sending a success flag to computing nodes that have succeeded with a respective update request associated with a positive vote and sending a failure flag to computing nodes that have been rejected for a respective update request associated with a negative vote.
PCT/US2014/062593 2014-10-28 2014-10-28 Media controller with coordination buffer WO2016068870A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2014/062593 WO2016068870A1 (en) 2014-10-28 2014-10-28 Media controller with coordination buffer


Publications (1)

Publication Number Publication Date
WO2016068870A1 (en)

Family

ID=55857987

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2014/062593 WO2016068870A1 (en) 2014-10-28 2014-10-28 Media controller with coordination buffer

Country Status (1)

Country Link
WO (1) WO2016068870A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6601151B1 (en) * 1999-02-08 2003-07-29 Sun Microsystems, Inc. Apparatus and method for handling memory access requests in a data processing system
US20050108231A1 (en) * 2003-11-17 2005-05-19 Terrascale Technologies Inc. Method for retrieving and modifying data elements on a shared medium
US20070050574A1 (en) * 2005-09-01 2007-03-01 Hitachi, Ltd. Storage system and storage system management method
WO2008047070A1 (en) * 2006-10-17 2008-04-24 Arm Limited Handling of write access requests to shared memory in a data processing apparatus
US20140215159A1 (en) * 2010-09-23 2014-07-31 International Business Machines Corporation Managing concurrent accesses to a cache



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14904954

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14904954

Country of ref document: EP

Kind code of ref document: A1