WO2015088483A1 - Nosql database data consistency - Google Patents

Nosql database data consistency

Info

Publication number
WO2015088483A1
Authority
WO
WIPO (PCT)
Prior art keywords
heartbeat
consistency
barrier
database
timestamp
Application number
PCT/US2013/073933
Other languages
French (fr)
Inventor
Kimberly Keeton
Charles B. Morrey III
Sebastien TANDEL
Hamilton de Freitas COUTINHO
Joaquim Gomes Da Costa Eulalio DE SOUZA
Lucas Holz Boffo
Oskar Y. BATUNER
Alistair Veitch
Original Assignee
Hewlett-Packard Development Company, L.P.
Application filed by Hewlett-Packard Development Company, L.P.
Priority to PCT/US2013/073933
Publication of WO2015088483A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23Updating
    • G06F16/2365Ensuring data consistency and integrity

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for maintaining database data consistency for a NoSQL database management system (DBMS) includes, in response to a heartbeat event for one of a plurality of servers of a cluster, inserting a heartbeat timestamp value associated with the heartbeat event in a private database table. The method also includes determining a consistency barrier for a database based on a plurality of heartbeat timestamps stored in the private table. The database includes the private database table. The method further includes ingesting an update event for the database if a heartbeat timestamp associated with the update event is more recent than the heartbeat timestamp value for the one server, the heartbeat timestamp value comprising a consistency barrier.

Description

NOSQL DATABASE DATA CONSISTENCY
BACKGROUND
[0001] Not only SQL (NoSQL) database systems are increasingly used in Big Data environments with distributed clusters of servers. These systems store and retrieve data using less constrained consistency models than traditional relational databases, which allow for rapid access to, and retrieval of, their data.
[0002] The NoSQL databases implement the concept of eventual consistency. Eventual consistency means that data in a database might be inconsistent cluster-wide at certain points in time but will, over time, eventually become consistent. However, in scenarios where the inconsistency results from events generated outside the database, attempting to apply out-of-order updates to the database may result in execution errors.
BRIEF DESCRIPTION OF THE DRAWINGS
[0003] Certain examples are described in the following detailed description and in reference to the drawings, in which:
[0004] Fig. 1 is a block diagram of an example system for NoSQL database data consistency, according to an example;
[0005] Fig. 2 is a process flow chart of a method for NoSQL database data consistency, according to an example;
[0006] Fig. 3 is an example execution of the method for NoSQL database data consistency, according to an example;
[0007] Fig. 4 is a block diagram of an example system for NoSQL database data consistency, according to an example; and
[0008] Fig. 5 is a block diagram showing an example tangible, non-transitory, machine-readable medium that stores code for NoSQL database data consistency, according to an example.
DETAILED DESCRIPTION
[0009] In distributed systems, such as NoSQL, events for updating the database may be generated from multiple sources. As such, the order of events might not be respected when the updates are applied to the database. For example, an event that creates a table row may arrive after an event that deletes the same row. In such a scenario, the database may provide a result to a query that indicates the deleted row is still in existence. Furthermore, results may be returned with incomplete data. Another issue with inconsistent databases is that users querying the data do not know what data may be inconsistent, or when the data becomes outdated.
[0010] In an example method, a NoSQL database management system (DBMS) applies updates to database tables in a consistent way, even though a process ingesting updates into the DB is not able to give strict guarantees on that order of ingestion. Further, this method provides useful timing information regarding the freshness of the data. In an example, a consistency barrier is provided to resolve these issues. The consistency barrier specifies at what point in time events across the cluster have all been applied to the database. This enables NoSQL database systems to prevent events older than the consistency barrier from updating the database, and have their databases end in a consistent state.
[0011] Fig. 1 is a block diagram of a system 100 for NoSQL database data consistency, according to an example. The system 100 includes a NoSQL DBMS 102 running on clusters 104 composed of tens of servers 106. The NoSQL DBMS 102 uses a shared-nothing architecture. As such, the servers 106 do not maintain a state for their databases 108, but are coordinated by a master 110. Additionally, the DBMS 102 uses a service model that separates update processing from read-only queries. In this service model, updates to the database 108 (e.g., adds, modifies, and deletes) are observational, meaning that data updates provide new values that overwrite, or delete, existing data.
[0012] Each of the servers 106 may be the owner of some parts of specific databases 108 stored thereon. The databases 108 include object tables 112 where the update events are applied. The DBMS 102 uses a pipeline 114 to process the update events. The pipeline 114 includes three stages: ingest 116, sort 118, and merge 120. Once ingested, the update events are sorted and merged with the existing object tables 112. Each stage may be performed by one or more worker processes (workers). Specifically, the ingest stage 116 is performed by the scanner 122. The operation of the pipeline 114 is coordinated by the master process 110. In this way, the NoSQL DBMS 102 provides scalable, high-throughput ingest of updates, and quick read queries to a stale-but-consistent version of the data. The pipeline 114 also includes an id remapping stage 132, which transforms initial, temporary identifiers of update events into global identifiers. A minimal sketch of the three-stage flow follows.
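For illustration only, the following Python sketch models the ingest, sort, and merge stages described in paragraph [0012]. The function name, the sort key (timestamp, then Lamport Clock), and the representation of the object tables as a dictionary are assumptions for the sketch, not details from the patent; events are assumed to expose kind, timestamp, lamport, and object_id attributes.

```python
def run_pipeline_batch(events, scan, object_tables):
    """Sketch of the pipeline 114: ingest 116, sort 118, merge 120.

    `scan` is the ingest-stage accept/reject decision made by the
    scanner; `object_tables` maps object id -> latest applied event.
    All names here are illustrative assumptions.
    """
    ingested = [e for e in events if scan(e)]              # ingest 116
    ingested.sort(key=lambda e: (e.timestamp, e.lamport))  # sort 118
    for e in ingested:                                     # merge 120
        if e.kind == "delete":
            # Observational delete: remove the row if present.
            object_tables.pop(e.object_id, None)
        else:
            # Observational update: new value overwrites existing data.
            object_tables[e.object_id] = e
```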
[0013] Because updates are processed asynchronously, it may not be possible to determine the most current value of a column in the object tables 112. Rather, the most recent update may still be working its way through the pipeline 114. In an example, a scanner 122 selects events for the ingest stage based on a deletes table 124 and a freshness table 126.
[0014] The asynchronous processing of the DBMS 102 means that queries that read the object tables 112 provide results that represent an out-of-date view of the data. However, because multiple sources upload the update events, which are batched independently for different sources, it is not possible to provide a single point-in-time view of the entire system's data. Instead, the freshness table 126 provides a range, delimited by the oldest and newest timestamps of the events provided by the individual sources. In an example, a scanner 122 generates a consistency barrier 130 that provides information about the progress of all the servers 106 in the cluster 104. Each server 106 sends heartbeats to the scanner with a predetermined frequency. The heartbeat is a timestamp generated by a kernel 128 of the server 106.
[0015] The scanner 122 stores the most recent heartbeat for each server 106 in the freshness table 126, which is a private table, internal to the database 108. In this way, an upper bound on the possible delay of any heartbeat has been specified, making it possible to define a notion of progress in time of each server 106 in the cluster 104. The consistency barrier 130 is the oldest, i.e., minimum, timestamp of all the heartbeats stored in the freshness table 126. The consistency barrier 130 provides the guarantee that no event will be ingested with a timestamp older than the consistency barrier 130.
[0016] To address the challenge of ordering events, databases 108 often rely on clock synchronization, vector clocks, or consensus protocols, each with their own advantages and drawbacks. Network Time Protocol (NTP) is inexpensive, but does not provide a resolution finer than 20 microseconds. Hardware, such as a global positioning system (GPS), provides better precision than NTP, but 1) increases costs, and 2) may lose accuracy in the event of system failures.
Options such as a vector clock or Paxos work well, but can potentially increase the complexity of the system 100 and hinder its performance. Thus, in an example, a Lamport Clock is used to order the events. The Lamport Clock value is passed as an opaque value together with the timestamp associated with the update event. The scanner 122 uses this value to distinguish events that happen close together in time.
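Putting paragraphs [0014] and [0015] in code form, the following Python sketch models the freshness table 126 and the computation of the consistency barrier 130 as the minimum heartbeat. The class and method names (FreshnessTable, record_heartbeat, consistency_barrier) are illustrative assumptions; the patent does not specify an implementation.

```python
from threading import Lock


class FreshnessTable:
    """Most recent heartbeat timestamp per server (one row per server).

    Illustrative sketch of the freshness table 126 and consistency
    barrier 130; names and types are assumptions, not from the patent.
    """

    def __init__(self, servers):
        self._lock = Lock()
        # Seeded with 0.0 until a first heartbeat arrives from a server.
        self._heartbeats = {server: 0.0 for server in servers}

    def record_heartbeat(self, server, timestamp):
        # Keep only the newest heartbeat seen for each server.
        with self._lock:
            if timestamp > self._heartbeats[server]:
                self._heartbeats[server] = timestamp

    def consistency_barrier(self):
        # Oldest (minimum) heartbeat across all servers: no event with
        # an older timestamp can still be ingested.
        with self._lock:
            return min(self._heartbeats.values())

    def newest_heartbeat(self):
        with self._lock:
            return max(self._heartbeats.values())
```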
[0017] A Lamport Clock typically relies on the underlying system determining an order of the events; however, such an ordering would hinder the performance of the system 100. As such, in an example, a Lamport Clock is used for each object in the object tables 112. Accordingly, each event related to a specific object increments the Lamport Clock of the related object. This approach allows better scaling and guarantees a strict order between the events, avoiding inconsistent states in the database 108.
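A sketch of the per-object Lamport Clock check from paragraph [0017] is below. The registry and its methods are hypothetical names introduced for the sketch; the patent describes one Lamport Clock per object but does not name a data structure for it.

```python
class ObjectClocks:
    """Per-object Lamport Clocks, as described in paragraph [0017].

    Hypothetical helper; names are assumptions for illustration.
    """

    def __init__(self):
        self._applied = {}  # object id -> highest Lamport Clock applied

    def should_ingest(self, object_id, event_clock):
        # Ingest only events strictly newer than the last one applied
        # to this object; older (out-of-order) events are rejected.
        return event_clock > self._applied.get(object_id, 0)

    def mark_applied(self, object_id, event_clock):
        if event_clock > self._applied.get(object_id, 0):
            self._applied[object_id] = event_clock
```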
[0018] Some updates may be applied to multiple rows of the object tables 112. Because dependent events might arrive out-of-order, the multi-row updates may be applied over a period of time. For example, in the case of a row deleted from the object tables 112, associated rows that arrive out of order may be prevented from being ingested into the pipeline 114. In an example, the deletes table 124 stores the events representing the deleted rows to enable such functionality. The deletes table 124 may be a private table referenced by the scanner 122 during pipeline processing. Additionally, the consistency barrier 130 provides a way to reliably discard the delete events from the deletes table 124, to avoid applying them perpetually.
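A minimal sketch of the deletes table 124 from paragraph [0018], under the assumption that one delete per object id suffices for illustration; the class and method names are not from the patent.

```python
class DeletesTable:
    """Delete events retained until the consistency barrier passes them.

    Sketch of the deletes table 124: a recorded delete suppresses
    out-of-order ingests for the object, and is discarded once the
    barrier guarantees no older event can still arrive.
    """

    def __init__(self):
        self._deleted = {}  # object id -> timestamp of the delete event

    def record_delete(self, object_id, timestamp):
        self._deleted[object_id] = timestamp

    def suppresses(self, object_id):
        return object_id in self._deleted

    def prune(self, barrier):
        # Deletes at or before the barrier can never be contradicted by
        # a late event, so they no longer need to be retained.
        self._deleted = {
            oid: ts for oid, ts in self._deleted.items() if ts > barrier
        }
```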
[0019] Further, the freshness table 126 may be used to provide information to read query clients at query time by returning the query result along with a tuple containing the consistency barrier and the timestamp of the most recent heartbeat across all the servers. The consistency barrier 130 indicates that all events across the cluster 104 that precede the consistency barrier 130 have been applied to the object tables 112. Further, the consistency barrier 130 also gives clients hints about server failures or stalls. For example, if the consistency barrier 130 does not change over the course of multiple queries, this may indicate that one of the servers 106 is stuck. This may enable a user to act upon such a scenario sooner than is possible using current methods.
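The query-time behavior of paragraph [0019] could be sketched as follows, reusing the hypothetical FreshnessTable above; the returned dictionary shape and the stall heuristic are assumptions for illustration only.

```python
def answer_query(rows, freshness):
    """Attach the freshness tuple from paragraph [0019] to a result set.

    `freshness` is the FreshnessTable sketched earlier; the result
    shape is an assumption for the sketch.
    """
    return {
        "rows": rows,
        "consistency_barrier": freshness.consistency_barrier(),
        "newest_heartbeat": freshness.newest_heartbeat(),
    }


def barrier_stalled(previous_barrier, current_barrier):
    # A barrier that fails to advance across queries hints that a
    # server may be stuck or has stopped sending heartbeats.
    return current_barrier <= previous_barrier
```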
[0020] Fig. 2 is a process flow chart of an example method 200 for NoSQL database data consistency, according to an example. The method 200 begins at block 202 and repeats for each event provided by a server 106. At block 204, the scanner 122 updates the freshness table 126 in response to heartbeat events from the servers 106.
[0021] At block 206, the scanner 122 may determine the consistency barrier 130 based on the row in the freshness table 126 containing the oldest heartbeat timestamp. At block 208, the scanner 122 may remove delete events older than or equal to the consistency barrier from the deletes table 124.
[0022] At block 210, in response to an update event, the scanner 122 may determine whether the Lamport Clock for the event is older than the Lamport Clock for the object associated with the event. If not, the event may be ingested by the pipeline 114. If so, the event is ignored, and control flows back to block 202, where the next event is processed. At block 214, in response to read-only queries against the object tables 112, the consistency barrier 130 and the most recent heartbeat timestamp in the freshness table are presented with the results from the query. In this way, customers are enabled to create efficient queries against the database 108. For example, subsequent queries can be limited to retrieving data updates applied after the consistency barrier 130 of a previous query.
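Tying blocks 202-214 together, a sketch of the scanner's per-event loop using the hypothetical helpers above. The Event type, process_event function, and the `ingest` callable are all assumptions introduced for this sketch, not structures named by the patent.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str            # "heartbeat", "update", or "delete"
    server: str
    timestamp: float
    object_id: str = ""
    lamport: int = 0


def process_event(event, freshness, deletes, clocks, ingest):
    """One pass of the scanner loop (blocks 202-214 of method 200).

    `freshness`, `deletes`, and `clocks` are the sketched helpers;
    `ingest` is any callable that hands an event to the pipeline.
    """
    if event.kind == "heartbeat":
        freshness.record_heartbeat(event.server, event.timestamp)  # block 204
        deletes.prune(freshness.consistency_barrier())             # blocks 206-208
        return
    if event.kind == "delete":
        deletes.record_delete(event.object_id, event.timestamp)
        clocks.mark_applied(event.object_id, event.lamport)
        ingest(event)
        return
    # Update event (block 210): skip events that are stale for the
    # object, or that target a row already recorded as deleted.
    if deletes.suppresses(event.object_id):
        return
    if clocks.should_ingest(event.object_id, event.lamport):
        clocks.mark_applied(event.object_id, event.lamport)
        ingest(event)
```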
[0023] Fig. 3 is an example execution of the method for NoSQL database data consistency, according to an example. An example implementation of the system 100 includes a scale-out network-attached storage (NAS) solution capable of managing hundreds of millions of files and up to 16 PB of data in a single namespace. This system provides rich metadata search features through the use of the NoSQL DBMS. In the example implementation, file system events (e.g., create, delete, read, write, rename) are automatically tracked and used to update system metadata (e.g., file size, owner, last modified time). The implementation also manages custom metadata tags that can be added to files by the users. In this architecture, the file system is distributed across many different servers 106, and each server generates a stream of file system events that are batched together and ingested into the NoSQL DBMS 102. Certain operations, such as renaming a top-level directory or deleting all custom attributes associated with a particular file, result in changes to multiple rows of the object tables 112 (e.g., all files in the directory for a rename). To maintain consistency, such a multi-row operation appears atomic from the client's perspective. However, as discussed previously, because updates are processed asynchronously, some of the target rows for these multi-row operations may still be in the pipeline 114 when the multi-row operation is received (e.g., a new file is created in a directory before the directory is renamed). In such a scenario, the multi-row operation is applied to all target rows, even those in the pipeline 114.
[0024] Current approaches may order the events before applying any changes to the tables of a database 108. For example, the Paxos or Zab protocols could order the events if implemented at the file system level. However, such protocols are computationally expensive and would impact performance. Further, even with such a protocol, there would not be any information guaranteeing the age of the information in the database.
[0025] Fig. 3 represents an example execution 300 of the method 200 against such an example implementation, including a cluster 104 of three servers: nodes 1, 2, and 3. The example execution 300 shows events 302, grouped in columns, representing successive batches for the various nodes. Additionally, Fig. 3 shows the contents of example object tables: the FileObject table 304 and the attribute key-value (AKV) table 306. The FileObject table 304 contains file metadata, and the AKV table 306 contains custom metadata tags associated with the files represented in the FileObject table. In the example execution, each column represents a batch of updates from one node being applied to the database: the top of the column shows the batched updates, and the bottom shows the results applied to the tables.
[0026] In column 1, the batched events come from node1 and consist of four operations: 1) the creation of the file foo at t0.1, t0 being the timestamp and 1 the Lamport Clock value; 2) the creation of the user-defined key-values k1=v2 and k2=v2 associated with the file foo at, respectively, t1.3 and t2.4; and 3) a heartbeat (HB) from node1 at timestamp t1, meaning no further events are to come from node1 that have a timestamp smaller than or equal to t1. In response to the batched events 302, the FileObject table 304 has a new row inserted for the file foo; the AKV table 306 includes new rows for each new key-value; and the freshness table 308 has a new row inserted for node1 with the same timestamp, t1. In this example, the freshness table 308 is assumed to have the rows for nodes 2 and 3 before the events 302 in column 1.
[0027] In column 2, the batched events come from node2 and consist of two operations: 1) the HB from node2, and 2) another update to the user-defined key-value k1 for foo. The freshness table 308 is updated with the heartbeat timestamp for node2. However, there is already a key-value for the file foo with the same key (created at t1.3). Since the Lamport Clock already applied for this object (3) is greater than the batched event's Lamport Clock (2), this event is not ingested, avoiding an inconsistency that would otherwise have resulted from this out-of-order event.
[0028] In column 3, the batched events come from node3. In addition to the heartbeat event, there is an update event for the deletion of the file foo. As such, the file has been deleted from the FileObject table 304. Additionally, the rows in the AKV table associated with foo are also deleted. It is noted that the consistency barrier 130 is t1, as represented by the freshness table 308. As such, it is possible that update events with a timestamp greater than t1 might yet arrive. Accordingly, the delete operation is inserted into the Deletes table 310 until the consistency barrier passes t2.
[0029] In column 4, there is only one update event, creating a user-defined key-value for the file foo at t2.5. This is an out-of-order event, and because there is a delete operation for the file foo in the Deletes table 310, the event is not ingested and, hence, not inserted into the AKV table 306.
[0030] Column 5 shows one heartbeat event, demonstrating that each node sends a heartbeat even if no update event occurs on the node. In this way, the consistency barrier continues to progress. Accordingly, the freshness table 308 is updated, the consistency barrier reaches t2, and, as a result, the entry in the Deletes table 310 is removed.
[0031] As a result of the execution 300, it is possible to provide information about data freshness to customers when reading data from the database 108. Hence, if a client reads data from the FileObject table 304 at the second batched update, the result set will contain the file foo and a tuple {t1, t2}, indicating that no further data will be ingested into the database 108 that has a timestamp < t1. If the same client executes the same request later, it will receive an empty set and the tuple {t2, t3}, meaning that the cluster progressed in time and now all the data until t2 has been processed by the pipeline 114.
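Replaying the heartbeats of columns 1 through 3 with the FreshnessTable sketched earlier reproduces the {t1, t2} freshness tuple; the mapping of t0..t2 to 0.0..2.0 and the prior heartbeat values for nodes 2 and 3 are assumed for illustration.

```python
freshness = FreshnessTable(["node1", "node2", "node3"])
# Rows assumed to exist before column 1, per paragraph [0026].
freshness.record_heartbeat("node2", 1.0)
freshness.record_heartbeat("node3", 2.0)
freshness.record_heartbeat("node1", 1.0)  # HB from node1 at t1

print(freshness.consistency_barrier())   # 1.0 -> t1, the barrier
print(freshness.newest_heartbeat())      # 2.0 -> t2, newest heartbeat
# A read at this point would carry the tuple {t1, t2}.
```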
[0032] Express Query provides a few features no traditional NoSQL database is able to offer its clients: 1) atomic updates (from the client's perspective) of multiple rows in the face of asynchronous updates coming from different sources; 2) a flexible way for the application to choose the accuracy of the clock synchronization (NTP, GPS, Paxos, etc.), managed internally by the database; and 3) information regarding the freshness of the data, enabling clients to configure their processes and programs efficiently, as well as to react faster to some failure domains.
[0033] Fig. 4 is a block diagram of an example system 400 for NoSQL database data consistency, in accordance with embodiments. The functional blocks and devices shown in Fig. 4 may include hardware elements including circuitry, software elements including computer code stored on a tangible, non-transitory, machine-readable medium, or a combination of both hardware and software elements. Additionally, the functional blocks and devices of the system 400 are but one example of functional blocks and devices that may be implemented in examples. The system 400 can include any number of computing devices, such as cell phones, personal digital assistants (PDAs), computers, servers, laptop computers, or other computing devices.
[0034] The example system 400 can include clusters of database servers 402 having one or more processors 404 connected through a bus 406 to a storage 408. The storage 408 is a tangible, computer-readable medium for the storage of operating software, data, and programs, such as a hard drive or system memory. The storage 408 may include, for example, the basic input output system (BIOS) (not shown).
[0035] In an example, the storage 408 includes a DBMS 410, a scanner 412, a freshness table 414, a deletes table 416, object tables 420, and a kernel 422. The scanner 412 operates according to the techniques described herein. The server 402 can be connected through the bus 406 to a network interface card (NIC) 422. The NIC 422 can connect the database server 402 to a network 424 that connects the servers 402 of a cluster to various clients (not shown) that provide the queries. The network 424 may be a local area network (LAN), a wide area network (WAN), or another network configuration. The network 424 may include routers, switches, modems, or any other kind of interface devices used for interconnection. Further, the network 424 may include the Internet or a corporate network.
[0036] Fig. 5 is a block diagram showing an example tangible, non-transitory, machine-readable medium 500 that stores code for NoSQL database data consistency, according to an example. The machine-readable medium is generally referred to by the reference number 500. The machine-readable medium 500 may correspond to any typical storage device that stores computer-implemented instructions, such as programming code or the like. Moreover, the machine-readable medium 500 may be included in the storage 408 shown in Fig. 4. The machine-readable medium 500 includes a DBMS 506, which includes a scanner 508 that performs the techniques described herein.
[0037] In response to a heartbeat event for a server in a cluster, a heartbeat timestamp value associated with the heartbeat event is inserted into a private database table. The scanner determines a consistency barrier for a database based on the heartbeat timestamps stored in the private table. An update event for the database is ingested if the heartbeat timestamp for the update event is more recent than the consistency barrier.

Claims

What is claimed is:
1. A method for maintaining database data consistency for a NoSQL database management system (DBMS), the method comprising:
in response to a heartbeat event for one of a plurality of servers of a cluster, inserting a heartbeat timestamp value associated with the heartbeat event in a freshness database table;
determining a consistency barrier for a database comprising the freshness database table based on a plurality of heartbeat timestamps stored in the freshness database table; and
ingesting an update event for the database if a heartbeat timestamp associated with the update event is more recent than the heartbeat timestamp value for the one server, the heartbeat timestamp value comprising a consistency barrier.
2. The method of claim 1, wherein the consistency barrier comprises an oldest heartbeat timestamp of the heartbeat timestamps of the private database table.
3. The method of claim 2, comprising inserting a row in an object table comprising an object associated with the update event.
4. The method of claim 3, comprising performing a READ-ONLY query against the database, a result of the query comprising one or more rows of the object table, the consistency barrier, and a most recent heartbeat timestamp of the heartbeat timestamps.
5. The method of claim 4, the result comprising a node associated with the consistency barrier, and a node associated with the most recent heartbeat timestamp.
6. The method of claim 5, comprising determining that the node associated with the consistency barrier is experiencing a delay based on the consistency barrier and a predetermined upper bound.
7. The method of claim 6, comprising presenting an alert for the delay.
8. A system, comprising:
a plurality of clusters, each comprising a plurality of servers, the servers comprising:
a memory with computer-implemented instructions that when executed by a processor direct the processor to:
in response to a heartbeat event for one of the servers of a cluster, insert a heartbeat timestamp value associated with the heartbeat event in a private table;
determine a consistency barrier for a database comprising the private table based on a plurality of heartbeat timestamps stored in the private table; and
ingest an update event if a heartbeat timestamp associated with the update event is more recent than the heartbeat timestamp, the heartbeat timestamp comprising a consistency barrier.
9. The system of claim 8, wherein the consistency barrier comprises an oldest heartbeat timestamp of the heartbeat timestamps of the private table.
10. The system of claim 8, comprising computer-implemented instructions that when executed by the processor direct the processor to insert a row in an object table comprising an object associated with the update event.
11. The system of claim 10, comprising computer-implemented instructions that when executed by the processor direct the processor to perform a READ-ONLY query against the database, a result of the query comprising one or more rows of the object table, the consistency barrier, and a most recent heartbeat timestamp of the heartbeat timestamps.
12. The system of claim 1 1 , the result comprising a node associated with the consistency barrier, and a node associated with the most recent heartbeat timestamp.
13. The system of claim 8, comprising computer-implemented instructions that when executed by the processor direct the processor to determine that the node associated with the consistency barrier is experiencing a delay based on the consistency barrier and a predetermined upper bound.
14. A tangible, non-transitory, computer-readable medium comprising instructions that direct a processor to:
in response to a heartbeat event for one of the servers of a cluster, insert a heartbeat timestamp value associated with the heartbeat event in a private table;
determine a consistency barrier for a database comprising the private table based on a plurality of heartbeat timestamps stored in the private table; and
ingest an update event for the database if a heartbeat timestamp associated with the update event is more recent than the heartbeat timestamp, the heartbeat timestamp comprising a consistency barrier, wherein the consistency barrier comprises an oldest heartbeat timestamp of the heartbeat timestamps of the private table.
15. The computer-readable medium of claim 14, comprising instructions to direct a processor to:
insert a row in an object table comprising an object associated with the update event;
perform a READ-ONLY query against the object table, a result of the query comprising: one or more rows of the object table, the consistency barrier, and a most recent heartbeat timestamp of the heartbeat timestamps; and a node associated with the consistency barrier, and a node associated with the most recent heartbeat timestamp; and
determine that the node associated with the consistency barrier is experiencing a delay based on the consistency barrier and a predetermined upper bound.
PCT/US2013/073933 2013-12-09 2013-12-09 Nosql database data consistency WO2015088483A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2013/073933 WO2015088483A1 (en) 2013-12-09 2013-12-09 Nosql database data consistency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2013/073933 WO2015088483A1 (en) 2013-12-09 2013-12-09 Nosql database data consistency

Publications (1)

Publication Number Publication Date
WO2015088483A1

Family

ID=53371597

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/073933 WO2015088483A1 (en) 2013-12-09 2013-12-09 Nosql database data consistency

Country Status (1)

Country Link
WO (1) WO2015088483A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10318521B2 (en) 2016-11-29 2019-06-11 International Business Machines Corporation Query processing with bounded staleness for transactional mutations in NoSQL database
US10885007B2 (en) 2017-07-11 2021-01-05 International Business Machines Corporation Custom metadata extraction across a heterogeneous storage system environment
US11036690B2 (en) 2017-07-11 2021-06-15 International Business Machines Corporation Global namespace in a heterogeneous storage system environment
US11487730B2 (en) 2017-07-11 2022-11-01 International Business Machines Corporation Storage resource utilization analytics in a heterogeneous storage system environment using metadata tags
US11625373B2 (en) 2020-04-30 2023-04-11 International Business Machines Corporation Determining additions, deletions and updates to database tables

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110283045A1 (en) * 2010-04-12 2011-11-17 Krishnan Manavalan Event processing in a flash memory-based object store
US20120278398A1 (en) * 2011-04-27 2012-11-01 Lowekamp Bruce B System and method for reliable distributed communication with guaranteed service levels
US20130173530A1 (en) * 2009-12-14 2013-07-04 Daj Asparna Ltd. Revision control system and method
US20130226971A1 (en) * 2010-09-28 2013-08-29 Yiftach Shoolman Systems, methods, and media for managing an in-memory nosql database
US20130318060A1 (en) * 2011-09-02 2013-11-28 Palantir Technologies, Inc. Multi-row transactions


Similar Documents

Publication Publication Date Title
Armbrust et al. Delta lake: high-performance ACID table storage over cloud object stores
US11429641B2 (en) Copying data changes to a target database
US11995043B2 (en) Map-reduce ready distributed file system
US10025528B2 (en) Managing transformations of snapshots in a storage system
KR102579190B1 (en) Backup and restore in distributed databases using consistent database snapshots
JP5890071B2 (en) Distributed key value store
US8341134B2 (en) Asynchronous deletion of a range of messages processed by a parallel database replication apply process
US9069704B2 (en) Database log replay parallelization
US9569446B1 (en) Cataloging system for image-based backup
US20180373604A1 (en) Systems and methods of restoring a dataset of a database for a point in time
US9515878B2 (en) Method, medium, and system for configuring a new node in a distributed memory network
EP2746971A2 (en) Replication mechanisms for database environments
JP7263297B2 (en) Real-time cross-system database replication for hybrid cloud elastic scaling and high-performance data virtualization
US10372726B2 (en) Consistent query execution in hybrid DBMS
US20140101102A1 (en) Batch processing and data synchronization in cloud-based systems
US11599514B1 (en) Transactional version sets
WO2015088483A1 (en) Nosql database data consistency
US11169887B2 (en) Performing a database backup based on automatically discovered properties
US10013315B2 (en) Reverse snapshot clone
Pillai et al. Crash consistency
CN107209707B (en) Cloud-based staging system preservation
Vandiver et al. Eon mode: Bringing the vertica columnar database to the cloud
Margara et al. A model and survey of distributed data-intensive systems
US11709809B1 (en) Tree-based approach for transactionally consistent version sets
Pillai et al. Crash Consistency: Rethinking the Fundamental Abstractions of the File System

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 13899241; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 13899241; Country of ref document: EP; Kind code of ref document: A1)