WO2016148680A1 - Method and apparatus for key-value-store de-duplication leveraging relational database configuration - Google Patents

Info

Publication number
WO2016148680A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
key
database
keys
command
Prior art date
Application number
PCT/US2015/020661
Other languages
French (fr)
Inventor
Akira Deguchi
Original Assignee
Hitachi, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd.
Priority to PCT/US2015/020661
Publication of WO2016148680A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Definitions

  • the present invention relates generally to data storage, migration, and caching for database and storage system, and, more particularly, to key-value-store (KVS) leveraging relational database (RDB) configuration,
  • KVS key-value-store
  • RDB relational database
  • relational database and key value store in storage systems.
  • US2014/0012938 which is directed to preventing race condition from causing stale data items in cache, discloses the use of relational database as data tier and the use of key value store as cache tier. It is also known to minimize duplication (or redundancy) of record using relational database normalization techniques.
  • Exemplary embodiments of the invention provide ways to leverage the RDB configuration in the KVS environment.
  • One approach detects duplicated data by leveraging information of RDB normalization and uses it to reduce the size of the KVS data. This invention is used to reduce resource consumption in a multiple databases environment.
  • a computer system comprises: a memory storing a plurality of data records in a first database which has a first data structure configured by a plurality of tables; and a processor configured, when the processor copies the plurality of records from the first database to a second database, to determine if a data segment included in a data record of the plurality of data records should be stored in the second database based on information of the first data structure of the first database.
  • the second database has a second data structure which does not have a predefined data structure.
  • the information of the first data structure includes a plurality of keys of the plurality of tables.
  • the processor is further configured to: obtain a data record from the first database; split the data record to a plurality of data segments based on key information of the plurality of keys of the first database; and store a data segment of the plurality of data segments in the second database by using the plurality of keys included in the information of the first data structure as key information of the second data structure.
  • the processor is configured to store a data segment of the plurality of data segments in the second database if the second database does not have a key, of the plurality of keys, associated to the data segment.
  • the processor is configured not to store a data segment of the plurality of data segments in the second database if the second database has the key, of the plurality of keys, associated to the data segment.
  • the processor is further configured to add a key of a next data segment of a data segment to the data segment.
  • the processor is further configured to: obtain a plurality of data records from the first database by using one or more Structured Query Language (SQL) statements; and store SQL key information which indicates relationship between each SQL statement and one or more primary keys in the plurality of keys.
  • SQL Structured Query Language
  • the plurality of tables include a key table listing key and corresponding number of keys for each of the data records.
  • the processor is configured to receive a get command including a SQL statement, and to: obtain from the SQL key information the one or more primary keys corresponding to the SQL statement included in the get command, then for each of the plurality of data records obtained using one or more SQL statements, and for each primary key of the one or more primary keys, obtain from the key table a number of keys corresponding to said each primary key, obtain data from the second database using said each primary key, (a) if the number of keys is one, then integrate the obtained data for said each primary key, and (b) if the number of keys is greater than one, then extract a next key from a value of the obtained data, use the extracted next key to obtain a next data from the second database, repeat the extracting and the using until a total number of keys used to extract data from the second database equals the number of keys obtained from the key table for said each primary key, and integrate the obtained data for said each primary key; and after each of the plurality of data records obtained using one or more SQL statements have been processed, send the integrated data in reply to the get command
  • the processor is further configured to obtain a plurality of data records from the first database by using one or more Structured Query Language (SQL) statements; and if the data segment is first data in the obtained plurality of data records, use the SQL statement as a key of the data segment in the second database.
  • the plurality of tables include a key table listing key and corresponding number of keys for each of the data records.
  • the processor is configured to receive a get command including a key, and to: obtain from the key table a number of keys corresponding to the key included in the get command, obtain data from the second database using the key included in the get command, (a) if the number of keys is one, then check whether a next key extracted from a value of the obtained data is null or not, and if yes, then integrate the obtained data and send the integrated data in reply to the get command, and if no, then reset a processing number of times and repeat performing of (a) using the next key, and (b) if the number of keys is greater than one, then extract a next key from a value of the obtained data, use the extracted next key to obtain a next data from the second database, repeat the extracting and the using until a total number of keys used to extract data from the second database equals the number of keys obtained from the key table, and check whether a next key extracted from a value of the obtained data is null or not, and if yes, then integrate the obtained data and send the integrated data in reply to the get command, and if no, then reset a processing number of times and repeat performing of (b) using the next key.
  • the second data structure is a key-value-store (KVS) structure having KVS key and value.
  • the plurality of tables include a key table listing key and corresponding number of keys for each of the data records, wherein the plurality of tables include a pointer table listing key and corresponding number of pointers representing a number of the data records sharing value incorporating the listed key.
  • the processor is configured to receive a command and to: if the command is a get command including a key, obtain from the key table a number of keys corresponding to the key included in the get command, obtain data from the second database using the key included in the get command, (a) if the number of keys is one, then send the obtained data in reply to the get command, and (b) if the number of keys is greater than one, then extract a next key from a value of the obtained data, use the extracted next key to obtain a next data from the second database, repeat the extracting and the using until a total number of keys used to extract data from the second database equals the number of keys obtained from the key table, integrate all the obtained data, and send the integrated data in reply to the get command; if the command is a delete command including a key, obtain from the key table a number of keys corresponding to the key included in the delete command, and if the number of keys is greater than zero, then obtain data from the second database using the key included in the delete command, extract a next key from
  • Another aspect of the invention is directed to a method of processing a plurality of data records stored in a memory of a computer system, the first database having a first data structure configured by a plurality of tables.
  • the method comprises, when copying the plurality of records from the first database to a second database, determining if a data segment included in a data record of the plurality of data records should be stored in the second database based on information of the first data structure of the first database.
  • the information of the first data structure includes a plurality of keys of the plurality of tables.
  • the method further comprises: obtaining a data record from the first database; splitting the data record to a plurality of data segments based on key information of the plurality of keys of the first database; and storing a data segment of the plurality of data segments in the second database by using the plurality of keys included in the information of the first data structure as key information of the second data structure.
  • the method further comprises: storing a data segment of the plurality of data segments in the second database if the second database does not have a key, of the plurality of keys, associated to the data segment; and not storing a data segment of the plurality of data segments in the second database if the second database has the key, of the plurality of keys, associated to the data segment.
  • the method further comprises: obtaining a plurality of data records from the first database by using one or more Structured Query Language (SQL) statements; and storing SQL key information which indicates relationship between each SQL statement and one or more primary keys in the plurality of keys.
  • Another aspect of this invention is directed to a non-transitory computer-readable storage medium storing a plurality of instructions for controlling a data processor to process a plurality of data records stored in a memory of a computer system, the first database having a first data structure configured by a plurality of tables.
  • the plurality of instructions comprise instructions that cause the data processor, when copying the plurality of records from the first database to a second database, to determine if a data segment included in a data record of the plurality of data records should be stored in the second database based on information of the first data structure of the first database.
  • FIG. 1 illustrates an example of a computer in which the method and apparatus of the invention may be applied.
  • FIG. 2 illustrates a configuration in which the data stored in the RDB is copied, replicated, or migrated to the KVS according to a first embodiment of the present invention.
  • FIG. 3 shows an example of normalization of tables of RDB.
  • FIG. 4 shows an example of a diagram illustrating the storing of data in the KVS without de-duplication.
  • FIG. 5 shows an example of control information.
  • FIG. 6 shows an example of a program unit.
  • FIG. 7 shows an example of data in a KVS with de-duplication according to the first embodiment.
  • FIG. 8 shows an example of a key table.
  • FIG. 9 shows an example of a pointer table.
  • FIG. 10 shows an example of a flow diagram illustrating a process for the copy program and put program according to the first embodiment.
  • FIG. 11 shows an example of a flow diagram illustrating a process for the get program which receives the get command and sends the target data to the requester according to the first embodiment.
  • FIG. 12 shows an example of a flow diagram illustrating a process for the deletion program according to the first embodiment.
  • FIG. 13 shows an example of a flow diagram illustrating a process for the update program.
  • FIG. 14 shows an example of a flow diagram illustrating a process for the partial update program which updates a part of the KVS data.
  • FIG. 15 shows another example of data in a KVS with de-duplication according to a second embodiment of the invention.
  • FIG. 16 shows an example of a sql-key table which manages the relationship between SQL statement and primary key of the RDB according to the second embodiment.
  • FIG. 17 shows an example of a flow diagram illustrating a process for the copy program (2) and put program (2) which use SQL statement as a KVS key according to the second embodiment.
  • FIG. 18 shows another example of a flow diagram illustrating a process for a get program (2) according to the second embodiment.
  • FIG. 19 shows another example of a flow diagram illustrating a process for a delete program (2) according to the second embodiment.
  • FIG. 20 shows another example of a flow diagram illustrating a process for an update program (2) according to the second embodiment.
  • FIG. 21 shows another example of a flow diagram illustrating a process for a partial update program (2) according to the second embodiment.
  • FIG. 22 shows another example of data in a KVS with de-duplication according to a third embodiment of the invention.
  • FIG. 23 shows another example of a flow diagram illustrating a process for a put program (3) according to the third embodiment.
  • FIG. 24 shows another example of a flow diagram illustrating a process for a get program (3) according to the third embodiment.
  • FIG. 25 illustrates different configurations in which the data stored in the RDB is copied, replicated, or migrated to the KVS.
  • FIG. 26 illustrates another example of a computer in which the method and apparatus of the invention may be applied.
  • FIG. 27 illustrates yet another example of a computer in which the method and apparatus of the invention may be applied.
  • FIG. 28 shows another example of a flow diagram illustrating a process for a delete program (3) according to the third embodiment.
  • FIG. 29 shows another example of a flow diagram illustrating a process for an update program (3) according to the third embodiment.
  • FIG. 30 shows another example of a flow diagram illustrating a process for a partial update program (3) according to the third embodiment.
  • processing can include the actions and processes of a computer system or other information processing device that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other information storage, transmission or display devices.
  • the present invention also relates to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs.
  • Such computer programs may be stored in a computer-readable storage medium including non-transitory medium, such as, but not limited to optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other types of media suitable for storing electronic information.
  • the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus.
  • Various general-purpose systems may be used with programs and modules in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform desired method steps.
  • the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
  • the instructions of the programming language(s) may be executed by one or more processing devices, e.g., central processing units (CPUs), processors, or controllers.
  • Exemplary embodiments of the invention provide apparatuses, methods and computer programs for leveraging the RDB configuration in the KVS environment to reduce resource consumption.
  • FIG. 1 illustrates an example of a computer in which the method and apparatus of the invention may be applied.
  • the computer 100 has a processor 102, a memory 103, and a storage device 109.
  • the computer can have two or more processors and storage devices.
  • The OS (Operating System) 101 is included in memory 103.
  • the application 106 includes a program which realizes an online shopping site, online
  • the middleware 107 includes database software, development tools, or the like.
  • the cache unit 107 includes database software, development tools, or the like.
  • the storage device 109 stores data used by the middleware
  • the processor 102 is a computing resource.
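  • FIG. 2 illustrates a configuration in which the data stored in the RDB is copied, replicated, or migrated to the KVS according to a first embodiment of the present invention.
  • Database1 400 is an RDB and Database2 500 is a KVS.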
  • the copy program 300 reads the data from the RDB 400 and writes it to the KVS 500.
  • Any database that accesses and stores data by managing the relationship between key and data can be used as Database2 500.
  • the KVS is used as Database2 500.
  • FIG. 3 shows an example of normalization of tables of RDB.
  • The first table 600 is an employee table in which employee ID, name, and officeID are stored.
  • The ID is a primary key and the officeID is a foreign key.
  • The foreign key can hold a value stored in the officeID column of the office table, which is the second table 700.
  • The office table stores officeID, area, and address.
  • the use of two RDB tables in this example avoids duplication. If only one table is used to store the same data, the data of the office table 700 will be duplicated. In order to avoid this duplication, the design uses multiple tables and data is divided and stored in multiple tables.
  • the primary key is used as a primary identifier of each record while the foreign key is used to manage the relationship between the two tables.
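
The two-table design of FIG. 3 can be sketched with an in-memory SQLite database. This is only an illustration under assumptions: the column types and the use of SQLite are not taken from the source, and only the two employees whose values appear in the text are inserted.

```python
import sqlite3

# In-memory database; the schema mirrors the employee/office tables of FIG. 3.
# Column types are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE office (
        officeID TEXT PRIMARY KEY,
        area     TEXT,
        address  TEXT
    );
    CREATE TABLE employee (
        ID       INTEGER PRIMARY KEY,
        name     TEXT,
        officeID TEXT REFERENCES office(officeID)
    );
    INSERT INTO office VALUES ('b', 'Kanto', 'Yokohama');
    INSERT INTO employee VALUES (1, 'Hitachi Taro', 'b');
    INSERT INTO employee VALUES (2, 'Hitachi Hanako', 'b');
""")

# The office data is stored once; the foreign key links each employee to it.
office_rows = conn.execute("SELECT COUNT(*) FROM office").fetchone()[0]

# Joining the tables reproduces the full records, with the office data repeated.
joined = conn.execute("""
    SELECT e.ID, e.name, e.officeID, o.area, o.address
    FROM employee AS e JOIN office AS o ON e.officeID = o.officeID
    ORDER BY e.ID
""").fetchall()
```

The office row exists once in the RDB, while the joined result repeats it for every employee of that office.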
  • FIG. 4 shows an example of a diagram illustrating the storing of data in the KVS without de-duplication.
  • Three records read from the RDB (see FIG. 3) are stored in the KVS.
  • the ID which is the primary key of the RDB is used as a key of the KVS, and the other part of the RDB record is stored as a value of the KVS in this example.
  • the key and value are paired and stored as shown in FIG. 4.
  • An SQL query statement can also be used as a key of the KVS.
  • the record read from the RDB may include duplicated data.
  • The area and address information is duplicated data for ID:1 and ID:2 (Area:Kanto and Address:Yokohama).
  • the ID is not included in the value part in this example. In a different example, however, the ID can be included in the value part. Approaches to avoid this duplication are described herein below.
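
A minimal sketch of the storage scheme of FIG. 4, assuming a Python dict stands in for the KVS and each RDB record arrives as a dict of column values. The function name `copy_without_dedup` is hypothetical.

```python
# Two of the records read from the RDB (values as given in the text).
rows = [
    {"ID": 1, "Name": "Hitachi Taro", "OfficeID": "b",
     "Area": "Kanto", "Address": "Yokohama"},
    {"ID": 2, "Name": "Hitachi Hanako", "OfficeID": "b",
     "Area": "Kanto", "Address": "Yokohama"},
]


def copy_without_dedup(records):
    """Store each whole record under its primary key (no de-duplication)."""
    kvs = {}
    for record in records:
        key = f"ID:{record['ID']}"
        # The ID is used as the KVS key and is left out of the value part.
        kvs[key] = {col: val for col, val in record.items() if col != "ID"}
    return kvs


kvs = copy_without_dedup(rows)

# The same office data is now stored once per employee.
copies_of_address = sum(
    1 for value in kvs.values() if value["Address"] == "Yokohama"
)
```

Here `copies_of_address` counts how many KVS values carry the same address, which is the duplication the later embodiments aim to remove.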
  • FIG. 5 shows an example of control information.
  • Control information includes key table, pointer table, and sql-key table.
  • the key table manages the number of the RDB keys.
  • the pointer table manages the number of pointers to share the duplicated data.
  • the sql-key table manages the keys of the record which is searched by the sql statement. If the sql statement is used as the key of KVS, this table is used. Details of these tables are described herein below.
  • FIG. 6 shows an example of a program unit.
  • the program unit 104 includes copy program, put program, get program, delete program, update program, and partial update program. Details of these programs are described herein below.
  • FIG. 7 shows an example of data in a KVS with de-duplication according to the first embodiment.
  • The record read from the RDB (see FIG. 3) is separated and stored in the KVS individually to avoid duplication.
  • The first three data items use the ID, which is the primary key of the RDB, as the key of the KVS.
  • The value includes name and officeID information.
  • The ID can be included in the value part of the KVS record.
  • The last two data items use the officeID, which is the foreign key of the RDB, as the key of the KVS.
  • The value includes area and address information.
  • The officeID can be included in the value part of the KVS record.
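
The split layout of FIG. 7 can be sketched as follows. This is an illustration under assumptions: records are modelled as ordered dicts of columns, the key columns are passed in by name rather than by byte position, and the helper name `split_record` is hypothetical.

```python
def split_record(row, key_columns):
    """Split one RDB row into (kvs_key, value) segments at each key column.

    Each segment except the last also carries the next key column, so that
    the segments can be chained back together when the data is read.
    """
    columns = list(row)  # dicts keep insertion order
    starts = [columns.index(k) for k in key_columns]
    segments = []
    for i, start in enumerate(starts):
        is_last = i + 1 == len(starts)
        end = len(columns) if is_last else starts[i + 1]
        key_col = columns[start]
        value = {c: row[c] for c in columns[start + 1:end]}
        if not is_last:
            next_col = columns[end]
            value[next_col] = row[next_col]  # link to the next segment
        segments.append((f"{key_col}:{row[key_col]}", value))
    return segments


row = {"ID": 1, "Name": "Hitachi Taro", "OfficeID": "b",
       "Area": "Kanto", "Address": "Yokohama"}
segments = split_record(row, ["ID", "OfficeID"])
```

The first segment is keyed by the primary key and the second by the foreign key, matching the two groups of KVS data described above.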
  • FIG. 8 shows an example of a key table.
  • the key table has key and num-of-keys (number of keys) as attributes.
  • The first record indicates that the RDB record with ID:1 has two keys. They are ID and officeID. This information is used to read the data from the KVS.
  • The number of keys is managed for each record. However, because the number of keys of an RDB table is fixed, the number of keys can instead be managed for each table.
  • In that case, the key of the KVS and the table name are needed to access the KVS data.
  • The same RDB column name can exist in multiple tables. In such a case, it is necessary to manage the RDB name or table name in addition to the key, or to use different names of KVS databases on the Database2 side (see FIG. 2).
  • FIG. 9 shows an example of a pointer table.
  • the pointer table has key and num-of-pointers (number of pointers) as attributes.
  • This table manages how many records share the KVS data. For example, the value specified under "OfficeID:b" is shared by two records. These records are the record with ID:1 and the record with ID:2 (see FIG. 7). This information is used to decide whether the key-value pair can be deleted or not in a delete operation (see FIG. 12).
  • FIG. 10 shows an example of a flow diagram illustrating a process for the copy program and put program according to the first embodiment.
  • the copy program reads the data from the RDB and inserts the data into the KVS.
  • the put program receives a put command and stores data in the KVS.
  • the copy program issues a query to the RDB and obtains the RDB record data.
  • the copy program obtains the
  • In step S102, the program calculates the position of keys in the obtained record data and issues a put command to the KVS in step
  • This command includes the calculated key positions.
  • the put program receives the put command
  • The data is split by using the officeID key as the key for the second data.
  • The officeID key is included in the first data in order to integrate the split data when the data is read.
  • The program checks whether all split data has been stored in the KVS (S107). If the result is "no," the program checks whether the KVS already has KVS data whose key is the same as the notified RDB key (S108). For example, the program checks whether there is KVS data with "ID:1". In the second execution of step S108, the existence of KVS data with "OfficeID:b" is checked. If the result of step S108 is "yes," the program skips step S109.
  • If the result of step S108 is "no," the program stores the data as KVS data (S109).
  • The RDB key included in the data can be used as the KVS key (S109).
  • For the second split data, the foreign key is used as the KVS key. Then, the program increments the number of pointers in the pointer table (S110).
  • After step S110, the program returns to step S107 and executes the process for the next split data. For example, "OfficeID:b, Area:Kanto, Address:Yokohama" is processed. If the result of step S107 is "yes," the program proceeds to step S111, sends the completion message, and terminates the processing (S111). The copy program which receives the completion message terminates the processing (S112).
  • Steps S101 to S111 are executed for each of the records.
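
The loop of steps S107 to S110 can be sketched as below. This is a simplified illustration, not the patented implementation: dicts stand in for the KVS, the key table, and the pointer table, the record is assumed to be split already, and the point at which the key table is updated is an assumption because the text does not state it.

```python
def put(kvs, key_table, pointer_table, segments):
    """Store the split data of one record, skipping data the KVS already has."""
    # Assumption: the key table entry is written when the record is put.
    key_table[segments[0][0]] = len(segments)
    for key, value in segments:               # S107: loop over the split data
        if key not in kvs:                    # S108: does the KVS have this key?
            kvs[key] = dict(value)            # S109: store only if it is absent
        # S110: count one more record that refers to this KVS data.
        pointer_table[key] = pointer_table.get(key, 0) + 1


kvs, key_table, pointer_table = {}, {}, {}
office_b = {"Area": "Kanto", "Address": "Yokohama"}

put(kvs, key_table, pointer_table, [
    ("ID:1", {"Name": "Hitachi Taro", "OfficeID": "b"}),
    ("OfficeID:b", office_b),
])
put(kvs, key_table, pointer_table, [
    ("ID:2", {"Name": "Hitachi Hanako", "OfficeID": "b"}),
    ("OfficeID:b", office_b),
])
```

After both calls the office data is stored once, and its pointer count records that two records share it.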
  • FIG. 11 shows an example of a flow diagram illustrating a process for the get program which receives the get command and sends the target data to the requester according to the first embodiment. Because the data is split and stored in the KVS individually, the get command searches all KVS data, integrates the data, and sends the integrated data to the requester.
  • the get program receives a get command which includes a
  • the KVS key is the ID.
  • The program obtains the number of keys from the key table (S201) and executes steps S203 and S204 as many times as the obtained number.
  • the program checks if the obtained number of keys has been reached or not (S202). If the result is "no," the program obtains the KVS data using a key (S203).
  • the key sent from the requester will be the key which is used for the search in step S203.
  • the program extracts the RDB foreign key from the obtained value (S204).
  • The foreign key will be the next key, which is used for the search in step S203 in the next loop. In the last loop, the foreign key extraction is not necessary because the next loop processing is not needed.
  • After execution of step S204, the program returns to step S202 and repeats steps S203 and S204. If the result of step S202 is "yes," the program proceeds to step S205 and integrates all data obtained from the KVS (S205). Finally, the program sends the search result to the requester and terminates the processing (S206).
  • the first key which is sent from the requester is ID:1
  • the first obtained value is "Name:Hitachi Taro, OfficeID:b"
  • the first foreign key is "OfficeID:b."
  • the second obtained value is "OfficeID:b
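
The get flow of steps S201 to S206 can be sketched as follows. This is an illustration under assumptions: dicts stand in for the KVS and the key table, and the helper `next_key` assumes that the foreign key is the last field of each stored value, which the text does not specify.

```python
def next_key(value):
    """Build the next KVS key from the last field of a value (assumed layout)."""
    column, val = list(value.items())[-1]
    return f"{column}:{val}"


def get(kvs, key_table, key):
    """Follow the chain of keys and integrate the split data of one record."""
    num_keys = key_table[key]          # S201: number of keys for this record
    integrated = {}
    current = key
    for i in range(num_keys):          # S202: stop once num_keys is reached
        value = kvs[current]           # S203: obtain the KVS data
        integrated.update(value)
        if i + 1 < num_keys:           # S204: not needed in the last loop
            current = next_key(value)
    return integrated                  # S205/S206: integrate and reply


kvs = {
    "ID:1": {"Name": "Hitachi Taro", "OfficeID": "b"},
    "ID:2": {"Name": "Hitachi Hanako", "OfficeID": "b"},
    "OfficeID:b": {"Area": "Kanto", "Address": "Yokohama"},
}
key_table = {"ID:1": 2, "ID:2": 2}

record = get(kvs, key_table, "ID:1")
```

For the key ID:1, the first lookup yields the name and the foreign key, and the second lookup yields the shared office data.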
  • FIG. 12 shows an example of a flow diagram illustrating a process for the deletion program according to the first embodiment. Because some KVS data are shared, a shared counter/pointer should be considered when the program deletes the KVS data.
  • the delete program receives a delete command which includes a KVS key (S300).
  • the KVS key is the ID.
  • The program obtains the number of keys from the key table (S301) and executes steps S303 to S307 as many times as the obtained number.
  • the program checks if the obtained number of keys has been reached or not
  • The program extracts the RDB foreign key from the obtained value (S304).
  • The foreign key will be the next key which is used for the search in step S303 in the next loop.
  • In the last loop, the foreign key extraction is not necessary because the next loop processing is not needed.
  • The program decrements the number of pointers in the pointer table (S305). The program checks if the number of pointers is "0" or not (S306). If the result is "no," the program returns to step S302. If the result is "yes," the program executes the deletion of the data from the KVS (S307) and returns to step S302. After that, the program repeats steps S303 to S307. If the result of step S302 is "yes," the program proceeds to step S308, reports the completion message to the requester, and terminates the processing (S308).
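
The pointer-counted deletion described above can be sketched as follows. This is an illustration under assumptions: dicts stand in for the KVS and the control tables, the foreign key is assumed to be the last field of each value, and removing the key table and pointer table entries is an assumption not stated in the text.

```python
def next_key(value):
    """Build the next KVS key from the last field of a value (assumed layout)."""
    column, val = list(value.items())[-1]
    return f"{column}:{val}"


def delete(kvs, key_table, pointer_table, key):
    """Delete one record, keeping KVS data that other records still share."""
    num_keys = key_table.pop(key)             # obtain the number of keys
    current = key
    for i in range(num_keys):                 # S302
        value = kvs[current]                  # S303
        # S304: extract the foreign key, except in the last loop.
        following = next_key(value) if i + 1 < num_keys else None
        pointer_table[current] -= 1           # S305
        if pointer_table[current] == 0:       # S306
            del kvs[current]                  # S307
            del pointer_table[current]
        current = following


kvs = {
    "ID:1": {"Name": "Hitachi Taro", "OfficeID": "b"},
    "ID:2": {"Name": "Hitachi Hanako", "OfficeID": "b"},
    "OfficeID:b": {"Area": "Kanto", "Address": "Yokohama"},
}
key_table = {"ID:1": 2, "ID:2": 2}
pointer_table = {"ID:1": 1, "ID:2": 1, "OfficeID:b": 2}

delete(kvs, key_table, pointer_table, "ID:1")
after_first_delete = sorted(kvs)
delete(kvs, key_table, pointer_table, "ID:2")
```

The shared office data survives the first deletion because one record still points to it, and it is removed only when the last sharing record is deleted.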
  • FIG. 13 shows an example of a flow diagram illustrating a process for the update program.
  • The update program receives an update command with a RDB key and update data (S400) and splits the data by using keys (S401). This is the same as that for the put command. Then, the program checks if all the split data is stored in the KVS or not (S402). If the result is "no," the program checks if the KVS has the KVS data whose key is the same as the notified RDB key (S403). For example, the program checks if there is KVS data with "ID:1" or not. In the second execution of step S403, the existence of KVS data with "OfficeID:b" is checked.
  • If the result is "no" in step S403, the program sends an error message to the requester and terminates the processing (S405). Since this is an update operation, the KVS should have the KVS data whose key is the same as the notified RDB key. If the result is "yes" in step S403, the program overwrites the KVS data using the update data (S404) and returns to step S402. If the result of step S402 is "yes," the program sends the completion message to the requester and terminates the processing (S406).
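
A minimal sketch of the update loop of steps S402 to S406, assuming a dict stands in for the KVS and the update data has already been split into segments. The exception class `UpdateError` and the replacement name in the usage are hypothetical.

```python
class UpdateError(Exception):
    """Raised when an update targets a key that the KVS does not hold."""


def update(kvs, segments):
    """Overwrite each split data item; fail if a key is missing."""
    for key, value in segments:          # S402: loop over the split data
        if key not in kvs:               # S403: the key must already exist
            raise UpdateError(f"no KVS data for key {key}")  # S405
        kvs[key] = dict(value)           # S404: overwrite with the update data
    # S406: falling out of the loop corresponds to the completion message.


kvs = {
    "ID:1": {"Name": "Hitachi Taro", "OfficeID": "b"},
    "OfficeID:b": {"Area": "Kanto", "Address": "Yokohama"},
}

# Hypothetical update data for the record with ID:1.
update(kvs, [
    ("ID:1", {"Name": "Hitachi Jiro", "OfficeID": "b"}),
    ("OfficeID:b", {"Area": "Kanto", "Address": "Yokohama"}),
])

try:
    update(kvs, [("ID:9", {"Name": "Nobody", "OfficeID": "b"})])
    missing_key_rejected = False
except UpdateError:
    missing_key_rejected = True
```

As in the flow described above, the check and the overwrite are interleaved, so segments processed before a missing key is found have already been overwritten.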
  • FIG. 14 shows an example of a flow diagram illustrating a process for the partial update program which updates a part of the KVS data.
  • The partial update program receives a partial update command which includes an update address, update data, and an RDB key which can identify the target data (S410).
  • The program obtains the number of keys from the key table (S411) and executes steps S412 to S415 as many times as the obtained number.
  • The program checks if the obtained number of keys has been reached or not (S412). If the result is "no," the program obtains the KVS data using the key (S413). In the first loop, the key sent from the requester will be the key which is used for the search in step S413.
  • The program extracts the RDB foreign key from the obtained value (S413).
  • The foreign key will be the next key which is used for the search in step S413 in the next loop. In the last loop, the foreign key extraction is not necessary because the next loop processing is not needed.
  • The program checks if the data corresponds to the update target address or not (S415). If the result is "no," the program extracts the RDB foreign key from the value (S418, not necessary for the last loop since next loop processing is not needed) and returns to step S412 to execute the next loop. If the result is "yes," the program overwrites the KVS data using the update data (S416) and proceeds to step S417. If the result of step S412 is "yes," the program reports the completion message to the requester and terminates the processing (S417).
  • the number of update parts is one. In general, two or more update parts can be considered. As such, the program receives two or more update addresses and update data in step S410. The program returns to step S412 after execution of step S415 in order to update the next part.
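
The partial update walk can be sketched as below. This is an illustration under loud assumptions: the "update address" is modelled as a field name within the record, which the text does not define, dicts stand in for the KVS and the key table, and the foreign key is assumed to be the last field of each value. The new address value is hypothetical.

```python
def next_key(value):
    """Build the next KVS key from the last field of a value (assumed layout)."""
    column, val = list(value.items())[-1]
    return f"{column}:{val}"


def partial_update(kvs, key_table, key, field, new_value):
    """Overwrite one field of a record; return True if the field was found."""
    num_keys = key_table[key]              # S411
    current = key
    for i in range(num_keys):              # S412
        value = kvs[current]               # S413
        if field in value:                 # S415: is this the target data?
            value[field] = new_value       # S416
            return True                    # S417
        if i + 1 < num_keys:               # S418: skipped in the last loop
            current = next_key(value)
    return False


kvs = {
    "ID:1": {"Name": "Hitachi Taro", "OfficeID": "b"},
    "ID:2": {"Name": "Hitachi Hanako", "OfficeID": "b"},
    "OfficeID:b": {"Area": "Kanto", "Address": "Yokohama"},
}
key_table = {"ID:1": 2, "ID:2": 2}

updated = partial_update(kvs, key_table, "ID:1", "Address", "Tokyo")
not_found = partial_update(kvs, key_table, "ID:1", "Phone", "000")
```

In this sketch the address lives in the shared office data, so the overwrite is also visible through ID:2. Whether that is intended depends on the application.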
  • The number of keys of an RDB record is managed using the key table (see FIG. 8). Another method which does not use the key table can be considered. Such a method stores the number in the value part of the KVS instead of in the key table. For example, the put program adds the number to the head of the value. The get program searches the value by using the ID which is notified from the copy program and obtains the number from the head of the value. With the obtained number, the program can determine how many KVS data items should be searched.
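
A minimal sketch of this alternative, assuming values are stored as strings and the number is separated from the rest of the value by a "|" character. The separator and the function names are assumptions made for illustration.

```python
def put_with_count(kvs, segments):
    """Store split data; the first value is prefixed with the number of keys."""
    for index, (key, value) in enumerate(segments):
        if key in kvs:
            continue  # already stored, e.g. shared office data
        # Only the first segment, keyed by the primary key, carries the count.
        kvs[key] = f"{len(segments)}|{value}" if index == 0 else value


def read_count(kvs, key):
    """Return (number of keys, remaining value) from the head of a value."""
    head, _, rest = kvs[key].partition("|")
    return int(head), rest


kvs = {}
put_with_count(kvs, [
    ("ID:1", "Name:Hitachi Taro, OfficeID:b"),
    ("OfficeID:b", "Area:Kanto, Address:Yokohama"),
])

count, first_value = read_count(kvs, "ID:1")
```

With `count` in hand, a get program would know how many KVS lookups to chain without consulting a separate key table.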
  • the RDB primary key is used as the KVS key. Any other keys can be used.
  • the SQL statement can be the KVS key.
  • FIG. 15 shows another example of data in a KVS with de-duplication according to a second embodiment of the invention.
  • the SQL statement is the KVS key in this example.
  • SQL1 is a SQL statement which was used to search the RDB record "ID:1, Name:Hitachi Taro, OfficeID:b, Area:Kanto, Address:Yokohama."
  • SQL2 is a SQL statement which was used to search the RDB record "ID:2, Name:Hitachi Hanako, OfficeID:b,
  • a first method uses the control information to link the SQL statement and ID, as illustrated in FIG. 16, and the data structure of FIG. 7.
  • FIG. 16 shows an example of a sql-key table which manages the relationship between SQL statement and primary key of the RDB according to the second embodiment.
  • SQL1 corresponds to
  • SQL1 was 3 records and the primary keys of these three records were ID: 1 ,
  • This table can be managed by the application side or the KVS side. In the case with the application side, the table and processing can be implemented in the KVS client or caching program. User application is aware only of the SQL which is used to search. As such, it is not necessary that the application be aware of the ID of each searched RDB record. An access interface of KVS is not changed from the ID which is primary key of the record. If the table is managed by the KVS side, the access interface of KVS is changed to SQL statement. In that case, the copy program also notifies the boundary address information of records in addition to key position in a record in order to split the data to each of multiple records. The changed copy program and put program are illustrated in FIG. 17. These are named copy program (2) and put program (2).
  • FIG. 17 shows an example of a flow diagram illustrating a process for the copy program (2) and put program (2) which use SQL statement as a KVS key according to the second embodiment.
  • FIG. 17 contains some of the same steps as those in FIG. 10. Steps S500, S501, S502, S503, S504, S505, and S506 are different steps from those of the copy program and put program in FIG. 10.
  • the copy program (2) calculates the boundary addresses of the records in the data searched from the RDB (S500). Then, the copy program (2) calculates the key position for each searched record (S501). While step S102 in FIG. 10 calculates the key position for one record, step S501 in FIG. 17 performs the calculation for all records. After the calculations, the copy program (2) issues a put command
  • the put program (2) checks if all records are processed or not just after receiving the put command in step S105 (S503). If the result is "yes," the put program (2) sends a completion message to the requester and terminates the processing (S111). If the result is "no," the put program (2) determines one target record and updates the key table and the sql-key table (S504). By using the key position information notified from the copy program (2), the put program (2) obtains the key information, and stores the SQL statement and the obtained key information in the sql-key table. Then, the put program (2) splits the data by using RDB keys in step S106 and checks whether all data is stored in the KVS or not (S505). If the result is "yes," the put program (2) returns to step S503 and executes the same steps for the next record.
  • FIG. 18 shows another example of a flow diagram illustrating a process for a get program (2) according to the second embodiment.
  • the get program (2) which receives the SQL statement should be changed to search the result including two or more records.
  • FIG. 18 contains some of the same steps as those in FIG. 11. Steps S600, S601, and S602 are different steps from those of the get program in FIG. 11.
  • the get program (2) receives the get command including the
  • step S600 If the result is "no," the program determines one record as the processing target (S602). After that, the program executes search processing for one record in steps S201 to S205, which are the same as steps S201 to S205 in FIG. 11. After step S205, the program returns to step S601 to search the next record. If the result of step S601 is "yes," the program sends the result to the requester and terminates the processing (S206).
  • FIG. 19 shows another example of a flow diagram illustrating a process for a delete program (2) according to the second embodiment.
  • the delete program (2) can receive the SQL statement as key and search the result including two or more records.
  • FIG. 19 contains some of the same steps as those in FIG. 12. Steps S600, S601, and S602 are different steps from those of the delete program in FIG. 12. Also, these steps are the same as steps S600, S601, and S602 in FIG. 18. These steps are added between step S300 and step S301. If the result of step S302 is "yes," the program returns to step S601 instead of proceeding to step S308.
  • FIG. 20 shows another example of a flow diagram illustrating a process for an update program (2) according to the second embodiment.
  • the update program (2) can receive the SQL statement as key and search the result including two or more records.
  • FIG. 20 contains some of the same steps as those in FIG. 13. Steps S503 and S504 are different steps from those of the update program in FIG. 13. Also, these steps are the same as steps S503 and S504 in FIG. 17. These steps are added between step S400 and step S401. If the result of step S402 is "yes," the program returns to step S503 instead of proceeding to step S406.
  • FIG. 21 shows another example of a flow diagram illustrating a process for a partial update program (2) according to the second embodiment.
  • the partial update program (2) can receive the SQL statement as key and search the result including two or more records.
  • FIG. 21 contains some of the same steps as those in FIG. 14. Steps S600, S601, and S602 are different steps from those of the partial update program in FIG. 14. Also, these steps are the same as steps S600, S601, and S602 in FIG. 18. These steps are added between step S410 and step S411. If the result of step S412 is "yes," the program returns to step S601 instead of proceeding to step S416.
  • a second method for multi record KVS is used in a third embodiment.
  • This method uses the SQL statement as a KVS key for the first record which is included in the search result, and the ID is used as a KVS key for the record after the second record.
  • SQL1 is used to store the record "ID:1, Name:Hitachi Taro, OfficeID:b, Area:Kanto, Address:Yokohama" and ID:2 is used to store the record "ID:2, Name:Hitachi Hanako, OfficeID:b, Area:Kanto, Address:Yokohama."
  • FIG. 22 shows another example of data in a KVS with de-duplication according to a third embodiment of the invention.
  • SQL1 is a SQL statement which was used to search three RDB records with ID:1, ID:2, and ID:3. The requester specifies only the SQL statement for access.
  • the data with ID:1 can be searched by the SQL statement.
  • the data with ID:2 or ID:3 are not linked with the SQL statement.
  • "ID:2" is added to the value part of ID:1
  • ID:3 is added to the value part of ID:2.
  • the data with ID:2 and ID:3 are searched.
  • "Null" is added to the value part of ID:3 to indicate last record.
  • the different programs including the put program, the get program, delete program, update program, and partial update program are changed to treat this case.
  • FIG. 23 shows another example of a flow diagram illustrating a process for a put program (3) according to the third embodiment.
  • FIG. 23 contains some of the same steps as those in FIG. 17.
  • Steps S800, S801, S802, S803, and S804 are different steps from those of the put program (2) in FIG. 17.
  • the copy program (3) is the same as the copy program (2) described in FIG. 17.
  • Step S800 determines one target record. Update of the sql-key table is not needed (as opposed to step S504 in FIG. 17), because the second method does not use the sql-key table.
  • Steps S801 to S804 are executed between step S108 and step S110. If the result of step S108 is
  • step S801. The program checks if the target data is first data of the record or not. For example, in FIG. 22, "ID: 1 ,
  • step S801 or S803 If the result of step S801 or S803 is "no," the program proceeds to step S801 or S803
  • KVS key For example, ID:2, ID:3, OfficeID:a, or OfficeID:b is used as KVS key.
  • Step S110 is performed after step S804 or after step S109.
  • FIG. 24 shows another example of a flow diagram illustrating a process for a get program (3) according to the third embodiment.
  • FIG. 24 contains some of the same steps as those in FIG. 11. Steps S700, S701, and S702 are different steps from those of the get program in FIG. 11.
  • the get program (3) extracts key information for the next record from the tail of the value part (S700). If the result of step S202 is "yes," the program proceeds to step S701 instead of proceeding to step S205 (in FIG. 11).
  • step S701 the program checks if the key of the next data is null or not (S701). If the result is "no,” the program sets the key obtained in step S700 as the search key (S702) before returning to step S201.
  • the data stored in the RDB is copied, replicated, or migrated to the KVS.
  • the copy program executed the copy, replication, or migration processing.
  • the copy, replication, or migration processing can be executed as processes of the RDB or the KVS.
  • FIG. 28 shows another example of a flow diagram illustrating a process for a delete program (3) according to the third embodiment.
  • the delete program (3) can be realized by modifying the get program (3) in FIG. 24.
  • Steps S305, S306, and S307 are added just after step S204. These steps are the same as those of the delete program in FIG. 12. If the next key is null at S701, the program reports the completion and terminates the processing (S308).
  • FIG. 29 shows another example of a flow diagram illustrating a process for an update program (3) according to the third embodiment.
  • the update program (3) can be realized by modifying the put program (3) in FIG. 23.
  • Steps S900 and S901 are executed instead of steps S804 and S109, respectively.
  • Steps S900 and S901 each overwrite the data instead of storing the data.
  • FIG. 30 shows another example of a flow diagram illustrating a process for a partial update program (3) according to the third embodiment.
  • the partial update program (3) can be realized by modifying the get program (3) in FIG. 24. Steps S404 and S405 are added just after step S700. These steps are the same as those of the delete program in FIG. 12. If the result of step S404 is "no,” the program proceeds to step S204. If the result of step S404 is "yes,” the program overwrites the data (S405). After the overwrite, the program proceeds to step S701 to execute the same step for the next record.
  • FIG. 25 illustrates different configurations in which the data stored in the RDB is copied, replicated, or migrated to the KVS.
  • a first configuration 10 similar to FIG. 2
  • the program that executes the processing is independent from the databases.
  • the program is a part of the copy target database.
  • the program is a part of the copy source database.
  • FIG. 26 illustrates another example of a computer in which the method and apparatus of the invention may be applied.
  • the storage device has the computing resource, control information, program unit and storage media (as opposed to the memory 103 in FIG. 1).
  • A general-purpose processor, ASIC (application specific integrated circuit), FPGA (Field Programmable Gate Array), or the like can be the computing resource.
  • the key table, pointer table, and sql-key table are stored in the control information in the storage device.
  • the put program, get program, delete program, update program, and partial update program are stored in the program unit and executed inside the storage device.
  • the upper program is a user of the KVS. For example, the copy program mentioned above is an upper program.
  • FIG. 27 illustrates yet another example of a computer in which the method and apparatus of the invention may be applied.
  • the computer executes the upper program and the storage system executes the put program, get program, delete program, update program, and partial update program; they are separated.
  • the computer has storage I/F (interface) 108 and the storage system has storage I/F 202. These storage I/Fs are coupled via a network 110 and mediate communication between the computer and storage system.
  • the put program, get program, delete program, update program, and partial update program are stored in a storage program 208 and executed by a processor 203.
  • the key table, pointer table, and sql-key table are stored in storage control information 207.
  • FIGS. 1, 26, and 27 are purely exemplary of information systems in which the present invention may be implemented, and the invention is not limited to a particular hardware configuration.
  • the computers and storage systems implementing the invention can also have known I/O devices (e.g., CD and DVD drives, floppy disk drives, hard drives, etc.) which can store and read the modules, programs and data structures used to implement the above-described invention.
  • These modules, programs and data structures can be encoded on such computer-readable media.
  • the data structures of the invention can be stored on computer-readable media independently of one or more computer-readable media on which reside the programs used in the invention.
  • the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include local area networks, wide area networks, e.g., the Internet, wireless networks, storage area networks, and the like.
  • the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways.
  • the methods may be executed by a processor, such as a general purpose computer, based on instructions stored on a computer-readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A computer system comprises: a memory storing a plurality of data records in a first database which has a first data structure configured by a plurality of tables; and a processor configured, when the processor copies the plurality of records from the first database to a second database, to determine if a data segment included in a data record of the plurality of data records should be stored in the second database based on information of the first data structure of the first database. In some embodiments, the second database has a second data structure which does not have a predefined data structure. The information of the first data structure includes a plurality of keys of the plurality of tables.

Description

METHOD AND APPARATUS FOR KEY-VALUE-STORE DE-DUPLICATION LEVERAGING RELATIONAL DATABASE CONFIGURATION
BACKGROUND OF THE INVENTION
[0001] The present invention relates generally to data storage, migration, and caching for database and storage system, and, more particularly, to key-value-store (KVS) leveraging relational database (RDB) configuration.
[0002] The use of relational database and key value store in storage systems is known. For example, US2014/0012938, which is directed to preventing race condition from causing stale data items in cache, discloses the use of relational database as data tier and the use of key value store as cache tier. It is also known to minimize duplication (or redundancy) of record using relational database normalization techniques.
[0003] When the data stored in the relational database are copied, replicated, or migrated, information about normalization is not taken over. Thus, the duplicated data are stored in the KVS and the data size is increased. In many cases, the KVS database stores the data in the main memory. Because the main memory is an expensive part of the computer system, this results in high cost. When flash is used as the storage media, the endurance or durability of the flash is shortened by writing the duplicated data into the flash. On the other hand, if the KVS has de-duplication functionality, processing resource to execute de-duplication is needed. This causes a decrease in performance.
[0004] Exemplary embodiments of the invention provide ways to leverage the RDB configuration in the KVS environment. One approach detects duplicated data by leveraging information of RDB normalization and uses it to reduce the size of the KVS data. This invention is used to reduce resource consumption in a multiple databases environment.
[0005] In accordance with an aspect of the present invention, a computer system comprises: a memory storing a plurality of data records in a first database which has a first data structure configured by a plurality of tables; and a processor configured, when the processor copies the plurality of records from the first database to a second database, to determine if a data segment included in a data record of the plurality of data records should be stored in the second database based on information of the first data structure of the first database.
[0006] In some embodiments, the second database has a second data structure which does not have a predefined data structure. The information of the first data structure includes a plurality of keys of the plurality of tables. The processor is further configured to: obtain a data record from the first database; split the data record to a plurality of data segments based on key information of the plurality of keys of the first database; and store a data segment of the plurality of data segments in the second database by using the plurality of keys included in the information of the first data structure as key information of the second data structure.
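The splitting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `split_record`, the list-of-pairs record layout, and the assumption that the primary key is the first column of the record are hypothetical choices that do not appear in the specification. The foreign key is kept at the tail of the preceding segment so that the next segment can be located later.

```python
def split_record(fields, key_columns):
    """Split one RDB record into KVS segments at its key columns.

    fields      -- ordered list of (column, value) pairs (assumed layout)
    key_columns -- set of column names that are RDB primary/foreign keys
    Returns a list of (kvs_key, value_fields) pairs.
    Assumes the first column of the record is its primary key.
    """
    segments = []
    current_key = None
    current_fields = []
    for column, value in fields:
        if column in key_columns:
            if current_key is not None:
                # Keep the foreign key in the previous segment as the link
                # to the next segment, then close that segment.
                current_fields.append((column, value))
                segments.append((current_key, current_fields))
            current_key = f"{column}:{value}"
            current_fields = []
        else:
            current_fields.append((column, value))
    if current_key is not None:
        segments.append((current_key, current_fields))
    return segments


record = [("ID", "1"), ("Name", "Hitachi Taro"), ("OfficeID", "b"),
          ("Area", "Kanto"), ("Address", "Yokohama")]
for kvs_key, value in split_record(record, {"ID", "OfficeID"}):
    print(kvs_key, value)
```

With the example record used in this description, the sketch yields one segment keyed by `ID:1` and one keyed by `OfficeID:b`.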
[0007] In specific embodiments, the processor is configured to store a data segment of the plurality of data segments in the second database if the second database does not have a key, of the plurality of keys, associated to the data segment. The processor is configured not to store a data segment of the plurality of data segments in the second database if the second database has the key, of the plurality of keys, associated to the data segment. The processor is further configured to add a key of a next data segment of a data segment to the data segment. The processor is further configured to: obtain a plurality of data records from the first database by using one or more Structured Query Language (SQL) statements; and store SQL key information which indicates relationship between each SQL statement and one or more primary keys in the plurality of keys.
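The store-or-skip decision described above can be sketched as follows. This is a minimal sketch under assumptions: a plain dictionary stands in for the second database, and the names `put_segments`, `kvs`, and `key_table` are hypothetical. The key table here records the number of keys per record, in line with the key table referred to elsewhere in this description.

```python
def put_segments(kvs, key_table, segments):
    """Store the segments of one record, skipping keys the KVS already holds.

    kvs       -- dict standing in for the second database (assumption)
    key_table -- dict mapping a record's primary key to its number of keys
    segments  -- list of (kvs_key, value) pairs; the first pair is assumed
                 to carry the record's primary key
    Returns the list of keys that were actually written.
    """
    newly_stored = []
    for kvs_key, value in segments:
        if kvs_key not in kvs:
            # No data is associated with this key yet, so store the segment.
            kvs[kvs_key] = value
            newly_stored.append(kvs_key)
        # Otherwise the segment is a duplicate and is not stored again.
    primary_key = segments[0][0]
    key_table[primary_key] = len(segments)
    return newly_stored
```

When two records share the same foreign key, such as `OfficeID:b` in the examples of this description, only the first put writes the shared segment; the second put writes only the segment keyed by its own primary key.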
[0008] In some embodiments, the plurality of tables include a key table listing key and corresponding number of keys for each of the data records.
The processor is configured to receive a get command including a SQL statement, and to: obtain from the SQL key information the one or more primary keys corresponding to the SQL statement included in the get command, then for each of the plurality of data records obtained using one or more SQL statements, and for each primary key of the one or more primary keys, obtain from the key table a number of keys corresponding to said each primary key, obtain data from the second database using said each primary key, (a) if the number of keys is one, then integrate the obtained data for said each primary key, and (b) if the number of keys is greater than one, then extract a next key from a value of the obtained data, use the extracted next key to obtain a next data from the second database, repeat the extracting and the using until a total number of keys used to extract data from the second database equals the number of keys obtained from the key table for said each primary key, and integrate the obtained data for said each primary key; and after each of the plurality of data records obtained using one or more SQL statements have been processed, send the integrated data in reply to the get command.
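The get processing described above can be sketched as follows. This is an illustrative sketch only: the dictionary-based tables, the function names, and the convention that the last field of a non-final segment is the foreign key leading to the next segment are assumptions made for the example.

```python
def get_by_primary_key(kvs, key_table, primary_key):
    """Rebuild one record by following keys until the key count is reached."""
    count = key_table[primary_key]          # number of keys for this record
    column, value = primary_key.split(":", 1)
    record = [(column, value)]
    key = primary_key
    for i in range(count):
        segment = kvs[key]
        record.extend(segment)              # integrate the obtained data
        if i < count - 1:
            # Assumed layout: the tail field of the segment is the next key.
            next_column, next_value = segment[-1]
            key = f"{next_column}:{next_value}"
    return record


def get_by_sql(kvs, key_table, sql_key_table, sql):
    """Resolve a SQL statement to primary keys, then rebuild each record."""
    return [get_by_primary_key(kvs, key_table, pk)
            for pk in sql_key_table[sql]]
```

The loop stops when the number of keys used equals the number obtained from the key table, so no terminator is needed in the stored values.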
[0009] In specific embodiments, the processor is further configured to obtain a plurality of data records from the first database by using one or more Structured Query Language (SQL) statements; and if the data segment is first data in the obtained plurality of data records, use the SQL statement as a key of the data segment in the second database. The plurality of tables include a key table listing key and corresponding number of keys for each of the data records. The processor is configured to receive a get command including a key, and to: obtain from the key table a number of keys corresponding to the key included in the get command, obtain data from the second database using the key included in the get command, (a) if the number of keys is one, then check whether a next key extracted from a value of the obtained data is null or not, and if yes, then integrate the obtained data and send the integrated data in reply to the get command, and if no, then reset a processing number of times and repeat performing of (a) using the next key, and (b) if the number of keys is greater than one, then extract a next key from a value of the obtained data, use the extracted next key to obtain a next data from the second database, repeat the extracting and the using until a total number of keys used to extract data from the second database equals the number of keys obtained from the key table, and check whether a next key extracted from a value of the obtained data is null or not, and if yes, then integrate the obtained data and send the integrated data in reply to the get command, and if no, then reset a processing number of times and repeat performing of (b) using the next key.
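The null-terminated chain of records described above can be sketched as follows. For brevity, this sketch assumes that every record is stored as a single segment (case (a), where the number of keys is one) and that each stored value is a pair of record body and next key, with `None` standing for the null marker. The function name `get_linked_records` and the pair layout are hypothetical.

```python
def get_linked_records(kvs, first_key):
    """Follow next-record keys from the first key until a null key is found.

    kvs       -- dict standing in for the second database (assumption)
    first_key -- key of the first record, e.g. the SQL statement
    Each value is assumed to be (record_body, next_key_or_None).
    """
    records = []
    visited = set()
    key = first_key
    while key is not None:
        if key in visited:
            # Guard against a malformed chain that loops back on itself.
            raise ValueError(f"cycle detected at key {key!r}")
        visited.add(key)
        body, next_key = kvs[key]
        records.append(body)        # integrate the obtained data
        key = next_key              # null (None) ends the search
    return records
```

Because the first record is reachable through the SQL statement and every later record through the key stored in its predecessor, no separate table linking the SQL statement to primary keys is consulted in this sketch.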
[0010] In some embodiments, the second data structure is a key-value- store (KVS) structure having KVS key and value. The plurality of tables include a key table listing key and corresponding number of keys for each of the data records, wherein the plurality of tables include a pointer table listing key and corresponding number of pointers representing a number of the data records sharing value incorporating the listed key. The processor is configured to receive a command and to: if the command is a get command including a key, obtain from the key table a number of keys corresponding to the key included in the get command, obtain data from the second database using the key included in the get command, (a) if the number of keys is one, then send the obtained data in reply to the get command, and (b) if the number of keys is greater than one, then extract a next key from a value of the obtained data, use the extracted next key to obtain a next data from the second database, repeat the extracting and the using until a total number of keys used to extract data from the second database equals the number of keys obtained from the key table, integrate all the obtained data, and send the integrated data in reply to the get command; if the command is a delete command including a key, obtain from the key table a number of keys corresponding to the key included in the delete command, and if the number of keys is greater than zero, then obtain data from the second database using the key included in the delete command, extract a next key from a value of the obtained data, decrement the number of pointers corresponding to the extracted next key listed in the pointer table, (c) if the number of pointers is zero, then delete the obtained data, and (d) if the number of pointers is not zero, then use the extracted next key to obtain a next data from the second database, extract another next key from a value of the obtained next data, decrement the number of pointers corresponding to 
the extracted next key listed in the pointer table, and repeat the using, the extracting, and the decrementing until the number of pointers is zero, and delete the obtained data, and (e) if a total number of keys used to extract data from the second database is below the number of keys obtained from the key table, then repeat performing of (d) until the total number of keys used to extract data from the second database equals the number of keys obtained from the key table; and if the command is an update command including a key and update data, then for each data record obtained from the first database, split the data record to a plurality of data segments based on key information of the plurality of keys of the first database, check whether all the split data segments are stored in the second database or not, (f) if not all the split data segments are stored in the second database, then check whether the second database has data whose key is same as the key included in the update command, (g) if the second database does not have data whose key is the same as the key included in the update command, then send an error message in reply to the update command, (h) if the second database has data whose key is the same as the key included in the update command, then use the update data to overwrite data, and (i) check whether all the split data segments are stored in the second database or not, and if yes, then terminate processing of the update command, and if no, then repeat performing of (f) to (i).
[0011] Another aspect of the invention is directed to a method of processing a plurality of data records stored in a memory of a computer system, the first database having a first data structure configured by a plurality of tables. The method comprises, when copying the plurality of records from the first database to a second database, determining if a data segment included in a data record of the plurality of data records should be stored in the second database based on information of the first data structure of the first database.
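The delete handling that uses the pointer table, described earlier in this summary, can be sketched as follows. This is a simplified reading rather than the exact claimed sequence: it assumes that the pointer table counts how many records refer to each stored segment, and that a segment is removed only when its count reaches zero. The function name `delete_record` and the dictionary-based tables are hypothetical.

```python
def delete_record(kvs, key_table, pointer_table, primary_key):
    """Delete one record while preserving segments shared by other records.

    kvs           -- dict standing in for the second database (assumption)
    key_table     -- dict: primary key -> number of keys for the record
    pointer_table -- dict: KVS key -> number of records using that segment
    Returns the list of KVS keys that were physically removed.
    """
    count = key_table.pop(primary_key, 0)
    removed = []
    key = primary_key
    for i in range(count):
        segment = kvs[key]
        next_key = None
        if i < count - 1:
            # Assumed layout: the tail field of the segment is the next key.
            column, value = segment[-1]
            next_key = f"{column}:{value}"
        pointer_table[key] -= 1
        if pointer_table[key] == 0:
            # No remaining record refers to this segment, so remove it.
            del kvs[key]
            del pointer_table[key]
            removed.append(key)
        key = next_key
    return removed
```

In this sketch, deleting one of two records that share `OfficeID:b` removes only the segment keyed by that record's primary key; the shared segment is removed when the last record that refers to it is deleted.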
[0012] In some embodiments, the information of the first data structure includes a plurality of keys of the plurality of tables. The method further comprises: obtaining a data record from the first database; splitting the data record to a plurality of data segments based on key information of the plurality of keys of the first database; and storing a data segment of the plurality of data segments in the second database by using the plurality of keys included in the information of the first data structure as key information of the second data structure.
[0013] In specific embodiments, the method further comprises: storing a data segment of the plurality of data segments in the second database if the second database does not have a key, of the plurality of keys, associated to the data segment; and not storing a data segment of the plurality of data segments in the second database if the second database has the key, of the plurality of keys, associated to the data segment. The method further comprises: obtaining a plurality of data records from the first database by using one or more Structured Query Language (SQL) statements; and storing SQL key information which indicates relationship between each SQL statement and one or more primary keys in the plurality of keys.
[0014] Another aspect of this invention is directed to a non-transitory computer-readable storage medium storing a plurality of instructions for controlling a data processor to process a plurality of data records stored in a memory of a computer system, the first database having a first data structure configured by a plurality of tables. The plurality of instructions comprise instructions that cause the data processor, when copying the plurality of records from the first database to a second database, to determine if a data segment included in a data record of the plurality of data records should be stored in the second database based on information of the first data structure of the first database.
[0015] These and other features and advantages of the present invention will become apparent to those of ordinary skill in the art in view of the following detailed description of the specific embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] FIG. 1 illustrates an example of a computer in which the method and apparatus of the invention may be applied.
[0017] FIG. 2 illustrates a configuration in which the data stored in the RDB is copied, replicated, or migrated to the KVS according to a first embodiment of the present invention.
[0018] FIG. 3 shows an example of normalization of tables of RDB.
[0019] FIG. 4 shows an example of a diagram illustrating the storing of data in the KVS without de-duplication.
[0020] FIG. 5 shows an example of control information.
[0021] FIG. 6 shows an example of a program unit.
[0022] FIG. 7 shows an example of data in a KVS with de-duplication according to the first embodiment.
[0023] FIG. 8 shows an example of a key table.
[0024] FIG. 9 shows an example of a pointer table.
[0025] FIG. 10 shows an example of a flow diagram illustrating a process for the copy program and put program according to the first embodiment.
[0026] FIG. 11 shows an example of a flow diagram illustrating a process for the get program which receives the get command and sends the target data to the requester according to the first embodiment.
[0027] FIG. 12 shows an example of a flow diagram illustrating a process for the deletion program according to the first embodiment.
[0028] FIG. 13 shows an example of a flow diagram illustrating a process for the update program.
[0029] FIG. 14 shows an example of a flow diagram illustrating a process for the partial update program which updates a part of the KVS data.
[0030] FIG. 15 shows another example of data in a KVS with de-duplication according to a second embodiment of the invention.
[0031] FIG. 16 shows an example of a sql-key table which manages the relationship between SQL statement and primary key of the RDB according to the second embodiment.
[0032] FIG. 17 shows an example of a flow diagram illustrating a process for the copy program (2) and put program (2) which use SQL statement as a KVS key according to the second embodiment.
[0033] FIG. 18 shows another example of a flow diagram illustrating a process for a get program (2) according to the second embodiment.
[0034] FIG. 19 shows another example of a flow diagram illustrating a process for a delete program (2) according to the second embodiment.
[0035] FIG. 20 shows another example of a flow diagram illustrating a process for an update program (2) according to the second embodiment.
[0036] FIG. 21 shows another example of a flow diagram illustrating a process for a partial update program (2) according to the second embodiment.
[0037] FIG. 22 shows another example of data in a KVS with de-duplication according to a third embodiment of the invention.
[0038] FIG. 23 shows another example of a flow diagram illustrating a process for a put program (3) according to the third embodiment.
[0039] FIG. 24 shows another example of a flow diagram illustrating a process for a get program (3) according to the third embodiment.
[0040] FIG. 25 illustrates different configurations in which the data stored in the RDB is copied, replicated, or migrated to the KVS.
[0041] FIG. 26 illustrates another example of a computer in which the method and apparatus of the invention may be applied.
[0042] FIG. 27 illustrates yet another example of a computer in which the method and apparatus of the invention may be applied.
[0043] FIG. 28 shows another example of a flow diagram illustrating a process for a delete program (3) according to the third embodiment.
[0044] FIG. 29 shows another example of a flow diagram illustrating a process for an update program (3) according to the third embodiment.
[0045] FIG. 30 shows another example of a flow diagram illustrating a process for a partial update program (3) according to the third embodiment.
DETAILED DESCRIPTION OF THE INVENTION
[0046] In the following detailed description of the invention, reference is made to the accompanying drawings which form a part of the disclosure, and in which are shown by way of illustration, and not of limitation, exemplary embodiments by which the invention may be practiced. In the drawings, like numerals describe substantially similar components throughout the several views. Further, it should be noted that while the detailed description provides various exemplary embodiments, as described below and as illustrated in the drawings, the present invention is not limited to the embodiments described and illustrated herein, but can extend to other embodiments, as would be known or as would become known to those skilled in the art. Reference in the specification to "one embodiment," "this embodiment," or "these embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention, and the appearances of these phrases in various places in the specification are not necessarily all referring to the same embodiment. Additionally, in the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that these specific details may not all be needed to practice the present invention. In other circumstances, well-known structures, materials, circuits, processes and interfaces have not been described in detail, and/or may be illustrated in block diagram form, so as to not unnecessarily obscure the present invention.
[0047] Furthermore, some portions of the detailed description that follow are presented in terms of algorithms and symbolic representations of operations within a computer. These algorithmic descriptions and symbolic representations are the means used by those skilled in the data processing arts to most effectively convey the essence of their innovations to others skilled in the art. An algorithm is a series of defined steps leading to a desired end state or result. In the present invention, the steps carried out require physical manipulations of tangible quantities for achieving a tangible result. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals or instructions capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, instructions, or the like. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing," "computing," "calculating," "determining," "displaying," or the like, can include the actions and processes of a computer system or other information processing device that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other information storage, transmission or display devices.
[0048] The present invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs. Such computer programs may be stored in a computer-readable storage medium including non-transitory medium, such as, but not limited to optical disks, magnetic disks, read-only memories, random access memories, solid state devices and drives, or any other types of media suitable for storing electronic information. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs and modules in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform desired method steps. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein. The instructions of the programming language(s) may be executed by one or more processing devices, e.g., central processing units (CPUs), processors, or controllers.
[0049] Exemplary embodiments of the invention, as will be described in greater detail below, provide apparatuses, methods and computer programs for leveraging the RDB configuration in the KVS environment to reduce resource consumption.
[0050] First Embodiment
[0051] FIG. 1 illustrates an example of a computer in which the method and apparatus of the invention may be applied. The computer 100 has a processor 102, a memory 103, and a storage device 109. The computer can have two or more processors and storage devices. A program unit 104, control information 105, application 106, middleware 107, cache unit 108, and OS (Operating System) 101 are included in memory 103. The application 106 includes a program which realizes an online shopping site, online communication site, social network service, or the like. The middleware 107 includes database software, development tools, or the like. The cache unit 108 has memory to store frequently accessed data for a performance boost. Details of the program unit 104 and control information 105 are described herein below. The storage device 109 stores data used by the middleware 107 or application 106. The processor 102 is a computing resource.
[0052] FIG. 2 illustrates a configuration in which the data stored in the RDB is copied, replicated, or migrated to the KVS according to a first embodiment of the present invention. In this example, Database1 400 is a RDB and Database2 500 is a KVS. The copy program 300 reads the data from the RDB 400 and writes it to the KVS 500. Any database that accesses and stores data by managing the relationship between key and data can be used as Database2 500. In this embodiment, the KVS is used as Database2 500.
[0053] FIG. 3 shows an example of normalization of tables of RDB. This example has two RDB tables. The first table 600 is an employee table in which employee ID, name, and officeID are stored. The ID is a primary key and the officeID is a foreign key. The foreign key can take a value stored in the officeID column of the office table, which is the second table 700. The office table stores officeID, area, and address. The use of two RDB tables in this example avoids duplication. If only one table were used to store the same data, the data of the office table 700 would be duplicated. In order to avoid this duplication, the design divides the data and stores it in multiple tables. In this particular example, the primary key is used as a primary identifier of each record while the foreign key is used to manage the relationship between the two tables.
[0054] FIG. 4 shows an example of a diagram illustrating the storing of data in the KVS without de-duplication. Three records read from the RDB (see FIG. 3) are stored in the KVS. The ID, which is the primary key of the RDB, is used as the key of the KVS, and the other part of the RDB record is stored as the value of the KVS in this example. The key and value are paired and stored as shown in FIG. 4. It is noted that a SQL query statement can also be used as a key of the KVS. The record read from the RDB may include duplicated data. In this example, the area and address information (Area:Kanto and Address:Yokohama) is duplicated for ID:1 and ID:2. The ID is not included in the value part in this example. In a different example, however, the ID can be included in the value part. Approaches to avoid this duplication are described herein below.
[0055] FIG. 5 shows an example of control information. The control information includes a key table, a pointer table, and a sql-key table. The key table manages the number of the RDB keys. The pointer table manages the number of pointers to share the duplicated data. The sql-key table manages the keys of the records which are searched by the SQL statement. This table is used if the SQL statement is used as the key of the KVS. Details of these tables are described herein below.
[0056] FIG. 6 shows an example of a program unit. The program unit 104 includes copy program, put program, get program, delete program, update program, and partial update program. Details of these programs are described herein below.
[0057] FIG. 7 shows an example of data in a KVS with de-duplication according to the first embodiment. In this example, the record read from the RDB (see FIG. 3) is separated and stored in the KVS individually to avoid duplication. First, the first three data use the ID, which is the key of the RDB, as the key of the KVS. For those three data, the value includes name and officeID information. In other examples, the ID can be included in the value part of the KVS record. The last two data use the officeID, which is the foreign key of the RDB, as the key of the KVS. For those two data, the value includes area and address information. In other examples, the officeID can be included in the value part of the KVS record. In this example, duplications of "Area:Kanto" and "Address:Yokohama" are avoided (as compared to FIG. 4).
[0058] FIG. 8 shows an example of a key table. The key table has key and num-of-keys (number of keys) as attributes. For example, the first record indicates that the RDB record with ID:1 has two keys. They are ID and officeID. This information is used to read the data from the KVS. In this example, the number of keys is managed for each record. Generally, the number of keys of a RDB table is fixed. Hence, the number of keys can instead be managed per table. In that case, the key of the KVS and the table name are needed to access the KVS data. Moreover, in the case of multiple RDBs or multiple tables, the same RDB column name can exist in multiple tables. In such a case, it is necessary either to manage the RDB name or table name in addition to the key, or to use different names of KVS databases on the Database2 side (see FIG. 2).
[0059] FIG. 9 shows an example of a pointer table. The pointer table has key and num-of-pointers (number of pointers) as attributes. This table manages how many records share the KVS data. For example, the value specified under "OfficeID:b" is shared by two records. These records are the record with ID:1 and the record with ID:2 (see FIG. 7). This information is used to decide whether the key value can be deleted or not in a delete operation (see FIG. 12).
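The layouts of FIG. 4 and FIG. 7, together with the key table and pointer table, can be pictured with a short Python sketch. It is an illustration only: plain dictionaries stand in for the KVS and for the control information, the function names are hypothetical, and only the two records and the one office that the text spells out are used.

```python
# Rows as they might be read from the two RDB tables of FIG. 3.
# Only ID:1, ID:2 and OfficeID:b are given in the text, so only those appear.
employees = [
    {"ID": 1, "Name": "Hitachi Taro", "OfficeID": "b"},
    {"ID": 2, "Name": "Hitachi Hanako", "OfficeID": "b"},
]
offices = {"b": {"Area": "Kanto", "Address": "Yokohama"}}


def flat_layout(employees, offices):
    """FIG. 4 style: one KVS entry per record, office data repeated."""
    kvs = {}
    for e in employees:
        kvs[("ID", e["ID"])] = {
            "Name": e["Name"],
            "OfficeID": e["OfficeID"],
            **offices[e["OfficeID"]],
        }
    return kvs


def dedup_layout(employees, offices):
    """FIG. 7 style: split at the foreign key, office data stored once."""
    kvs, key_table, pointer_table = {}, {}, {}
    for e in employees:
        emp_key = ("ID", e["ID"])
        office_key = ("OfficeID", e["OfficeID"])
        kvs[emp_key] = {"Name": e["Name"], "OfficeID": e["OfficeID"]}
        kvs.setdefault(office_key, dict(offices[e["OfficeID"]]))
        key_table[emp_key] = 2  # the record has two keys: ID and officeID
        for k in (emp_key, office_key):
            pointer_table[k] = pointer_table.get(k, 0) + 1
    return kvs, key_table, pointer_table


flat = flat_layout(employees, offices)
dedup, key_table, pointer_table = dedup_layout(employees, offices)
```

In the flat layout the Yokohama address is stored once per employee, while the de-duplicated layout stores it once and records in the pointer table that two records share it.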
[0060] FIG. 10 shows an example of a flow diagram illustrating a process for the copy program and put program according to the first embodiment. The copy program reads the data from the RDB and inserts the data into the KVS. The put program receives a put command and stores data in the KVS. In step S100, the copy program issues a query to the RDB and obtains the RDB record data. In step S101, the copy program obtains the RDB key information. In the example described above, the ID and officeID are the RDB keys. In step S102, the program calculates the position of keys in the obtained record data and issues a put command to the KVS in step S103. This command includes the calculated key positions.
[0061] On the KVS side, the put program receives the put command (S105) and updates the key table (S113). The notified ID and the number of keys are inserted into the key table. Then, the program splits the data by using the RDB keys (S106). For example, if the put data is "ID:1, Name:Hitachi Taro, OfficeID:b, Area:Kanto, Address:Yokohama", the data is split into "ID:1, Name:Hitachi Taro, OfficeID:b" and "OfficeID:b, Area:Kanto, Address:Yokohama." The data is split by using the officeID key as the key for the second data. The officeID key is also included in the first data so that the split data can be integrated when the data is read. Next, the program checks if all split data is stored in the KVS or not (S107). If the result is "no," the program checks whether the KVS has KVS data whose key is the same as the notified RDB key (S108). For example, the program checks if there is KVS data with "ID:1" or not. In the second execution of step S108, the existence of KVS data with "OfficeID:b" is checked. If the result of step S108 is "yes," the program skips step S109. If the result of step S108 is "no," the program stores the data as KVS data (S109). The RDB key included in the data can be used as the KVS key (S109); for the second data, the foreign key is used as the KVS key. Then, the program increments the number of pointers in the pointer table (S110).
[0062] After step S110, the program returns to step S107 and executes the process for the next split data. For example, "OfficeID:b, Area:Kanto, Address:Yokohama" is processed. If the result of step S107 is "yes," the program proceeds to step S111, sends the completion message, and terminates the processing (S111). The copy program which receives the completion message terminates the processing (S112).
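The KVS-side put flow of FIG. 10 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: a record is modeled as an ordered list of (name, value) pairs, the key positions are list indices, dictionaries stand in for the KVS, key table, and pointer table, and the names split_record and put are hypothetical.

```python
def split_record(fields, key_positions):
    """S106: cut an ordered list of (name, value) pairs at each RDB key.

    Every piece except the last also keeps the following key (the foreign
    key) as its final field so that the pieces can be re-joined on read.
    """
    pieces = []
    for i, start in enumerate(key_positions):
        if i + 1 < len(key_positions):
            pieces.append(fields[start:key_positions[i + 1] + 1])
        else:
            pieces.append(fields[start:])
    return pieces


def put(kvs, key_table, pointer_table, fields, key_positions):
    """KVS-side handling of one put command (S105 to S111)."""
    primary_key = fields[key_positions[0]]
    key_table[primary_key] = len(key_positions)            # S113
    for piece in split_record(fields, key_positions):      # S106, S107
        kvs_key, value = piece[0], piece[1:]
        if kvs_key not in kvs:                              # S108
            kvs[kvs_key] = value                            # S109
        # S110 runs whether or not S109 was skipped.
        pointer_table[kvs_key] = pointer_table.get(kvs_key, 0) + 1
    return "completed"                                      # S111


kvs, key_table, pointer_table = {}, {}, {}
taro = [("ID", 1), ("Name", "Hitachi Taro"), ("OfficeID", "b"),
        ("Area", "Kanto"), ("Address", "Yokohama")]
hanako = [("ID", 2), ("Name", "Hitachi Hanako"), ("OfficeID", "b"),
          ("Area", "Kanto"), ("Address", "Yokohama")]
put(kvs, key_table, pointer_table, taro, [0, 2])
put(kvs, key_table, pointer_table, hanako, [0, 2])
```

The existence test in S108 is a key lookup only. The office data of the second record is recognized as already stored without comparing values or computing a hash of the data.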
[0063] In this example, the process is explained based on the copy operation. However, this technology can be applied to a data migration process, a data caching process, and so on. If two or more records are read at step S100, steps S101 to S111 are executed for each of the records.
[0064] As mentioned above, storing duplicated data in the KVS can be avoided in the present invention. Moreover, this processing executes neither comparison of the data itself nor hash value calculation from the data for an identity check, because this processing leverages the normalization information of the RDB.
[0065] FIG. 11 shows an example of a flow diagram illustrating a process for the get program which receives the get command and sends the target data to the requester according to the first embodiment. Because the data is split and stored in the KVS individually, the get program searches all of the split KVS data, integrates the data, and sends the integrated data to the requester.
[0066] First, the get program receives a get command which includes a KVS key (S200). For example, the KVS key is the ID. Then, the program obtains the number of keys from the key table (S201) and executes steps S203 and S204 that number of times. The program checks if the obtained number of keys has been reached or not (S202). If the result is "no," the program obtains the KVS data using a key (S203). In the first loop, the key sent from the requester is the key used for the search in step S203. Then, the program extracts the RDB foreign key from the obtained value (S204). The foreign key is the next key, which is used for the search in step S203 in the next loop. In the last loop, the foreign key extraction is not necessary because the next loop processing is not needed. After execution of step S204, the program returns to step S202 and repeats steps S203 and S204. If the result of step S202 is "yes," the program proceeds to step S205 and integrates all data obtained from the KVS (S205). Finally, the program sends the search result to the requester and terminates the processing (S206).
[0067] For example, the first key which is sent from the requester is ID:1, the first obtained value is "Name:Hitachi Taro, OfficeID:b," and the first foreign key is "OfficeID:b." The second obtained value is "OfficeID:b, Area:Kanto, Address:Yokohama." From these obtained data, "ID:1, Name:Hitachi Taro, OfficeID:b, Area:Kanto, Address:Yokohama" is created and sent to the requester.
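The get flow of FIG. 11 and the worked example above can be sketched as follows. The sketch assumes that each value is an ordered list of (name, value) pairs that excludes its own key, and that the foreign key is the last field of every value except the final one; the dictionaries and the function name get are hypothetical stand-ins.

```python
# De-duplicated KVS contents in the style of FIG. 7, and the key table of FIG. 8.
kvs = {
    ("ID", 1): [("Name", "Hitachi Taro"), ("OfficeID", "b")],
    ("ID", 2): [("Name", "Hitachi Hanako"), ("OfficeID", "b")],
    ("OfficeID", "b"): [("Area", "Kanto"), ("Address", "Yokohama")],
}
key_table = {("ID", 1): 2, ("ID", 2): 2}


def get(kvs, key_table, key):
    """Follow the foreign keys and integrate the split data (S200 to S206)."""
    num_keys = key_table[key]                 # S201
    record = [key]
    search_key = key
    for i in range(num_keys):                 # S202
        value = kvs[search_key]               # S203
        record.extend(value)
        if i + 1 < num_keys:                  # not needed in the last loop
            search_key = value[-1]            # S204: assumed last field
    return record                             # S205, S206


record = get(kvs, key_table, ("ID", 1))
```

The loop count comes from the key table, so the program needs no knowledge of the table schema to know when the chain of foreign keys ends.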
[0068] FIG. 12 shows an example of a flow diagram illustrating a process for the delete program according to the first embodiment. Because some KVS data are shared, the number of pointers in the pointer table must be taken into account when the program deletes the KVS data.
[0069] First, the delete program receives a delete command which includes a KVS key (S300). For example, the KVS key is the ID. Then, the program obtains the number of keys from the key table (S301) and executes steps S303 to S307 that number of times. The program checks if the obtained number of keys has been reached or not (S302). If the result is "no," the program obtains the KVS data using the key (S303). In the first loop, the key sent from the requester is the key used for the search in step S303. Then, the program extracts the RDB foreign key from the obtained value (S304). The foreign key is the next key, which is used for the search in step S303 in the next loop. In the last loop, the foreign key extraction is not necessary because the next loop processing is not needed. After execution of step S304, the program decrements the number of pointers in the pointer table (S305). The program checks if the number of pointers is "0" or not (S306). If the result is "no," the program returns to step S302. If the result is "yes," the program executes the deletion of the data from the KVS (S307) and returns to step S302. After that, the program repeats steps S303 to S307. If the result of step S302 is "yes," the program proceeds to step S308, reports the completion message to the requester, and terminates the processing (S308).
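The reference-counted deletion of FIG. 12 can be sketched as follows. The data model (dictionaries, values as ordered lists with the foreign key last) and the function name delete are assumptions made for illustration. Removing the key table entry at the end is housekeeping that the text does not describe.

```python
kvs = {
    ("ID", 1): [("Name", "Hitachi Taro"), ("OfficeID", "b")],
    ("ID", 2): [("Name", "Hitachi Hanako"), ("OfficeID", "b")],
    ("OfficeID", "b"): [("Area", "Kanto"), ("Address", "Yokohama")],
}
key_table = {("ID", 1): 2, ("ID", 2): 2}
pointer_table = {("ID", 1): 1, ("ID", 2): 1, ("OfficeID", "b"): 2}


def delete(kvs, key_table, pointer_table, key):
    """Delete one record; shared data survives while others point to it."""
    num_keys = key_table[key]                               # S301
    search_key = key
    for i in range(num_keys):                               # S302
        value = kvs[search_key]                             # S303
        next_key = value[-1] if i + 1 < num_keys else None  # S304
        pointer_table[search_key] -= 1                      # S305
        if pointer_table[search_key] == 0:                  # S306
            del kvs[search_key]                             # S307
            del pointer_table[search_key]
        search_key = next_key
    key_table.pop(key)  # assumption: not described in the text
    return "completed"                                      # S308


delete(kvs, key_table, pointer_table, ("ID", 1))
```

After this call the office data is still present because the record with ID:2 continues to reference it; it is removed only when the last referencing record is deleted.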
[0070] FIG. 13 shows an example of a flow diagram illustrating a process for the update program. The update program receives an update command with a RDB key and update data (S400) and splits the data by using keys (S401). This split is the same as that for the put command. Then, the program checks if all the split data is stored in the KVS or not (S402). If the result is "no," the program checks if the KVS has the KVS data whose key is the same as the notified RDB key (S403). For example, the program checks if there is KVS data with "ID:1" or not. In the second execution of step S403, the existence of KVS data with "OfficeID:b" is checked. If the result is "no," the program sends an error message to the requester and terminates the processing (S405). Since this is an update operation, the KVS should have the KVS data whose key is the same as the notified RDB key. If the result is "yes" in step S403, the program overwrites the KVS data using the update data (S404) and returns to step S402. If the result of step S402 is "yes," the program sends the completion message to the requester and terminates the processing (S406).
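The update flow of FIG. 13 can be sketched as below. As before, this is only an illustration: records are ordered lists of (name, value) pairs, key positions are list indices, the error message of S405 is modeled as a KeyError, and the names split_record and update are hypothetical. The replacement name in the usage lines is a made-up value.

```python
def split_record(fields, key_positions):
    """S401: same split as for the put command; each piece but the last
    keeps the following foreign key as its final field."""
    pieces = []
    for i, start in enumerate(key_positions):
        if i + 1 < len(key_positions):
            pieces.append(fields[start:key_positions[i + 1] + 1])
        else:
            pieces.append(fields[start:])
    return pieces


def update(kvs, fields, key_positions):
    """Overwrite every split piece; fail if a piece is not yet stored."""
    for piece in split_record(fields, key_positions):   # S401, S402
        kvs_key, value = piece[0], piece[1:]
        if kvs_key not in kvs:                           # S403
            raise KeyError(f"no KVS data for {kvs_key}")  # S405
        kvs[kvs_key] = value                             # S404
    return "completed"                                   # S406


kvs = {
    ("ID", 1): [("Name", "Hitachi Taro"), ("OfficeID", "b")],
    ("OfficeID", "b"): [("Area", "Kanto"), ("Address", "Yokohama")],
}
# "Hitachi Taro (renamed)" is a made-up replacement value.
new_data = [("ID", 1), ("Name", "Hitachi Taro (renamed)"), ("OfficeID", "b"),
            ("Area", "Kanto"), ("Address", "Yokohama")]
result = update(kvs, new_data, [0, 2])
```

Because the flow checks and overwrites one piece at a time, a missing second piece is detected only after the first piece has been overwritten; the sketch follows the flow as described and does not add a rollback.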
[0071] FIG. 14 shows an example of a flow diagram illustrating a process for the partial update program which updates a part of the KVS data. First, the partial update program receives a partial update command which includes an update address, update data, and a RDB key which can identify the target data (S410). Then, the program obtains the number of keys from the key table (S411) and executes steps S412 to S415 that number of times. The program checks if the obtained number of keys has been reached or not (S412). If the result is "no," the program obtains the KVS data using the key (S413). In the first loop, the key sent from the requester is the key used for the search in step S413. Then, the program extracts the RDB foreign key from the obtained value (S414). The foreign key is the next key, which is used for the search in step S413 in the next loop. In the last loop, the foreign key extraction is not necessary because the next loop processing is not needed. After execution of step S414, the program checks if the data corresponds to the update target address or not (S415). If the result is "no," the program extracts the RDB foreign key from the value (S418, not necessary for the last loop since next loop processing is not needed) and returns to step S412 to execute the next loop. If the result is "yes," the program overwrites the KVS data using the update data (S416) and proceeds to step S417. If the result of step S412 is "yes," the program reports the completion message to the requester and terminates the processing (S417).
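A partial update along the lines of FIG. 14 might look like the sketch below. The text does not define the form of the update address, so the sketch assumes it is simply a field name; the data model, the function name partial_update, the False return for an address that is not found, and the replacement address "Tokyo" are likewise assumptions.

```python
kvs = {
    ("ID", 1): [("Name", "Hitachi Taro"), ("OfficeID", "b")],
    ("ID", 2): [("Name", "Hitachi Hanako"), ("OfficeID", "b")],
    ("OfficeID", "b"): [("Area", "Kanto"), ("Address", "Yokohama")],
}
key_table = {("ID", 1): 2, ("ID", 2): 2}


def partial_update(kvs, key_table, key, field, new_value):
    """Walk the split data of one record and overwrite a single field."""
    num_keys = key_table[key]                       # S411
    search_key = key
    for i in range(num_keys):                       # S412
        value = kvs[search_key]                     # S413
        names = [name for name, _ in value]
        if field in names:                          # S415
            value[names.index(field)] = (field, new_value)  # S416
            return True                             # S417
        if i + 1 < num_keys:
            search_key = value[-1]                  # S418: foreign key, assumed last
    return False  # assumption: the text does not cover a missing address


# "Tokyo" is a made-up replacement value.
done = partial_update(kvs, key_table, ("ID", 1), "Address", "Tokyo")
```

Since the office data is shared, the new address is also what a later read of the record with ID:2 returns; this follows directly from storing the office data once.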
[0072] In this example, the number of update parts is one. In general, two or more update parts can be considered. As such, the program receives two or more update addresses and update data in step S410. The program returns to step S412 after execution of step S415 in order to update the next part.
[0073] In the methods mentioned above, the number of the keys of a RDB record is managed using the key table (see FIG. 8). Another method which does not use the key table can be considered. Such a method stores the number in the value part of the KVS instead of the key table. For example, the put program adds the number to the head of the value. The get program searches the value by using the ID which is notified from the copy program and obtains the number from the head of the value. With the obtained number, the program can determine the number of KVS data which should be searched.
[0074] In the methods mentioned above, the RDB primary key is used as the KVS key. Any other keys can be used. For example, the SQL statement can be the KVS key.
[0075] Second Embodiment
[0076] FIG. 15 shows another example of data in a KVS with de-duplication according to a second embodiment of the invention. The SQL statement is the KVS key in this example. SQL1 is a SQL statement which was used to search the RDB record "ID:1, Name:Hitachi Taro, OfficeID:b, Area:Kanto, Address:Yokohama." SQL2 is a SQL statement which was used to search the RDB record "ID:2, Name:Hitachi Hanako, OfficeID:b, Area:Kanto, Address:Yokohama." "ID:1" is stored in the value part of the KVS data. Other parts of the data in the KVS are the same as the example shown in FIG. 7. When the SQL statement is used as the KVS key, the contents stored in the key column in the key table of FIG. 8 are changed from ID to SQL statement. To access the KVS data, a SQL statement is notified to the KVS side. For example, the SQL statement is sent in step S103 of the copy program (see FIG. 10). The SQL statement is received in step S105 (put program of FIG. 10), or S200 (get program of FIG. 11), or S300 (delete program of FIG. 12), or S400 (update program of FIG. 13).
[0077] Next, methods where the result of a SQL search is two or more RDB records are described. If a SQL statement search involves two or more RDB records and the data structure in FIG. 15 is used, the keys will be the same. SQL1, SQL2 and SQL3 will be the same in the example of FIG. 15. This means that the value cannot be searched by using a key. The methods to avoid the problem are explained herein below.
[0078] First Method for Multiple Records Case
[0079] A first method uses the control information to link the SQL statement and ID, as illustrated in FIG. 16, and the data structure of FIG. 7.
[0080] FIG. 16 shows an example of a sql-key table which manages the relationship between SQL statement and primary key of the RDB according to the second embodiment. In this example, SQL1 corresponds to ID:1, ID:2, and ID:3. This means that SQL1 returned three records and that the primary keys of these three records were ID:1, ID:2, and ID:3. This table can be managed by the application side or the KVS side. If the table is managed by the application side, the table and processing can be implemented in the KVS client or caching program. The user application is aware only of the SQL statement which is used for the search. As such, it is not necessary that the application be aware of the ID of each searched RDB record. The access interface of the KVS remains the ID, which is the primary key of the record. If the table is managed by the KVS side, the access interface of the KVS is changed to the SQL statement. In that case, the copy program also notifies the boundary address information of the records, in addition to the key positions in a record, so that the data can be split into the individual records. The changed copy program and put program are illustrated in FIG. 17. These are named copy program (2) and put program (2).
[0081] FIG. 17 shows an example of a flow diagram illustrating a process for the copy program (2) and put program (2) which use SQL statement as a KVS key according to the second embodiment. FIG. 17 contains some of the same steps as those in FIG. 10. Steps S500, S501, S502, S503, S504, S505, and S506 are different steps from those of the copy program and put program in FIG. 10.
[0082] After steps S100 and S101 as described above, the copy program (2) calculates the boundary addresses of the records in the data searched from the RDB (S500). Then, the copy program (2) calculates the key position for each searched record (S501). While step S102 in FIG. 10 calculates the key position for one record, step S501 in FIG. 17 performs the calculation for all records. After the calculations, the copy program (2) issues a put command (S502). The calculated boundary information is included in the put command. The SQL statement which is used as a KVS key is also included.
[0083] On the KVS side, the put program (2) checks if all records are processed or not just after receiving the put command in step S105 (S503). If the result is "yes," the put program (2) sends a completion message to the requester and terminates the processing (S111). If the result is "no," the put program (2) determines one target record and updates the key table and the sql-key table (S504). By using the key position information notified from the copy program (2), the put program (2) obtains the key information, and stores the SQL statement and the obtained key information in the sql-key table. Then, the put program (2) splits the data by using RDB keys in step S106 and checks whether all data is stored in the KVS or not (S505). If the result is "yes," the put program (2) returns to step S503 and executes the same steps for the next record.
[0084] FIG. 18 shows another example of a flow diagram illustrating a process for a get program (2) according to the second embodiment. When the SQL statement is used as the KVS key and the search result includes two or more records, the get program (2) which receives the SQL statement should be changed to search the result including two or more records. FIG. 18 contains some of the same steps as those in FIG. 11. Steps S600, S601, and S602 are different steps from those of the get program in FIG. 11.
[0085] The get program (2) receives the get command including the SQL statement (S200) and obtains primary keys of the RDB from the sql-key table (S600). Then, the get program (2) checks if all records are processed or not (S601). This can be realized by using the number of keys obtained in step S600. If the result is "no," the program determines one record as the processing target (S602). After that, the program executes search processing for one record in steps S201 to S205, which are the same as steps S201 to S205 in FIG. 11. After step S205, the program returns to step S601 to search the next record. If the result of step S601 is "yes," the program sends the result to the requester and terminates the processing (S206).
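The get program (2) can be sketched as an outer loop over the sql-key table wrapped around the single-record search. This is a hedged illustration: the string "SQL1" stands in for an actual SQL statement, only the two records spelled out in the text are stored, and the dictionaries and the names get_one and get_by_sql are hypothetical.

```python
kvs = {
    ("ID", 1): [("Name", "Hitachi Taro"), ("OfficeID", "b")],
    ("ID", 2): [("Name", "Hitachi Hanako"), ("OfficeID", "b")],
    ("OfficeID", "b"): [("Area", "Kanto"), ("Address", "Yokohama")],
}
key_table = {("ID", 1): 2, ("ID", 2): 2}
# sql-key table in the style of FIG. 16: SQL statement -> primary keys found.
sql_key_table = {"SQL1": [("ID", 1), ("ID", 2)]}


def get_one(kvs, key_table, key):
    """Single-record search, steps S201 to S205 of FIG. 11."""
    record, search_key = [key], key
    num_keys = key_table[key]                      # S201
    for i in range(num_keys):                      # S202
        value = kvs[search_key]                    # S203
        record.extend(value)
        if i + 1 < num_keys:
            search_key = value[-1]                 # S204: foreign key assumed last
    return record                                  # S205


def get_by_sql(kvs, key_table, sql_key_table, sql):
    """Resolve the SQL statement to primary keys, then read each record."""
    results = []
    for primary_key in sql_key_table[sql]:         # S600, S601, S602
        results.append(get_one(kvs, key_table, primary_key))
    return results                                 # S206


records = get_by_sql(kvs, key_table, sql_key_table, "SQL1")
```

The KVS data itself is unchanged from the first embodiment; only the extra table lookup is added, which is why the same office data can be shared across the records returned by one statement.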
[0086] FIG. 19 shows another example of a flow diagram illustrating a process for a delete program (2) according to the second embodiment. The delete program (2) can receive the SQL statement as key and search the result including two or more records. FIG. 19 contains some of the same steps as those in FIG. 12. Steps S600, S601, and S602 are different steps from those of the delete program in FIG. 12. Also, these steps are the same as steps S600, S601, and S602 in FIG. 18. These steps are added between step S300 and step S301. If the result of step S302 is "yes," the program returns to step S601 instead of proceeding to step S308.
[0087] FIG. 20 shows another example of a flow diagram illustrating a process for an update program (2) according to the second embodiment. The update program (2) can receive the SQL statement as key and search the result including two or more records. FIG. 20 contains some of the same steps as those in FIG. 13. Steps S503 and S504 are different steps from those of the update program in FIG. 13. Also, these steps are the same as steps S503 and S504 in FIG. 17. These steps are added between step S400 and step S401. If the result of step S402 is "yes," the program returns to step S503 instead of proceeding to step S406.
[0088] FIG. 21 shows another example of a flow diagram illustrating a process for a partial update program (2) according to the second embodiment. The partial update program (2) can receive the SQL statement as key and search the result including two or more records. FIG. 21 contains some of the same steps as those in FIG. 14. Steps S600, S601, and S602 are different steps from those of the partial update program in FIG. 14. Also, these steps are the same as steps S600, S601, and S602 in FIG. 18. These steps are added between step S410 and S411. If the result of step S412 is "yes," the program returns to step S601 instead of proceeding to step S416.
[0089] Third Embodiment
[0090] Second Method for Multiple Records Case
[0091] A second method for the multiple records case is used in a third embodiment. This method uses the SQL statement as a KVS key for the first record which is included in the search result, and the ID is used as a KVS key for the second and subsequent records. For example, SQL1 is used to store the record "ID:1, Name:Hitachi Taro, OfficeID:b, Area:Kanto, Address:Yokohama" and ID:2 is used to store the record "ID:2, Name:Hitachi Hanako, OfficeID:b, Area:Kanto, Address:Yokohama."
[0092] FIG. 22 shows another example of data in a KVS with de-duplication according to a third embodiment of the invention. SQL1 is a SQL statement which was used to search three RDB records with ID:1, ID:2, and ID:3. The requester notifies only the SQL statement for access. The data with ID:1 can be searched by the SQL statement. However, the data with ID:2 or ID:3 are not linked with the SQL statement. Thus, "ID:2" is added to the value part of ID:1 and "ID:3" is added to the value part of ID:2. By doing so, the data with ID:2 and ID:3 can be searched. "Null" is added to the value part of ID:3 to indicate the last record. The different programs including the put program, the get program, the delete program, the update program, and the partial update program are changed to treat this case.
[0093] FIG. 23 shows another example of a flow diagram illustrating a process for a put program (3) according to the third embodiment. FIG. 23 contains some of the same steps as those in FIG. 17. Steps S800, S801, S802, S803, and S804 differ from those of the put program (2) in FIG. 17. The copy program (3) is the same as the copy program (2) described in FIG. 17.
[0094] Step S800 determines one target record. Update of the sql-key table is not needed (as opposed to step S504 in FIG. 17), because the second method does not use the sql-key table. Steps S801 to S804 are executed between step S108 and step S110. If the result of step S108 is "no," the put program (3) proceeds to step S801. The program checks whether the target data is the first data of the record (S801). For example, in FIG. 22, "ID:1, Name:Hitachi Taro, OfficeID:b" is the first data of one record and "ID:2, Name:Hitachi Hanako, OfficeID:b" is the first data of another record. If the result is "yes," the program obtains the key of the next record and adds it to the target data (S802). When the first data is "ID:1, Name:Hitachi Taro, OfficeID:b", "ID:2", which is the key of the next record, is obtained and added. Then the program checks whether the target data is the first data of the first record (S803). In the example shown in FIG. 22, "ID:1, Name:Hitachi Taro, OfficeID:b" is the first data of the first record. If the result is "yes," the program stores the target data as KVS data (S804), and the SQL statement is used as the key of the KVS. If the result of step S801 or S803 is "no," the program proceeds to step S109 and stores the data as KVS data. The RDB key is used as the KVS key. For example, ID:2, ID:3, OfficeID:a, or OfficeID:b is used as the KVS key. Step S110 is performed after step S804 or after step S109.
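As a rough illustration of steps S801 to S804 and S109, the following Python sketch stores one data segment at a time. It assumes a dictionary as the KVS, assumes that step S108 is the check for an already stored segment, and uses a hypothetical value layout (a dictionary with "data" and an optional "next_record" field); none of these names appear in the flow diagram, and the key is chosen before the duplicate check only to keep the sketch short.

```python
def put_segment(kvs, sql_statement, rdb_key, data,
                first_in_record, first_record, next_record_key=None):
    """Store one data segment; return True if it was written."""
    # S801 and S803 both "yes": the segment is keyed by the SQL statement.
    use_sql_key = first_in_record and first_record
    kvs_key = sql_statement if use_sql_key else rdb_key
    if kvs_key in kvs:
        return False  # assumed S108 "yes": segment already stored, skip it
    value = {"data": data}
    if first_in_record:                         # S801
        value["next_record"] = next_record_key  # S802 (None marks the last record)
    kvs[kvs_key] = value                        # S804 (SQL key) or S109 (RDB key)
    return True


kvs = {}
put_segment(kvs, "SQL1", "ID:1", "ID:1, Name:Hitachi Taro, OfficeID:b",
            True, True, "ID:2")
put_segment(kvs, "SQL1", "OfficeID:b", "OfficeID:b, Area:Kanto, Address:Yokohama",
            False, True)
put_segment(kvs, "SQL1", "ID:2", "ID:2, Name:Hitachi Hanako, OfficeID:b",
            True, False, "ID:3")
# The office segment of the second record is a duplicate and is not stored again.
wrote_again = put_segment(kvs, "SQL1", "OfficeID:b",
                          "OfficeID:b, Area:Kanto, Address:Yokohama", False, False)
```

Only the first data of the first record is stored under the SQL statement; every other segment keeps its RDB key, so the shared office segment is written once.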
[0095] FIG. 24 shows another example of a flow diagram illustrating a process for a get program (3) according to the third embodiment. FIG. 24 contains some of the same steps as those in FIG. 11. Steps S700, S701, and S702 differ from those of the get program in FIG. 11. After the search of the KVS data (S203), the get program (3) extracts key information for the next record from the tail of the value part (S700). If the result of step S202 is "yes," the program proceeds to step S701 instead of proceeding to step S205 (in FIG. 11). In step S701, the program checks whether the key of the next data is null. If the result is "no," the program sets the key obtained in step S700 as the search key (S702) before returning to step S201.
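The traversal performed by the get program (3) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the KVS and the key table are plain dictionaries, the key table is assumed to be indexed by the same key used for the first data of each record, the last field of a segment is assumed to double as the key of the next segment of the same record, and the example is shortened to two records. The function name `get_by_sql` is hypothetical.

```python
def get_by_sql(kvs, key_table, sql_statement):
    """Return every record reachable from one SQL statement, re-assembled."""
    records = []
    search_key = sql_statement
    while search_key is not None:            # S701: stop when the next key is null
        head = kvs[search_key]
        fields = head["data"].split(", ")
        # Follow the remaining segments of this record (number taken from the key table).
        for _ in range(key_table[search_key] - 1):
            segment = kvs[fields[-1]]["data"].split(", ")
            fields += segment[1:]            # skip the repeated key field
        records.append(", ".join(fields))
        search_key = head["next_record"]     # S700 and S702: next record becomes the search key
    return records


kvs = {
    "SQL1": {"data": "ID:1, Name:Hitachi Taro, OfficeID:b", "next_record": "ID:2"},
    "ID:2": {"data": "ID:2, Name:Hitachi Hanako, OfficeID:b", "next_record": None},
    "OfficeID:b": {"data": "OfficeID:b, Area:Kanto, Address:Yokohama"},
}
key_table = {"SQL1": 2, "ID:2": 2}
result = get_by_sql(kvs, key_table, "SQL1")
```

Both records are rebuilt from the single shared office segment, and the loop ends when the tail of the value part holds the null marker.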
[0096] Other configurations are described herein below. In the configuration mentioned above, the data stored in the RDB is copied, replicated, or migrated to the KVS. The copy program executes the copy, replication, or migration processing. The copy, replication, or migration processing can also be executed as a process of the RDB or the KVS.
[0097] FIG. 28 shows another example of a flow diagram illustrating a process for a delete program (3) according to the third embodiment. The delete program (3) can be realized by modifying the get program (3) in FIG. 24. Steps S305, S306, and S307 are added just after step S204. These steps are the same as those of the delete program in FIG. 12. If the next key is null at S701, the program reports the completion and terminates the processing (S308).
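A simplified Python sketch of such a delete traversal is given below. Steps S305 to S307 are not detailed in this passage, so the sketch assumes they amount to reference counting with a pointer table: a shared segment is removed only when no remaining record points to it. The dictionaries, the function name `delete_by_sql`, and the record "ID:9" (a record outside the SQL1 chain that shares the same office segment) are hypothetical.

```python
def delete_by_sql(kvs, key_table, pointer_table, sql_statement):
    """Delete every record reachable from one SQL statement."""
    search_key = sql_statement
    while search_key is not None:                 # S701: stop at the null marker
        head = kvs.pop(search_key)                # first data belongs to this record only
        fields = head["data"].split(", ")
        for _ in range(key_table.pop(search_key) - 1):
            shared_key = fields[-1]
            pointer_table[shared_key] -= 1        # one fewer record points here
            if pointer_table[shared_key] > 0:
                break                             # still shared: keep it and what follows
            fields = kvs.pop(shared_key)["data"].split(", ")
            del pointer_table[shared_key]
        search_key = head["next_record"]


kvs = {
    "SQL1": {"data": "ID:1, Name:Hitachi Taro, OfficeID:b", "next_record": "ID:2"},
    "ID:2": {"data": "ID:2, Name:Hitachi Hanako, OfficeID:b", "next_record": None},
    "ID:9": {"data": "ID:9, Name:(assumed), OfficeID:b", "next_record": None},
    "OfficeID:b": {"data": "OfficeID:b, Area:Kanto, Address:Yokohama"},
}
key_table = {"SQL1": 2, "ID:2": 2, "ID:9": 2}
pointer_table = {"OfficeID:b": 3}
delete_by_sql(kvs, key_table, pointer_table, "SQL1")
```

After the call, the two chained records are gone, while the office segment survives because the record "ID:9" still refers to it.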
[0098] FIG. 29 shows another example of a flow diagram illustrating a process for an update program (3) according to the third embodiment. The update program (3) can be realized by modifying the put program (3) in FIG. 23. Steps S900 and S901 are executed instead of steps S804 and S109, respectively. Steps S900 and S901 each overwrite the existing data instead of storing new data.
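The difference between storing and overwriting can be shown with a short Python sketch. It assumes a dictionary as the KVS and the same hypothetical value layout with "data" and "next_record" fields; keeping the existing link to the next record and raising an error for a missing key are assumptions of this sketch, the latter loosely following the error reply recited in claim 10.

```python
def update_segment(kvs, kvs_key, new_data):
    """Overwrite an existing segment (S900 or S901) instead of storing a new one."""
    if kvs_key not in kvs:
        # Assumed behavior: nothing to overwrite, so report an error.
        raise KeyError(f"no KVS data for key {kvs_key!r}")
    # Replace only the data part; the link to the next record is kept.
    kvs[kvs_key]["data"] = new_data


kvs = {"SQL1": {"data": "ID:1, Name:Hitachi Taro, OfficeID:b", "next_record": "ID:2"}}
update_segment(kvs, "SQL1", "ID:1, Name:Hitachi Taro, OfficeID:a")
```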
[0099] FIG. 30 shows another example of a flow diagram illustrating a process for a partial update program (3) according to the third embodiment. The partial update program (3) can be realized by modifying the get program (3) in FIG. 24. Steps S404 and S405 are added just after step S700. These steps are the same as those of the delete program in FIG. 12. If the result of step S404 is "no," the program proceeds to step S204. If the result of step S404 is "yes," the program overwrites the data (S405). After the overwrite, the program proceeds to step S701 to execute the same step for the next record.
[0100] FIG. 25 illustrates different configurations in which the data stored in the RDB is copied, replicated, or migrated to the KVS. In a first configuration 10 (similar to FIG. 2), the program that executes the processing is independent from the databases. In a second configuration 20, the program is a part of the copy target database. In a third configuration 30, the program is a part of the copy source database.
[0101] FIG. 26 illustrates another example of a computer in which the method and apparatus of the invention may be applied. In this example, the storage device has the computing resource, control information, program unit, and storage media (as opposed to the memory 103 in FIG. 1). A general-purpose processor, an ASIC (application-specific integrated circuit), an FPGA (field-programmable gate array), or the like can serve as the computing resource. In this case, the key table, pointer table, and sql-key table are stored in the control information in the storage device. The put program, get program, delete program, update program, and partial update program are stored in the program unit and executed inside the storage device. The upper program is a user of the KVS. For example, the copy program mentioned above is an upper program.
[0102] FIG. 27 illustrates yet another example of a computer in which the method and apparatus of the invention may be applied. In this example, the computer executes the upper program and the storage system executes the put program, get program, delete program, update program, and partial update program; they are separated. The computer has a storage I/F (interface) 108 and the storage system has a storage I/F 202. These storage I/Fs are coupled via a network 110 and mediate communication between the computer and the storage system. The put program, get program, delete program, update program, and partial update program are stored in a storage program 208 and executed by a processor 203. The key table, pointer table, and sql-key table are stored in storage control information 207.
[0103] As a combination of FIG. 26 and FIG. 27, different configurations of installing the storage device in the storage system to manage various tables and programs and execute the programs can be considered.
[0104] Of course, the system configurations illustrated in FIGS. 1, 26, and 27 are purely exemplary of information systems in which the present invention may be implemented, and the invention is not limited to a particular hardware configuration. The computers and storage systems implementing the invention can also have known I/O devices (e.g., CD and DVD drives, floppy disk drives, hard drives, etc.) which can store and read the modules, programs and data structures used to implement the above-described invention. These modules, programs and data structures can be encoded on such computer-readable media. For example, the data structures of the invention can be stored on computer-readable media independently of one or more computer-readable media on which reside the programs used in the invention. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include local area networks, wide area networks, e.g., the Internet, wireless networks, storage area networks, and the like.
[0105] In the description, numerous details are set forth for purposes of explanation in order to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that not all of these specific details are required in order to practice the present invention. It is also noted that the invention may be described as a process, which is usually depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. In addition, the order of the operations may be re-arranged.
[0106] As is known in the art, the operations described above can be performed by hardware, software, or some combination of software and hardware. Various aspects of embodiments of the invention may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out embodiments of the invention. Furthermore, some embodiments of the invention may be performed solely in hardware, whereas other embodiments may be performed solely in software. Moreover, the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways. When performed by software, the methods may be executed by a processor, such as a general purpose computer, based on instructions stored on a computer-readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.
[0107] From the foregoing, it will be apparent that the invention provides methods, apparatuses and programs stored on computer readable media for leveraging the RDB configuration in the KVS environment to reduce resource consumption. Additionally, while specific embodiments have been illustrated and described in this specification, those of ordinary skill in the art appreciate that any arrangement that is calculated to achieve the same purpose may be substituted for the specific embodiments disclosed. This disclosure is intended to cover any and all adaptations or variations of the present invention, and it is to be understood that the terms used in the following claims should not be construed to limit the invention to the specific embodiments disclosed in the specification. Rather, the scope of the invention is to be determined entirely by the following claims, which are to be construed in accordance with the established doctrines of claim interpretation, along with the full range of equivalents to which such claims are entitled.

Claims

WHAT IS CLAIMED IS:
1. A computer system comprising:
a memory storing a plurality of data records in a first database which has a first data structure configured by a plurality of tables; and
a processor configured, when the processor copies the plurality of records from the first database to a second database, to determine if a data segment included in a data record of the plurality of data records should be stored in the second database based on information of the first data structure of the first database.
2. The computer system according to claim 1,
wherein the second database has a second data structure which does not have a predefined data structure.
3. The computer system according to claim 2,
wherein the information of the first data structure includes a plurality of keys of the plurality of tables; and
wherein the processor is further configured to:
obtain a data record from the first database;
split the data record to a plurality of data segments based on key information of the plurality of keys of the first database; and
store a data segment of the plurality of data segments in the second database by using the plurality of keys included in the information of the first data structure as key information of the second data structure.
4. The computer system according to claim 3,
wherein the processor is configured to store a data segment of the plurality of data segments in the second database if the second database does not have a key, of the plurality of keys, associated to the data segment; and
wherein the processor is configured not to store a data segment of the plurality of data segments in the second database if the second database has the key, of the plurality of keys, associated to the data segment.
5. The computer system according to claim 3,
wherein the processor is further configured to add a key of a next data segment of a data segment to the data segment.
6. The computer system according to claim 3, wherein the processor is further configured to:
obtain a plurality of data records from the first database by using one or more Structured Query Language (SQL) statements; and
store SQL key information which indicates relationship between each SQL statement and one or more primary keys in the plurality of keys.
7. The computer system according to claim 6, wherein the plurality of tables include a key table listing key and corresponding number of keys for each of the data records, and wherein the processor is configured to receive a get command including a SQL statement, and to: obtain from the SQL key information the one or more primary keys corresponding to the SQL statement included in the get command, then for each of the plurality of data records obtained using one or more SQL statements, and for each primary key of the one or more primary keys, obtain from the key table a number of keys corresponding to said each primary key, obtain data from the second database using said each primary key, (a) if the number of keys is one, then integrate the obtained data for said each primary key, and (b) if the number of keys is greater than one, then extract a next key from a value of the obtained data, use the extracted next key to obtain a next data from the second database, repeat the extracting and the using until a total number of keys used to extract data from the second database equals the number of keys obtained from the key table for said each primary key, and integrate the obtained data for said each primary key; and after each of the plurality of data records obtained using one or more SQL statements have been processed, send the integrated data in reply to the get command.
8. The computer system according to claim 3, wherein the processor is further configured to:
obtain a plurality of data records from the first database by using one or more Structured Query Language (SQL) statements; and
use, if the data segment is first data in the obtained plurality of data records, the SQL statement as a key of the data segment in the second database.
9. The computer system according to claim 8, wherein the plurality of tables include a key table listing key and corresponding number of keys for each of the data records, and wherein the processor is configured to receive a get command including a key, and to:
obtain from the key table a number of keys corresponding to the key included in the get command, obtain data from the second database using the key included in the get command, (a) if the number of keys is one, then check whether a next key extracted from a value of the obtained data is null or not, and if yes, then integrate the obtained data and send the integrated data in reply to the get command, and if no, then reset a processing number of times and repeat performing of (a) using the next key, and (b) if the number of keys is greater than one, then extract a next key from a value of the obtained data, use the extracted next key to obtain a next data from the second database, repeat the extracting and the using until a total number of keys used to extract data from the second database equals the number of keys obtained from the key table, and check whether a next key extracted from a value of the obtained data is null or not, and if yes, then integrate the obtained data and send the integrated data in reply to the get command, and if no, then reset a processing number of times and repeat performing of (b) using the next key.
10. The computer system according to claim 1, wherein the second data structure is a key-value-store (KVS) structure having KVS key and value, wherein the plurality of tables include a key table listing key and corresponding number of keys for each of the data records, wherein the plurality of tables include a pointer table listing key and corresponding number of pointers representing a number of the data records sharing value incorporating the listed key, and wherein the processor is configured to receive a command and to:
if the command is a get command including a key, obtain from the key table a number of keys corresponding to the key included in the get command, obtain data from the second database using the key included in the get command, (a) if the number of keys is one, then send the obtained data in reply to the get command, and (b) if the number of keys is greater than one, then extract a next key from a value of the obtained data, use the extracted next key to obtain a next data from the second database, repeat the extracting and the using until a total number of keys used to extract data from the second database equals the number of keys obtained from the key table, integrate all the obtained data, and send the integrated data in reply to the get command;
if the command is a delete command including a key, obtain from the key table a number of keys corresponding to the key included in the delete command, and if the number of keys is greater than zero, then obtain data from the second database using the key included in the delete command, extract a next key from a value of the obtained data, decrement the number of pointers corresponding to the extracted next key listed in the pointer table, (c) if the number of pointers is zero, then delete the obtained data, and (d) if the number of pointers is not zero, then use the extracted next key to obtain a next data from the second database, extract another next key from a value of the obtained next data, decrement the number of pointers corresponding to the extracted next key listed in the pointer table, and repeat the using, the extracting, and the decrementing until the number of pointers is zero, and delete the obtained data, and (e) if a total number of keys used to extract data from the second database is below the number of keys obtained from the key table, then repeat performing of (d) until the total number of keys used to extract data from the second database equals the number of keys obtained from the key table; and
if the command is an update command including a key and update data, then for each data record obtained from the first database, split the data record to a plurality of data segments based on key information of the plurality of keys of the first database, check whether all the split data segments are stored in the second database or not, (f) if not all the split data segments are stored in the second database, then check whether the second database has data whose key is same as the key included in the update command, (g) if the second database does not have data whose key is the same as the key included in the update command, then send an error message in reply to the update command, (h) if the second database has data whose key is the same as the key included in the update command, then use the update data to overwrite data, and (i) check whether all the split data segments are stored in the second database or not, and if yes, then terminate processing of the update command, and if no, then repeat performing of (f) to (i).
11. A method of processing a plurality of data records stored in a memory of a computer system, the first database having a first data structure configured by a plurality of tables, the method comprising, when copying the plurality of records from the first database to a second database: determining if a data segment included in a data record of the plurality of data records should be stored in the second database based on information of the first data structure of the first database.
12. The method according to claim 11, wherein the information of the first data structure includes a plurality of keys of the plurality of tables, the method further comprising:
obtaining a data record from the first database;
splitting the data record to a plurality of data segments based on key information of the plurality of keys of the first database; and
storing a data segment of the plurality of data segments in the second database by using the plurality of keys included in the information of the first data structure as key information of the second data structure.
13. The method according to claim 12, further comprising:
storing a data segment of the plurality of data segments in the second database if the second database does not have a key, of the plurality of keys, associated to the data segment; and
not storing a data segment of the plurality of data segments in the second database if the second database has the key, of the plurality of keys, associated to the data segment.
14. The method according to claim 12, further comprising:
obtaining a plurality of data records from the first database by using one or more Structured Query Language (SQL) statements; and storing SQL key information which indicates relationship between each SQL statement and one or more primary keys in the plurality of keys.
15. A non-transitory computer-readable storage medium storing a plurality of instructions for controlling a data processor to process a plurality of data records stored in a memory of a computer system, the first database having a first data structure configured by a plurality of tables, the plurality of instructions comprising:
instructions that cause the data processor, when copying the plurality of records from the first database to a second database, to determine if a data segment included in a data record of the plurality of data records should be stored in the second database based on information of the first data structure of the first database.
PCT/US2015/020661 2015-03-16 2015-03-16 Method and apparatus for key-value-store de-duplication leveraging relational database configuration WO2016148680A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2015/020661 WO2016148680A1 (en) 2015-03-16 2015-03-16 Method and apparatus for key-value-store de-duplication leveraging relational database configuration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2015/020661 WO2016148680A1 (en) 2015-03-16 2015-03-16 Method and apparatus for key-value-store de-duplication leveraging relational database configuration

Publications (1)

Publication Number Publication Date
WO2016148680A1 true WO2016148680A1 (en) 2016-09-22

Family

ID=56920019

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/020661 WO2016148680A1 (en) 2015-03-16 2015-03-16 Method and apparatus for key-value-store de-duplication leveraging relational database configuration

Country Status (1)

Country Link
WO (1) WO2016148680A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581212A (en) * 2020-05-06 2020-08-25 深圳市朱墨科技有限公司 Data storage method, system, server and storage medium of relational database

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040267744A1 (en) * 2001-06-12 2004-12-30 Wolfgang Becker Changing the data structure which an application program in a computer system uses to access database systems
US20080189250A1 (en) * 2006-09-11 2008-08-07 Interdigital Technology Corporation Techniques for database structure and management
US20120054197A1 (en) * 2010-08-30 2012-03-01 Openwave Systems Inc. METHOD AND SYSTEM FOR STORING BINARY LARGE OBJECTS (BLObs) IN A DISTRIBUTED KEY-VALUE STORAGE SYSTEM

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581212A (en) * 2020-05-06 2020-08-25 深圳市朱墨科技有限公司 Data storage method, system, server and storage medium of relational database
CN111581212B (en) * 2020-05-06 2024-05-17 深圳市朱墨科技有限公司 Data storage method, system, server and storage medium of relational database

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15885710

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15885710

Country of ref document: EP

Kind code of ref document: A1