JP6239697B2

JP6239697B2 - Database management method

Info

Publication number: JP6239697B2
Application number: JP2016123887A
Authority: JP
Inventors: 山田 浩之 (Hiroyuki Yamada)
Original assignee: Murakumo Corp
Current assignee: Murakumo Corp
Priority date: 2016-06-22
Filing date: 2016-06-22
Publication date: 2017-11-29
Anticipated expiration: 2032-03-08
Also published as: JP2016184432A

Description

The present invention relates to a database management method, and more particularly to a method for managing a database in which a plurality of master nodes are hierarchically connected by a network.

The present applicant, focusing on techniques for managing a database using a transaction log, proposed a one-to-one master-slave data synchronization method in Japanese Patent Application Laid-Open No. 2006-293910 (Patent Document 1), and subsequently proposed a one-to-N (N being a positive integer) master-slave data synchronization method in International Publication No. WO 2010/106991 (Patent Document 2).

Here, a transaction log is a technique that records the history of changes made to a database in an area separate from the database's own storage area, thereby achieving fast operation while preserving the durability of those changes.

In particular, Patent Document 1 focuses on implementing a replication system using a transaction log. In Patent Document 2, by contrast, a slave node that receives a search instruction from a client sends a request message to the master node; if it receives no reply from the master node within a predetermined time, it requests from the master node the transaction log up to the latest version of the master database updates. The master node, on receiving this request, sends the transaction log to the slave node, which then updates its own replica database by referring to that log.

Patent Document 1: JP 2006-293910 A
Patent Document 2: International Publication No. WO 2010/106991

Patent Document 2, however, assumed a network configuration in which a single master node has a plurality of mirror nodes.

Because the mirror nodes therefore never execute data update instructions (INSERT, UPDATE, DELETE) on their own, it was sufficient for them to update their own databases by referring to the transaction log from the master node.

Meanwhile, as databases have become more diverse and complex, so-called multi-master systems having a plurality of master nodes have attracted attention. For network configurations in which the master nodes are symmetrical (that is, in a parallel relationship with one another), theories have been proposed for ranking update information among the master nodes and synchronizing them, but the procedure for synchronizing all nodes is complicated and the proposed ways of resolving conflicts are not practical.

The present inventor arrived at the present invention in view of these points. Its technical problem is to realize a database management method in which master nodes are organized hierarchically and, by exploiting that hierarchy, database updates between nodes can be performed reliably and efficiently even when a plurality of tables are updated at a lower master node.

To solve the above problem, the present invention employs the following means.

A first aspect of the present invention is a method for managing a write-once (append-only) database having upper and lower master nodes arranged hierarchically, each capable of updating records, the method comprising: a step in which, when an update instruction affecting a plurality of tables of the database occurs at a given lower master node, the database processing unit of that lower master node generates, and sends to the upper master node, a per-table write set that bundles, for each table of the database expanded in its own memory, the pair of a shadow copy and a heap tuple map; a step in which the upper master node compares the heap tuple map of each table in the per-table write set received from the lower master node with its own database and verifies whether the corresponding rows of the corresponding tables registered as targets have already been updated by another per-table write set or the like; a step in which, if such an update has been made, the entire per-table write set sent from the lower master node is aborted, and if not, the corresponding rows of the corresponding tables in the upper master node's database are updated using the shadow copy of each table in the per-table write set, and an update record including the upper master node's table numbers is generated as a transaction log; a step of distributing the transaction log to the lower master nodes, including the lower master node that sent the write set; and a step in which the transaction log processing unit of each lower master node updates the corresponding rows of the corresponding tables of its own database based on the received transaction log.
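The first aspect's all-or-nothing rule (a conflict in any one table aborts the entire per-table write set) can be sketched in Python as below. This is an illustrative sketch only: the data model (`db[table_no][row_no] = [value, deleted]`, and a per-table dict holding `htm` and `sc`), the name `apply_per_table_write_set`, and the conflict test (a target row counts as "already updated" when it is missing or its deletion flag is set) are assumptions, not the patent's implementation.

```python
# Hypothetical data model (not from the patent):
#   db[table_no][row_no] = [value, deleted]  ("deleted" stands in for the deletion pointer)
#   ws[table_no] = {"htm": {ctid: sc_index or None}, "sc": [new row values]}
#   None in the HTM marks a DELETE; an index points at the new row in that table's SC.


class WriteSetAborted(Exception):
    """Raised when any table in the write set conflicts; nothing is applied."""


def apply_per_table_write_set(db: dict, ws: dict) -> list[tuple]:
    # Phase 1: verify every table first, so a single conflict aborts the whole set.
    for table_no, entry in ws.items():
        table = db.get(table_no, {})
        for ctid in entry["htm"]:
            if ctid not in table or table[ctid][1]:
                raise WriteSetAborted(f"table {table_no}, row {ctid} already updated")

    # Phase 2: apply table by table and record (kind, table_no, rows...) log entries.
    log: list[tuple] = []
    for table_no, entry in ws.items():
        table = db.setdefault(table_no, {})
        next_row = max(table, default=0) + 1  # append-only: new rows go at the end
        for ctid, sc_index in entry["htm"].items():
            table[ctid][1] = True  # attach the deletion pointer to the old row
            if sc_index is None:
                log.append(("D", table_no, ctid))
            else:
                table[next_row] = [entry["sc"][sc_index], False]
                log.append(("U", table_no, ctid, next_row))
                next_row += 1
    return log


db = {
    1: {4: ["d", False], 5: ["e", False], 6: ["f", False]},
    2: {1: ["x", False]},
}
ws = {
    1: {"htm": {4: None, 5: 0}, "sc": ["sc1"]},
    2: {"htm": {1: 0}, "sc": ["x2"]},
}
log = apply_per_table_write_set(db, ws)
print(log)  # [('D', 1, 4), ('U', 1, 5, 7), ('U', 2, 1, 2)]
```

Verifying every table before touching any of them is what prevents a partially applied write set from leaking into the upper master node's database.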

A second aspect of the present invention is a method for managing a write-once (append-only) database having upper and lower master nodes arranged hierarchically, each capable of updating records, the method comprising: a step in which, when an update instruction affecting a plurality of tables of the database occurs at a given lower master node, the database processing unit of that lower master node generates, and sends to the upper master node, a single integrated write set consisting of a heap tuple map and a shadow copy that includes the table numbers of the database expanded in its own memory; a step in which the upper master node compares the heap tuple map in the integrated write set received from the lower master node with its own database and verifies whether the corresponding rows of the tables matching the table numbers registered as targets have already been updated by another integrated write set or the like; a step in which, if such an update has been made in any of the tables, the entire integrated write set is aborted, and if not, the tables of the upper master node's database corresponding to the table numbers are updated using the shadow copy of the integrated write set, and the upper master node generates an update record including the table numbers as a transaction log; a step of distributing the transaction log to the lower master nodes, including the lower master node that sent the write set; and a step in which the transaction log processing unit of each lower master node updates the corresponding rows of the corresponding tables of its own database based on the received transaction log.
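For the second aspect, the same information is held in one structure keyed by table number rather than one write set per table. The sketch below assumes a hypothetical `IntegratedWriteSet` whose heap tuple map is keyed by `(table_no, ctid)` and whose shadow copy rows are tagged with their table number; the helper `has_conflict` shows that a single conflicting table is enough to abort the whole set. The conflict criterion (row missing or already flagged as deleted) is again an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IntegratedWriteSet:
    # Single HTM for all tables: (table_no, ctid) -> index into sc, or None for DELETE.
    htm: dict[tuple[int, int], Optional[int]] = field(default_factory=dict)
    # Single SC for all tables: each new row is tagged with its table number.
    sc: list[tuple[int, str]] = field(default_factory=list)

    def delete(self, table_no: int, ctid: int) -> None:
        self.htm[(table_no, ctid)] = None

    def update(self, table_no: int, ctid: int, new_value: str) -> None:
        self.sc.append((table_no, new_value))
        self.htm[(table_no, ctid)] = len(self.sc) - 1


def has_conflict(db: dict, ws: IntegratedWriteSet) -> bool:
    """True if any target row of any table is missing or already marked deleted.

    db[table_no][row_no] = [value, deleted] is a hypothetical model.
    """
    return any(
        ctid not in db.get(table_no, {}) or db[table_no][ctid][1]
        for table_no, ctid in ws.htm
    )


db = {1: {4: ["d", False], 5: ["e", False]}, 2: {3: ["x", True]}}
ws = IntegratedWriteSet()
ws.delete(1, 4)
ws.update(1, 5, "sc1")
print(has_conflict(db, ws))  # False: both target rows of table 1 are still live
ws.update(2, 3, "y")
print(has_conflict(db, ws))  # True: row 3 of table 2 was already updated
```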

A third aspect of the present invention is the database management method according to the first aspect, in which the shadow copy of each table included in the per-table write set at the lower master node consists only of newly added rows.

A fourth aspect of the present invention is the database management method according to the second aspect, in which the shadow copy of the integrated write set at the lower master node consists only of newly added rows.

A fifth aspect of the present invention is the database management method according to the first or third aspect, in which, when a search process is executed on a table of the master database of a lower master node while that lower master node is generating the per-table write set, the database processing unit of the lower master node executes: a step of referring to the table of the master database; and a step of referring to the heap tuple map corresponding to that table to determine whether the relevant row number has an entry for the search process, and, if there is no entry, searching the master database table directly; if the row number has an entry, determining whether the entry is a delete instruction or an update instruction, excluding the row number from the search if it is a delete instruction, and, if it is an update instruction, searching the shadow copy entry for that table referenced from the heap tuple map.
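The fifth aspect's lookup rule (no HTM entry: read the master row; delete entry: skip the row; update entry: read the shadow copy) amounts to giving the transaction a read-your-own-writes view of the table. A minimal sketch, assuming a hypothetical function `search_with_write_set` and an illustrative data model in which `htm[ctid]` is an index into the shadow copy, or `None` for a delete:

```python
from typing import Callable, Optional


def search_with_write_set(
    table: dict[int, tuple[str, bool]],
    htm: dict[int, Optional[int]],
    sc: list[str],
    predicate: Callable[[str], bool],
) -> list[str]:
    """Search a master table as seen by the transaction still building its write set.

    table[row_no] = (value, deleted); htm[ctid] = index into sc, or None for DELETE.
    """
    results = []
    for row_no, (value, deleted) in sorted(table.items()):
        if deleted:
            continue
        if row_no not in htm:
            candidate = value            # no HTM entry: search the master row directly
        elif htm[row_no] is None:
            continue                     # DELETE entry: exclude this row number
        else:
            candidate = sc[htm[row_no]]  # UPDATE entry: search the shadow copy instead
        if predicate(candidate):
            results.append(candidate)
    return results


table = {1: ("a", False), 4: ("d", False), 5: ("e", False), 6: ("f", False)}
htm = {4: None, 5: 0}
sc = ["sc1"]
print(search_with_write_set(table, htm, sc, lambda v: True))  # ['a', 'sc1', 'f']
```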

A sixth aspect of the present invention is the database management method according to the second or fourth aspect, in which, when a search process is executed on a table of the master database of a lower master node while that lower master node is generating the integrated write set, the database processing unit of the lower master node executes: a step of referring to the table of the master database; and a step of referring to the heap tuple map to determine whether the relevant row number of the relevant table has an entry for the search process, and, if there is no entry, searching the master database directly; if the row number has an entry, determining whether the entry is a delete instruction or an update instruction, excluding the row number from the search if it is a delete instruction, and searching the shadow copy entry referenced from the heap tuple map if it is an update instruction.

A seventh aspect of the present invention is the database management method according to the first or third aspect, in which, when a search process is executed on a table of the master database of a lower master node while that lower master node is generating the per-table write set, the database processing unit of the lower master node executes: a step of referring to the table of the master database; a step of referring to the entire heap tuple map corresponding to that table, extracting the entries for the row numbers being searched, and excluding every entered row number from the search as if it had been deleted; and a step of referring to the shadow copy corresponding to that table and searching only the row numbers of the entries added in the shadow copy.
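The seventh aspect reaches the same result set without distinguishing delete entries from update entries: every row number in the HTM is excluded from the master table, and the shadow copy rows are searched separately. A sketch under the same illustrative assumptions as before; the name `search_exclude_then_add` is hypothetical.

```python
from typing import Callable


def search_exclude_then_add(
    table: dict[int, tuple[str, bool]],
    htm: dict[int, object],
    sc: list[str],
    predicate: Callable[[str], bool],
) -> list[str]:
    # Step 1: treat every row number that appears in the HTM as deleted.
    excluded = set(htm)
    base = [
        value
        for row_no, (value, deleted) in sorted(table.items())
        if not deleted and row_no not in excluded and predicate(value)
    ]
    # Step 2: search only the rows added in the shadow copy.
    added = [value for value in sc if predicate(value)]
    return base + added


table = {1: ("a", False), 4: ("d", False), 5: ("e", False), 6: ("f", False)}
htm = {4: None, 5: 0}
sc = ["sc1"]
print(search_exclude_then_add(table, htm, sc, lambda v: True))  # ['a', 'f', 'sc1']
```

Compared with the per-entry lookup of the fifth aspect, this returns the same rows (shadow copy rows listed last) and needs only one set-membership test per master row, with no delete/update branch.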

An eighth aspect of the present invention is the database management method according to the second or fourth aspect, in which, when a search process is executed on a table of the master database of a lower master node while that lower master node is generating the integrated write set, the database processing unit of the lower master node executes: a step of referring to the table of the master database; a step of referring to the entire heap tuple map, extracting the entries having the table number and row numbers being searched, and excluding every entered row number from the search as if it had been deleted; and a step of referring to the shadow copy and searching only the row numbers of the entries of the searched table that were added in the shadow copy.

A ninth aspect of the present invention is the database management method according to any one of the first to fourth aspects, in which, when the per-table write set or the integrated write set is generated at the lower master node, the area of backend memory (BEM) in the lower master node where it is registered is made accessible at least to the transaction log processing unit, which executes update instructions based on transaction data distributed from the upper master node; and the transaction log processing unit refers to the backend memory (BEM) and, if the row of the table that it is about to update by such an update instruction is contained in the heap tuple map (HTM) of the corresponding table of the per-table write set, or in the heap tuple map (HTM) of the integrated write set, aborts the transaction that is generating that heap tuple map (HTM).
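The ninth aspect can be read as "the update replicated from the upper master node wins": if an incoming update touches a row that a still-open local transaction has recorded in its heap tuple map, that local transaction is aborted, presumably because its write set would fail verification at the upper master node anyway. A minimal sketch; `LocalTransaction`, `htm_keys`, and `check_replicated_update` are hypothetical names, and backend memory is modelled as a plain list.

```python
from dataclasses import dataclass, field


@dataclass
class LocalTransaction:
    txn_id: int
    # HTM keys registered in backend memory: (table_no, row_no) pairs this transaction touches.
    htm_keys: set[tuple[int, int]] = field(default_factory=set)
    aborted: bool = False


def check_replicated_update(
    backend_memory: list[LocalTransaction], table_no: int, row_no: int
) -> list[int]:
    """Abort local transactions whose HTM contains the row about to be replicated.

    Returns the ids of the transactions aborted by this call.
    """
    victims = [
        txn
        for txn in backend_memory
        if not txn.aborted and (table_no, row_no) in txn.htm_keys
    ]
    for txn in victims:
        txn.aborted = True
    return [txn.txn_id for txn in victims]


bem = [
    LocalTransaction(1, {(1, 5)}),
    LocalTransaction(2, {(2, 3)}),
]
print(check_replicated_update(bem, 1, 5))  # [1]
print([t.aborted for t in bem])            # [True, False]
```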

A tenth aspect of the present invention is a method for managing a write-once (append-only) database having upper and lower master nodes arranged hierarchically, each capable of updates, the method comprising: a step in which, in a session at any lower master node, the table information subject to update in that lower master node's database is registered in a write set and sent to the upper master node; a step in which, when an instruction to acquire a lock on a database table occurs at the upper master node, the upper master node notifies the lower master nodes of the lock acquisition information and also retains that lock acquisition information itself; a step in which the upper master node compares the table information in the write set received from the lower master node with the retained lock acquisition information and aborts the write set if they conflict; a step in which a lower master node that has received the lock acquisition information from the upper master node discards any transaction that conflicts with the lock acquisition information; and a step in which the lower master node acquires the lock on the target table based on the lock acquisition information from the upper master node.
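The tenth aspect's lock propagation can be sketched with two small classes. This is an illustration under stated assumptions: the class and method names are hypothetical, table names such as `orders` are made up, and every lock is treated as exclusive, since the text here does not distinguish lock modes.

```python
class LowerMaster:
    def __init__(self) -> None:
        self.active: dict[int, set[str]] = {}  # txn_id -> tables it is updating
        self.locked_tables: set[str] = set()

    def on_lock_info(self, table: str) -> None:
        # Discard local transactions that conflict with the upper node's lock...
        for txn_id in [t for t, tables in self.active.items() if table in tables]:
            del self.active[txn_id]
        # ...then acquire the same lock locally.
        self.locked_tables.add(table)


class UpperMaster:
    def __init__(self, lowers: list[LowerMaster]) -> None:
        self.lowers = lowers
        self.locked_tables: set[str] = set()  # retained lock acquisition information

    def acquire_lock(self, table: str) -> None:
        self.locked_tables.add(table)
        for lower in self.lowers:
            lower.on_lock_info(table)

    def accept_write_set(self, tables: set[str]) -> bool:
        """Return False (abort) if the write set touches a locked table."""
        return not (tables & self.locked_tables)


lower = LowerMaster()
lower.active = {1: {"orders"}, 2: {"items"}}
upper = UpperMaster([lower])
upper.acquire_lock("orders")
print(sorted(lower.active))                     # [2]
print(upper.accept_write_set({"orders", "x"}))  # False
print(upper.accept_write_set({"items"}))        # True
```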

According to the present invention, multiple master nodes are organized hierarchically. A lower master node sends the shadow copy and heap tuple map expanded in its own memory to the upper master node as a write set, and the upper master node that receives it verifies whether the affected rows have already been updated by another write set; if they have not, it updates its database using the shadow copy and heap tuple map. It then sends the update record to the lower master nodes as a transaction log, so that consistent database updates propagate efficiently from a lower master node to the upper master node, and from the upper master node to all lower master nodes beneath it. This is particularly effective when a plurality of tables are updated at the upper and lower master nodes.

FIG. 1 is a conceptual diagram showing the database structure of hierarchical master nodes according to Embodiment 1 of the present invention.
FIG. 2 is a functional block diagram of a master node according to Embodiment 1.
FIG. 3 is a hardware block diagram of a master node according to Embodiment 1.
FIG. 4 shows the relationship between the database pages of a lower master node in Embodiment 1 and the generated write set (heap tuple map (HTM) and shadow copy (SC)).
FIG. 5 illustrates how an upper master node is updated using a write set sent from a lower master node in Embodiment 1.
FIG. 6 shows a transaction log generated by the upper master node in Embodiment 1.
FIG. 7 illustrates a modification of Embodiment 1 in which a write set is generated for each table (per-table write set).
FIG. 8 illustrates a modification of Embodiment 1 in which table numbers are added within a single write set (integrated write set).
FIG. 9 shows a transaction log covering multiple tables.
FIG. 10 is a conceptual diagram showing the database structure of hierarchical master nodes according to Embodiment 2.
FIG. 11 is a functional block diagram of a master node according to Embodiment 2.
FIG. 12 shows the relationship between the database pages of a lower master node in Embodiment 2 and the generated write set (heap tuple map (HTM) and shadow copy (SC)).
FIG. 13 shows the relationship between the write set and the master database (11a) in Embodiment 2.
FIG. 14 shows a transaction log generated by the upper master node in Embodiment 2.
FIG. 15 illustrates how lock acquisition information propagates through the database structure of hierarchical master nodes in Embodiment 2.
FIG. 16 illustrates the creation of a write set for each table in Embodiment 2.
FIG. 17 illustrates the recording of table information in a write set in Embodiment 2.

<Embodiment 1>
The present invention will now be described with reference to the drawings.

FIG. 1 shows the structure of the hierarchical master nodes of this embodiment. As shown, lower master nodes (MS201, MS202 ... MS20n, and MS301, MS302 ... MS30n) are arranged hierarchically beneath the upper master node (MS101). Each node (information processing apparatus) has a database. The upper master node (MS101) has a slave, and the other lower master nodes may also have slaves. In such a master-slave configuration, the update management technique described in the applicant's PCT/JP2010/054311 (the applicant's prior application corresponding to Patent Document 2) can be applied to update the database between master and slave.

In Patent Document 2, it was sufficient to replicate the master node's transaction log to the lower nodes. This embodiment, by contrast, is built from hierarchical multi-master nodes, and its distinguishing point is the recognition that, when update instructions are also executed at lower master nodes, referring only to the transaction log from the upper level cannot keep all lower nodes consistent. This is described below.

FIG. 2 is a functional block diagram of the lower master node (MS201); the upper master node (MS101) has the same functions.

As shown, when a database update instruction is input from a client (CL), the database processing unit (11b) generates a write set in backend memory (BEM) allocated in main memory (MM). As shown in FIG. 4, this write set consists of a heap tuple map (HTM) and a shadow copy (SC). Here it is assumed that an update instruction has been input that deletes (DELETE) row number 4 in the master database (11a) and rewrites (UPDATE) row number 5 to a new value (sc1).

At this time, the database processing unit 11b refers to the master database (11a) but does not write to it directly; instead, it sends the write set generated in backend memory (BEM) to the upper master node via the communication module (11d).

The same processing is performed at the upper master node (MS101) and at the lower master nodes (MS201, MS202 ... MS20n, and MS301, MS302 ... MS30n).

FIG. 3 shows a hardware configuration for realizing these functions. The upper master node (MS101) is a general-purpose information processing apparatus in which a large-capacity storage device (HD), the master database (11a), and a communication interface (I/O) (communication module 11d) for external communication are connected by a bus (BUS) to a central processing unit (CPU) and main memory (MM). A client terminal (CL) is connected via the bus (BUS) or the communication interface (I/O) to accept instructions. The master database (11a) may be built on the large-capacity storage device (HD) or in main memory (MM); its location is not limited.

The large-capacity storage device (HD) stores an application program (APL) together with an operating system (OS). The central processing unit (CPU) reads this program via the bus (BUS) and main memory (MM) and executes it sequentially, thereby realizing the master node functions described above. Although not described further, the lower master nodes (MS201, MS202 ... MS20n, and MS301, MS302 ... MS30n) have the same configuration.

Next, the processing by the database processing unit (11b) described with reference to FIG. 2 will be explained in more detail with reference to FIG. 4. To keep the explanation of write set generation at the lower master node simple, the following description assumes that a given transaction at the lower master node updates only a single table. In large database systems, however, a single transaction typically updates several tables, and this happens at lower master nodes as well as at the upper master node. This case is described later with reference to FIGS. 7 to 9.

FIG. 4 shows the relationship between the master database (11a) of the lower master node (MS201) and the write set. The master database (11a) consists of row numbers, instruction contents, and pointers, and is an append-only (write-once) database to which a new row number is added each time a new instruction is issued from the client terminal (CL). As explained above, the figure shows the case where row number 4 is deleted (DELETE) and row number 5 is rewritten with new contents (UPDATE to sc1).

When such an update instruction is issued against the master database of the lower master node (MS201) by a command from the client terminal (CL), a write set consisting of a heap tuple map (HTM, heap file) and a shadow copy (SC) is generated in backend memory (BEM), as described above.

In the heap tuple map (HTM), the original row number (ctid) is registered in association with the row number of the new row (sctid). The heap tuple map (HTM) thus grows with each database update. Because the row number at which the new contents of row 5 (sc1) will be written is not yet known at this stage, the new instruction (sc1) itself is written into sctid.

The shadow copy (SC), meanwhile, is generated by referring to the master database (11a) and copying the rows that are to be rewritten. Because the newly added row number is not known at this stage, the new instruction (sc1) is written in place of the row number.

At this stage, the database processing unit (11b) of the lower master node (MS201) already knows from the heap tuple map (HTM) that row 4, to which the DELETE instruction applies, and old row 5, to which the UPDATE instruction applies, will be deleted. The shadow copy (SC) may therefore contain only the new instruction (sc1).
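The write set built at the lower master node can be sketched as follows, using the variant just described in which the shadow copy holds only the new row contents. This is a hypothetical model, not the patent's data layout: `WriteSet`, `delete`, and `update` are illustrative names, and where the patent writes the new value (sc1) into sctid as a placeholder, the sketch stores an index into the shadow copy, which plays the same role.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WriteSet:
    # HTM: original row number (ctid) -> index of the new row in sc, or None for DELETE.
    htm: dict[int, Optional[int]] = field(default_factory=dict)
    # SC: only the new row contents; their final row numbers are assigned upstream.
    sc: list[str] = field(default_factory=list)

    def delete(self, ctid: int) -> None:
        self.htm[ctid] = None

    def update(self, ctid: int, new_value: str) -> None:
        self.sc.append(new_value)
        self.htm[ctid] = len(self.sc) - 1


master_db = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f"}
ws = WriteSet()
ws.delete(4)          # DELETE row 4
ws.update(5, "sc1")   # UPDATE row 5 to sc1
# The master database itself is not modified; only the write set is sent upstream.
print(ws.htm, ws.sc)  # {4: None, 5: 0} ['sc1']
```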

The write set generated in this way is sent from the lower master node (MS201) to the upper master node (MS101).

At the upper master node (MS101), when the database processing unit 11b (central processing unit (CPU)) receives the write set from the lower master node (MS201), it starts the transaction log processing unit (11c) in response to the update instruction and begins generating a transaction log. It then reads the heap tuple map (HTM) from the received write set, compares it with its own master database (11a), and verifies whether the contents of the target tuples (here, row numbers 4, 5, and 7) have been updated in the master database (11a). In FIG. 5, rows 4 to 6 have not been updated, so a deletion pointer is attached to row 4 and also to old row 5, which is being rewritten. The new instruction (sc1) is then written to the new row 7.
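The upper master node's verify-then-apply step for this single-table example might look like the sketch below. The conflict test is an assumption: the text says only that the target tuples must not already have been updated, which the sketch interprets as "the row still exists and carries no deletion pointer". `Row`, `apply_write_set`, and `WriteSetAborted` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    value: str
    deleted: bool = False  # stands in for the deletion pointer


class WriteSetAborted(Exception):
    pass


def apply_write_set(
    db: dict[int, Row], htm: dict[int, Optional[int]], sc: list[str]
) -> list[tuple]:
    # Verify: a target row that is gone or already carries a deletion pointer
    # has been updated by another write set, so the whole set is aborted.
    for ctid in htm:
        if ctid not in db or db[ctid].deleted:
            raise WriteSetAborted(f"row {ctid} already updated")
    # Apply: mark old rows deleted and append new rows at the end (append-only).
    ops: list[tuple] = []
    next_row = max(db, default=0) + 1
    for ctid, sc_index in htm.items():
        db[ctid].deleted = True
        if sc_index is None:
            ops.append(("D", ctid))
        else:
            db[next_row] = Row(sc[sc_index])
            ops.append(("U", ctid, next_row))
            next_row += 1
    return ops


db = {n: Row(v) for n, v in enumerate("abcdef", start=1)}  # rows 1..6
ops = apply_write_set(db, {4: None, 5: 0}, ["sc1"])
print(ops)          # [('D', 4), ('U', 5, 7)]
print(db[7].value)  # sc1
```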

If, on the other hand, comparing the heap tuple map (HTM) in the write set from the lower master node (MS201) with the upper master node's own database shows that the row has already been updated by another write set at the upper master node (MS101), processing of this write set is aborted.

FIG. 6 shows an example of the transaction log that the transaction log processing unit (11c) generates when the master database (11a) of the upper master node (MS101) is updated as described above. The log is a file that records, continuously and in time order, at least the instructions and the transaction contents (row numbers and the operation performed on each).

In the figure, the transaction start instruction (XB1) is followed by log records that pair an instruction number with row numbers. First, the DELETE instruction D1 deletes row 4 (record D14). Next, the UPDATE instruction U1 deletes row 5 and adds row 7 (record U157). Finally, the commit instruction (XC1) for these operations is issued.
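The record notation of FIG. 6 can be reproduced with a short helper. `make_log` is a hypothetical name. Plain concatenation is unambiguous only for the one-digit numbers used in the figure. A practical log would presumably use delimited fields and also carry the new tuple data, which is an assumption beyond what the figure shows.

```python
def make_log(txn, deletes, updates):
    """Render FIG. 6 style records: XB<txn>, D<txn><row>, U<txn><old><new>, XC<txn>."""
    records = [f"XB{txn}"]                           # transaction start
    for row in deletes:
        records.append(f"D{txn}{row}")               # DELETE: deletion pointer on row
    for old_row, new_row in updates:
        records.append(f"U{txn}{old_row}{new_row}")  # UPDATE: delete old, add new
    records.append(f"XC{txn}")                       # commit
    return records


print(make_log(1, deletes=[4], updates=[(5, 7)]))
# ['XB1', 'D14', 'U157', 'XC1']
```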

The communication module (11d) distributes this transaction log to the lower master node that sent the write set (MS201). It also sends the log to every other lower master node (MS202 ... MS20n, MS301, MS302 ... MS30n).

Each lower master node that receives the transaction log replicates it into its own database.

Specifically, suppose a lower master node (for example, MS202) receives the transaction log of FIG. 6 through its communication module (11d). It starts its transaction log processing unit (11c) and replicates the log into its own master database (11a). As a result, deletion pointers are assigned to rows 4 and 5, and the new row 7 is added.
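Replication at a lower master node might look like the following sketch. Three assumptions are made here and are not stated in the source. Log records are modeled as tuples. UPDATE records carry the new tuple value, although FIG. 6 shows only row numbers. Operations are buffered until the commit record arrives.

```python
def replicate(rows, deleted, log):
    """Replay committed log records into a lower node's row store (in place)."""
    pending = []
    for record in log:
        kind = record[0]
        if kind == "XB":            # ("XB", txn): start buffering
            pending = []
        elif kind in ("D", "U"):   # buffer until the commit record arrives
            pending.append(record)
        elif kind == "XC":          # ("XC", txn): apply everything buffered
            for op in pending:
                if op[0] == "D":    # ("D", txn, row)
                    deleted.add(op[2])
                else:               # ("U", txn, old_row, new_row, value)
                    _, _, old_row, new_row, value = op
                    deleted.add(old_row)
                    rows[new_row] = value
            pending = []


rows = {n: f"v{n}" for n in range(1, 7)}
deleted = set()
replicate(rows, deleted, [("XB", 1), ("D", 1, 4), ("U", 1, 5, 7, "sc1"), ("XC", 1)])
print(sorted(deleted), rows[7])  # [4, 5] sc1
```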

In this way, every lower master node keeps its database consistent by replicating the transaction logs sent from the upper master node.

The description so far assumed, for simplicity, that a transaction at the lower master node updates only a single table when it generates a write set. The case in which multiple tables are updated is described next with reference to FIGS. 7 to 9.

FIGS. 7 and 8 address this multi-table case.

FIG. 7 shows one method. The node bundles one heap tuple map (HTM) and shadow copy (SC) pair per table (tables T1 to T4 in the figure) and sends the bundle to the upper master node. This bundle is hereinafter called a “table-specific write set”.

FIG. 8 shows another method. The node writes the table number into each heap tuple map (HTM) and shadow copy (SC) entry and sends the result to the upper master node as a single write set. This set is hereinafter called an “integrated write set”.

The single-table shortcut also applies to the table-specific write set (FIG. 7) and the integrated write set (FIG. 8). Generating the heap tuple map (HTM) already reveals which rows will be deleted: the rows targeted by DELETE instructions and the old rows targeted by UPDATE instructions. Only the new tuples therefore need to be written into the shadow copy (SC).

In the example of FIG. 7, suppose a transaction updates both tables T1 and T2 at a given lower master node. A heap tuple map (HTM) and shadow copy (SC) pair is generated for each of T1 and T2, and the bundle of these pairs forms the table-specific write set.

When the upper master node receives a table-specific write set, it reflects the set's contents in its own database.

First, the node accesses its database according to each table's heap tuple map (HTM). It checks whether the target tuples have already been updated by another write set, which may be another table-specific write set, an ordinary write set, or an integrated write set described later. If they have not, the node updates the tuples by referring to each table's shadow copy in the table-specific write set. If it detects that any of them has already been updated, it aborts the entire table-specific write set.

For example, suppose a conflict is detected for the heap tuple map (HTM) of one table, say T1. In other words, a row listed in T1's heap tuple map has already been updated by another write set. In that case the whole table-specific write set is aborted, which means both the HTM/SC pair for T1 and the HTM/SC pair for table T2.

The reason is that all the per-table HTM/SC pairs in a table-specific write set were generated by a single transaction. Either all of that transaction's processing is reflected in the upper master node's database or none of it is; a partial application would leave the database contents inconsistent. Therefore, a conflict with the upper master node's updates in the heap tuple map (HTM) of even a single table (T1) requires aborting the entire table-specific write set, that is, the HTM/SC pairs for both T1 and T2.
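The all-or-nothing rule for a table-specific write set can be sketched as a two-phase check. Every table's HTM is validated before any table is modified. The dictionary layout and the name `apply_table_write_set` are hypothetical. Conflict detection via deletion pointers is an assumption carried over from the single-table description.

```python
def apply_table_write_set(db, bundle):
    """db: table -> {"rows": {row: value}, "deleted": set()}.
    bundle: table -> (htm, sc), one HTM/SC pair per table, all from one transaction.
    Returns True if applied, False if the whole bundle was aborted."""
    # Phase 1: validate every table's HTM before changing anything.
    for table, (htm, _sc) in bundle.items():
        t = db[table]
        if any(row not in t["rows"] or row in t["deleted"] for row in htm):
            return False  # a conflict in any one table aborts the whole set
    # Phase 2: apply all pairs; no conflicts remain at this point.
    for table, (htm, sc) in bundle.items():
        t = db[table]
        for row, op in sorted(htm.items()):
            t["deleted"].add(row)
            if op == "update":
                t["rows"][max(t["rows"]) + 1] = sc[row]
    return True


def fresh_table():
    return {"rows": {n: f"v{n}" for n in range(1, 7)}, "deleted": set()}


db = {"T1": fresh_table(), "T2": fresh_table()}
db["T2"]["deleted"].add(5)  # T2 row 5 was already changed by another write set
bundle = {"T1": ({4: "delete"}, {}), "T2": ({5: "update"}, {5: "new"})}
print(apply_table_write_set(db, bundle), db["T1"]["deleted"])  # False set()
```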

The same applies when the integrated write set of FIG. 8 is applied to the upper master node. In that set, each heap tuple map (HTM) and shadow copy (SC) entry carries a table number.

When the upper master node receives the integrated write set, it examines each heap tuple map (HTM) entry in turn. For each one, it checks whether the corresponding row of the table identified by the entry's table number has already been updated by another write set. That other set may be another integrated write set, an ordinary write set, or a table-specific write set described above. If the row has not been updated, the node updates that tuple of that table by referring to the integrated write set's shadow copy. If it detects that the row has already been updated, it aborts the entire integrated write set. For example, if a row listed in the heap tuple map (HTM) entries for table T1 has already been updated by another write set of any of these kinds, the whole integrated write set is aborted.
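An integrated write set flattens all tables into one entry list, with each HTM entry tagged by its table number. The hypothetical sketch below validates every entry against the table it names and then applies all of them. The flat-tuple layout and the deletion-pointer conflict test are assumptions.

```python
def apply_integrated_write_set(db, htm_entries, sc):
    """htm_entries: list of (table, row, op); sc: {(table, row): new value}.
    Every entry carries its table number, so one flat list covers all tables."""
    # Check each entry against the table named by its table number.
    for table, row, _op in htm_entries:
        t = db[table]
        if row not in t["rows"] or row in t["deleted"]:
            return False  # abort the entire integrated write set
    for table, row, op in htm_entries:
        t = db[table]
        t["deleted"].add(row)
        if op == "update":
            t["rows"][max(t["rows"]) + 1] = sc[(table, row)]
    return True


db = {1: {"rows": {n: f"v{n}" for n in range(1, 7)}, "deleted": set()},
      2: {"rows": {n: f"v{n}" for n in range(1, 7)}, "deleted": set()}}
entries = [(1, 4, "delete"), (2, 5, "update")]
print(apply_integrated_write_set(db, entries, {(2, 5): "sc1"}))  # True
print(apply_integrated_write_set(db, [(1, 4, "delete")], {}))    # False
```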

Suppose the upper master node is updated by a table-specific write set or an integrated write set that carries information on multiple tables (the cases of FIGS. 7 and 8). The transaction log that the upper master node generates then also uses a format with table numbers added. FIG. 9 shows an example.

In the figure, the transaction log consists of the records “XB1”, “D114”, “U1257” and “XC1”, whose meanings are as follows.

The start instruction for transaction 1 (XB1) is followed by log records that each combine an instruction number, a table number and row numbers. First, the DELETE instruction of transaction 1 (D1) deletes row 4 of table 1 (record D114). Next, the UPDATE instruction of transaction 1 (U1) deletes row 5 of table 2 and adds row 7 (record U1257). Finally, the commit instruction (XC1) for these operations is issued.
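The table-numbered records of FIG. 9 can be modeled as a small record type that renders back to the figure's notation. `LogRecord` and its field names are hypothetical. As in the figure, the concatenated form is unambiguous only for one-digit values.

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    op: str                        # "XB" start, "D" delete, "U" update, "XC" commit
    txn: int                       # transaction number
    table: Optional[int] = None    # table number (D/U records only)
    row: Optional[int] = None      # target (old) row number
    new_row: Optional[int] = None  # row added by an UPDATE

    def render(self):
        """Concatenate the fields as in FIG. 9 (unambiguous only for one-digit values)."""
        fields = (self.txn, self.table, self.row, self.new_row)
        return self.op + "".join(str(f) for f in fields if f is not None)


log = [LogRecord("XB", 1), LogRecord("D", 1, 1, 4),
       LogRecord("U", 1, 2, 5, 7), LogRecord("XC", 1)]
print([r.render() for r in log])  # ['XB1', 'D114', 'U1257', 'XC1']
```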

Specifically, suppose a lower master node (for example, MS202) receives the transaction log of FIG. 9 through its communication module (11d). It starts its transaction log processing unit (11c) and replicates the log into its own master database (11a). As a result of transaction 1, deletion pointers are assigned to row 4 of table 1 and to row 5 of table 2, and the new row 7 is added.

The present invention has been described above on the basis of an embodiment, but it is not limited to that embodiment. Modifications are described below.

<When the database is updated at the upper master node MS101>
When an update instruction for the master database arises at a lower master node (for example, MS201), a write set is generated in the back-end memory (BEM), as described with reference to FIG. 2. The write set consists of a heap tuple map (HTM, heap file) and a shadow copy (SC); for multiple tables it is the table-specific write set of FIG. 7 or the integrated write set of FIG. 8.

When the update instruction arises at the upper master node (MS101), however, there is no higher node to notify, so no write set is generated. Instead, the upper node (MS101) writes the update data directly to its master database (11a), as shown on the left of FIG. 5. It also generates the transaction log of FIG. 6 (or of FIG. 9 for multiple tables). This transaction log is distributed to the lower master nodes, and each lower master node that receives it replicates it into its own master database.

<When a search is executed while a lower master node is generating a write set>
Suppose a search is run against the master database of a lower master node (for example, MS201) while that node is generating a write set such as the one in FIG. 4. A search for rows outside the write set poses no problem. A search for the affected rows (here, rows 4 and 5) cannot use them as-is, however, because the pending write set has already deleted those rows.

The same holds when the write set being generated at the lower master node covers multiple tables, as in FIGS. 7 and 8. Those rows likewise cannot be searched as-is.

Two approaches are possible in this case.

In the first approach, the database processing unit (11b) consults the master database (11a) and then the heap tuple map (HTM), checking whether the searched row number has an HTM entry. If there is an entry, the unit determines whether it is a deletion or an update. For an update, the unit also consults the shadow copy (SC) and uses the corresponding shadow copy entry (sc1) as the search target.

For example, in FIG. 4, suppose the search targets row 3. The database processing unit (11b) consults the heap tuple map (HTM) in the write set held in the back-end memory (BEM), which is built on the main memory (MM). It determines whether row 3 has an entry. In the example of FIG. 4 it does not, so the unit accesses the master database 11a directly and retrieves row 3.

If instead the search targets row 4 in the example of FIG. 4, the database processing unit (11b) finds an entry for row 4 in the write set's heap tuple map (HTM). The row still present in the master database 11a is already the target of a deleting update instruction, so it is not a valid search result even if the master database is accessed. Because the HTM shows that row 4 has been deleted, the database processing unit (11b) excludes it from the search.

If the search targets row 5 in the example of FIG. 4, the unit likewise consults the heap tuple map (HTM). It finds that a shadow copy (SC) entry (sc1) has been created for row 5.

In this case, the database processing unit (11b) consults the shadow copy (SC) and uses the entry (sc1), the rewritten version of row 5, as the search target.
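The first approach amounts to a point read that falls through to the master database unless the HTM says otherwise. `read_row` is a hypothetical name. `None` stands in for "row not visible", and the shadow copy is assumed to be keyed by the old row number.

```python
def read_row(master_rows, htm, sc, row):
    """Resolve a point read during write set generation (first approach)."""
    value = master_rows.get(row)   # 1. read the master database
    op = htm.get(row)              # 2. then look the row up in the HTM
    if op is None:
        return value               # untouched by the write set
    if op == "delete":
        return None                # deleted in the pending write set
    return sc[row]                 # updated: the shadow copy holds the new tuple


master = {n: f"v{n}" for n in range(1, 7)}
htm, sc = {4: "delete", 5: "update"}, {5: "sc1"}
print([read_row(master, htm, sc, r) for r in (3, 4, 5)])  # ['v3', None, 'sc1']
```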

The single-table case has been described above; the multi-table case works the same way.

Specifically, for the table-specific write set of FIG. 7, the unit extracts and uses the heap tuple map (HTM) and shadow copy (SC) pair for the table being searched.

After consulting the target table in the master database, the unit consults that table's heap tuple map (HTM) and checks whether the searched row number has an entry. If there is no entry, the corresponding row of the target table in the master database is the search target. If there is an entry, the unit determines whether it is a deletion or an update. For an update, it also consults the shadow copy (SC) and uses the shadow copy entry as the search target. For a deletion, the row is excluded from the search.

For the integrated write set of FIG. 8, the unit extracts and uses only those heap tuple map (HTM) and shadow copy (SC) entries that carry the table number of the table being searched.

After consulting the target table in the master database, the unit checks whether the searched row number appears among the HTM entries extracted above for that table. If it does not, the corresponding row of the target table in the master database is the search target. If it does, the unit determines whether the entry is a deletion or an update. For an update, it uses the corresponding shadow copy (SC) entry extracted above as the search target. For a deletion, the row is excluded from the search.
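With an integrated write set, the same point-read logic runs after filtering the flat entry list by table number. The hypothetical sketch below assumes HTM entries are `(table, row, op)` tuples and shadow copy entries are keyed by `(table, row)`.

```python
def read_row_integrated(db, htm_entries, sc, table, row):
    """db: table -> {row: value}; htm_entries: (table, row, op); sc: {(table, row): value}."""
    value = db[table].get(row)
    # Keep only the HTM entries tagged with the search target's table number.
    ops = {r: op for (t, r, op) in htm_entries if t == table}
    op = ops.get(row)
    if op is None:
        return value
    if op == "delete":
        return None
    return sc[(table, row)]


db = {1: {n: f"t1v{n}" for n in range(1, 7)}, 2: {n: f"t2v{n}" for n in range(1, 7)}}
entries = [(1, 4, "delete"), (2, 5, "update")]
sc = {(2, 5): "sc1"}
print(read_row_integrated(db, entries, sc, 1, 5))  # t1v5 (table 2's entry ignored)
print(read_row_integrated(db, entries, sc, 2, 5))  # sc1
```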

In the second approach, the database processing unit (11b) first consults the master database (11a) and then scans the entire heap tuple map (HTM). It treats every row number that has an entry (here, rows 4 and 5) as deleted and excludes it from the search. The database processing unit 11b then consults the shadow copy (SC) and adds the new entries in it (sc1) to the search targets.
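The second approach suits full scans. Every row listed in the HTM is dropped from the master database's result, and the shadow copy's new tuples are appended. `scan_table` is a hypothetical name, and appending shadow copy tuples at the end is an ordering assumption.

```python
def scan_table(master_rows, htm, sc):
    """Full scan during write set generation (second approach)."""
    # Every row with an HTM entry is treated as deleted, whatever the operation.
    visible = [v for r, v in sorted(master_rows.items()) if r not in htm]
    # The shadow copy's new tuples are then added to the result.
    visible.extend(sc[r] for r in sorted(sc))
    return visible


master = {n: f"v{n}" for n in range(1, 7)}
print(scan_table(master, {4: "delete", 5: "update"}, {5: "sc1"}))
# ['v1', 'v2', 'v3', 'v6', 'sc1']
```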

For the table-specific write set of FIG. 7, the same processing is applied to the HTM/SC pair for the table being searched.

For the integrated write set of FIG. 8, the unit extracts the heap tuple map (HTM) and shadow copy (SC) entries that carry the table number of the table being searched. It then applies the same processing to those entries.

<When a conflict arises while a lower master is being updated by a transaction log from the upper master>
A conflict occurs when two things happen at once on the same rows of a lower master node. A transaction log distributed from the upper master is being replicated into the lower master node's database, and an update instruction against the lower master's own database is being executed.

For example, the lower master node may be updating rows 4 and 5 in response to an update instruction when a transaction log containing an entry for row 5 arrives from the upper master.

In this case the write set created at the lower master node may still be sent to the upper master node. The upper master node has already distributed a transaction log covering that row, so it will detect the conflict and abort the write set. The conflict at the lower master node can therefore be ignored. The same applies when the write set generated at the lower master node covers multiple tables (FIGS. 7 and 8).

Alternatively, the conflict can be resolved at the lower master node itself. When the lower master node generates a write set (heap tuple map (HTM) and shadow copy (SC)), it may register the set in its back-end memory (BEM). Two or more processes can then refer to that area, specifically the replication process and the write-set generation process. In other words, it is desirable for the lower master node to place the write set in shared memory. More narrowly, only the heap tuple map needs to be placed there, including the heap tuple maps of the table-specific and integrated write sets of FIGS. 7 and 8.

In this case, when the lower master node replicates into its master database 11a, it can consult the write sets in the back-end memory (BEM). Any write set that conflicts with the incoming update instruction can then be aborted at the lower master node itself.
Specifically, the transaction log processing unit 11c consults the back-end memory (BEM). If the row that the update instruction is about to update appears in a heap tuple map (HTM), the unit aborts the transaction that is generating that heap tuple map (HTM).
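The shared-HTM check can be sketched as a registry that two activities consult: write-set generation registers targets, and replication looks them up before applying each log entry. This is only an in-process stand-in. `threading.Lock` substitutes for real cross-process shared memory and its synchronization, and `SharedHTMRegistry` with its methods is hypothetical. Keying targets by `(table, row)` lets the same registry cover single-table, table-specific and integrated write sets.

```python
import threading


class SharedHTMRegistry:
    """In-process stand-in for HTMs placed in shared back-end memory (BEM)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._htms = {}  # local transaction id -> set of (table, row) it targets

    def register(self, txn_id, targets):
        """Called by the write-set generation process once its HTM is built."""
        with self._lock:
            self._htms[txn_id] = set(targets)

    def on_replicated_update(self, table, row):
        """Called by the replication process before applying a log entry.
        Returns the ids of local transactions aborted because they target that row."""
        with self._lock:
            victims = {t for t, rows in self._htms.items() if (table, row) in rows}
            for txn_id in victims:
                del self._htms[txn_id]  # the replicated update wins
            return victims


registry = SharedHTMRegistry()
registry.register("local-1", [(1, 4), (1, 5)])
registry.register("local-2", [(1, 2)])
print(registry.on_replicated_update(1, 5))  # {'local-1'}
```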

The same applies when the write set generated at the lower master node covers multiple tables (FIGS. 7 and 8). Take the table-specific write set of FIG. 7 first. When the lower master node generates it (a bundle of heap tuple maps (HTM) and shadow copies (SC)), the node may register the bundle in its back-end memory (BEM). Two or more processes can then refer to that area, specifically the replication process and the process generating the table-specific write set. In other words, it is desirable for the lower master node to place the table-specific write set in shared memory, or more narrowly only the per-table heap tuple maps.

In this case, when the lower master node replicates into its master database 11a, it can consult the table-specific write sets in the back-end memory (BEM). Any table-specific write set that conflicts with the update instruction can then be aborted at the lower master node itself.

The integrated write set of FIG. 8 is handled as in the single-table case, because the table numbers are recorded in the heap tuple map. When the integrated write set (heap tuple map (HTM) and shadow copy (SC)) is generated, it is registered in the back-end memory (BEM) of the lower master node. The subsequent processing is the same as for a single table.

Placing the heap tuple map (HTM) in shared memory lets multiple processes refer to it. As a result, conflicts can be prevented at the lower master node even in a multi-master database. Moreover, only the heap tuple map (HTM) needs to be placed in shared memory, so the scarce shared memory is not monopolized.

<Embodiment 2>
Another embodiment of the present invention (Embodiment 2) is described with reference to the drawings.

FIG. 10 shows the hierarchical master node structure of this embodiment. As shown in the figure, intermediate master nodes (MS201, MS202 ... MS20n) and lower master nodes (MS301, MS302 ... MS30n) are arranged hierarchically below the upper master node (MS101). Each node (information processing device) has a database. The upper master node (MS101) has a slave (SL), and the intermediate and lower master nodes may also have slaves. In such a master-slave configuration, the database can be updated between master and slave using the update management technique described in Japanese Patent Application Laid-Open No. 2006-293910, a published prior application by the present applicant.

In that prior application, it was sufficient to replicate the master node's transaction log data to the lower nodes.

PCT/JP2011/068057, an unpublished application by the present applicant, addressed a further problem. In a database built from hierarchical multi-master nodes, update instructions may also be executed at lower master nodes. Referring only to the transaction logs from above then cannot keep all lower nodes consistent. That application's distinguishing feature was that a lower master node sends a write set to the upper master node, consisting of the shadow copy of the database expanded in its own memory together with the heap tuple map, and the upper master node is updated from it.

These prior techniques, however, did not anticipate the upper master node deleting a table or changing a table's structure. In such cases, updates via write sets could leave the database inconsistent. This embodiment keeps the entire multi-master node structure consistent in such cases by using the database's lock function, as described below.

FIG. 11 is a functional block diagram of an intermediate master node (MS201) and a lower master node (MS301). The upper master node (MS101) has the same functions.

As shown in the figure, when a client (CL) inputs a database update instruction, the database processing unit (11b) generates a write set in the back-end memory (BEM), which is built on the main memory (not shown). As shown in FIG. 12, this write set consists of a heap tuple map (HTM) and a shadow copy (SC). Here, assume the update instruction deletes (DELETE) row 4 of the master database (11a) and rewrites (UPDATE) row 5 to a new value (sc1).

At this time, the database processing unit 11b consults the master database (11a) but does not write to it directly. Instead, it sends the write set generated in the back-end memory (BEM) to the upper master via the communication module (11d).

The intermediate master nodes (MS201, MS202 ... MS20n) and the lower master nodes (MS301, MS302 ... MS30n) generate and send write sets in the same way.

Some instructions at the upper master node (MS101) require exclusive control of a table, such as a change to the table's structure or deletion of the table. When such an instruction arises, the node executes a lock acquisition instruction for that table and holds the corresponding lock number. For example, suppose lock acquisition instructions are executed in sequence for table 1, table 3, table 2, table 4, .... Table 1 then receives lock number 1, table 3 lock number 2, table 2 lock number 3 and table 4 lock number 4, and these lock numbers (1 to 4) are held.
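Sequential lock numbering at the upper master node can be sketched with a monotonically increasing counter. `UpperLockTable` is a hypothetical name. The sketch models only the numbering in the text's example, not lock release or waiting.

```python
import itertools


class UpperLockTable:
    """Hands out lock numbers in the order lock acquisition instructions run."""

    def __init__(self):
        self._next = itertools.count(1)
        self.held = {}  # table -> lock number

    def acquire(self, table):
        number = next(self._next)
        self.held[table] = number
        return number


locks = UpperLockTable()
for table in (1, 3, 2, 4):
    locks.acquire(table)
print(locks.held)  # {1: 1, 3: 2, 2: 3, 4: 4}
```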

このように、上位マスタノードでデータベースのテーブルに対するロック獲得命令が発生したときには、これをロック獲得情報として中位マスタノードおよび下位マスタノードに単独で通知するようにしてもよいし、後述のように上位マスタノードで生成されるトランザクションログに格納して中位マスタノードおよび下位マスタノードに通知してもよい。そして、前記上位マスタノードからロック獲得情報を受信した前記下位マスタノードでは、ロック獲得情報と競合するトランザクション、たとえばロック獲得情報が対象としているテーブルに対して更新を行っているトランザクションの有無をチェックして、そのようなトランザクションが存在しているときには、上位マスタノードで発生したロック獲得情報が優先されるため、この下位マスタノードにおけるトランザクションが廃棄される。 When a lock acquisition instruction for a database table occurs at the upper master node, it can reach the intermediate and lower master nodes in one of two ways. It may be sent on its own as lock acquisition information. Alternatively, as described later, it may be stored in the transaction log generated by the upper master node and delivered to the intermediate and lower master nodes in that form. A lower master node that receives the lock acquisition information then checks whether any transaction conflicts with it, for example a transaction that is updating the targeted table. If such a transaction exists, the lock acquisition information generated at the upper master node takes priority, and the transaction at the lower master node is discarded.

以上、トランザクションが競合する場合の一例として、ロック獲得情報が対象としているテーブルに対して更新を行っているトランザクションが存在している場合を説明したが、トランザクションの競合とはこれに限られない。たとえば、上位マスタノードから通知されたロック獲得情報がテーブルを削除する際に獲得するロック獲得情報である場合には、下位マスタノードにおいて該当するテーブルを単に参照するだけのトランザクションが存在しているだけであっても、前記ロック獲得情報とは競合することになるため、このような下位マスタノードにおけるトランザクションは廃棄される。 The example of a conflict given above is a transaction that is updating the table targeted by the lock acquisition information, but conflicts are not limited to that case. Suppose the lock acquisition information notified from the upper master node was acquired in order to delete a table. Then even a transaction at the lower master node that merely reads that table conflicts with the lock acquisition information, so that transaction is also discarded.
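The conflict rule in the two paragraphs above can be sketched as a filter over the lower master node's active transactions. This is a minimal illustration under stated assumptions. Transactions are modelled by hypothetical sets of read and written tables. A hypothetical `lock_kind` value of `"drop"` marks a lock taken to delete a table, which the text says also conflicts with read-only transactions. `"alter"` stands for any other exclusive lock, such as a structure change.

```python
from dataclasses import dataclass, field


@dataclass
class LocalTxn:
    """Hypothetical model of a transaction running on a lower master node."""
    txn_id: int
    reads: set[int] = field(default_factory=set)   # tables only referred to
    writes: set[int] = field(default_factory=set)  # tables being updated


def conflicts(txn: LocalTxn, table: int, lock_kind: str) -> bool:
    """True if txn conflicts with the upper master's lock on `table`."""
    if table in txn.writes:
        return True  # updating the locked table always conflicts
    # A lock taken to delete the table also conflicts with plain reads.
    return lock_kind == "drop" and table in txn.reads


def apply_remote_lock(active: list[LocalTxn], table: int, lock_kind: str) -> list[LocalTxn]:
    """Discard conflicting local transactions; the upper master's lock takes priority."""
    return [txn for txn in active if not conflicts(txn, table, lock_kind)]


active = [
    LocalTxn(1, writes={1}),             # updating table 1
    LocalTxn(2, reads={1}),              # only reading table 1
    LocalTxn(3, reads={2}, writes={2}),  # working on table 2 only
]
after_alter = [t.txn_id for t in apply_remote_lock(active, 1, "alter")]
after_drop = [t.txn_id for t in apply_remote_lock(active, 1, "drop")]
print(after_alter, after_drop)  # [2, 3] [3]
```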

そして、当該ロック獲得情報を格納したトランザクションログ（図１４参照）を生成する。同図に示したトランザクションログは、ＸＢ１で命令開始、ＬＴ１でテーブル１のロック、ＤＴ１で当該テーブル１のデリート、ＸＣ１でそのコミットを意味している。トランザクションログにはこのような一群の命令が繰り返されて格納されている。本実施形態では、ロック獲得情報毎にシーケンシャルな番号を付与して管理している。たとえば、ＬＴ１はテーブル１に対するもので１番目のロック獲得命令、ＬＴ３はテーブル３に対するもので２番目のロック獲得命令、ＬＴ２はテーブル２に対するもので３番目のロック獲得命令。すなわち、この例ではＬＴ１→ＬＴ３→ＬＴ２→ＬＴ４の順番でシーケンシャルにロック獲得番号１〜４が付与されて管理されている。 A transaction log storing the lock acquisition information (see FIG. 14) is then generated. In the transaction log shown in the figure, XB1 marks the start of a transaction, LT1 locks table 1, DT1 deletes table 1, and XC1 commits the transaction. Groups of entries like this are stored repeatedly in the transaction log. In this embodiment, each piece of lock acquisition information is managed with a sequential number. LT1 (for table 1) is the first lock acquisition instruction, LT3 (for table 3) the second, and LT2 (for table 2) the third. In this example, therefore, lock acquisition numbers 1 to 4 are assigned sequentially in the order LT1, LT3, LT2, LT4.

しかし、このようなロック獲得命令をシーケンシャルに管理する方法としては、ログ毎に付与されているログシーケンス番号（ＬＳＮ）を用いてもよい。図１４では、ＬＴ１はＬＳＮ＝２，ＬＴ３はＬＳＮ＝８、ＬＴ２はＬＳＮ＝１３、ＬＴ４はＬＳＮ＝１８となる。 Alternatively, the log sequence number (LSN) assigned to each log entry may be used to order the lock acquisition instructions. In FIG. 14, LT1 has LSN = 2, LT3 has LSN = 8, LT2 has LSN = 13, and LT4 has LSN = 18.
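Because log sequence numbers only increase, both numbering schemes give the same order. The sketch below uses a hypothetical `LogRecord` tuple. Only the four LT records and their LSNs (2, 8, 13, 18) come from FIG. 14. The LSN of the XB record is assumed, and the other records in each group are omitted.

```python
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    """Hypothetical transaction-log record."""
    lsn: int                     # log sequence number of this record
    op: str                      # "XB" begin, "LT" lock table, "DT" delete table, "XC" commit
    table: Optional[int] = None


def lock_order(log: list[LogRecord]) -> list[tuple[int, int, int]]:
    """Return (sequential lock number, LSN, table) for every LT record, in log order."""
    locks = [rec for rec in log if rec.op == "LT"]
    return [(number, rec.lsn, rec.table) for number, rec in enumerate(locks, start=1)]


log = [
    LogRecord(1, "XB"),          # assumed LSN for XB1
    LogRecord(2, "LT", 1),       # LT1, LSN 2 per FIG. 14
    LogRecord(8, "LT", 3),       # LT3, LSN 8
    LogRecord(13, "LT", 2),      # LT2, LSN 13
    LogRecord(18, "LT", 4),      # LT4, LSN 18
]
for number, lsn, table in lock_order(log):
    print(number, lsn, table)
```

The sequential number and the LSN rise together. A test such as "lock number greater than NLKN" therefore selects the same records under either scheme, provided the NLKN is recorded in the same scheme.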

このトランザクションログは、前記上位マスタノード（ＭＳ１０１）から中位・下位マスタノード（ＭＳ２０１，ＭＳ２０２，ＭＳ２０ｎ，ＭＳ３０１，ＭＳ３０２，ＭＳ３０ｎ・・・）に送信される。 This transaction log is transmitted from the upper master node (MS101) to the middle / lower master nodes (MS201, MS202, MS20n, MS301, MS302, MS30n,...).

それぞれの中位・下位マスタノードでは、前記トランザクションログを受信すると、前記トランザクションログの内容を自身のデータベースに対してレプリケーションする。 Upon receiving the transaction log, each middle / lower master node replicates the contents of the transaction log to its own database.

ここで、ロック獲得命令についてのみ説明すれば、ロック獲得命令ＬＴ１，ＬＴ３，ＬＴ２，ＬＴ４を順次実行して自身の共有メモリ上のテーブルを排他ロック状態として、中位・下位マスタノードでの他のトランザクションによるメモリアクセスを制限する。このとき、ロック獲得命令（ロック獲得情報）と競合するトランザクション、たとえばロック獲得命令が対象としているテーブルに対して更新を行っているトランザクションの有無をチェックして、そのようなトランザクションが存在しているときには、上位マスタノードで発生したロック獲得命令が優先されるため、この下位マスタノードにおけるトランザクションが廃棄される。 Considering only the lock acquisition instructions here, each intermediate and lower master node executes LT1, LT3, LT2, and LT4 in order. This places the corresponding tables in its own shared memory in an exclusive lock state and restricts memory access by other transactions on that node. At this time, the node checks whether any transaction conflicts with the lock acquisition instruction (lock acquisition information), for example a transaction that is updating the targeted table. If such a transaction exists, the lock acquisition instruction generated at the upper master node takes priority, and the transaction at the lower master node is discarded.

下位マスタノードにおいて、前記ロック獲得命令（ＬＴ１，ＬＴ３，ＬＴ２，ＬＴ４・・・）に基づいて順番にロックが獲得されると、当該下位マスタノードにおいてロックが獲得された最大値のロック獲得番号をノードロック番号（ＮＬＫＮ）として中位・上位マスタノードに通知する。図１５では、下位マスタノード（ＭＳ３０１）は３番目のロック獲得命令（ＬＴ２）まで完了しているのでＮＬＫＮ＝３（図１５では丸付き数字の３）、下位マスタノード（ＭＳ３０２）は２番目のロック獲得命令（ＬＴ３）まで完了しているのでＮＬＫＮ＝２（図１５で丸付き数字の２）となる。 When a lower master node has acquired locks in order based on the lock acquisition instructions (LT1, LT3, LT2, LT4 ...), it reports to the intermediate and upper master nodes the highest lock acquisition number for which it holds the lock. This value is its node lock number (NLKN). In FIG. 15, the lower master node MS301 has completed up to the third lock acquisition instruction (LT2), so NLKN = 3 (circled 3 in FIG. 15). The lower master node MS302 has completed up to the second lock acquisition instruction (LT3), so NLKN = 2 (circled 2 in FIG. 15).
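A lower master node can derive its NLKN by replaying the lock records in log order and remembering the last number it completed. The `LowerMaster` class below is a hypothetical sketch that reproduces the FIG. 15 values. Conflict handling and the actual shared-memory locking are omitted.

```python
class LowerMaster:
    """Hypothetical lower master node replaying lock instructions in log order."""

    def __init__(self, name: str):
        self.name = name
        self.locked_tables: set[int] = set()
        self.nlkn = 0  # node lock number: highest lock number acquired so far

    def replay_lock(self, lock_number: int, table: int) -> None:
        # Locks arrive in log order, so they must be acquired strictly in sequence.
        if lock_number != self.nlkn + 1:
            raise ValueError(f"out-of-order lock {lock_number}, expected {self.nlkn + 1}")
        self.locked_tables.add(table)
        self.nlkn = lock_number


locks = [(1, 1), (2, 3), (3, 2), (4, 4)]  # (lock number, table) for LT1, LT3, LT2, LT4
ms301, ms302 = LowerMaster("MS301"), LowerMaster("MS302")
for number, table in locks[:3]:  # MS301 has completed up to LT2
    ms301.replay_lock(number, table)
for number, table in locks[:2]:  # MS302 has completed up to LT3
    ms302.replay_lock(number, table)
print(ms301.nlkn, ms302.nlkn)  # 3 2
```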

図１２は、下位マスタノード（ＭＳ３０１）におけるマスタデータベース（１１ａ）と、書込セットとの関係を示している。下位マスタノードにおけるマスタデータベース（１１ａ）は行番号と、命令内容と、ポインタとによって構成されており、新たな命令がクライアント端末（ＣＬ）からなされる毎に行番号が追加されていく追記型のデータベースである。同図の場合、前記で説明したように、行番号４を削除（ＤＥＬＥＴＥ）し、行番号５を新たな命令内容に書き換える（ｓｃ１にＵＰＤＡＴＥ）する場合を示している。 FIG. 12 shows the relationship between the master database (11a) of the lower master node (MS301) and the write set. The master database (11a) of the lower master node consists of row numbers, instruction contents, and pointers. It is a write-once (append-only) database: a new row number is added each time a new instruction is issued from the client terminal (CL). The figure shows the case described above, in which row number 4 is deleted (DELETE) and row number 5 is rewritten with new contents (UPDATE to sc1).

下位マスタノード（ＭＳ３０１）においてクライアント端末（ＣＬ）からの命令によりマスタデータベース（１１ａ）に対してこのような更新命令がなされると、前述のように、バックエンドメモリ（ＢＥＭ）上でヒープタプルマップ（ＨＴＭ、ヒープファイル）とシャドウコピー（ＳＣ）とからなる書込セットが生成される。 When an instruction from the client terminal (CL) issues such an update command against the master database (11a) at the lower master node (MS301), a write set is generated on the back-end memory (BEM), as described above. The write set consists of a heap tuple map (HTM, heap file) and a shadow copy (SC).

なお、この段階で下位マスタノード（ＭＳ３０１）のデータベース処理部（１１ｂ）は、ヒープタプルマップ（ＨＴＭ）の生成によりＤＥＬＥＴＥ命令が適用される行番号４と、ＵＰＤＡＴＥ命令が適用される旧行番号５は削除されることが既にわかるため、シャドウコピー（ＳＣ）としては新たな命令（ｓｃ１）だけを書き込んでおいてもよい。 At this stage, generating the heap tuple map (HTM) has already told the database processing unit (11b) of the lower master node (MS301) which rows will be deleted. These are row number 4, to which the DELETE instruction applies, and old row number 5, to which the UPDATE instruction applies. Only the new contents (sc1) therefore need to be written into the shadow copy (SC).

このような書込セットは、図１６に示すようにテーブル毎に作成されるが、図１７に示すように単一の書込セット中のヒープタプルマップ（ＨＴＭ）の行番号（ｃｔｉｄ）に関係付けてテーブル番号（Ｔ）を登録するようにしてもよい。 Such a write set may be created for each table, as shown in FIG. 16. Alternatively, as shown in FIG. 17, a table number (T) may be registered against each row number (ctid) in the heap tuple map (HTM) of a single write set.

前記書込セットには、前述のノードロック番号（ＮＬＫＮ）も格納される。このノードロック番号（ＮＬＫＮ）は、前述のように、上位マスタノード（ＭＳ１０１）から配信（通知）されたトランザクションログ（図１４参照）に格納されたロック獲得命令（ＬＴ１，ＬＴ３，ＬＴ２，ＬＴ４）に対応して実行されたロック獲得番号の最大値である。 The write set also stores the node lock number (NLKN) described above. The NLKN is the highest lock acquisition number that the node has executed. The corresponding lock acquisition instructions (LT1, LT3, LT2, LT4) are stored in the transaction log (see FIG. 14) distributed (notified) from the upper master node (MS101).
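A write set of the kind described here might be modelled as follows. This is a sketch only, and the field layout is an assumption made for illustration. HTM entries are `(table, ctid, operation)` triples, as in the FIG. 17 variant. The shadow copy holds only new contents, and the NLKN is embedded in the write set. Table number 5 is hypothetical, because FIG. 12 does not name the table.

```python
from dataclasses import dataclass, field


@dataclass
class WriteSet:
    """Hypothetical write set built on the back-end memory (BEM)."""
    origin: str   # issuing node, e.g. "MS301"
    nlkn: int     # node lock number of the issuing node when the set was built
    htm: list[tuple[int, int, str]] = field(default_factory=list)    # (table, ctid, op)
    shadow_copy: list[tuple[int, str]] = field(default_factory=list)  # (table, new contents)

    def delete(self, table: int, ctid: int) -> None:
        self.htm.append((table, ctid, "DELETE"))

    def update(self, table: int, ctid: int, new_value: str) -> None:
        # In an append-only store, an UPDATE retires the old row and appends a new one,
        # so only the new contents go into the shadow copy.
        self.htm.append((table, ctid, "UPDATE"))
        self.shadow_copy.append((table, new_value))

    def tables(self) -> set[int]:
        """Tables this write set intends to modify (used for the lock check)."""
        return {table for table, _, _ in self.htm}


ws = WriteSet(origin="MS301", nlkn=3)
ws.delete(table=5, ctid=4)                   # DELETE row 4
ws.update(table=5, ctid=5, new_value="sc1")  # UPDATE row 5 to sc1
print(ws.htm, ws.shadow_copy, ws.tables())
```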

前述の説明を繰り返すと、下位マスタノード（ＭＳ３０１）が図１４に示すようなトランザクションログを受信して、このトランザクションログのロック獲得命令（ＬＴ１，ＬＴ３，ＬＴ２，ＬＴ４）に基づいて３番目のロック獲得命令（ＬＴ２）まで完了している場合、このノードロック番号（ＮＬＫＮ）は「３」となり（ＮＬＫＮ＝３）、図１５に示すように下位マスタノード（ＭＳ３０１）で生成される書込セットに格納される。一方、下位マスタノード（ＭＳ３０２）は、２番目のロック獲得命令（ＬＴ３）まで完了している場合、このノードロック番号（ＮＬＫＮ）は「２」となり（ＮＬＫＮ＝２）、同図に示すように下位マスタノード（ＭＳ３０２）で生成される書込セットに格納される。 To restate: suppose the lower master node (MS301) receives a transaction log such as that shown in FIG. 14 and, working through its lock acquisition instructions (LT1, LT3, LT2, LT4), completes up to the third one (LT2). Its node lock number is then 3 (NLKN = 3), and this value is stored in the write set generated at MS301, as shown in FIG. 15. If the lower master node (MS302) has completed only up to the second lock acquisition instruction (LT3), its node lock number is 2 (NLKN = 2). That value is stored in the write set generated at MS302, as shown in the same figure.

このようにして生成された書込セットは、当該下位マスタノード（ＭＳ３０１，ＭＳ３０２）から上位マスタノード（ＭＳ１０１）に送信される。当該書込セットは、その間の中位マスタノード（ＭＳ２０２）を経由するが、当該中位マスタノード（ＭＳ２０２）は当該書込セットに対して何らの処理は行わない。 The write sets generated in this way are sent from the lower master nodes (MS301, MS302) to the upper master node (MS101). They pass through the intermediate master node (MS202) that sits between them, but MS202 performs no processing on them.

上位マスタノード（ＭＳ１０１）において、データベース処理部１１ｂ（中央処理装置（ＣＰＵ））は、前記下位マスタノード（ＭＳ３０１，ＭＳ３０２）から前記書込セットを受信すると、そこからヒープタプルマップ（ＨＴＭ）を読み出して、自身のマスタデータベース（１１ａ）と比較する。図１２ではターゲットとなっているタプル（ここでは行番号４，５および７）の内容がマスタデータベース（１１ａ）上で更新されているか否かを検証する。ここでは行番号４〜６については未更新であるため、行番号４に削除ポインタを付与し、書き換えられる旧番号５にも削除ポインタを付与する。そして、新たな行番号７に新しい命令（ｓｃ１）が書き込まれる。 At the upper master node (MS101), the database processing unit 11b (central processing unit (CPU)) receives a write set from a lower master node (MS301, MS302). It reads the heap tuple map (HTM) from the write set and compares it with its own master database (11a). In FIG. 12, it checks whether the contents of the target tuples (here, row numbers 4, 5, and 7) have been updated in the master database (11a). Rows 4 to 6 have not been updated, so a deletion pointer is attached to row 4 and another to old row 5, which is being rewritten. The new contents (sc1) are then written as new row 7.
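The HTM check and the append-only update at the upper master node can be sketched as follows. The sketch rests on two assumptions. First, a hypothetical `AppendOnlyTable` whose rows carry a `deleted` flag in place of the deletion pointer. Second, a write set reduced to its list of target row numbers plus the shadow-copy values. A row already marked deleted is treated as already updated by another write set, so the sketch also covers the abort case the text describes further on.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    value: str
    deleted: bool = False  # stands in for the deletion pointer


@dataclass
class AppendOnlyTable:
    """Hypothetical append-only table: rows are never overwritten in place."""
    rows: dict[int, Row] = field(default_factory=dict)  # ctid -> row

    def append(self, value: str) -> int:
        ctid = max(self.rows, default=0) + 1
        self.rows[ctid] = Row(value)
        return ctid


def apply_write_set(table: AppendOnlyTable, target_ctids: list[int], shadow_copy: list[str]) -> bool:
    """Validate the HTM targets, then apply the shadow copy; False means abort."""
    # If any target row was already superseded by another write set, the updates conflict.
    if any(table.rows[ctid].deleted for ctid in target_ctids):
        return False
    for ctid in target_ctids:
        table.rows[ctid].deleted = True  # attach a deletion pointer to the old row
    for value in shadow_copy:
        table.append(value)              # new contents go to a fresh row number
    return True


table = AppendOnlyTable()
for value in ["r1", "r2", "r3", "r4", "r5", "r6"]:  # rows 1-6 (contents hypothetical)
    table.append(value)
first = apply_write_set(table, [4, 5], ["sc1"])  # DELETE row 4, UPDATE row 5 to sc1
second = apply_write_set(table, [5], ["sc2"])    # a later write set still targeting old row 5
print(first, second, table.rows[7].value)        # True False sc1
```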

このとき、本実施形態では全ての書込セットをマスタデータベース（１１ａ）と比較するものではない。つまり、前述の比較ステップに先立って、書込セットに格納されたノードロック番号（ＮＬＫＮ）を読み出しておき、ノードロック番号（ＮＬＫＮ）よりも大きなロック番号に該当するロック獲得情報を参照し、そのロック獲得情報と書込セットが修正しようとしているテーブルとが競合していないかどうかを検証する。競合している場合には書込セットをアボートして当該書込セットを比較対象から除外する。このような検証を行う理由は、上位マスタノード（ＭＳ１０１）でロックが獲得されているにもかかわらず、下位マスタノードで前記ロックの獲得前のテーブルに基づいてそのテーブルに対してタプルの更新を行いその情報が書込セットとして上位マスタノード（ＭＳ１０１）に届いて当該タプルを含むテーブルを更新してしまった場合、上位マスタノード（ＭＳ１０１）がロック獲得中に行ったテーブルの構造の変更や削除と競合してマスタデータベース（１１ａ）の整合性が損なわれてしまうためである。 In this embodiment, however, not every write set goes on to the comparison with the master database (11a). Before that comparison step, the upper master node reads the node lock number (NLKN) stored in the write set. It then consults the lock acquisition information whose lock numbers are greater than that NLKN and checks whether any of those locks conflicts with the tables the write set is trying to modify. If there is a conflict, the write set is aborted and excluded from the comparison. The reason for this check is as follows. Suppose the upper master node (MS101) has acquired a lock, but a lower master node updates a tuple in that table on the basis of the table as it was before the lock was acquired. If the resulting write set reached MS101 and updated the table containing the tuple, that update would conflict with the structure change or deletion that MS101 carried out while holding the lock. The consistency of the master database (11a) would then be lost.

後述のヒープタプルマップ（ＨＴＭ）を用いた競合の検出方法では、このようなデータベースの整合性の破壊を検出できないため、このようにロック獲得情報との比較を行い、事前に競合を検出しておく必要がある。 The conflict detection method that uses the heap tuple map (HTM), described next, cannot detect this kind of loss of database consistency. The write set must therefore be compared with the lock acquisition information in this way, so that such conflicts are detected in advance.

一方、前記のロック番号との比較でアボートされなかった書込セットであっても、書込セット中のヒープタプルマップ（ＨＴＭ）と上位マスタノードのマスタデータベース（１１ａ）とを比較した結果、マスタデータベース（１１ａ）の該当行が既に別の書込セットによって更新されているときには、マスタデータベース（１１ａ）の更新が競合することになるため当該書込セットはアボートされる。 On the other hand, as a result of comparing the heap tuple map (HTM) in the write set with the master database (11a) of the higher-order master node even if the write set was not aborted by the comparison with the lock number, When the corresponding row of the database (11a) has already been updated by another writing set, the writing set is aborted because the update of the master database (11a) conflicts.

次に、前記書込セットとは別に、下位マスタノード（ＭＳ３０１，ＭＳ３０２）において、トランザクションログでレプリケーションされたロック獲得命令に対応するロック獲得結果情報が中位マスタノード（ＭＳ２０２）を介して上位マスタノード（ＭＳ１０１）に通知される機構について図１５を用いて説明する。 Next, FIG. 15 is used to describe a mechanism that operates separately from the write sets. Through it, the lower master nodes (MS301, MS302) report lock acquisition result information to the upper master node (MS101) via the intermediate master node (MS202). That information corresponds to the lock acquisition instructions replicated from the transaction log.

下位マスタノード（ＭＳ３０１，３０２）では、前述のトランザクションログに格納されたロック獲得命令（ＬＴ１，ＬＴ３，ＬＴ２，ＬＴ４・・・）を順次レプリケーションして、そのロック獲得結果情報を得る。ここでは、具体的にはロック獲得命令順に付与されたロック獲得命令番号で管理すればよい。すなわちテーブル１のロック獲得命令（ＬＴ１）のロック獲得命令番号は「１」、次のテーブル３のロック獲得命令（ＬＴ３）のロック獲得命令番号は「２」、次のテーブル２のロック獲得命令（ＬＴ２）のロック獲得命令番号は「３」、さらに次のテーブル４のロック獲得命令（ＬＴ４）のロック番号は「４」となる。 The lower master nodes (MS301, MS302) replicate the lock acquisition instructions (LT1, LT3, LT2, LT4 ...) stored in the transaction log one after another and obtain the lock acquisition result information. Specifically, this can be managed with lock acquisition instruction numbers assigned in the order of the instructions. The instruction for table 1 (LT1) is number 1, the next one, for table 3 (LT3), is number 2, the next, for table 2 (LT2), is number 3, and the one after that, for table 4 (LT4), is number 4.

そして、それぞれの下位マスタノード（ＭＳ３０１，ＭＳ３０２）では、ロックが獲得されたロック獲得命令番号の数値の最大値をノードロック番号（ＮＬＫＮ）で管理していることは前述の通りである。 As described above, each lower master node (MS301, MS302) manages the maximum value of the lock acquisition command number from which the lock has been acquired using the node lock number (NLKN).

各下位マスタノード（ＭＳ３０１，ＭＳ３０２）からそれぞれのノードロック番号（ＮＬＫＮ）を上層の中位マスタノード（ＭＳ２０２）に送信する。中位マスタノード（ＭＳ２０２）では、自身が保有しているノードロック番号（ＮＬＫＮ＝２）と、各下位マスタノードから通知されたノードロック番号（ＮＬＫＮ＝３，２）とを比較して、その最も小さい値（ここでは２）を自身のツリーロック番号（ＴＬＫＮ＝２）として更新する。 Each lower master node (MS301, MS302) sends its node lock number (NLKN) to the intermediate master node (MS202) in the layer above. MS202 compares its own node lock number (NLKN = 2) with those reported by its lower master nodes (NLKN = 3, 2). It sets its tree lock number to the smallest of these values, here 2 (TLKN = 2).

なお、図１５において、中位マスタノード（ＭＳ２０１）ではその配下の下位マスタノードが存在しないため、自身のノードロック番号（ＮＬＫＮ＝３）がそのまま当該中位マスタノード（ＭＳ２０１）を頂点としたツリーロック番号（ＴＬＫＮ＝３）となる。 In FIG. 15, since there is no subordinate master node under the intermediate master node (MS201), its own node lock number (NLKN = 3) is a tree having the intermediate master node (MS201) as a vertex as it is. The lock number (TLKN = 3).

各中位マスタノード（ＭＳ２０１，ＭＳ２０２）は、それぞれのツリーロック番号（ＴＬＫＮ＝３，２）を上位マスタノード（ＭＳ１０１）に送信する。これを受信した上位マスタノード（ＭＳ１０１）では、これらの中から最小値（ここではＴＬＫＮ＝２）をクラスタロック番号（ＣＬＫＮ＝２）として更新する。 Each intermediate master node (MS201, MS202) sends its tree lock number (TLKN = 3, 2) to the upper master node (MS101). On receiving them, MS101 sets its cluster lock number to the minimum of these values, here TLKN = 2, so CLKN = 2.

このクラスタロック番号（ＣＬＫＮ＝２）は、前述のように全階層から収集されたクラスタ全体の全てのノードロック番号（ＮＬＫＮ）の最小値であるため、上位マスタノード（ＭＳ１０１）ではこのクラスタロック番号（ＣＬＫＮ＝２）によってクラスタ全体のロック獲得状況を把握することができる。すなわち、クラスタロック番号（ＣＬＫＮ）が２である場合、この番号と等しいか小さいロック獲得命令は全てのノードで完了していることを意味する。 As described above, this cluster lock number (CLKN = 2) is the minimum of all node lock numbers (NLKN) in the cluster, collected from every layer. The upper master node (MS101) can therefore use it to track the lock acquisition status of the whole cluster. CLKN = 2 means that every lock acquisition instruction numbered 2 or lower has been completed on all nodes.
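The bottom-up minimum can be sketched recursively. The `Node` tree below is a hypothetical model of FIG. 15: MS301 and MS302 sit under MS202, and MS201 has no children. Each node's TLKN is the minimum of its own NLKN and its children's TLKNs. The CLKN is the minimum of the values reported to MS101.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical master node with its own NLKN and its child nodes."""
    name: str
    nlkn: int
    children: list["Node"] = field(default_factory=list)


def tree_lock_number(node: Node) -> int:
    """TLKN: minimum of this node's NLKN and the TLKNs reported by its children."""
    return min([node.nlkn] + [tree_lock_number(child) for child in node.children])


def cluster_lock_number(upper: Node) -> int:
    """CLKN: minimum of the tree lock numbers reported to the upper master node."""
    return min(tree_lock_number(child) for child in upper.children)


def completed_cluster_wide(lock_number: int, clkn: int) -> bool:
    """A lock acquisition instruction is complete on every node if its number <= CLKN."""
    return lock_number <= clkn


# FIG. 15 topology and values.
ms202 = Node("MS202", 2, [Node("MS301", 3), Node("MS302", 2)])
ms201 = Node("MS201", 3)
ms101 = Node("MS101", 4, [ms201, ms202])  # upper master issued locks 1 to 4; its NLKN is not used
print(tree_lock_number(ms201), tree_lock_number(ms202), cluster_lock_number(ms101))  # 3 2 2
```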

したがって、上位マスタノード（ＭＳ１０１）はクラスタロック番号と等しいか小さいロック番号を持つロック獲得命令は、獲得が完了したと認識する。 The upper master node (MS101) therefore treats every lock acquisition instruction whose lock number is less than or equal to the cluster lock number as completed.

ここで、中位・下位マスタノードから順次受信したツリーロック番号（ＴＬＫＮ）との比較の他に、前記書込セットに格納された下位マスタノード（ＭＳ３０１，３０２）のノードロック番号（ＮＬＫＮ）と上位マスタノードにおいて自身が保持しているロック獲得命令番号との比較をも行う理由は以下の通りである。 The upper master node also uses the tree lock numbers (TLKN) received in turn from the intermediate and lower master nodes. In addition, it compares the node lock number (NLKN) of the lower master node (MS301, MS302) stored in each write set with the lock acquisition instruction numbers it holds. The reason for this second comparison is as follows.

一般に下位マスタノード（ＭＳ３０１，３０２）で実行されたロック獲得結果情報（ノードロック番号：ＮＬＫＮ）が中位ノードのツリーロック番号（ＴＬＫＮ）を更新しながら上位マスタノード（ＭＳ１０１）に到達するまでには時間を要する。特にツリー階層構造が複雑なデータベースであればその到達遅延により処理効率が大幅に低下してしまう。特に各階層で最小値の比較を行っているため、どれか１個でも小さいＮＬＫＮ（たとえばＮＬＫＮ＝２）があるとクラスタロック番号（ＣＬＫＮ）はいつまでも大きな値をとれないため、上位マスタノードでは、全ての書込セットをチェックしなければならない。 In general, the lock acquisition result information (node lock number, NLKN) produced at the lower master nodes (MS301, MS302) takes time to reach the upper master node (MS101), because it updates the intermediate nodes' tree lock numbers (TLKN) on the way. In a database with a complex tree hierarchy, this delay greatly reduces processing efficiency. Moreover, each layer takes the minimum, so a single small NLKN anywhere (for example, NLKN = 2) stops the cluster lock number (CLKN) from advancing. The upper master node would then have to check every write set.

一方、書込セットを発行した下位マスタノード（ＭＳ３０１）はＮＬＫＮ＝３が設定されているため、３番目のロック獲得命令までは既に完了していることになる。つまり、このノード（ＭＳ３０１）に関する限り、これよりも大きいロック番号「４」（ＬＴ４）に該当するロック獲得情報のみを前記書込セット中のテーブル情報との比較対象とすれば、上位マスタノード（ＭＳ１０１）における比較処理による負荷を低減できることになる。 On the other hand, since NLKN = 3 is set for the lower-level master node (MS301) that issued the write set, the third lock acquisition command has already been completed. In other words, as far as this node (MS301) is concerned, if only lock acquisition information corresponding to a lock number “4” (LT4) larger than this is to be compared with the table information in the write set, the upper master node ( The load caused by the comparison process in the MS 101) can be reduced.
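The saving can be made concrete by counting which lock numbers still need a table comparison. Take the FIG. 15 values and a hypothetical helper `locks_to_examine`. Bounding MS301's write set by the cluster-wide CLKN leaves two locks to check, whereas bounding it by the write set's own NLKN leaves one.

```python
def locks_to_examine(held_lock_numbers: list[int], bound: int) -> list[int]:
    """Lock numbers that must still be compared with a write set's tables."""
    return [number for number in held_lock_numbers if number > bound]


held = [1, 2, 3, 4]  # LT1, LT3, LT2, LT4
clkn = 2             # cluster lock number in FIG. 15
nlkn_ms301 = 3       # NLKN carried in MS301's write set
print(locks_to_examine(held, clkn))        # [3, 4]
print(locks_to_examine(held, nlkn_ms301))  # [4]
```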

以上本発明を実施形態に基づいて説明したが、本願発明はこれに限定されるものではない。たとえば、ノードの階層構造については、上位マスタノード、中位マスタノードおよび下位マスタノードの３層構造（図１０，図１５および図１６）を例示したが、上位マスタノードと下位マスタノードの２層構造のものであってもよい。また、中位マスタノードが２層以上のものであってもよい。 The present invention has been described on the basis of an embodiment, but it is not limited to that embodiment. For example, the node hierarchy was illustrated as a three-layer structure of an upper master node, intermediate master nodes, and lower master nodes (FIGS. 10, 15, and 16). A two-layer structure of an upper master node and lower master nodes may be used instead. The intermediate master nodes may also span two or more layers.

また、以上の説明では、ノードロック番号（ＮＬＫＮ）を中位、上位マスタノードに通知して順次ツリーロック番号（ＴＬＫＮ）、クラスタロック番号（ＣＬＫＮ）を更新する実施例を説明したが、これに限定されることはない。たとえば、上位マスタノード（ＭＳ１０１）は、中位マスタノード（ＭＳ２０１，ＭＳ２０２）または下位マスタノード（ＭＳ３０１，ＭＳ３０２）から送信される書込セットに含まれるノードロック番号（ＮＬＫＮ）をそれぞれの下位・中位マスタノードのノードロック番号（ＮＬＫＮ）とみなし、上位マスタノード（ＭＳ１０１）では、各中位・下位マスタノードから発行される書込セットを収集して、その書込セット中に格納されている全てのノードロック番号（ＮＬＫＮ）の最小値をクラスタロック番号（ＣＬＫＮ）とみなしてもよい。 The description above covered an example in which node lock numbers (NLKN) are reported to the intermediate and upper master nodes, and the tree lock numbers (TLKN) and the cluster lock number (CLKN) are updated in turn. The invention is not limited to this. For example, the upper master node (MS101) may take the NLKN contained in each write set sent from an intermediate master node (MS201, MS202) or a lower master node (MS301, MS302) as that node's node lock number. MS101 then collects the write sets issued by the intermediate and lower master nodes and treats the minimum of all NLKN values stored in them as the cluster lock number (CLKN).

このように書込セットのノードロック番号（ＮＬＫＮ）を収集してクラスタロック番号（ＣＬＫＮ）を更新する利点としては、これらの書込セットとは別にノードロック番号（ＮＬＫＮ）を中位・上位マスタノードに通知してそれぞれのツリーロック番号（ＴＬＫＮ）を更新しながらさらに上層に送信する必要がないため、通知システムを簡略化できる。一方、この方法では、上位マスタノード（ＭＳ１０１）が全ての書込セットからノードロック番号（ＮＬＫＮ）の集計作業をしなければならないため、負荷が大きくなってしまう。 As an advantage of collecting the node lock number (NLKN) of the write set and updating the cluster lock number (CLKN) as described above, the node lock number (NLKN) is assigned to the middle / upper master separately from these write sets. Since it is not necessary to notify the node and update each tree lock number (TLKN) while transmitting to the upper layer, the notification system can be simplified. On the other hand, in this method, since the upper master node (MS101) has to perform node lock number (NLKN) aggregation work from all writing sets, the load becomes large.
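Here is a minimal sketch of this variant. It assumes the upper master node keeps the most recent NLKN seen from each issuing node and takes the minimum over those values; the text does not spell out how stale values are handled. In this sketch, a node that has not yet sent any write set does not contribute to the minimum.

```python
def update_clkn(latest: dict[str, int], origin: str, nlkn: int) -> int:
    """Record the NLKN carried by one write set and return the resulting CLKN."""
    # Assumption: NLKN only grows on a node, so keep the largest value seen per origin.
    latest[origin] = max(nlkn, latest.get(origin, 0))
    return min(latest.values())


latest: dict[str, int] = {}
steps = [("MS301", 3), ("MS302", 2), ("MS201", 3), ("MS302", 4)]
history = [update_clkn(latest, origin, nlkn) for origin, nlkn in steps]
print(history)  # [3, 2, 2, 3]
```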

本発明は、階層構造を備えたマルチマスタノード構造のデータベース管理システムに利用できる。 The present invention can be used for a database management system having a multi-master node structure having a hierarchical structure.

ＭＳ１０１上位マスタノード
ＳＬスレーブ
ＭＳ２０１，ＭＳ２０２・・・ＭＳ２０ｎ下位マスタノード（中位マスタノード）
ＭＳ３０１，ＭＳ３０２・・・ＭＳ３０ｎ下位マスタノード
ＣＬクライアント端末
１１ａマスタデータベース
１１ｂデータベース処理部
１１ｃトランザクションログ処理部
１１ｄ通信モジュール
ＣＰＵ中央処理装置
ＭＭ主記憶装置
ＢＵＳバス
ＨＤ大規模記憶装置
Ｉ／Ｏ通信インターフェース
ＨＴＭヒープタプルマップ
ＳＣシャドウコピー
MS101 Upper master node
SL Slave
MS201, MS202 ... MS20n Lower master node (middle master node)
MS301, MS302 ... MS30n Lower master node
CL Client terminal
11a Master database
11b Database processing unit
11c Transaction log processing unit
11d Communication module
CPU Central processing unit
MM Main storage unit
BUS Bus
HD Large-scale storage unit
I/O Communication interface
HTM Heap tuple map
SC Shadow copy

Claims

更新が可能な上位と下位のマスタノードを階層的に有する追記型データベースの管理方法であって、
いずれかの下位マスタノードのセッションにおいて、上位マスタノードに対して、当該下位のマスタノードのデータベースの更新対象となったテーブル情報を書込セットに登録して上位マスタノードに送信するステップと、
前記上位マスタノードにおいて、データベースのテーブルに対するロック獲得命令が発生したときには、当該ロック獲得情報を下位マスタノードに通知するとともに、そのロック獲得情報を上位マスタノードに保持するステップと、
前記上位マスタノードにおいて、前記下位マスタノードから受信した前記書込セット中のテーブル情報と前記で保持されたロック獲得情報とを比較して、競合するときには、前記書込セットをアボートし、競合しないときには、前記書込セットを用いて、前記上位マスタノードのデータベースを更新するステップと、
前記上位マスタノードからロック獲得情報を受信した前記下位マスタノードでは、ロック獲得情報が対象としているテーブルに対するトランザクションが存在しているときには、前記下位マスタノードにおいて当該トランザクションを廃棄するステップと、
前記下位マスタノードにおいて前記上位マスタノードからのロック獲得情報に基づいて対象となるテーブルのロックを獲得するステップと
前記書込セットを用いて前記上位マスタノードのデータベースが更新された場合に、該更新とともに、前記上位マスタノードにおいて該上位マスタノードの更新記録をトランザクションログとして生成するステップと、
前記トランザクションログを、前記送信元の下位マスタノードを含む下位マスタノードに配信するステップと、
前記下位マスタノードのトランザクションログ処理部が、受信したトランザクションログに基づいて自身のデータベースを更新するステップと、
が実行される、追記型データベースの管理方法。 A method for managing a write-once database having hierarchically upper and lower master nodes that can be updated,
In a session of any lower master node, for the upper master node, registering the table information to be updated in the database of the lower master node in the writing set and transmitting to the upper master node;
In the upper master node, when a lock acquisition command for a database table occurs, notifying the lower master node of the lock acquisition information, and holding the lock acquisition information in the upper master node;
In the upper master node, the table information in the write set received from the lower master node is compared with the lock acquisition information held in the above, and if there is a conflict, the write set is aborted and no conflict occurs. Sometimes using the writing set to update the database of the higher master node;
In the lower master node that has received the lock acquisition information from the upper master node, when there is a transaction for the table targeted by the lock acquisition information, the step of discarding the transaction in the lower master node;
Acquiring, in the lower master node, the lock of the target table based on the lock acquisition information from the upper master node;
When the database of the upper master node has been updated using the writing set, generating, in the upper master node and together with that update, an update record of the upper master node as a transaction log;
Delivering the transaction log to a subordinate master node including a subordinate master node of the transmission source;
The transaction log processing unit of the lower master node updates its own database based on the received transaction log;
Is a write-once database management method.

前記書込セットは、下位マスタノードのデータベースのシャドウコピーと、自身のメモリ上に展開されたテーブルとタプルとの組み合わせ情報を含むヒープタプルマップとからなり、
上位マスタノードが前記書込セットを受信すると、前記ロック獲得情報との比較によってアボートされなかったときに、当該書込セット中のヒープタプルマップと自身のデータベースとを比較して、ターゲットとして登録されているタプルのデータベースにおける更新の有無を検証するステップと、
前記で更新がなされているときには書込セットをアボートするステップと、
前記で更新がなされていないときには前記シャドウコピーを用いて自身のデータベースを更新するステップと、
が実行される、請求項１記載の追記型データベースの管理方法。 The write set consists of a shadow copy of the database of the lower master node and a heap tuple map including combination information of tables and tuples developed on its own memory,
When the upper master node receives the writing set and the writing set has not been aborted by the comparison with the lock acquisition information, comparing the heap tuple map in the writing set with its own database to verify whether the tuples registered as targets have been updated in the database;
Aborting the writing set when an update has been made, and
Updating its database with the shadow copy when no update has been made, and
The method for managing a write-once database according to claim 1, wherein:

前記下位マスタノードは、前記上位マスタノードからのロック獲得情報に基づいて当該テーブルに対するロックを獲得したときには、ロック獲得結果情報として前記上位マスタノードに通知する、
請求項１または２に記載の追記型データベースの管理方法。 When the lower master node acquires a lock for the table based on the lock acquisition information from the upper master node, it notifies the upper master node as lock acquisition result information.
The method for managing a write-once database according to claim 1 or 2.

前記上位マスタノードで発生したロック獲得情報はトランザクションログに格納されて下位マスタノードに通知される、
請求項１から３のいずれか１項に記載の追記型データベースの管理方法。 The lock acquisition information generated in the upper master node is stored in a transaction log and notified to the lower master node.
The method for managing a write-once database according to any one of claims 1 to 3.

前記上位マスタノードから送信されるロック獲得情報はシーケンシャルに管理されたロック番号がそれぞれ付与されており、
前記下位マスタノードでの前記ロックの獲得結果情報は、当該下位マスタノードにおいてロックが獲得された最大値のロック番号をノードロック番号として前記上位マスタノードに通知する、
請求項４記載の追記型データベースの管理方法。 The lock acquisition information transmitted from the upper master node is given a sequentially managed lock number,
As the lock acquisition result information of the lower master node, the highest lock number for which a lock has been acquired at that lower master node is notified to the upper master node as a node lock number.
The method for managing a write-once database according to claim 4.

前記上位マスタノードと、前記下位マスタノードとの間に１または２以上の中位マスタノードが構築された階層構造を有しており、
前記各中位マスタノードは、１または２以上の下位マスタノードからノードロック番号の通知を受領すると、受領した各ノードロック番号を、自身が保持するノードロック番号と比較して、その中の最小値をツリーロック番号として更新するとともに、該ツリーロック番号をさらに上層の中位マスタノードまたは上位マスタノードに通知し、
前記１または２以上の中位マスタノードから前記通知を受領した前記上層の中位マスタノードは、受領した各ツリーロック番号を自身が保持するノードロック番号と比較して、その中の最小値を新たなツリーロック番号として更新するとともに、該ツリーロック番号をさらに上層の上位マスタノードに通知し、
前記下位マスタノードまたは前記上層の中位マスタノードから通知を受領した上位マスタノードは、通知された前記ツリーロック番号の中の最小値を新たなクラスタロック番号として更新し、当該クラスタロック番号よりも等しいか小さいロック番号に該当するロック獲得命令がクラスタ全体で完了したことを認識する、
請求項５記載の追記型データベースの管理方法。 Having a hierarchical structure in which one or more intermediate master nodes are constructed between the upper master node and the lower master node;
When each intermediate master node receives node lock number notifications from one or more lower master nodes, it compares each received node lock number with the node lock number it holds itself, updates the minimum of these values as its tree lock number, and further notifies that tree lock number to the intermediate master node or upper master node in the layer above,
The upper middle master node that has received the notification from the one or more middle master nodes compares each received tree lock number with the node lock number held by itself, and determines the minimum value among them. While updating as a new tree lock number, notify the tree lock number to the upper master node of the upper layer,
The upper master node that has received the notification from the lower master node or from the upper-layer intermediate master node updates the minimum of the notified tree lock numbers as a new cluster lock number, and recognizes that the lock acquisition instructions corresponding to lock numbers equal to or smaller than that cluster lock number have been completed across the entire cluster.
The method for managing a write-once database according to claim 5.

前記中位マスタノードは１階層または２以上の階層で構成されている、
請求項６記載の追記型データベースの管理方法。 The intermediate master node is composed of one layer or two or more layers.
A method for managing a write-once database according to claim 6.

前記中位マスタノードが存在せずに、上位マスタノードと下位マスタノードとの２階層で構成されている、
請求項６または７のいずれかに記載の追記型データベースの管理方法。 The intermediate master node does not exist, and is composed of two layers of an upper master node and a lower master node.
The method for managing a write-once database according to claim 6 or 7.

前記ロック獲得情報に基づくノードロック番号は前記書込セットに格納されて中位マスタノードまたは上位マスタノードに通知される、
請求項５から８のいずれか１項に記載の追記型データベースの管理方法。 The node lock number based on the lock acquisition information is stored in the writing set and notified to the intermediate master node or the upper master node.
The method for managing a write-once database according to any one of claims 5 to 8.

前記書込セットを受領した上位マスタノードは、当該書込セットからノードロック番号を読み出して、自身が保持するロック獲得命令に対応するロック番号と比較して、当該ノードロック番号よりも大きいロック番号に該当するロック獲得情報のみを前記書込セット中のテーブル情報との比較対象とする、
請求項９記載の追記型データベースの管理方法。 The upper master node that has received the writing set reads the node lock number from the writing set, compares it with the lock numbers corresponding to the lock acquisition instructions it holds itself, and takes only the lock acquisition information corresponding to lock numbers greater than that node lock number as the object of comparison with the table information in the writing set,
A method for managing a write-once database according to claim 9.

前記シーケンシャルに管理されたロック番号は、上位マスタノードで生成されるトランザクションログに付与されるログシーケンス番号を用いることを特徴とする、
請求項５から１０のいずれか１項に記載の追記型データベースの管理方法。 The sequentially managed lock number uses a log sequence number given to a transaction log generated by an upper master node,
The management method of the write-once database according to any one of claims 5 to 10.