JP6877393B2

JP6877393B2 - Systems, programs and methods

Info

Publication number: JP6877393B2
Application number: JP2018159500A
Authority: JP
Inventors: 武戸田; 耕祐春木
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2017-12-18
Filing date: 2018-08-28
Publication date: 2021-05-26
Anticipated expiration: 2038-08-28
Also published as: JP2019109875A

Description

Embodiments of the present invention relate to a system, a program, and a method.

In recent years, deep learning, a form of machine learning, has been expected to enable effective use of data. To obtain learning results from large-scale data faster in deep learning, parallel distributed learning processing is required, in which multiple nodes (computers) execute learning in parallel and share their learning progress. In such parallel distributed learning processing, data indicating the learning progress is shared through communication between nodes.

One way to obtain the above learning results faster is to increase the number of nodes that execute the parallel processing. In general parallel distributed learning processing, however, increasing the number of nodes does not always yield learning results efficiently (that is, the learning speed does not scale), and it is difficult to achieve high scalability.

U.S. Patent Application Publication No. 2016/0267380

Therefore, the problem to be solved by the present invention is to provide a system, a program, and a method capable of achieving high scalability in parallel distributed learning processing.

The system according to the present embodiment includes a first node and a second node belonging to a first group, and a third node and a fourth node belonging to a second group. When the first node and the second node execute an n-th (n is a natural number) round of parallel distributed processing, the first node calculates a first gradient for updating a first weighting coefficient of an objective function to a second weighting coefficient, and the second node calculates a second gradient for updating the first weighting coefficient of the objective function to the second weighting coefficient. When the third node and the fourth node execute an m-th (m is a natural number) round of parallel distributed processing, the third node calculates a third gradient for updating a third weighting coefficient of the objective function to a fourth weighting coefficient, and the fourth node calculates a fourth gradient for updating the third weighting coefficient of the objective function to the fourth weighting coefficient. When the gradient calculation by the first node and the second node finishes earlier than the gradient calculation by the third node and the fourth node, the (n+1)-th round of parallel distributed processing executed by the first node and the second node further updates the second weighting coefficient, which was updated from the first weighting coefficient based on the first and second gradients, and the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node further updates the fourth weighting coefficient, which was updated from the third weighting coefficient based on the first to fourth gradients.

FIG. 1 is a diagram for explaining an outline of the system according to the first embodiment.
FIG. 2 is a diagram showing an example of the configuration of the system.
FIG. 3 is a diagram showing an example of the system configuration of a server node.
FIG. 4 is a diagram showing an example of the system configuration of a worker node.
FIG. 5 is a block diagram showing an example of the functional configuration of the server node.
FIG. 6 is a block diagram showing an example of the functional configuration of a worker node.
FIG. 7 is a sequence chart showing an example of the processing procedure of the system.
FIG. 8 is a flowchart showing an example of the processing procedure of a representative node.
FIG. 9 is a flowchart showing an example of the processing procedure of a non-representative node.
FIG. 10 is a diagram showing, for each learning method, the learning time required to reach a predetermined generalization performance.
FIG. 11 is a diagram for explaining an outline of a modification of the system.
FIG. 12 is a diagram showing an example of the configuration of the system according to the second embodiment.
FIG. 13 is a sequence chart showing an example of the processing procedure of the system.
FIG. 14 is a flowchart showing an example of the processing procedure of a representative node.
FIG. 15 is a flowchart showing an example of the processing procedure of a non-representative node.

Hereinafter, each embodiment will be described with reference to the drawings.
(First Embodiment)
The system according to the present embodiment executes parallel distributed learning processing based on an objective function, for example in deep learning that handles large-scale data. Parallel distributed learning processing based on an objective function may be any processing in which a plurality of processing entities learn using the objective function as feedback (an evaluation value) on the learning result; one example is parallel distributed learning processing for optimizing the objective function.

In deep learning, stochastic gradient descent (SGD), for example, is used as a method for optimizing the objective function. In SGD, the parameters of the objective function are repeatedly updated using a vector, called the gradient (gradient vector), that indicates the direction toward the optimal solution. The parameters of the objective function include, for example, weighting coefficients.

Let $W^{(t)}$, $\nabla W^{(t)}$, and $\epsilon^{(t)}$ denote, respectively, the weighting coefficient (weight vector) indicating the current state in SGD, the gradient vector, and the learning coefficient. The updated weighting coefficient $W^{(t+1)}$ is then expressed by the following equation.

$$W^{(t+1)} = W^{(t)} - \epsilon^{(t)} \nabla W^{(t)} \qquad \text{Equation (1)}$$

The learning coefficient $\epsilon^{(t)}$, which determines the update width, is determined adaptively according to the progress of learning; for example, it decays as learning progresses.
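As a minimal sketch of equation (1), the following Python applies the update repeatedly to a toy objective. The objective f(W) = sum(w^2), the decay schedule, and all function names are assumptions made for illustration; the source states only that the learning coefficient decays as learning progresses.

```python
def sgd_step(weights, gradient, learning_coefficient):
    """Apply equation (1): W(t+1) = W(t) - eps(t) * grad W(t)."""
    return [w - learning_coefficient * g for w, g in zip(weights, gradient)]


def decayed_learning_coefficient(initial, decay, step):
    """Hypothetical schedule (not from the source): eps(t) = initial / (1 + decay * t)."""
    return initial / (1.0 + decay * step)


def quadratic_gradient(weights):
    """Gradient of the toy objective f(W) = sum(w^2), assumed for illustration."""
    return [2.0 * w for w in weights]


weights = [1.0, -2.0]
for t in range(100):
    eps = decayed_learning_coefficient(initial=0.1, decay=0.01, step=t)
    weights = sgd_step(weights, quadratic_gradient(weights), eps)
# After the loop, both weights have moved close to the minimum at 0.
```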

The gradient described above is obtained by inputting learning data (training examples) into the objective function. From the viewpoint of computational cost, however, the "mini-batch method", in which multiple pieces of learning data are input together to obtain an average gradient, is generally used. The number of pieces of learning data used to obtain this average gradient is called the batch size. When optimization by SGD is parallelized and distributed, the gradient, for example, is used as the learning progress to be shared.
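The mini-batch method can be sketched as follows. The per-example loss (a squared error of a linear model) and the function names are assumptions chosen for illustration; only the averaging over the batch reflects the text.

```python
def example_gradient(weights, example):
    """Gradient of the toy per-example loss (w.x - y)^2 with respect to w."""
    features, target = example
    error = sum(w * x for w, x in zip(weights, features)) - target
    return [2.0 * error * x for x in features]


def minibatch_gradient(weights, batch):
    """Average the per-example gradients; len(batch) is the batch size."""
    total = [0.0] * len(weights)
    for example in batch:
        for i, g in enumerate(example_gradient(weights, example)):
            total[i] += g
    return [g / len(batch) for g in total]


batch = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]  # batch size 2
average = minibatch_gradient([0.0, 0.0], batch)  # [-1.0, -2.0]
```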

The main algorithms for parallel distributed learning include, for example, synchronous parallel distributed learning methods and asynchronous parallel distributed learning methods.

In the parallel distributed learning processing described above, a plurality of nodes execute gradient calculation. The synchronous parallel distributed learning method is a method in which the gradient calculations at these nodes are executed in synchronization. One example is Synchronous-SGD, which distributes the gradient calculation of the mini-batch method across multiple nodes and uses the average of the gradients calculated by all nodes to update the weighting coefficient. There are several implementations of Synchronous-SGD, for example a collective communication type, in which gradients are shared among all nodes, and a parameter server type, in which gradients are aggregated at a node called a parameter server, which updates the weighting coefficient and distributes the updated weighting coefficient to each node.
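A single Synchronous-SGD update can be sketched as below, assuming the gradients of all nodes have already been collected in a list (the communication itself is omitted). The function names are hypothetical.

```python
def average_gradients(node_gradients):
    """Element-wise mean of the gradients reported by all nodes."""
    count = len(node_gradients)
    return [sum(parts) / count for parts in zip(*node_gradients)]


def synchronous_sgd_step(weights, node_gradients, learning_coefficient):
    """One synchronized update: average every node's gradient, then apply equation (1)."""
    mean = average_gradients(node_gradients)
    return [w - learning_coefficient * g for w, g in zip(weights, mean)]


# Two nodes each report a gradient for the same weights.
new_weights = synchronous_sgd_step([1.0, 1.0], [[1.0, 2.0], [3.0, 6.0]], 0.5)
# new_weights == [0.0, -1.0]
```

Because the update waits for every gradient in `node_gradients`, the slowest node determines when the step can complete.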

For the synchronous parallel distributed method, the following are known: as the number of nodes that calculate gradients (the degree of parallelism) increases, the synchronization cost increases and the processing throughput decreases; as the number of nodes increases, the batch size increases and the generalization performance decreases; and the overall processing speed is limited by the slowest of the nodes.

The asynchronous parallel distributed learning method, on the other hand, is a method in which the gradient calculations at a plurality of nodes are executed asynchronously. One example is Asynchronous-SGD, which, like Synchronous-SGD, is an algorithm that shares gradients. In Asynchronous-SGD, however, the gradients are not averaged through synchronization; the weighting coefficient is updated using the gradient calculated by each node as it is. Asynchronous-SGD is implemented mainly as the parameter server type.

The asynchronous parallel distributed learning method provides higher processing throughput than the synchronous parallel distributed method, but its scalability is known to be limited by factors such as a decrease in convergence speed caused by differences in processing speed between nodes.
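The effect of differing node speeds can be illustrated with a toy parameter server. This is a sketch under assumptions: the class, the objective f(w) = w^2, and the order of arrival are invented for illustration and are not taken from the source.

```python
class ParameterServer:
    """Holds the shared weights and applies each gradient as soon as it arrives."""

    def __init__(self, weights, learning_coefficient):
        self.weights = list(weights)
        self.learning_coefficient = learning_coefficient

    def push(self, gradient):
        """Apply one node's gradient without waiting for the other nodes."""
        self.weights = [w - self.learning_coefficient * g
                        for w, g in zip(self.weights, gradient)]
        return list(self.weights)


def toy_gradient(weights):
    """Gradient of the toy objective f(w) = w^2."""
    return [2.0 * w for w in weights]


server = ParameterServer([1.0], 0.25)
fast_copy = list(server.weights)
slow_copy = list(server.weights)

fast_copy = server.push(toy_gradient(fast_copy))  # server: 1.0 -> 0.5
fast_copy = server.push(toy_gradient(fast_copy))  # server: 0.5 -> 0.25
# The slow node's gradient was computed from the old weights [1.0] (a stale gradient).
slow_copy = server.push(toy_gradient(slow_copy))  # server: 0.25 -> -0.25
```

In this toy run the stale gradient from the slow node pushes the weight past the minimum at 0, which is one way a speed difference between nodes can slow convergence.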

Asynchronous-SGD takes a different approach from Synchronous-SGD: unlike Synchronous-SGD, it is a parallel distributed learning algorithm that does not depend on the batch size, and it is therefore called a batch-size-independent parallel method (processing). Most batch-size-independent parallel methods are asynchronous parallel distributed learning methods.

An outline of the system according to the present embodiment (hereinafter referred to as the present system) will now be described with reference to FIG. 1. The present system is configured so that, in parallel distributed learning processing, different learning processes can be executed at different levels of a hierarchy (for example, Synchronous-SGD and another, batch-size-independent parallel method).

Specifically, as shown in FIG. 1, in the first layer, parallel distributed processing of the mini-batch method by Synchronous-SGD is performed within each group to which a plurality of nodes belong, and in the second layer, batch-size-independent parallel distributed processing is performed among the representative nodes of the groups in the first layer. The present system is described in detail below.

FIG. 2 shows an example of the configuration of the present system. As shown in FIG. 2, the system 10 includes a server node 20, a plurality of worker nodes 30, and a plurality of worker nodes 40.

In the present embodiment, the plurality of worker nodes 30 belong to group 1, and the plurality of worker nodes 40 belong to group 2.

The server node 20 is communicably connected to one worker node 30 among the plurality of worker nodes 30 belonging to group 1 (hereinafter referred to as the representative node 30 of group 1). The server node 20 is also communicably connected to one worker node 40 among the plurality of worker nodes 40 belonging to group 2 (hereinafter referred to as the representative node 40 of group 2).

Among the plurality of worker nodes 30, a worker node 30 that is not communicably connected to the server node 20 (that is, a worker node 30 other than the representative node 30 of group 1) is referred to as a non-representative node 30 of group 1. Likewise, among the plurality of worker nodes 40, a worker node 40 that is not communicably connected to the server node 20 (that is, a worker node 40 other than the representative node 40 of group 2) is referred to as a non-representative node 40 of group 2.

Within group 1, the plurality of worker nodes 30 (the representative node 30 and the non-representative nodes 30) are communicably connected to one another. Similarly, within group 2, the plurality of worker nodes 40 (the representative node 40 and the non-representative nodes 40) are communicably connected to one another.

In the present embodiment, in the first layer, parallel distributed learning processing of the mini-batch method by Synchronous-SGD is executed within group 1 (the plurality of worker nodes 30) and within group 2 (the plurality of worker nodes 40). In the second layer, parallel distributed learning processing by a batch-size-independent parallel distributed method is executed between the representative node 30 of group 1 and the representative node 40 of group 2 via the server node 20.
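The two-layer scheme can be sketched as follows. This is an illustrative model, not the implementation described in the source: the class names, the toy objective f(w) = w^2, and in particular the averaging of a group's gradients at the server before equation (1) is applied are assumptions.

```python
def mean_gradient(gradients):
    """Element-wise mean of a list of gradients."""
    count = len(gradients)
    return [sum(parts) / count for parts in zip(*gradients)]


class ServerNode:
    """Second layer: updates the master parameter whenever any group reports."""

    def __init__(self, master, learning_coefficient):
        self.master = list(master)
        self.learning_coefficient = learning_coefficient

    def update(self, group_gradients):
        # Assumption: the group's gradients are averaged before equation (1) is applied.
        mean = mean_gradient(group_gradients)
        self.master = [w - self.learning_coefficient * g
                       for w, g in zip(self.master, mean)]
        return list(self.master)


class Group:
    """First layer: all worker nodes of a group compute from the same weights."""

    def __init__(self, worker_count, weights):
        self.worker_count = worker_count
        self.weights = list(weights)

    def step(self, server, gradient_fn):
        # Synchronous part: every worker computes a gradient from the shared weights.
        gradients = [gradient_fn(self.weights) for _ in range(self.worker_count)]
        # The representative node forwards them and receives the updated weights.
        self.weights = server.update(gradients)


def toy_gradient(weights):
    """Gradient of the toy objective f(w) = w^2."""
    return [2.0 * w for w in weights]


server = ServerNode([1.0], 0.125)
group1 = Group(3, server.master)
group2 = Group(3, server.master)

group1.step(server, toy_gradient)  # master: 1.0 -> 0.75
group1.step(server, toy_gradient)  # master: 0.75 -> 0.5625
group2.step(server, toy_gradient)  # master: 0.5625 -> 0.3125 (group 2 was slower)
```

Here group 1 completes two rounds before group 2 completes one, and the weights group 2 receives already reflect the gradients of both groups.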

Although FIG. 2 shows an example in which three worker nodes belong to each of group 1 and group 2, it is sufficient that two or more worker nodes belong to each of group 1 and group 2. Also, although FIG. 2 shows only two groups (group 1 and group 2), the present system may include three or more groups.

FIG. 3 shows an example of the system configuration of the server node 20 shown in FIG. 2. The server node 20 includes a CPU 201, a system controller 202, a main memory 203, a BIOS-ROM 204, a non-volatile memory 205, a communication device 206, an embedded controller (EC) 207, and the like.

The CPU 201 is a processor that controls the operation of the various components in the server node 20. The CPU 201 executes various programs loaded from the non-volatile memory 205, which is a storage device, into the main memory 203. These programs include an operating system (OS) 203a and various application programs. The application programs include a parallel distributed learning program 203b for the server node.

The CPU 201 also executes the basic input/output system (BIOS) stored in the BIOS-ROM 204. The BIOS is a program for hardware control.

The system controller 202 is a device that connects the local bus of the CPU 201 to the various components. The system controller 202 also incorporates a memory controller that controls access to the main memory 203.

The communication device 206 is a device configured to perform wired or wireless communication. The communication device 206 includes a transmitting unit that transmits signals and a receiving unit that receives signals. The EC 207 is a one-chip microcomputer that includes an embedded controller for power management.

FIG. 4 shows an example of the system configuration of the worker node 30. The system configuration of the worker node 30 is described here; the worker node 40 has the same configuration.

The worker node 30 includes a CPU 301, a system controller 302, a main memory 303, a BIOS-ROM 304, a non-volatile memory 305, a communication device 306, an embedded controller (EC) 307, and the like.

The CPU 301 is a processor that controls the operation of the various components in the worker node 30. The CPU 301 executes various programs loaded from the non-volatile memory 305, which is a storage device, into the main memory 303. These programs include an operating system (OS) 303a and various application programs. The application programs include a parallel distributed learning program 303b for the worker node.

The CPU 301 also executes the basic input/output system (BIOS) stored in the BIOS-ROM 304. The BIOS is a program for hardware control.

The system controller 302 is a device that connects the local bus of the CPU 301 to the various components. The system controller 302 also incorporates a memory controller that controls access to the main memory 303.

The communication device 306 is a device configured to perform wired or wireless communication. The communication device 306 includes a transmitting unit that transmits signals and a receiving unit that receives signals. The EC 307 is a one-chip microcomputer that includes an embedded controller for power management.

FIG. 5 is a block diagram showing an example of the functional configuration of the server node 20. As shown in FIG. 5, the server node 20 includes a learning data storage unit 21, a data allocation unit 22, a transmission control unit 23, a weighting coefficient storage unit 24, a reception control unit 25, and a calculation unit 26.

In the present embodiment, the learning data storage unit 21 and the weighting coefficient storage unit 24 are held in, for example, the non-volatile memory 205 shown in FIG. 3. The data allocation unit 22, the transmission control unit 23, the reception control unit 25, and the calculation unit 26 are realized by, for example, the CPU 201 shown in FIG. 3 (that is, the computer of the server node 20) executing the parallel distributed learning program 203b stored in the non-volatile memory 205 (that is, by software). The parallel distributed learning program 203b can be stored in advance in a computer-readable storage medium and distributed. It may also be downloaded to the server node 20 via, for example, a network.

Although the units 22, 23, 25, and 26 are described here as being realized by software, they may instead be realized by hardware, or by a combination of software and hardware.

The learning data storage unit 21 stores the learning data that each node (worker node) uses to calculate gradients in the parallel distributed learning processing described above.

The data allocation unit 22 determines which of the learning data stored in the learning data storage unit 21 is assigned to each of the worker nodes 30 and 40. For example, the data allocation unit 22 divides the learning data stored in the learning data storage unit 21 into two parts and assigns one part to group 1 (the plurality of worker nodes 30 belonging to it) and the other part to group 2 (the plurality of worker nodes 40 belonging to it).
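One way such an allocation could look is sketched below. The source says only that the learning data is divided in two and assigned to the groups; the even, contiguous split and the second split inside each group are assumptions, and the function names are hypothetical.

```python
def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for index in range(parts):
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def allocate(learning_data, workers_per_group):
    """Assign data to groups first, then to the worker nodes inside each group."""
    group_chunks = split_evenly(learning_data, len(workers_per_group))
    return [split_evenly(chunk, workers)
            for chunk, workers in zip(group_chunks, workers_per_group)]


# Ten examples, group 1 with three worker nodes and group 2 with two.
allocation = allocate(list(range(10)), [3, 2])
# allocation == [[[0, 1], [2, 3], [4]], [[5, 6, 7], [8, 9]]]
```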

The transmission control unit 23 has a function of transmitting various data via the communication device 206. The transmission control unit 23 transmits the learning data assigned by the data allocation unit 22 to group 1 (the plurality of worker nodes 30 belonging to it) to the representative node 30 of group 1. The transmission control unit 23 also transmits the learning data assigned by the data allocation unit 22 to group 2 (the plurality of worker nodes 40 belonging to it) to the representative node 40 of group 2.

The weighting coefficient storage unit 24 stores the weighting coefficient of the objective function. The weighting coefficient stored in the weighting coefficient storage unit 24 (that is, the weighting coefficient managed by the server node 20) is referred to as the master parameter.

The reception control unit 25 has a function of receiving various data via the communication device 206. The reception control unit 25 receives gradients indicating the learning progress on the worker nodes 30 and 40. These gradients are calculated at each of the worker nodes 30 and 40 in order to update the weighting coefficient. The gradients calculated by the worker nodes 30 belonging to group 1 are received from the representative node 30 of group 1, and the gradients calculated by the worker nodes 40 belonging to group 2 are received from the representative node 40 of group 2.

The calculation unit 26 updates the master parameter using the weighting coefficient (master parameter) stored in the weighting coefficient storage unit 24 and the gradients received by the reception control unit 25. In doing so, the calculation unit 26 calculates the updated weighting coefficient based on equation (1) above. The weighting coefficient calculated by the calculation unit 26 (the updated weighting coefficient) is stored in the weighting coefficient storage unit 24 as the master parameter and is transmitted by the transmission control unit 23 to the representative node 30 of group 1 or the representative node 40 of group 2.

The functional configurations of the worker nodes 30 and 40 are described below. First, an example of the functional configuration of the representative node 30 of group 1 will be described with reference to FIG. 6.

As shown in FIG. 6, the representative node 30 of group 1 includes a reception control unit 31, a learning data storage unit 32, a weighting coefficient storage unit 33, a calculation unit 34, and a transmission control unit 35.

In the present embodiment, the reception control unit 31, the calculation unit 34, and the transmission control unit 35 are realized by, for example, the CPU 301 shown in FIG. 4 (that is, the computer of the representative node 30 of group 1) executing the parallel distributed learning program 303b stored in the non-volatile memory 305 (that is, by software). The parallel distributed learning program 303b can be stored in advance in a computer-readable storage medium and distributed. It may also be downloaded to the representative node 30 via, for example, a network.

Although the units 31, 34, and 35 are described here as being realized by software, they may instead be realized by hardware, or by a combination of software and hardware.

In the present embodiment, the learning data storage unit 32 and the weighting coefficient storage unit 33 are held in, for example, the non-volatile memory 305 shown in FIG. 4.

The reception control unit 31 has a function of receiving various data via the communication device 306. The reception control unit 31 receives the learning data transmitted by the transmission control unit 23 of the server node 20. Of the learning data received by the reception control unit 31, the learning data assigned to the representative node 30 of group 1 is stored in the learning data storage unit 32. The learning data assigned to the non-representative nodes 30 of group 1, on the other hand, is transmitted from the representative node 30 of group 1 to those non-representative nodes 30.

The reception control unit 31 also receives, from each non-representative node 30 of group 1, the gradient calculated at that non-representative node 30.

The weighting coefficient storage unit 33 stores the weighting coefficient of the objective function. For convenience, the weighting coefficient stored in the weighting coefficient storage unit 33 (that is, the weighting coefficient managed by the representative node 30) is referred to as the weighting coefficient of the representative node 30 of group 1.

The calculation unit 34 calculates a gradient for updating the weighting coefficient of the objective function, using the learning data stored in the learning data storage unit 32 and the weighting coefficient stored in the weighting coefficient storage unit 33.

The transmission control unit 35 has a function of transmitting various data via the communication device 306. The transmission control unit 35 transmits to the server node 20 both the gradients received by the reception control unit 31 (the gradients calculated at the non-representative nodes 30) and the gradient calculated by the calculation unit 34.

なお、上記したようにサーバノード２０に含まれる算出部２６によって算出された重み係数（更新後の重み係数）が当該サーバノード２０（送信制御部２３）から送信された場合、当該重み係数は、受信制御部３１によって受信され、重み係数格納部３３に格納されている重み係数（更新前の重み係数）と置換される。これにより、グループ１の代表ノード３０の重み係数が更新される。また、この重み係数は、送信制御部３５を介して非代表ノード３０に送信される。 When the weighting coefficient (updated weighting coefficient) calculated by the calculation unit 26 included in the server node 20 is transmitted from the server node 20 (transmission control unit 23) as described above, the weighting coefficient is calculated. It is received by the reception control unit 31 and is replaced with the weight coefficient (weight coefficient before update) stored in the weight coefficient storage unit 33. As a result, the weighting coefficient of the representative node 30 of the group 1 is updated. Further, this weighting coefficient is transmitted to the non-representative node 30 via the transmission control unit 35.

Next, an example of the functional configuration of the non-representative node 30 of group 1 will be described. For convenience, this configuration is also described with reference to FIG. 6, and the description here focuses on the parts that differ from the representative node 30 of group 1 described above.

Like the representative node 30 of group 1, the non-representative node 30 of group 1 includes the reception control unit 31, the learning data storage unit 32, the weight coefficient storage unit 33, the calculation unit 34, and the transmission control unit 35 shown in FIG. 6.

The reception control unit 31 receives the learning data transmitted from the representative node 30 of group 1. The learning data received by the reception control unit 31 is stored in the learning data storage unit 32.

The weight coefficient storage unit 33 stores the weight coefficient of the objective function. For convenience, the weight coefficient stored in the weight coefficient storage unit 33 (that is, the weight coefficient managed by the non-representative node 30) is referred to as the weight coefficient of the non-representative node 30 of group 1.

As described above, when a weight coefficient (the updated weight coefficient) is transmitted from the representative node 30 of group 1, it is received by the reception control unit 31 and replaces the weight coefficient stored in the weight coefficient storage unit 33 (the weight coefficient before the update). The weight coefficient of the non-representative node 30 of group 1 is thereby updated.

The calculation unit 34 calculates a gradient for updating the weight coefficient of the objective function, using the learning data stored in the learning data storage unit 32 and the weight coefficient stored in the weight coefficient storage unit 33. The gradient calculated by the calculation unit 34 is transmitted to the representative node 30 by the transmission control unit 35.

Although the representative node 30 and the non-representative node 30 of group 1 have been described here, in the present embodiment the representative node 40 and the non-representative node 40 of group 2 are assumed to have the same functional configuration as the representative node 30 and the non-representative node 30 of group 1. FIG. 6 is therefore also used below when describing the functional configurations of the representative node 40 and the non-representative node 40 of group 2.

An example of the processing procedure of this system will now be described with reference to the sequence chart of FIG. 7. The description here focuses on the processing among the server node 20, group 1 (the plurality of worker nodes 30), and group 2 (the plurality of worker nodes 40). The processing of the worker nodes within each group (group 1 and group 2) is described later.

It is assumed that the learning data assigned to each of the worker nodes 30 belonging to group 1 is stored in the learning data storage unit 32 of that worker node 30. The same applies to the worker nodes 40 belonging to group 2.

It is also assumed that, for example, the same weight coefficient (hereinafter, weight coefficient W0) is stored in the weight coefficient storage unit 24 of the server node 20 and in the weight coefficient storage unit 33 of each of the worker nodes 30 and 40.

In this case, a gradient calculation process is performed in group 1 (the plurality of worker nodes 30 belonging to group 1) (step S1). In this process, each of the worker nodes 30 belonging to group 1 calculates a gradient for updating the weight coefficient of the objective function, using the learning data stored in its learning data storage unit 32 and the weight coefficient W0 stored in its weight coefficient storage unit 33. The worker nodes 30 belonging to group 1 execute the gradient calculation process in synchronization with one another.

The gradients calculated by the worker nodes 30 in step S1 are transmitted from the representative node 30 of group 1 to the server node 20 (step S2).

The server node 20 (its reception control unit 25) receives the gradient transmitted in step S2. The server node 20 (its calculation unit 26) calculates a new weight coefficient (hereinafter, weight coefficient W1) using the received gradient and the weight coefficient W0 stored in the weight coefficient storage unit 24 of the server node 20. The weight coefficient W0 stored in the weight coefficient storage unit 24 is thereby updated to the calculated weight coefficient W1 (step S3).

The server node 20 (its transmission control unit 23) distributes the weight coefficient W1 updated from the weight coefficient W0 in step S3 (the updated master parameter) to group 1 (step S4).

The weight coefficient W1 distributed from the server node 20 in this way is stored in the weight coefficient storage unit 33 of each of the worker nodes 30 belonging to group 1. Group 1 can then execute the next and subsequent gradient calculation processes using a weight coefficient that reflects the gradient calculated in group 1.

Meanwhile, in group 2 (the plurality of worker nodes 40 belonging to group 2), a gradient calculation process is performed in the same manner as in group 1 (step S5). In this process, each of the worker nodes 40 belonging to group 2 calculates a gradient for updating the weight coefficient of the objective function, using the learning data stored in its learning data storage unit 32 and the weight coefficient W0 stored in its weight coefficient storage unit 33. The worker nodes 40 belonging to group 2 execute the gradient calculation process in synchronization with one another.

The gradient calculated in step S5 is transmitted from the representative node 40 of group 2 to the server node 20 (step S6).

The server node 20 receives the gradient transmitted in step S6. At this point, the weight coefficient (master parameter) stored in the weight coefficient storage unit 24 of the server node 20 is the weight coefficient W1 updated in step S3.

The server node 20 therefore calculates a new weight coefficient (hereinafter, weight coefficient W2) using the received gradient and the weight coefficient W1. The weight coefficient W1 stored in the weight coefficient storage unit 24 is thereby updated to the calculated weight coefficient W2 (step S7).

The server node 20 distributes the weight coefficient W2 updated from the weight coefficient W1 in step S7 (the master parameter) to group 2 (step S8).

The weight coefficient W2 distributed from the server node 20 in this way is stored in the weight coefficient storage unit 33 of each of the worker nodes 40 belonging to group 2.

Here, the weight coefficient W2 is obtained by further updating, with the gradient calculated in group 2, the weight coefficient W1 that was updated with the gradient calculated in group 1. In other words, the weight coefficient W2 is calculated based on both the gradient calculated in group 1 (the gradient calculated in step S1) and the gradient calculated in group 2 (the gradient calculated in step S5). Thus, when group 1 calculates its gradient earlier than group 2, group 2 executes parallel distributed learning processing using a weight coefficient updated based on the gradient calculated in group 1.

Group 2 can therefore execute the next and subsequent gradient calculation processes using a weight coefficient that reflects not only the gradient calculated in group 2 but also the gradient calculated in group 1.

After the processes of steps S1 to S4 have been executed, group 1 executes the processes of steps S9 to S12, which correspond to steps S1 to S4. In this processing, the weight coefficient W2 stored in the weight coefficient storage unit 24 of the server node 20 is updated to a new weight coefficient (hereinafter, weight coefficient W3) using that weight coefficient W2 and the gradient calculated by the gradient calculation process in group 1. The weight coefficient W3 is distributed to the worker nodes 30 belonging to group 1. In step S9, the gradient is calculated using learning data different from the learning data used in the gradient calculation process of step S1.

Here, the weight coefficient W3 is obtained by further updating, with the gradient calculated in group 1, the weight coefficient W2 that was updated with the gradient calculated in group 2. Thus, when group 2 calculates its gradient earlier than group 1, group 1 executes parallel distributed learning processing using a weight coefficient updated based on the gradient calculated in group 2.

Group 1 can therefore execute the next and subsequent gradient calculation processes using a weight coefficient that reflects not only the gradients calculated in group 1 (the gradients calculated in steps S1 and S9) but also the gradient calculated in group 2 (the gradient calculated in step S5).

Likewise, after the processes of steps S5 to S8 have been executed, group 2 executes the processes of steps S13 to S16, which correspond to steps S5 to S8. In this processing, the weight coefficient W3 stored in the weight coefficient storage unit 24 of the server node 20 is updated to a new weight coefficient (hereinafter, weight coefficient W4) using that weight coefficient W3 and the gradient calculated by the gradient calculation process in group 2. The weight coefficient W4 is distributed to the worker nodes 40 belonging to group 2. In step S13, the gradient is calculated using learning data different from the learning data used in the gradient calculation process of step S5.

Here, the weight coefficient W4 is obtained by further updating, with the gradient calculated in group 2, the weight coefficient W3 that was updated with the gradient calculated in group 1.

Group 2 can therefore execute the next and subsequent gradient calculation processes using a weight coefficient that reflects not only the gradients calculated in group 2 (the gradients calculated in steps S5 and S13) but also the gradients calculated in group 1 (the gradients calculated in steps S1 and S9).

Although FIG. 7 has been described for the processes of steps S1 to S16, the processing shown in FIG. 7 continues until the gradient calculation process (that is, the parallel distributed learning processing) has been executed for all of the learning data stored in the learning data storage unit 32 of each of the worker nodes 30 and 40.

As described above, according to the present embodiment, processing within group 1 and within group 2 is executed synchronously, whereas the processing between the server node 20 and group 1 (its representative node 30) and the processing between the server node 20 and group 2 (its representative node 40) are executed asynchronously with respect to each other.
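The sequence of FIG. 7 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class `ServerNode`, the constant `LEARNING_RATE`, the plain SGD update rule, and the gradient values are all hypothetical, since the description says only that a new weight coefficient is calculated from the received gradient and the stored weight coefficient.

```python
from typing import List

LEARNING_RATE = 0.5  # assumed value; the description does not give the update rule


class ServerNode:
    """Holds the master parameter and updates it whenever a group's gradient arrives."""

    def __init__(self, initial_weights: List[float]) -> None:
        self.weights = list(initial_weights)  # weight coefficient W0
        self.version = 0                      # k in "Wk"

    def apply_gradient(self, gradient: List[float]) -> List[float]:
        # Assumed plain SGD step: W_next = W - LEARNING_RATE * gradient
        self.weights = [w - LEARNING_RATE * g for w, g in zip(self.weights, gradient)]
        self.version += 1
        return list(self.weights)  # distributed only to the group that sent the gradient


server = ServerNode([1.0, 1.0])
group_weights = {"group1": [1.0, 1.0], "group2": [1.0, 1.0]}  # all nodes start from W0

# Order in which group gradients reach the server (steps S2, S6, S10, S14).
# The gradient values are made up for illustration.
arrivals = [
    ("group1", [0.5, 0.0]),
    ("group2", [0.0, 0.5]),
    ("group1", [0.5, 0.0]),
    ("group2", [0.0, 0.5]),
]

history = []
for group, gradient in arrivals:
    group_weights[group] = server.apply_gradient(gradient)
    history.append((group, server.version, group_weights[group]))

for group, version, weights in history:
    print(f"{group} received W{version}: {weights}")
```

In this sketch group 2 computes its first gradient from W0, but the server applies it to W1, so the W2 that group 2 receives already reflects group 1's gradient, as in steps S5 to S8.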

The processing performed by the representative node and the non-representative nodes of each group while the processing shown in FIG. 7 is executed will now be described.

First, an example of the processing procedure of a representative node will be described with reference to the flowchart of FIG. 8. The procedure is described here for the representative node 30 of group 1.

The calculation unit 34 of the representative node 30 calculates a gradient using the learning data stored in the learning data storage unit 32 and the weight coefficient (for example, the weight coefficient W0) stored in the weight coefficient storage unit 33 (step S21). Hereinafter, the gradient calculated in the representative node 30 is referred to as the gradient of the representative node 30.

When the representative node 30 of group 1 executes step S21, each non-representative node 30 of group 1 calculates a gradient in synchronization with the representative node 30, as described later. Hereinafter, the gradient calculated in a non-representative node 30 is referred to as the gradient of the non-representative node 30.

In this case, the reception control unit 31 receives the gradient of the non-representative node 30 from that non-representative node 30 (step S22). When a plurality of non-representative nodes 30 belong to group 1 in this system, the reception control unit 31 receives a gradient from each of them.

Next, the calculation unit 34 calculates the average of the gradient calculated in step S21 (the gradient of the representative node 30) and the gradients received in step S22 (the gradients of the non-representative nodes 30) (step S23). Hereinafter, the average calculated in step S23 is referred to as the average gradient of group 1.

The transmission control unit 35 transmits the average gradient of group 1 to the server node 20 (step S24).

Steps S21 to S24 are executed by the representative node 30 of group 1 in steps S1 and S2 (or steps S9 and S10) shown in FIG. 7.

In this case, the server node 20 executes the processes of steps S3 and S4 shown in FIG. 7. That is, the server node 20 updates the master parameter with the average gradient of group 1 transmitted in step S24, and transmits the updated master parameter (for example, the weight coefficient W1) to the representative node 30 of group 1.

When the master parameter is transmitted from the server node 20, the reception control unit 31 receives it (step S25).

The transmission control unit 35 transmits the master parameter received in step S25 to the non-representative node 30 (step S26).

The weight coefficient stored in the weight coefficient storage unit 33 (for example, the weight coefficient W0) is replaced with the master parameter received in step S25 (for example, the weight coefficient W1) (step S27). The weight coefficient of the representative node 30 of group 1 is thereby updated to the master parameter (that is, to the same weight coefficient).

Steps S25 to S27 are executed by the representative node 30 after step S4 (or step S12) shown in FIG. 7.

By executing the processing shown in FIG. 8, the weight coefficient of the representative node 30 of group 1 is updated to the weight coefficient calculated using the average gradient of group 1, and the updated weight coefficient can be used in the next gradient calculation.

Although not illustrated, the processing shown in FIG. 8 is executed repeatedly for as long as the processing shown in FIG. 7 continues.
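The representative-node procedure of FIG. 8 can be sketched as one function. This is a minimal illustration rather than the implementation of the embodiment: communication is replaced by ordinary function calls, and `representative_iteration`, `average_gradients`, and `fake_server` (with its SGD-style update and step size 0.5) are hypothetical names and assumptions introduced here.

```python
from typing import Callable, List

Vector = List[float]


def average_gradients(gradients: List[Vector]) -> Vector:
    """Element-wise mean of the gradients of all worker nodes in a group (step S23)."""
    count = len(gradients)
    return [sum(values) / count for values in zip(*gradients)]


def representative_iteration(
    own_gradient: Vector,
    received_gradients: List[Vector],
    push_to_server: Callable[[Vector], Vector],
) -> Vector:
    """One pass over steps S21 to S27, with network transfers modeled as calls."""
    # S21 produced own_gradient; S22 produced received_gradients.
    group_average = average_gradients([own_gradient] + received_gradients)  # S23
    master_parameter = push_to_server(group_average)  # S24 (send) and S25 (receive)
    # S26 would forward master_parameter to the non-representative nodes here.
    return master_parameter  # S27: replaces the locally stored weight coefficient


def fake_server(gradient: Vector) -> Vector:
    """Hypothetical stand-in for the server node: SGD step from W0 = [1.0, 1.0]."""
    return [w - 0.5 * g for w, g in zip([1.0, 1.0], gradient)]


new_weights = representative_iteration(
    own_gradient=[1.0, 0.0],
    received_gradients=[[0.0, 1.0], [2.0, 2.0]],
    push_to_server=fake_server,
)
print(new_weights)  # [0.5, 0.5]
```

A non-representative node performs only the complementary part: it computes its gradient, sends it to the representative node, and later overwrites its stored weight coefficient with the master parameter it receives.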

Next, an example of the processing procedure of a non-representative node will be described with reference to the flowchart of FIG. 9. The procedure is described here for the non-representative node 30 of group 1.

The calculation unit 34 of the non-representative node 30 calculates a gradient, in synchronization with the gradient calculation in the representative node 30 described above, using the learning data stored in the learning data storage unit 32 and the weight coefficient (for example, the weight coefficient W0) stored in the weight coefficient storage unit 33 (step S31).

After step S31 is executed, the transmission control unit 35 transmits the gradient calculated in step S31 (the gradient of the non-representative node 30) to the representative node 30 (step S32).

Steps S31 and S32 are executed by the non-representative node 30 in steps S1 and S2 (or steps S9 and S10) shown in FIG. 7.

After step S32 is executed, the representative node 30 executes steps S22 to S26 shown in FIG. 8. As a result, the master parameter transmitted from the server node 20 (for example, the weight coefficient W1) is transmitted from the representative node 30 of group 1 to the non-representative node 30.

When the master parameter is transmitted from the representative node 30, the reception control unit 31 receives it (step S33).

The weight coefficient stored in the weight coefficient storage unit 33 (for example, the weight coefficient W0) is replaced with the master parameter received in step S33 (step S34). The weight coefficient of the non-representative node 30 of group 1 is thereby updated to the master parameter (that is, to the same weight coefficient).

Steps S33 and S34 are executed by the non-representative node 30 after step S4 (or step S12) shown in FIG. 7.

By executing the processing shown in FIG. 9, the weight coefficient of the non-representative node 30 of group 1 is updated to the weight coefficient calculated using the average gradient of group 1, and the updated weight coefficient can be used in the next gradient calculation.

Although not illustrated, the processing shown in FIG. 9 is executed repeatedly for as long as the processing shown in FIG. 7 continues.

As described above, in group 1 the gradients of all the worker nodes 30 belonging to group 1 are aggregated at the representative node 30, which then calculates the average gradient. In this case, using, for example, the collective communication operation called Reduce defined by MPI (Message Passing Interface), namely MPI_Reduce, makes it possible to efficiently perform both the transmission of gradients from the non-representative nodes 30 to the representative node 30 and the calculation of the average gradient (the sum of the gradients of all the worker nodes 30). Although the use of MPI_Reduce has been described here, other processing comparable to MPI_Reduce may be executed instead.
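The relationship between the gradient sum and the average gradient can be written explicitly. The notation below is introduced here for illustration and does not appear in the original description: N is the number of worker nodes 30 in group 1, g_i is the gradient calculated by the i-th worker node, and \bar{g} is the average gradient of group 1. A sum reduction (for example, MPI_Reduce with the predefined MPI_SUM operation) delivers the sum to the representative node, and it is assumed here that the representative node then divides by N.

```latex
\bar{g} \;=\; \frac{1}{N} \sum_{i=1}^{N} g_i
```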

Although the processing of group 1 (the representative node 30 and the non-representative nodes 30) has been described here, the same processing is executed in group 2 (the representative node 40 and the non-representative nodes 40).

As described above, in the present embodiment the system includes a plurality of worker nodes (a representative node and non-representative nodes) 30 belonging to group 1 and a plurality of worker nodes (a representative node and non-representative nodes) 40 belonging to group 2 (second group). When the worker nodes 30 execute, for example, the n-th parallel distributed processing based on the objective function, the representative node (first node) 30 of group 1 calculates a first gradient for updating a first weight coefficient of the objective function to a second weight coefficient, and the non-representative node (second node) 30 of group 1 calculates a second gradient for updating the first weight coefficient of the objective function to the second weight coefficient.

Meanwhile, when the worker nodes 40 execute, for example, the m-th parallel distributed processing, which runs asynchronously with the parallel distributed processing by the worker nodes 30, the representative node (third node) 40 of group 2 calculates a third gradient for updating a third weight coefficient of the objective function to a fourth weight coefficient, and the non-representative node 40 of group 2 calculates a fourth gradient for updating the third weight coefficient of the objective function to the fourth weight coefficient.

In the present embodiment, when the gradient calculation in group 1 (the representative node 30 and the non-representative nodes 30) finishes earlier than the gradient calculation in group 2 (the representative node 40 and the non-representative nodes 40), the (n+1)-th parallel distributed processing in group 1 further updates the second weight coefficient that was updated from the first weight coefficient based on the first and second gradients, and the (m+1)-th parallel distributed processing in group 2 further updates the fourth weight coefficient that was updated from the third weight coefficient based on the first to fourth gradients.

Conversely, when the gradient calculation in group 2 (the representative node 40 and the non-representative nodes 40) finishes earlier than the gradient calculation in group 1 (the representative node 30 and the non-representative nodes 30), the (n+1)-th parallel distributed processing in group 1 further updates the second weight coefficient that was updated from the first weight coefficient based on the first to fourth gradients, and the (m+1)-th parallel distributed processing in group 2 further updates the fourth weight coefficient that was updated from the third weight coefficient based on the third and fourth gradients.
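The two orderings can be checked with a short bookkeeping sketch that tracks only which gradients have been folded into the master parameter when each group receives it. The function `reflected_gradients` and the labels `g1` to `g4` (standing for the first to fourth gradients) are hypothetical names used for illustration.

```python
from typing import Dict, List, Set, Tuple


def reflected_gradients(arrival_order: List[Tuple[str, List[str]]]) -> Dict[str, Set[str]]:
    """For each group, return the gradients reflected in the weights it receives."""
    applied: Set[str] = set()            # gradients already applied at the server node
    received: Dict[str, Set[str]] = {}
    for group, labels in arrival_order:
        applied |= set(labels)           # server updates the master parameter
        received[group] = set(applied)   # updated master parameter goes back to the group
    return received


# Group 1 finishes its gradient calculation first.
group1_first = reflected_gradients([("group1", ["g1", "g2"]), ("group2", ["g3", "g4"])])
# Group 2 finishes its gradient calculation first.
group2_first = reflected_gradients([("group2", ["g3", "g4"]), ("group1", ["g1", "g2"])])

print(sorted(group1_first["group1"]), sorted(group1_first["group2"]))
print(sorted(group2_first["group1"]), sorted(group2_first["group2"]))
```

Whichever group reports later receives a weight coefficient that reflects all four gradients, while the earlier group receives one that reflects only its own two.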

上記したように本実施形態においては、複数のワーカノード３０及び４０を複数のグループ（グループ１及びグループ２）に分割し、第一階層として、当該グループ内でＳｙｎｃｈｒｏｎｏｕｓ−ＳＧＤによる並列分散学習処理を行う。この第一階層においては、グループ毎に同期を行うため、例えば複数のワーカノード３０及び４０の全てを同期させて処理を行う場合と比較して、同期コスト及びバッチサイズを抑制することが可能となる。 As described above, in the present embodiment, the plurality of worker nodes 30 and 40 are divided into a plurality of groups (group 1 and group 2), and parallel distributed learning processing by Synchronous-SGD is performed in the group as the first layer. .. In this first layer, since synchronization is performed for each group, it is possible to suppress the synchronization cost and batch size as compared with the case where all of the plurality of worker nodes 30 and 40 are synchronized for processing, for example. ..

また、第二階層としては、サーバノード２０を介して、第一階層における各グループの代表ノード同士でバッチサイズ非依存並列方式による並列分散学習処理を行う。この第二階層においては、各代表ノードは同期する必要がないため、高いスループットを得ることができる。 Further, as the second layer, parallel distributed learning processing is performed by the batch size-independent parallel method between the representative nodes of each group in the first layer via the server node 20. In this second layer, since each representative node does not need to be synchronized, high throughput can be obtained.

That is, in the present embodiment, a configuration that hierarchically combines, for example, Synchronous-SGD and a batch size-independent parallel method can realize high scalability in parallel distributed learning processing, which makes parallel distributed learning processing possible with a larger degree of parallelism.

ここで、図１０は、所定の汎化性能を得ることができるまでの時間（学習時間）を学習方式毎に示している。 Here, FIG. 10 shows the time (learning time) until a predetermined generalization performance can be obtained for each learning method.

FIG. 10 shows, as the learning methods, for example, a non-parallel distributed method (a learning method using a single node), Synchronous-SGD, a batch size-independent parallel method, and the method of the present embodiment (Synchronous-SGD + batch size-independent parallel method).

図１０に示すように、非並列分散方式による学習処理において所定の汎化性能を得ることができるまでの学習時間（以下、非並列分散方式の学習時間と表記）を１．０とした場合、Ｓｙｎｃｈｒｏｎｏｕｓ−ＳＧＤによる並列分散学習処理において所定の汎化性能を得ることができるまでの学習時間（以下、Ｓｙｎｃｈｒｏｎｏｕｓ−ＳＧＤの学習時間と表記）は、０．６である。 As shown in FIG. 10, when the learning time until a predetermined generalization performance can be obtained in the learning process by the non-parallel distributed method (hereinafter, referred to as the learning time of the non-parallel distributed method) is 1.0. The learning time (hereinafter referred to as the learning time of Synchronous-SGD) until a predetermined generalization performance can be obtained in the parallel distributed learning process by Synchronous-SGD is 0.6.

Similarly, when the learning time of the non-parallel distributed method is taken as 1.0, the learning time until a predetermined generalization performance can be obtained in the parallel distributed learning processing by the batch size-independent parallel method (hereinafter referred to as the learning time of the batch size-independent parallel method) is 0.5.

In contrast, the method according to the present embodiment (hierarchical parallel distributed method) can, in theory, obtain a scalability equal to the product of the upper limits of the scalability of Synchronous-SGD and of the batch size-independent parallel method.

Specifically, the learning time until a predetermined generalization performance can be obtained in the parallel distributed learning processing by the method according to the present embodiment (expressed as a ratio to the learning time of the non-parallel distributed method) is the value (that is, 0.3) obtained by multiplying the ratio of the learning time of Synchronous-SGD to the learning time of the non-parallel distributed method (here, 0.6) by the ratio of the learning time of the batch size-independent parallel method to the learning time of the non-parallel distributed method (here, 0.5).
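
The multiplication described above can be written out as a short worked equation. The symbols are introduced here only for illustration and do not appear in the original description: T_single is the learning time of the non-parallel distributed method, T_sync that of Synchronous-SGD, T_indep that of the batch size-independent parallel method, and T_hier that of the method of the present embodiment.

```latex
\frac{T_{\mathrm{hier}}}{T_{\mathrm{single}}}
  = \frac{T_{\mathrm{sync}}}{T_{\mathrm{single}}}
    \times \frac{T_{\mathrm{indep}}}{T_{\mathrm{single}}}
  = 0.6 \times 0.5
  = 0.3
```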

Accordingly, whereas Synchronous-SGD obtains the predetermined generalization performance in about 60% of the learning time of the non-parallel distributed method, and the batch size-independent parallel method does so in about 50% of that learning time, the method according to the present embodiment obtains the predetermined generalization performance in about 30% of the learning time of the non-parallel distributed method.

すなわち、本実施形態においては、単にＳｙｎｃｈｒｏｎｏｕｓ−ＳＧＤによる並列分散学習処理またはバッチサイズ非依存並列方式による並列分散学習処理を実行する場合に比べて、高いスケーラビリティを実現することが可能である。 That is, in the present embodiment, it is possible to realize high scalability as compared with the case where the parallel distributed learning process by the Synchronous-SGD or the parallel distributed learning process by the batch size independent parallel method is simply executed.

In the present embodiment, the second weighting coefficient described above is calculated based on a fifth gradient (for example, the average of the first gradient and the second gradient) calculated from the first gradient calculated by the representative node 30 of group 1 and the second gradient calculated by the non-representative node 30 of group 1. Similarly, the fourth weighting coefficient is calculated based on a sixth gradient (for example, the average of the third gradient and the fourth gradient) calculated from the third gradient calculated by the representative node 40 of group 2 and the fourth gradient calculated by the non-representative node 40 of group 2.
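
The derivation of the fifth gradient and the second weighting coefficient can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the gradient values, and the learning rate are hypothetical, and the update is assumed to be a plain SGD step because equation (1) of the description is not reproduced in this passage.

```python
from typing import List, Sequence


def average_gradients(gradients: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the gradients computed inside one group."""
    n = len(gradients)
    return [sum(component) / n for component in zip(*gradients)]


def update_weights(weights: Sequence[float],
                   gradient: Sequence[float],
                   learning_rate: float = 0.5) -> List[float]:
    """Assumed plain SGD step; stands in for equation (1) of the description."""
    return [w - learning_rate * g for w, g in zip(weights, gradient)]


# Made-up example values.
first_gradient = [0.25, -0.5]   # from the representative node 30 of group 1
second_gradient = [0.75, 0.0]   # from the non-representative node 30 of group 1

# Fifth gradient: average of the first and second gradients.
fifth_gradient = average_gradients([first_gradient, second_gradient])

# Second weighting coefficient: first weighting coefficient updated with the fifth gradient.
first_weights = [1.0, 1.0]
second_weights = update_weights(first_weights, fifth_gradient)
```

The sixth gradient and the fourth weighting coefficient of group 2 would follow from the same two functions applied to the third and fourth gradients.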

また、本実施形態においては、グループ１の代表ノード３０及びグループ２の代表ノード４０と通信可能に接続されるサーバノード２０において第２重み係数及び第４重み係数が算出される。 Further, in the present embodiment, the second weighting coefficient and the fourth weighting coefficient are calculated at the server node 20 communicably connected to the representative node 30 of the group 1 and the representative node 40 of the group 2.

When the second weighting coefficient is calculated in the server node 20, the server node 20 transmits the second weighting coefficient to the representative node 30 of group 1, and the representative node 30 transmits the second weighting coefficient to the non-representative node 30. In the present embodiment, this configuration allows the weighting coefficients of the representative node 30 and the non-representative node 30 of group 1 to be updated to the weighting coefficient calculated by the server node 20.

Likewise, when the fourth weighting coefficient is calculated in the server node 20, the server node 20 transmits the fourth weighting coefficient to the representative node 40 of group 2, and the representative node 40 transmits the fourth weighting coefficient to the non-representative node 40. In the present embodiment, this configuration allows the weighting coefficients of the representative node 40 and the non-representative node 40 of group 2 to be updated to the weighting coefficient calculated by the server node 20.

In the present embodiment, it has been described that, when the gradient calculation in group 1 is faster than the gradient calculation in group 2, the weighting coefficient is updated based on the first and second gradients in the (n+1)-th parallel distributed processing in group 1, and is further updated based on the first to fourth gradients in the (m+1)-th parallel distributed processing in group 2.

However, this "case where the gradient calculation in group 1 is faster than the gradient calculation in group 2" is assumed to include the case where the server node 20 receives the gradient calculation result of group 1 (the gradient transmitted from the representative node 30 of group 1) before receiving the gradient calculation result of group 2 (the gradient transmitted from the representative node 40 of group 2).

Accordingly, when, for example, the gradient calculation result of group 1 is received by the server node 20 before the gradient calculation result of group 2, the weighting coefficient is updated based on the gradient calculation result of group 1 (that is, the first and second gradients) in the (n+1)-th parallel distributed processing in group 1 (the temporally later parallel distributed processing), and is further updated based on the gradient calculation result of group 1 (the weighting coefficient updated based on it) and the gradient calculation result of group 2 (that is, the first to fourth gradients) in the (m+1)-th parallel distributed processing in group 2.

Further, in the present embodiment, it has been described that, when the gradient calculation in group 2 is faster than the gradient calculation in group 1, the weighting coefficient is updated based on the third and fourth gradients in the (m+1)-th parallel distributed processing in group 2, and is further updated based on the first to fourth gradients in the (n+1)-th parallel distributed processing in group 1.

However, this "case where the gradient calculation in group 2 is faster than the gradient calculation in group 1" is assumed to include the case where the server node 20 receives the gradient calculation result of group 2 (the gradient transmitted from the representative node 40 of group 2) before receiving the gradient calculation result of group 1 (the gradient transmitted from the representative node 30 of group 1).

Accordingly, when, for example, the gradient calculation result of group 2 is received by the server node 20 before the gradient calculation result of group 1, the weighting coefficient is updated based on the gradient calculation result of group 2 (that is, the third and fourth gradients) in the (m+1)-th parallel distributed processing in group 2 (the temporally later parallel distributed processing), and is further updated based on the gradient calculation result of group 2 (the weighting coefficient updated based on it) and the gradient calculation result of group 1 (that is, the first to fourth gradients) in the (n+1)-th parallel distributed processing in group 1.

That is, in the present embodiment, the weighting coefficient may be updated based not on the order of the gradient calculation processing in the groups (which group finishes its gradient calculation processing earlier) but on which group's gradient calculation result is received earlier by the server node 20 (is transmitted earlier to the server node 20).
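
The arrival-order behaviour described above can be illustrated with a small sketch. It assumes a hypothetical `ServerNode` class standing in for the server node 20, a first-in first-out inbox, and a plain SGD update with made-up gradient values; none of these names or values come from the original description.

```python
import queue
from typing import List, Sequence, Tuple


class ServerNode:
    """Hypothetical stand-in for server node 20: holds the shared weighting coefficients."""

    def __init__(self, weights: Sequence[float], learning_rate: float = 0.5) -> None:
        self.weights = list(weights)
        self.learning_rate = learning_rate
        self.inbox: "queue.Queue[Tuple[int, List[float]]]" = queue.Queue()  # FIFO keeps arrival order

    def receive(self, group_id: int, gradient: Sequence[float]) -> None:
        """Called when a group's representative node delivers its gradient."""
        self.inbox.put((group_id, list(gradient)))

    def process_all(self) -> List[Tuple[int, List[float]]]:
        """Apply gradients in arrival order; each reply goes back to the sending group."""
        replies = []
        while not self.inbox.empty():
            group_id, gradient = self.inbox.get()
            self.weights = [w - self.learning_rate * g
                            for w, g in zip(self.weights, gradient)]
            replies.append((group_id, list(self.weights)))
        return replies


server = ServerNode([1.0, 1.0])
server.receive(1, [0.5, -0.25])  # group 1's result arrives first
server.receive(2, [0.25, 0.5])   # group 2's result arrives second
replies = server.process_all()
```

In this sketch the weights returned to group 1 reflect only group 1's gradient, while the weights returned to group 2 reflect both groups' gradients, which mirrors the ordering by reception rather than by completion of the gradient calculation.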

As described above, in Synchronous-SGD the plurality of worker nodes 30 belonging to, for example, group 1 execute processing in synchronization with each other. When the differences in processing performance (and in the processing speed based on it) among these worker nodes 30 are large, the processing speed of group 1 is affected by the processing speed of the worker node 30 with low processing performance (that is, the processing speed of the low-performance worker node 30 becomes dominant). The same applies to group 2.

For this reason, the plurality of worker nodes belonging to the same group are configured to have comparable processing speeds. Specifically, the difference in processing speed among the plurality of worker nodes 30 belonging to group 1 (representative node 30 and non-representative nodes 30) is made equal to or less than a first threshold value, and the difference in processing speed among the plurality of worker nodes 40 belonging to group 2 (representative node 40 and non-representative nodes 40) is made equal to or less than a second threshold value. The first threshold value and the second threshold value may be the same value or different values.
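
The threshold condition on each group can be checked with a few lines of code. This is a sketch under stated assumptions: the function name, the speed values, their unit (for example, samples processed per second), and the threshold values are all hypothetical, and "difference in processing speed" is interpreted as the gap between the fastest and slowest node of the group.

```python
from typing import Sequence


def speeds_within_threshold(speeds: Sequence[float], threshold: float) -> bool:
    """True when the largest pairwise speed difference in the group is at most the threshold."""
    return max(speeds) - min(speeds) <= threshold


# Made-up processing speeds for the worker nodes of each group.
group1_speeds = [100.0, 98.0, 103.0]   # worker nodes 30
group2_speeds = [60.0, 62.0, 58.0]     # worker nodes 40

first_threshold = 5.0    # applies to group 1
second_threshold = 5.0   # applies to group 2; it may differ from the first threshold

group1_ok = speeds_within_threshold(group1_speeds, first_threshold)
group2_ok = speeds_within_threshold(group2_speeds, second_threshold)
```

Placing a node with speed 60.0 into group 1 would violate the first threshold in this example, which is the situation the grouping is meant to avoid.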

Further, as shown in FIG. 11, when, for example, the processing speed of the plurality of worker nodes 40 belonging to group 2 is slower than that of the plurality of worker nodes 30 belonging to group 1, the number of worker nodes 30 belonging to group 1 may be made smaller than the number of worker nodes 40 belonging to group 2. Here, the processing speed of the worker nodes 40 belonging to group 2 being slower than that of the worker nodes 30 may mean that the average processing speed of the worker nodes 40 is slower than the average processing speed of the worker nodes 30, or that the slowest processing speed among the worker nodes 40 is slower than the slowest processing speed among the worker nodes 30. The processing speed of each worker node may be calculated from, for example, the hardware performance of that worker node.

また、グループ１に属する複数のワーカノード３０の処理速度よりもグループ２に属する複数のワーカノード４０の処理速度が遅い場合には、並列分散処理におけるグループ１の処理量（つまり、グループ１に割り当てられる学習データ量）を、グループ２の処理量（つまり、グループ２に割り当てられる学習データ量）よりも少なくする。この場合には、グループ１に属するワーカノード３０の数及びグループ２に属するワーカノード４０の数は同数であってもよい。 Further, when the processing speed of the plurality of worker nodes 40 belonging to the group 2 is slower than the processing speed of the plurality of worker nodes 30 belonging to the group 1, the processing amount of the group 1 in the parallel distributed processing (that is, the learning assigned to the group 1). The amount of data) is made smaller than the amount of processing in group 2 (that is, the amount of learning data allocated to group 2). In this case, the number of worker nodes 30 belonging to group 1 and the number of worker nodes 40 belonging to group 2 may be the same.

That is, with the configuration described above, adjusting the number of worker nodes belonging to each group or the processing amount (load) of each group makes the processing time required in each group comparable (that is, it cancels the effect of the difference in processing speed).
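
The node-count adjustment can be illustrated with a simple model. The model is an assumption made only for this sketch and is not stated in the description: a group's data is split evenly across its nodes, and the group's time per iteration is set by its slowest node. The function names and all numeric values are hypothetical, and only the adjustment by number of nodes is covered.

```python
import math


def group_time(data_amount: float, nodes: int, slowest_speed: float) -> float:
    """Assumed model: data is split evenly and the slowest node sets the pace."""
    return data_amount / (nodes * slowest_speed)


def nodes_needed(reference_nodes: int, reference_speed: float, group_speed: float) -> int:
    """Smallest node count whose aggregate throughput matches the reference group."""
    return math.ceil(reference_nodes * reference_speed / group_speed)


# Made-up values: group 2's nodes run at half the speed of group 1's nodes.
group1_nodes, group1_speed = 2, 100.0
group2_speed = 50.0

group2_nodes = nodes_needed(group1_nodes, group1_speed, group2_speed)

time_group1 = group_time(1000.0, group1_nodes, group1_speed)
time_group2 = group_time(1000.0, group2_nodes, group2_speed)
```

Under this model the slower group 2 receives more nodes than group 1, and both groups then need the same time per iteration for the same amount of data.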

本実施形態においては、サーバノード２０、複数のワーカノード３０の各々及び複数のワーカノード４０の各々がそれぞれ１つの装置（マシン）によって実現される（つまり、各ノードと装置とが１対１の関係にある）ものとして説明したが、当該各ノードは、１つの装置内で実行される１プロセスまたはスレッドとして実現されていても構わない。すなわち、本実施形態に係るシステム（サーバノード２０、複数のワーカノード３０及び複数のワーカノード４０）は、１つの装置によって実現することも可能である。また、本実施形態に係るシステムは、ノードの数とは異なる数の複数の装置によって実現されることも可能である。 In the present embodiment, the server node 20, each of the plurality of worker nodes 30, and each of the plurality of worker nodes 40 are realized by one device (machine) (that is, each node and the device have a one-to-one relationship). Although described as being), each node may be realized as one process or thread executed in one device. That is, the system according to the present embodiment (server node 20, a plurality of worker nodes 30, and a plurality of worker nodes 40) can also be realized by one device. Further, the system according to the present embodiment can be realized by a plurality of devices having a number different from the number of nodes.

That is, in the present embodiment, one node may be one computer (server), a plurality of nodes may be implemented on one computer, or one node may be implemented by a plurality of computers. In the present embodiment, as described above, it suffices that one system has two or more groups and that one group has two or more nodes.

Further, in the present embodiment, Asynchronous-SGD, which is an asynchronous parallel distributed learning method, has been described as the batch size-independent parallel method; however, another method such as Elastic Averaging SGD may be adopted.

Further, in the present embodiment, it has been described that parallel distributed learning processing of different methods (algorithms) is executed in the first layer and the second layer; however, depending on the algorithms to be combined, parallel distributed learning processing may be executed in, for example, three or more layers.

(Second Embodiment)
Next, the second embodiment will be described. As in the first embodiment described above, the system according to the present embodiment has a configuration capable of executing hierarchically different learning processes.

That is, in the first layer, parallel distributed processing of the mini-batch method by Synchronous-SGD is performed within each group to which a plurality of worker nodes belong, and in the second layer, batch size-independent parallel distributed processing (an asynchronous parallel distributed method) is performed between the representative nodes of the groups.

図１２は、本実施形態に係るシステム（以下、本システムと表記）の構成の一例を示す。図１２に示すように、本システム１０は、複数のワーカノード３０及び複数のワーカノード４０を備える。 FIG. 12 shows an example of the configuration of the system according to the present embodiment (hereinafter referred to as the system). As shown in FIG. 12, the system 10 includes a plurality of worker nodes 30 and a plurality of worker nodes 40.

前述した第１の実施形態においてはサーバノード２０を備える構成であるが、本実施形態は、当該サーバノード２０を備えない点で、前述した第１の実施形態とは異なる。なお、複数のワーカノード３０がグループ１に属しており、複数のワーカノード４０がグループ２に属している点については、前述した第１の実施形態と同様である。 In the first embodiment described above, the server node 20 is provided, but this embodiment is different from the first embodiment described above in that the server node 20 is not provided. The point that the plurality of worker nodes 30 belong to the group 1 and the plurality of worker nodes 40 belong to the group 2 is the same as that of the first embodiment described above.

One worker node 30 among the plurality of worker nodes 30 belonging to group 1 (hereinafter referred to as the representative node of group 1) is communicably connected to one worker node 40 among the plurality of worker nodes 40 belonging to the other group 2 (hereinafter referred to as the representative node of group 2).

なお、複数のワーカノード３０のうち、グループ１の代表ノード３０以外のワーカノード３０は、グループ１の非代表ノード３０と称する。同様に、複数のワーカノード４０のうち、グループ２の代表ノード４０以外のワーカノード４０は、グループ２の非代表ノード４０と称する。 Of the plurality of worker nodes 30, the worker nodes 30 other than the representative node 30 of the group 1 are referred to as the non-representative nodes 30 of the group 1. Similarly, among the plurality of worker nodes 40, the worker nodes 40 other than the representative node 40 of the group 2 are referred to as the non-representative nodes 40 of the group 2.

In the present embodiment, in the first layer, parallel distributed learning processing by Synchronous-SGD is executed within each of group 1 (the plurality of worker nodes 30) and group 2 (the plurality of worker nodes 40). In the second layer, parallel distributed learning processing by the batch size-independent parallel distributed method is executed between the representative node 30 of group 1 and the representative node 40 of group 2.

なお、図１２においてはグループ１及びグループ２にそれぞれ３つのワーカノードが属している例が示されているが、グループ１及びグループ２には２つ以上のワーカノードが属していればよい。また、図１２においては２つのグループのみが示されているが、本システムにおいては、３つ以上のグループを備えていてもよい。 Although FIG. 12 shows an example in which three worker nodes belong to each of group 1 and group 2, it is sufficient that two or more worker nodes belong to group 1 and group 2. Further, although only two groups are shown in FIG. 12, the system may include three or more groups.

複数のワーカノード３０及び４０のシステム構成については、前述した第１の実施形態と同様であるため、ここでは詳しい説明を省略する。 Since the system configurations of the plurality of worker nodes 30 and 40 are the same as those in the first embodiment described above, detailed description thereof will be omitted here.

Hereinafter, an example of the functional configuration of the representative node 30 of group 1 among the plurality of worker nodes 30 and 40 will be described. The functional configuration of the representative node 30 of group 1 in the present embodiment will be described with reference to FIG. 6 for convenience, focusing mainly on the parts that differ from the representative node 30 of group 1 in the first embodiment described above.

図６に示すように、代表ノード３０は、受信制御部３１、学習データ格納部３２、重み係数格納部３３、算出部３４及び送信制御部３５を含む。 As shown in FIG. 6, the representative node 30 includes a reception control unit 31, a learning data storage unit 32, a weighting coefficient storage unit 33, a calculation unit 34, and a transmission control unit 35.

受信制御部３１は、グループ１の非代表ノード３０において算出された勾配を、当該非代表ノード３０から受信する。 The reception control unit 31 receives the gradient calculated in the non-representative node 30 of the group 1 from the non-representative node 30.

学習データ格納部３２には、グループ１の代表ノード３０に割り当てられた学習データが格納される。重み係数格納部３３には、目的関数の重み係数が格納されている。 The learning data storage unit 32 stores the learning data assigned to the representative node 30 of the group 1. The weighting coefficient of the objective function is stored in the weighting coefficient storage unit 33.

算出部３４は、学習データ格納部３２に格納されている学習データ及び重み係数格納部３３に格納されている重み係数を用いて、目的関数の重み係数を更新するための勾配を算出する。 The calculation unit 34 calculates a gradient for updating the weighting coefficient of the objective function by using the learning data stored in the learning data storage unit 32 and the weighting coefficient stored in the weighting coefficient storage unit 33.

The calculation unit 34 updates the weighting coefficient stored in the weighting coefficient storage unit 33 by using the gradient received by the reception control unit 31 (that is, the gradient calculated in the non-representative node 30), the gradient calculated by the calculation unit 34, and that weighting coefficient. In this case, the calculation unit 34 calculates the updated weighting coefficient based on equation (1) described above. The weighting coefficient stored in the weighting coefficient storage unit 33 is replaced with the weighting coefficient calculated by the calculation unit 34. As a result, the weighting coefficient of the representative node 30 of group 1 is updated.

送信制御部３５は、算出部３４によって算出された勾配を、グループ１の非代表ノード３０に送信する。 The transmission control unit 35 transmits the gradient calculated by the calculation unit 34 to the non-representative node 30 of the group 1.

Further, the transmission control unit 35 transmits the gradient calculated in the non-representative node 30 of group 1 and the gradient calculated by the calculation unit 34 (that is, the gradient calculated in the representative node 30) to the representative node of another group (for example, group 2).

ここで、グループ１の非代表ノード３０において算出された勾配及びグループ１の代表ノード３０において算出された勾配は上記したようにグループ２の代表ノード４０に送信されるが、同様に、グループ２の非代表ノード４０において算出された勾配及びグループ２の代表ノード４０において算出された勾配はグループ１の代表ノード３０に送信される。 Here, the gradient calculated at the non-representative node 30 of the group 1 and the gradient calculated at the representative node 30 of the group 1 are transmitted to the representative node 40 of the group 2 as described above. The gradient calculated at the non-representative node 40 and the gradient calculated at the representative node 40 of the group 2 are transmitted to the representative node 30 of the group 1.

グループ２の非代表ノード４０において算出された勾配及び代表ノード４０において算出された勾配がグループ１の代表ノード３０（受信制御部３１）において受信された場合、算出部３４は当該勾配を用いて重み係数格納部３３に格納されている重み係数を更新し、送信制御部３５は当該勾配をグループ１の非代表ノード３０に送信する。 When the gradient calculated by the non-representative node 40 of the group 2 and the gradient calculated by the representative node 40 are received by the representative node 30 (reception control unit 31) of the group 1, the calculation unit 34 weights using the gradient. The weighting coefficient stored in the coefficient storage unit 33 is updated, and the transmission control unit 35 transmits the gradient to the non-representative node 30 of the group 1.

Next, an example of the functional configuration of the non-representative node 30 of group 1 will be described. The functional configuration of the non-representative node 30 of group 1 in the present embodiment will be described with reference to FIG. 6 for convenience, focusing mainly on the parts that differ from the non-representative node 30 of group 1 in the first embodiment described above.

受信制御部３１は、グループ１の代表ノード３０において算出された勾配及び他の非代表ノード３０において算出された勾配を、当該代表ノード３０及び当該非代表ノード３０の各々から受信する。 The reception control unit 31 receives the gradient calculated in the representative node 30 of the group 1 and the gradient calculated in the other non-representative node 30 from each of the representative node 30 and the non-representative node 30.

学習データ格納部３２には、当該非代表ノード３０に割り当てられた学習データが格納される。重み係数格納部３３には、目的関数の重み係数が格納されている。 The learning data storage unit 32 stores the learning data assigned to the non-representative node 30. The weighting coefficient of the objective function is stored in the weighting coefficient storage unit 33.

The calculation unit 34 updates the weighting coefficient stored in the weighting coefficient storage unit 33 by using the gradient received by the reception control unit 31 (that is, the gradient calculated in the representative node 30 and the gradients calculated in the other non-representative nodes 30), the gradient calculated by the calculation unit 34, and that weighting coefficient. In this case, the calculation unit 34 calculates the updated weighting coefficient based on equation (1) described above. The weighting coefficient stored in the weighting coefficient storage unit 33 is replaced with the weighting coefficient calculated by the calculation unit 34. As a result, the weighting coefficient of the non-representative node 30 of group 1 is updated.

なお、上記したようにグループ１の代表ノード３０に含まれる送信制御部３５によってグループ２の非代表ノード４０において算出された勾配及びグループ２の代表ノード４０において算出された勾配が送信された場合、当該勾配は、受信制御部３１によって受信され、重み係数の更新に用いられる。 When the gradient calculated by the non-representative node 40 of the group 2 and the gradient calculated by the representative node 40 of the group 2 are transmitted by the transmission control unit 35 included in the representative node 30 of the group 1 as described above. The gradient is received by the reception control unit 31 and used to update the weighting coefficient.

An example of the processing procedure of the present system will be described below with reference to the sequence chart of FIG. 13. Here, the processing between group 1 (the plurality of worker nodes 30) and group 2 (the plurality of worker nodes 40) will be mainly described, and the processing of each worker node within each group (group 1 and group 2) will be described later.

ここでは、グループ１に属する複数のワーカノード３０の各々に含まれる重み係数格納部３３には例えば重み係数Ｗ１０が格納されており、グループ２に属する複数のワーカノード４０の各々に含まれる重み係数格納部３３には例えば重み係数Ｗ２０が格納されているものとする。なお、重み係数Ｗ１０及び重み係数Ｗ２０は、同一の値であってもよいし、異なる値であってもよい。 Here, for example, the weighting coefficient W10 is stored in the weighting coefficient storage unit 33 included in each of the plurality of worker nodes 30 belonging to the group 1, and the weighting coefficient storage unit included in each of the plurality of worker nodes 40 belonging to the group 2 is stored. It is assumed that the weighting coefficient W20 is stored in 33, for example. The weighting coefficient W10 and the weighting coefficient W20 may have the same value or different values.

まず、グループ１（に属する複数のワーカノード３０）においては、Ｓｙｎｃｈｒｏｎｏｕｓ−ＳＧＤによる勾配算出処理が行われる（ステップＳ４１）。この勾配算出処理によれば、グループ１に属する複数のワーカノード３０の各々は、当該ワーカノード３０に含まれる学習データ格納部３２に格納されている学習データ及び重み係数格納部３３に格納されている重み係数Ｗ１０を用いて、目的関数の重み係数を更新するための勾配を算出する。なお、グループ１に属する複数のワーカノード３０の各々は、互いに同期して勾配算出処理を実行する。 First, in group 1 (a plurality of worker nodes 30 belonging to the group 1), a gradient calculation process by Synchronous-SGD is performed (step S41). According to this gradient calculation process, each of the plurality of worker nodes 30 belonging to the group 1 has the learning data stored in the learning data storage unit 32 included in the worker node 30 and the weights stored in the weight coefficient storage unit 33. Using the coefficient W10, the gradient for updating the weighting coefficient of the objective function is calculated. Each of the plurality of worker nodes 30 belonging to the group 1 executes the gradient calculation process in synchronization with each other.

Each of the plurality of worker nodes 30 calculates a new weighting coefficient (hereinafter referred to as weighting coefficient W11) using the gradient calculated in step S41 and the weighting coefficient W10 stored in the weighting coefficient storage unit 33 included in that worker node 30. As a result, the weighting coefficient W10 stored in the weighting coefficient storage unit 33 of each of the plurality of worker nodes 30 is updated to the calculated weighting coefficient W11 (step S42).
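The update of step S42 can be illustrated with a minimal sketch. The description does not state the update formula, so plain SGD with a learning rate `lr` is assumed here; the function name `sgd_update` and all numeric values are hypothetical.

```python
def sgd_update(weights, gradient, lr=0.5):
    """Return new weights after one plain SGD step.

    Assumption: the description only says a new weighting coefficient is
    calculated from the gradient and the stored coefficient; the rule
    w_new = w_old - lr * gradient is an illustrative choice.
    """
    return [w - lr * g for w, g in zip(weights, gradient)]


w10 = [1.0, -2.0]    # weighting coefficient W10 (illustrative values)
grad = [0.5, -1.0]   # gradient obtained in step S41 (illustrative values)
w11 = sgd_update(w10, grad)   # weighting coefficient W11 stored in step S42
```

Because every worker node 30 in group 1 starts from the same W10 and applies the same gradient, all of them end up holding the same W11.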

When the processing of step S42 has been executed, the gradient calculated in step S41 is transmitted from the representative node 30 of group 1 to the representative node 40 of group 2 (step S43).

The representative node 40 of group 2 receives the gradient transmitted in step S43. The received gradient is shared among the plurality of worker nodes 40 belonging to group 2. Each of the plurality of worker nodes 40 then calculates a new weighting coefficient (hereinafter referred to as weighting coefficient W21) using the gradient received by the representative node 40 of group 2 and the weighting coefficient W20 stored in the weighting coefficient storage unit 33 included in that worker node 40. As a result, the weighting coefficient W20 stored in the weighting coefficient storage unit 33 of each of the plurality of worker nodes 40 is updated to the calculated weighting coefficient W21 (step S44).

In group 2 (the plurality of worker nodes 40 belonging to it), as in group 1, gradient calculation processing by Synchronous-SGD is performed (step S45). In this gradient calculation processing, each of the plurality of worker nodes 40 belonging to group 2 calculates a gradient for updating the weighting coefficient of the objective function, using the learning data stored in the learning data storage unit 32 included in that worker node 40 and the weighting coefficient W21 stored in its weighting coefficient storage unit 33. The plurality of worker nodes 40 belonging to group 2 execute the gradient calculation processing in synchronization with one another.

Each of the plurality of worker nodes 40 calculates a new weighting coefficient (hereinafter referred to as weighting coefficient W22) using the gradient calculated in step S45 and the weighting coefficient W21 stored in the weighting coefficient storage unit 33 included in that worker node 40. As a result, the weighting coefficient W21 stored in the weighting coefficient storage unit 33 of each of the plurality of worker nodes 40 is updated to the calculated weighting coefficient W22 (step S46).

When the processing of step S46 has been executed, the gradient calculated in step S45 is transmitted from the representative node 40 of group 2 to the representative node 30 of group 1 (step S47).

The representative node 30 of group 1 receives the gradient transmitted in step S47. The received gradient is shared among the plurality of worker nodes 30 belonging to group 1. Each of the plurality of worker nodes 30 then calculates a new weighting coefficient (hereinafter referred to as weighting coefficient W12) using the gradient received by the representative node 30 and the weighting coefficient W11 stored in the weighting coefficient storage unit 33 included in that worker node 30. As a result, the weighting coefficient W11 stored in the weighting coefficient storage unit 33 of each of the plurality of worker nodes 30 is updated to the calculated weighting coefficient W12 (step S48).

Although steps S41 to S48 have been described with reference to FIG. 13, the processing of FIG. 13 continues until the gradient calculation processing (that is, the parallel distributed learning processing) has been executed for all of the learning data stored in the learning data storage unit 32 included in each of the plurality of worker nodes 30 and 40.

As described above, according to the present embodiment, processing is executed synchronously within group 1 and within group 2, whereas the processing of group 1 and the processing of group 2 are executed asynchronously with respect to each other.

That is, in step S43 shown in FIG. 13, the gradient calculated in step S41 is transmitted from the representative node 30 of group 1 to the representative node 40 of group 2, but this transmission need only take place after, for example, the processing of steps S41 and S42, and its timing is not affected by the processing of group 2 (the plurality of worker nodes 40 belonging to it). Similarly, the transmission of the gradient in step S47 shown in FIG. 13 need only take place after, for example, the processing of steps S45 and S46, and its timing is not affected by the processing of group 1 (the plurality of worker nodes 30 belonging to it).
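The order of steps S41 to S48 in FIG. 13 can be traced with a small sketch. It is a sequential, single-process simulation of one possible interleaving, not an asynchronous implementation. The toy loss 0.5 * (w - x)^2, the plain SGD update, the scalar weights, and all names and values are assumptions made for illustration.

```python
def sgd_update(w, g, lr=0.5):
    # Assumed update rule: plain SGD on a scalar weight.
    return w - lr * g


def group_average_gradient(w, samples):
    """Synchronous-SGD inside one group (toy version).

    Each worker holds one sample x and computes the gradient (w - x) of the
    toy loss 0.5 * (w - x)**2; the group then averages these gradients.
    """
    grads = [w - x for x in samples]
    return sum(grads) / len(grads)


w1, w2 = 0.0, 0.0            # W10 (group 1) and W20 (group 2)
group1_data = [1.0, 3.0]     # one sample per worker node 30
group2_data = [5.0, 7.0]     # one sample per worker node 40

g1 = group_average_gradient(w1, group1_data)   # S41: gradients in group 1
w1 = sgd_update(w1, g1)                        # S42: W10 -> W11
w2 = sgd_update(w2, g1)                        # S43/S44: group 2 applies g1, W20 -> W21
g2 = group_average_gradient(w2, group2_data)   # S45: gradients in group 2, using W21
w2 = sgd_update(w2, g2)                        # S46: W21 -> W22
w1 = sgd_update(w1, g2)                        # S47/S48: group 1 applies g2, W11 -> W12
```

In this trace both groups end with the same weight because each applied both gradients once; with a different interleaving the intermediate weights would differ, which is the asynchrony between groups that the description allows.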

The processing of the representative node and the non-representative nodes of each group when the processing shown in FIG. 13 is executed will now be described.

First, an example of the processing procedure of a representative node will be described with reference to the flowchart of FIG. 14. Here, the processing procedure of the representative node 30 of group 1 is described.

The calculation unit 34 included in the representative node 30 of group 1 calculates a gradient using the learning data stored in the learning data storage unit 32 and the weighting coefficient (for example, weighting coefficient W10) stored in the weighting coefficient storage unit 33 (step S51). Hereinafter, the gradient calculated in the representative node 30 is referred to as the gradient of the representative node 30.

When the representative node 30 of group 1 executes the processing of step S51, each non-representative node 30 of group 1 calculates a gradient in synchronization with the representative node 30, as described later. Hereinafter, a gradient calculated in a non-representative node 30 in this way is referred to as the gradient of the non-representative node 30.

In this case, the gradient of the representative node 30 of group 1 and the gradients of the non-representative nodes 30 are distributed within group 1 (step S52). That is, the gradient of the representative node 30 of group 1 is transmitted from the representative node 30 to each of the non-representative nodes 30 of group 1, and the gradient of each non-representative node 30 of group 1 is received by the representative node 30 of group 1 (by the reception control unit 31 included in it).

Next, the calculation unit 34 calculates the average of the gradient calculated in step S51 (the gradient of the representative node 30) and the gradients received by the reception control unit 31 (the gradients of the non-representative nodes 30) (step S53). Hereinafter, the average of the gradients calculated in step S53 is referred to as the average gradient of group 1.

When the processing of step S53 has been executed, the calculation unit 34 calculates a new weighting coefficient using the average gradient of group 1 and updates the weighting coefficient stored in the weighting coefficient storage unit 33 to the calculated weighting coefficient (for example, weighting coefficient W11) (step S54). As a result, the weighting coefficient of the representative node 30 of group 1 is updated to a weighting coefficient based on the gradients calculated by the worker nodes 30 in group 1.

When the processing of step S54 has been executed, the transmission control unit 35 transmits the average gradient of group 1 to the representative node 40 of group 2 (step S55).

The processing of steps S51 to S55 described above is executed by the representative node 30 of group 1 in steps S41 to S43 shown in FIG. 13.

As described later, the processing shown in FIG. 14 is executed in the same way in the representative node 40 of group 2. Therefore, for example, when processing corresponding to step S55 has been executed in the representative node 40 of group 2, the reception control unit 31 included in the representative node 30 of group 1 can receive the average gradient of group 2.

It is then determined whether the reception control unit 31 has received the average gradient of group 2 (step S56).

If it is determined that the average gradient of group 2 has been received (YES in step S56), the transmission control unit 35 transmits a reception flag "True" to the non-representative nodes 30 of group 1 (step S57).

The transmission control unit 35 also transmits the average gradient of group 2 received by the reception control unit 31 to the non-representative nodes 30 of group 1 (step S58).

When the processing of step S58 has been executed, the calculation unit 34 calculates a new weighting coefficient using the average gradient of group 2 and updates the weighting coefficient stored in the weighting coefficient storage unit 33 to the calculated weighting coefficient (for example, weighting coefficient W12) (step S59). As a result, the weighting coefficient of the representative node 30 of group 1 is updated to a weighting coefficient based on the gradients calculated by the worker nodes 40 in group 2.

The processing of steps S56 to S59 described above is executed by the representative node 30 of group 1 in step S48 shown in FIG. 13.

If it is determined in step S56 that the average gradient of group 2 has not been received (NO in step S56), the transmission control unit 35 transmits a reception flag "False" to the non-representative nodes 30 (step S60).

By executing the processing shown in FIG. 14, the weighting coefficient of the representative node 30 of group 1 is updated using the gradients calculated by the plurality of worker nodes 30 belonging to group 1 (the average gradient of group 1) and is further updated using the gradients calculated by the plurality of worker nodes 40 belonging to group 2 (the average gradient of group 2).

Although not shown in the figure, the processing shown in FIG. 14 is executed repeatedly for as long as the processing shown in FIG. 13 continues.
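The flow of FIG. 14 (steps S52 to S60) can be sketched as one iteration of a representative node. This is a minimal single-process model: the class `RepresentativeNode`, its `inbox` queue standing in for the reception control unit, the scalar weight, and the plain SGD update are all assumptions, and the local gradient of step S51 is passed in rather than computed.

```python
from collections import deque


class RepresentativeNode:
    """Toy model of a representative node (hypothetical names throughout)."""

    def __init__(self, weight, lr=0.5):
        self.weight = weight
        self.lr = lr
        self.inbox = deque()     # average gradients received from the other group
        self.sent_flags = []     # reception flags sent to non-representative nodes

    def step(self, own_gradient, peer_gradients, other_rep):
        # S52-S53: collect the group's gradients and average them.
        grads = [own_gradient] + list(peer_gradients)
        avg = sum(grads) / len(grads)
        # S54: update the weight with the group's own average gradient.
        self.weight -= self.lr * avg
        # S55: send the average gradient to the other group's representative.
        other_rep.inbox.append(avg)
        # S56: has an average gradient from the other group arrived?
        if self.inbox:
            remote_avg = self.inbox.popleft()
            self.sent_flags.append(True)          # S57 (S58 would forward remote_avg)
            self.weight -= self.lr * remote_avg   # S59
        else:
            self.sent_flags.append(False)         # S60
        return avg


rep1 = RepresentativeNode(0.0)   # representative node 30 of group 1
rep2 = RepresentativeNode(0.0)   # representative node 40 of group 2
rep1.step(-1.0, [-3.0], rep2)    # group 1 finishes first; nothing received yet
rep2.step(-4.0, [-6.0], rep1)    # group 2 has already received group 1's average
```

In this run the node of group 1 sends the flag "False" because nothing has arrived yet, while the node of group 2 sends "True" and applies both average gradients. Group 2's average gradient stays queued at group 1 until its next iteration.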

Next, an example of the processing procedure of a non-representative node will be described with reference to the flowchart of FIG. 15. Here, the processing procedure of a non-representative node 30 of group 1 is described.

The calculation unit 34 included in the non-representative node 30 calculates a gradient, in synchronization with the gradient calculation in the representative node 30 described above, using the learning data stored in the learning data storage unit 32 and the weighting coefficient (for example, weighting coefficient W10) stored in the weighting coefficient storage unit 33 (step S71).

In this case, the gradient of the representative node 30 and the gradient calculated in step S71 (the gradient of the non-representative node 30) are distributed within group 1 (step S72). That is, the gradient of the non-representative node 30 of group 1 is transmitted from that non-representative node 30 to the representative node 30 of group 1 (and to the other non-representative nodes 30), and the gradients of the representative node 30 (and of the other non-representative nodes 30) are received by the non-representative node 30 (by the reception control unit 31 included in it).

Next, the calculation unit 34 calculates the average of the gradient calculated in step S71 (the gradient of the non-representative node 30) and the gradients received by the reception control unit 31 (the gradients of the representative node 30 and the other non-representative nodes 30) (step S73). The average of the gradients calculated in step S73 corresponds to the average gradient of group 1 calculated in step S53 shown in FIG. 14.

When the processing of step S73 has been executed, the calculation unit 34 calculates a new weighting coefficient using the average gradient of group 1 and updates the weighting coefficient stored in the weighting coefficient storage unit 33 to the calculated weighting coefficient (for example, weighting coefficient W11) (step S74). As a result, the weighting coefficient of the non-representative node 30 of group 1 is updated to a weighting coefficient based on the gradients calculated by the worker nodes 30 in group 1.

The processing of steps S71 to S74 described above is the processing executed by the non-representative node 30 of group 1 in steps S41 and S42 shown in FIG. 13.

The reception flag transmitted from the representative node 30 in step S57 or step S60 shown in FIG. 14 is received by the reception control unit 31 included in the non-representative node 30.

In this case, it is determined whether the reception flag "True" has been received by the reception control unit 31 (step S76).

If it is determined that the reception flag "True" has been received (YES in step S76), the reception control unit 31 receives the average gradient of group 2 transmitted from the representative node 30 of group 1 in step S58 shown in FIG. 14 (step S77).

When the processing of step S77 has been executed, the calculation unit 34 calculates a new weighting coefficient using the average gradient of group 2 and updates the weighting coefficient stored in the weighting coefficient storage unit 33 to the calculated weighting coefficient (for example, weighting coefficient W12) (step S78). As a result, the weighting coefficient of the non-representative node 30 of group 1 is updated to a weighting coefficient based on the gradients calculated by the worker nodes 40 in group 2.

The processing of steps S75 to S78 described above is executed by the non-representative node 30 of group 1 in step S48 shown in FIG. 13.

If it is determined in step S76 that the reception flag "True" has not been received (that is, the reception flag "False" has been received) (NO in step S76), the average gradient of group 2 is not received, so the processing of steps S77 and S78 is not executed.

By executing the processing of FIG. 15, the weighting coefficient of the non-representative node 30 of group 1 is updated using the gradients calculated by the plurality of worker nodes 30 belonging to group 1 (the average gradient of group 1) and is further updated using the gradients calculated by the plurality of worker nodes 40 belonging to group 2 (the average gradient of group 2).

Although not shown in the figure, the processing shown in FIG. 15 is executed repeatedly for as long as the processing shown in FIG. 13 continues.
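The flow of FIG. 15 for a non-representative node can be sketched in the same style. The function name `non_representative_step`, the scalar weight, and the plain SGD update are assumptions. The reception flag and the other group's average gradient are passed in as arguments instead of being received over a network.

```python
def non_representative_step(weight, own_gradient, peer_gradients,
                            reception_flag, remote_avg=None, lr=0.5):
    """One iteration of a non-representative node (toy, hypothetical API).

    own_gradient    -- gradient computed locally in step S71
    peer_gradients  -- gradients received from the other nodes of the group
    reception_flag  -- flag sent by the representative node (S57 or S60)
    remote_avg      -- other group's average gradient, used only if the flag is True
    """
    # S72-S73: average the gradients of all worker nodes in the group.
    grads = [own_gradient] + list(peer_gradients)
    avg = sum(grads) / len(grads)
    # S74: update with the group's own average gradient.
    weight -= lr * avg
    # S76: branch on the reception flag.
    if reception_flag:
        # S77-S78: apply the other group's average gradient as well.
        weight -= lr * remote_avg
    return weight


w_with_remote = non_representative_step(0.0, -3.0, [-1.0], True, remote_avg=-5.0)
w_local_only = non_representative_step(0.0, -3.0, [-1.0], False)
```

When the flag is "False" the node performs only the in-group update, which matches the statement that steps S77 and S78 are skipped in that case.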

As described above, in group 1, the gradients are shared among all of the worker nodes 30 belonging to group 1, and each of those worker nodes 30 calculates the average gradient of group 1. In this case, by using the collective communication algorithm called Allreduce defined in MPI (MPI_Allreduce), the transmission of the gradients among the worker nodes 30 and the calculation of the average gradient (from the sum of the gradients of all the worker nodes 30) can be executed efficiently. Although the case of using MPI_Allreduce has been described here, other processing comparable to MPI_Allreduce may be executed instead.
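What the Allreduce step delivers to each worker node can be shown with an in-process stand-in. This sketch does not call MPI; the functions `allreduce_sum` and `average_gradients` are hypothetical and only imitate the result of a sum reduction that is made available to every participant, followed by division by the number of participants.

```python
def allreduce_sum(local_vectors):
    """Imitate an Allreduce with a sum operation.

    Every participant ends up with the element-wise sum of the vectors
    contributed by all participants.
    """
    total = [sum(column) for column in zip(*local_vectors)]
    return [list(total) for _ in local_vectors]


def average_gradients(local_gradients):
    """Average gradient held by each worker after the sum reduction."""
    n = len(local_gradients)
    return [[s / n for s in summed] for summed in allreduce_sum(local_gradients)]


grads = [[1.0, 2.0], [3.0, 6.0]]      # gradients of two worker nodes (illustrative)
averaged = average_gradients(grads)   # one identical average vector per worker
```

In an actual MPI program the same result would typically be obtained by calling MPI_Allreduce with the MPI_SUM operation and dividing by the number of ranks in the group's communicator.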

Although the processing of the representative node 30 and the non-representative nodes 30 of group 1 has been described here, the representative node 40 and the non-representative nodes 40 of group 2 execute the same processing as the representative node 30 and the non-representative nodes 30 of group 1.

In the present embodiment, when the calculation of the gradients by group 1 (the representative node 30 and the non-representative nodes 30) is earlier than the calculation of the gradients by group 2 (the representative node 40 and the non-representative nodes 40), the gradient calculated in group 1 is transmitted to the representative node 40 of group 2. In this case, the representative node 40 (and the non-representative nodes 40) of group 2 calculates (updates) the weighting coefficient based on the average gradient of group 1 in the parallel distributed learning processing.

Likewise, when the calculation of the gradients by group 2 is earlier than the calculation of the gradients by group 1, the gradient calculated in group 2 is transmitted to the representative node 30 of group 1. In this case, the representative node 30 (and the non-representative nodes 30) of group 1 calculates (updates) the weighting coefficient based on the average gradient of group 2 in the parallel distributed learning processing.

As described above, in the present embodiment, as a first layer, the plurality of worker nodes 30 and 40 are divided into a plurality of groups (group 1 and group 2), and parallel distributed learning processing by collective-communication Synchronous-SGD is performed within each group. In this first layer, gradients are shared among the worker nodes in a group, and each of those worker nodes calculates the average gradient and updates its weighting coefficient. This first layer makes it possible to keep the synchronization cost and the batch size down.

As a second layer, parallel distributed learning processing by a batch-size-independent parallel method is performed between the representative nodes of the groups of the first layer. In this second layer, the representative nodes do not need to synchronize with one another, so high throughput can be obtained.

That is, in the present embodiment, as in the first embodiment described above, a configuration that hierarchically combines, for example, Synchronous-SGD and a batch-size-independent parallel method achieves high scalability in parallel distributed learning processing and enables parallel distributed learning processing with a larger degree of parallelism.

Note that the case described in the present embodiment in which "the calculation of the gradients by group 1 is earlier than the calculation of the gradients by group 2" includes the case in which the representative node 40 of group 2 receives the gradient calculated in group 1 (the gradient calculation result of group 1) before the gradients are calculated in group 2.

Accordingly, for example, when the representative node 40 of group 2 receives the gradient calculated in group 1 before the gradients are calculated in group 2, group 1 updates its weighting coefficient based on the gradients calculated in group 1 (that is, the first and second gradients). Group 2, on the other hand, updates its weighting coefficient based on the gradient calculated in group 1 and then further updates it based on the gradients calculated in group 2 (that is, the third and fourth gradients). In other words, when the representative node 40 of group 2 receives the gradient calculated in group 1 before the gradients are calculated in group 2, the weighting coefficient is updated based on the first and second gradients in group 1 and based on the first to fourth gradients in group 2.

Similarly, the case described in the present embodiment in which "the calculation of the gradients by group 2 is earlier than the calculation of the gradients by group 1" includes the case in which the representative node 30 of group 1 receives the gradient calculated in group 2 (the gradient calculation result of group 2) before the gradients are calculated in group 1.

Accordingly, for example, when the representative node 30 of group 1 receives the gradient calculated in group 2 before the gradients are calculated in group 1, group 2 updates its weighting coefficient based on the gradients calculated in group 2 (that is, the third and fourth gradients). Group 1, on the other hand, updates its weighting coefficient based on the gradient calculated in group 2 and then further updates it based on the gradients calculated in group 1 (that is, the first and second gradients). In other words, when the representative node 30 of group 1 receives the gradient calculated in group 2 before the gradients are calculated in group 1, the weighting coefficient is updated based on the third and fourth gradients in group 2 and based on the first to fourth gradients in group 1.

That is, in the present embodiment, the weighting coefficients may be updated not according to the order of the gradient calculation processing in the groups (which group finishes its gradient calculation first), but according to which group's gradient calculation result is received earlier by the representative node of group 1 or group 2.
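Updating by arrival order rather than by calculation order can be sketched as follows. The function `apply_in_arrival_order`, the `(source_group, average_gradient)` tuples, the scalar weight, and the plain SGD update are assumptions made for illustration.

```python
def apply_in_arrival_order(weight, arrivals, lr=0.5):
    """Apply average gradients in the order in which they became available.

    arrivals -- list of (source_group, average_gradient) tuples, ordered by
                the time the result reached this group's representative node
    Returns the final weight and the weight after each applied gradient.
    """
    history = []
    for source, grad in arrivals:
        weight -= lr * grad          # assumed plain SGD step
        history.append((source, weight))
    return weight, history


# Group 2's representative receives group 1's result before group 2's own
# gradient calculation has finished, so group 1's gradient is applied first.
arrivals_at_group2 = [("group1", -2.0), ("group2", -4.0)]
final_weight, history = apply_in_arrival_order(0.0, arrivals_at_group2)
```

Here group 2 ends up with a weight that reflects the gradients of both groups, in the order in which they arrived.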

Although the present embodiment has been described on the assumption that gradients are shared within a group, a configuration is also possible in which, for example, the weighting coefficients updated in the respective worker nodes of a group are shared within that group.

According to at least one of the embodiments described above, it is possible to provide a system, a program, and a method capable of achieving high scalability in parallel distributed learning processing.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These embodiments can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and likewise in the invention described in the claims and its equivalents.

10: system; 20: server node; 21, 32: learning data storage unit; 22: data allocation unit; 23, 35: transmission control unit; 24, 33: weighting coefficient storage unit; 25, 31: reception control unit; 26, 34: calculation unit; 30, 40: worker node; 201, 301: CPU; 202, 302: system controller; 203, 303: main memory; 204, 304: BIOS-ROM; 205, 305: non-volatile memory; 206, 306: communication device; 207, 307: EC.

Claims

A system comprising:
a first node and a second node belonging to a first group; and
a third node and a fourth node belonging to a second group,
wherein, when the first node and the second node execute n-th (n being a natural number) parallel distributed processing, a first gradient for updating a first weighting coefficient of an objective function to a second weighting coefficient is calculated by the first node, and a second gradient for updating the first weighting coefficient of the objective function to the second weighting coefficient is calculated by the second node,
wherein, when the third node and the fourth node execute m-th (m being a natural number) parallel distributed processing, a third gradient for updating a third weighting coefficient of the objective function to a fourth weighting coefficient is calculated by the third node, and a fourth gradient for updating the third weighting coefficient of the objective function to the fourth weighting coefficient is calculated by the fourth node, and
wherein, when the calculation of the gradients by the first node and the second node is earlier than the calculation of the gradients by the third node and the fourth node, in (n+1)-th parallel distributed processing executed by the first node and the second node, the second weighting coefficient updated from the first weighting coefficient based on the first and second gradients is further updated, and in (m+1)-th parallel distributed processing executed by the third node and the fourth node, the fourth weighting coefficient updated from the third weighting coefficient based on the first to fourth gradients is further updated.

A system comprising:
a first node and a second node belonging to a first group; and
a third node and a fourth node belonging to a second group, wherein
when the first node and the second node execute an n-th (n being a natural number) round of parallel distributed processing, a first gradient for updating a first weighting coefficient of an objective function to a second weighting coefficient is calculated by the first node, and a second gradient for updating the first weighting coefficient of the objective function to the second weighting coefficient is calculated by the second node,
when the third node and the fourth node execute an m-th (m being a natural number) round of parallel distributed processing, a third gradient for updating a third weighting coefficient of the objective function to a fourth weighting coefficient is calculated by the third node, and a fourth gradient for updating the third weighting coefficient of the objective function to the fourth weighting coefficient is calculated by the fourth node, and
when the calculation of the gradients by the third node and the fourth node is faster than the calculation of the gradients by the first node and the second node, the (n+1)-th round of parallel distributed processing executed by the first node and the second node further updates the second weighting coefficient that was updated from the first weighting coefficient based on the first to fourth gradients, and the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node further updates the fourth weighting coefficient that was updated from the third weighting coefficient based on the third and fourth gradients.

A system comprising:
a first node and a second node belonging to a first group; and
a third node and a fourth node belonging to a second group, wherein
when the first node and the second node execute an n-th (n being a natural number) round of parallel distributed processing, a first gradient for updating a first weighting coefficient of an objective function to a second weighting coefficient is calculated by the first node, and a second gradient for updating the first weighting coefficient of the objective function to the second weighting coefficient is calculated by the second node,
when the third node and the fourth node execute an m-th (m being a natural number) round of parallel distributed processing, a third gradient for updating a third weighting coefficient of the objective function to a fourth weighting coefficient is calculated by the third node, and a fourth gradient for updating the third weighting coefficient of the objective function to the fourth weighting coefficient is calculated by the fourth node,
when the calculation of the gradients by the first node and the second node is faster than the calculation of the gradients by the third node and the fourth node, the (n+1)-th round of parallel distributed processing executed by the first node and the second node further updates the second weighting coefficient that was updated from the first weighting coefficient based on the first and second gradients, and the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node further updates the fourth weighting coefficient that was updated from the third weighting coefficient based on the first to fourth gradients, and
when the calculation of the gradients by the third node and the fourth node is faster than the calculation of the gradients by the first node and the second node, the (n+1)-th round of parallel distributed processing executed by the first node and the second node further updates the second weighting coefficient that was updated from the first weighting coefficient based on the first to fourth gradients, and the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node further updates the fourth weighting coefficient that was updated from the third weighting coefficient based on the third and fourth gradients.
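
The two cases recited in the preceding claim can be illustrated with a minimal Python sketch. It is not the claimed implementation: the claims do not specify how gradients are combined or applied, so the element-wise mean, the plain gradient-descent step, the learning rate `lr`, and the names `mean_gradient`, `sgd_step`, and `update_groups` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_gradient(gradients: Sequence[Vector]) -> Vector:
    """Element-wise mean of several gradients (an assumed combination rule)."""
    count = len(gradients)
    return [sum(component) / count for component in zip(*gradients)]


def sgd_step(weights: Vector, gradients: Sequence[Vector], lr: float) -> Vector:
    """One plain gradient-descent step using the mean of the given gradients."""
    combined = mean_gradient(gradients)
    return [w - lr * g for w, g in zip(weights, combined)]


def update_groups(
    w1: Vector, w3: Vector,
    g1: Vector, g2: Vector, g3: Vector, g4: Vector,
    group1_first: bool, lr: float = 0.5,
) -> Tuple[Vector, Vector]:
    """Return (second weighting coefficient, fourth weighting coefficient).

    The group that finishes its gradient calculation first is updated with
    its own gradients only; the slower group is updated with all four.
    """
    own_first_group = [g1, g2]
    own_second_group = [g3, g4]
    all_gradients = own_first_group + own_second_group
    if group1_first:
        w2 = sgd_step(w1, own_first_group, lr)
        w4 = sgd_step(w3, all_gradients, lr)
    else:
        w2 = sgd_step(w1, all_gradients, lr)
        w4 = sgd_step(w3, own_second_group, lr)
    return w2, w4
```

Calling `update_groups(..., group1_first=True)` corresponds to the case in which the first group is faster, and `group1_first=False` to the opposite case; the returned weights are the ones each group would further update in its next round.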

The system according to any one of claims 1 to 3, wherein
the second weighting coefficient is updated based on a fifth gradient calculated from the first and second gradients, and
the fourth weighting coefficient is updated based on a sixth gradient calculated from the third and fourth gradients.

The system according to any one of claims 1 to 3, further comprising a server node communicably connected to the first node and the third node, wherein
the server node calculates the second weighting coefficient and the fourth weighting coefficient,
the first node transmits the second weighting coefficient transmitted from the server node to the second node, and
the third node transmits the fourth weighting coefficient transmitted from the server node to the fourth node.

The system according to claim 3, wherein
when the calculation of the gradients by the first node and the second node is faster than the calculation of the gradients by the third node and the fourth node, the first and second gradients are transmitted to the third node and the fourth node, and
in the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node, the fourth weighting coefficient is further updated based on the first to fourth gradients, and
when the calculation of the gradients by the third node and the fourth node is faster than the calculation of the gradients by the first node and the second node, the third and fourth gradients are transmitted to the first node and the second node, and
in the (n+1)-th round of parallel distributed processing executed by the first node and the second node, the second weighting coefficient is further updated based on the first to fourth gradients.

The system according to any one of claims 1 to 3, wherein
a difference in processing speed between the first node and the second node belonging to the first group is equal to or less than a first threshold, and
a difference in processing speed between the third node and the fourth node belonging to the second group is equal to or less than a second threshold.
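
One way to form groups that satisfy the threshold condition above is sketched below. The claims do not prescribe a grouping procedure, so the greedy strategy, the use of a single threshold for every group, and the name `group_by_speed` are assumptions for illustration only.

```python
from typing import Dict, List


def group_by_speed(speeds: Dict[str, float], threshold: float) -> List[List[str]]:
    """Greedily group nodes so that speeds within a group differ by at most `threshold`.

    Nodes are visited from fastest to slowest. A node joins the current group
    if it is within `threshold` of that group's fastest member; otherwise it
    starts a new group. Because the fastest member bounds every other member,
    all pairwise differences inside a group stay within the threshold.
    """
    ordered = sorted(speeds, key=speeds.get, reverse=True)
    groups: List[List[str]] = []
    for name in ordered:
        if groups and speeds[groups[-1][0]] - speeds[name] <= threshold:
            groups[-1].append(name)
        else:
            groups.append([name])
    return groups
```

For example, with hypothetical speeds of 100, 95, 50, and 48 (in arbitrary units) and a threshold of 10, the two fast nodes form one group and the two slow nodes form another.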

The system according to claim 7, wherein
a plurality of nodes including the first node and the second node belong to the first group,
a plurality of nodes including the third node and the fourth node belong to the second group, and
when the processing speed of the third node and the fourth node is slower than the processing speed of the first node and the second node, a first number of nodes belonging to the first group is smaller than a second number of nodes belonging to the second group.

The system according to claim 7, wherein, when the processing speed of the third node and the fourth node is slower than the processing speed of the first node and the second node, the processing amount of each of the third node and the fourth node belonging to the second group in the parallel distributed processing is smaller than the processing amount of the first node and the second node belonging to the first group.
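
As a rough illustration of giving slower nodes a smaller processing amount, the sketch below splits a fixed number of training samples across nodes in proportion to their processing speeds. Treating the processing amount as a per-node sample count, the proportional rule, and the name `allocate_samples` are assumptions; the claim only requires that the slower nodes receive less work.

```python
from typing import Dict


def allocate_samples(total: int, speeds: Dict[str, float]) -> Dict[str, int]:
    """Split `total` samples across nodes in proportion to processing speed."""
    total_speed = sum(speeds.values())
    # Round each proportional share down to a whole number of samples.
    shares = {name: int(total * speed / total_speed) for name, speed in speeds.items()}
    # Hand any samples lost to rounding to the fastest nodes, one each.
    leftover = total - sum(shares.values())
    fastest_first = sorted(speeds, key=speeds.get, reverse=True)
    for name in fastest_first[:leftover]:
        shares[name] += 1
    return shares
```

With hypothetical speeds of 2.0 for the first-group nodes and 1.0 for the second-group nodes, each slower node receives half as many samples as each faster node.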

A program executed by one or more computers of a system comprising a first node and a second node belonging to a first group and a third node and a fourth node belonging to a second group, the program causing the one or more computers to execute:
a step of, when the first node and the second node execute an n-th (n being a natural number) round of parallel distributed processing, calculating, by the first node, a first gradient for updating a first weighting coefficient of an objective function to a second weighting coefficient, and calculating, by the second node, a second gradient for updating the first weighting coefficient of the objective function to the second weighting coefficient; and
a step of, when the third node and the fourth node execute an m-th (m being a natural number) round of parallel distributed processing, calculating, by the third node, a third gradient for updating a third weighting coefficient of the objective function to a fourth weighting coefficient, and calculating, by the fourth node, a fourth gradient for updating the third weighting coefficient of the objective function to the fourth weighting coefficient, wherein
when the calculation of the gradients by the first node and the second node is faster than the calculation of the gradients by the third node and the fourth node, the (n+1)-th round of parallel distributed processing executed by the first node and the second node further updates the second weighting coefficient that was updated from the first weighting coefficient based on the first and second gradients, and the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node further updates the fourth weighting coefficient that was updated from the third weighting coefficient based on the first to fourth gradients.

A program executed by one or more computers of a system comprising a first node and a second node belonging to a first group and a third node and a fourth node belonging to a second group, the program causing the one or more computers to execute:
a step of, when the first node and the second node execute an n-th (n being a natural number) round of parallel distributed processing, calculating, by the first node, a first gradient for updating a first weighting coefficient of an objective function to a second weighting coefficient, and calculating, by the second node, a second gradient for updating the first weighting coefficient of the objective function to the second weighting coefficient; and
a step of, when the third node and the fourth node execute an m-th (m being a natural number) round of parallel distributed processing, calculating, by the third node, a third gradient for updating a third weighting coefficient of the objective function to a fourth weighting coefficient, and calculating, by the fourth node, a fourth gradient for updating the third weighting coefficient of the objective function to the fourth weighting coefficient, wherein
when the calculation of the gradients by the third node and the fourth node is faster than the calculation of the gradients by the first node and the second node, the (n+1)-th round of parallel distributed processing executed by the first node and the second node further updates the second weighting coefficient that was updated from the first weighting coefficient based on the first to fourth gradients, and the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node further updates the fourth weighting coefficient that was updated from the third weighting coefficient based on the third and fourth gradients.

A program executed by one or more computers of a system comprising a first node and a second node belonging to a first group and a third node and a fourth node belonging to a second group, the program causing the one or more computers to execute:
a step of, when the first node and the second node execute an n-th (n being a natural number) round of parallel distributed processing, calculating, by the first node, a first gradient for updating a first weighting coefficient of an objective function to a second weighting coefficient, and calculating, by the second node, a second gradient for updating the first weighting coefficient of the objective function to the second weighting coefficient; and
a step of, when the third node and the fourth node execute an m-th (m being a natural number) round of parallel distributed processing, calculating, by the third node, a third gradient for updating a third weighting coefficient of the objective function to a fourth weighting coefficient, and calculating, by the fourth node, a fourth gradient for updating the third weighting coefficient of the objective function to the fourth weighting coefficient, wherein
when the calculation of the gradients by the first node and the second node is faster than the calculation of the gradients by the third node and the fourth node, the (n+1)-th round of parallel distributed processing executed by the first node and the second node further updates the second weighting coefficient that was updated from the first weighting coefficient based on the first and second gradients, and the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node further updates the fourth weighting coefficient that was updated from the third weighting coefficient based on the first to fourth gradients, and
when the calculation of the gradients by the third node and the fourth node is faster than the calculation of the gradients by the first node and the second node, the (n+1)-th round of parallel distributed processing executed by the first node and the second node further updates the second weighting coefficient that was updated from the first weighting coefficient based on the first to fourth gradients, and the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node further updates the fourth weighting coefficient that was updated from the third weighting coefficient based on the third and fourth gradients.

The program according to any one of claims 10 to 12, wherein
the second weighting coefficient is updated based on a fifth gradient calculated from the first and second gradients, and
the fourth weighting coefficient is updated based on a sixth gradient calculated from the third and fourth gradients.

The program according to any one of claims 10 to 12, wherein
the second weighting coefficient and the fourth weighting coefficient are calculated by a server node,
the first node transmits the second weighting coefficient transmitted from the server node to the second node, and
the third node transmits the fourth weighting coefficient transmitted from the server node to the fourth node.

The program according to claim 10, wherein
when the calculation of the gradients by the first node and the second node is faster than the calculation of the gradients by the third node and the fourth node, the first and second gradients are transmitted to the third node and the fourth node, and
in the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node, the fourth weighting coefficient is further updated based on the first to fourth gradients, and
when the calculation of the gradients by the third node and the fourth node is faster than the calculation of the gradients by the first node and the second node, the third and fourth gradients are transmitted to the first node and the second node, and
in the (n+1)-th round of parallel distributed processing executed by the first node and the second node, the second weighting coefficient is further updated based on the first to fourth gradients.

The program according to any one of claims 10 to 12, wherein
a difference in processing speed between the first node and the second node belonging to the first group is equal to or less than a first threshold, and
a difference in processing speed between the third node and the fourth node belonging to the second group is equal to or less than a second threshold.

The program according to claim 16, wherein
a plurality of nodes including the first node and the second node belong to the first group,
a plurality of nodes including the third node and the fourth node belong to the second group, and
when the processing speed of the third node and the fourth node is slower than the processing speed of the first node and the second node, a first number of nodes belonging to the first group is smaller than a second number of nodes belonging to the second group.

The program according to claim 16, wherein, when the processing speed of the third node and the fourth node is slower than the processing speed of the first node and the second node, the processing amount of each of the third node and the fourth node belonging to the second group in the parallel distributed processing is smaller than the processing amount of the first node and the second node belonging to the first group.

A method comprising:
a step in which, when a first node and a second node belonging to a first group execute an n-th (n being a natural number) round of parallel distributed processing, a first gradient for updating a first weighting coefficient of an objective function to a second weighting coefficient is calculated by the first node, and a second gradient for updating the first weighting coefficient of the objective function to the second weighting coefficient is calculated by the second node; and
a step in which, when a third node and a fourth node belonging to a second group execute an m-th (m being a natural number) round of parallel distributed processing, a third gradient for updating a third weighting coefficient of the objective function to a fourth weighting coefficient is calculated by the third node, and a fourth gradient for updating the third weighting coefficient of the objective function to the fourth weighting coefficient is calculated by the fourth node, wherein
when the calculation of the gradients by the first node and the second node is faster than the calculation of the gradients by the third node and the fourth node, the (n+1)-th round of parallel distributed processing executed by the first node and the second node further updates the second weighting coefficient that was updated from the first weighting coefficient based on the first and second gradients, and the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node further updates the fourth weighting coefficient that was updated from the third weighting coefficient based on the first to fourth gradients.

A method comprising:
a step in which, when a first node and a second node belonging to a first group execute an n-th (n being a natural number) round of parallel distributed processing, a first gradient for updating a first weighting coefficient of an objective function to a second weighting coefficient is calculated by the first node, and a second gradient for updating the first weighting coefficient of the objective function to the second weighting coefficient is calculated by the second node; and
a step in which, when a third node and a fourth node belonging to a second group execute an m-th (m being a natural number) round of parallel distributed processing, a third gradient for updating a third weighting coefficient of the objective function to a fourth weighting coefficient is calculated by the third node, and a fourth gradient for updating the third weighting coefficient of the objective function to the fourth weighting coefficient is calculated by the fourth node, wherein
when the calculation of the gradients by the third node and the fourth node is faster than the calculation of the gradients by the first node and the second node, the (n+1)-th round of parallel distributed processing executed by the first node and the second node further updates the second weighting coefficient that was updated from the first weighting coefficient based on the first to fourth gradients, and the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node further updates the fourth weighting coefficient that was updated from the third weighting coefficient based on the third and fourth gradients.

A method comprising:
a step in which, when a first node and a second node belonging to a first group execute an n-th (n being a natural number) round of parallel distributed processing, a first gradient for updating a first weighting coefficient of an objective function to a second weighting coefficient is calculated by the first node, and a second gradient for updating the first weighting coefficient of the objective function to the second weighting coefficient is calculated by the second node; and
a step in which, when a third node and a fourth node belonging to a second group execute an m-th (m being a natural number) round of parallel distributed processing, a third gradient for updating a third weighting coefficient of the objective function to a fourth weighting coefficient is calculated by the third node, and a fourth gradient for updating the third weighting coefficient of the objective function to the fourth weighting coefficient is calculated by the fourth node, wherein
when the calculation of the gradients by the first node and the second node is faster than the calculation of the gradients by the third node and the fourth node, the (n+1)-th round of parallel distributed processing executed by the first node and the second node further updates the second weighting coefficient that was updated from the first weighting coefficient based on the first and second gradients, and the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node further updates the fourth weighting coefficient that was updated from the third weighting coefficient based on the first to fourth gradients, and
when the calculation of the gradients by the third node and the fourth node is faster than the calculation of the gradients by the first node and the second node, the (n+1)-th round of parallel distributed processing executed by the first node and the second node further updates the second weighting coefficient that was updated from the first weighting coefficient based on the first to fourth gradients, and the (m+1)-th round of parallel distributed processing executed by the third node and the fourth node further updates the fourth weighting coefficient that was updated from the third weighting coefficient based on the third and fourth gradients.