JP6699891B2

JP6699891B2 - Electronic device, method and information processing system

Info

Publication number: JP6699891B2
Application number: JP2016168189A
Authority: JP
Inventors: 武 戸田; 光宏 木村; 耕祐 春木
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2016-08-30
Filing date: 2016-08-30
Publication date: 2020-05-27
Anticipated expiration: 2036-08-30
Also published as: JP2018036779A

Description

Embodiments of the present invention relate to a technique for parallel distributed learning.

In recent years, deep learning, a form of machine learning, has been expected to enable effective use of data. To obtain learning results from large-scale data more quickly, deep learning calls for parallel distributed learning processing, in which a plurality of computers (processors) execute learning in parallel and share their learning progress. In such processing, data indicating the learning progress is shared through communication between the computers.

International Publication No. 2014/020959

However, in parallel distributed learning processing for deep learning, the amount of data shared between computers is large, so the communication cost can be high. A new technique is therefore needed that can execute parallel distributed learning processing efficiently while reducing the communication cost.

An object of one aspect of the present invention is to provide an electronic device, a method, and an information processing system that can efficiently execute parallel distributed learning processing while reducing the communication cost.

According to an embodiment, an electronic device includes receiving means and processing means. When parallel distributed processing based on an objective function is executed by the electronic device and at least one other electronic device, the receiving means receives, from a first electronic device among the at least one other electronic device, a sum of a plurality of gradients calculated by the first electronic device to update a first weighting coefficient of the objective function, and information from which the number of the plurality of gradients can be identified. The processing means updates a second weighting coefficient of the objective function using the sum of the plurality of gradients and the information from which the number of the plurality of gradients can be identified.

FIG. 1 is a block diagram showing an example of the configuration of an information processing system according to a first embodiment.
FIG. 2 is a diagram for explaining an example of gradients used to update a weighting coefficient in parallel distributed learning processing based on an objective function.
FIG. 3 is a diagram for explaining an example of gradients used to update a weighting coefficient in parallel distributed learning processing based on an objective function in the information processing system of FIG. 1.
FIG. 4 is a diagram for explaining an example in which learning progress is shared in parallel distributed learning processing based on an objective function by the information processing system of FIG. 1.
FIG. 5 is a block diagram showing the system configuration of a server in the information processing system of FIG. 1.
FIG. 6 is a block diagram showing the system configuration of a client in the information processing system of FIG. 1.
FIG. 7 is a block diagram showing an example of the functional configurations of the server and a client in the information processing system of FIG. 1.
FIG. 8 is a flowchart showing an example of the procedure of parallel distributed learning processing executed by a client that transmits a sum of gradients.
FIG. 9 is a flowchart showing an example of the procedure of parallel distributed learning processing executed by a server that receives a sum of gradients.
FIG. 10 is a flowchart showing an example of the procedure of parallel distributed learning processing executed by the client of FIG. 7.
FIG. 11 is a flowchart showing an example of the procedure of parallel distributed learning processing executed by the server of FIG. 7.
FIG. 12 is a block diagram showing an example of the configuration of an information processing system according to a second embodiment.
FIG. 13 is a diagram for explaining another example of gradients used to update a weighting coefficient in parallel distributed learning processing based on an objective function.
FIG. 14 is a diagram for explaining an example of gradients used to update a weighting coefficient in parallel distributed learning processing based on an objective function in the information processing system of FIG. 12.
FIG. 15 is a diagram for explaining an example in which learning progress is shared in parallel distributed learning processing based on an objective function by the information processing system of FIG. 12.
FIG. 16 is a diagram for explaining another example in which learning progress is shared in parallel distributed learning processing based on an objective function by the information processing system of FIG. 12.
FIG. 17 is a block diagram showing an example of the functional configurations of a first client and a second client in the information processing system of FIG. 12.
FIG. 18 is a flowchart showing an example of the procedure of parallel distributed learning processing executed by the first client of FIG. 17.
FIG. 19 is a flowchart showing an example of the procedure of parallel distributed learning processing executed by the second client of FIG. 17.
FIG. 20 is a diagram for explaining the effect of parallel distributed learning by a plurality of clients in the information processing system of FIG. 12.
FIG. 21 is a diagram for explaining the effect on parallel distributed learning of using not only the sum of gradients but also the number of gradients in the information processing system of FIG. 12.

Hereinafter, embodiments will be described with reference to the drawings.
(First embodiment)
First, the configuration of an information processing system according to the first embodiment will be described with reference to FIG. 1. The information processing system 1 is a server-client system composed of a server computer (hereinafter also referred to as a server) 10 and a plurality of client computers (hereinafter also referred to as clients) 20, 30, and 40, which are interconnected via a network 50 or the like. The network 50 is, for example, Ethernet (registered trademark), but is not limited to this. The server 10 and the clients 20, 30, and 40 in the information processing system 1 execute, for example, parallel distributed learning processing based on an objective function in deep learning that handles large-scale data. This parallel distributed learning processing based on an objective function may be any processing in which learning is performed by a plurality of processing entities using the objective function as feedback (an evaluation value) on the learning result; one example is parallel distributed learning processing for optimizing the objective function. Although FIG. 1 shows an example in which the information processing system 1 is provided with three clients 20, 30, and 40, the number of clients may be two, or four or more.

As shown in FIG. 1, in this parallel distributed learning processing, the clients 20, 30, and 40 update the parameters (for example, weighting coefficients) of the objective function using the learning data 21A, 31A, and 41A respectively assigned to them, and transmit data indicating the resulting learning progress to the server 10. The server 10 then uses the data indicating the learning progress to update the parameters of the objective function stored in the server 10, and transmits the updated parameters to the clients 20, 30, and 40.

More specifically, the server 10 updates the parameters of the objective function on the server 10 using, for example, data indicating the learning progress transmitted from the client 20, and transmits the updated parameters to the clients 20, 30, and 40. The server 10 may also update the parameters of the objective function on the server 10 using, for example, both the data indicating the learning progress transmitted from the client 20 and the data indicating the learning progress transmitted from the client 30, and transmit the updated parameters to the clients 20, 30, and 40.

As a result, the learning progress of each client is shared with the other clients in the information processing system 1, so the optimization of the objective function can proceed efficiently across the information processing system 1 as a whole.

In deep learning, stochastic gradient descent (SGD), for example, is used to optimize the objective function. In SGD, the weighting coefficients of the objective function (hereinafter also referred to as a weight vector) are updated using a vector pointing toward the optimal solution, called a gradient vector. Let $W^{(t)}$, $\nabla W^{(t)}$, and $\varepsilon^{(t)}$ denote the weight vector indicating the current state in SGD, the gradient vector, and the learning coefficient, respectively. The updated weight vector $W^{(t+1)}$ is then given by equation (1) below. In the following, the weight vector and the gradient vector are also referred to simply as the weight and the gradient.
$$W^{(t+1)} = W^{(t)} - \varepsilon^{(t)} \nabla W^{(t)} \tag{1}$$
The learning coefficient $\varepsilon^{(t)}$, which determines the update width, is determined adaptively according to the progress of learning; for example, it decays as learning progresses. In recent years, learning-coefficient auto-decay algorithms such as Adagrad, Adadelta, and Adam have increasingly been used as SGD optimization algorithms. In these algorithms, the learning coefficient $\varepsilon^{(t)}$ decays depending on the gradient vector $\nabla W^{(t)}$.
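The update rule in equation (1) and a gradient-dependent learning coefficient can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `AdagradLike`, the base coefficient `0.1`, and the stabilizing constant `1e-8` are assumptions chosen for the example, and the decay rule is a simplified Adagrad-style one.

```python
import math

def sgd_step(w, grad, lr):
    """Equation (1): W(t+1) = W(t) - eps(t) * grad(t), applied element-wise."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

class AdagradLike:
    """Hypothetical Adagrad-style rule: the effective learning coefficient
    of each element shrinks as squared gradients accumulate."""

    def __init__(self, base_lr=0.1, eps=1e-8):
        self.base_lr = base_lr   # assumed base learning coefficient
        self.eps = eps           # avoids division by zero
        self.accum = None        # running sum of squared gradients

    def step(self, w, grad):
        if self.accum is None:
            self.accum = [0.0] * len(w)
        self.accum = [a + g * g for a, g in zip(self.accum, grad)]
        return [wi - self.base_lr / (math.sqrt(a) + self.eps) * gi
                for wi, gi, a in zip(w, grad, self.accum)]

# Two updates with the same gradient: the second moves the weight less,
# because the coefficient has decayed with the accumulated gradient.
opt = AdagradLike()
w0 = [0.0]
w1 = opt.step(w0, [1.0])
w2 = opt.step(w1, [1.0])
```

Because the coefficient depends on the history of gradients, the result of a sequence of updates depends on how many updates were made and with which gradients, not only on their sum.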

When the optimization of the objective function by SGD is parallelized and distributed, gradient vectors may be used as the learning progress shared within the information processing system 1. The following reference gives an example of such parallel distributed learning.
Reference: Jeffrey Dean, Greg S. Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Quoc V. Le, Mark Z. Mao, Marc'Aurelio Ranzato, Andrew Senior, Paul Tucker, Ke Yang, and Andrew Y. Ng, "Large Scale Distributed Deep Networks," Advances in Neural Information Processing Systems 25, 2012.

In general, a gradient vector in deep learning has a very large number of dimensions (for example, several million), so the communication cost of sharing gradient vectors within the information processing system 1 is also very high. To reduce the communication cost, it is conceivable, for example, that while the client 20 updates the weighting coefficient of the objective function a plurality of times, it calculates a gradient sum ∇W_transfer by adding up the gradients used in those updates, and transmits this gradient sum ∇W_transfer to the server 10, so that the learning progress corresponding to the plurality of updates is shared at once. In that case, the server 10 updates the weighting coefficients W of the other clients 30 and 40 using the gradient sum ∇W_transfer according to equation (2).

The server 10 transmits the updated weighting coefficients W to the clients 30 and 40. As a result, the clients 30 and 40 can proceed efficiently with the optimization of the objective function using weighting coefficients W that reflect the learning progress of the client 20.
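Equation (2) does not survive in this text. A form consistent with the surrounding description, in which the server performs one update of the form of equation (1) with the single gradient replaced by the gradient sum, would be the following. The index range of the sum and the exact notation are assumptions, not the patent's own typesetting.

```latex
\nabla W_{\mathrm{transfer}} = \sum_{i=0}^{N-1} \nabla W^{(t+i)}, \qquad
W^{(t+1)} = W^{(t)} - \varepsilon^{(t)} \, \nabla W_{\mathrm{transfer}} \tag{2}
```

Here N is the number of updates the transmitting client performed while accumulating the sum.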

Note that the plurality of weighting coefficients W that the server 10 handles for the clients 20, 30, and 40 may be one and the same. In that case, the server 10 holds a weighting coefficient W common to the clients 20, 30, and 40 (a master parameter) and updates this common weighting coefficient using the learning progress (the gradient sum ∇W_transfer) of each of the clients 20, 30, and 40. The server 10 transmits the updated weighting coefficient W to the clients 20, 30, and 40. Each of the clients 20, 30, and 40 overwrites the weighting coefficient W it uses with the weighting coefficient W received from the server. As a result, the clients 20, 30, and 40 can proceed efficiently with the optimization of the objective function using a weighting coefficient W that reflects the learning progress of every client.

However, whereas the client 20 updates the weighting coefficient W using each of a plurality of gradients ∇W, the server 10 updates the weighting coefficient W using the gradient sum ∇W_transfer. A difference in processing steps (the number of processing steps) therefore arises between the weighting coefficient W updated on the client 20 and the weighting coefficient W updated on the server 10.

FIG. 2 shows an example of gradients used to update a weighting coefficient. In this example, a client (for example, the first client 20) transmits to the server 10 the sum 52 of a plurality of gradients 511, 512, 513, and 514 calculated by that client. The server 10 then uses this gradient sum 52 to update the weighting coefficient of another client (for example, the second client 30).

In the example shown in FIG. 2, the transmitting client updates the weighting coefficient using the four gradients 511, 512, 513, and 514, whereas the server 10 updates the weighting coefficient of the other client using one gradient (the gradient sum) 52. In other words, the transmitting client executes the weighting coefficient update based on equation (1) four times, whereas the server 10 executes the weighting coefficient update based on equation (2) once. As described above, the learning coefficient ε in equations (1) and (2) is determined adaptively according to the progress of learning (for example, it decays depending on the gradient). A difference in processing steps therefore arises between the update using the four gradients 511, 512, 513, and 514 and the update using the one gradient (the gradient sum) 52. As a result, the learning progress of the transmitting client may not be sufficiently reflected in the update of the other client's weighting coefficient at the server 10; that is, the learning progress may not be sufficiently shared within the information processing system 1.
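The mismatch described for FIG. 2 can be made concrete with a small numeric sketch. The update rule below is a simplified Adagrad-style rule on a scalar weight, and the four gradient values are arbitrary stand-ins for gradients 511 to 514; both are assumptions for illustration only.

```python
import math

def adagrad_apply(w, accum, grad, base_lr=0.1, eps=1e-8):
    """One update with a coefficient that decays with accumulated squared gradients."""
    accum += grad * grad
    w -= base_lr / (math.sqrt(accum) + eps) * grad
    return w, accum

grads = [1.0, 0.5, 0.25, 0.25]   # illustrative stand-ins for gradients 511-514

# Transmitting client: four updates, one per gradient.
w_client, acc_client, client_steps = 0.0, 0.0, 0
for g in grads:
    w_client, acc_client = adagrad_apply(w_client, acc_client, g)
    client_steps += 1

# Server: a single update with the gradient sum (gradient 52).
w_server, acc_server = adagrad_apply(0.0, 0.0, sum(grads))
server_steps = 1

print(client_steps, server_steps)   # step counts differ
print(acc_client, acc_server)       # optimizer state differs
print(w_client, w_server)           # resulting weights differ
```

Starting from the same weight, the two sides end with different step counts, different accumulated optimizer state, and different weights, which is the gap in processing steps that the text attributes to updating with the sum alone.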

To reduce this difference in processing steps, in the present embodiment each of the clients 20, 30, and 40 transmits to the server 10 not only the sum of a plurality of gradients but also information from which the number of gradients used to calculate that sum can be identified. The number of gradients corresponds to the number of times the weighting coefficient was updated while the sum was being calculated. The information may take any form as long as the number of gradients can be identified from it; for example, it may directly indicate a numerical value (such as "4"), or it may be information from which the numerical value is derived indirectly.

FIG. 3 shows an example of gradients used to update a weighting coefficient in the present embodiment. In this example, a client (for example, the first client 20) transmits to the server 10 the sum 52 of the plurality of gradients 511, 512, 513, and 514 calculated by that client, together with the number N of these gradients (here, N = 4). The server 10 updates the weighting coefficient of another client (for example, the second client 30) using N gradients 551, 552, 553, and 554, each obtained by dividing the gradient sum 52 by the number of gradients N.

In the example shown in FIG. 3, just as the transmitting client updates the weighting coefficient using the four gradients 511, 512, 513, and 514, the server 10 updates the weighting coefficient of the other client using the four gradients 551, 552, 553, and 554. More specifically, the server 10 updates the weighting coefficient of the other client using equation (3).
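Equation (3) does not survive in this text. Based on the description of FIG. 3, in which the server applies N gradients each equal to the gradient sum divided by N, one consistent form would be the following. The indexing is an assumption made for illustration.

```latex
W^{(t+i)} = W^{(t+i-1)} - \varepsilon^{(t+i-1)} \, \frac{\nabla W_{\mathrm{transfer}}}{N},
\qquad i = 1, \dots, N \tag{3}
```

Under this reading, the server performs N updates of the form of equation (1), each with the averaged gradient.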

By updating the weighting coefficient using the four gradients 551, 552, 553, and 554, the server 10 reproduces in a simulated manner the transmitting client's update using the four gradients 511, 512, 513, and 514, and thereby reduces the difference in processing steps. As a result, the learning progress of the transmitting client can be sufficiently reflected in the update of the other client's weighting coefficient at the server 10, and the learning progress can therefore be sufficiently shared within the information processing system 1.
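The simulated reproduction can be sketched as a server routine that applies the averaged gradient N times. The Adagrad-style rule, the scalar weight, and the function names are assumptions for illustration; the patent does not prescribe a particular optimizer.

```python
import math

def adagrad_apply(w, accum, grad, base_lr=0.1, eps=1e-8):
    """One update with a coefficient that decays with accumulated squared gradients."""
    accum += grad * grad
    w -= base_lr / (math.sqrt(accum) + eps) * grad
    return w, accum

def client_updates(w, accum, grads):
    """Client side: one update per gradient; returns the sum and count to transmit."""
    for g in grads:
        w, accum = adagrad_apply(w, accum, g)
    return w, accum, sum(grads), len(grads)

def replay_update(w, accum, grad_sum, n):
    """Server side: apply grad_sum / n in n successive updates."""
    g = grad_sum / n
    steps = 0
    for _ in range(n):
        w, accum = adagrad_apply(w, accum, g)
        steps += 1
    return w, accum, steps

# When the client's gradients happen to be equal, the replay matches exactly.
w_c, acc_c, grad_sum, n = client_updates(0.0, 0.0, [0.5, 0.5, 0.5, 0.5])
w_s, acc_s, steps = replay_update(0.0, 0.0, grad_sum, n)

# With unequal gradients the replay is an approximation, but the step count
# still matches the client's.
w_c2, acc_c2, grad_sum2, n2 = client_updates(0.0, 0.0, [1.0, 0.5, 0.25, 0.25])
w_s2, acc_s2, steps2 = replay_update(0.0, 0.0, grad_sum2, n2)
```

The replay always matches the client's number of updates. It reproduces the client's trajectory exactly only in the special case of equal gradients; otherwise it is a pseudo-reproduction, as the text says, and how close the resulting weights are depends on the optimizer and the gradients.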

Moreover, because the number of gradients N is a scalar quantity, the information identifying N is very small metadata compared with the data of a gradient (gradient vector), which has a very large number of dimensions (for example, several million). Additionally transmitting the number of gradients N therefore reduces the difference in weighting coefficient update steps between the transmitting client and the server 10 with almost no effect on the communication cost.
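The size argument can be checked with simple arithmetic. The sketch below assumes 32-bit floating-point gradient elements, a 32-bit integer for N, and a gradient of five million dimensions; the `GradientMessage` type and its field names are hypothetical and only illustrate what the transmitted data might contain.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GradientMessage:
    """Hypothetical payload sent from a client to the server."""
    grad_sum: List[float]   # the gradient sum (one value per weight)
    num_gradients: int      # N, the number of gradients added into grad_sum

def payload_bytes(dim, bytes_per_value=4, count_bytes=4):
    """Return (bytes for the gradient sum, bytes for N) under the assumed encoding."""
    return dim * bytes_per_value, count_bytes

dim = 5_000_000                      # assumed gradient dimension
vec_bytes, meta_bytes = payload_bytes(dim)
overhead = meta_bytes / vec_bytes    # relative cost of also sending N

n = 4
separate = n * vec_bytes             # sending each of the N gradients individually
combined = vec_bytes + meta_bytes    # sending the sum plus N
```

Under these assumptions the count adds 4 bytes to a 20-megabyte vector, while sending the sum instead of N individual gradients cuts the transmitted volume by roughly a factor of N.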

FIG. 4 shows an example in which learning progress is shared by the server 10 and the clients 20, 30, and 40 in the information processing system 1 in parallel distributed learning processing based on an objective function.

First, the first client 20, the second client 30, and the third client 40 receive the learning data assigned by the server 10 (S11). Each of the clients 20, 30, and 40 then updates the weighting coefficient W of the objective function N times (here, three times) using the received learning data (S12).

Next, each of the clients 20, 30, and 40 transmits to the server 10 the sum ∇W_transfer of the N gradients calculated in those N updates, together with the number of gradients N (S13). The server 10 uses the gradient sum ∇W_transfer and the number of gradients N transmitted from each of the clients 20, 30, and 40 to update the weighting coefficient W of the objective function used by each client. Each of the clients 20, 30, and 40 then receives the updated weighting coefficient W from the server 10 (S14).

In the same way, each of the clients 20, 30, and 40 updates the weighting coefficient W of the objective function N times using the learning data (S15) and transmits the gradient sum ∇W_transfer and the number of gradients N to the server 10 (S16). Each of the clients 20, 30, and 40 then receives the updated weighting coefficient W from the server 10 (S17).

In this way, each of the clients 20, 30, and 40 transmits the gradient sum ∇W_transfer and the number of gradients N to the server 10 at, for example, a predetermined synchronization timing (for example, every three updates), and receives the updated weighting coefficient W from the server 10. The learning progress of each of the clients 20, 30, and 40 is thereby shared within the information processing system 1, and the optimization of the objective function can proceed efficiently across the information processing system 1 as a whole. The synchronization timing is not limited to a number of updates; it may be determined, for example, based on the time elapsed since the gradient sum ∇W_transfer and the number of gradients N were last transmitted. The clients 20, 30, and 40 may also transmit the gradient sum ∇W_transfer and the number of gradients N at different timings from one another.
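The exchange in steps S11 to S17 can be simulated in a single process. The sketch below makes several simplifying assumptions that are not in the patent: a scalar weight, a toy objective 0.5 * (w - x)^2 per sample x, a fixed learning coefficient of 0.1, a shared master weight on the server, and synchronous rounds in which every client reports before the server broadcasts. All function names are hypothetical.

```python
def client_round(w, samples, lr=0.1):
    """S12/S15: one update per sample; returns the new weight,
    the gradient sum, and the number of gradients N."""
    grad_sum, n = 0.0, 0
    for x in samples:
        g = w - x              # gradient of the toy objective 0.5 * (w - x)**2
        w -= lr * g
        grad_sum += g
        n += 1
    return w, grad_sum, n

def server_apply(w_master, grad_sum, n, lr=0.1):
    """Server: apply grad_sum / n in n successive updates to the master weight."""
    g = grad_sum / n
    for _ in range(n):
        w_master -= lr * g
    return w_master

def run(shards, rounds=20):
    """shards[c] is the learning data assigned to client c (S11)."""
    w_master = 0.0
    local = [w_master] * len(shards)
    for _ in range(rounds):
        reports = []
        for c, shard in enumerate(shards):
            local[c], s, n = client_round(local[c], shard)
            reports.append((s, n))                 # S13/S16: send sum and N
        for s, n in reports:
            w_master = server_apply(w_master, s, n)
        local = [w_master] * len(shards)           # S14/S17: overwrite local weights
    return w_master

# Three clients, three samples each, so each client syncs every three updates.
shards = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [3.0, 2.0, 1.0]]
w_final = run(shards)
```

In this toy setting the master weight settles at the value that balances the gradient sums of all three clients. A time-based synchronization trigger, which the text also allows, would replace the fixed per-round sample count with a check on elapsed time.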

次いで、図５は、サーバ１０のシステム構成の例を示す。サーバ１０は、ＣＰＵ１０１、システムコントローラ１０２、主メモリ１０３、ＢＩＯＳ−ＲＯＭ１０４、不揮発性メモリ１０５、通信デバイス１０６、エンベデッドコントローラ（ＥＣ）１０７、等を備える。 Next, FIG. 5 shows an example of the system configuration of the server 10. The server 10 includes a CPU 101, a system controller 102, a main memory 103, a BIOS-ROM 104, a non-volatile memory 105, a communication device 106, an embedded controller (EC) 107, and the like.

ＣＰＵ１０１は、サーバ１０内の様々なコンポーネントの動作を制御するプロセッサである。ＣＰＵ１０１は、ストレージデバイスである不揮発性メモリ１０５から主メモリ１０３にロードされる様々なプログラムを実行する。これらプログラムには、オペレーティングシステム（ＯＳ）２０１、及び様々なアプリケーションプログラムが含まれている。アプリケーションプログラムには、並列分散学習サーバプログラム２０２が含まれている。この並列分散学習サーバプログラム２０２は、例えば、クライアント２０，３０，４０に学習データを割り当てる機能、クライアント２０，３０，４０から学習経過を受信する機能、学習経過を用いて更新された目的関数のパラメータをクライアント２０，３０，４０に送信する機能、等を有している。 The CPU 101 is a processor that controls the operation of various components in the server 10. The CPU 101 executes various programs loaded from the nonvolatile memory 105, which is a storage device, into the main memory 103. These programs include an operating system (OS) 201 and various application programs. The application program includes the parallel distributed learning server program 202. The parallel distributed learning server program 202 has, for example, a function of allocating learning data to the clients 20, 30, 40, a function of receiving a learning progress from the clients 20, 30, 40, a parameter of an objective function updated using the learning progress. To a client 20, 30, 40, and the like.

また、ＣＰＵ１０１は、ＢＩＯＳ−ＲＯＭ１０４に格納された基本入出力システム（ＢＩＯＳ）も実行する。ＢＩＯＳは、ハードウェア制御のためのプログラムである。 The CPU 101 also executes a basic input/output system (BIOS) stored in the BIOS-ROM 104. The BIOS is a program for controlling hardware.

システムコントローラ１０２は、ＣＰＵ１０１のローカルバスと各種コンポーネントとの間を接続するデバイスである。システムコントローラ１０２には、主メモリ１０３をアクセス制御するメモリコントローラも内蔵されている。 The system controller 102 is a device that connects the local bus of the CPU 101 and various components. The system controller 102 also has a built-in memory controller that controls access to the main memory 103.

通信デバイス１０６は、有線又は無線による通信を実行するように構成されたデバイスである。通信デバイス１０６は、信号を送信する送信部と、信号を受信する受信部とを含む。ＥＣ１０７は、電力管理のためのエンベデッドコントローラを含むワンチップマイクロコンピュータである。ＥＣ１０７は、ユーザによるパワーボタンの操作に応じてサーバ１０を電源オン又は電源オフする機能を有している。 The communication device 106 is a device configured to perform wired or wireless communication. The communication device 106 includes a transmitter that transmits a signal and a receiver that receives the signal. The EC 107 is a one-chip microcomputer that includes an embedded controller for power management. The EC 107 has a function of powering on or off the server 10 according to the operation of the power button by the user.

また、図６は、クライアント２０，３０，４０のシステム構成の例を示す。クライアント２０，３０，４０は、ＣＰＵ３０１、システムコントローラ３０２、主メモリ３０３、ＢＩＯＳ−ＲＯＭ３０４、不揮発性メモリ３０５、通信デバイス３０６、エンベデッドコントローラ（ＥＣ）３０７、等を備える。 Further, FIG. 6 shows an example of a system configuration of the clients 20, 30, 40. The clients 20, 30, 40 include a CPU 301, a system controller 302, a main memory 303, a BIOS-ROM 304, a non-volatile memory 305, a communication device 306, an embedded controller (EC) 307, and the like.

ＣＰＵ３０１は、クライアント２０，３０，４０内の様々なコンポーネントの動作を制御するプロセッサである。ＣＰＵ３０１は、ストレージデバイスである不揮発性メモリ３０５から主メモリ３０３にロードされる様々なプログラムを実行する。これらプログラムには、オペレーティングシステム（ＯＳ）４０１、及び様々なアプリケーションプログラムが含まれている。アプリケーションプログラムには、並列分散学習クライアントプログラム４０２が含まれている。この並列分散学習クライアントプログラム４０２は、例えば、目的関数のパラメータを更新する機能、学習経過をサーバ１０に送信する機能、サーバ１０によって更新された目的関数のパラメータを受信する機能、等を有している。 The CPU 301 is a processor that controls the operation of various components in the clients 20, 30, and 40. The CPU 301 executes various programs loaded from the nonvolatile memory 305, which is a storage device, into the main memory 303. These programs include an operating system (OS) 401 and various application programs. The application program includes a parallel distributed learning client program 402. The parallel distributed learning client program 402 has, for example, a function of updating the parameters of the objective function, a function of transmitting the learning progress to the server 10, a function of receiving the parameters of the objective function updated by the server 10, and the like. There is.

また、ＣＰＵ３０１は、ＢＩＯＳ−ＲＯＭ３０４に格納された基本入出力システム（ＢＩＯＳ）も実行する。ＢＩＯＳは、ハードウェア制御のためのプログラムである。 The CPU 301 also executes a basic input/output system (BIOS) stored in the BIOS-ROM 304. The BIOS is a program for controlling hardware.

システムコントローラ３０２は、ＣＰＵ３０１のローカルバスと各種コンポーネントとの間を接続するデバイスである。システムコントローラ３０２には、主メモリ３０３をアクセス制御するメモリコントローラも内蔵されている。 The system controller 302 is a device that connects the local bus of the CPU 301 and various components. The system controller 302 also has a built-in memory controller that controls access to the main memory 303.

通信デバイス３０６は、有線又は無線による通信を実行するように構成されたデバイスである。通信デバイス３０６は、信号を送信する送信部と、信号を受信する受信部とを含む。ＥＣ３０７は、電力管理のためのエンベデッドコントローラを含むワンチップマイクロコンピュータである。ＥＣ３０７は、ユーザによるパワーボタンの操作に応じてクライアント２０，３０，４０を電源オン又は電源オフする機能を有している。 The communication device 306 is a device configured to perform wired or wireless communication. The communication device 306 includes a transmitter that transmits a signal and a receiver that receives the signal. The EC307 is a one-chip microcomputer that includes an embedded controller for power management. The EC 307 has a function of powering on or powering off the clients 20, 30, 40 according to the operation of the power button by the user.

FIG. 7 shows an example of the functional configuration of the parallel distributed learning server program 202 executed by the server 10 and the parallel distributed learning client program 402 executed by the clients 20, 30, and 40. The server 10 and the clients 20, 30, and 40 execute, for example, parallel distributed learning processing based on an objective function in deep learning. To keep the explanation simple, the following mainly illustrates the case where, in the information processing system 1, the first client 20 transmits data indicating its learning progress to the server 10, the server 10 uses this data to update the weighting coefficient of the second client 30, and the server 10 transmits the updated weighting coefficient to the second client 30.

The parallel distributed learning server program 202 executed on the server 10 includes, for example, a data allocation unit 12, a transmission control unit 13, a reception control unit 14, and a calculation unit 15. The server 10 also has a storage medium 11 (for example, the nonvolatile memory 105) that stores the learning data 11A used in the information processing system 1.

The data allocation unit 12 determines which portion of the learning data 11A is allocated to each of the clients 20, 30, and 40. For example, the data allocation unit 12 divides the learning data 11A into three parts and determines which of the clients 20, 30, and 40 each part is allocated to.
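
The allocation step can be pictured with a minimal Python sketch. The function name `split_learning_data` and the contiguous, near-equal split are assumptions made for illustration; the description only states that the learning data 11A is divided, for example, into three parts and that each part is assigned to a client.

```python
def split_learning_data(samples, client_ids):
    """Partition samples into one contiguous, near-equal shard per client.

    Assumption: the patent does not say how the data is divided, so this
    sketch simply cuts the list into consecutive slices.
    """
    if not client_ids:
        raise ValueError("at least one client is required")
    base, extra = divmod(len(samples), len(client_ids))
    shards = {}
    start = 0
    for k, client_id in enumerate(client_ids):
        # The first `extra` clients receive one additional sample.
        size = base + (1 if k < extra else 0)
        shards[client_id] = samples[start:start + size]
        start += size
    return shards


# Hypothetical learning data 11A with ten samples, split across clients 20, 30, 40.
shards = split_learning_data(list(range(10)), [20, 30, 40])
print(shards)
```

Every sample lands in exactly one shard, so the shards can be sent to the clients independently.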

The transmission control unit 13 and the reception control unit 14 have a function of transmitting and receiving data to and from each of the clients 20, 30, and 40 via the communication device 106. The transmission control unit 13 transmits the data allocated by the data allocation unit 12 to each of the clients 20, 30, and 40.

The reception control unit 14 receives, from each of the clients 20, 30, and 40, the sum of a plurality of gradients indicating the learning progress on that client and information from which the number of those gradients can be identified. For example, the reception control unit 14 receives, from the first client 20, the sum 29B of a plurality of gradients that the first client 20 calculated in order to update the weighting coefficient 29A (first weighting coefficient) of the objective function, and information from which the number 29C of those gradients can be identified.

The calculation unit 15 uses the sum of a plurality of gradients received from one client, together with the information from which the number of those gradients can be identified, to update the weighting coefficient associated with another client. The transmission control unit 13 transmits the weighting coefficient updated by the calculation unit 15 to the client with which that weighting coefficient is associated. More specifically, for example, when the calculation unit 15 receives from the first client 20, via the reception control unit 14, the sum 29B of the plurality of gradients and the information from which the number 29C of the gradients can be identified, it uses them to update the weighting coefficient 19A (second weighting coefficient) associated with the second client 30. For example, in accordance with equation (3) described above, the calculation unit 15 updates the weighting coefficient 19A associated with the second client 30 using the value obtained by dividing the sum 29B of the gradients by the number 29C of the gradients and multiplying the result by a learning coefficient. This learning coefficient is determined using, for example, the sum 29B of the gradients and the information from which the number 29C of the gradients can be identified. The transmission control unit 13 then transmits the updated weighting coefficient 19A to the second client 30.
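
The server-side update described here can be sketched in a few lines of Python. Equation (3) is not reproduced in this passage, so the sketch assumes the usual gradient-descent form, in which the learning coefficient times the mean gradient (sum divided by count) is subtracted from the weights. The function name `update_weights`, the sign convention, and the fixed learning coefficient are all assumptions.

```python
def update_weights(weights, grad_sum, grad_count, learning_coefficient):
    """Return W - eps * (grad_sum / grad_count), computed element by element.

    Assumption: subtraction (gradient descent) and a caller-supplied,
    fixed learning coefficient; the patent may determine it differently.
    """
    if grad_count <= 0:
        raise ValueError("grad_count must be a positive integer")
    return [
        w - learning_coefficient * (g / grad_count)
        for w, g in zip(weights, grad_sum)
    ]


# Hypothetical values standing in for weighting coefficient 19A,
# gradient sum 29B, and gradient count 29C.
weights_19a = [1.0, 2.0]
grad_sum_29b = [4.0, -2.0]
grad_count_29c = 4
print(update_weights(weights_19a, grad_sum_29b, grad_count_29c, 0.5))  # [0.5, 2.25]
```

Dividing by the count keeps the step size comparable regardless of how many gradients a client accumulated before transmitting.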

The parallel distributed learning client program 402 executed on the first client 20 includes, for example, a reception control unit 22, a calculation unit 23, and a transmission control unit 24. The reception control unit 22 and the transmission control unit 24 have a function of transmitting and receiving data to and from the server 10 via the communication device 306.

The reception control unit 22 receives the learning data 21A allocated by the server 10 and stores the received learning data 21A in the storage medium 21 (for example, the nonvolatile memory 305).

The calculation unit 23 repeatedly executes the process of updating the weighting coefficient 29A of the objective function using the learning data 21A. During a first period, each time the weighting coefficient 29A is updated, the calculation unit 23 accumulates the gradient calculated for that update, thereby calculating the sum 29B of the gradients, and counts the number 29C of accumulated gradients. The first period may be defined, for example, by elapsed time or by the number of times the weighting coefficient 29A is updated.

When the first period has elapsed, the transmission control unit 24 transmits data indicating the learning progress of the first client 20 to the server 10. For example, the transmission control unit 24 transmits to the server 10 the calculated sum 29B of the gradients and information from which the counted number 29C of the gradients can be identified.
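
A minimal Python sketch of the client-side accumulation follows. It assumes the first period is defined by a number of updates (`max_updates`), that the local update is plain gradient descent, and that the gradient is supplied by a caller-provided `grad_fn`. These names and choices are illustrative and do not come from the patent.

```python
def run_first_period(weights, batches, grad_fn, learning_coefficient, max_updates):
    """Update the weights locally while accumulating the gradient sum and count.

    Returns (weights, grad_sum, grad_count); the last two correspond to the
    data the client would transmit to the server after the period ends.
    """
    grad_sum = [0.0] * len(weights)
    grad_count = 0
    for batch in batches:
        if grad_count >= max_updates:
            break  # the first period has elapsed
        grad = grad_fn(weights, batch)
        # Local update of the weighting coefficient (assumed: gradient descent).
        weights = [w - learning_coefficient * g for w, g in zip(weights, grad)]
        # Accumulate the gradient used for this update and count it.
        grad_sum = [s + g for s, g in zip(grad_sum, grad)]
        grad_count += 1
    return weights, grad_sum, grad_count


# Toy objective 0.5 * (w - x)^2 per sample x, whose gradient is (w - x).
def toy_grad(weights, x):
    return [weights[0] - x]


weights, grad_sum, grad_count = run_first_period([0.0], [1.0, 1.0, 1.0], toy_grad, 0.5, 2)
print(weights, grad_sum, grad_count)  # [0.75] [-1.5] 2
```

Only `grad_sum` (one vector) and `grad_count` (one integer) leave the client, regardless of how many updates were performed during the period.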

The parallel distributed learning client program 402 executed on the second client 30 includes, for example, a reception control unit 32, a calculation unit 33, and a transmission control unit 34. The reception control unit 32 and the transmission control unit 34 have a function of transmitting and receiving data to and from the server 10 via the communication device 306.

The reception control unit 32 receives the learning data 31A allocated by the server 10 and stores the received learning data 31A in the storage medium 31 (for example, the nonvolatile memory 305).

The calculation unit 33 repeatedly executes the process of updating the weighting coefficient 39A of the objective function using the learning data 31A.

The reception control unit 32 also receives the weighting coefficient 19A updated by the server 10. As described above, this weighting coefficient 19A has been updated using the sum 29B of the gradients and the number 29C of the gradients transmitted from the first client 20. The reception control unit 32, for example, replaces the weighting coefficient 39A stored in the working memory 39 with the received weighting coefficient 19A. As a result, the second client 30 can proceed efficiently with the parallel distributed learning processing using a weighting coefficient 39A that reflects the learning progress of the first client 20.

During the first period, each time the weighting coefficient 39A is updated, the calculation unit 33 may accumulate the gradient calculated for that update, thereby calculating the sum 39B of the gradients, and may count the number 39C of accumulated gradients.

When the first period has elapsed, the transmission control unit 34 may transmit data indicating the learning progress of the second client 30 to the server 10. For example, the transmission control unit 34 may transmit to the server 10 the calculated sum 39B of the gradients and information from which the counted number 39C of the gradients can be identified.

In that case, the calculation unit 15 and the transmission control unit 13 of the server 10 can use the sum 39B of the gradients and the number 39C of the gradients to update the weighting coefficient associated with the first client 20 and transmit the updated weighting coefficient to the first client 20. The reception control unit 22 of the first client 20 receives the weighting coefficient updated by the server 10. The reception control unit 22, for example, replaces the weighting coefficient 29A stored in the working memory 29 with the received weighting coefficient. As a result, the first client 20 can proceed efficiently with the parallel distributed learning processing using a weighting coefficient 29A that reflects the learning progress of the second client 30.

The information processing system 1 can also use a weighting coefficient W (master parameter) that is held on the server 10 and shared by all of the clients 20, 30, and 40. In that case, the calculation unit 15 of the server 10 updates the weighting coefficient W (master parameter) on the server 10 by the method described above, using the sum of a plurality of gradients and the information from which the number of those gradients can be identified, received from at least one of the clients 20, 30, and 40. The transmission control unit 13 then transmits the updated weighting coefficient W to each of the clients 20, 30, and 40. For example, the reception control unit 22 of the first client 20 receives the weighting coefficient W updated by the server 10 and replaces the weighting coefficient 29A stored in the working memory 29 with the received weighting coefficient W. Similarly, the reception control unit 32 of the second client 30 receives the weighting coefficient W updated by the server 10 and replaces the weighting coefficient 39A stored in the working memory 39 with the received weighting coefficient W. As a result, the first client 20 and the second client 30 can proceed efficiently with the parallel distributed learning processing using weighting coefficients 29A and 39A that reflect the learning progress of the clients 20, 30, and 40.

The information processing system 1 is not limited to the first client 20 and the second client 30 and may include three or more clients, each of which has the same configuration as the first client 20 and the second client 30 described above. Accordingly, in the information processing system 1, the learning progress of one client can be reflected in the updates of the weighting coefficients of a plurality of other clients, and the learning progress of a plurality of clients can be reflected in the update of the weighting coefficient of a single other client.

In the configuration described above, the client 20 transmits to the server 10 not only the sum of a plurality of gradients but also information from which the number N of those gradients can be identified. Instead of the number N of gradients, however, information from which the relative magnitudes of the plurality of gradients (gradient vectors) calculated by the client 20 can be identified may be transmitted to the server 10. This information may take any form as long as the relative magnitudes of the gradients can be identified from it. For example, it may directly indicate numerical values (for example, an N-dimensional vector representing the ratio of the magnitudes of the individual gradients), or it may be information from which such values are derived indirectly.

For example, when the objective function is convex, the magnitude of the gradient vector gradually decreases as optimization by SGD proceeds. The client 20 can therefore transmit, as metadata in place of the number N of gradients, an N-dimensional vector representing the ratio of the magnitudes of the N gradients (gradient vectors). Even this N-dimensional vector is sufficiently small compared with the very large dimensionality of a gradient vector (for example, several million).

The reception control unit 14 of the server 10 receives, for example, from the first client 20, the sum 29B of a plurality of gradients and information from which the relative magnitudes of those gradients can be identified. Using the sum 29B and this information, the calculation unit 15 calculates, for example, a plurality of gradients obtained by dividing the sum 29B according to the relative magnitudes of the gradients (for example, the ratio of magnitudes represented by the N-dimensional vector). The calculation unit 15 uses the calculated gradients to update, for example, the weighting coefficient 19A associated with the second client 30. The transmission control unit 13 then transmits the updated weighting coefficient 19A to the second client 30. In this way, the learning progress of the transmitting first client 20 can be reflected more fully in the update of the weighting coefficient of the second client 30.
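
One way to read "dividing the sum according to the relative magnitudes" is sketched below in Python. It assumes that each reconstructed gradient points in the same direction as the sum and receives a share proportional to its entry in the ratio vector. This is an approximation chosen for illustration; the description does not fix the exact reconstruction, and the name `split_gradient_sum` is hypothetical.

```python
def split_gradient_sum(grad_sum, magnitude_ratios):
    """Split grad_sum into len(magnitude_ratios) gradients.

    Assumption: every reconstructed gradient is parallel to grad_sum and
    scaled by ratio / sum(ratios), so the pieces add back up to grad_sum.
    """
    total = sum(magnitude_ratios)
    if total <= 0:
        raise ValueError("magnitude ratios must sum to a positive value")
    return [[g * r / total for g in grad_sum] for r in magnitude_ratios]


# Hypothetical gradient sum and a 2-dimensional ratio vector (N = 2),
# where the first gradient was twice as large as the second.
pieces = split_gradient_sum([6.0, -3.0], [2, 1])
print(pieces)  # [[4.0, -2.0], [2.0, -1.0]]
```

The ratio vector has only N entries, so it adds little to the message size compared with the gradient sum itself.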

Next, an example of the procedure of the parallel distributed learning processing is described with reference to the flowcharts of FIGS. 8 to 11. The flowcharts of FIGS. 8 and 9 show the processing when only the sum of the gradients is transmitted and received, whereas the flowcharts of FIGS. 10 and 11 show the processing of the present embodiment, in which both the sum of the gradients and the number of gradients are transmitted and received. The following illustrates the case where the learning data used to optimize the objective function has already been allocated from the server to the clients.

First, the flowchart of FIG. 8 shows the procedure of the processing executed by a client that transmits the sum of gradients.
The CPU of the client initializes ∇W_transfer, which is used to transmit the sum of gradients (block B11); that is, it sets ∇W_transfer to 0. The CPU executes the procedure from block B13 to block B14 N times, where N is the number of gradients to be transmitted (block B12). More specifically, the CPU updates the weight W of the objective function (block B13). The CPU then adds the gradient ∇W calculated when updating the weight W to the gradient sum ∇W_transfer (block B14).

After the gradient ∇W has been added to the gradient sum ∇W_transfer N times, that is, after the weight W has been updated N times, the CPU transmits the gradient sum ∇W_transfer to the server (block B15). The CPU also receives from the server 10 the weight W that was updated using a gradient sum ∇W_transfer indicating the learning progress of another client (block B16). The CPU 301 then overwrites the weight W of the objective function used in the first client 20 with the received weight W (block B17).

The flowchart of FIG. 9 shows the procedure of the processing executed by the server that receives the sum of gradients.
First, the CPU of the server determines whether the gradient sum ∇W_transfer has been received from a client (block B21). If the gradient sum ∇W_transfer has not been received from a client (NO in block B21), the procedure returns to block B21.

When the gradient sum ∇W_transfer has been received from a client (YES in block B21), the CPU selects, from among the clients other than the one that transmitted this gradient sum ∇W_transfer, a target client whose weight W is to be updated (block B22). The CPU then updates the weight W associated with the target client (that is, the weight W updated on the target client) using the received gradient sum ∇W_transfer (block B23) and transmits the updated weight W to the target client (block B24).

Next, the CPU determines whether there is another client whose weight W should be updated (block B25). If there is another client (YES in block B25), the procedure returns to block B22, and the procedure for updating the weight W associated with that other client is executed. If there is no other client (NO in block B25), the procedure returns to block B21.
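
The loop over blocks B22 to B25 can be sketched as follows in Python. The sketch assumes the server keeps one weight vector per client in a dictionary and applies the received sum with a fixed learning coefficient. The update rule, the dictionary layout, and the name `handle_gradient_sum` are assumptions; the flowchart only states that the weight is updated "using" the received sum.

```python
def handle_gradient_sum(client_weights, sender_id, grad_sum, learning_coefficient):
    """Update every client's weights except the sender's (blocks B22 to B25).

    Returns a dict {client_id: updated_weights} representing the messages
    the server would transmit in block B24.
    """
    outbox = {}
    for client_id, weights in client_weights.items():
        if client_id == sender_id:
            continue  # block B22: only clients other than the sender are targets
        # Block B23 (assumed form): subtract the scaled gradient sum.
        updated = [w - learning_coefficient * g for w, g in zip(weights, grad_sum)]
        client_weights[client_id] = updated
        outbox[client_id] = updated  # block B24: queue for transmission
    return outbox


# Hypothetical per-client weights held on the server.
server_state = {20: [1.0], 30: [2.0], 40: [3.0]}
print(handle_gradient_sum(server_state, 20, [2.0], 0.5))  # {30: [1.0], 40: [2.0]}
```

Because only the sum is available in this variant, the server cannot tell whether it represents one large gradient or many small ones, which is the gap the gradient count in FIGS. 10 and 11 fills.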

Next, the procedure of the processing executed by the first client 20, which transmits the sum of gradients and the number of gradients, is described with reference to the flowchart of FIG. 10.
First, the CPU 301 of the first client 20 initializes ∇W_transfer, which is used to transmit the sum of gradients (block B31); that is, it sets ∇W_transfer to 0. The CPU 301 executes the procedure from block B33 to block B34 N times, where N is the number of gradients to be transmitted (block B32). More specifically, the CPU 301 updates the weight W of the objective function using the learning data 21A (block B33). The CPU 301 then adds the gradient ∇W calculated when updating the weight W to the gradient sum ∇W_transfer (block B34).

After the gradient ∇W has been added to the gradient sum ∇W_transfer N times, that is, after the weight W has been updated N times, the CPU 301 transmits the gradient sum ∇W_transfer and the number N of gradients to the server 10 (block B35). The CPU 301 also receives from the server 10 the weight W that was updated using a gradient sum ∇W_transfer and a number N of gradients indicating the learning progress of the clients 20, 30, and 40 (block B36). The CPU 301 then overwrites the weight W of the objective function used in the first client 20 with the received weight W (block B37).

The procedure of the processing executed by the server 10, which receives the sum of gradients and the number of gradients, is described with reference to the flowchart of FIG. 11.
First, the CPU 101 of the server 10 determines whether the gradient sum ∇W_transfer and the number N of gradients have been received from at least one of the clients 20, 30, and 40 (block B401). If they have not been received from any of the clients 20, 30, and 40 (NO in block B401), the procedure returns to block B401.

When the gradient sum ∇W_transfer and the number N of gradients have been received from at least one of the clients 20, 30, and 40 (YES in block B401), the CPU 101 selects, from among the clients other than the one that transmitted them, a target client whose weight W is to be updated (block B402). For example, when the CPU 101 receives the gradient sum ∇W_transfer and the number N of gradients from the first client 20, it selects either the second client 30 or the third client 40 as the target client whose weight W is to be updated.

Next, the CPU 101 initializes the update gradient ∇W_update (block B403); that is, it sets ∇W_update to 0. The CPU 101 also sets the variable i, which is used in the iterative processing from block B405 to block B408, to 1 (block B404). While the variable i is less than or equal to the number N of gradients, the CPU 101 repeats the procedure from block B406 to block B408 (block B405). More specifically, the CPU 101 calculates the learning coefficient ε_i (block B406). The CPU 101 calculates the learning coefficient ε_i using, for example, the variable i, which corresponds to the learning progress, and the mean gradient ∇W_transfer/N, which is the gradient sum ∇W_transfer divided by the number N of gradients. The CPU 101 adds the product of the learning coefficient ε_i and ∇W_transfer/N to the update gradient ∇W_update (block B407). The CPU 101 adds 1 to the variable i (block B408). If the variable i is still less than or equal to N, the procedure returns to block B406.

If, on the other hand, the variable i is greater than N, the CPU 101 updates the weight W associated with the target client using the update gradient ∇W_update (block B409). The CPU 101 transmits the updated weight W to the target client via the communication device 106 (block B410).
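
Blocks B403 to B409 amount to accumulating N scaled copies of the mean gradient and then applying the result once. The Python sketch below follows that structure. The learning-coefficient schedule `halving_coefficient` is purely hypothetical (the description says only that ε_i is calculated from i and the mean gradient), and subtracting the update from W is an assumed sign convention.

```python
def build_update_gradient(grad_sum, grad_count, learning_coefficient):
    """Blocks B403 to B408: accumulate eps_i * (grad_sum / N) for i = 1..N."""
    if grad_count <= 0:
        raise ValueError("grad_count must be a positive integer")
    mean_grad = [g / grad_count for g in grad_sum]
    update = [0.0] * len(grad_sum)           # B403: initialize the update gradient
    for i in range(1, grad_count + 1):       # B404, B405, B408: i runs from 1 to N
        eps_i = learning_coefficient(i, mean_grad)              # B406
        update = [u + eps_i * m for u, m in zip(update, mean_grad)]  # B407
    return update


def apply_update(weights, update):
    """Block B409 (assumed form): subtract the accumulated update from W."""
    return [w - u for w, u in zip(weights, update)]


def halving_coefficient(i, mean_grad):
    """Hypothetical schedule: eps_i = 0.5 ** i; mean_grad is accepted but unused."""
    return 0.5 ** i


update = build_update_gradient([4.0], 2, halving_coefficient)
print(update)                        # [1.5]
print(apply_update([3.0], update))   # [1.5]
```

With a constant ε the loop collapses to ε times the gradient sum; the count N matters precisely because ε_i may vary from step to step.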

Next, the CPU 101 determines whether there is another client whose weight W should be updated (block B411). If there is another client (YES in block B411), the procedure returns to block B402, and the procedure for updating the weight W associated with that other client is executed. If there is no other client (NO in block B411), the procedure returns to block B401.

As described above, the information processing system 1 can also use a weighting coefficient W (master parameter) that is held on the server 10 and shared by all of the clients 20, 30, and 40. In that case, the CPU 101 skips the selection of a target client in block B402 and, in the procedure from block B403 to block B409, updates the weighting coefficient W (master parameter) on the server 10 using the sum of the plurality of gradients and the information from which the number of those gradients can be identified, received from at least one of the clients 20, 30, and 40. Then, in block B410, the CPU 101 transmits the updated weighting coefficient W to each of the clients 20, 30, and 40.

The CPU 101 may also use gradient sums ∇W_transfer and gradient counts N received from a plurality of clients to update the weight W associated with a client other than those clients. For example, the CPU 101 may update the weight W associated with the third client 40 using the gradient sum ∇W_transfer and the number N of gradients received from the first client 20 and the gradient sum ∇W_transfer and the number N of gradients received from the second client 30. In that case, the CPU 101 executes the procedure from block B403 onward, taking the total of the gradient sum ∇W_transfer received from the first client 20 and the gradient sum ∇W_transfer received from the second client 30 as the gradient sum ∇W_transfer, and taking the sum of the number N of gradients received from the first client 20 and the number N of gradients received from the second client 30 as the number N of gradients.
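
Written out, the combination of two clients' contributions looks like this. The superscripts (1) and (2), denoting the first client 20 and the second client 30, and the symbol ḡ for a client's mean gradient are notation introduced here for illustration only.

```latex
\nabla W_{\mathrm{transfer}} = \nabla W_{\mathrm{transfer}}^{(1)} + \nabla W_{\mathrm{transfer}}^{(2)},
\qquad
N = N^{(1)} + N^{(2)}

\frac{\nabla W_{\mathrm{transfer}}}{N}
  = \frac{N^{(1)}\,\bar{g}^{(1)} + N^{(2)}\,\bar{g}^{(2)}}{N^{(1)} + N^{(2)}},
\qquad
\bar{g}^{(k)} = \frac{\nabla W_{\mathrm{transfer}}^{(k)}}{N^{(k)}}
```

The mean gradient used from block B403 onward is therefore the count-weighted average of the two clients' mean gradients, so a client that performed more updates contributes proportionally more.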

As described above, according to the present embodiment, parallel distributed learning processing can be executed efficiently while reducing communication costs. When the server 10 and the clients 20, 30, and 40 execute parallel distributed learning processing based on an objective function in deep learning, the reception control unit 14 of the server 10 receives, from the first client 20, the sum 29B of a plurality of gradients that the first client 20 calculated in order to update the weighting coefficient 29A (first weighting coefficient) of the objective function, and information from which the number 29C of those gradients can be identified. The calculation unit 15 of the server 10 updates the weighting coefficient 19A (second weighting coefficient) of the objective function using the sum 29B of the gradients and the information from which the number 29C of the gradients can be identified.

As a result, the weighting coefficient 19A of the objective function is updated using not only the sum 29B of the plurality of gradients received from the first client 20 but also the number 29C of the gradients, which is inexpensive to communicate, so the weighting coefficient 19A can be updated in a way that fully reflects the learning progress of the first client 20. Parallel distributed learning processing can therefore be executed efficiently while reducing communication costs.

(Second embodiment)
The configuration of an information processing system according to a second embodiment is described with reference to FIG. 12. The information processing system 5 is composed of a plurality of client computers (hereinafter also referred to as clients) 20, 30, and 40 that are connected to one another via a network 50 or the like. The clients 20, 30, and 40 in the information processing system 5 execute, for example, parallel distributed learning processing based on an objective function in deep learning that handles large-scale data. Parallel distributed learning processing based on an objective function may be any processing in which learning is performed by a plurality of processing entities using the objective function as feedback (an evaluation value) on the learning result; one example is parallel distributed learning processing for optimizing the objective function. The clients 20, 30, and 40 have the system configuration described above with reference to FIG. 6 in the first embodiment. Although FIG. 12 shows an example in which the information processing system 5 includes three clients 20, 30, and 40, the number of clients may be two, or four or more.

As shown in FIG. 12, in this parallel distributed learning processing, the clients 20, 30, 40 update the parameters (for example, weighting coefficients) of the objective function using the learning data 21A, 31A, 41A assigned to each of them, and may transmit data indicating the resulting learning progress to one another. Each client 20, 30, 40 further updates the parameters of its own objective function using the received data indicating the learning progress.

More specifically, for example, the client 20 further updates the parameters of its objective function using the data indicating the learning progress transmitted from the client 30. Likewise, the client 30 further updates the parameters of its objective function using the data indicating the learning progress transmitted from the client 20 and the data indicating the learning progress transmitted from the client 40.

As a result, the learning progress of each client is shared with the other clients in the information processing system 5, so that optimization of the objective function can proceed efficiently across the information processing system 5 as a whole.

As described in the first embodiment, deep learning uses, for example, stochastic gradient descent (SGD) as a method of optimizing the objective function. In SGD, the weighting coefficient (weight vector) of the objective function is updated using a vector toward the optimal solution, called a gradient vector. Letting W^(t), ∇W^(t), and ε^(t) denote the weight vector, the gradient vector, and the learning coefficient of the current state in SGD, respectively, the updated weight vector W^(t+1) is expressed by the following equation (4).
W^(t+1) = W^(t) − ε^(t) ∇W^(t)   Equation (4)
The learning coefficient ε^(t), which determines the update width, is determined adaptively according to the progress of learning; for example, it decays as learning progresses. In recent years, algorithms that automatically decay the learning coefficient, typified by Adagrad, Adadelta, and Adam, have increasingly been used as SGD optimization algorithms. In these algorithms, the learning coefficient ε^(t) decays depending on the gradient ∇W^(t).
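The update of equation (4) and a gradient-dependent decay of the learning coefficient can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes an Adagrad-style rule (base rate divided by the square root of the accumulated squared gradients), and the names `sgd_step`, `AdagradRate`, `base_lr`, and `delta` are hypothetical.

```python
import math


def sgd_step(w, grad, lr):
    """Equation (4): W(t+1) = W(t) - eps(t) * grad(t), applied element-wise."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]


class AdagradRate:
    """Adagrad-style update (assumed here): the effective learning
    coefficient of each element shrinks as squared gradients accumulate."""

    def __init__(self, base_lr=0.1, delta=1e-8):
        self.base_lr = base_lr
        self.delta = delta  # small constant that avoids division by zero
        self.accum = None   # running sum of squared gradients

    def step(self, w, grad):
        if self.accum is None:
            self.accum = [0.0] * len(grad)
        self.accum = [a + g * g for a, g in zip(self.accum, grad)]
        return [
            wi - self.base_lr / (math.sqrt(a) + self.delta) * g
            for wi, a, g in zip(w, self.accum, grad)
        ]
```

With the same gradient supplied twice, the second call to `step` moves the weight by a smaller amount than the first, which is the decay behavior the paragraph above refers to.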

When optimization of the objective function by SGD is parallelized and distributed, a gradient vector may be used as the learning progress shared within the information processing system 5.

However, since a gradient vector in deep learning generally has a very large number of dimensions (for example, several million), the communication cost of sharing gradient vectors within the information processing system 5 is also very high. To reduce the communication cost, it is conceivable, for example, that while the first client 20 updates the weighting coefficient of the objective function a plurality of times, it calculates a gradient sum ∇W_transfer by adding up the gradients used in those updates, and transmits this gradient sum ∇W_transfer to the second client 30. In that case, the second client 30 updates the weighting coefficient W according to equation (5), using the received gradient sum ∇W_transfer and the gradient ∇W_local calculated on the second client 30.
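Equation (5) is not reproduced in this text. As a minimal sketch, assuming that the received gradient sum and the local gradient are simply added and applied in a single step with one learning coefficient, the update could look like the following. The function name and the scalar `lr` are hypothetical.

```python
def update_with_gradient_sum(w, grad_local, grad_transfer, lr):
    """Single update using the combined vector grad_local + grad_transfer.

    Assumed form: W <- W - lr * (grad_local + grad_transfer).
    """
    combined = [gl + gt for gl, gt in zip(grad_local, grad_transfer)]
    return [wi - lr * ci for wi, ci in zip(w, combined)]
```

However many gradients were accumulated into `grad_transfer` on the sending side, the receiver performs only one update here.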

As a result, the second client 30 can efficiently proceed with optimization of the objective function using a weighting coefficient W that reflects the learning progress of the first client 20.

However, whereas the first client 20 updates the weighting coefficient W using each of the plurality of gradients ∇W individually, the second client 30 updates the weighting coefficient W using the received gradient sum ∇W_transfer. A difference in processing steps (number of processing steps) therefore arises between the weighting coefficient W updated on the first client 20 and the weighting coefficient W updated on the second client 30.

FIG. 13 shows an example of gradients used for updating the weighting coefficient. In this example, the sum 52 of a plurality of gradients 511, 512, 513, 514 calculated by the first client 20 is transmitted from the first client 20 to the second client 30. The second client 30 then updates the weighting coefficient using the sum 54 of this gradient sum 52 and the gradient 53 calculated by the second client 30 (that is, the combined vector of the gradient sum 52 and the gradient 53).

In the example shown in FIG. 13, whereas the first client 20 updates the weighting coefficient using the four gradients 511, 512, 513, 514, the second client 30 updates the weighting coefficient using a single gradient (gradient sum) 54. In other words, the first client 20 executes the weighting coefficient update based on equation (4) four times, whereas the second client 30 executes the weighting coefficient update based on equation (5) once. The learning coefficient ε included in equations (4) and (5) is determined adaptively according to the progress of learning (for example, it decays depending on the gradient). A difference in processing steps therefore arises between the update using the four gradients 511, 512, 513, 514 and the update using the single gradient (gradient sum) 54. As a result, the learning progress of the transmitting first client 20 may not be sufficiently reflected in the update of the weighting coefficient on the second client 30; that is, the learning progress may not be sufficiently shared within the information processing system 5.

To reduce this difference in processing steps, in the present embodiment the clients 20, 30, 40 transmit to one another not only the sum of a plurality of gradients but also information capable of specifying the number of gradients used to calculate that sum. This information may be any information from which the number of gradients can be specified; for example, it may directly indicate a numerical value (e.g., "4"), or it may be information from which the numerical value is derived indirectly.

FIG. 14 shows an example of gradients used for updating the weighting coefficient in the present embodiment. In this example, the sum 52 of the plurality of gradients 511, 512, 513, 514 calculated by the first client 20 and the number N of these gradients (here, N = 4) are transmitted from the first client 20 to the second client 30. The second client 30 updates its weighting coefficient using not only the gradient 53 calculated by the second client 30 itself but also N gradients 551, 552, 553, 554 obtained by dividing the received gradient sum 52 by the number of gradients N.

In the example shown in FIG. 14, whereas the first client 20 updates the weighting coefficient using the four gradients 511, 512, 513, 514, the second client 30 updates the weighting coefficient using the four gradients 551, 552, 553, 554 and the gradient 53 calculated by the second client 30 itself. More specifically, the second client 30 updates its weighting coefficient using equation (6).
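Equation (6) is not reproduced in this text. The following sketch assumes one plausible reading of the description of FIG. 14: the locally calculated gradient is applied first, and then the received sum divided by N is replayed as N pseudo-gradients, each with its own adaptively determined learning coefficient. The Adagrad-style rule, the order of application, and all names are illustrative assumptions.

```python
import math


def adagrad_step(w, accum, grad, base_lr=0.1, delta=1e-8):
    """One update with an Adagrad-style decaying learning coefficient."""
    accum = [a + g * g for a, g in zip(accum, grad)]
    w = [
        wi - base_lr / (math.sqrt(a) + delta) * g
        for wi, a, g in zip(w, accum, grad)
    ]
    return w, accum


def update_with_sum_and_count(w, accum, grad_local, grad_sum, n):
    """Apply grad_local, then replay grad_sum / n as n pseudo-gradients."""
    if n <= 0:
        raise ValueError("n must be a positive gradient count")
    w, accum = adagrad_step(w, accum, grad_local)
    pseudo = [g / n for g in grad_sum]
    for _ in range(n):
        # The learning coefficient decays between replayed steps,
        # as it would have on the sending client.
        w, accum = adagrad_step(w, accum, pseudo)
    return w, accum
```

Because the learning coefficient changes between the replayed steps, the result generally differs from applying the whole sum in one step, which is the difference in processing steps that the count N is meant to reduce.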

The second client 30 reduces the difference in processing steps by approximately reproducing the update performed by the transmitting first client 20 with the four gradients 511, 512, 513, 514 as an update with the four gradients 551, 552, 553, 554. As a result, the learning progress of the transmitting first client 20 can be sufficiently reflected in the update of the weighting coefficient on the second client 30, and the learning progress can therefore be sufficiently shared within the information processing system 5.

In addition, since the number of gradients N is a scalar quantity, the information capable of specifying N is metadata that is very small compared with the data of a gradient (gradient vector) having a very large number of dimensions (for example, several million). Therefore, by additionally transmitting the number of gradients N, the difference in the processing steps of weighting coefficient updates among the clients 20, 30, 40 can be reduced with almost no effect on communication cost.
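The size argument can be checked with simple arithmetic. The sketch below assumes a gradient vector of one million 32-bit floating-point values and a count sent as one 8-byte integer; these sizes are illustrative assumptions, not values given in this description.

```python
def payload_bytes(dim, bytes_per_value=4, count_bytes=8):
    """Return (bytes for the gradient sum alone, bytes with the count added)."""
    gradient_bytes = dim * bytes_per_value
    return gradient_bytes, gradient_bytes + count_bytes


sum_only, sum_and_count = payload_bytes(1_000_000)
# Relative increase in payload caused by also sending the count N.
overhead = (sum_and_count - sum_only) / sum_only
```

Under these assumptions the count adds 8 bytes to a payload of 4,000,000 bytes.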

FIG. 15 shows an example in which learning progress is shared among the clients 20, 30, 40 in the information processing system 5 during parallel distributed learning processing based on an objective function. In the following, it is assumed that the learning data 21A, 31A, 41A have already been assigned to the clients 20, 30, 40. Each client 20, 30, 40 repeatedly updates the weighting coefficient W of the objective function using its assigned learning data.

In the example shown in FIG. 15, the first client 20 updates the weighting coefficient W four times using the assigned learning data 21A and calculates the sum ∇W_transfer of the four gradients calculated in those four updates (S21). The first client 20 then transmits data (∇W_transfer, 4) indicating the calculated gradient sum ∇W_transfer and the number of gradients to the second client 30 (S22).

Next, the second client 30 receives this data (∇W_transfer, 4) and updates its weighting coefficient W using the gradient sum ∇W_transfer, the number of gradients, and the gradient ∇W_local calculated on the second client 30 (S23).

Similarly, each of the clients 20, 30, 40 can update the weighting coefficient W of the objective function N times using its learning data and transmit data indicating the gradient sum ∇W_transfer and the number of gradients N to another client 20, 30, 40. Each client 20, 30, 40 can then update its weighting coefficient W using the gradient sum ∇W_transfer and the number of gradients N received from another client, together with the gradient ∇W_local calculated by that client itself.

In this way, each client 20, 30, 40 may transmit the gradient sum ∇W_transfer and the number of gradients N to another predetermined client 20, 30, 40 at a predetermined timing (for example, every four updates). As a result, the learning progress of each client 20, 30, 40 is shared within the information processing system 5, and optimization of the objective function can proceed efficiently across the information processing system 5 as a whole. The transmission timing is not limited to one based on the number of updates; for example, it may be determined based on the time elapsed since the gradient sum ∇W_transfer and the number of gradients N were last transmitted.

FIG. 16 shows another example in which learning progress is shared among the clients 20, 30, 40 in the information processing system 5 during parallel distributed learning processing based on an objective function. Each client 20, 30, 40 repeatedly updates the weighting coefficient W of the objective function using the assigned learning data 21A, 31A, 41A.

In the example shown in FIG. 16, the third client 40 updates the weighting coefficient W four times using the assigned learning data 41A and calculates the sum ∇W_t1 of the four gradients calculated in those four updates (S31). The third client 40 then transmits data (∇W_t1, 4) indicating the calculated gradient sum ∇W_t1 and the number of gradients to the first client 20 (S32). The first client 20 receives the data (∇W_t1, 4) transmitted by the third client 40.

The first client 20, meanwhile, updates the weighting coefficient W four times using the assigned learning data 21A and calculates the sum of the four gradients calculated in those four updates (S33). The first client 20 calculates the total ∇W_t2 of this sum and the gradient sum ∇W_t1 received from the third client 40, further calculates the sum (= 8) of the received number of gradients (= 4) and the number of gradients calculated on the first client 20 (= 4), and transmits data (∇W_t2, 8) indicating the total gradient sum and the total number of gradients to the second client 30 (S34). The second client 30 receives the data (∇W_t2, 8) transmitted by the first client 20. Note that the first client 20 may update its weighting coefficient W according to equation (6) described above, using the data (∇W_t1, 4) received from the third client 40 and, for example, the gradient calculated immediately before on the first client 20.

The second client 30, in turn, updates the weighting coefficient W six times using the assigned learning data 31A and calculates the sum of the six gradients calculated in those six updates (S35). The second client 30 calculates the total ∇W_t3 of this sum and the gradient sum ∇W_t2 received from the first client 20, further calculates the sum (= 14) of the received number of gradients (= 8) and the number of gradients calculated on the second client 30 (= 6), and transmits data (∇W_t3, 14) indicating the total gradient sum and the total number of gradients to the third client 40 (S36). The third client 40 receives the data (∇W_t3, 14) transmitted by the second client 30. Note that the second client 30 may update its weighting coefficient W according to equation (6) described above, using the data (∇W_t2, 8) received from the first client 20 and, for example, the gradient calculated immediately before on the second client 30.

Next, the third client 40 updates its weighting coefficient W using the data (∇W_t3, 14) received from the second client 30 and, for example, the gradient ∇W_local calculated immediately before on the third client 40 (S37). More specifically, the third client 40 uses the received data (∇W_t3, 14) and the data (∇W_t1, 4) calculated on the third client 40 in S31 to calculate the gradient sum and the number of gradients to be shared with the third client 40. That is, the third client 40 calculates, as the gradient sum to be shared, the value (∇W_t3 − ∇W_t1) obtained by subtracting the gradient sum ∇W_t1 calculated by the third client 40 itself from the received total gradient sum ∇W_t3. Likewise, the third client 40 calculates, as the number of gradients to be shared, the value (= 10) obtained by subtracting the number of gradients calculated by the third client 40 itself (= 4) from the received number of gradients (= 14). The third client 40 then updates its weighting coefficient W according to equation (6) described above, using the calculated gradient sum (∇W_t3 − ∇W_t1), the number of gradients (= 10), and the gradient ∇W_local calculated immediately before on the third client 40.

In this way, each client 20, 30, 40 can add the gradient sum and the number of gradients calculated by the client itself to the gradient sum and the number of gradients received from another client, and transmit the resulting total gradient sum and total number of gradients to yet another client. Each client 20, 30, 40 can update its weighting coefficient W using the received total gradient sum and total number of gradients together with the gradient ∇W_local calculated by the client itself. When the received total gradient sum and total number of gradients include the gradient sum and number of gradients calculated by the client itself, the client can update the weighting coefficient W using the values obtained by subtracting them.
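The bookkeeping in the FIG. 16 example (counts 4, 8, 14, and then 10 after subtraction) can be traced with a short sketch. The two-element gradient sums are made-up toy values, and the helper names are hypothetical; only the counts come from the description above.

```python
def add_vec(a, b):
    return [x + y for x, y in zip(a, b)]


def sub_vec(a, b):
    return [x - y for x, y in zip(a, b)]


def forward(received, own):
    """Combine a received (sum, count) pair with this client's own pair."""
    return add_vec(received[0], own[0]), received[1] + own[1]


def remove_own(received, own):
    """Strip this client's earlier contribution from a received pair."""
    return sub_vec(received[0], own[0]), received[1] - own[1]


# Third client 40 sends its own sum and count (S31, S32).
t1 = ([1.0, 2.0], 4)
# First client 20 adds its 4 gradients and forwards (S33, S34): count 8.
t2 = forward(t1, ([0.5, 0.5], 4))
# Second client 30 adds its 6 gradients and forwards (S35, S36): count 14.
t3 = forward(t2, ([3.0, -1.0], 6))
# Third client 40 removes its own contribution before updating (S37): count 10.
shared = remove_own(t3, t1)
```

The pair `shared` then holds only the contributions of the other two clients, which is what the third client 40 uses for its update.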

FIG. 17 shows an example of the functional configuration of the parallel distributed learning client program 402 executed by each of the first client 20 and the second client 30. The first client 20 and the second client 30 execute, for example, parallel distributed learning processing based on an objective function in deep learning. For ease of explanation, the following mainly illustrates the case in which, in the information processing system 5, the first client 20 transmits data indicating its learning progress to the second client 30 and the second client 30 updates its weighting coefficient using this learning progress.

The parallel distributed learning client program 402 executed on the first client 20 includes, for example, a reception control unit 22, a calculation unit 23, and a transmission control unit 24. The learning data 21A assigned to the first client 20 is stored in a storage medium 21 (for example, the non-volatile memory 305) provided in the first client 20.

The calculation unit 23 repeatedly executes processing for updating the weighting coefficient 29A (first weighting coefficient) of the objective function using the learning data 21A. During a first period, each time the weighting coefficient 29A is updated, the calculation unit 23 accumulates the gradient calculated for that update, thereby calculating the sum 29B of a plurality of gradients, and counts the number 29C of the accumulated gradients. The first period may be defined, for example, by time or by the number of times the weighting coefficient 29A is updated.

The reception control unit 22 and the transmission control unit 24 have a function of transmitting and receiving data to and from the second client 30 via the communication device 306.

When the first period has elapsed, the transmission control unit 24 transmits data indicating the learning progress of the first client 20 to the second client 30. For example, the transmission control unit 24 transmits, to the second client 30, the calculated sum 29B of the plurality of gradients and information capable of specifying the counted number 29C of the plurality of gradients.
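The accumulation performed by the calculation unit 23 and the hand-off to the transmission control unit 24 can be sketched as follows. This is a minimal illustration in which the first period is defined by an update count; the class name, the `flush` method, and the reset of the sum after transmission are assumptions that this description does not specify.

```python
class GradientAccumulator:
    """Accumulates gradients (sum 29B) and counts them (number 29C)."""

    def __init__(self, dim, period):
        self.period = period          # first period, here as an update count
        self.grad_sum = [0.0] * dim
        self.count = 0

    def add(self, grad):
        """Add one gradient; return True once the first period has elapsed."""
        self.grad_sum = [s + g for s, g in zip(self.grad_sum, grad)]
        self.count += 1
        return self.count >= self.period

    def flush(self):
        """Return the (sum, count) payload and start a new period (assumed)."""
        payload = (self.grad_sum, self.count)
        self.grad_sum = [0.0] * len(self.grad_sum)
        self.count = 0
        return payload
```

A caller would invoke `add` after every weighting coefficient update and pass the result of `flush` to whatever transport sends the data to the other client.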

The parallel distributed learning client program 402 executed on the second client 30 likewise includes, for example, a reception control unit 32, a calculation unit 33, and a transmission control unit 34. The learning data 31A assigned to the second client 30 is stored in a storage medium 31 (for example, the non-volatile memory 305) provided in the second client 30.

The reception control unit 32 and the transmission control unit 34 have a function of transmitting and receiving data to and from the first client 20 via the communication device 306.

The reception control unit 32 receives, from the first client 20, the sum 29B of a plurality of gradients indicating the learning progress and information capable of specifying the number 29C of the plurality of gradients.

The calculation unit 33 updates the weighting coefficient 39A of the second client 30 using the sum 29B of the plurality of gradients received from the first client 20 and the information capable of specifying the number 29C of the plurality of gradients. For example, according to equation (6) described above, the calculation unit 33 updates the weighting coefficient 39A of the second client 30 using a value obtained by multiplying the learning coefficient by the value obtained by dividing the sum 29B of the plurality of gradients by the number 29C of the plurality of gradients. This learning coefficient is determined using, for example, the sum 29B of the plurality of gradients and the information capable of specifying the number 29C of the plurality of gradients. As a result, the second client 30 can efficiently proceed with the parallel distributed learning processing using a weighting coefficient 39A that reflects the learning progress of the first client 20.

When the first period has elapsed, the transmission control unit 34 may transmit data indicating the learning progress of the second client 30 to the first client 20. For example, the transmission control unit 34 may transmit, to the first client 20, the calculated gradient sum 39B and information capable of specifying the counted number of gradients 39C.

In that case, the calculation unit 23 of the first client 20 can update the weighting coefficient 29A of the first client 20 using the gradient sum 39B and the number of gradients 39C. As a result, the first client 20 can efficiently proceed with the parallel distributed learning processing using a weighting coefficient 29A that reflects the learning progress of the second client 30.

In the configuration described above, an example was shown in which learning progress including not only the gradient sum but also the number of gradients N is transmitted from the first client 20 to the second client 30. Instead of the number of gradients N, an N-dimensional vector representing the ratio of the magnitudes of the N gradients (gradient vectors) calculated by the first client 20 may be transmitted to the second client 30.

For example, the reception control unit 32 of the second client 30 receives, from the first client 20, the sum 29B of the plurality of gradients and a vector representing the ratio of the magnitudes of the plurality of gradients. The calculation unit 33 then uses the sum 29B and this vector to calculate, for example, a plurality of gradients obtained by dividing the gradient sum 29B based on the ratio of magnitudes represented by the vector. The calculation unit 33 updates the weighting coefficient 39A using the calculated plurality of gradients. As a result, the learning progress of the transmitting first client 20 can be reflected more closely in the update of the weighting coefficient 39A of the second client 30.
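One way to divide the gradient sum by a ratio vector is sketched below. It assumes that each reconstructed gradient points in the direction of the sum and is scaled in proportion to its ratio entry, so that the pieces add back up to the sum; the directions of the original gradients cannot be recovered from this information. The function name is hypothetical.

```python
def split_by_ratio(grad_sum, ratios):
    """Split grad_sum into len(ratios) vectors proportional to the ratios."""
    total = sum(ratios)
    if total <= 0:
        raise ValueError("ratios must have a positive total")
    return [[g * r / total for g in grad_sum] for r in ratios]


# Three gradients whose magnitudes were in the ratio 1 : 2 : 2.
parts = split_by_ratio([10.0, -20.0], [1, 2, 2])
```

Each entry of `parts` can then be applied as one update step in place of the equal-sized pieces obtained by dividing the sum by N.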

The information processing system 5 is not limited to the first client 20 and the second client 30; three or more clients may be provided, and each client has the same configuration as the first client 20 and the second client 30 described above. Therefore, in the information processing system 5, the learning progress of one client can be reflected in the updates of the weighting coefficients of a plurality of other clients.

Furthermore, in the information processing system 5, the learning progress of a plurality of clients can also be reflected in the update of the weight coefficient of a single other client.

For example, assume that the first client 20 has received, from the third client 40, the sum of a plurality of first gradients calculated by the third client 40 to update a weight coefficient of the objective function (third weight coefficient), together with information from which the number of the first gradients can be identified. In this case, the transmission control unit 24 of the first client 20 transmits to the second client 30 (a) the total of the sum of the first gradients and the sum 29B of a plurality of second gradients calculated by the calculation unit 23 to update the weight coefficient 29A (first weight coefficient), and (b) information from which the sum of the number of the first gradients and the number 29C of the second gradients can be identified.

The reception control unit 32 of the second client 30 receives the total of the sum of the first gradients and the sum 29B of the second gradients, together with the information from which the sum of the number of the first gradients and the number 29C of the second gradients can be identified. The calculation unit 33 then updates the weight coefficient 39A (second weight coefficient) using this total and this information.

In this way, the learning progress of a plurality of clients, obtained by way of the third client 40 and the first client 20, can also be reflected in the update of the weight coefficient 39A of another client, the second client 30.
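The relay step can be sketched as combining two (gradient sum, gradient count) pairs into one pair before forwarding. The name `merge_progress` and the tuple layout are hypothetical and chosen only for illustration; gradients are represented as plain lists of floats.

```python
def merge_progress(first, second):
    """Combine two (gradient_sum, gradient_count) pairs into one pair.

    The merged sum is the element-wise total of both sums, and the merged
    count is the sum of both counts, as in the relay case described above.
    """
    sum_a, n_a = first
    sum_b, n_b = second
    if len(sum_a) != len(sum_b):
        raise ValueError("gradient sums must have the same length")
    return [a + b for a, b in zip(sum_a, sum_b)], n_a + n_b


# Progress received from the third client, and the first client's own progress.
from_third = ([0.5, -1.0], 4)
own = ([1.5, 2.0], 6)
relayed = merge_progress(from_third, own)
print(relayed)  # ([2.0, 1.0], 10)
```

Because the merged pair has the same shape as a single client's pair, the receiving client can process it with the same procedure it uses for a direct transmission.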

The procedure executed by the first client 20, which transmits the sum of gradients and the number of gradients, will be described with reference to the flowchart of FIG. 18. In the following, it is assumed that the learning data used to optimize the objective function has already been allocated to the clients 20, 30, and 40.

First, the CPU 301 of the first client 20 initializes ∇W_transfer, which holds the sum of gradients to be transmitted (block B51); that is, it sets ∇W_transfer to 0. The CPU 301 then executes the procedures of blocks B53 and B54 N times, where N is the number of gradients to be transmitted (block B52). More specifically, the CPU 301 updates the weight W of the objective function using the learning data 21A (block B53), and adds the gradient ∇W calculated in that update to the gradient sum ∇W_transfer (block B54).

After the gradient ∇W has been added to the gradient sum ∇W_transfer N times, that is, after the weight W has been updated N times, the CPU 301 transmits the gradient sum ∇W_transfer and the number N of gradients to another client (for example, the second client 30) (block B55).
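The sender-side procedure of FIG. 18 can be sketched as follows. This is a minimal illustration under stated assumptions: `grad_fn`, `lr`, and `send` are hypothetical parameters, the local update is assumed to be plain gradient descent, and the toy objective (w - x)^2 / 2 is chosen only so the numbers are easy to follow.

```python
def accumulate_and_send(weights, batches, grad_fn, lr, send):
    """Update the weights N times, accumulate the gradients, then send them."""
    grad_transfer = [0.0] * len(weights)          # initialize the transfer sum to 0
    for batch in batches:                         # block B52: N = len(batches) iterations
        grad = grad_fn(weights, batch)            # block B53: gradient for this update
        weights = [w - lr * g for w, g in zip(weights, grad)]  # assumed descent step
        grad_transfer = [s + g for s, g in zip(grad_transfer, grad)]  # block B54
    send(grad_transfer, len(batches))             # block B55: send the sum and N
    return weights


# Toy objective (w - x)^2 / 2 per sample, whose gradient is w - x.
def toy_grad(weights, x):
    return [weights[0] - x]


sent = []
final_w = accumulate_and_send([0.0], [1.0, 2.0], toy_grad, 0.5,
                              lambda s, n: sent.append((s, n)))
print(final_w, sent)  # [1.25] [([-2.5], 2)]
```

Only one vector and one integer are transmitted regardless of N, which is where the reduction in communication cost comes from.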

Next, the procedure executed by the second client 30, which receives the sum of gradients and the number of gradients, will be described with reference to the flowchart of FIG. 19.

First, the CPU 301 of the second client 30 updates the weight W using the learning data 31A (block B61). The CPU 301 then determines whether the gradient sum ∇W_transfer and the number N of gradients have been received from the first client 20 (block B62). If they have not been received (NO in block B62), the process returns to block B61.

If the gradient sum ∇W_transfer and the number N of gradients have been received from the first client 20 (YES in block B62), the CPU 301 initializes the gradient for updating, ∇W_update (block B63); that is, it sets ∇W_update to 0. The CPU 301 also sets to 1 the variable i used in the loop from block B65 to block B68 (block B64). While i is less than or equal to the number N of gradients, the CPU 301 repeats the procedure from block B66 to block B68 (block B65). More specifically, the CPU 301 calculates a learning coefficient ε_i (block B66), for example by using i, which corresponds to the progress of learning, and the average gradient ∇W_transfer/N obtained by dividing the gradient sum ∇W_transfer by the number N of gradients. The CPU 301 adds to ∇W_update the product of the learning coefficient ε_i and the average gradient ∇W_transfer/N (block B67). The CPU 301 then adds 1 to the variable i (block B68).
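The loop over blocks B65 through B68 accumulates N terms, each the product of a learning coefficient and the same average gradient, so the value of ∇W_update at the end of the loop can be written in closed form. The expression below follows directly from the steps described above; the subscript notation is only a typeset form of ∇W_update, ∇W_transfer, and ε_i.

```latex
\nabla W_{\mathrm{update}}
  = \sum_{i=1}^{N} \varepsilon_i \, \frac{\nabla W_{\mathrm{transfer}}}{N}
```

In the special case where every ε_i equals the same constant ε, this reduces to ε ∇W_transfer, which shows that the number N matters precisely when the learning coefficient varies with i.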

When the variable i becomes larger than N, the CPU 301 further updates the weight W, which was updated in block B61, using the accumulated gradient for updating ∇W_update (block B69).
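The receiver-side steps of blocks B63 through B69 can be sketched as below. Two points are assumptions rather than statements from the description: the learning coefficient is passed in as a hypothetical callable `learning_coefficient(i, mean_grad)` because the text does not give its formula, and the final update in block B69 is assumed to subtract ∇W_update from W in the manner of gradient descent.

```python
def apply_received_progress(weights, grad_transfer, n, learning_coefficient):
    """Apply a received (gradient sum, gradient count) pair to local weights."""
    if n <= 0:
        raise ValueError("n must be a positive gradient count")
    mean_grad = [g / n for g in grad_transfer]       # average gradient
    grad_update = [0.0] * len(weights)               # block B63
    i = 1                                            # block B64
    while i <= n:                                    # block B65
        eps = learning_coefficient(i, mean_grad)     # block B66
        grad_update = [u + eps * m                   # block B67
                       for u, m in zip(grad_update, mean_grad)]
        i += 1                                       # block B68
    # Block B69: assumed descent-style update using the accumulated gradient.
    return [w - u for w, u in zip(weights, grad_update)]


def constant_coefficient(i, mean_grad):
    """Hypothetical schedule that ignores both arguments."""
    return 0.5


new_w = apply_received_progress([1.0], [-2.5], 2, constant_coefficient)
print(new_w)  # [2.25]
```

A schedule that decays with i, or that depends on the size of the average gradient, can be supplied in place of `constant_coefficient` without changing the loop.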

Note that, when the first client 20 has received from the third client 40 the sum ∇W_transfer of a plurality of first gradients calculated by the third client 40 to update the weight W of the objective function, together with the number N of the first gradients, the CPU 301 of the second client 30 may receive the following from the first client 20:
(1) the total of the sum ∇W_transfer of the first gradients and the sum ∇W_transfer of a plurality of second gradients calculated by the first client 20 to update the weight W of the objective function, and
(2) the sum of the number N of the first gradients and the number N of the second gradients.
In that case, the CPU 301 updates the weight W of the second client 30 by executing the procedure from block B63 onward, using the total of the two gradient sums and the sum of the two gradient counts.

The CPU 301 may also receive a gradient sum ∇W_transfer and a number N of gradients from each of the first client 20 and the third client 40. In that case, the CPU 301 takes the total of the gradient sum received from the first client 20 and the gradient sum received from the third client 40 as the gradient sum ∇W_transfer, takes the sum of the number of gradients received from the first client 20 and the number of gradients received from the third client 40 as the number N of gradients, and updates the weight W of the second client 30 by executing the procedure from block B63 onward.

FIG. 20 shows the effect of parallel distributed learning by the plurality of clients 20, 30, and 40 of this embodiment. In the example shown in FIG. 20, the relationship between learning time and recognition accuracy in parallel distributed learning processing based on the objective function is shown by a line graph 61 for the case where the processing is executed by one client and a line graph 62 for the case where it is executed by three clients. The line graphs 61 and 62 show that, when the processing is executed by three clients, it converges to the optimal solution sooner and reaches a given level of recognition accuracy (for example, 0.8) sooner.

FIG. 21 shows the effect on parallel distributed learning of using not only the sum of gradients but also the number of gradients. In the example shown in FIG. 21, the relationship between learning time and recognition accuracy in parallel distributed learning processing based on the objective function is shown by a line graph 71 for the case where the processing is executed using only the sum of gradients and a line graph 72 for the case where it is executed using both the sum and the number of gradients. In the line graph 71, there are places where the fluctuation in recognition accuracy differs greatly from client to client, owing to differences in the processing steps that arise when only the sum of gradients is used. In other words, the line graph 71 shows that the clients behave differently with respect to convergence to the optimal solution.

In contrast, in the line graph 72, for the case where the processing is executed using both the sum and the number of gradients, the differences in the processing steps are reduced, so the fluctuations in recognition accuracy are nearly the same for every client. In other words, the line graph 72 shows that the clients behave similarly with respect to convergence to the optimal solution. It can therefore be seen that using the number of gradients in addition to their sum allows the learning progress to be shared sufficiently among the clients and allows parallel distributed learning to proceed efficiently.

As described above, according to this embodiment, parallel distributed learning processing can be executed efficiently while reducing communication cost. When parallel distributed processing based on the objective function is executed by the second client 30 and at least one other client 20, 40, the reception control unit 32 of the second client 30 receives, from the first client 20 among the at least one other client 20, 40, the sum 29B of a plurality of gradients calculated by the first client 20 to update the weight coefficient 29A (first weight coefficient) of the objective function, together with information from which the number 29C of the gradients can be identified. The calculation unit 33 of the second client 30 updates the weight coefficient 39A (second weight coefficient) of the objective function using the sum 29B of the gradients and the information from which the number 29C of the gradients can be identified.

As a result, the weight coefficient 39A of the objective function is updated using not only the sum 29B of the gradients received from the first client 20 but also the number 29C of the gradients, which is cheap to communicate, so the weight coefficient 39A can be updated in a way that sufficiently reflects the learning progress of the first client 20. Parallel distributed learning processing can therefore be executed efficiently while reducing communication cost.

Each of the various functions described in the embodiments of the present invention may be realized by a circuit (processing circuit). Examples of the processing circuit include a programmed processor such as a central processing unit (CPU). The processor executes each of the described functions by executing a computer program (a set of instructions) stored in a memory. The processor may be a microprocessor including an electric circuit. Examples of the processing circuit also include a digital signal processor (DSP), an application specific integrated circuit (ASIC), a microcontroller, a controller, and other electric circuit components. Each of the components other than the CPU described in these embodiments may also be realized by a processing circuit.

Furthermore, because the various processes of the embodiments of the present invention can be realized by a computer program, the same effects as those of the embodiments can easily be obtained simply by installing the computer program on a computer through a computer-readable storage medium storing the program and executing it.

Although several embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and are included in the invention described in the claims and its equivalents.

1: information processing system; 10: server; 20, 30, 40: clients; 21A, 31A, 41A: learning data; 101: CPU; 102: system controller; 103: main memory; 104: BIOS-ROM; 105: nonvolatile memory; 106: communication device; 107: EC; 201: OS; 202: parallel distributed learning server program; 301: CPU; 302: system controller; 303: main memory; 304: BIOS-ROM; 305: nonvolatile memory; 306: communication device; 307: EC; 401: OS; 402: parallel distributed learning client program.

Claims

An electronic device comprising:
receiving means for receiving, when parallel distributed processing based on an objective function is executed by the electronic device and at least one other electronic device, from a first electronic device among the at least one other electronic device, a sum of a plurality of gradients calculated by the first electronic device to update a first weight coefficient of the objective function, and information from which the number of the plurality of gradients can be identified; and
processing means for updating a second weight coefficient of the objective function using the sum of the plurality of gradients and the information from which the number of the plurality of gradients can be identified.

The electronic device according to claim 1, wherein
the receiving means receives, from the first electronic device, information from which a magnitude relationship among the plurality of gradients can be identified, and
the processing means further updates the second weight coefficient using the sum of the plurality of gradients and the information from which the magnitude relationship among the plurality of gradients can be identified.

The electronic device according to claim 1, wherein the processing means updates the second weight coefficient using a value obtained by multiplying, by a learning coefficient, a value obtained by dividing the sum of the plurality of gradients by the number of the plurality of gradients.

The electronic device according to claim 3, wherein the learning coefficient is determined using the sum of the plurality of gradients and the information from which the number of the plurality of gradients can be identified.

The electronic device according to claim 1, wherein
the receiving means further receives from the first electronic device, when the first electronic device has received, from a second electronic device among the at least one other electronic device, a sum of a plurality of first gradients calculated by the second electronic device to update a third weight coefficient of the objective function and information from which the number of the plurality of first gradients can be identified, a total of the sum of the plurality of first gradients and a sum of a plurality of second gradients calculated by the first electronic device to update the first weight coefficient, and information from which a sum of the number of the plurality of first gradients and the number of the plurality of second gradients can be identified, and
the processing means further updates the second weight coefficient using the total and the information from which the sum of the number of the plurality of first gradients and the number of the plurality of second gradients can be identified.

A method comprising:
receiving, when parallel distributed processing based on an objective function is executed by a plurality of electronic devices, from a first electronic device among the plurality of electronic devices, a sum of a plurality of gradients calculated by the first electronic device to update a first weight coefficient of the objective function, and information from which the number of the plurality of gradients can be identified; and
updating a second weight coefficient of the objective function using the sum of the plurality of gradients and the information from which the number of the plurality of gradients can be identified.

The method according to claim 6, further comprising:
receiving, from the first electronic device, information from which a magnitude relationship among the plurality of gradients can be identified; and
updating the second weight coefficient using the sum of the plurality of gradients and the information from which the magnitude relationship among the plurality of gradients can be identified.

The method according to claim 6, comprising updating the second weight coefficient using a value obtained by multiplying, by a learning coefficient, a value obtained by dividing the sum of the plurality of gradients by the number of the plurality of gradients.

The method according to claim 8, wherein the learning coefficient is determined using the sum of the plurality of gradients and the information from which the number of the plurality of gradients can be identified.

The method according to claim 6, wherein
the receiving further includes receiving from the first electronic device, when the first electronic device has received, from a second electronic device among the plurality of electronic devices, a sum of a plurality of first gradients calculated by the second electronic device to update a third weight coefficient of the objective function and information from which the number of the plurality of first gradients can be identified, a total of the sum of the first gradients and a sum of a plurality of second gradients calculated by the first electronic device to update the first weight coefficient, and information from which a sum of the number of the plurality of first gradients and the number of the plurality of second gradients can be identified, and
the updating further includes updating the second weight coefficient using the total and the information from which the sum of the number of the plurality of first gradients and the number of the plurality of second gradients can be identified.

An information processing system comprising a server device, a first client device, and a second client device, wherein
the first client device
updates a first weight coefficient of an objective function when parallel distributed processing based on the objective function is executed in the information processing system, and
transmits, to the server device, a sum of a plurality of gradients calculated for updating the first weight coefficient and information from which the number of the plurality of gradients can be identified, and
the server device
updates a second weight coefficient of the objective function using the transmitted sum of the plurality of gradients and the transmitted information from which the number of the plurality of gradients can be identified, and
transmits the updated second weight coefficient to the second client device.