WO2017084016A1 - Model parameter fusion method and apparatus - Google Patents
Model parameter fusion method and apparatus
- Publication number
- WO2017084016A1 (PCT/CN2015/094722)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- parameter
- group
- node
- nodes
- parameter collection
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/10—Requirements analysis; Specification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
Definitions
- The present invention relates to the field of machine learning, and in particular to a method and an apparatus for fusing model parameters.
- A model parameter is a parameter that describes a model and is composed of multiple constraint parameters.
- Model parameters can be used to filter data with common features. For example, when the model parameters are image-class model parameters, different model parameters can be used to select image data of people, animals, or faces from a set of image data. With the rapid growth in data volume and data types, more and more model parameters are needed for data filtering, and these model parameters are obtained by repeatedly computing and fusing over large amounts of data with common characteristics.
- In model parameter fusion, the data is divided into multiple data subsets, which are assigned to different nodes; each node trains on its assigned data subset using iterative calculation. After every one or more iterations, the model parameters each node obtained by training on its data subset are fused once, and the fused model parameters are used as the initial model parameters for the next iteration. After multiple fusions, the final overall model parameters are obtained.
- There are mainly two existing methods for model parameter fusion. In the first, after multiple nodes perform multiple iterations on multiple data subsets, a parameter server collects and fuses the model parameters each node obtained by training on its data subset to obtain new model parameters, and each node then performs the next iteration on its data subset according to the new model parameters. In the second, after a node performs multiple iterations on the data subset assigned to it, the node sends the model parameters obtained by training on that subset to designated other nodes for fusion with the model parameters those nodes obtained from their own data subsets; the node then continues iterating based on the model parameters it receives from other nodes, which those nodes obtained by training on other data subsets.
- The first method places high performance requirements on the parameter server that performs the fusion, and the server is prone to downtime.
- The second method requires more data to be stored, and the amount of data transmitted is large.
- Embodiments of the present invention provide a method and an apparatus for fusing model parameters, which solve the problems of high performance requirements on the parameter server and large data transmission volume in model parameter fusion.
- a model parameter fusion method is provided, the method being applied to a machine learning system, the machine learning system comprising at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponding to at least one parameter distribution group
- Each parameter collection group includes at least one node
- each parameter distribution group includes at least one node
- at least one of the parameter collection groups includes a node that is different from a node included in the corresponding parameter distribution group
- the method includes:
- where the minimum number of fusion nodes of the parameter collection group s ≤ M ≤ the total number of nodes included in the parameter collection group that satisfies the condition;
- The intra-group fusion condition may be that the number of nodes in the parameter collection group that have completed the iterative calculation of the current model parameters reaches a preset value, namely the minimum number of fusion nodes s.
- M nodes that have completed the calculation of the current model parameters are selected from the parameter collection group, and the model parameters calculated by these M nodes are fused to obtain the first model parameter.
- The parameter collection group corresponds to the parameter distribution group; that is, one parameter collection group can correspond to one or more parameter distribution groups. Therefore, when the parameter collection group has been fused to obtain the first model parameter and the intra-group distribution condition is satisfied, the first model parameter is sent, according to the correspondence between the parameter collection group and the parameter distribution group, to all nodes, or some of the nodes, in the corresponding parameter distribution group.
- The intra-group distribution condition may be that the number of intra-group fusions reaches a preset number of times, or that a preset duration has elapsed, and so on, which is not limited by the embodiment of the present invention.
- The parameter collection group performs a new round of iterative calculation based on the first model parameter obtained by the fusion; each time M nodes are fused, the first model parameter is updated once, and when the intra-group distribution condition is met, the first model parameter is distributed.
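The intra-group fusion step above can be sketched as follows. This is a minimal illustration, not the patent's prescribed implementation: the function name and the use of an element-wise average as the fusion rule are assumptions (the patent does not mandate a particular fusion operation).

```python
# Hypothetical sketch of intra-group fusion: once at least s nodes in a
# parameter collection group have finished the current iteration, M of
# them are selected and their model parameters are fused. Element-wise
# averaging is used here purely as an example fusion rule.

def fuse_intra_group(completed_params, s, M):
    """completed_params: parameter vectors from nodes that finished the
    current iteration. Returns the fused first model parameter, or None
    if the intra-group fusion condition (>= s completed) is not met."""
    if len(completed_params) < s:
        return None  # intra-group fusion condition not yet satisfied
    selected = completed_params[:M]  # M nodes, s <= M <= group size
    n = len(selected)
    dim = len(selected[0])
    # element-wise average of the M selected parameter vectors
    return [sum(p[i] for p in selected) / n for i in range(dim)]

first_model = fuse_intra_group([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], s=2, M=3)
```

The fused result would then seed the next round of iterative calculation, as the text describes.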
- The address information of the nodes participating in the fusion of the first model parameter may also be sent to the nodes in the parameter distribution group; the address information may be the IP address of a node, the node number, or the like, which is not limited by the present invention.
- The minimum number of fusion nodes s, M, and N can be set in advance, where s ≤ M ≤ the total number of nodes included in the parameter collection group that satisfies the condition, and 1 ≤ N ≤ the total number of nodes included in the parameter distribution group corresponding to that parameter collection group.
- The node included in at least one parameter collection group being different from the nodes included in the corresponding parameter distribution group means that the nodes included in at least one parameter collection group are not completely the same as the nodes included in the corresponding parameter distribution group. This may mean that the parameter collection group includes at least one node different from the nodes in the parameter distribution group corresponding to it, or that all the nodes included in the parameter collection group are different from all the nodes included in the corresponding parameter distribution group.
- The fusing of the model parameters of the M nodes in the parameter collection group that satisfies the condition, to obtain the first model parameter of that parameter collection group, includes:
- The fusion may be completed by a device independent of the parameter collection group, for example a parameter server, which may be operated by a fixed node.
- The M nodes that complete the iteration in the parameter collection group each send the model parameters calculated in the current iteration to the parameter server. When the parameter server has received the model parameters sent by the M nodes, it may fuse the model parameters of the M nodes in any of several different fusion modes to obtain the first model parameter.
- For example, the parameter server may fuse the model parameters of the M nodes all at once to obtain the first model parameter; or each node may send its parameters to the parameter server as soon as it completes its iteration, with the parameter server receiving and fusing parameters node by node over multiple receive-and-fuse steps until all M nodes have been fused.
- The embodiment of the present invention does not limit this.
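The two fusion modes just described can be illustrated side by side. This is a hedged sketch with assumed names (`batch_fuse`, `incremental_fuse`) and element-wise averaging as the example fusion rule; with averaging, folding parameters in one node at a time yields the same result as fusing all M at once.

```python
# Illustrative sketch of the two fusion modes a parameter server might use:
# batch fusion (all M vectors at once) versus incremental fusion (fold each
# arriving node's parameters into a running average as it completes).

def batch_fuse(param_list):
    """Fuse all M parameter vectors at once (element-wise mean)."""
    n = len(param_list)
    return [sum(p[i] for p in param_list) / n for i in range(len(param_list[0]))]

def incremental_fuse(current, count, incoming):
    """Fold one newly received parameter vector into the running average.
    `count` is how many vectors `current` already averages."""
    if current is None:
        return list(incoming), 1
    new = [(c * count + x) / (count + 1) for c, x in zip(current, incoming)]
    return new, count + 1

# Both modes yield the same fused result for the same M vectors:
params = [[2.0, 4.0], [4.0, 8.0]]
fused, k = None, 0
for p in params:
    fused, k = incremental_fuse(fused, k, p)
assert fused == batch_fuse(params)
```

The incremental mode spreads the fusion work over time, which matches the text's description of multiple receive-and-fuse steps.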
- the correspondence between the parameter server and the parameter collection group and the parameter distribution group corresponding to the parameter collection group may be set in advance.
- The fusing of the model parameters of the M nodes in the parameter collection group that satisfies the condition, to obtain the first model parameter of that parameter collection group, includes:
- The node state information may include a node identifier and the order in which nodes complete the iteration.
- The fusion may also be completed by a node in the parameter collection group. This node may be referred to as the control node; it may be specified in advance, or elected ad hoc by the nodes in the parameter collection group.
- The control node can collect the state information of the nodes in the parameter collection group and issue instructions for the delivery and fusion of model parameters.
- When the control node, according to the collected state information of the nodes in the group, instructs the M nodes that have completed the iteration to perform fusion, it may have them fuse in different combinations. For example, the control node may instruct the M nodes to send their model parameters to one of the nodes, which performs the fusion to obtain the first model parameter; or the control node may have the fusion performed as in the third possible implementation of the first aspect described below, to improve the efficiency with which the M nodes are fused to obtain the first model parameter.
- The control node may also use other combinations for the fusion, which is not limited by the embodiment of the present invention.
- The instructing, according to the state information of the nodes in the parameter collection group that satisfies the condition, of the M nodes that complete the iteration in that group to perform model parameter fusion includes:
- instructing one of the s nodes that have completed the iteration to fuse the model parameters of the s nodes;
- After determining the s nodes in the parameter collection group that have completed the iteration, the control node designates one of the s nodes as the fusion node, and the remaining nodes each send the model parameters obtained in the current iteration to the fusion node.
- The fusion node fuses the model parameters of the s nodes.
- The fusion node may be the last node to complete the iteration, or the node with the smallest node number, which is not limited by this embodiment of the present invention.
- According to the relationship between the number of newly added nodes that complete the iteration and the size of s, two cases can be distinguished:
- x nodes are newly added:
- one of the newly added x nodes is instructed to fuse the model parameters of the newly added x nodes with the model parameters already fused from the s nodes;
- y nodes are newly added:
- one of the newly added y nodes is instructed to fuse the model parameters of the y nodes, and the fused model parameters of the y nodes are then fused with the model parameters already fused from the s nodes.
- The remaining nodes of the M nodes may continue fusing model parameters by the methods provided in the two cases above, to improve the efficiency with which the model parameters of the M nodes are fused; other means of fusion may also be used, which is not limited by the embodiment of the present invention.
- The chosen one of the newly added nodes may be the node with the smallest node number among the newly added nodes, or the node that completed the iteration last, which is not limited by the embodiment of the present invention.
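A subtlety in the incremental case above is that merging an already-fused result with a new batch must account for how many node contributions each side represents. The following sketch shows one way to do this under the averaging assumption; the function name `weighted_merge` and the weighting scheme are illustrative, not taken from the patent text.

```python
# Hedged sketch of merging a newly added batch of nodes with an existing
# fused result. With element-wise averaging as the fusion rule, each side
# must be weighted by the number of node contributions it represents.

def weighted_merge(fused_a, count_a, fused_b, count_b):
    """Merge two fused parameter vectors, weighting by contribution counts.
    Returns the merged vector and the new total count."""
    total = count_a + count_b
    merged = [(a * count_a + b * count_b) / total
              for a, b in zip(fused_a, fused_b)]
    return merged, total

# s nodes already fused; y new nodes fused among themselves; then merged:
s_result = [1.0, 1.0]   # average over s = 3 nodes
y_result = [4.0, 4.0]   # average over y = 1 newly added node
merged, n = weighted_merge(s_result, 3, y_result, 1)
```

Without the weights, the new batch would be over-weighted relative to the s nodes already folded in.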
- the method further includes:
- The W parameter collection groups are determined by the upper-layer parameter collection group of the W parameter collection groups, where W ≤ the total number of groups included in the upper-layer parameter collection group.
- The inter-group fusion condition may be that the number of intra-group fusions of each parameter collection group reaches a preset number of times, or that a certain period of time has elapsed.
- When the inter-group fusion condition is that the number of intra-group fusions reaches a preset number, then once the number of intra-group fusions of the W parameter collection groups reaches the preset number, each parameter collection group in the W parameter collection groups fuses the current model parameters of all nodes in the group as a whole to obtain its second model parameter, thereby obtaining the second model parameter of each of the W parameter collection groups.
- All the nodes of each parameter collection group can send their current model parameters to one node in the group, and that node fuses the current model parameters of all the nodes to obtain the second model parameter; of course, other methods may also be used, which is not limited by the embodiment of the present invention.
- The third model parameter may be sent to the nodes of the W parameter collection groups not only by broadcast but also iteratively; that is, the node that finally completes the fusion sends the third model parameter to one node in each of the W parameter collection groups, and that node in turn sends the third model parameter to the other nodes that participated in the inter-group fusion.
- The third model parameter is likewise sent to the nodes in the parameter distribution group corresponding to each of the W parameter collection groups, and the transmission may also be by broadcast or iterative.
- In the iterative manner, the node that finally completes the fusion sends the third model parameter to the first node of the parameter distribution group of each of the W parameter collection groups, and the third model parameter is then sent iteratively to the other nodes in that parameter distribution group.
- The first node refers to the node responsible for receiving the model parameters of the W parameter collection groups.
- The third model parameter is also sent to the nodes in each lower-layer parameter distribution group within the upper-layer parameter distribution group, where the transmission may again be by broadcast or iterative.
- The fusing of the second model parameters of each parameter collection group in the W parameter collection groups to obtain the third model parameter includes:
- the second model parameter of each corresponding parameter collection group is sent to the inter-group fusion node, so that the inter-group fusion node fuses the second model parameters of the W parameter collection groups to obtain the third model parameter;
- the inter-group fusion node may be a node elected by the nodes in the W parameter collection groups, the node that completes the iteration first, or the node with the smallest node number, which is not limited.
- Alternatively, the node responsible for the overall fusion in a parameter collection group may be selected.
- The second model parameters of the W parameter collection groups that satisfy the inter-group fusion condition are fused to obtain the third model parameter.
- Fusing the second model parameters of the W parameter collection groups means fusing the second model parameter of each parameter collection group in the W parameter collection groups.
- The node responsible for the overall fusion in each parameter collection group may be selected, or the node with the smallest number may be selected, which is not limited by the present invention.
- The method by which the new parameter collection group fuses model parameters is similar to the intra-group fusion method of the parameter collection group satisfying the above conditions, and is not repeated herein.
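The two-level scheme described above, intra-group fusion to a second model parameter followed by inter-group fusion to a third, can be sketched end to end. All names are illustrative and element-wise averaging (weighted by group size at the inter-group level) is an assumed fusion rule the patent does not prescribe.

```python
# Sketch of inter-group fusion across W parameter collection groups: each
# group first fuses all its nodes' current parameters into a second model
# parameter, then one inter-group fusion node combines the W group results
# (weighted by group size) into the third model parameter.

def group_fuse(node_params):
    """Intra-group overall fusion: element-wise mean over a group's nodes."""
    n = len(node_params)
    return [sum(p[i] for p in node_params) / n
            for i in range(len(node_params[0]))]

def inter_group_fuse(group_results):
    """group_results: list of (second_model_parameter, group_size) pairs.
    Weighting by group size makes the result match a global average."""
    total = sum(size for _, size in group_results)
    dim = len(group_results[0][0])
    return [sum(vec[i] * size for vec, size in group_results) / total
            for i in range(dim)]

groups = [[[1.0], [3.0]], [[5.0], [7.0], [9.0]]]    # W = 2 groups of nodes
second = [(group_fuse(g), len(g)) for g in groups]  # per-group results
third = inter_group_fuse(second)                    # third model parameter
```

Weighting by group size is one design choice; unweighted averaging of the group results would instead give each group equal influence regardless of its node count.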
- The sending, to the N nodes in the parameter distribution group corresponding to the parameter collection group that satisfies the condition, of the first model parameter of that parameter collection group includes:
- the method further includes:
- the nodes included in the parameter collection group and the parameter distribution group are regrouped when a preset condition is satisfied.
- The preset condition may be that a certain period of time has elapsed, that the model parameters have been fused a certain number of times, or that a certain number of iterations has been reached, etc., which is not limited by the embodiment of the present invention.
- The nodes included in the parameter collection group and the parameter distribution group may be regrouped according to the node grouping method provided in the second aspect of the present invention, which is not repeated herein.
- a node grouping method for use in a machine learning system, the machine learning system comprising at least two nodes, the method comprising:
- grouping the nodes in the machine learning system such that the nodes included in at least one parameter collection group are different from the nodes included in the parameter distribution group corresponding to that parameter collection group.
- Each parameter collection group corresponds to at least one parameter distribution group, that is, one parameter collection group may correspond to one parameter distribution group, or corresponding to multiple parameter distribution groups.
- The parameter collection group including a node different from the corresponding parameter distribution group means that the nodes included in at least one parameter collection group are not completely the same as the nodes included in the corresponding parameter distribution group.
- This may mean that the parameter collection group includes at least one node different from the nodes in the parameter distribution group corresponding to it, or that all the nodes included in the parameter collection group and all the nodes included in the corresponding parameter distribution group are different.
- the number of nodes of the different parameter collection group is the same or different; and/or,
- the number of nodes in different parameter distribution groups is the same or different; and/or,
- the number of nodes of a parameter collection group is the same as or different from the number of nodes of the parameter distribution group corresponding to the parameter collection group.
- The machine learning system may further include a parameter server, where a parameter collection group and the parameter distribution group corresponding to it correspond to the same parameter server, and different parameter collection groups with their corresponding parameter distribution groups correspond to different parameter servers.
- The parameter servers may comprise Y layers, where a parameter server of the (j+1)-th layer corresponds to at least one parameter server of the j-th layer, and a parameter collection group and its corresponding parameter distribution group correspond to a layer-1 parameter server, where 1 ≤ j < j+1 ≤ Y.
- the grouping the nodes in the machine learning system includes:
- the node identifier is used to uniquely identify the node.
- the node identifier may be an IP address of the node, a sequence code of the node, and the like.
- the node number may be a sequence number that is randomly assigned to the node, or may be any value that is randomly assigned to the node, etc., and the present invention is not limited thereto.
- the node number of each node can be changed, and the number of parameter collection groups and parameter distribution groups can also be changed, and the correspondence between the parameter collection group and the parameter distribution group Relationships can also change accordingly.
- When the number of parameter collection groups and the number of parameter distribution groups are determined, determining the parameter collection groups and the parameter distribution groups includes:
- determining nodes whose node numbers have the same remainder modulo the number of collection groups as the same parameter collection group, and nodes whose node numbers have the same remainder modulo the number of distribution groups as the same parameter distribution group.
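The modulo grouping rule above can be sketched in a few lines. The function name is illustrative; note that when the numbers of collection and distribution groups differ, a node's collection group naturally contains different members from its distribution group, consistent with the grouping requirement stated earlier.

```python
# Minimal sketch of the remainder-based grouping rule: nodes whose node
# numbers share the same remainder modulo the number of groups are placed
# in the same group.

def group_by_remainder(node_numbers, num_groups):
    """Map each remainder 0..num_groups-1 to the nodes that fall in it."""
    groups = {r: [] for r in range(num_groups)}
    for n in node_numbers:
        groups[n % num_groups].append(n)
    return groups

# Six nodes, 2 parameter collection groups and 3 parameter distribution groups:
collection = group_by_remainder(range(6), 2)
distribution = group_by_remainder(range(6), 3)
```

Because node numbers can be reassigned, regrouping (as described for the preset condition above) amounts to rerunning this rule with new numbers or new group counts.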
- a model parameter fusion device is provided, the device being applied to a machine learning system, the machine learning system comprising at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponding to at least one parameter distribution group
- Each parameter collection group includes at least one node
- each parameter distribution group includes at least one node
- at least one of the parameter collection groups includes a node that is different from a node included in the corresponding parameter distribution group
- the device includes:
- a first merging unit configured to, when any parameter collection group satisfies the intra-group fusion condition, fuse the model parameters of M nodes in the parameter collection group that satisfies the condition, to obtain the first model parameter of that parameter collection group,
- where the minimum number of fusion nodes of the group s ≤ M ≤ the total number of nodes included in the parameter collection group that satisfies the condition;
- a first sending unit configured to send, to N nodes in the parameter distribution group corresponding to the parameter collection group that satisfies the condition, the first model parameter of that parameter collection group, where 1 ≤ N ≤ the total number of nodes included in the parameter distribution group corresponding to the parameter collection group that satisfies the condition.
- The intra-group fusion condition may be that the number of nodes in the parameter collection group that have completed the iterative calculation of the current model parameters reaches a preset value, namely the minimum number of fusion nodes s.
- The minimum number of fusion nodes s, M, and N can be set in advance, where s ≤ M ≤ the total number of nodes included in the parameter collection group that satisfies the condition, and 1 ≤ N ≤ the total number of nodes included in the parameter distribution group corresponding to that parameter collection group.
- the first merging unit includes:
- a receiving module configured to receive model parameters of the M nodes sent by the M nodes that complete the iteration in the parameter collection group that meets the condition
- a fusion module configured to perform fusion according to the received model parameters of the M nodes, to obtain a first model parameter of the parameter collection group that satisfies the condition.
- The fusion module can fuse the model parameters of the M nodes in different fusion modes to obtain the first model parameter. For example, the fusion module may fuse the model parameters of the M nodes all at once to obtain the first model parameter; or each node may send its model parameters to the fusion module as soon as it completes its iteration, with the fusion module receiving and fusing parameters node by node over multiple receive-and-fuse steps until all M nodes have been fused to obtain the first model parameter, which is not limited by the embodiment of the present invention.
- the first merging unit includes:
- an obtaining module configured to obtain state information of the node in the parameter collection group that satisfies the condition; wherein the node state information may include a node identifier and a node order of completing the iteration.
- an indication module configured to, according to the state information of the nodes in the parameter collection group that satisfies the condition, instruct the M nodes that complete the iteration in that parameter collection group to perform model parameter fusion, to obtain the first model parameter of the parameter collection group.
- The indication module may have the M nodes that complete the iteration fuse in different combinations.
- For example, the M nodes may be instructed to send their model parameters to one of the nodes, which performs the fusion to obtain the first model parameter; or the indication module may have the fusion performed as in the third possible implementation of the third aspect below, to improve the efficiency with which the M nodes obtain the first model parameter.
- The indication module may also adopt other combinations, which is not limited by the embodiment of the present invention.
- the indication module is specifically configured to:
- instructing one of the s nodes that have completed the iteration to fuse the model parameters of the s nodes; this node may be referred to as the fusion node.
- The fusion node may be the last node to complete the iteration, or the node with the smallest node number, which is not limited by this embodiment of the present invention.
- While the fusion node fuses the model parameters of the s nodes, if new nodes complete the iteration, two cases can be distinguished according to the relationship between the number of newly added nodes and the size of s:
- x nodes are newly added:
- one of the newly added x nodes is instructed to fuse the model parameters of the newly added x nodes with the model parameters already fused from the s nodes;
- y nodes are newly added:
- one of the newly added y nodes is instructed to fuse the model parameters of the y nodes, and the fused model parameters of the y nodes are then fused with the model parameters already fused from the s nodes.
- The indication module may have the remaining nodes continue fusing model parameters by the methods provided in the two cases above, to improve the efficiency with which the model parameters of the M nodes are fused; the fusion may also be performed by other means, which is not limited by the embodiment of the present invention.
- The chosen one of the newly added nodes may be the node with the smallest node number among the newly added nodes, or the node that completed the iteration last, which is not limited by the embodiment of the present invention.
- the device further includes:
- a second merging unit configured to, when the inter-group fusion condition is met among W parameter collection groups, fuse as a whole the model parameters of the nodes in each of the W parameter collection groups, to obtain the second model parameter of each parameter collection group in the W parameter collection groups;
- where the W parameter collection groups are determined by the upper-layer parameter collection group of the W parameter collection groups, and W ≤ the total number of groups included in the upper-layer parameter collection group;
- The inter-group fusion condition may be that the number of intra-group fusions of each parameter collection group reaches a preset number of times.
- The second merging unit is configured to, when the number of intra-group fusions of the W parameter collection groups reaches the preset number, fuse as a whole, for each parameter collection group in the W parameter collection groups, the current model parameters of all nodes in the group to obtain the second model parameter, thereby obtaining the second model parameter of each of the W parameter collection groups.
- a third fusion unit configured to fuse the second model parameters of each parameter collection group in the W parameter collection group to obtain a third model parameter
- a second sending unit configured to send the third model parameter to a node of the W parameter collection group or to a node of the parameter distribution group of the W parameter collection group.
- the second sending unit may transmit not only in a broadcast manner but also in an iterative manner; that is, the second sending unit separately sends the third model parameter to one node of each parameter collection group included in the W parameter collection groups, and that node iteratively sends the third model parameter to the other nodes in its group.
- the third model parameter is sent to the nodes in the parameter distribution group corresponding to each parameter collection group in the W parameter collection groups, wherein the sending may likewise adopt a broadcast manner or an iterative manner.
- the third merging unit is specifically configured to:
- select, in each parameter collection group of the W parameter collection groups other than the one where the inter-group fusion node is located, one node to send the second model parameter of the corresponding parameter collection group to the inter-group fusion node, so that the inter-group fusion node fuses the second model parameters of the W parameter collection groups to obtain a third model parameter.
- the node responsible for the overall fusion in the parameter collection group may be selected.
- the second model parameters of the W parameter collection groups that satisfy the intra-group fusion condition are merged to obtain the third model parameter; that is, the second model parameters of each parameter collection group in the W parameter collection groups are merged.
- the node responsible for the overall fusion in each parameter collection group may be selected, or the node with the smallest number may be selected.
- the present invention does not limit this.
- the method for model parameter fusion by the new parameter collection group is similar to the method for the intra-group fusion of a parameter collection group satisfying the above conditions, and details are not repeated herein.
- the first sending unit is specifically configured to:
- the device further includes:
- a first grouping unit configured to re-group the parameter collection group and the nodes included in the parameter distribution group when the preset condition is met.
- the preset condition may be a certain period of time, or a certain number of times of the integration of the model parameters, or a certain number of iterations, etc., which is not limited by the embodiment of the present invention.
- the regrouping of the nodes included in the parameter collection group and the parameter distribution group may be performed by the node grouping device provided by the fourth aspect of the present invention, and details are not described herein again.
- a node grouping apparatus for use in a machine learning system, the machine learning system comprising at least two nodes, the apparatus comprising:
- a second grouping unit configured to group nodes in the machine learning system, such that the machine learning system includes at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponds to at least one parameter distribution group, and at least one parameter collection group includes nodes that are different from the nodes included in the parameter distribution group corresponding to that parameter collection group.
- Each parameter collection group corresponds to at least one parameter distribution group, that is, one parameter collection group may correspond to one parameter distribution group, or corresponding to multiple parameter distribution groups.
- the parameter collection group includes nodes that are different from those of the parameter distribution group corresponding to the parameter collection group; that is, the nodes included in at least one parameter collection group are not identical to the nodes included in the corresponding parameter distribution group. This may mean that at least one node in the parameter collection group is different from the nodes in the parameter distribution group corresponding to the parameter collection group, or that all nodes included in the parameter collection group are different from all nodes included in the parameter distribution group corresponding to the parameter collection group.
- the number of nodes of the different parameter collection group is the same or different;
- the number of nodes in different parameter distribution groups is the same or different; and/or,
- the number of nodes of a parameter collection group is the same as or different from the number of nodes of the parameter distribution group corresponding to the parameter collection group.
- the machine learning system further includes a parameter server; a parameter collection group and its corresponding parameter distribution group correspond to the same parameter server, and different parameter collection groups and their corresponding parameter distribution groups correspond to different parameter servers.
- the parameter server includes Y layers, and one parameter server of the (j+1)th layer corresponds to at least one parameter server of the jth layer; the parameter collection group and the parameter distribution group corresponding to the parameter collection group correspond to the layer 1 parameter server, where 1 ≤ j < j+1 ≤ Y.
- the second grouping unit specifically includes:
- a first determining module configured to determine a correspondence between a node identifier and a node number
- a second determining module configured to determine the number of the parameter collection groups, and the number of the parameter distribution groups
- a third determining module configured to determine a parameter collection group and a parameter distribution group based on a correspondence between the node identifier and the node number, the number of the parameter collection group, and the number of the parameter distribution group;
- a fourth determining module configured to determine a correspondence between the parameter collection group and the parameter distribution group.
- the node identifier is used to uniquely identify the node.
- the node identifier may be an IP address of the node, a sequence code of the node, and the like.
- the node number may be a serial number assigned to the node by the machine, or may be any value randomly assigned to the node, which is likewise not limited by the present invention.
- the node number of each node can be changed, the number of parameter collection groups and parameter distribution groups can also be changed, and the correspondence between the parameter collection groups and the parameter distribution groups can also change accordingly.
- the third determining module is specifically configured to:
- the nodes whose node numbers have the same remainder modulo the number of parameter collection groups are determined as the same parameter collection group, and the nodes whose node numbers have the same remainder modulo the number of parameter distribution groups are determined as the same parameter distribution group.
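- The remainder-based grouping described above can be sketched as follows (a minimal illustration; the function and variable names are assumptions, not from the patent, and node numbers are assumed to be consecutive integers):

```python
def group_nodes(node_numbers, num_collect_groups, num_dist_groups):
    """Group nodes by the remainder of their node number.

    Nodes whose numbers share a remainder modulo the number of parameter
    collection groups form one collection group; likewise for distribution
    groups. All names here are illustrative, not from the patent.
    """
    collect = {r: [] for r in range(num_collect_groups)}
    dist = {r: [] for r in range(num_dist_groups)}
    for n in node_numbers:
        collect[n % num_collect_groups].append(n)  # same remainder -> same collection group
        dist[n % num_dist_groups].append(n)        # same remainder -> same distribution group
    return collect, dist

collect, dist = group_nodes(range(8), 2, 4)
# collection group 0 holds the even-numbered nodes 0, 2, 4, 6
```

With 8 nodes, 2 collection groups and 4 distribution groups, node 5 lands in collection group 1 (5 mod 2) and distribution group 1 (5 mod 4), matching the remainder rule above.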
- a model parameter fusion device comprising a processor and a memory, wherein the memory stores code and data, the processor can execute code in the memory, and the processor is configured to execute The model parameter fusion method of any of the above-mentioned first aspect to the seventh possible implementation of the first aspect.
- the model parameter fusion device is a parameter server, and the parameter server is set independently of the node or configured on the node.
- a controller comprising a processor and a memory, the memory storing code and data, the processor being operable to execute the code in the memory, the processor being configured to perform the node grouping method of any one of the second aspect to the fifth possible implementation of the second aspect.
- a machine learning system comprising the model parameter fusion device according to any one of the first to fifth possible implementations of the fifth aspect, and the controller according to the sixth aspect.
- a model parameter fusion method and device provided by the embodiments of the present invention obtain a first model parameter by performing intra-group fusion in a parameter collection group and send the first model parameter to the parameter distribution group corresponding to the parameter collection group, thereby solving the problems of high performance requirements on the parameter server and large data transmission volume in model parameter fusion.
- FIG. 1 is a schematic structural diagram of a machine learning system according to an embodiment of the present invention.
- FIG. 2 is a schematic flowchart of a model parameter fusion method according to an embodiment of the present disclosure
- FIG. 3 is a schematic structural diagram of a parameter server according to an embodiment of the present disclosure.
- FIG. 4 is a schematic flowchart of a method for grouping nodes according to an embodiment of the present invention.
- FIG. 5 is a schematic structural diagram of a model parameter fusion apparatus according to an embodiment of the present disclosure.
- FIG. 6 is a schematic structural diagram of another model parameter fusion apparatus according to an embodiment of the present invention.
- FIG. 7 is a schematic structural diagram of still another model parameter fusion apparatus according to an embodiment of the present invention.
- FIG. 8 is a schematic structural diagram of a model parameter fusion apparatus according to an embodiment of the present invention.
- FIG. 9 is a schematic structural diagram of a controller according to an embodiment of the present invention.
- the machine learning system architecture applied by the embodiment of the present invention is shown in FIG. 1.
- the system architecture diagram includes a data storage device 101, a model parameter training platform 102, and a model parameter storage device 103.
- the data storage device 101 can be a data storage server 101. The data storage server 101 can be used to store the raw data for model parameter training, and the storage capacity of the data storage server 101 is much larger than that of the computing server 1021 in the model training platform 102.
- the original data may be language data, image data, video data, etc. The original data is composed of a plurality of data sets, and each data set further comprises a plurality of type subsets, each type subset having a data label representing its type; the type subsets included in the same data set have the same label. For example, a data set may be a set of images containing multiple persons with person labels, or may contain multiple animal images with animal labels, or images of other categories, and so on.
- the model parameter training platform 102 includes a computing server 1021 for iterative computing, which may also be referred to as a node and may be a general computer, a mobile terminal, a workstation, a general-purpose server, a dedicated server, etc., and a switch 1022 used to perform data communication between the computing servers.
- the computing server 1021 has local storage and its capacity is smaller than the data storage server 101.
- each computing server reads certain data from the data storage server 101 into its local storage device for model parameter training by sampling.
- the model parameter training platform 102 can obtain a total model parameter of the final fusion output by performing model parameter training fusion on the data set with the data label, and the data type of the new data can be identified by the total model parameter.
- for example, with an image data set with person labels, the persons in new image data can be identified by the finally output model parameters; likewise, model parameter fusion can be performed by using an image data set with animal labels, and the animal images or the like in new image data are identified by the finally output model parameters.
- the model parameter storage server 103 is configured to store the model parameters obtained by the training.
- the model parameters obtained by the final fusion may be sent to the model parameter storage server 103 to be stored by the model parameter storage server 103.
- the model parameters originally used by the calculation server 1021 in the model parameter platform 102 for performing model parameter training fusion may also be acquired from the model parameter storage server 103.
- each parameter collection group corresponds to at least one parameter distribution group, each parameter collection group includes at least one node, each parameter distribution group includes at least one node, and at least one parameter collection group includes nodes that are different from the nodes included in the corresponding parameter distribution group; the method includes the following steps.
- Step 201: A node for performing model parameter fusion acquires a data subset of the data set.
- the data set refers to a data set used for iterative calculation of model parameters; the data set may be language data, image data, video data, etc., and is composed of multiple type subsets, each type subset having data labels for representing categories, and the type subsets included in the same data set have the same label.
- the data set may be stored in a storage device such as a hard disk or a disk in advance, or may be stored in a data storage server in advance.
- the node may obtain the data subset from a storage device directly connected to the device where the node is located, or obtain the data from the data storage server.
- when a node acquires a data subset of the data set, the node can extract a certain amount of data from the data set; if the computing power of each node is known in advance, the amount of data in the data subset acquired by a node can be allocated according to the computing power of that node.
- the nodes included in the at least one parameter collection group being different from the nodes included in the corresponding parameter distribution group means that the nodes included in at least one parameter collection group are not completely the same as the nodes included in the corresponding parameter distribution group; that is, at least one parameter collection group includes at least one node different from the nodes in the parameter distribution group corresponding to that parameter collection group, or all the nodes included in the parameter collection group are different from all the nodes included in the parameter distribution group corresponding to that parameter collection group.
- Step 202: Each node performs iterative calculation based on the data subset and the current model parameters.
- in the first iteration, each node performs iterative calculation based on the acquired data subset and the initial model parameters; in subsequent iterations, each node performs the next iterative calculation based on the data subset and the currently obtained model parameters. The initial model parameters refer to the starting model parameters of each node, and the initial model parameters of each node may be the same. The currently obtained model parameters refer to the model parameters obtained after each node completes the previous iteration.
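- The per-node loop of step 202 can be sketched as follows (a toy single-parameter gradient step standing in for any iterative training rule; the least-squares loss, learning rate, and all names are illustrative assumptions, not from the patent):

```python
def local_iteration(params, data_subset, lr=0.1):
    """One iteration: update the model parameter against the local data subset.

    Here the 'model' is a single scalar fitted to the local data by a
    gradient step on a least-squares loss -- a stand-in for whatever
    training step each node runs independently between fusions.
    """
    grad = sum(params - x for x in data_subset) / len(data_subset)
    return params - lr * grad

p = 0.0                       # initial model parameter, same on every node
for _ in range(100):          # repeated iterations on the same data subset
    p = local_iteration(p, [1.0, 2.0, 3.0])
# p converges toward the subset mean, 2.0
```

Each node would run such a loop on its own data subset, producing the "current model parameters" that the parameter collection group later fuses.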
- Step 203: When any parameter collection group satisfies the intra-group fusion condition, the model parameters of M nodes in the parameter collection group satisfying the condition are merged to obtain the first model parameter of that parameter collection group, where s ≤ M ≤ the total number of nodes included in the parameter collection group satisfying the condition, and s is the minimum number of fusion nodes of that parameter collection group.
- the intra-group fusion condition means that the number of nodes in the parameter collection group that have completed the iterative calculation of the current model parameters reaches a preset value, that is, the minimum number of fusion nodes s.
- each parameter collection group may include one or more nodes; therefore, when the number of nodes in any parameter collection group that have completed the iterative calculation of the current model parameters reaches the preset value, M nodes that have completed the calculation of the current model parameters may be selected from that parameter collection group, and the model parameters calculated by the M nodes are merged to obtain the first model parameter.
- the minimum number of fusion nodes s and M can be set in advance, with s ≤ M ≤ the total number of nodes contained in the parameter collection group.
- the grouping into parameter collection groups included in the machine learning system may be performed in advance, or may be determined after each node obtains the data subset, that is, after step 201, which is not limited by the embodiment of the present invention.
- the merging of the model parameters of the M nodes in the parameter collection group to obtain the first model parameter of the parameter collection group can be performed by two different methods according to different execution subjects, as described below.
- in the first method, the model parameters sent by the M nodes that complete the iteration in the parameter collection group satisfying the condition are received, and fusion is performed according to the received model parameters of the M nodes to obtain the first model parameter of the parameter collection group satisfying the condition.
- this method may be completed by a device independent of the parameter collection group, for example, a parameter server, which may be run by a fixed node.
- the M nodes that complete the iteration in the parameter collection group respectively send the model parameters calculated in the current iteration to the parameter server; when the parameter server has received the model parameters sent by the M nodes, the parameter server may fuse the model parameters corresponding to the M nodes by using different fusion modes to obtain the first model parameter.
- the different fusion modes may be: the parameter server merges the model parameters corresponding to the M nodes at one time to obtain the first model parameter; or each node sends its parameters to the parameter server after completing the iteration, and the parameter server receives the model parameters from that node and merges them, so that after multiple rounds of receiving and merging, the merging of the M nodes is completed and the first model parameter is obtained; this is not limited by the embodiment of the present invention.
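- Both fusion modes described above can be sketched with simple parameter averaging (the averaging rule and all class/function names are illustrative assumptions; the patent does not fix a particular fusion operator):

```python
def fuse_once(param_list):
    """One-shot fusion: merge the model parameters of all M nodes at once."""
    return [sum(vals) / len(vals) for vals in zip(*param_list)]

class IncrementalFuser:
    """Receive-and-merge fusion: fold each arriving node's parameters into a
    running average, as in the second mode (multiple rounds of receiving and
    merging) described above."""
    def __init__(self):
        self.acc, self.count = None, 0
    def receive(self, params):
        self.count += 1
        if self.acc is None:
            self.acc = list(params)
        else:
            # running-average update: acc += (new - acc) / count
            self.acc = [a + (p - a) / self.count for a, p in zip(self.acc, params)]
        return self.acc

batch = [[1.0, 4.0], [3.0, 8.0]]   # current model parameters of M = 2 nodes
one_shot = fuse_once(batch)
f = IncrementalFuser()
for p in batch:
    incremental = f.receive(p)
# both modes yield the same first model parameter: [2.0, 6.0]
```

The one-shot mode needs all M parameter vectors in memory at once; the incremental mode trades that for a small running state, which is why it can improve efficiency when nodes finish at different times.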
- in the second method, state information of the nodes in the parameter collection group satisfying the condition is obtained, where the node state information may include a node identifier and the sequence in which nodes complete the iteration; according to the state information of the nodes in the parameter collection group satisfying the condition, the M nodes that complete the iteration in that parameter collection group are instructed to perform model parameter fusion to obtain the first model parameter of the parameter collection group.
- the method may be completed by a node in the parameter collection group, and the node may be referred to as a control node, and the control node may be specified in advance, or may be temporarily recommended by a node in the parameter collection group.
- the control node may collect state information of nodes in the parameter collection group, and instruct other nodes to perform model parameter transmission and fusion.
- the control node may instruct the M nodes that complete the iteration to fuse in different combinations. For example, the control node may instruct the M nodes to send their corresponding model parameters to one of the nodes, and that node performs the fusion to obtain the first model parameter; alternatively, the control node may perform the fusion in the following implementation manner, to improve the efficiency with which the M nodes are fused to obtain the first model parameter.
- the control node can also be fused by other combinations, which is not limited by the embodiment of the present invention.
- when the control node collects the state information of the nodes in the parameter collection group and instructs the M nodes that complete the iteration to fuse, the control node may determine, according to the state information of the nodes in the parameter collection group, the s nodes that complete the iteration in the parameter collection group, and then instruct one of the s nodes that complete the iteration to fuse the model parameters of the s nodes.
- after determining the s nodes that complete the iteration in the parameter collection group, the control node designates one of the s nodes as the fusion node, and the remaining nodes respectively send the model parameters obtained in the current iteration to the fusion node, which fuses the model parameters corresponding to the s nodes.
- the fusion node may be the last node that completes the iteration, or may be the node with the smallest node number, which is not limited in this embodiment of the present invention.
- according to the relationship between the number of newly added nodes and s, two cases can be distinguished:
- Case 1: x nodes are newly added, where x < s. If the x nodes complete the iteration while the s iterated nodes are fusing their model parameters, one of the x newly added nodes fuses the model parameters of the x nodes together with the model parameters obtained after the s nodes are merged.
- Case 2: y nodes are newly added, where y ≥ s. If the y nodes complete the iteration while the s iterated nodes are fusing their model parameters, one of the y newly added nodes first fuses the model parameters of the y nodes, and then fuses the result with the model parameters obtained after the s nodes are merged.
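- The late-arrival handling above can be sketched as follows (weighting the final merge by how many nodes each side represents is an illustrative assumption, as are all names; the patent does not specify the merge arithmetic):

```python
def weighted_merge(params_a, count_a, params_b, count_b):
    """Merge two already-fused results, weighting each by the number of
    nodes it represents (the weighting scheme is an assumption)."""
    total = count_a + count_b
    merged = [(a * count_a + b * count_b) / total
              for a, b in zip(params_a, params_b)]
    return merged, total

def fuse_with_late_nodes(fused_s, s, late_params):
    """While the s-node fusion is in progress, some late nodes finish.
    One late node fuses the late arrivals' parameters first, then merges
    that result with the s-node fusion result."""
    late_fused = [sum(v) / len(v) for v in zip(*late_params)]
    return weighted_merge(fused_s, s, late_fused, len(late_params))

# s = 2 nodes already fused to [2.0]; two late nodes arrive with [4.0], [6.0]
merged, n = fuse_with_late_nodes([2.0], 2, [[4.0], [6.0]])
# (2.0 * 2 + 5.0 * 2) / 4 = 3.5, now representing 4 nodes
```

Folding late nodes in this way lets the fusion of all M nodes complete without restarting, which is the efficiency gain the two cases above are aiming at.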
- the remaining nodes of the M nodes may continue to fuse their model parameters by using the methods provided in the foregoing two cases, improving the efficiency of the fusion of the model parameters of the M nodes; the fusion may also be performed by other means, which is not limited by the embodiment of the present invention.
- the one newly added node may be the node with the smallest node number among the newly added nodes, or may be the node that completes the iteration last, which is not limited by the embodiment of the present invention.
- when the intra-group distribution condition is satisfied, step 204 is performed.
- the intra-group distribution condition may be that the number of times of intra-group merging reaches a preset number of times, or a preset duration, and the like, which is not limited by the embodiment of the present invention.
- Step 204: Send the first model parameter of the parameter collection group satisfying the condition to N nodes in the parameter distribution group corresponding to that parameter collection group, where 1 ≤ N ≤ the total number of nodes contained in the parameter distribution group corresponding to the parameter collection group satisfying the condition.
- the parameter collection group corresponds to the parameter distribution group; that is, one parameter collection group can correspond to one or more parameter distribution groups. Therefore, when the intra-group distribution condition is met, the first model parameter is sent to nodes in the corresponding parameter distribution group, which may be all of the nodes in the parameter distribution group or only some of them.
- the first model parameter of the parameter collection group satisfying the condition may be sent by broadcast to the nodes in the parameter distribution group corresponding to that parameter collection group.
- alternatively, the first model parameter of the parameter collection group satisfying the condition may be sent in an iterative manner to the nodes in the corresponding parameter distribution group; that is, the first model parameter is sent to a first node in the corresponding parameter distribution group, and the first node then sends it to the remaining nodes of the N nodes in an iterative manner: the first node sends the first model parameter to a second node, the second node sends it to a third node, and so on, until the first model parameter has been sent to all of the N nodes except the first node.
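- The iterative distribution of step 204 can be sketched as a forwarding chain (nodes are plain dicts standing in for networked machines; all names are assumptions, not from the patent):

```python
def iterative_distribute(first_params, nodes):
    """Send the first model parameter to the first node, which forwards it
    to the second node, and so on, until every node in the distribution
    group holds a copy. Returns the order in which nodes received it."""
    received_order = []
    params = first_params
    for node in nodes:
        node["params"] = list(params)   # node stores the received copy
        received_order.append(node["id"])
        params = node["params"]         # this node forwards it onward
    return received_order

nodes = [{"id": i} for i in range(4)]   # a 4-node parameter distribution group
order = iterative_distribute([1.5, 2.5], nodes)
# every node ends up holding [1.5, 2.5], received in chain order 0, 1, 2, 3
```

Compared with broadcast, the chain sends only one copy per hop, which spreads the transmission load across the group instead of concentrating it at the sender.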
- the first node may be any one of the N nodes, or may be a node recommended by the nodes in the parameter distribution group, which is not limited in this embodiment of the present invention.
- step 204 may be performed by a device outside the parameter collection group, for example, the parameter server, or may be completed by a node in the parameter collection group, for example, the control node, which is not limited by the embodiment of the present invention.
- when the machine learning system includes a parameter server, a parameter collection group and the parameter distribution group corresponding to it correspond to the same parameter server, and different parameter collection groups and their corresponding parameter distribution groups correspond to different parameter servers.
- the parameter server includes Y layers; one parameter server of the (j+1)th layer corresponds to at least one parameter server of the jth layer, and the parameter collection group and the parameter distribution group corresponding to the parameter collection group correspond to the layer 1 parameter server.
- Step 205: When the inter-group fusion condition is met among the W parameter collection groups, the model parameters of the nodes in each of the W parameter collection groups are integrally fused to obtain the second model parameter of each parameter collection group in the W parameter collection groups.
- the W parameter collection groups are determined by the upper-layer parameter collection group of the W parameter collection groups, where W ≤ the total number of groups included in the upper-layer parameter collection group.
- the inter-group fusion condition may be that the number of intra-group fusions of the parameter collection group reaches a preset number of times, or a certain period of time, etc., which is not limited by the embodiment of the present invention.
- for example, when the inter-group fusion condition is that the number of intra-group fusions of the parameter collection groups reaches a preset number of times, then once the number of intra-group fusions of the W parameter collection groups reaches the preset number, each parameter collection group in the W parameter collection groups can integrally fuse the current model parameters of all nodes in the group to obtain its second model parameter, thereby obtaining the second model parameters of each parameter collection group in the W parameter collection groups.
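- The condition check and overall intra-group fusion of step 205 can be sketched as follows (averaging as the fusion rule and all class/function names are illustrative assumptions):

```python
class CollectGroup:
    """Track a parameter collection group's intra-group fusions; once every
    one of the W groups has fused a preset number of times, each group
    integrally fuses the current model parameters of ALL its nodes into a
    second model parameter."""
    def __init__(self, node_params):
        self.node_params = node_params   # current params of every node in the group
        self.fusions = 0                 # intra-group fusion counter
    def record_fusion(self):
        self.fusions += 1
    def overall_fuse(self):
        # integral fusion over all nodes (averaging assumed)
        return [sum(v) / len(v) for v in zip(*self.node_params)]

def second_params_if_ready(groups, preset):
    """Inter-group fusion condition: every group's counter reached `preset`."""
    if all(g.fusions >= preset for g in groups):
        return [g.overall_fuse() for g in groups]
    return None  # condition not yet met

g1 = CollectGroup([[1.0], [3.0]])
g2 = CollectGroup([[5.0], [7.0]])
for g in (g1, g2):
    g.record_fusion(); g.record_fusion()
second = second_params_if_ready([g1, g2], preset=2)
# second model parameters of the W = 2 groups: [[2.0], [6.0]]
```

Note that unlike step 203, the overall fusion here takes every node in the group, not just the M that finished first.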
- as with step 203 above, step 205 may be performed by a device other than the parameter collection group, or may be completed by a node in the parameter collection group, and its execution differs accordingly; the details are as follows.
- when the execution subject is a parameter server, the parameter server determines whether the inter-group fusion condition is satisfied among the W parameter collection groups, and after the condition is satisfied, integrally fuses the model parameters of the nodes in each of the W parameter collection groups.
- when the execution subject is a control node, the control node determines whether the inter-group fusion condition is satisfied among the W parameter collection groups; when the condition is met, one node in each parameter collection group receives the model parameters sent by the other nodes and integrally fuses the received model parameters. At this time, that node may be referred to as a fusion node.
- specifically, when the control node determines that the inter-group fusion condition is met among the W parameter collection groups, all the nodes of each parameter collection group may send their current model parameters to one node in the group, and that node integrally fuses the model parameters to obtain the second model parameter; the overall fusion may also be performed in other manners, which is not limited by the embodiment of the present invention.
- after the second model parameters are obtained, step 206 is performed.
- Step 206: Fuse the second model parameters of each parameter collection group in the W parameter collection groups to obtain the third model parameter, and send the third model parameter to the nodes of the W parameter collection groups or to the nodes of the parameter distribution groups corresponding to the W parameter collection groups.
- the fusing of the second model parameters of the W parameter collection groups to obtain the third model parameter likewise differs according to the execution subject, as explained for step 203.
- when the execution subject is a device other than the parameter collection group, such as a parameter server, the parameter server directly fuses the second model parameters of the W parameter collection groups to obtain the third model parameter.
- the parameter server can then directly send the fused third model parameter to the nodes of the W parameter collection groups.
- the parameter server may further include multiple layers, where one parameter server of an upper layer corresponds to at least one parameter server of the lower layer, and the parameter collection group and the parameter distribution group corresponding to the parameter collection group correspond to the parameter servers of the lowest layer. The lower-layer servers send the number of fusions of their parameter collection groups, the node identifiers, and the current model parameters to the upper-layer parameter server; the upper-layer parameter server determines whether the inter-group fusion condition is satisfied, performs the fusion after the condition is satisfied, and then sends the fused model parameters to the lower-layer parameter servers; finally, the bottom-layer parameter servers send them to the nodes of the W parameter collection groups.
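- The layered escalation described above can be sketched as a two-layer toy of the Y-layer scheme (the reporting interface and all names are assumptions; averaging stands in for the fusion rule):

```python
class UpperLayerServer:
    """Upper-layer parameter server: lower-layer servers report their groups'
    fusion counts and current model parameters; when all expected reports are
    in and every count meets the inter-group threshold, the upper layer fuses
    and returns the result for redistribution back down."""
    def __init__(self, threshold, num_lower):
        self.threshold = threshold     # required intra-group fusion count
        self.num_lower = num_lower     # number of lower-layer servers expected
        self.reports = {}              # lower-server id -> (count, params)

    def report(self, server_id, fusion_count, params):
        self.reports[server_id] = (fusion_count, params)
        ready = (len(self.reports) == self.num_lower and
                 all(c >= self.threshold for c, _ in self.reports.values()))
        if not ready:
            return None  # inter-group fusion condition not yet satisfied
        all_params = [p for _, p in self.reports.values()]
        # fuse and hand back for the lower layer to forward to the nodes
        return [sum(v) / len(v) for v in zip(*all_params)]

upper = UpperLayerServer(threshold=3, num_lower=2)
r1 = upper.report("ps0", 3, [2.0])   # first report alone: condition unmet
r2 = upper.report("ps1", 4, [6.0])   # second report arrives: fuse to [4.0]
```

A deeper hierarchy would repeat this pattern: each layer aggregates its children's reports and escalates only the fused summary, keeping per-server traffic bounded.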
- when the execution subject is a node, the nodes in the W parameter collection groups participating in the fusion determine one node from the W parameter collection groups as the inter-group fusion node; in each parameter collection group other than the one where the inter-group fusion node is located, one node is selected to send the second model parameter of its parameter collection group to the inter-group fusion node, so that the inter-group fusion node fuses the second model parameters of the W parameter collection groups to obtain the third model parameter.
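- The node-driven inter-group fusion can be sketched as follows (choosing the smallest node number is one of the selection rules mentioned below; averaging and all names are illustrative assumptions):

```python
def inter_group_fuse(group_reps):
    """group_reps maps a group id to (representative_node_number, second_params).

    The representative with the smallest node number is chosen as the
    inter-group fusion node; the others send their groups' second model
    parameters to it, and it fuses them into the third model parameter."""
    fusion_node = min(num for num, _ in group_reps.values())
    all_second = [p for _, p in group_reps.values()]
    third = [sum(v) / len(v) for v in zip(*all_second)]
    return fusion_node, third

node, third = inter_group_fuse({
    "g0": (4, [1.0]),   # group g0's representative is node 4
    "g1": (2, [3.0]),   # node 2 has the smallest number -> fusion node
    "g2": (9, [5.0]),
})
# node 2 fuses the W = 3 second model parameters into the third: [3.0]
```

The fusion node would then distribute the third model parameter by broadcast or by the iterative chain described for step 204.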
- the inter-group fusion node may be a node recommended by the nodes in the W parameter collection groups, or may be the node that completes the iteration first, or the node with the smallest node number, which is not limited by the embodiment of the present invention.
- the node responsible for the overall fusion in the parameter collection group may be selected.
- the second model parameters of the W parameter collection groups satisfying the intra-group fusion condition are merged to obtain the third model parameter; that is, the second model parameters of each parameter collection group in the W parameter collection groups are merged.
- the node responsible for the overall fusion in each parameter collection group may be selected, for example the node with the lowest number, which is not limited by the present invention.
- the method for model parameter fusion by the new parameter collection group is similar to the intra-group fusion method of a parameter collection group satisfying the above conditions, and is not repeated herein.
- each parameter collection group in the W parameter collection groups selects the node responsible for the overall fusion within the group, yielding W nodes, and the W nodes are determined as a new parameter collection group.
- when the new parameter collection group satisfies the intra-group fusion condition, the W second model parameters corresponding to the W nodes are fused according to the intra-group fusion mode; for example, when the intra-group fusion condition is that the number of nodes completing the overall fusion reaches a preset number, the nodes that have completed the fusion may be fused first and the result then merged with the other nodes in the group as they finish; of course, the W second model parameters corresponding to the W nodes may also be fused at one time, which is not limited in this embodiment of the present invention.
- when the third model parameter is sent to the nodes of the W parameter collection groups, the inter-group fusion node can transmit it not only by broadcast but also in an iterative manner, that is, the inter-group fusion node sends the third model parameter to one node of each parameter collection group included in the W parameter collection groups, and that node iteratively forwards the third model parameter to the other nodes participating in the inter-group fusion.
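The iterative (chain) mode just described can be sketched as follows; the node names, the `fusion_node` label, and the returned hop list are illustrative assumptions, not part of the patent.

```python
def iterative_send(param, groups):
    """Iterative (non-broadcast) dissemination: the inter-group fusion node
    sends the fused parameter to one node in each group, and each node then
    forwards it to the next node in its group. Returns (sender, receiver) hops."""
    hops = []
    for nodes in groups:
        chain = ["fusion_node"] + list(nodes)
        for src, dst in zip(chain, chain[1:]):
            hops.append((src, dst))
    return hops

hops = iterative_send("theta_3", [["n0", "n2", "n4"], ["n1", "n3", "n5"]])
# The fusion node itself contacts only n0 and n1; n0 forwards to n2, n2 to n4, and so on.
```

Compared with broadcast, the fusion node bears only W sends, at the cost of extra latency along each chain.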
- alternatively, the parameter server or each parameter collection group may send the third model parameter to the nodes of the parameter distribution group corresponding to each parameter collection group in the W parameter collection groups, where the sending mode can likewise be broadcast or iterative.
- the third model parameter can be sent to the nodes of the parameter distribution groups of the W parameter collection groups not only by broadcast but also in an iterative manner, that is, the node that finally completes the fusion sends the third model parameter to the first node of the parameter distribution group of the upper layer, and that node sends the third model parameter in turn to the other nodes in the parameter distribution group.
- the first node refers to the node in the parameter distribution group that is responsible for receiving the model parameters of the upper layer.
- the third model parameter is sent from the parameter distribution group of the upper layer to the nodes in each lower-layer parameter distribution group, where the sending mode may likewise be broadcast or iterative.
- Step 207 Re-group the nodes included in the parameter collection group and the parameter distribution group when the preset condition is met.
- the preset condition may be the elapse of a certain period of time, the completion of a certain number of model parameter fusions, or the completion of a certain number of iterative calculations, and the like, which is not limited by the embodiment of the present invention.
- when the execution subject is a device other than the parameter collection group, such as a parameter server, the parameter server directly regroups the nodes included in the parameter collection group and the parameter distribution group when the preset condition is met; when the execution subject is a node in the parameter collection group, such as a control node, the control node regroups the nodes included in the parameter collection group and the parameter distribution group.
- regrouping the nodes included in the parameter collection group and the parameter distribution group includes: based on a preset correspondence between node identifiers and node numbers, and on the numbers of parameter collection groups and parameter distribution groups, dividing the node number corresponding to each node identifier by the number of parameter collection groups to obtain the node's collection-group remainder, and dividing the node number by the number of parameter distribution groups to obtain the node's distribution-group remainder;
- nodes with the same collection-group remainder are determined as the same parameter collection group, and nodes with the same distribution-group remainder are determined as the same parameter distribution group.
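A minimal sketch of this remainder-based regrouping, assuming integer node numbers; the function name and dictionary layout are illustrative.

```python
def regroup(node_numbers, n_collect, n_distribute):
    """Assign each node number k to parameter collection group (k % n_collect)
    and parameter distribution group (k % n_distribute), as described above."""
    collect, distribute = {}, {}
    for k in node_numbers:
        collect.setdefault(k % n_collect, []).append(k)
        distribute.setdefault(k % n_distribute, []).append(k)
    return collect, distribute
```

Because membership depends only on the node number and the group counts, every participant can recompute the grouping locally whenever the preset condition triggers, without exchanging group lists.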
- alternatively, the nodes included in the parameter collection group and the parameter distribution group may be regrouped according to the node grouping method provided in the following embodiments, and details are not described herein again.
- After regrouping, return to step 202 to continue the iterative calculation based on the data subset and the current model parameters until the final model parameters are output.
- when a new node joins, the parameter server allocates a lowest-layer parameter server for the newly added node, and the IP address of that server is sent to the newly added node.
- the newly added node obtains a data subset from the storage server and performs iterative calculation based on the received model parameters and the data subset.
- alternatively, the control node allocates, for the newly added node, the IP address of another node that previously participated in the iterative calculation; that node sends the model parameters to the newly added node, the newly added node obtains a data subset from the storage server, and the newly added node performs iterative calculation based on the received model parameters and the data subset.
- in the model parameter fusion method provided by this embodiment of the present invention, a parameter collection group performs intra-group fusion to obtain a first model parameter and sends the first model parameter to the parameter distribution group corresponding to the parameter collection group; the first model parameters of each parameter collection group in the W parameter collection groups are fused to obtain second model parameters; the W parameter collection groups are then fused between the groups to obtain a third model parameter; and the nodes are regrouped when the preset condition is met. This solves the problems of high performance requirements and large data transmission volume in model parameter fusion, and allows computing resources to be adjusted dynamically.
- An embodiment of the present invention provides a node grouping method, which is applied to a machine learning system, where the machine learning system includes at least two nodes, and the method includes: grouping the nodes in the machine learning system such that the machine learning system includes at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponds to at least one parameter distribution group, and at least one parameter collection group includes a node that is different from the nodes included in the parameter distribution group corresponding to that parameter collection group.
- Each parameter collection group corresponds to at least one parameter distribution group, that is, one parameter collection group may correspond to one parameter distribution group, or corresponding to multiple parameter distribution groups.
- that at least one parameter collection group includes a node different from the parameter distribution group corresponding to it means that the nodes included in at least one parameter collection group are not identical to the nodes included in the corresponding parameter distribution group.
- specifically, at least one node in the parameter collection group may be different from the nodes in the parameter distribution group corresponding to the parameter collection group, or all nodes included in the parameter collection group may be different from all nodes included in the parameter distribution group corresponding to the parameter collection group.
- the number of nodes of different parameter collection groups is the same or different; and/or,
- the number of nodes in different parameter distribution groups is the same or different; and/or,
- the number of nodes of a parameter collection group is the same as or different from the number of nodes of the parameter distribution group corresponding to the parameter collection group.
- the machine learning system may further include a parameter server; a parameter collection group and the parameter distribution group corresponding to it correspond to the same parameter server, and different parameter collection groups, together with their corresponding parameter distribution groups, correspond to different parameter servers.
- the parameter server includes Y layers, one parameter server of the (j+1)th layer corresponds to at least one parameter server of the jth layer, and the parameter collection group and the parameter distribution group corresponding to the parameter collection group correspond to a layer-1 parameter server, where 1 ≤ j < j+1 ≤ Y.
- FIG. 3 is a schematic diagram of the parameter servers when Y is equal to 2: parameter server 1 corresponds to parameter server 2 and parameter server 3, and the parameter collection group composed of node 1, node 2, node 3, node 4, and node 5, together with the parameter distribution group corresponding to that parameter collection group, corresponds to the first-layer parameter servers 1 and 2.
- grouping nodes in the machine learning system includes the following steps.
- Step 301 Establish a correspondence between the node identifier and the node number.
- the node identifier is used to uniquely identify the node.
- the node identifier may be an IP address of the node, a sequence code of the node, and the like.
- the node number may be a sequence number that is randomly assigned to the node, or may be any value that is randomly assigned to the node, etc., and the present invention is not limited thereto.
- the node identifier is the IP address of the node.
- assuming the IP address of each node is as shown in Table 1 below, the correspondence between node identifiers and node numbers shown in Table 1 is established.
- Step 302 Determine the number of the parameter collection groups and the number of the parameter distribution groups.
- the number of parameter collection groups is 2, and the number of parameter distribution groups is 3.
- Step 303 Determine a parameter collection group and a parameter distribution group based on the correspondence between the node identifier and the node number, the number of the parameter collection groups, and the number of the parameter distribution groups.
- determining the parameter collection groups and the parameter distribution groups based on the correspondence between node identifiers and node numbers, the number of parameter collection groups, and the number of parameter distribution groups may include: dividing the node number corresponding to each node identifier by the number of parameter collection groups to obtain the node's collection-group remainder; dividing the node number corresponding to each node identifier by the number of parameter distribution groups to obtain the node's distribution-group remainder; and determining nodes with the same collection-group remainder as the same parameter collection group, and nodes with the same distribution-group remainder as the same parameter distribution group.
- dividing each node number shown in Table 1 by the number of parameter collection groups, 2, gives the collection-group remainders: the remainder for node numbers 0, 2, and 4 is 0, and the remainder for node numbers 1, 3, and 5 is 1. Dividing each node number shown in Table 1 by the number of parameter distribution groups, 3, gives the distribution-group remainders: the remainder for node numbers 0 and 3 is 0, the remainder for node numbers 1 and 4 is 1, and the remainder for node numbers 2 and 5 is 2. The nodes with collection-group remainder 0 are determined as parameter collection group 0 and the nodes with collection-group remainder 1 as parameter collection group 1; similarly, parameter distribution group 0, parameter distribution group 1, and parameter distribution group 2 are obtained.
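The worked example above can be checked directly with a few lines of code; the node numbers are those of Table 1 (IP addresses omitted), and the group counts come from step 302.

```python
# Node numbers 0..5 from Table 1; 2 parameter collection groups, 3 parameter distribution groups
node_numbers = [0, 1, 2, 3, 4, 5]
collection = {g: [n for n in node_numbers if n % 2 == g] for g in range(2)}
distribution = {g: [n for n in node_numbers if n % 3 == g] for g in range(3)}
print(collection)    # {0: [0, 2, 4], 1: [1, 3, 5]}
print(distribution)  # {0: [0, 3], 1: [1, 4], 2: [2, 5]}
```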
- Step 304 Determine a correspondence between the parameter collection group and the parameter distribution group.
- the correspondence between the two may be determined based on the determined parameter collection group and the parameter distribution group. For example, it is determined that the parameter collection group 0 corresponds to the parameter distribution group 1 and the parameter distribution group 2, and the parameter collection group 1 corresponds to the parameter distribution group 0.
- the node number of each node may change every time the nodes are grouped, the numbers of parameter collection groups and parameter distribution groups may also change, and the correspondence between the parameter collection groups and the parameter distribution groups can change accordingly.
- An embodiment of the present invention provides a node grouping method that groups the nodes in a machine learning system such that the machine learning system includes at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponds to at least one parameter distribution group, and at least one parameter collection group includes a node that is different from the nodes included in the parameter distribution group corresponding to that parameter collection group.
- FIG. 5 is a schematic diagram of a model parameter fusion device applied to a machine learning system, where the machine learning system includes at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponds to at least one parameter distribution group, each parameter collection group includes at least one node, each parameter distribution group includes at least one node, and at least one parameter collection group includes a node that is different from the nodes included in the corresponding parameter distribution group. The device includes:
- the first fusion unit 401 is configured to: when any parameter collection group satisfies the intra-group fusion condition, fuse the model parameters of M nodes in the parameter collection group that satisfies the condition to obtain the first model parameter of that parameter collection group, where the minimum number of fusion nodes s of the parameter collection group satisfying the condition satisfies s ≤ M ≤ the total number of nodes included in that parameter collection group;
- the first sending unit 402 is configured to send, to N nodes in the parameter distribution group corresponding to the parameter collection group that satisfies the condition, the first model parameter of the parameter collection group that satisfies the condition, where 1 ≤ N ≤ the total number of nodes included in the parameter distribution group corresponding to the parameter collection group that satisfies the condition.
- the intra-group fusion condition may be that the number of nodes in the parameter collection group that completes the iterative calculation of the current model parameter reaches a preset value, that is, the minimum number of fusion nodes s.
- specifically, when the intra-group fusion condition is met, the first fusion unit selects M nodes that have completed the calculation of the current model parameters from the parameter collection group and fuses the model parameters calculated by the M nodes to obtain the first model parameter. Then, when the intra-group distribution condition is met, the first sending unit sends the first model parameter to all or some of the nodes in the corresponding parameter distribution group, based on the correspondence between the parameter collection group and the parameter distribution group.
- the minimum number of fusion nodes s, as well as M and N, can be set in advance, where s ≤ M ≤ the total number of nodes included in the parameter collection group, and 1 ≤ N ≤ the total number of nodes included in the parameter distribution group corresponding to the parameter collection group.
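A minimal sketch of the intra-group fusion condition and the fusion itself; averaging as the fusion rule, the dictionary layout, and the node names are illustrative assumptions.

```python
def intra_group_fuse(finished, s):
    """Fuse once the intra-group fusion condition holds: at least s nodes in
    the collection group have finished the current iteration. All M finished
    nodes take part (s <= M <= group size); averaging is an assumed rule."""
    if len(finished) < s:
        return None                       # condition not met yet
    m = len(finished)                     # M nodes participate
    params = list(finished.values())
    return [sum(vals) / m for vals in zip(*params)]

# s = 2; three nodes have finished, so M = 3
first_param = intra_group_fuse({"n0": [1.0], "n1": [3.0], "n3": [5.0]}, s=2)
# first_param == [3.0]
```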
- the number of parameter collection groups included in the machine learning system, the number of nodes included in each parameter collection group, the number of parameter distribution groups corresponding to each parameter collection group, and the number of nodes included in each parameter distribution group can be determined in advance.
- the nodes included in at least one parameter collection group being different from the nodes included in the corresponding parameter distribution group means that the nodes included in at least one parameter collection group are not completely the same as the nodes included in the corresponding parameter distribution group; the parameter collection group may include at least one node different from the nodes in the parameter distribution group corresponding to the parameter collection group, or all nodes included in the parameter collection group may be different from all nodes included in the parameter distribution group corresponding to the parameter collection group.
- the address information of the nodes participating in the fusion of the first model parameter may also be sent to the nodes in the parameter distribution group, where the address information may be the IP address of a node, the node number, or the like, which is not limited in the present invention.
- the first fusion unit 401 includes:
- a receiving module configured to receive model parameters of M nodes sent by the M nodes that complete the iteration in the parameter collection group that meets the condition
- a fusion module configured to fuse the received model parameters of the M nodes to obtain the first model parameter of the parameter collection group that satisfies the condition.
- the fusion module can fuse the model parameters corresponding to the M nodes using different fusion modes to obtain the first model parameter. For example, the fusion module may fuse the model parameters corresponding to the M nodes at one time to obtain the first model parameter; or each node may send its model parameters to the fusion module after completing its iteration, with the fusion module receiving and fusing parameters over multiple rounds until the M nodes complete the fusion and the first model parameter is obtained, which is not limited in the embodiment of the present invention.
- alternatively, the first fusion unit 401 includes:
- an obtaining module configured to obtain the state information of the nodes in the parameter collection group that satisfies the condition, where the node state information may include the node identifier and the order in which the nodes complete the iteration;
- the indication module is configured to: according to the state information of the node in the parameter collection group that satisfies the condition, instruct the M nodes that complete the iteration in the parameter collection group that meet the condition to perform model parameter fusion, and obtain the parameter collection group that satisfies the condition The first model parameter.
- the indication module may instruct the M nodes that complete the iteration to fuse in different combinations.
- for example, the indication module may instruct the M nodes to send their corresponding model parameters to one of the nodes, which performs the fusion to obtain the first model parameter; or the indication module may perform the fusion in the specific manner described below, improving the efficiency of fusing the M nodes into the first model parameter. Of course, the indication module can also indicate the fusion by other combinations, which is not limited by the embodiment of the present invention.
- the indication module is specifically configured to:
- instruct one of the s nodes that complete the iteration to fuse the model parameters of the s nodes; this node may be referred to as the fusion node.
- that is, the indication module uses one of the s nodes as the fusion node, the remaining nodes each send the model parameters obtained in the current iteration to the fusion node, and the fusion node fuses the model parameters corresponding to the s nodes.
- the fusion node may be the last node that completes the iteration, or the node with the smallest node number, which is not limited in this embodiment of the present invention.
- depending on the relationship between the number of newly added nodes and s, two cases can be distinguished:
- in the first case, x nodes are newly added: the indication module instructs one of the newly added x nodes to fuse the model parameters of the newly added x nodes together with the model parameters obtained after the s nodes were fused;
- in the second case, y nodes are newly added: the indication module instructs one of the newly added y nodes to fuse the model parameters of the y nodes, and the fused model parameters of the y nodes are then fused with the model parameters obtained after the s nodes were fused.
- subsequently, the indication module may instruct the remaining nodes to continue fusing their model parameters using the methods provided in the foregoing two cases, to improve the efficiency of fusing the model parameters of the M nodes; the fusion may also be performed by other means, which is not limited by the embodiment of the present invention.
- the one node selected among the newly added nodes may be the node with the smallest node number among the newly added nodes, or the node that completes its iteration last, which is not limited by the embodiment of the present invention.
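The incremental scheme above, where late-finishing nodes are folded into an already-fused result, can be sketched as a weighted running average so that the result stays the mean over all contributors. The averaging rule and all names are illustrative assumptions; the patent leaves the fusion formula open.

```python
def incremental_merge(fused, count, new_params):
    """Fold newly finished nodes' parameters into an existing fused result,
    keeping it equal to the mean over all contributors so far."""
    total = count + len(new_params)
    merged = [(f * count + sum(vals)) / total
              for f, vals in zip(fused, zip(*new_params))]
    return merged, total

# s = 2 nodes were fused first (their mean is 2.0); x = 1 node arrives later with 5.0
merged, n = incremental_merge([2.0], 2, [[5.0]])
# merged == [3.0], n == 3, since (2.0 * 2 + 5.0) / 3 == 3.0
```

Because each step only needs the running mean and the contributor count, the fusion node never has to retain the individual model parameters of earlier nodes.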
- the device further includes:
- a second fusion unit 403 configured to: when the inter-group fusion condition is met among the W parameter collection groups, perform overall fusion of the model parameters of the nodes in each parameter collection group of the W parameter collection groups to obtain the second model parameter of each parameter collection group in the W parameter collection groups;
- the W parameter collection groups are determined by the upper-layer parameter collection group of the W parameter collection groups, and W ≤ the total number of groups included in the upper-layer parameter collection group.
- the inter-group fusion condition may be that the number of intra-group fusions of the parameter collection group reaches a preset number of times.
- specifically, each parameter collection group in the W parameter collection groups may send its current model parameters to the second fusion unit, and the second fusion unit performs overall fusion of the current model parameters of all the nodes in each parameter collection group to obtain that group's second model parameter, thereby obtaining the second model parameter of each parameter collection group in the W parameter collection groups.
- a third fusion unit 404 configured to fuse the second model parameters of each parameter collection group in the W parameter collection groups to obtain a third model parameter;
- the second sending unit 405 is configured to send the third model parameter to a node of the W parameter collection group or to a node of the parameter distribution group of the W parameter collection group.
- the second sending unit 405 can transmit not only by broadcast but also in an iterative manner, that is, the second sending unit sends the third model parameter to one node of each parameter collection group included in the W parameter collection groups, and that node iteratively forwards the third model parameter to the other nodes in the group.
- the second sending unit may also send the third model parameter to the node in the parameter distribution group corresponding to each parameter collection group in the W parameter collection group, wherein the sending manner may be a broadcast mode or an iterative manner.
- the third fusion unit 404 is specifically configured to:
- the inter-group fusion node may be a node that is recommended by the nodes in the W parameter collection group, or may be the node that completes the iteration first, or the node with the smallest node number, which is not limited in this embodiment of the present invention.
- alternatively, the node responsible for the overall fusion in a parameter collection group may be selected; of course, in practical applications, the third fusion unit may also select another node in another parameter collection group, which is not limited by the embodiment of the present invention.
- the second model parameters of the W parameter collection groups satisfying the intra-group fusion condition are fused, that is, the second model parameter of each parameter collection group in the W parameter collection groups is fused, to obtain the third model parameter.
- the node responsible for the overall fusion in each parameter collection group may be selected, for example the node with the lowest number, which is not limited by the present invention.
- the method for model parameter fusion by the new parameter collection group is similar to the intra-group fusion method of a parameter collection group satisfying the above conditions, and is not repeated herein.
- each parameter collection group in the W parameter collection groups selects the node responsible for the overall fusion within the group, yielding W nodes, and the W nodes are determined as a new parameter collection group.
- when the new parameter collection group satisfies the intra-group fusion condition, the W second model parameters corresponding to the W nodes are fused according to the intra-group fusion mode; for example, when the intra-group fusion condition is that the number of nodes completing the overall fusion reaches a preset number, the nodes that have completed the fusion may be fused first and the result then merged with the other nodes in the group as they finish; of course, the W second model parameters corresponding to the W nodes may also be fused at one time, which is not limited in this embodiment of the present invention.
- the first sending unit 402 is specifically configured to:
- the third model parameter may be sent to the nodes in the parameter distribution group corresponding to each parameter collection group in the W parameter collection groups, or the node that finally completes the fusion may send the third model parameter to the nodes of the upper-layer parameter distribution groups of the W parameter collection groups.
- the device further includes:
- the first grouping unit 406 is configured to re-group the parameter collection group and the nodes included in the parameter distribution group when the preset condition is met.
- the preset condition may be the elapse of a certain period of time, the completion of a certain number of model parameter fusions, or a certain number of iterations, etc., which is not limited by the embodiment of the present invention.
- the regrouping of the nodes included in the parameter collection group and the parameter distribution group may also be performed by the node grouping device provided by the fourth aspect of the present invention, and details are not described herein.
- the machine learning system further includes a parameter server
- a parameter collection group and a parameter distribution group corresponding to the parameter collection group correspond to the same parameter server
- different parameter collection groups and corresponding parameter distribution groups correspond to different parameter servers.
- the parameter server includes Y layers, a parameter server of the (j+1)th layer corresponds to at least one parameter server of the jth layer, and the parameter collection group and the parameter distribution group corresponding to the parameter collection group correspond to a layer-1 parameter server.
- the first grouping unit is configured to: according to the preset correspondence between node identifiers and node numbers, and the numbers of parameter collection groups and parameter distribution groups, divide the node number corresponding to each node identifier by the number of parameter collection groups to obtain the node's collection-group remainder, and divide the node number by the number of parameter distribution groups to obtain the node's distribution-group remainder;
- nodes with the same collection-group remainder are determined as the same parameter collection group, and nodes with the same distribution-group remainder are determined as the same parameter distribution group.
- the regrouping of the nodes included in the parameter collection group and the parameter distribution group may also be performed by the node grouping device provided in Embodiment 5 below, and details are not described herein again.
- with the model parameter fusion device provided by this embodiment of the present invention, a parameter collection group performs intra-group fusion to obtain a first model parameter and sends the first model parameter to the parameter distribution group corresponding to the parameter collection group; the first model parameters of each parameter collection group in the W parameter collection groups are fused to obtain second model parameters; the W parameter collection groups are then fused between the groups to obtain a third model parameter; and the nodes are regrouped when the preset condition is met. This solves the problems of high performance requirements and large data transmission volume in model parameter fusion, and allows computing resources to be adjusted dynamically.
- An embodiment of the present invention provides a node grouping apparatus, which is applied to a machine learning system, where the machine learning system includes at least two nodes, and the apparatus includes:
- a second grouping unit configured to group the nodes in the machine learning system such that the machine learning system includes at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponds to at least one parameter distribution group, and at least one parameter collection group includes nodes that are different from the nodes included in the parameter distribution group corresponding to that parameter collection group.
- Each parameter collection group corresponds to at least one parameter distribution group, that is, one parameter collection group may correspond to one parameter distribution group, or corresponding to multiple parameter distribution groups.
- that at least one parameter collection group includes a node different from the parameter distribution group corresponding to it means that the nodes included in at least one parameter collection group are not identical to the nodes included in the corresponding parameter distribution group.
- specifically, the parameter collection group may include at least one node different from the nodes in the parameter distribution group corresponding to the parameter collection group, or all nodes included in the parameter collection group may be different from all nodes included in the parameter distribution group corresponding to the parameter collection group.
- the number of nodes of different parameter collection groups is the same or different; and/or,
- the number of nodes in different parameter distribution groups is the same or different; and/or,
- the number of nodes of a parameter collection group is the same as or different from the number of nodes of the parameter distribution group corresponding to the parameter collection group.
- the machine learning system further includes parameter servers; a parameter collection group and its corresponding parameter distribution group correspond to the same parameter server, and different parameter collection groups and their corresponding parameter distribution groups correspond to different parameter servers.
- the parameter servers include Y layers, one parameter server at the (j+1)th layer corresponds to at least one parameter server at the jth layer, and the parameter collection group, together with the parameter distribution group corresponding to that parameter collection group, corresponds to a first-layer parameter server, where 1 ≤ j < j+1 ≤ Y.
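The layered correspondence described here can be sketched as a simple fan-in tree. This is an illustrative sketch only, not part of the patent: the function name, the fan-in of 2, and the use of integer indices for the servers are all hypothetical assumptions.

```python
def build_hierarchy(num_layer1_servers, fan_in=2):
    """Return, per layer transition, a map from each upper-layer server
    index to the list of lower-layer server indices it corresponds to.

    Layer 1 has one parameter server per collection/distribution group pair;
    each layer-(j+1) server corresponds to at least one layer-j server.
    """
    mappings = []
    lower = list(range(num_layer1_servers))
    while len(lower) > 1:
        # each upper-layer server aggregates up to `fan_in` lower-layer servers
        mapping = {u: lower[i:i + fan_in]
                   for u, i in enumerate(range(0, len(lower), fan_in))}
        mappings.append(mapping)
        lower = list(mapping)  # the upper layer becomes the next lower layer
    return mappings
```

For example, with four layer-1 servers and a fan-in of 2, the sketch yields a three-layer hierarchy (Y = 3): two layer-2 servers each covering two layer-1 servers, and one layer-3 server covering both layer-2 servers.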
- when the parameter servers include Y layers, the number of parameter servers at each layer and the correspondence between lower-layer and upper-layer parameter servers can be determined.
- the correspondence between the lower-layer parameter servers and the upper-layer parameter servers may be set in advance, or may be determined during the node grouping process.
- the correspondence between the lower-layer and upper-layer parameter servers may be determined in the same way as the parameter collection groups or the parameter distribution groups; for the specific method, refer to the method for determining the parameter collection group or the parameter distribution group described below, and details are not described herein again.
- the second grouping unit specifically includes:
- a first determining module configured to determine a correspondence between node identifiers and node numbers;
- a second determining module configured to determine the number of parameter collection groups and the number of parameter distribution groups;
- a third determining module configured to determine the parameter collection groups and the parameter distribution groups based on the correspondence between node identifiers and node numbers, the number of parameter collection groups, and the number of parameter distribution groups; and
- a fourth determining module configured to determine the correspondence between the parameter collection groups and the parameter distribution groups.
- the node identifier is used to uniquely identify the node.
- the node identifier may be, for example, the IP address of the node or a serial number of the node.
- the node number may be a sequence number randomly assigned to the node, or any value randomly assigned to the node; the present invention is not limited in this respect.
- the node number of each node can change; the numbers of parameter collection groups and parameter distribution groups can also vary, and the correspondence between the parameter collection groups and the parameter distribution groups can change accordingly.
- the third determining module is specifically configured to:
- divide the node number corresponding to each node identifier by the number of parameter collection groups to obtain the node's collection-group remainder, and divide the node number by the number of parameter distribution groups to obtain the node's distribution-group remainder; nodes having the same collection-group remainder are determined to be the same parameter collection group, and nodes having the same distribution-group remainder are determined to be the same parameter distribution group.
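The remainder-based grouping performed by the third determining module can be sketched as follows. This is a minimal sketch under stated assumptions: node numbers are plain integers, the two group counts are given, and the function name and sample values are illustrative rather than taken from the patent.

```python
def group_nodes(node_numbers, num_collection_groups, num_distribution_groups):
    """Assign each node to a parameter collection group and a parameter
    distribution group by the remainder of its node number."""
    collection_groups = {}
    distribution_groups = {}
    for num in node_numbers:
        # nodes with the same collection-group remainder form one collection group
        collection_groups.setdefault(num % num_collection_groups, []).append(num)
        # nodes with the same distribution-group remainder form one distribution group
        distribution_groups.setdefault(num % num_distribution_groups, []).append(num)
    return collection_groups, distribution_groups

# six nodes, two collection groups, three distribution groups (illustrative)
collection_groups, distribution_groups = group_nodes([0, 1, 2, 3, 4, 5], 2, 3)
```

Note how a node can belong to a collection group and a distribution group with different memberships, which is exactly what allows a collection group to contain nodes different from its corresponding distribution group.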
- An embodiment of the present invention provides a node grouping apparatus. By grouping the nodes in a machine learning system so that the machine learning system includes at least one parameter collection group and at least one parameter distribution group, where each parameter collection group corresponds to at least one parameter distribution group and at least one parameter collection group includes nodes different from those of its corresponding parameter distribution group, the apparatus solves the problems of high performance requirements on the parameter server and of dynamically adjusting computing resources during model parameter fusion.
- FIG. 8 is a schematic diagram of a model parameter fusion device according to an embodiment of the present invention. The model parameter fusion device includes a memory 801, a processor 802, a power component 803, an input/output interface 804, a communication component 805, and the like. The processor 802 is configured to execute the model parameter fusion method described in the second embodiment above.
- the model parameter fusion device may include more or fewer components than those shown in FIG. 8, or have a different configuration from that shown in FIG. 8.
- the memory 801 can be used to store data, software programs, and modules. It mainly includes a storage program area and a storage data area: the storage program area can store an operating system, an application required for at least one function, and the like, while the storage data area can store data created according to the use of the model parameter fusion device, and the like.
- the memory may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or another non-volatile solid-state storage device.
- the processor 802 is the control center of the model parameter fusion device. It connects the various parts of the entire model parameter fusion device using various interfaces and lines, and performs the various functions of the device and processes data by running or executing the software programs and/or modules stored in the memory 801 and recalling the data stored in the memory 801, thereby monitoring the model parameter fusion device as a whole.
- the processor 802 may include one or more processing units. Preferably, the processor 802 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, applications, and the like, while the modem processor mainly handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 802.
- the power component 803 provides power to the various components of the model parameter fusion device and may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the model parameter fusion device.
- the input/output interface 804 provides an interface between the processor 802 and the peripheral interface module.
- the peripheral interface module can be a keyboard, a mouse, or the like.
- Communication component 805 is configured to facilitate wired or wireless communication between the model parameter fusion device and other devices.
- the model parameter fusion device can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof.
- the model parameter fusion device may further include an audio component, a multimedia component, and the like, which are not described herein again.
- the model parameter fusion device is a parameter server, and the parameter server is set up independently of the nodes or configured on a node.
- The model parameter fusion device provided by this embodiment of the present invention obtains a first model parameter by performing intra-group fusion within a parameter collection group and sends the first model parameter to the parameter distribution group corresponding to that parameter collection group; the model parameters of the nodes in each of W parameter collection groups are then fused to obtain second model parameters, after which the second model parameters of the W parameter collection groups are fused across groups to obtain a third model parameter, and the nodes are regrouped when a preset condition is met. This solves the problems of high performance requirements and large data transmission volume in model parameter fusion.
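The two-level fusion flow described above can be illustrated with a short sketch. Note that the patent does not fix a concrete fusion operator; the element-wise average used here, along with the sample values and W = 2 groups, is an assumption for illustration only.

```python
def fuse(param_vectors):
    """Fuse a list of model-parameter vectors by element-wise averaging
    (one common choice of fusion operator; assumed here, not mandated)."""
    n = len(param_vectors)
    return [sum(vals) / n for vals in zip(*param_vectors)]

# W = 2 parameter collection groups, each holding its nodes' model parameters
groups = [
    [[1.0, 2.0], [3.0, 4.0]],   # nodes of collection group 0
    [[5.0, 6.0], [7.0, 8.0]],   # nodes of collection group 1
]

second = [fuse(g) for g in groups]  # intra-group fusion: one second model parameter per group
third = fuse(second)                # inter-group fusion across the W groups: third model parameter
```

The design point is that only one fused vector per group crosses group boundaries, which is what keeps the inter-group data transmission volume small.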
- FIG. 9 is a schematic diagram of a model parameter fusion device according to an embodiment of the present invention.
- the device includes a memory 901, a processor 902, a power component 903, an input/output interface 904, a communication component 905, and the like.
- the processor 902 is configured to perform the node grouping method described in the third embodiment.
- FIG. 9 is merely illustrative and does not limit the structure of the controller.
- the controller may also include more or fewer components than shown in FIG. 9, or have a different configuration than that shown in FIG.
- the memory 901 can be used to store data, software programs, and modules. It mainly includes a storage program area and a storage data area: the storage program area can store an operating system, an application required for at least one function, and the like, while the storage data area can store data created according to the use of the controller, and the like. Further, the memory may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or another non-volatile solid-state storage device.
- the processor 902 is the control center of the controller. It connects the various parts of the entire controller using various interfaces and lines, and performs the various functions of the controller and processes data by running or executing the software programs and/or modules stored in the memory 901 and recalling the data stored in the memory 901, thereby monitoring the controller as a whole.
- the processor 902 may include one or more processing units. Preferably, the processor 902 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, applications, and the like, while the modem processor mainly handles wireless communications. It can be understood that the modem processor may not be integrated into the processor 902.
- the power component 903 provides power to the various components of the controller and may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the controller.
- the input/output interface 904 provides an interface between the processor 902 and the peripheral interface module.
- the peripheral interface module can be a keyboard, a mouse, or the like.
- Communication component 905 is configured to facilitate wired or wireless communication between the controller and other devices.
- the controller can access a wireless network based on communication standards such as WiFi, 2G or 3G, or a combination thereof.
- the controller may further include an audio component (e.g., a speaker, a microphone), a multimedia component (e.g., a graphics processing unit), and the like, which are not described herein again in this embodiment of the present invention.
- With the controller provided by this embodiment of the present invention, the nodes in the machine learning system are grouped so that the machine learning system includes at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponds to at least one parameter distribution group, and at least one parameter collection group includes nodes different from those of its corresponding parameter distribution group, thereby solving the problems of high performance requirements on the parameter server and of dynamically adjusting computing resources during model parameter fusion.
- An embodiment of the present invention provides a machine learning system, which includes the model parameter fusion device described in Embodiment 6 and the controller described in Embodiment 7.
- In the machine learning system, the model parameter fusion device obtains a first model parameter by performing intra-group fusion within a parameter collection group and sends the first model parameter to the parameter distribution group corresponding to that parameter collection group; the model parameters of the nodes in each of W parameter collection groups are then fused to obtain second model parameters, and the second model parameters of the W parameter collection groups are fused across groups to obtain a third model parameter. When a preset condition is met, the controller regroups the nodes in the parameter collection groups and the parameter distribution groups, which solves the problems of high performance requirements, large data transmission volume, and dynamic adjustment of computing resources in model parameter fusion.
Description
Node identifier | Node number |
---|---|
192.168.1.1 | 2 |
192.168.1.2 | 0 |
192.168.1.3 | 3 |
192.168.1.4 | 1 |
192.168.1.4 | 5 |
192.168.1.4 | 4 |
Claims (24)
- A model parameter fusion method, wherein the method is applied to a machine learning system, the machine learning system comprises at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponds to at least one parameter distribution group, each parameter collection group comprises at least one node, each parameter distribution group comprises at least one node, and at least one parameter collection group comprises nodes different from the nodes comprised in its corresponding parameter distribution group, the method comprising: when any parameter collection group meets an intra-group fusion condition, fusing model parameters of M nodes in the condition-meeting parameter collection group to obtain a first model parameter of the condition-meeting parameter collection group, wherein the minimum fusion node count s of the condition-meeting parameter collection group ≤ M ≤ the total number of nodes comprised in the condition-meeting parameter collection group; and sending the first model parameter of the condition-meeting parameter collection group to N nodes in the parameter distribution group corresponding to the condition-meeting parameter collection group, wherein 1 ≤ N ≤ the total number of nodes comprised in the parameter distribution group corresponding to the condition-meeting parameter collection group.
- The method according to claim 1, wherein the fusing the model parameters of the M nodes in the condition-meeting parameter collection group to obtain the fused first model parameter of the condition-meeting parameter collection group comprises: receiving the model parameters of the M nodes sent by the M nodes that have completed an iteration in the condition-meeting parameter collection group; and fusing the received model parameters of the M nodes to obtain the first model parameter of the condition-meeting parameter collection group.
- The method according to claim 1, wherein the fusing the model parameters of the M nodes in the condition-meeting parameter collection group to obtain the fused first model parameter of the condition-meeting parameter collection group comprises: obtaining status information of the nodes in the condition-meeting parameter collection group; and instructing, according to the status information of the nodes in the condition-meeting parameter collection group, the M nodes that have completed an iteration in the condition-meeting parameter collection group to perform model parameter fusion, to obtain the first model parameter of the condition-meeting parameter collection group.
- The method according to claim 3, wherein the instructing, according to the status information of the nodes in the condition-meeting parameter collection group, the M nodes that have completed an iteration in the condition-meeting parameter collection group to perform model parameter fusion comprises: determining, according to the status information of the nodes in the condition-meeting parameter collection group, s nodes that have completed an iteration in the condition-meeting parameter collection group; instructing one of the s nodes that have completed the iteration to fuse the model parameters of the s nodes; if x additional nodes complete an iteration while the s nodes that have completed the iteration are performing model parameter fusion, instructing one of the x additional nodes to fuse the model parameters of the x additional nodes with the fused model parameter of the s nodes, wherein x < s; and if y additional nodes complete an iteration while the s nodes that have completed the iteration are performing model parameter fusion, instructing one of the y additional nodes to fuse the model parameters of the y nodes and then fuse the fused model parameter of the y nodes with the fused model parameter of the s nodes again, wherein y ≥ s.
- The method according to any one of claims 1 to 4, wherein the method further comprises: when an inter-group fusion condition is met among W parameter collection groups, separately fusing the model parameters of the nodes in each of the W parameter collection groups to obtain a second model parameter of each of the W parameter collection groups; fusing the second model parameters of each of the W parameter collection groups to obtain a third model parameter; and sending the third model parameter to the nodes of the W parameter collection groups or to the nodes of a parameter distribution group one layer above the W parameter collection groups.
- The method according to claim 5, wherein the fusing the second model parameters of each of the W parameter collection groups to obtain the third model parameter comprises: determining one node from the W parameter collection groups as an inter-group fusion node, and selecting, from each parameter collection group of the W parameter collection groups other than the parameter collection group in which the inter-group fusion node is located, one node to send the second model parameter of the corresponding parameter collection group to the inter-group fusion node, so that the inter-group fusion node fuses the second model parameters of the W parameter collection groups to obtain the third model parameter; or determining one node from each of the W parameter collection groups, determining the determined nodes as a new parameter collection group, and, when the new parameter collection group meets an intra-group fusion condition, fusing the second model parameters of the W parameter collection groups that meet the intra-group fusion condition to obtain the third model parameter.
- The method according to claim 1, wherein the sending the first model parameter of the condition-meeting parameter collection group to the N nodes in the parameter distribution group corresponding to the condition-meeting parameter collection group comprises: sending, by broadcast, the first model parameter of the condition-meeting parameter collection group to the nodes in the parameter distribution group corresponding to the condition-meeting parameter collection group; or sending the first model parameter of the condition-meeting parameter collection group to a first node in the parameter distribution group corresponding to the condition-meeting parameter collection group, so that the first node sends, in an iterative manner, the first model parameter of the condition-meeting parameter collection group to the remaining nodes of the N nodes other than the first node.
- The method according to claim 1 or 2, wherein the machine learning system further comprises parameter servers, one parameter collection group and its corresponding parameter distribution group correspond to a same parameter server, and different parameter collection groups and their corresponding parameter distribution groups correspond to different parameter servers.
- The method according to claim 8, wherein the parameter servers comprise Y layers, one parameter server at the (j+1)th layer corresponds to at least one parameter server at the jth layer, and the parameter collection group and the parameter distribution group corresponding to the parameter collection group correspond to a first-layer parameter server, wherein 1 ≤ j < j+1 ≤ Y.
- The method according to any one of claims 1 to 9, wherein the method further comprises: regrouping the nodes comprised in the parameter collection group and the parameter distribution group when a preset condition is met.
- The method according to claim 10, wherein the regrouping the nodes comprised in the parameter collection group and the parameter distribution group comprises: based on a preset correspondence between node identifiers and node numbers, the number of parameter collection groups, and the number of parameter distribution groups, dividing the node number corresponding to a node identifier by the number of parameter collection groups to obtain a collection-group remainder of the node; dividing the node number corresponding to the node identifier by the number of parameter distribution groups to obtain a distribution-group remainder of the node; and determining nodes having a same collection-group remainder as a same parameter collection group, and determining nodes having a same distribution-group remainder as a same parameter distribution group.
- A model parameter fusion apparatus, wherein the apparatus is applied to a machine learning system, the machine learning system comprises at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponds to at least one parameter distribution group, each parameter collection group comprises at least one node, each parameter distribution group comprises at least one node, and at least one parameter collection group comprises nodes different from the nodes comprised in its corresponding parameter distribution group, the apparatus comprising: a first fusion unit configured to, when any parameter collection group meets an intra-group fusion condition, fuse model parameters of M nodes in the condition-meeting parameter collection group to obtain a first model parameter of the condition-meeting parameter collection group, wherein the minimum fusion node count s of the condition-meeting parameter collection group ≤ M ≤ the total number of nodes comprised in the condition-meeting parameter collection group; and a first sending unit configured to send the first model parameter of the condition-meeting parameter collection group to N nodes in the parameter distribution group corresponding to the condition-meeting parameter collection group, wherein 1 ≤ N ≤ the total number of nodes comprised in the parameter distribution group corresponding to the condition-meeting parameter collection group.
- The apparatus according to claim 12, wherein the first fusion unit comprises: a receiving module configured to receive the model parameters of the M nodes sent by the M nodes that have completed an iteration in the condition-meeting parameter collection group; and a fusion module configured to fuse the received model parameters sent by the M nodes to obtain the first model parameter of the condition-meeting parameter collection group.
- The apparatus according to claim 12, wherein the first fusion unit comprises: an obtaining module configured to obtain status information of the nodes in the condition-meeting parameter collection group; and an instruction module configured to instruct, according to the status information of the nodes in the condition-meeting parameter collection group, the M nodes that have completed an iteration in the condition-meeting parameter collection group to perform model parameter fusion, to obtain the first model parameter of the parameter collection group.
- The apparatus according to claim 14, wherein the instruction module is specifically configured to: determine, according to the status information of the nodes in the parameter collection group, s nodes that have completed an iteration in the parameter collection group; instruct one of the s nodes that have completed the iteration to fuse the model parameters of the s nodes; if x additional nodes complete an iteration while the s nodes that have completed the iteration are performing model parameter fusion, instruct one of the x additional nodes to fuse the model parameters of the x additional nodes with the fused model parameter of the s nodes, wherein x < s; and if y additional nodes complete an iteration while the s nodes that have completed the iteration are performing model parameter fusion, instruct one of the y additional nodes to fuse the model parameters of the y nodes and then fuse the fused model parameter of the y nodes with the fused model parameter of the s nodes again, wherein y ≥ s.
- The apparatus according to any one of claims 12 to 15, wherein the apparatus further comprises: a second fusion unit configured to, when an inter-group fusion condition is met among W parameter collection groups, separately fuse the model parameters of the nodes in each of the W parameter collection groups to obtain a second model parameter of each of the W parameter collection groups; a third fusion unit configured to fuse the second model parameters of each of the W parameter collection groups to obtain a third model parameter; and a second sending unit configured to send the third model parameter to the nodes of the W parameter collection groups or to the nodes of a parameter distribution group one layer above the W parameter collection groups.
- The apparatus according to claim 16, wherein the third fusion unit is specifically configured to: determine one node from the W parameter collection groups as an inter-group fusion node, and select, from each parameter collection group of the W parameter collection groups other than the parameter collection group in which the inter-group fusion node is located, one node to send the second model parameter of the corresponding parameter collection group to the inter-group fusion node, so that the inter-group fusion node fuses the second model parameters of the W parameter collection groups to obtain the third model parameter; or determine one node from each parameter collection group comprised in the W parameter collection groups, determine the determined nodes as a new parameter collection group, and, when the new parameter collection group meets an intra-group fusion condition, fuse the second model parameters of the W parameter collection groups that meet the intra-group fusion condition to obtain the third model parameter.
- The apparatus according to claim 12, wherein the first sending unit is specifically configured to: send, by broadcast, the first model parameter of the parameter collection group to the nodes in the parameter distribution group corresponding to the condition-meeting parameter collection group; or send the first model parameter of the condition-meeting parameter collection group to a first node in the parameter distribution group corresponding to the condition-meeting parameter collection group, so that the first node sends, in an iterative manner, the first model parameter of the condition-meeting parameter collection group to the remaining nodes of the N nodes other than the first node.
- The apparatus according to claim 12 or 13, wherein the machine learning system further comprises parameter servers, one parameter collection group and its corresponding parameter distribution group correspond to a same parameter server, and different parameter collection groups and their corresponding parameter distribution groups correspond to different parameter servers.
- The apparatus according to claim 19, wherein the parameter servers comprise Y layers, one parameter server at the (j+1)th layer corresponds to at least one parameter server at the jth layer, and the parameter collection group and the parameter distribution group corresponding to the parameter collection group correspond to a first-layer parameter server, wherein 1 ≤ j < j+1 ≤ Y.
- The apparatus according to any one of claims 12 to 20, wherein the apparatus further comprises: a first grouping unit configured to regroup the nodes comprised in the parameter collection group and the parameter distribution group when a preset condition is met.
- The apparatus according to claim 21, wherein the first grouping unit is specifically configured to: based on a preset correspondence between node identifiers and node numbers, the number of parameter collection groups, and the number of parameter distribution groups, divide the node number corresponding to a node identifier by the number of parameter collection groups to obtain a collection-group remainder of the node; divide the node number corresponding to the node identifier by the number of parameter distribution groups to obtain a distribution-group remainder of the node; and determine nodes having a same collection-group remainder as a same parameter collection group, and determine nodes having a same distribution-group remainder as a same parameter distribution group.
- A model parameter fusion apparatus, wherein the model parameter fusion apparatus comprises a processor and a memory, the memory stores code and data, the processor can run the code in the memory, and the processor is configured to perform the model parameter fusion method according to any one of claims 1 to 11.
- The apparatus according to claim 23, wherein the model parameter fusion apparatus is a parameter server, and the parameter server is disposed independently of the nodes or configured on a node.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020187017016A KR102118073B1 (ko) | 2015-11-16 | 2015-11-16 | 모델 파라미터 조합 방법 및 장치 |
EP15908513.3A EP3370159A4 (en) | 2015-11-16 | 2015-11-16 | Model parameter fusion method and apparatus |
PCT/CN2015/094722 WO2017084016A1 (zh) | 2015-11-16 | 2015-11-16 | 模型参数融合方法及装置 |
CN201580001411.5A CN107209746B (zh) | 2015-11-16 | 2015-11-16 | 模型参数融合方法及装置 |
US15/980,866 US11386350B2 (en) | 2015-11-16 | 2018-05-16 | Model parameter combination method and apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2015/094722 WO2017084016A1 (zh) | 2015-11-16 | 2015-11-16 | 模型参数融合方法及装置 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/980,866 Continuation US11386350B2 (en) | 2015-11-16 | 2018-05-16 | Model parameter combination method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017084016A1 true WO2017084016A1 (zh) | 2017-05-26 |
Family
ID=58717192
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/094722 WO2017084016A1 (zh) | 2015-11-16 | 2015-11-16 | 模型参数融合方法及装置 |
Country Status (5)
Country | Link |
---|---|
US (1) | US11386350B2 (zh) |
EP (1) | EP3370159A4 (zh) |
KR (1) | KR102118073B1 (zh) |
CN (1) | CN107209746B (zh) |
WO (1) | WO2017084016A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109447274A (zh) * | 2017-08-30 | 2019-03-08 | 第四范式(北京)技术有限公司 | 用于执行机器学习的分布式***及其方法 |
US20210271975A1 (en) * | 2019-04-10 | 2021-09-02 | Tencent Technology (Shenzhen) Company Limited | User tag generation method and apparatus, storage medium, and computer device |
US11373116B2 (en) * | 2015-11-16 | 2022-06-28 | Huawei Technologies Co., Ltd. | Model parameter fusion method and apparatus |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10521332B1 (en) * | 2018-09-28 | 2019-12-31 | Dspace Digital Signal Processing And Control Engineering Gmbh | Parametrization of a simulation model |
US11829888B2 (en) | 2019-03-27 | 2023-11-28 | International Business Machines Corporation | Modifying artificial intelligence models using model fragments |
CN110705177B (zh) * | 2019-09-29 | 2023-05-16 | 支付宝(杭州)信息技术有限公司 | 基于机器学习的终端风险评估模型的生成方法及其*** |
KR102295948B1 (ko) * | 2019-11-26 | 2021-08-30 | 한전케이디엔주식회사 | 연합 학습을 통한 인공지능 기반 보안관제 시스템 및 방법 |
CN111191792B (zh) * | 2019-12-11 | 2022-07-15 | 深圳平安医疗健康科技服务有限公司 | 数据分发方法、装置和计算机设备 |
CN111178443B (zh) * | 2019-12-31 | 2023-10-31 | 东软集团股份有限公司 | 模型参数选择、图像分类、信息识别方法及装置、设备 |
WO2021097494A2 (en) * | 2020-05-30 | 2021-05-20 | Futurewei Technologies, Inc. | Distributed training of multi-modal machine learning models |
CN114221871A (zh) * | 2021-04-09 | 2022-03-22 | 无锡江南计算技术研究所 | 一种网格化流水的全收集方法 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011129819A1 (en) * | 2010-04-13 | 2011-10-20 | Empire Technology Development Llc | Combined-model data compression |
CN104463324A (zh) * | 2014-11-21 | 2015-03-25 | 长沙马沙电子科技有限公司 | 一种基于大规模高性能集群的卷积神经网络并行处理方法 |
CN104699894A (zh) * | 2015-01-26 | 2015-06-10 | 江南大学 | 基于实时学习的高斯过程回归多模型融合建模方法 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012093899A (ja) * | 2010-10-26 | 2012-05-17 | Hitachi Ltd | 計算機システム、シミュレーション方法、及びプログラム |
US9633315B2 (en) | 2012-04-27 | 2017-04-25 | Excalibur Ip, Llc | Method and system for distributed machine learning |
CN104463424A (zh) | 2014-11-11 | 2015-03-25 | 上海交通大学 | 众包中任务最优分配方法及其*** |
CN104834709B (zh) * | 2015-04-29 | 2018-07-31 | 南京理工大学 | 一种基于负载均衡的并行余弦模式挖掘方法 |
EP3745284A1 (en) * | 2015-11-16 | 2020-12-02 | Huawei Technologies Co., Ltd. | Model parameter fusion method and apparatus |
-
2015
- 2015-11-16 CN CN201580001411.5A patent/CN107209746B/zh active Active
- 2015-11-16 WO PCT/CN2015/094722 patent/WO2017084016A1/zh active Application Filing
- 2015-11-16 EP EP15908513.3A patent/EP3370159A4/en active Pending
- 2015-11-16 KR KR1020187017016A patent/KR102118073B1/ko active IP Right Grant
-
2018
- 2018-05-16 US US15/980,866 patent/US11386350B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011129819A1 (en) * | 2010-04-13 | 2011-10-20 | Empire Technology Development Llc | Combined-model data compression |
CN104463324A (zh) * | 2014-11-21 | 2015-03-25 | 长沙马沙电子科技有限公司 | 一种基于大规模高性能集群的卷积神经网络并行处理方法 |
CN104699894A (zh) * | 2015-01-26 | 2015-06-10 | 江南大学 | 基于实时学习的高斯过程回归多模型融合建模方法 |
Non-Patent Citations (3)
Title |
---|
SEBASTIAN RIEDEL ET AL., MODEL COMBINATION FOR EVENT EXTRACTION IN BIONLP 2011, 31 December 2011 (2011-12-31), pages 51 - 55, XP055382437 * |
See also references of EP3370159A4 * |
WANG, YANG. ET AL.: "Multiple Rank Aggregation Based on Directly Optimizing Performace Measure", CHINESE JOURNAL OF COMPUTER, vol. 37, no. 8, 31 August 2014 (2014-08-31), pages 1658 - 1668, XP009506019 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11373116B2 (en) * | 2015-11-16 | 2022-06-28 | Huawei Technologies Co., Ltd. | Model parameter fusion method and apparatus |
CN109447274A (zh) * | 2017-08-30 | 2019-03-08 | 第四范式(北京)技术有限公司 | 用于执行机器学习的分布式***及其方法 |
US20210271975A1 (en) * | 2019-04-10 | 2021-09-02 | Tencent Technology (Shenzhen) Company Limited | User tag generation method and apparatus, storage medium, and computer device |
Also Published As
Publication number | Publication date |
---|---|
CN107209746B (zh) | 2019-10-22 |
KR102118073B1 (ko) | 2020-06-02 |
EP3370159A1 (en) | 2018-09-05 |
CN107209746A (zh) | 2017-09-26 |
EP3370159A4 (en) | 2018-12-26 |
KR20180082577A (ko) | 2018-07-18 |
US11386350B2 (en) | 2022-07-12 |
US20180260739A1 (en) | 2018-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017084016A1 (zh) | 模型参数融合方法及装置 | |
CN108028803B (zh) | 用于确定网络中的服务方案的拓扑的方法、控制器和*** | |
US11373116B2 (en) | Model parameter fusion method and apparatus | |
CN110069341B (zh) | 边缘计算中结合功能按需配置的有依赖关系任务的调度方法 | |
CN104052803A (zh) | 一种去中心化的分布式渲染方法及渲染*** | |
TW201717066A (zh) | 叢集運算架構的資源規劃方法、系統及裝置 | |
CN110209549B (zh) | 数据处理方法、相关装置、相关设备和*** | |
CN103327121A (zh) | 一种p2p网络资源传输方法和装置 | |
US20230281513A1 (en) | Data model training method and apparatus | |
CN105703927A (zh) | 一种资源分配方法、网络设备和网络*** | |
CN107656807A (zh) | 一种虚拟资源的自动弹性伸缩方法及装置 | |
CN106201715A (zh) | 一种任务调度方法和装置 | |
CN104883585A (zh) | 显示媒体数据的方法、设备及*** | |
CN110891087B (zh) | 一种日志传输方法、装置及电子设备和存储介质 | |
CN111711702B (zh) | 一种基于通信拓扑的分布式协同交互方法及*** | |
CN110557679A (zh) | 一种视频内容识别方法、设备、介质和*** | |
CN113254215B (zh) | 数据处理方法和装置、存储介质及电子设备 | |
CN112543354B (zh) | 业务感知的分布式视频集群高效伸缩方法和*** | |
CN109474696A (zh) | 一种网络服务方法、装置、电子设备及可读存储介质 | |
CN110362575B (zh) | 一种生成数据的全局索引的方法及装置 | |
CN106294721A (zh) | 一种集群数据统计及导出方法及装置 | |
CN113015179A (zh) | 基于深度q网络的网络资源选择方法、装置以及存储介质 | |
CN104780562B (zh) | 一种处理数据的方法、装置及*** | |
WO2024139573A1 (zh) | 阈值确定方法、装置、存储介质及电子装置 | |
EP4383665A1 (en) | System and method for finding configuration mappings in monitoring networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 15908513 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2015908513 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 20187017016 Country of ref document: KR Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1020187017016 Country of ref document: KR |