WO2017084016A1 - Model parameter fusion method and device (模型参数融合方法及装置)


Info

Publication number
WO2017084016A1
Authority
WO
WIPO (PCT)
Prior art keywords: parameter, group, node, nodes, parameter collection
Prior art date
Application number
PCT/CN2015/094722
Other languages: English (en), French (fr)
Inventor: 邵云峰, 徐君, 莫塔扎维马苏德
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to KR1020187017016A (KR102118073B1)
Priority to EP15908513.3A (EP3370159A4)
Priority to PCT/CN2015/094722 (WO2017084016A1)
Priority to CN201580001411.5A (CN107209746B)
Publication of WO2017084016A1
Priority to US15/980,866 (US11386350B2)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/10 Requirements analysis; Specification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence

Definitions

  • The present invention relates to the field of machine learning, and in particular, to a model parameter fusion method and device.
  • A model parameter refers to a parameter that describes a model and is composed of multiple constraint parameters.
  • Model parameters can be used to filter data with common features. For example, when the model parameters are image-class model parameters, different model parameters can be used to select image data of persons, animals, or faces from a body of image data. With the rapid growth of data volume and data types, more and more model parameters are needed for data screening, and these model parameters are obtained through repeated calculation and fusion over large amounts of data with common characteristics.
  • In model parameter fusion, the data is divided into multiple data subsets, which are assigned to different nodes; each node trains its assigned data subset by iterative calculation. After every one or more iterations, the model parameters obtained by the nodes from training the different data subsets are fused once, and the fused model parameters are used as the initial model parameters for the next iteration. After multiple fusions, the final total model parameters are obtained.
  • At present, there are mainly two methods for model parameter fusion. In the first method, after multiple nodes perform multiple iteration calculations on multiple data subsets, a parameter server collects and fuses the model parameters that each node obtained by training its data subset, to obtain new model parameters; each node then performs the next iteration calculation on its data subset according to the new model parameters. In the second method, after a node performs multiple iterations on the data subset assigned to it, the node sends the model parameters obtained by training that subset to designated other nodes for fusion with the model parameters of those nodes' data subsets; the node then continues its iterative calculation based on the model parameters that the other nodes transmitted after training their data subsets.
  • In the first method, the parameter server that performs the model parameter fusion has high performance requirements and is prone to downtime; in the second method, each node needs to store more data, and the data transmission volume is large.
  • Embodiments of the present invention provide a model parameter fusion method and device, to solve the problems of high performance requirements on the parameter server and a large data transmission volume in model parameter fusion.
  • A model parameter fusion method is provided. The method is applied to a machine learning system, the machine learning system comprising at least one parameter collection group and at least one parameter distribution group, where each parameter collection group corresponds to at least one parameter distribution group, each parameter collection group includes at least one node, each parameter distribution group includes at least one node, and at least one parameter collection group includes a node that is different from the nodes included in the corresponding parameter distribution group. The method includes:
  • when any parameter collection group satisfies an intra-group fusion condition, fusing the model parameters of M nodes in the parameter collection group that satisfies the condition to obtain a first model parameter of that parameter collection group, where the minimum number of fusion nodes s of the parameter collection group that satisfies the condition ≤ M ≤ the total number of nodes included in the parameter collection group that satisfies the condition;
  • The intra-group fusion condition may be that the number of nodes in the parameter collection group that have completed the iterative calculation of the current model parameters reaches a preset value, namely, the minimum number of fusion nodes s.
  • In that case, M nodes that have completed the calculation of the current model parameters are selected from the parameter collection group, and the model parameters calculated by the M nodes are fused to obtain the first model parameter.
  • The parameter collection group corresponds to the parameter distribution group; that is, one parameter collection group can correspond to one or more parameter distribution groups. Therefore, after the parameter collection group performs fusion to obtain the first model parameter, if the intra-group distribution condition is satisfied, the first model parameter is sent, based on the correspondence between the parameter collection group and the parameter distribution group, to all or some of the nodes in the corresponding parameter distribution group.
  • The intra-group distribution condition may be that the number of intra-group fusions reaches a preset number, or that a preset duration has elapsed, and the like, which is not limited in this embodiment of the present invention.
  • If the intra-group distribution condition is not yet met, the parameter collection group performs a new round of iterative calculation based on the first model parameter obtained by the fusion; each time M nodes are fused, the first model parameter is updated, and once the intra-group distribution condition is met, the first model parameter is distributed.
  • Address information of the nodes participating in the fusion of the first model parameter may also be sent to the nodes in the parameter distribution group; the address information may be an IP address of a node, a node number, or the like, which is not limited in the present invention.
  • The minimum number of fusion nodes s, M, and N can be set in advance, where s ≤ M ≤ the total number of nodes included in the parameter collection group that satisfies the condition, and 1 ≤ N ≤ the total number of nodes included in the parameter distribution group corresponding to the parameter collection group that satisfies the condition; the intra-group fusion step is sketched below.
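  • A minimal sketch (in Python, with invented names; element-wise averaging is only one possible fusion rule, assumed here for illustration) of the intra-group fusion step described above:

    import numpy as np

    def intra_group_fuse(finished_params, s, total_nodes):
        # Fuse the model parameters of the M nodes that finished the current
        # iteration; fusion only proceeds once s <= M <= total_nodes.
        m = len(finished_params)
        if m < s:
            return None                     # intra-group fusion condition not met yet
        assert m <= total_nodes
        # Assumed fusion rule: element-wise averaging of the M parameter sets.
        return np.mean(np.stack(finished_params), axis=0)

    # Example: a collection group of 8 nodes with minimum fusion node number s = 4;
    # 5 nodes have finished, so M = 5 and the condition s <= M <= 8 holds.
    params = [np.random.rand(10) for _ in range(5)]
    first_model_parameter = intra_group_fuse(params, s=4, total_nodes=8)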
  • That the node included in at least one parameter collection group is different from the nodes included in the corresponding parameter distribution group means that the nodes included in at least one parameter collection group are not completely the same as the nodes included in the corresponding parameter distribution group. This may mean that the parameter collection group includes at least one node different from the nodes in the parameter distribution group corresponding to that parameter collection group, or that all the nodes included in the parameter collection group are different from all the nodes included in the parameter distribution group corresponding to that parameter collection group.
  • In one possible implementation, the fusing of the model parameters of the M nodes in the parameter collection group that satisfies the condition, to obtain the first model parameter of the parameter collection group that satisfies the condition, includes:
  • This may be completed by a device independent of the parameter collection group, for example, a parameter server, which may be run on a fixed node.
  • The M nodes that complete the iteration in the parameter collection group respectively send the model parameters calculated in the current iteration to the parameter server; when the parameter server has received the model parameters sent by the M nodes, it may fuse the model parameters of the M nodes in a number of different fusion modes to obtain the first model parameter.
  • The different fusion modes may be as follows: the parameter server fuses the model parameters of the M nodes all at once to obtain the first model parameter; or each node sends its parameters to the parameter server after completing its iteration, and the parameter server receives and fuses the parameters node by node, repeating the receive-and-fuse process until the M nodes have all been fused and the first model parameter is obtained. This is not limited in this embodiment of the present invention. A sketch of the node-by-node variant follows.
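  • A hedged sketch of the receive-and-fuse-one-node-at-a-time variant (a running-average rule is assumed; class and variable names are invented for illustration):

    import numpy as np

    class RunningFusion:
        def __init__(self):
            self.fused = None
            self.count = 0

        def add(self, node_params):
            # Fuse one newly arrived node's parameters into the running result.
            self.count += 1
            p = np.asarray(node_params, dtype=float)
            if self.fused is None:
                self.fused = p
            else:
                # Incremental average, equivalent to averaging all received sets at once.
                self.fused += (p - self.fused) / self.count
            return self.fused

    server = RunningFusion()
    for p in [np.ones(4), 3 * np.ones(4), 5 * np.ones(4)]:   # three nodes finish in turn
        first_model_parameter = server.add(p)                # ends up at [3, 3, 3, 3]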
  • the correspondence between the parameter server and the parameter collection group and the parameter distribution group corresponding to the parameter collection group may be set in advance.
  • In another possible implementation, the fusing of the model parameters of the M nodes in the parameter collection group that satisfies the condition, to obtain the first model parameter, includes: obtaining state information of the nodes in the parameter collection group that satisfies the condition, where the node state information may include a node identifier and the order in which the nodes complete the iteration; and instructing, according to the state information, the M nodes that complete the iteration in that parameter collection group to perform model parameter fusion, to obtain the first model parameter.
  • This may be completed by a node in the parameter collection group, which may be referred to as a control node; the control node may be specified in advance, or may be temporarily elected by the nodes in the parameter collection group. The control node can collect the state information of the nodes in the parameter collection group and issue instructions for the transmission and fusion of the model parameters.
  • When the control node, based on the collected state information of the nodes in the parameter collection group, instructs the M nodes that have completed the iteration to perform fusion, it may have them fused in different combinations. For example, the control node may instruct the M nodes to send their model parameters to one of the nodes, which performs the fusion once to obtain the first model parameter; or the control node may have the fusion performed in the cascaded manner described below, to improve the efficiency with which the M nodes are fused to obtain the first model parameter. The control node may also use other combinations, which is not limited in this embodiment of the present invention.
  • The instructing, according to the state information of the nodes in the parameter collection group that satisfies the condition, the M nodes that complete the iteration in that parameter collection group to perform model parameter fusion includes:
  • instructing one of the s nodes that have completed the iteration to fuse the model parameters of the s nodes;
  • After determining the s nodes that have completed the iteration in the parameter collection group, the control node designates one of the s nodes as the fusion node, and the remaining nodes each send the model parameters obtained in the current iteration to the fusion node, which then fuses the model parameters of the s nodes. The fusion node may be the node that completed the iteration last, or the node with the smallest node number, which is not limited in this embodiment of the present invention.
  • If more nodes complete the iteration while the s nodes are being fused, two cases can be distinguished according to the relationship between the number of newly added nodes and s:
  • if x nodes are newly added, where x < s, one of the x newly added nodes is instructed to fuse the model parameters of the newly added x nodes with the model parameters obtained after the s nodes were fused;
  • if y nodes are newly added, where y ≥ s, one of the y newly added nodes is instructed to fuse the model parameters of the y nodes, and then to fuse the result with the model parameters obtained after the s nodes were fused.
  • The remaining nodes among the M nodes may continue to fuse their model parameters using the methods provided for the foregoing two cases, to improve the efficiency of fusing the model parameters of the M nodes; the fusion may also be performed by other means, which is not limited in this embodiment of the present invention.
  • The one of the newly added nodes may be the newly added node with the smallest node number, or the node that completes the iteration last, which is not limited in this embodiment of the present invention. A sketch of this cascaded fusion follows.
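  • An illustrative sketch of the cascaded fusion described above. A count-weighted average is assumed as the fusion rule so that cascading gives the same result as fusing all nodes at once; function and variable names are invented:

    import numpy as np

    def fuse(param_sets):
        # Fuse several (parameters, node_count) pairs into one such pair.
        stacked = np.stack([p for p, _ in param_sets])
        counts = np.array([n for _, n in param_sets], dtype=float)
        fused = (stacked * counts[:, None]).sum(axis=0) / counts.sum()
        return fused, int(counts.sum())

    s = 4
    s_result = fuse([(np.random.rand(3), 1) for _ in range(s)])   # the first s finished nodes

    # Case 1: x < s nodes finish while the s nodes are being fused;
    # one of the x nodes fuses the x new parameter sets with the previous result.
    x_params = [(np.random.rand(3), 1) for _ in range(2)]
    fused_all = fuse(x_params + [s_result])

    # Case 2: y >= s nodes finish; fuse the y nodes first, then merge with the s-node result.
    y_params = [(np.random.rand(3), 1) for _ in range(5)]
    fused_all = fuse([fuse(y_params), s_result])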
  • The method further includes: when an inter-group fusion condition is met among W parameter collection groups, performing overall fusion of the model parameters of the nodes in each of the W parameter collection groups, to obtain a second model parameter of each of the W parameter collection groups. The W parameter collection groups are determined by the upper-layer parameter collection group of the W parameter collection groups, where W ≤ the total number of groups included in the upper-layer parameter collection group.
  • The inter-group fusion condition may be that the number of intra-group fusions of the parameter collection groups reaches a preset number, or that a certain period of time has elapsed. For example, when the inter-group fusion condition is that the number of intra-group fusions reaches a preset number, then once the number of intra-group fusions of the W parameter collection groups reaches that preset number, each parameter collection group in the W parameter collection groups performs an overall fusion of the current model parameters of all nodes in the group to obtain its second model parameter, thereby obtaining the second model parameter of each of the W parameter collection groups. Specifically, all the nodes of each parameter collection group may send their current model parameters to one node in the group, and that node fuses the current model parameters of all the nodes to obtain the second model parameter; of course, other ways may also be used, which is not limited in this embodiment of the present invention.
  • The third model parameter may be sent to the nodes of the W parameter collection groups not only by broadcast but also iteratively; that is, the node that finally completes the fusion sends the third model parameter to one node in each of the W parameter collection groups, and that node in turn sends the third model parameter to the other nodes participating in the inter-group fusion.
  • Alternatively, the third model parameter is sent to the nodes in the parameter distribution group corresponding to each of the W parameter collection groups, and the transmission mode may likewise be broadcast or iterative.
  • In the iterative mode, the node that finally completes the fusion sends the third model parameter to the first node of the parameter distribution group of each of the W parameter collection groups, and the third model parameter is then iteratively forwarded to the other nodes in that parameter distribution group; the first node here refers to the node responsible for receiving the model parameters of the W parameter collection groups. The third model parameter may further be sent to the nodes in each lower-layer parameter distribution group within the upper-layer parameter distribution group, where the sending mode may again be broadcast or iterative.
  • The fusing of the second model parameters of each of the W parameter collection groups to obtain the third model parameter includes:
  • sending the second model parameter of each corresponding parameter collection group to an inter-group fusion node, so that the inter-group fusion node fuses the second model parameters of the W parameter collection groups to obtain the third model parameter;
  • The inter-group fusion node may be a node elected by the nodes in the W parameter collection groups, the node that completes the iteration first, or the node with the smallest node number, which is not limited in the present invention.
  • For example, the node responsible for the overall fusion in one of the parameter collection groups may be selected as the inter-group fusion node, and the second model parameters of the W parameter collection groups that satisfy the fusion condition are fused by that node to obtain the third model parameter. The node performing this fusion may be the node responsible for the overall fusion in any of the W parameter collection groups, or the node with the smallest number, which is not limited by the present invention.
  • The method by which the new parameter collection group performs model parameter fusion is similar to the intra-group fusion method of a parameter collection group that satisfies the foregoing conditions, and details are not repeated herein. A sketch of the inter-group fusion follows.
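  • A minimal sketch of the inter-group fusion step, assuming simple averaging as the fusion rule; the group and node structures below are invented for illustration:

    import numpy as np

    def overall_group_fuse(node_params):
        # Overall fusion of one parameter collection group: fuse the current model
        # parameters of all nodes in the group into the group's second model parameter.
        return np.mean(np.stack(node_params), axis=0)

    def inter_group_fuse(second_params):
        # Performed by the elected inter-group fusion node: fuse the W second model
        # parameters into the third model parameter.
        return np.mean(np.stack(second_params), axis=0)

    # W = 3 collection groups, each with a few nodes holding 5-dimensional parameters.
    groups = [[np.random.rand(5) for _ in range(4)] for _ in range(3)]
    second = [overall_group_fuse(g) for g in groups]   # one second model parameter per group
    third = inter_group_fuse(second)                   # third model parameter, then distributed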
  • The sending, to the N nodes in the parameter distribution group corresponding to the parameter collection group that satisfies the condition, of the first model parameter of the parameter collection group that satisfies the condition includes:
  • the method further includes:
  • regrouping the nodes included in the parameter collection groups and the parameter distribution groups when a preset condition is satisfied.
  • The preset condition may be that a certain period of time has elapsed, that the model parameters have been fused a certain number of times, or that a certain number of iterations have been performed, etc., which is not limited in this embodiment of the present invention.
  • The nodes included in the parameter collection groups and parameter distribution groups may be regrouped according to the node grouping method provided in the second aspect of the present invention, and details are not repeated herein.
  • A node grouping method is provided for use in a machine learning system, the machine learning system comprising at least two nodes. The method comprises: grouping the nodes in the machine learning system, so that the machine learning system includes at least one parameter collection group and at least one parameter distribution group, where each parameter collection group corresponds to at least one parameter distribution group, and the node included in at least one parameter collection group is different from the node included in the parameter distribution group corresponding to that parameter collection group.
  • Each parameter collection group corresponds to at least one parameter distribution group; that is, one parameter collection group may correspond to one parameter distribution group or to multiple parameter distribution groups.
  • That a parameter collection group includes a node different from the corresponding parameter distribution group means that the nodes included in at least one parameter collection group are not completely the same as the nodes included in the corresponding parameter distribution group. This may mean that the parameter collection group includes at least one node different from the nodes in the parameter distribution group corresponding to that parameter collection group, or that all the nodes included in the parameter collection group are different from all the nodes included in the parameter distribution group corresponding to that parameter collection group.
  • The numbers of nodes of different parameter collection groups are the same or different; and/or,
  • the numbers of nodes of different parameter distribution groups are the same or different; and/or,
  • the number of nodes of a parameter collection group is the same as or different from the number of nodes of the parameter distribution group corresponding to that parameter collection group.
  • The machine learning system may further include parameter servers, where a parameter collection group and the parameter distribution group corresponding to that parameter collection group correspond to the same parameter server, and different parameter collection groups and their corresponding parameter distribution groups correspond to different parameter servers.
  • The parameter servers comprise Y layers, where a parameter server of the (j+1)th layer corresponds to at least one parameter server of the jth layer, and a parameter collection group and the parameter distribution group corresponding to that parameter collection group correspond to a layer-1 parameter server, where 1 ≤ j < j+1 ≤ Y. Such a layered hierarchy is sketched below.
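  • A hedged sketch of a Y-layer parameter-server hierarchy: each layer-(j+1) server is responsible for one or more layer-j servers, and each collection group and its distribution group attach to a layer-1 server. The concrete two-layer tree below is only an invented example:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ParameterServer:
        name: str
        layer: int
        children: List["ParameterServer"] = field(default_factory=list)  # layer-j servers under this layer-(j+1) server

    root = ParameterServer("ps_top", layer=2, children=[
        ParameterServer("ps_a", layer=1),   # serves collection group A and its corresponding distribution group(s)
        ParameterServer("ps_b", layer=1),   # serves collection group B and its corresponding distribution group(s)
    ])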
  • The grouping of the nodes in the machine learning system includes: determining a correspondence between node identifiers and node numbers; determining the number of parameter collection groups and the number of parameter distribution groups; determining the parameter collection groups and the parameter distribution groups based on the correspondence between node identifiers and node numbers, the number of parameter collection groups, and the number of parameter distribution groups; and determining the correspondence between the parameter collection groups and the parameter distribution groups.
  • the node identifier is used to uniquely identify the node.
  • the node identifier may be an IP address of the node, a sequence code of the node, and the like.
  • the node number may be a sequence number that is randomly assigned to the node, or may be any value that is randomly assigned to the node, etc., and the present invention is not limited thereto.
  • The node number of each node can be changed, the numbers of parameter collection groups and parameter distribution groups can also be changed, and the correspondence between the parameter collection groups and the parameter distribution groups can change accordingly.
  • The determining of the parameter collection groups and the parameter distribution groups based on the correspondence between node identifiers and node numbers, the number of parameter collection groups, and the number of parameter distribution groups includes: determining nodes whose node numbers leave the same remainder with respect to the number of parameter collection groups as the same parameter collection group, and determining nodes whose node numbers leave the same remainder with respect to the number of parameter distribution groups as the same parameter distribution group. This remainder-based grouping is sketched below.
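  • A sketch of the remainder-based grouping: nodes whose numbers leave the same remainder modulo the number of collection (respectively, distribution) groups fall into the same group. The node identifiers below are made-up IP addresses:

    from collections import defaultdict

    node_numbers = {"10.0.0.1": 0, "10.0.0.2": 1, "10.0.0.3": 2,
                    "10.0.0.4": 3, "10.0.0.5": 4, "10.0.0.6": 5}
    num_collection_groups = 2
    num_distribution_groups = 3

    collection_groups = defaultdict(list)
    distribution_groups = defaultdict(list)
    for node_id, number in node_numbers.items():
        collection_groups[number % num_collection_groups].append(node_id)
        distribution_groups[number % num_distribution_groups].append(node_id)

    # Collection group 0 holds the even-numbered nodes and group 1 the odd-numbered ones;
    # each collection group is then mapped to one or more distribution groups.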
  • A model parameter fusion device is provided. The device is applied to a machine learning system, the machine learning system comprising at least one parameter collection group and at least one parameter distribution group, where each parameter collection group corresponds to at least one parameter distribution group, each parameter collection group includes at least one node, each parameter distribution group includes at least one node, and at least one parameter collection group includes a node that is different from the nodes included in the corresponding parameter distribution group. The device includes:
  • a first merging unit, configured to: when any parameter collection group satisfies the intra-group fusion condition, fuse the model parameters of M nodes in the parameter collection group that satisfies the condition, to obtain the first model parameter of the parameter collection group that satisfies the condition, where the minimum number of fusion nodes s of the parameter collection group that satisfies the condition ≤ M ≤ the total number of nodes included in the parameter collection group that satisfies the condition; and
  • a first sending unit, configured to send, to N nodes in the parameter distribution group corresponding to the parameter collection group that satisfies the condition, the first model parameter of the parameter collection group that satisfies the condition, where 1 ≤ N ≤ the total number of nodes included in the parameter distribution group corresponding to the parameter collection group that satisfies the condition.
  • The intra-group fusion condition may be that the number of nodes in the parameter collection group that have completed the iterative calculation of the current model parameters reaches a preset value, namely, the minimum number of fusion nodes s.
  • The minimum number of fusion nodes s, M, and N can be set in advance, where s ≤ M ≤ the total number of nodes included in the parameter collection group that satisfies the condition, and 1 ≤ N ≤ the total number of nodes included in the parameter distribution group corresponding to the parameter collection group that satisfies the condition.
  • the first merging unit includes:
  • a receiving module configured to receive model parameters of the M nodes sent by the M nodes that complete the iteration in the parameter collection group that meets the condition
  • a fusion module configured to perform fusion according to the received model parameters of the M nodes, to obtain a first model parameter of the parameter collection group that satisfies the condition.
  • The fusion module can fuse the model parameters of the M nodes in different fusion modes to obtain the first model parameter. For example, the fusion module fuses the model parameters of the M nodes all at once to obtain the first model parameter; or each node sends its model parameters to the fusion module after completing its iteration, and the fusion module receives and fuses the parameters node by node, repeating the receive-and-fuse process until the M nodes have all been fused and the first model parameter is obtained. This is not limited in this embodiment of the present invention.
  • the first merging unit includes:
  • an obtaining module configured to obtain state information of the node in the parameter collection group that satisfies the condition; wherein the node state information may include a node identifier and a node order of completing the iteration.
  • an indication module, configured to: according to the state information of the nodes in the parameter collection group that satisfies the condition, instruct the M nodes that complete the iteration in that parameter collection group to perform model parameter fusion, to obtain the first model parameter of the parameter collection group.
  • The indication module may have the M nodes that complete the iteration fused in different combinations. For example, the M nodes may be instructed to send their model parameters to one of the nodes, which performs the fusion once to obtain the first model parameter; or the indication module may have the fusion performed in the cascaded manner described below, to improve the efficiency with which the M nodes are fused to obtain the first model parameter. The indication module may also adopt other combinations, which is not limited in this embodiment of the present invention.
  • the indication module is specifically configured to:
  • instruct one of the s nodes that have completed the iteration to fuse the model parameters of the s nodes; this node may be referred to as a fusion node.
  • The fusion node may be the node that completed the iteration last, or the node with the smallest node number, which is not limited in this embodiment of the present invention. While the fusion node is fusing the model parameters of the s nodes, if new nodes complete the iteration, two cases can be distinguished according to the relationship between the number of newly added nodes and s:
  • if x nodes are newly added, where x < s, one of the x newly added nodes is instructed to fuse the model parameters of the newly added x nodes with the model parameters obtained after the s nodes were fused;
  • if y nodes are newly added, where y ≥ s, one of the y newly added nodes is instructed to fuse the model parameters of the y nodes, and then to fuse the result with the model parameters obtained after the s nodes were fused.
  • The indication module may instruct the remaining nodes among the M nodes to continue fusing their model parameters using the methods provided for the foregoing two cases, to improve the efficiency of fusing the model parameters of the M nodes; the fusion may also be performed by other means, which is not limited in this embodiment of the present invention.
  • The one of the newly added nodes may be the newly added node with the smallest node number, or the node that completes the iteration last, which is not limited in this embodiment of the present invention.
  • the device further includes:
  • a second merging unit, configured to: when the inter-group fusion condition is met among W parameter collection groups, perform overall fusion of the model parameters of the nodes in each of the W parameter collection groups, to obtain the second model parameter of each of the W parameter collection groups, where the W parameter collection groups are determined by the upper-layer parameter collection group of the W parameter collection groups, and W ≤ the total number of groups included in the upper-layer parameter collection group;
  • the inter-group fusion condition may be that the number of intra-group fusions of the parameter collection group reaches a preset number of times.
  • For example, when the number of intra-group fusions of the W parameter collection groups reaches a preset number, the second merging unit performs, for each of the W parameter collection groups, an overall fusion of the current model parameters of all nodes in the group to obtain its second model parameter, thereby obtaining the second model parameter of each of the W parameter collection groups.
  • a third fusion unit, configured to fuse the second model parameters of each of the W parameter collection groups to obtain a third model parameter; and
  • a second sending unit, configured to send the third model parameter to nodes of the W parameter collection groups or to nodes of the parameter distribution groups of the W parameter collection groups.
  • The second sending unit may transmit not only in a broadcast manner but also in an iterative manner; that is, the second sending unit sends the third model parameter to one node in each of the W parameter collection groups, and that node then iteratively sends the third model parameter to the other nodes in its group.
  • Alternatively, the third model parameter is sent to the nodes in the parameter distribution group corresponding to each of the W parameter collection groups, where the sending mode may likewise be broadcast or iterative.
  • the third merging unit is specifically configured to:
  • send the second model parameter of each corresponding parameter collection group to an inter-group fusion node, where the inter-group fusion node is configured to fuse the second model parameters of the W parameter collection groups to obtain the third model parameter.
  • For example, the node responsible for the overall fusion in one of the parameter collection groups may be selected as the inter-group fusion node, and the second model parameters of the W parameter collection groups that satisfy the fusion condition are fused to obtain the third model parameter. The node performing this fusion may be the node responsible for the overall fusion in any of the W parameter collection groups, or the node with the smallest number, which is not limited by the present invention.
  • the method for the model parameter fusion by the new parameter collection group is similar to the method for the intra-group fusion of the parameter collection group satisfying the above conditions, and the present invention will not be repeated herein.
  • the first sending unit is specifically configured to:
  • the device further includes:
  • a first grouping unit configured to re-group the parameter collection group and the nodes included in the parameter distribution group when the preset condition is met.
  • the preset condition may be a certain period of time, or a certain number of times of the integration of the model parameters, or a certain number of iterations, etc., which is not limited by the embodiment of the present invention.
  • The nodes included in the parameter collection groups and parameter distribution groups may be regrouped by the node grouping apparatus provided in the fourth aspect of the present invention, and details are not described herein.
  • a node grouping apparatus for use in a machine learning system, the machine learning system comprising at least two nodes, the apparatus comprising:
  • a second grouping unit configured to group nodes in the machine learning system, such that the machine learning system includes at least one parameter collection group and at least one parameter distribution group, and each parameter collection group corresponds to at least one parameter distribution group. At least one of the parameter collection groups includes nodes that are different from the nodes included in the parameter distribution group corresponding to the parameter collection group.
  • Each parameter collection group corresponds to at least one parameter distribution group, that is, one parameter collection group may correspond to one parameter distribution group, or corresponding to multiple parameter distribution groups.
  • That a parameter collection group includes nodes different from the corresponding parameter distribution group means that the nodes included in at least one parameter collection group are not completely the same as the nodes included in the corresponding parameter distribution group. This may mean that at least one node in the parameter collection group is different from the nodes in the parameter distribution group corresponding to that parameter collection group, or that all the nodes included in the parameter collection group are different from all the nodes included in the parameter distribution group corresponding to that parameter collection group.
  • The numbers of nodes of different parameter collection groups are the same or different; and/or,
  • the numbers of nodes of different parameter distribution groups are the same or different; and/or,
  • the number of nodes of a parameter collection group is the same as or different from the number of nodes of the parameter distribution group corresponding to that parameter collection group.
  • The machine learning system further includes parameter servers, where a parameter collection group and its corresponding parameter distribution group correspond to the same parameter server, and different parameter collection groups and their corresponding parameter distribution groups correspond to different parameter servers.
  • The parameter servers comprise Y layers, where a parameter server of the (j+1)th layer corresponds to at least one parameter server of the jth layer, and a parameter collection group and the parameter distribution group corresponding to that parameter collection group correspond to a layer-1 parameter server, where 1 ≤ j < j+1 ≤ Y.
  • the second grouping unit specifically includes:
  • a first determining module configured to determine a correspondence between a node identifier and a node number
  • a second determining module configured to determine the number of the parameter collection groups, and the number of the parameter distribution groups
  • a third determining module configured to determine a parameter collection group and a parameter distribution group based on a correspondence between the node identifier and the node number, the number of the parameter collection group, and the number of the parameter distribution group;
  • a fourth determining module configured to determine a correspondence between the parameter collection group and the parameter distribution group.
  • the node identifier is used to uniquely identify the node.
  • the node identifier may be an IP address of the node, a sequence code of the node, and the like.
  • The node number may be a serial number assigned to the node by the machine, or any value randomly assigned to the node, etc., and the present invention is likewise not limited thereto.
  • The node number of each node can be changed, the numbers of parameter collection groups and parameter distribution groups can also be changed, and the correspondence between the parameter collection groups and the parameter distribution groups can change accordingly.
  • the third determining module is specifically configured to:
  • determine nodes whose node numbers leave the same remainder with respect to the number of parameter collection groups as the same parameter collection group, and determine nodes whose node numbers leave the same remainder with respect to the number of parameter distribution groups as the same parameter distribution group.
  • A model parameter fusion device is provided, comprising a processor and a memory, where the memory stores code and data, the processor can execute the code in the memory, and the processor is configured to perform the model parameter fusion method of any one of the first aspect to the seventh possible implementation of the first aspect.
  • the model parameter fusion device is a parameter server, and the parameter server is set independently of the node or configured on the node.
  • A controller is provided, comprising a processor and a memory, where the memory stores code and data, the processor can execute the code in the memory, and the processor is configured to perform the node grouping method of any one of the second aspect to the fifth possible implementation of the second aspect.
  • A machine learning system is provided, comprising the model parameter fusion device according to any one of the possible implementations of the fifth aspect, and the controller according to the sixth aspect.
  • In the model parameter fusion method and device provided by the embodiments of the present invention, a parameter collection group performs intra-group fusion to obtain a first model parameter and sends the first model parameter to the parameter distribution group corresponding to the parameter collection group, thereby solving the problems of high performance requirements on the parameter server and a large data transmission volume in model parameter fusion.
  • FIG. 1 is a schematic structural diagram of a machine learning system according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a model parameter fusion method according to an embodiment of the present disclosure
  • FIG. 3 is a schematic structural diagram of a parameter server according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic flowchart of a method for grouping nodes according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a model parameter fusion apparatus according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of another model parameter fusion apparatus according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of still another model parameter fusion apparatus according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a model parameter fusion apparatus according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a controller according to an embodiment of the present invention.
  • the machine learning system architecture applied by the embodiment of the present invention is shown in FIG. 1.
  • the system architecture diagram includes a data storage device 101, a model parameter training platform 102, and a model parameter storage device 103.
  • The data storage device 101 may be a data storage server 101. The data storage server 101 can be used to store raw data for model parameter training, and the storage capacity of the data storage server 101 is much larger than the storage capacity of the computing servers 1021 in the model parameter training platform 102.
  • The raw data may be language data, image data, video data, etc. The raw data is composed of a plurality of data sets, and each data set further comprises a plurality of type subsets, each type subset carrying a data label that represents its type; the type subsets included in the same data set have the same label. For example, a data set may be a set of images containing multiple persons with a person label, a set of multiple animal images with animal labels, or images of other categories, and so on.
  • The model parameter training platform 102 includes computing servers 1021 for iterative computing, which may also be referred to as nodes and may be general computers, mobile terminals, workstations, general-purpose servers, dedicated servers, etc., and a switch 1022 used for data communication between the computing servers.
  • The computing server 1021 has local storage whose capacity is smaller than that of the data storage server 101. Each computing server reads a certain amount of data from the data storage server 101 into its local storage by sampling, in order to train model parameters on that data.
  • By performing model parameter training and fusion on a data set with data labels, the model parameter training platform 102 can obtain a finally fused total model parameter, and the data type of new data can be identified with that total model parameter. For example, after model parameter training and fusion are performed on an image data set with person labels, images of persons in new image data can be identified by the finally output model parameters; after model parameter fusion is performed using an image data set with animal labels, animal images and the like in new image data can be identified by the finally output model parameters.
  • the model parameter storage server 103 is configured to store the model parameters obtained by the training.
  • the model parameters obtained by the final fusion may be sent to the model parameter storage server 103 to be stored by the model parameter storage server 103.
  • The initial model parameters used by the computing servers 1021 in the model parameter training platform 102 for model parameter training and fusion may also be acquired from the model parameter storage server 103.
  • As shown in FIG. 2, the model parameter fusion method is applied to a machine learning system in which each parameter collection group corresponds to at least one parameter distribution group, each parameter collection group includes at least one node, each parameter distribution group includes at least one node, and at least one parameter collection group includes a node that is different from the nodes included in the corresponding parameter distribution group. The method includes the following steps.
  • Step 201: A node for performing model parameter fusion acquires a data subset of the data set.
  • the data set refers to a data set used for iterative calculation of model parameters, and the data set may be language data, image data, video data, etc., and the data set is composed of multiple types of subsets, each type of sub- The set has data labels for representing categories, and the labels of the subset of types included in the same data set are the same.
  • The data set may be stored in advance in a storage device such as a hard disk or a magnetic disk, or in a data storage server. A node may obtain its data subset from a storage device directly connected to the device where the node is located, or obtain the data from the data storage server.
  • When acquiring a data subset from the data set, a node can extract a certain amount of data from the data set; if the computing power of each node is known in advance, the amount of data in the data subset acquired by each node can be allocated according to that node's computing power, as sketched below.
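  • A sketch of splitting a data set into subsets proportional to each node's (assumed, pre-measured) computing power; the numbers are illustrative only:

    def split_by_capacity(num_samples, capacities):
        # Return how many samples each node should read from the data set,
        # proportional to its relative computing power.
        total = sum(capacities)
        shares = [num_samples * c // total for c in capacities]
        shares[-1] += num_samples - sum(shares)   # hand any rounding remainder to the last node
        return shares

    print(split_by_capacity(10_000, capacities=[1, 2, 2, 5]))   # [1000, 2000, 2000, 5000]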
  • That the node included in at least one parameter collection group is different from the nodes included in the corresponding parameter distribution group means that the nodes included in at least one parameter collection group are not completely the same as the nodes included in the corresponding parameter distribution group; that is, at least one parameter collection group includes at least one node different from the nodes in the parameter distribution group corresponding to that parameter collection group, or all the nodes included in the parameter collection group are different from all the nodes included in the parameter distribution group corresponding to that parameter collection group.
  • Step 202: Each node performs iterative calculation based on its data subset and the current model parameters.
  • In the first iteration, each node performs the iterative calculation based on the acquired data subset and the initial model parameters; in subsequent iterations, each node performs the next iterative calculation based on its data subset and the currently obtained model parameters. The initial model parameters are the model parameters each node starts from, and the initial model parameters of all the nodes may be the same. The currently obtained model parameters are the model parameters a node holds after it has completed its latest iteration or received the latest fused result. One node's iterative calculation is sketched below.
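  • A hedged sketch of one node's iterative calculation in step 202, assuming a linear least-squares model trained by gradient descent; the patent does not prescribe a particular model or update rule:

    import numpy as np

    def one_iteration(model_params, X, y, lr=0.01):
        # One pass over the local data subset: gradient step on the mean squared error.
        grad = 2 * X.T @ (X @ model_params - y) / len(y)
        return model_params - lr * grad

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(100, 3)), rng.normal(size=100)   # this node's data subset
    params = np.zeros(3)                                      # initial (or last fused) model parameters
    for _ in range(5):                                        # several iterations before the next fusion
        params = one_iteration(params, X, y)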
  • Step 203: When any parameter collection group satisfies the intra-group fusion condition, the model parameters of M nodes in the parameter collection group that satisfies the condition are fused to obtain the first model parameter of that parameter collection group, where the minimum number of fusion nodes s of the parameter collection group that satisfies the condition ≤ M ≤ the total number of nodes included in the parameter collection group that satisfies the condition.
  • The intra-group fusion condition means that the number of nodes in the parameter collection group that have completed the iterative calculation of the current model parameters reaches a preset value, namely, the minimum number of fusion nodes s.
  • Each parameter collection group may include one or more nodes. Therefore, when the number of nodes in any parameter collection group that have completed the iterative calculation of the current model parameters reaches the preset value, M nodes that have completed the calculation of the current model parameters may be selected from that parameter collection group, and the model parameters calculated by the M nodes are fused to obtain the first model parameter.
  • The minimum number of fusion nodes s and M can be set in advance, where s ≤ M ≤ the total number of nodes contained in the parameter collection group. The number of parameter collection groups included in the machine learning system may be determined in advance, or may be determined after each node obtains its data subset, that is, after step 201, which is not limited in this embodiment of the present invention.
  • The fusing of the model parameters of the M nodes in the parameter collection group to obtain the first model parameter can be carried out in two different ways according to the execution subject, as described below.
  • In the first method, the model parameters sent by the M nodes that complete the iteration in the parameter collection group that satisfies the condition are received, and fusion is performed according to the received model parameters of the M nodes to obtain the first model parameter of the parameter collection group that satisfies the condition.
  • This method may be completed by a device independent of the parameter collection group, for example, a parameter server, which may be run on a fixed node.
  • The M nodes that complete the iteration in the parameter collection group respectively send the model parameters calculated in the current iteration to the parameter server; when the parameter server has received the model parameters sent by the M nodes, it may fuse the model parameters of the M nodes in different fusion modes to obtain the first model parameter. The different fusion modes may be as follows: the parameter server fuses the model parameters of the M nodes all at once to obtain the first model parameter; or each node sends its parameters to the parameter server after completing its iteration, and the parameter server receives and fuses the model parameters node by node, repeating the receive-and-fuse process until the M nodes have all been fused and the first model parameter is obtained. This is not limited in this embodiment of the present invention.
  • In the second method, state information of the nodes in the parameter collection group that satisfies the condition is obtained, where the node state information may include a node identifier and the order in which the nodes complete the iteration; and, according to the state information of the nodes in the parameter collection group that satisfies the condition, the M nodes that complete the iteration in that parameter collection group are instructed to perform model parameter fusion, to obtain the first model parameter of the parameter collection group.
  • This method may be completed by a node in the parameter collection group, which may be referred to as a control node; the control node may be specified in advance, or may be temporarily elected by the nodes in the parameter collection group.
  • the control node may collect state information of nodes in the parameter collection group, and instruct other nodes to perform model parameter transmission and fusion.
  • The control node may have the M nodes that complete the iteration fused in different combinations. For example, the control node may instruct the M nodes to send their model parameters to one of the nodes, which performs the fusion once to obtain the first model parameter; or the control node may have the fusion performed in the following cascaded manner, to improve the efficiency with which the M nodes are fused to obtain the first model parameter. The control node may also use other combinations, which is not limited in this embodiment of the present invention.
  • Specifically, when the control node, based on the collected state information of the nodes in the parameter collection group, instructs the M nodes that complete the iteration to perform fusion, the control node may first determine, according to the state information of the nodes in the parameter collection group, the s nodes that have completed the iteration, and then instruct one of the s nodes to fuse the model parameters of the s nodes.
  • After determining the s nodes that have completed the iteration in the parameter collection group, the control node designates one of the s nodes as the fusion node, and the remaining nodes each send the model parameters obtained in the current iteration to the fusion node, which then fuses the model parameters of the s nodes. The fusion node may be the node that completed the iteration last, or the node with the smallest node number, which is not limited in this embodiment of the present invention.
  • If more nodes complete the iteration while the s nodes are being fused, two cases can be distinguished according to the relationship between the number of newly added nodes and s:
  • Case 1: x nodes are newly added, where x < s. If the x nodes complete the iteration while the model parameters of the s nodes are being fused, one of the x newly added nodes fuses the model parameters of the newly added x nodes with the model parameters obtained after the s nodes were fused.
  • Case 2: y nodes are newly added, where y ≥ s. If the y nodes complete the iteration while the model parameters of the s nodes are being fused, one of the y newly added nodes fuses the model parameters of the y nodes, and then fuses the result with the model parameters obtained after the s nodes were fused.
  • The remaining nodes among the M nodes may continue to fuse their model parameters using the methods provided for the foregoing two cases, to improve the efficiency of fusing the model parameters of the M nodes; the fusion may also be performed by other means, which is not limited in this embodiment of the present invention. The one of the newly added nodes may be the newly added node with the smallest node number, or the node that completes the iteration last, which is not limited in this embodiment of the present invention.
  • When the intra-group distribution condition is met, step 204 is performed. The intra-group distribution condition may be that the number of intra-group fusions reaches a preset number, or that a preset duration has elapsed, and the like, which is not limited in this embodiment of the present invention.
  • Step 204: Send, to N nodes in the parameter distribution group corresponding to the parameter collection group that satisfies the condition, the first model parameter of the parameter collection group that satisfies the condition, where 1 ≤ N ≤ the total number of nodes contained in the parameter distribution group corresponding to the parameter collection group that satisfies the condition.
  • The parameter collection group corresponds to the parameter distribution group; that is, one parameter collection group can correspond to one or more parameter distribution groups. Therefore, when the intra-group distribution condition is met, the first model parameter is sent to the nodes in the corresponding parameter distribution group, which may be all of the nodes in the parameter distribution group or only some of them.
  • the first model of the parameter collection group that satisfies the condition may be sent to the node in the parameter distribution group corresponding to the parameter collection group that satisfies the condition by broadcast.
  • Alternatively, the first model parameter of the parameter collection group that satisfies the condition may be sent iteratively to the nodes in the corresponding parameter distribution group, that is, the first model parameter is sent to a first node in that parameter distribution group, and the first node then forwards it in turn to the remaining nodes of the N nodes: the first node sends the first model parameter to a second node, the second node sends it to a third node, and so on, until the first model parameter has been sent to all of the N nodes except the first node.
  • The first node may be any node among the nodes that have completed the iteration in the parameter collection group, or a node recommended by the nodes in the parameter distribution group, which is not limited in this embodiment of the present invention.
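  • A minimal sketch of this iterative (chain) sending in step 204; the `send` transport primitive and the node list are placeholders, since a real system would address nodes by IP address:

```python
def distribute_iteratively(first_model_parameter, nodes, send):
    """Chain distribution: the first node receives the parameter and each
    node forwards it to the next one until all N nodes have it.

    nodes: ordered list of node identifiers in the parameter distribution group.
    send(src, dst, payload): placeholder transport primitive.
    """
    # The parameter collection group (or parameter server) sends to the first node.
    send("collector", nodes[0], first_model_parameter)
    # Each node then forwards the parameter to its successor in turn.
    for current_node, next_node in zip(nodes, nodes[1:]):
        send(current_node, next_node, first_model_parameter)

# Example usage with a trivial in-memory "network":
log = []
distribute_iteratively(
    first_model_parameter={"w": [0.1, 0.2]},
    nodes=["node-1", "node-2", "node-3"],
    send=lambda src, dst, payload: log.append((src, dst)),
)
print(log)  # [('collector', 'node-1'), ('node-1', 'node-2'), ('node-2', 'node-3')]
```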
  • It should be noted that step 204 may be performed by a device other than the parameter collection group, for example a parameter server, or may be completed by a node in the parameter collection group, for example a control node, which is not limited in this embodiment of the present invention.
  • Optionally, the machine learning system includes a parameter server; a parameter collection group and its corresponding parameter distribution group correspond to the same parameter server, and different parameter collection groups and their corresponding parameter distribution groups correspond to different parameter servers.
  • Optionally, the parameter server includes Y layers, one parameter server of the (j+1)th layer corresponds to at least one parameter server of the jth layer, and a parameter collection group and its corresponding parameter distribution group correspond to a layer-1 parameter server, where 1 ≤ j < j+1 ≤ Y.
  • Step 205: When the inter-group fusion condition is met among W parameter collection groups, fuse as a whole the model parameters of the nodes in each of the W parameter collection groups, to obtain the second model parameter of each of the W parameter collection groups.
  • The W parameter collection groups are determined by the upper-layer parameter collection group of the W parameter collection groups, and W ≤ the total number of groups included in the upper-layer parameter collection group.
  • The inter-group fusion condition may be that the number of intra-group fusions of a parameter collection group reaches a preset number of times, or that a certain period of time has elapsed, and so on, which is not limited in this embodiment of the present invention.
  • Correspondingly, if the inter-group fusion condition is that the number of intra-group fusions of a parameter collection group reaches a preset number of times, then when the number of intra-group fusions of the W parameter collection groups reaches the preset number, each of the W parameter collection groups may fuse as a whole the current model parameters of all nodes in the group to obtain its second model parameter, thereby obtaining the second model parameter of each of the W parameter collection groups.
  • Similar to step 203 above, step 205 may be performed by a device other than the parameter collection group, or may be completed by a node in the parameter collection group, and the specific execution differs accordingly. The details are as follows.
  • When the execution subject is a device other than the parameter collection group, such as a parameter server, the parameter server determines whether the inter-group fusion condition is satisfied among the W parameter collection groups, and after the condition is satisfied, fuses as a whole the model parameters of the nodes in each of the W parameter collection groups.
  • When the execution subject is a node in the parameter collection group, such as a control node, the control node determines whether the inter-group fusion condition is satisfied among the W parameter collection groups; when it is satisfied, one node in each parameter collection group receives the model parameters sent by the other nodes and fuses the received model parameters, and at this time that node may be referred to as a fusion node. That is, when the control node determines that the inter-group fusion condition is met among the W parameter collection groups, all nodes of each parameter collection group may send their current model parameters to one node in the group, and that node fuses them as a whole to obtain the second model parameter.
  • Of course, the overall fusion may also be performed in other manners, which is not limited in this embodiment of the present invention.
  • Then, step 206 is performed.
  • Step 206: Fuse the second model parameters of each of the W parameter collection groups to obtain a third model parameter, and send the third model parameter to the nodes of the W parameter collection groups or to the nodes of the upper-layer parameter distribution group of the W parameter collection groups.
  • As with step 203, the execution subject that fuses the second model parameters of the W parameter collection groups to obtain the third model parameter may differ, and the specific execution differs accordingly.
  • When the execution subject is a device other than the parameter collection group, such as a parameter server, the parameter server directly fuses the second model parameters of the W parameter collection groups to obtain the third model parameter, and may then send the third model parameter directly to the nodes of the W parameter collection groups.
  • In addition, the parameter server may include multiple layers, where one parameter server of an upper layer corresponds to at least one parameter server of the lower layer, and a parameter collection group and its corresponding parameter distribution group correspond to a parameter server of the lowest layer. The lower-layer parameter servers send the number of fusions of their parameter collection groups, the node identifiers, and the current model parameters to the upper-layer parameter server; the upper-layer parameter server determines whether the inter-group fusion condition is satisfied and, once it is, performs the fusion and sends the resulting model parameters down to the lower-layer parameter servers; finally, the lowest-layer parameter servers send them to the nodes of the W parameter collection groups.
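  • One way to picture this multi-layer reporting and fusion flow is sketched below; all class and method names are hypothetical, and the averaging rule is again only an illustrative stand-in for the unspecified fusion operator:

```python
import numpy as np

class UpperLayerParameterServer:
    """Decides whether the inter-group fusion condition is met and then fuses."""

    def __init__(self, required_fusion_count):
        self.required_fusion_count = required_fusion_count
        self.reports = []  # (fusion_count, node_ids, current_params) per lower-layer server

    def report(self, fusion_count, node_ids, current_params):
        # Lower-layer servers report their intra-group fusion count,
        # node identifiers, and current model parameters.
        self.reports.append((fusion_count, node_ids, current_params))

    def try_fuse(self):
        # Inter-group fusion is performed only after every lower-layer
        # server has reached the preset number of intra-group fusions.
        if not self.reports or any(c < self.required_fusion_count
                                   for c, _, _ in self.reports):
            return None
        fused = np.mean(np.stack([p for _, _, p in self.reports]), axis=0)
        self.reports.clear()
        return fused  # sent back down to the lower-layer parameter servers

upper = UpperLayerParameterServer(required_fusion_count=2)
upper.report(2, ["n1", "n2"], np.array([1.0, 2.0]))
upper.report(2, ["n3", "n4"], np.array([3.0, 4.0]))
print(upper.try_fuse())  # [2. 3.]
```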
  • When the execution subject is a node in a parameter collection group, the nodes of the W parameter collection groups participating in the fusion determine one node from the W parameter collection groups as the inter-group fusion node; in each of the W parameter collection groups other than the one where the inter-group fusion node is located, one node is selected to send the second model parameter of its parameter collection group to the inter-group fusion node, so that the inter-group fusion node fuses the second model parameters of the W parameter collection groups to obtain the third model parameter.
  • The inter-group fusion node may be a node recommended by the nodes in the W parameter collection groups, or the node that completes the iteration first, or the node with the smallest node number, which is not limited in this embodiment of the present invention.
  • When one node is selected in each of the parameter collection groups other than the one where the inter-group fusion node is located, the node responsible for the overall fusion within that parameter collection group may be selected.
  • Alternatively, one node is determined from each of the W parameter collection groups, and the determined nodes are determined as a new parameter collection group; when the new parameter collection group satisfies the intra-group fusion condition, the second model parameters of the W parameter collection groups are fused to obtain the third model parameter. Fusing the second model parameters of the W parameter collection groups that satisfy the intra-group fusion condition means fusing the second model parameters of every parameter collection group in the W parameter collection groups.
  • When a node is determined from each parameter collection group, the node responsible for the overall fusion within that group may be selected, or the node with the smallest number, which is not limited by the present invention. The method by which the new parameter collection group fuses the model parameters is similar to the intra-group fusion method of a parameter collection group that satisfies the condition described above, and is not repeated here.
  • That is, each of the W parameter collection groups selects the node responsible for the overall fusion within its group, yielding W nodes, and the W nodes are determined as a new parameter collection group; when the new parameter collection group satisfies the intra-group fusion condition, the W second model parameters corresponding to the W nodes are fused in the intra-group fusion manner. For example, when the intra-group fusion condition is that the number of nodes completing the overall fusion reaches a preset number, the second model parameters of the nodes that have completed the overall fusion may be fused first and then fused with those of the other nodes in the group that complete the overall fusion later; of course, the W second model parameters corresponding to the W nodes may also be fused at one time, which is not limited in this embodiment of the present invention.
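  • A compact sketch of steps 205 and 206 together, under the same illustrative averaging assumption (the group structure and names are hypothetical):

```python
import numpy as np

def fuse(params):
    # Illustrative fusion rule: element-wise average.
    return np.mean(np.stack(params), axis=0)

def inter_group_fusion(groups):
    """groups: list of W parameter collection groups, each given as a list of
    the current model parameters of its nodes."""
    # Step 205: each group fuses the parameters of all its nodes as a whole,
    # yielding one second model parameter per group.
    second_params = [fuse(group) for group in groups]
    # Step 206: the inter-group fusion node fuses the W second model
    # parameters into the third model parameter.
    third_param = fuse(second_params)
    return second_params, third_param

groups = [[np.array([1.0, 1.0]), np.array([3.0, 3.0])],
          [np.array([5.0, 7.0])]]
seconds, third = inter_group_fusion(groups)
print(seconds)  # [array([2., 2.]), array([5., 7.])]
print(third)    # [3.5 4.5]
```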
  • When the third model parameter is sent to the nodes of the W parameter collection groups, it can be sent not only by broadcast but also iteratively, that is, the node that finally completes the fusion sends the third model parameter to one node in each of the parameter collection groups included in the W parameter collection groups, and that node iteratively forwards the third model parameter to the other nodes participating in the inter-group fusion.
  • Afterwards, the third model parameter is sent to the nodes in the parameter distribution group corresponding to each of the W parameter collection groups, where the sending mode may likewise be broadcast or iterative.
  • When the third model parameter is sent to the nodes of the upper-layer parameter distribution group of the W parameter collection groups, it can likewise be sent not only by broadcast but also iteratively, that is, the node that finally completes the fusion sends the third model parameter to the first node of each upper-layer parameter distribution group, and those nodes in turn send the third model parameter to the other nodes in the parameter distribution group. The first node refers to the node responsible for receiving the model parameters of the W parameter collection groups.
  • Afterwards, the third model parameter is sent to the nodes in each lower-layer parameter distribution group within the upper-layer parameter distribution group, where the sending mode may also be broadcast or iterative.
  • Step 207: Re-group the nodes included in the parameter collection groups and parameter distribution groups when a preset condition is met.
  • The preset condition may be that a certain period of time has elapsed, or that the model parameters have been fused a certain number of times, or that a certain number of iterative calculations have been completed, and so on, which is not limited in this embodiment of the present invention.
  • Specifically, when the execution subject is a device other than the parameter collection group, such as a parameter server, the parameter server directly re-groups the nodes included in the parameter collection groups and parameter distribution groups when the preset condition is met; when the execution subject is a node in the parameter collection group, such as a control node, the control node re-groups the nodes included in the parameter collection groups and parameter distribution groups.
  • Optionally, re-grouping the nodes included in the parameter collection groups and parameter distribution groups includes: according to a preset correspondence between node identifiers and node numbers, and the number of parameter collection groups and the number of parameter distribution groups, dividing the node number corresponding to each node identifier by the number of parameter collection groups to obtain the node's collection-group remainder, and dividing the node number by the number of parameter distribution groups to obtain the node's distribution-group remainder; nodes with the same collection-group remainder are determined as the same parameter collection group, and nodes with the same distribution-group remainder are determined as the same parameter distribution group.
  • Alternatively, the re-grouping of the nodes included in the parameter collection groups and parameter distribution groups may follow the node grouping method provided in the following embodiments, and details are not described here again.
  • After re-grouping, return to step 202 to continue the iterative calculation based on the data subsets and the current model parameters until the final model parameters are output.
  • In addition, if a new node is added during the calculation, then when the execution subject is a parameter server, the parameter server allocates a lowest-layer parameter server for the newly added node, and the IP address of that parameter server is sent to the newly added node; the newly added node obtains a data subset from the storage server and performs iterative calculation based on the received model parameters and the data subset.
  • When the execution subject is a control node, the control node allocates for the newly added node the IP address of another node that previously participated in the iterative calculation; that node sends the model parameters to the newly added node, the newly added node obtains a data subset from the storage server, and the newly added node performs iterative calculation based on the received model parameters and the data subset.
  • In the model parameter fusion method provided by the embodiment of the present invention, a first model parameter is obtained by intra-group fusion within a parameter collection group and sent to the parameter distribution group corresponding to that parameter collection group; the first model parameters of each of the W parameter collection groups are then fused as a whole to obtain second model parameters; the W parameter collection groups are then fused between groups to obtain a third model parameter; and the nodes are re-grouped when a preset condition is met. This solves the problems of high performance requirements on the parameter server, large data transmission volume, and dynamic adjustment of computing resources in model parameter fusion.
  • An embodiment of the present invention provides a node grouping method, applied to a machine learning system, where the machine learning system includes at least two nodes, and the method includes: grouping the nodes in the machine learning system such that the machine learning system includes at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponds to at least one parameter distribution group, and at least one parameter collection group includes nodes different from the nodes included in its corresponding parameter distribution group.
  • Each parameter collection group corresponds to at least one parameter distribution group, that is, one parameter collection group may correspond to one parameter distribution group or to multiple parameter distribution groups.
  • At least one parameter collection group includes nodes different from those in its corresponding parameter distribution group, that is, the nodes included in at least one parameter collection group are not identical to the nodes included in the corresponding parameter distribution group; this may mean that at least one node in the parameter collection group differs from the nodes in the corresponding parameter distribution group, or that all nodes included in the parameter collection group differ from all nodes included in the corresponding parameter distribution group.
  • the number of nodes of different parameter collection groups is the same or different; and/or,
  • the number of nodes in different parameter distribution groups is the same or different; and/or,
  • the number of nodes of a parameter collection group is the same as or different from the number of nodes of the parameter distribution group corresponding to the parameter collection group.
  • Optionally, the machine learning system may further include a parameter server; a parameter collection group and its corresponding parameter distribution group correspond to the same parameter server, and different parameter collection groups and their corresponding parameter distribution groups correspond to different parameter servers.
  • Optionally, the parameter server includes Y layers, one parameter server of the (j+1)th layer corresponds to at least one parameter server of the jth layer, and a parameter collection group and its corresponding parameter distribution group correspond to a layer-1 parameter server, where 1 ≤ j < j+1 ≤ Y.
  • For example, FIG. 3 is a schematic diagram of the parameter servers when Y is equal to 2: parameter server 1 corresponds to parameter server 2 and parameter server 3, and the parameter collection group composed of node 1, node 2, node 3, node 4, and node 5, together with its corresponding parameter distribution group, corresponds to parameter servers of the first layer.
  • grouping nodes in the machine learning system includes the following steps.
  • Step 301: Establish a correspondence between node identifiers and node numbers.
  • the node identifier is used to uniquely identify the node.
  • the node identifier may be an IP address of the node, a sequence code of the node, and the like.
  • the node number may be a sequence number that is randomly assigned to the node, or may be any value that is randomly assigned to the node, etc., and the present invention is not limited thereto.
  • For example, if the node identifier is the IP address of the node, and the IP address of each node is as shown in Table 1 below, then the correspondence between node identifiers and node numbers shown in Table 1 is established.
  • Step 302: Determine the number of parameter collection groups and the number of parameter distribution groups.
  • the number of parameter collection groups is 2, and the number of parameter distribution groups is 3.
  • Step 303: Determine the parameter collection groups and parameter distribution groups based on the correspondence between node identifiers and node numbers, the number of parameter collection groups, and the number of parameter distribution groups.
  • Specifically, this may include: dividing the node number corresponding to each node identifier by the number of parameter collection groups to obtain the node's collection-group remainder; dividing the node number corresponding to each node identifier by the number of parameter distribution groups to obtain the node's distribution-group remainder; determining nodes with the same collection-group remainder as the same parameter collection group; and determining nodes with the same distribution-group remainder as the same parameter distribution group.
  • For example, dividing each node number shown in Table 1 by the number of parameter collection groups, 2, gives the collection-group remainders: the collection-group remainder of node numbers 2, 0, and 4 is 0, and the collection-group remainder of node numbers 3, 1, and 5 is 1. Dividing each node number shown in Table 1 by the number of parameter distribution groups, 3, gives the distribution-group remainders: the distribution-group remainder of node numbers 0 and 3 is 0, the distribution-group remainder of node numbers 1 and 4 is 1, and the distribution-group remainder of node numbers 2 and 5 is 2. The nodes with collection-group remainder 0 are determined as parameter collection group 0 and the nodes with collection-group remainder 1 as parameter collection group 1; similarly, parameter distribution group 0, parameter distribution group 1, and parameter distribution group 2 are obtained.
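  • The remainder-based grouping of step 303 can be sketched directly as follows; the node numbers 0–5 match the example above, while the IP addresses are hypothetical stand-ins for the entries of Table 1:

```python
from collections import defaultdict

def group_nodes(node_numbers, num_collection_groups, num_distribution_groups):
    """Assign each node to a parameter collection group and a parameter
    distribution group by the remainder of its node number."""
    collection_groups = defaultdict(list)
    distribution_groups = defaultdict(list)
    for node_id, number in node_numbers.items():
        collection_groups[number % num_collection_groups].append(node_id)
        distribution_groups[number % num_distribution_groups].append(node_id)
    return dict(collection_groups), dict(distribution_groups)

# Hypothetical stand-in for Table 1: node identifier (IP address) -> node number.
node_numbers = {"10.0.0.10": 0, "10.0.0.11": 1, "10.0.0.12": 2,
                "10.0.0.13": 3, "10.0.0.14": 4, "10.0.0.15": 5}
collect, distribute = group_nodes(node_numbers, 2, 3)
print(collect)     # collection group 0: numbers 0, 2, 4; group 1: numbers 1, 3, 5
print(distribute)  # distribution groups 0, 1, 2 by remainder modulo 3
```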
  • Step 304: Determine the correspondence between the parameter collection groups and the parameter distribution groups.
  • the correspondence between the two may be determined based on the determined parameter collection group and the parameter distribution group. For example, it is determined that the parameter collection group 0 corresponds to the parameter distribution group 1 and the parameter distribution group 2, and the parameter collection group 1 corresponds to the parameter distribution group 0.
  • It should be noted that each time node grouping is performed, the node number of each node may change, the numbers of parameter collection groups and parameter distribution groups may also change, and the correspondence between parameter collection groups and parameter distribution groups may change accordingly.
  • The embodiment of the present invention provides a node grouping method: by grouping the nodes in a machine learning system, the machine learning system includes at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponds to at least one parameter distribution group, and at least one parameter collection group includes nodes different from the nodes included in its corresponding parameter distribution group.
  • FIG. 5 is a schematic diagram of a model parameter fusion device applied to a machine learning system, where the machine learning system includes at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponds to at least one parameter distribution group, each parameter collection group includes at least one node, each parameter distribution group includes at least one node, and at least one parameter collection group includes nodes different from the nodes included in its corresponding parameter distribution group. The device includes:
  • a first fusion unit 401, configured to: when any parameter collection group satisfies the intra-group fusion condition, fuse the model parameters of M nodes in the parameter collection group that satisfies the condition to obtain the first model parameter of that parameter collection group, where the minimum number of fusion nodes s of the parameter collection group that satisfies the condition ≤ M ≤ the total number of nodes included in that parameter collection group; and
  • a first sending unit 402, configured to send the first model parameter of the parameter collection group that satisfies the condition to N nodes in the parameter distribution group corresponding to that parameter collection group, where 1 ≤ N ≤ the total number of nodes included in the parameter distribution group corresponding to the parameter collection group that satisfies the condition.
  • the intra-group fusion condition may be that the number of nodes in the parameter collection group that completes the iterative calculation of the current model parameter reaches a preset value, that is, the minimum number of fusion nodes s.
  • Specifically, when the number of nodes in any parameter collection group that have completed the current iterative calculation reaches the minimum number of fusion nodes s, the first fusion unit selects M nodes that have completed the calculation of the current model parameters from that parameter collection group and fuses the model parameters calculated by the M nodes to obtain the first model parameter. Then, when the intra-group distribution condition is met, the first sending unit sends the first model parameter, based on the correspondence between parameter collection groups and parameter distribution groups, to all nodes in the corresponding parameter distribution group, or to some of them.
  • It should be noted that the minimum number of fusion nodes s, M, and N can be set in advance, and s ≤ M ≤ the total number of nodes included in the parameter collection group, and 1 ≤ N ≤ the total number of nodes included in the parameter distribution group corresponding to the parameter collection group.
  • In addition, the number of parameter collection groups included in the machine learning system, the number of nodes included in each parameter collection group, the number of parameter distribution groups corresponding to each parameter collection group, and the number of nodes included in each parameter distribution group can be determined in advance.
  • At least one parameter collection group includes nodes different from the nodes included in its corresponding parameter distribution group, that is, the nodes included in at least one parameter collection group are not completely the same as the nodes included in the corresponding parameter distribution group; this may mean that the parameter collection group includes at least one node different from the nodes in its corresponding parameter distribution group, or that all nodes included in the parameter collection group differ from all nodes included in its corresponding parameter distribution group.
  • Further, when the first model parameter is sent to the corresponding parameter distribution group, the address information of the nodes participating in the fusion of the first model parameter may also be sent to the nodes in the parameter distribution group; the address information may be the IP address of a node, a node number, or the like, which is not limited by the present invention.
  • Optionally, the first fusion unit 401 includes:
  • a receiving module, configured to receive the model parameters sent by the M nodes that have completed the iteration in the parameter collection group that satisfies the condition; and
  • a fusion module, configured to fuse the received model parameters of the M nodes to obtain the first model parameter of the parameter collection group that satisfies the condition.
  • Specifically, the fusion module can fuse the model parameters corresponding to the M nodes in different fusion modes to obtain the first model parameter. For example, the fusion module fuses the model parameters corresponding to the M nodes at one time to obtain the first model parameter; or each node sends its model parameters to the fusion module after completing the iteration, and the fusion module receives the parameters from the nodes and fuses them, repeating the receive-and-fuse process until all M nodes have been fused, to obtain the first model parameter, and so on, which is not limited in this embodiment of the present invention.
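  • The second mode described above, in which the fusion module fuses parameters as they arrive instead of all at once, can be sketched as a running average; this rule is an illustrative assumption, since the embodiment does not fix the fusion operator:

```python
import numpy as np

class IncrementalFusionModule:
    """Receives model parameters one node at a time and keeps a running fusion."""

    def __init__(self):
        self.fused = None
        self.count = 0

    def receive(self, params):
        # Running element-wise average: equivalent to averaging all M
        # parameter vectors once every node has reported.
        self.count += 1
        params = np.array(params, dtype=float)
        if self.fused is None:
            self.fused = params
        else:
            self.fused += (params - self.fused) / self.count
        return self.fused

module = IncrementalFusionModule()
for p in ([1.0, 3.0], [3.0, 5.0], [5.0, 7.0]):  # parameters from M = 3 nodes
    first_model_parameter = module.receive(p)
print(first_model_parameter)  # [3. 5.]
```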
  • Optionally, the first fusion unit 401 includes:
  • an obtaining module, configured to obtain the state information of the nodes in the parameter collection group that satisfies the condition, where the node state information may include node identifiers and the order in which the nodes complete the iteration; and
  • an indication module, configured to instruct, according to the state information of the nodes in the parameter collection group that satisfies the condition, the M nodes that have completed the iteration in that parameter collection group to perform model parameter fusion, to obtain the first model parameter of the parameter collection group that satisfies the condition.
  • Specifically, the indication module may instruct the M nodes that have completed the iteration to fuse through different combinations. For example, the indication module may instruct the M nodes to send their model parameters to one of the nodes, which performs a single fusion to obtain the first model parameter; or the indication module performs the fusion in the specific manner described below, to improve the efficiency with which the M nodes are fused to obtain the first model parameter. Of course, the indication module can also instruct the fusion through other combinations, which is not limited in this embodiment of the present invention.
  • Optionally, the indication module is specifically configured to: determine, according to the state information of the nodes in the parameter collection group that satisfies the condition, the s nodes that have completed the iteration, and instruct one of the s nodes that have completed the iteration to fuse the model parameters of the s nodes; at this time, that node may be referred to as a fusion node.
  • That is, the indication module designates one of the s nodes as the fusion node, and the remaining nodes each send the model parameters obtained by the current iteration to the fusion node, which fuses the model parameters corresponding to the s nodes.
  • The fusion node may be the last node to complete the iteration, or the node with the smallest node number, which is not limited in this embodiment of the present invention.
  • If newly added nodes complete the iteration while the fusion node is fusing the model parameters of the s nodes, there are two cases depending on the relationship between the number of newly added nodes and s:
  • Case 1: x nodes are newly added, where x < s. The indication module instructs one of the x newly added nodes to fuse the model parameters of the newly added x nodes together with the model parameters obtained by fusing the s nodes.
  • Case 2: y nodes are newly added, where y ≥ s. The indication module instructs one of the y newly added nodes to fuse the model parameters of the y nodes, and the result is then fused again with the model parameters obtained by fusing the s nodes.
  • After the above two cases, if some of the M nodes have not yet participated in the fusion, the indication module may instruct the remaining nodes to continue fusing their model parameters using the methods provided in the two cases above, to improve the efficiency of fusing the model parameters of the M nodes; of course, the fusion may also be performed in other ways, which is not limited in this embodiment of the present invention.
  • The one node among the newly added nodes may be the node with the smallest node number among the newly added nodes, or the node that completes the iteration last, which is not limited in this embodiment of the present invention.
  • Further, the device further includes:
  • a second fusion unit 403, configured to fuse as a whole, when the inter-group fusion condition is met among W parameter collection groups, the model parameters of the nodes in each of the W parameter collection groups, to obtain the second model parameter of each of the W parameter collection groups;
  • where the W parameter collection groups are determined by the upper-layer parameter collection group of the W parameter collection groups, and W ≤ the total number of groups included in the upper-layer parameter collection group.
  • The inter-group fusion condition may be that the number of intra-group fusions of a parameter collection group reaches a preset number of times. In this case, when the number of intra-group fusions of the W parameter collection groups reaches the preset number, each of the W parameter collection groups may send its current model parameters to the second fusion unit, and the second fusion unit fuses as a whole the current model parameters of all nodes in the parameter collection group to obtain the second model parameter, thereby obtaining the second model parameter of each of the W parameter collection groups.
  • a third fusion unit 404, configured to fuse the second model parameters of each of the W parameter collection groups to obtain a third model parameter; and
  • a second sending unit 405, configured to send the third model parameter to the nodes of the W parameter collection groups or to the nodes of the upper-layer parameter distribution group of the W parameter collection groups.
  • The second sending unit 405 can send not only by broadcast but also iteratively, that is, the second sending unit sends the third model parameter to one node in each parameter collection group included in the W parameter collection groups, and that node iteratively sends the third model parameter to the other nodes in its group.
  • Afterwards, the second sending unit may also send the third model parameter to the nodes in the parameter distribution group corresponding to each of the W parameter collection groups, where the sending mode may be broadcast or iterative.
  • Optionally, the third fusion unit 404 is specifically configured to: determine one node from the W parameter collection groups as the inter-group fusion node, and select one node in each of the other parameter collection groups to send the second model parameter of its parameter collection group to the inter-group fusion node, so that the inter-group fusion node fuses the second model parameters of the W parameter collection groups to obtain the third model parameter.
  • The inter-group fusion node may be a node recommended by the nodes in the W parameter collection groups, or the node that completes the iteration first, or the node with the smallest node number, which is not limited in this embodiment of the present invention.
  • When one node is selected in each of the other parameter collection groups, the node responsible for the overall fusion within that parameter collection group may be selected; of course, in practical applications, the third fusion unit may also select other nodes in the other parameter collection groups, which is not limited in this embodiment of the present invention.
  • Alternatively, one node is determined from each of the W parameter collection groups, and the determined nodes are determined as a new parameter collection group; when the new parameter collection group satisfies the intra-group fusion condition, the second model parameters of the W parameter collection groups are fused to obtain the third model parameter. Fusing the second model parameters of the W parameter collection groups that satisfy the intra-group fusion condition means fusing the second model parameters of every parameter collection group in the W parameter collection groups.
  • When a node is determined from each parameter collection group, the node responsible for the overall fusion within that group may be selected, or the node with the smallest number, which is not limited by the present invention. The method by which the new parameter collection group fuses the model parameters is similar to the intra-group fusion method of a parameter collection group that satisfies the condition described above, and is not repeated here.
  • That is, each of the W parameter collection groups selects the node responsible for the overall fusion within its group, yielding W nodes, and the W nodes are determined as a new parameter collection group; when the new parameter collection group satisfies the intra-group fusion condition, the W second model parameters corresponding to the W nodes are fused in the intra-group fusion manner. For example, when the intra-group fusion condition is that the number of nodes completing the overall fusion reaches a preset number, the second model parameters of the nodes that have completed the overall fusion may be fused first and then fused with those of the other nodes in the group that complete the overall fusion later; of course, the W second model parameters corresponding to the W nodes may also be fused at one time, which is not limited in this embodiment of the present invention.
  • Optionally, the first sending unit 402 is specifically configured to: send the third model parameter to the nodes in the parameter distribution group corresponding to each of the W parameter collection groups, or have the node that finally completes the fusion send the third model parameter to the nodes of the upper-layer parameter distribution group of the W parameter collection groups.
  • Further, the device further includes:
  • a first grouping unit 406, configured to re-group the nodes included in the parameter collection groups and parameter distribution groups when a preset condition is met.
  • The preset condition may be that a certain period of time has elapsed, or that the model parameters have been fused a certain number of times, or that a certain number of iterations have been completed, and so on, which is not limited in this embodiment of the present invention.
  • The re-grouping of the nodes included in the parameter collection groups and parameter distribution groups may also be performed by the node grouping device provided by the fourth aspect of the present invention, and details are not described here again.
  • Optionally, the machine learning system further includes a parameter server; a parameter collection group and its corresponding parameter distribution group correspond to the same parameter server, and different parameter collection groups and their corresponding parameter distribution groups correspond to different parameter servers.
  • Optionally, the parameter server includes Y layers, one parameter server of the (j+1)th layer corresponds to at least one parameter server of the jth layer, and a parameter collection group and its corresponding parameter distribution group correspond to a layer-1 parameter server, where 1 ≤ j < j+1 ≤ Y.
  • Optionally, the first grouping unit is specifically configured to: according to the preset correspondence between node identifiers and node numbers, the number of parameter collection groups, and the number of parameter distribution groups, divide the node number corresponding to each node identifier by the number of parameter collection groups to obtain the node's collection-group remainder, and divide the node number by the number of parameter distribution groups to obtain the node's distribution-group remainder; and determine nodes with the same collection-group remainder as the same parameter collection group and nodes with the same distribution-group remainder as the same parameter distribution group.
  • Alternatively, the re-grouping of the nodes included in the parameter collection groups and parameter distribution groups may be performed by the node grouping device provided in the following Embodiment 5, and details are not described here again.
  • In the model parameter fusion device provided by the embodiment of the present invention, a first model parameter is obtained by intra-group fusion within a parameter collection group and sent to the parameter distribution group corresponding to that parameter collection group; the first model parameters of each of the W parameter collection groups are then fused as a whole to obtain second model parameters; the W parameter collection groups are then fused between groups to obtain a third model parameter; and the nodes are re-grouped when a preset condition is met. This solves the problems of high performance requirements on the parameter server, large data transmission volume, and dynamic adjustment of computing resources in model parameter fusion.
  • An embodiment of the present invention provides a node grouping apparatus, which is applied to a machine learning system, where the machine learning system includes at least two nodes, and the apparatus includes:
  • a second grouping unit configured to group nodes in the machine learning system, such that the machine learning system includes at least one parameter collection group and at least one parameter distribution group, and each parameter collection group corresponds to at least one parameter distribution group. At least one of the parameter collection groups includes nodes that are different from the nodes included in the parameter distribution group corresponding to the parameter collection group.
  • Each parameter collection group corresponds to at least one parameter distribution group, that is, one parameter collection group may correspond to one parameter distribution group, or corresponding to multiple parameter distribution groups.
  • At least one parameter collection group includes nodes different from those in its corresponding parameter distribution group, that is, the nodes included in at least one parameter collection group are not identical to the nodes included in the corresponding parameter distribution group.
  • This may mean that the parameter collection group includes at least one node different from the nodes in its corresponding parameter distribution group, or that all nodes included in the parameter collection group differ from all nodes included in its corresponding parameter distribution group.
  • the number of nodes of different parameter collection groups is the same or different; and/or,
  • the number of nodes in different parameter distribution groups is the same or different; and/or,
  • the number of nodes of a parameter collection group is the same as or different from the number of nodes of the parameter distribution group corresponding to the parameter collection group.
  • the machine learning system further includes a parameter server, a parameter collection group, and a parameter distribution group corresponding to the parameter collection group corresponding to the same parameter server, and different parameter collection groups and corresponding parameter distribution groups correspond to different parameter servers.
  • Optionally, the parameter server includes Y layers, one parameter server of the (j+1)th layer corresponds to at least one parameter server of the jth layer, and a parameter collection group and its corresponding parameter distribution group correspond to a layer-1 parameter server, where 1 ≤ j < j+1 ≤ Y.
  • When the parameter server includes Y layers, the number of parameter servers in each layer and the correspondence between the lower-layer parameter servers and the upper-layer parameter servers may be determined.
  • The correspondence between the lower-layer parameter servers and the upper-layer parameter servers may be set in advance, or may be determined during the node grouping process; for example, it may be determined by a method similar to the following method for determining the parameter collection groups or parameter distribution groups, and details are not described here again.
  • the second grouping unit specifically includes:
  • a first determining module configured to determine a correspondence between a node identifier and a node number
  • a second determining module configured to determine the number of the parameter collection groups, and the number of the parameter distribution groups
  • a third determining module configured to determine a parameter collection group and a parameter distribution group based on a correspondence between the node identifier and the node number, the number of the parameter collection group, and the number of the parameter distribution group;
  • a fourth determining module configured to determine a correspondence between the parameter collection group and the parameter distribution group.
  • the node identifier is used to uniquely identify the node.
  • the node identifier may be an IP address of the node, a sequence code of the node, and the like.
  • the node number may be a sequence number that is randomly assigned to the node, or may be any value that is randomly assigned to the node, etc., and the present invention is not limited thereto.
  • Each time node grouping is performed, the node number of each node may change, the numbers of parameter collection groups and parameter distribution groups may also change, and the correspondence between parameter collection groups and parameter distribution groups may change accordingly.
  • Optionally, the third determining module is specifically configured to: divide the node number corresponding to each node identifier by the number of parameter collection groups to obtain the node's collection-group remainder; divide the node number by the number of parameter distribution groups to obtain the node's distribution-group remainder; and determine nodes with the same collection-group remainder as the same parameter collection group and nodes with the same distribution-group remainder as the same parameter distribution group.
  • An embodiment of the present invention provides a node grouping apparatus: by grouping the nodes in a machine learning system, the machine learning system includes at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponds to at least one parameter distribution group, and at least one parameter collection group includes nodes different from the nodes in its corresponding parameter distribution group, thereby solving the problems of high performance requirements on the parameter server and dynamic adjustment of computing resources in model parameter fusion.
  • FIG. 8 is a schematic diagram of a model parameter fusion device, where the model parameter fusion device includes a memory 801, a processor 802, a power component 803, an input/output interface 804, a communication component 805, and the like, and the processor 802 is configured to execute the model parameter fusion method described in Embodiment 2 above.
  • It should be understood that the model parameter fusion device may also include more or fewer components than those shown in FIG. 8, or have a different configuration than that shown in FIG. 8.
  • The memory 801 can be used to store data, software programs, and modules, and mainly includes a storage program area and a storage data area, where the storage program area can store an operating system, an application required for at least one function, and the like, and the storage data area can store data created according to the use of the model parameter fusion device, and the like.
  • In addition, the memory may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • The processor 802 is the control center of the model parameter fusion device; it connects the various parts of the entire device using various interfaces and lines, and performs the various functions of the model parameter fusion device and processes data by running or executing the software programs and/or modules stored in the memory 801 and calling the data stored in the memory 801, thereby monitoring the device as a whole.
  • Optionally, the processor 802 may include one or more processing units; preferably, the processor 802 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 802.
  • The power component 803 is used to provide power to the various components of the model parameter fusion device, and may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the model parameter fusion device.
  • the input/output interface 804 provides an interface between the processor 802 and the peripheral interface module.
  • the peripheral interface module can be a keyboard, a mouse, or the like.
  • Communication component 805 is configured to facilitate wired or wireless communication between the model parameter fusion device and other devices.
  • the model parameter fusion device can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof.
  • In addition, the model parameter fusion device may further include an audio component, a multimedia component, and the like, which are not described here again.
  • It should be noted that the model parameter fusion device may be a parameter server, and the parameter server may be set independently of the nodes or configured on a node.
  • In the model parameter fusion device provided by the embodiment of the present invention, a first model parameter is obtained by intra-group fusion within a parameter collection group and sent to the parameter distribution group corresponding to that parameter collection group; the first model parameters of each of the W parameter collection groups are then fused as a whole to obtain second model parameters; the W parameter collection groups are then fused between groups to obtain a third model parameter; and the nodes are re-grouped when a preset condition is met. This solves the problems of high performance requirements on the parameter server and large data transmission volume in model parameter fusion.
  • FIG. 9 is a schematic diagram of a controller according to an embodiment of the present invention.
  • The controller includes a memory 901, a processor 902, a power component 903, an input/output interface 904, a communication component 905, and the like, and the processor 902 is configured to perform the node grouping method described in Embodiment 3.
  • It should be understood that FIG. 9 is merely illustrative and does not limit the structure of the controller; the controller may also include more or fewer components than those shown in FIG. 9, or have a different configuration than that shown in FIG. 9.
  • The memory 901 can be used to store data, software programs, and modules, and mainly includes a storage program area and a storage data area, where the storage program area can store an operating system, an application required for at least one function, and the like, and the storage data area can store data created according to the use of the controller, and the like. In addition, the memory may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • The processor 902 is the control center of the controller; it connects the various parts of the entire controller using various interfaces and lines, and performs the various functions of the controller and processes data by running or executing the software programs and/or modules stored in the memory 901 and calling the data stored in the memory 901, thereby monitoring the controller as a whole.
  • Optionally, the processor 902 may include one or more processing units; preferably, the processor 902 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may not be integrated into the processor 902.
  • The power component 903 is used to provide power to the various components of the controller, and may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the controller.
  • the input/output interface 904 provides an interface between the processor 902 and the peripheral interface module.
  • the peripheral interface module can be a keyboard, a mouse, or the like.
  • Communication component 905 is configured to facilitate wired or wireless communication between the controller and other devices.
  • the controller can access a wireless network based on communication standards such as WiFi, 2G or 3G, or a combination thereof.
  • In addition, the controller may further include an audio component (e.g., a speaker, a microphone, etc.), a multimedia component (e.g., a graphics processing unit, etc.), and the like, which are not described in the present invention.
  • With the controller provided by the embodiment of the present invention, by grouping the nodes in the machine learning system, the machine learning system includes at least one parameter collection group and at least one parameter distribution group, each parameter collection group corresponds to at least one parameter distribution group, and at least one parameter collection group includes nodes different from the nodes in its corresponding parameter distribution group, thereby solving the problems of high performance requirements on the parameter server and dynamic adjustment of computing resources in model parameter fusion.
  • The embodiment of the present invention provides a machine learning system, which includes the model parameter fusion device described in Embodiment 6 and the controller described in Embodiment 7.
  • The model parameter fusion device obtains a first model parameter by intra-group fusion within a parameter collection group and sends the first model parameter to the parameter distribution group corresponding to that parameter collection group; the first model parameters of each of the W parameter collection groups are then fused as a whole to obtain second model parameters; the W parameter collection groups are then fused between groups to obtain a third model parameter; and when a preset condition is met, the controller re-groups the nodes in the parameter collection groups and parameter distribution groups. This solves the problems of high performance requirements on the parameter server, large data transmission volume, and dynamic adjustment of computing resources in model parameter fusion.


Abstract

The embodiments of the present invention provide a model parameter fusion method and apparatus, applied to a machine learning system, where the machine learning system includes at least one parameter collection group and at least one parameter distribution group, and each parameter collection group corresponds to at least one parameter distribution group. The invention relates to the field of machine learning and is used to solve the problems of high performance requirements on the parameter server, large data transmission volume, and dynamic adjustment of computing resources in model parameter fusion. The method includes: when any parameter collection group satisfies an intra-group fusion condition, fusing the model parameters of M nodes in the parameter collection group to obtain a first model parameter of the parameter collection group, where the minimum number of fusion nodes s of the parameter collection group ≤ M ≤ the total number of nodes included in the parameter collection group; and sending the first model parameter of the parameter collection group to N nodes in the parameter distribution group corresponding to the parameter collection group, where 1 ≤ N ≤ the total number of nodes included in the parameter distribution group corresponding to the parameter collection group.

Description

Model parameter fusion method and apparatus. Technical Field
The present invention relates to the field of machine learning, and in particular, to a model parameter fusion method and apparatus.
Background
Model parameters refer to parameters, composed of multiple constraint parameters, that describe a model. Through model parameters, data with common features can be screened out; for example, when the model parameters are image-type model parameters, different model parameters can be used to screen out, from a large amount of image data, image data containing people, animals, or human faces. With the rapid growth of the volume and variety of data, more and more model parameters are used for data screening, and these model parameters are obtained by repeatedly calculating and fusing a large amount of data with common features.
At present, model parameter fusion divides the data into multiple data subsets, which are allocated to different nodes for training on the allocated data subsets using an iterative calculation method. After every one or more iterative calculations, the model parameters obtained by the nodes from training on the different data subsets are fused once, and the fused model parameters are used as the initial model parameters for the next iterative calculation; after multiple fusions, the final overall model parameters are obtained.
In the prior art, there are mainly two methods for model parameter fusion. In the first method, after the nodes complete multiple iterative calculations on multiple data subsets, the parameter server aggregates and fuses the model parameters obtained by the nodes from training on the multiple data subsets to obtain new model parameters, and then the nodes perform the next iterative calculation on the multiple data subsets according to the new model parameters. In the second method, after a node completes multiple iterative calculations on its allocated data subset, it sends the model parameters obtained by training on that data subset to designated other nodes, for model parameter fusion with the data subsets of the other nodes, and then the node starts iterative calculation based on the model parameters it receives from other nodes after they have trained on other data subsets. However, the first method places high performance requirements on the parameter server used for model parameter fusion and is prone to downtime, while the second method requires more data to be stored and involves a large volume of data transmission.
Summary of the Invention
The embodiments of the present invention provide a model parameter fusion method and apparatus, which are used to solve the problems of high performance requirements on the parameter server and large data transmission volume in model parameter fusion.
To achieve the above objective, the embodiments of the present invention adopt the following technical solutions:
第一方面,提供一种模型参数融合方法,所述方法应用于机器学习***,所述机器学习***包括至少一个参数收集组和至少一个参数分发组,每个参数收集组对应至少一个参数分发组,每个参数收集组中包含至少一个节点,每个参数分发组中包含至少一个节点,至少一个所述参数收集组包含的节点与所对应的参数分发组包含的节点不相同,所述方法包括:
在任一参数收集组满足组内融合条件时,融合所述满足条件的参数收集组中M个节点的模型参数,得到所述满足条件的参数收集组的第一模型参数,其中,所述满足条件的参数收集组最低融合节点数s≤所述M≤所述满足条件的参数收集组包含节点的总个数;
向所述满足条件的参数收集组对应的参数分发组中的N个节点发送所述满足条件的参数收集组的所述第一模型参数,其中,1≤所述N≤所述满足条件的参数收集组对应的参数分发组包含节点的总个数。
其中,组内融合条件可以是该参数收集组中完成当前模型参数迭代计算的节点个数达到预设数值,即最低融合节点数s。
具体地,当任一参数收集组完成当前模型参数迭代计算的节点个数达到最低融合节点数s时,从该参数收集组中选取已完成当前模型参数计算的M个节点,并将该M个节点计算得到的模型参数进行融合,得到第一模型参数。之后,由于参数收集组与参数分发组是对应的,也即是,一个参数收集组可以对应一个或者多个参数分发组,因此,当该参数收集组融合得到第一模型参数时,若满足组内分发条件,可以基于参数收集组与参数分发组的对应关系,将第一模型参数发送给对应的参数分发组中的全部节点,或者部分节点。
其中,组内分发条件可以是组内融合次数达到预设次数,或者经过预设时长等,本发明实施例对此不作限定。
进一步,当满足组内融合条件完成M个节点的融合时,若不满足组内分发条件,则参数收集组基于融合得到的第一模型参数进行新一轮的迭 代计算,并且每完成一次M个节点的融合,对第一模型参数进一次更新,当满足组内分发条件时,则将第一模型参数进行分发。
进一步,将第一模型参数发送给参数收集组对应的参数分发组时,也可以将参与第一模型参数融合的地址信息发送给参数分发组中的节点,该地址信息可以是节点的IP地址或者节点编号等等,本发明对此不作限定。
需要说明的是,最低融合节点数s、M和N可以事先设置,且s≤M≤满足条件的参数收集组包含节点的总个数,1≤N≤满足条件的参数收集组对应的参数分发组包含节点的总个数。
另外,至少一个参数收集组包含的节点与所对应的参数分发组包含的节点不相同,即指至少有一个参数收集组包含的节点与所对应的参数分发组包含的节点不完全相同,可以是参数收集组中包括至少一个节点和该参数收集组对应的参数分发组中的节点不同,也可以是该参数收集组中包括的所有节点和该参数收集组对应的参数分发组包括的所有节点不同。
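The grouping relationships described above can be pictured with a short, non-limiting Python sketch. It only models topology: which nodes belong to which parameter collection group and parameter distribution group, and which distribution group(s) each collection group corresponds to. The class and field names (MachineLearningSystem, Group, correspondence) and the example values are assumptions for illustration and do not appear in this application; the fusion operator and condition checks are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Group:
    """A parameter collection group or parameter distribution group: a set of node IDs."""
    group_id: int
    node_ids: List[int]
    min_fusion_nodes: int = 1   # minimum fusion node quantity s (collection groups only)

@dataclass
class MachineLearningSystem:
    """Topology only: group membership and the collection-to-distribution correspondence."""
    collection_groups: Dict[int, Group] = field(default_factory=dict)
    distribution_groups: Dict[int, Group] = field(default_factory=dict)
    # one collection group corresponds to one or more distribution groups
    correspondence: Dict[int, List[int]] = field(default_factory=dict)

    def check(self) -> bool:
        """At least one collection group must differ from a corresponding distribution group."""
        for cg_id, dg_ids in self.correspondence.items():
            cg_nodes = set(self.collection_groups[cg_id].node_ids)
            for dg_id in dg_ids:
                if cg_nodes != set(self.distribution_groups[dg_id].node_ids):
                    return True
        return False

# Example: six nodes numbered 0..5, two collection groups, three distribution groups.
system = MachineLearningSystem(
    collection_groups={0: Group(0, [0, 2, 4], min_fusion_nodes=2),
                       1: Group(1, [1, 3, 5], min_fusion_nodes=2)},
    distribution_groups={0: Group(0, [0, 3]), 1: Group(1, [1, 4]), 2: Group(2, [2, 5])},
    correspondence={0: [1, 2], 1: [0]},
)
assert system.check()
```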
结合第一方面,在第一方面的第一种可能的实现方式中,所述融合所述满足条件的参数收集组中M个节点的模型参数,得到所述满足条件的参数收集组融合后的第一模型参数,包括:
接收所述满足条件的参数收集组中完成迭代的M个节点发送的所述M个节点的模型参数;
根据接收的所述M个节点的模型参数进行融合,得到所述满足条件的参数收集组的第一模型参数。
其中,该方法可以由独立于参数收集组之外的设备完成,例如,参数服务器,该参数服务器可以是由固定节点来担当。具体地,该参数收集组中完成迭代的M个节点分别将当前迭代计算得到的模型参数发送给参数服务器,当该参数服务器接收到该M个节点发送的模型参数时,该参数服务器可以通过多种不同的融合方式将该M个节点对应的模型参数进行融合,得到第一模型参数。
比如,多种不同的融合方式可以为:参数服务器一次性地将该M个节点对应的模型参数进行融合,得到第一模型参数;或者,每个节点在完成迭代后将参数发送给参数服务器,参数服务器接收来自节点的参数并融合参数,经过多次接收、融合的过程,直到该M个节点都完成融合,得 到第一模型参数等等,本发明实施例对此不作限定。
需要说明的是,该参数服务器与参数收集组、以及该参数收集组对应的参数分发组之间的对应关系可以事先设置。
结合第一方面,在第一方面的第二种可能的实现方式中,所述融合所述满足条件的参数收集组中M个节点的模型参数,得到所述满足条件的参数收集组融合后的第一模型参数,包括:
获取所述满足条件的参数收集组中节点的状态信息;其中,该节点状态信息可以包括节点标识和完成迭代的节点顺序。
根据所述满足条件的参数收集组中节点的状态信息,指示所述满足条件的参数收集组中完成迭代的M个节点进行模型参数融合,得到所述满足条件的参数收集组的所述第一模型参数。
其中,该方法可以由参数收集组内的一个节点完成,该节点可以称为控制节点,该控制节点可以事先进行指定,也可以由参数收集组内的节点临时推荐确定。该控制节点可以统计参数收集组中节点的状态信息,并指示模型参数的传递和融合指令。
具体地,当该控制节点根据该参数收集组中节点的状态信息,指示完成迭代的M个节点进行融合时,控制节点可以指示完成迭代的M个节点通过不同的组合方式进行融合,比如,控制节点可以指示该M个节点将对应的模型参数发送给其中的一个节点,由该节点进行一次融合,得到第一模型参数,或者控制节点通过下述第一方面的第三种可能的实现方式进行融合,以提高该M个节点进行融合,得到第一模型参数的效率,当然,控制节点也可以通过其他的组合方式进行融合,本发明实施例对此不作限定。
结合第一方面的第二种可能的实现方式,在第一方面的第三种可能的实现方式中,根据所述满足条件的参数收集组中节点的状态信息,指示所述满足条件的参数收集组中完成迭代的M个节点进行模型参数融合,包括:
根据所述满足条件的参数收集组中节点的状态信息,确定所述参数收集组中s个完成迭代的节点;
指示完成迭代的s个节点中的一个节点融合所述s个节点的模型参 数;
也即是,控制节点在确定参数收集组中完成迭代的s个节点之后,指示将该s个节点中的一个节点作为融合节点,其余的节点分别将当前迭代得到的模型参数发送给该融合节点,由该融合节点将该s个节点对应的模型参数进行融合。
需要说明的是,该融合节点可以是最后一个完成迭代的节点,也可以是节点编号最小的节点,本发明实施例对此不作限定。
其中,当融合节点将该s个节点对应的模型参数进行融合的过程中,若有新增节点完成迭代,可以根据新增节点的个数与s大小关系可以分为两种情况:
第一种情况、新增x个节点,在所述x<所述s时,若在所述完成迭代的s个节点进行模型参数融合过程中,新增x个节点完成迭代,则指示所述新增x个节点中的一个节点融合所述新增x个节点的模型参数以及所述s个节点融合后的模型参数;
第二种情况、新增y个节点,在所述y≥所述s时,若在所述完成迭代的s个节点进行模型参数融合过程中,新增y个节点完成迭代,则指示所述新增y个节点中的一个节点融合所述y个节点的模型参数,并将所述y个节点融合后的模型参数与所述s个节点融合后的模型参数再次进行融合。
需要说明的是,在上述两种情况之后,若该M个节点中存在剩余节点没有参与融合,该剩余节点可以继续通过上述两种情况提供的方法进行模型参数的融合,以提高该M节点的模型参数进行融合的效率,当然,也可以通过其他方式进行融合,本发明实施例对此不作限定。
另外,所述新增节点中的一个节点可以是新增节点中节点编号最小的节点,也可以是最晚完成迭代的节点,本发明实施例对此不作限定。
结合第一方面至第一方面的第三种可能的实现方式中的任一种可能的实现方式,在第一方面的第四种可能的实现方式中,所述方法还包括:
在W个参数收集组之间满足组间融合条件时,分别将所述W个参数收集组的每个参数收集组中节点的模型参数进行整体融合,获得所述W个参数收集组中每个参数收集组的第二模型参数;
其中,所述W个参数收集组由所述W个参数收集组的上一层参数收集组确定,所述W≤所述上一层参数收集组包含组数的总个数。
另外,组间融合条件可以为参数收集组的组内融合次数达到预设次数,或者经过一定的时间。相应的,若组间融合条件为参数收集组的组内融合次数达到预设次数,当W个参数收集组的组内融合次数达到预设次数时,对于W个参数收集组中的每个参数收集组,该参数收集组可以将组内全部节点当前的模型参数进行整体融合,得到第二模型参数,从而获得W个参数收集组中每个参数收集组的第二模型参数。
比如,每个参数收集组的全部节点可以将当前的模型参数发送给组内的一个节点,由该节点将全部节点的当前模型参数进行整体融合,得到第二模型参数,当然,也可以通过其它的方式进行整体融合,本发明实施例对此不作限定。
将所述W个参数收集组中每个参数收集组的第二模型参数进行融合,得到第三模型参数;
将所述第三模型参数发送给所述W个参数收集组的节点或发送给所述W个参数收集组上一层参数分发组的节点。
其中,将第三模型参数发送给W个参数收集组的节点时,不仅可以通过广播方式进行发送,还可以通过迭代的方式进行发送,即最后完成融合的节点,将第三模型参数分别发送给W个参数收集组包括的参数收集中的一个节点,由该节点将第三模型参数依次迭代的发送给参与组间融合的其它节点。
之后,将第三模型参数发送给W个参数收集组中每个参数收集组对应的参数分发组中的节点,其中,发送方式也可以采用广播方式,或者迭代的方式。
其中,将第三模型参数发送给W个参数收集组上一层参数分发组的节点时,不仅可以通过广播方式进行发送,还可以通过迭代的方式进行发送,即最后完成融合的节点,将第三模型参数分别发送给W个参数收集组上一层参数分发组的第一节点,由该这些节点将第三模型参数依次迭代的发送给上一层参数分发组内的其它节点。所述的第一节点指负责接收W个参数收集组模型参数的节点。
之后,将第三模型参数发送给上一层参数分发组中每个低层参数分发组中的节点,其中,发送方式也可以采用广播方式,或者迭代的方式。
结合第一方面的第四种可能的实现方式,在第一方面的第五种可能的实现方式中,所述将所述W个参数收集组中每个参数收集组的第二模型参数进行融合,得到第三模型参数,包括:
从所述W个参数收集组中确定一个节点作为组间融合节点;
在所述W个参数收集组中除所述组间融合节点所在的参数收集组之外的其他参数收集组中分别选择一个节点将对应的参数收集组的第二模型参数发送给所述组间融合节点,使得所述组间融合节点将所述W个参数收集组的第二模型参数进行融合,得到所述第三模型参数;
需要说明的是,该组间融合节点可以是由W个参数收集组中节点共同推荐的一个节点,也可以是最先完成迭代的节点,或者是节点编号最小的节点,本发明实施例对此不作限定。
其中,当从W个参数收集组中除所述组间融合节点所在的参数收集组之外的其他参数收集组中分别选择一个节点时,可以选择参数收集组内负责整体融合的节点。
或者,
分别从所述W个参数收集组的每个参数收集组中确定一个节点,将所述确定的节点确定为新参数收集组;
当所述新参数收集组满足组内融合条件时,将满足组内融合条件的所述W个参数收集组的第二模型参数进行融合,得到第三模型参数。
需要说明的是,将满足组内融合条件的所述W个参数收集组的第二模型参数进行融合,是指将W个参数收集组中每个参数收集组的第二模型参数都进行融合。
另外,从W个参数收集组的每个参数收集组中确定一个节点时,可以选择每个参数收集组内负责整体融合的节点,也可以选择编号最小的节点等,本发明对此不作限定。
再者,新参数收集组进行模型参数融合的方法与上述满足条件的参数收集组进行组内融合的方法类似,本发明在此不再赘述。
结合第一方面,在第一方面的第六种可能的实现方式中,向所述满 足条件的参数收集组对应的参数分发组中的N个节点发送所述满足条件的参数收集组的所述第一模型参数,包括:
通过广播方式向所述满足条件的参数收集组对应的参数分发组中的节点发送所述参数收集组的第一模型参数;或者,
向所述满足条件的参数收集组对应的参数分发组中第一节点发送所述满足条件的参数收集组的所述第一模型参数,使得所述第一节点通过迭代方式依次向所述N个节点中除所述第一节点之外的其余节点发送所述满足条件的参数收集组的所述第一模型参数。
结合第一方面至第一方面的第六种可能的实现方式中的任一种可能的实现方式,在第一方面的第七种可能的实现方式中,所述方法还包括:
在满足预设条件时,将所述参数收集组和所述参数分发组中包括的节点进行重新分组。
其中,预设条件可以是经过一定的时间,或者完成一定次数的模型参数的融合,或者是完成一定次数的迭代等等,本发明实施例对此不作限定。
另外,对参数收集组和参数分发组中包括的节点进行重新分组的方法可以根据本发明第二方面提供的节点分组方法进行重新分组,本发明在此不再赘述。
第二方面,提供一种节点分组方法,应用于机器学习***,所述机器学习***包含至少两个节点,所述方法包括:
对所述机器学习***内的节点进行分组,使得所述机器学习***包括至少一个参数收集组和至少一个参数分发组,每个参数收集组对应至少一个参数分发组,至少一个所述参数收集组包含的节点与所述参数收集组对应的参数分发组包含的节点不相同。
其中,每个参数收集组对应至少一个参数分发组是指一个参数收集组可以对应一个参数分发组,或者对应多个参数分发组。
另外,参数收集组包含的节点与该参数收集组对应的参数分发组包含的节点不相同,即指至少有一个参数收集组包含的节点与所对应的参数分发组包含的节点不完全相同,可以是指参数收集组中包括至少一个节点和该参数收集组对应的参数分发组中的节点不同,也可以是指该参数收集组中包括的所有节点和该参数收集组对应的参数分发组包括的所有节点不 同。
结合第二方面,在第二方面的第一种可能的实现方式中,不同参数收集组的节点个数相同或者不同;和/或,
不同参数分发组的节点个数相同或者不同;和/或,
一个参数收集组的节点个数与所述参数收集组对应的参数分发组的节点的节点个数相同或者不同。
结合第二方面的第一种可能的实现方式,在第二方面的第二种可能的实现方式中,所述机器学习***还可以包括参数服务器,一个参数收集组以及所述参数收集组对应的参数分发组对应同一个参数服务器,不同参数收集组以及所述参数收集组对应的参数分发组对应不同参数服务器。
结合第二方面的第二种可能的实现方式中,所述参数服务器包括Y层,且第j+1层的一个参数服务器对应第j层的至少一个参数服务器,所述参数收集组、以及所述参数收集组对应的参数分发组与第1层参数服务器对应,其中,1≤j<j+1≤Y。
结合第二方面,在第二方面的第三种可能的实现方式中,所述对所述机器学习***内的节点进行分组,具体包括:
建立节点标识与节点编号之间的对应关系;
确定所述参数收集组的个数、以及所述参数分发组的个数;
基于所述节点标识与节点编号之间的对应关系、所述参数收集组个数和所述参数分发组个数,确定参数收集组和参数分发组;
确定所述参数收集组与所述参数分发组的对应关系。
其中,节点标识用于唯一标识该节点,比如,节点标识可以是节点的IP地址,节点的序列码等等,本发明对此不作限定。节点编号可以是随机分配给节点的序号,也可以是随机分配给节点的任一数值等,本发明同样对此不作限定。
当满足预设条件,通过节点分组方法进行重新分组时,每个节点的节点编号是可以变化的,参数收集组和参数分发组的个数也可以变化,同时参数收集组与参数分发组的对应关系也可以相应的发生变化。
结合第二方面的第三种可能的实现方式,在第二方面的第四种可能的实现方式中,所述基于所述节点标识与节点编号之间的对应关系、所述参 数收集组个数和所述参数分发组个数,确定参数收集组和参数分发组,包括:
用所述节点标识对应的节点编号除以所述参数收集组的个数,得到所述节点的收集组余数;
用所述节点标识对应的节点编号除以所述参数分发组的个数,得到所述节点的分发组余数;
将所述收集组余数相同的节点确定为同一参数收集组,以及将所述分发组余数相同的节点确定为同一参数分发组。
第三方面,提供一种模型参数融合装置,所述装置应用于机器学习***,所述机器学习***包括至少一个参数收集组和至少一个参数分发组,每个参数收集组对应至少一个参数分发组,每个参数收集组中包含至少一个节点,每个参数分发组中包含至少一个节点,至少一个所述参数收集组包含的节点与所对应的参数分发组包含的节点不相同,所述装置包括:
第一融合单元,用于在任一参数收集组满足组内融合条件时,融合所述满足条件的参数收集组中M个节点的模型参数,得到所述满足条件的参数收集组的第一模型参数,其中,所述满足条件的参数收集组最低融合节点数s≤所述M≤所述满足条件的参数收集组包含节点的总个数;
第一发送单元,用于向所述满足条件的参数收集组对应的参数分发组中的N个节点发送所述满足条件的参数收集组的所述第一模型参数,其中,1≤所述N≤所述满足条件的参数收集组对应的参数分发组包含节点的总个数。
其中,组内融合条件可以是该参数收集组中完成当前模型参数迭代计算的节点个数达到预设数值,即最低融合节点数s。
需要说明的是,最低融合节点数s、M和N可以事先设置,且s≤M≤满足条件的参数收集组包含节点的总个数,1≤N≤满足条件的参数收集组对应的参数分发组包含节点的总个数。
结合第三方面,在第三方面的第一种可能的实现方式中,所述第一融合单元包括:
接收模块,用于接收所述满足条件的参数收集组中完成迭代的M个节点发送的所述M个节点的模型参数;
融合模块,用于根据接收的所述M个节点的模型参数进行融合,得到所述满足条件的参数收集组的第一模型参数。
其中,融合模块可以通过多种不同的融合方式将该M个节点对应的模型参数进行融合,得到第一模型参数,比如,融合模块一次性地将该M个节点对应的模型参数进行融合,得到第一模型参数;或者每个节点在完成迭代后将模型参数发送给融合模块,融合模块接收来自节点的参数并融合,经过多次接收、融合的过程,直到该M个节点都完成融合,得到第一模型参数等等,本发明实施例对此不作限定。
结合第三方面,在第三方面的第二种可能的实现方式中,所述第一融合单元包括:
获取模块,用于获取所述满足条件的参数收集组中节点的状态信息;其中,该节点状态信息可以包括节点标识和完成迭代的节点顺序。
指示模块,用于根据所述满足条件的参数收集组中节点的状态信息,指示所述满足条件的参数收集组中完成迭代的M个节点进行模型参数融合,得到所述参数收集组的所述第一模型参数。
其中,指示模块可以指示完成迭代的M个节点通过不同的组合方式进行融合,比如,可以指示该M个节点将对应的模型参数发送给其中的一个节点,由该节点进行一次融合,得到第一模型参数,或者指示模块通过下述第三方面的第三种可能的实现方式进行融合,以提高该M个节点进行融合,得到第一模型参数的效率,当然,指示模块也可以通过其他的组合方式进行融合,本发明实施例对此不作限定。
结合第三方面的第二种可能的实现方式,在第三方面的第三种可能的实现方式中,所述指示模块具体用于:
根据所述参数收集组中节点的状态信息,确定所述参数收集组中s个完成迭代的节点;
指示完成迭代的s个节点中的一个节点融合所述s个节点的模型参数;此时,该节点可以称为融合节点。
需要说明的是,该融合节点可以是最后一个完成迭代的节点,也可以是节点编号最小的节点,本发明实施例对此不作限定。
其中,当融合节点将该s个节点对应的模型参数进行融合的过程中, 若有新增节点完成迭代,可以根据新增节点的个数与s大小关系可以分为两种情况:
第一种情况、新增x个节点,在所述x<所述s时,若在所述完成迭代的s个节点进行模型参数融合过程中,新增x个节点完成迭代,则指示所述新增x个节点中的一个节点融合所述新增x个节点的模型参数以及所述s个节点融合后的模型参数;
第二种情况、新增y个节点,在所述y≥所述s时,若在所述完成迭代的s个节点进行模型参数融合过程中,新增y个节点完成迭代,则指示所述新增y个节点中的一个节点融合所述y个节点的模型参数,并将所述y个节点融合后的模型参数与所述s个节点融合后的模型参数再次进行融合。
需要说明的是,在上述两种情况之后,若该M个节点中存在剩余节点没有参与融合,指示模块可以指示该剩余节点可以继续通过上述两种情况提供的方法进行模型参数的融合,以提高该M节点的模型参数进行融合的效率,当然,也可以通过其他方式进行融合,本发明实施例对此不作限定。
另外,所述新增节点中的一个节点可以是新增节点中节点编号最小的节点,也可以是最晚完成迭代的节点,本发明实施例对此不作限定。
结合第三方面至第三方面的第三种可能的实现方式中的任一种可能的实现方式,在第一方面的第四种可能的实现方式中,所述装置还包括:
第二融合单元,用于在W个参数收集组之间满足组间融合条件时,分别将所述W个参数收集组的每个参数收集组中节点的模型参数进行整体融合,获得所述W个参数收集组中每个参数收集组的第二模型参数;
其中,所述W个参数收集组由所述W个参数收集组的上一层参数收集组确定,所述W≤所述上一层参数收集组包含组数的总个数;
另外,组间融合条件可以为参数收集组的组内融合次数达到预设次数。相应的,第二融合单元,用于当W个参数收集组的组内融合次数达到预设次数时,对于W个参数收集组中的每个参数收集组,可以将该参数收集组内全部节点当前的模型参数进行整体融合,得到第二模型参数,从而获得W个参数收集组中每个参数收集组的第二模型参数。
第三融合单元,用于将所述W个参数收集组中每个参数收集组的第二模型参数进行融合,得到第三模型参数;
第二发送单元,用于将所述第三模型参数发送给所述W个参数收集组的节点或发送给所述W个参数收集组上一层参数分发组的节点。
其中,第二发送单元不仅可以通过广播方式进行发送,还可以通过迭代的方式进行发送,即第二发送单元将第三模型参数分别发送给W个参数收集组包括的每个参数收集中的一个节点,由该节点将第三模型参数依次迭代的发送给组内的其它节点。
之后,将第三模型参数发送给W个参数收集组中每个参数收集组对应的参数分发组中的节点,其中,发送方式也可以采用专用的广播方式,或者迭代的方式。
结合第三方面的第四种可能的实现方式,在第三方面的第五种可能的实现方式中,所述第三融合单元具体用于:
从所述W个参数收集组中确定一个节点作为组间融合节点;
在所述W个参数收集组中除所述组间融合节点所在的参数收集组之外的其他参数收集组中分别选择一个节点将对应的参数收集组的第二模型参数发送给所述组间融合节点,使得所述组间融合节点将所述W个参数收集组的第二模型参数进行融合,得到第三模型参数。
其中,当从W个参数收集组中选择一个节点时,可以选择参数收集组内负责整体融合的节点。
或者,
分别从所述W个参数收集组的每个参数收集组中确定一个节点,将所述确定的节点确定为新参数收集组;
当所述新参数收集组满足组内融合条件时,将满足组内融合条件的所述W个参数收集组的第二模型参数进行融合,得到第三模型参数。
需要说明的是,将满足组内融合条件的所述W个参数收集组的第二模型参数进行融合,是指将W个参数收集组中每个参数收集组的第二模型参数都进行融合。
另外,从W个参数收集组的每个参数收集组中确定一个节点时,可以选择每个参数收集组内负责整体融合的节点,也可以选择编号最小的节 点等,本发明对此不作限定。
再者,新参数收集组进行模型参数融合的方法与上述满足条件的参数收集组进行组内融合的方法类似,本发明在此不再赘述。
结合第三方面,在第三方面的第六种可能的实现方式中,所述第一发送单元具体用于:
通过广播方式向所述满足条件的参数收集组对应的参数分发组中的节点发送所述参数收集组的第一模型参数;或者,
向所述满足条件的参数收集组对应的参数分发组中第一节点发送所述满足条件的参数收集组的所述第一模型参数,使得所述第一节点通过迭代方式依次向所述N个节点中除所述第一节点之外的其余节点发送所述满足条件的参数收集组的所述第一模型参数。
结合第三方面至第三方面的第六种可能的实现方式中的任一种可能的实现方式,在第三方面的第七种可能的实现方式中,所述装置还包括:
第一分组单元,用于在满足预设条件时,将所述参数收集组和所述参数分发组中包括的节点进行重新分组。
其中,预设条件可以是经过一定的时间,或者完成一定次数的模型参数的融合,或者是完成一定次数的迭代等等,本发明实施例对此不作限定。
另外,对参数收集组和参数分发组中包括的节点进行重新分组的步骤可以由本发明第四方面提供的节点分组装置进行重新分组,本发明在此不做赘述。
第四方面,提供一种节点分组装置,应用于机器学习***,所述机器学习***包含至少两个节点,所述装置包括:
第二分组单元,用于对所述机器学习***内的节点进行分组,使得所述机器学习***包括至少一个参数收集组和至少一个参数分发组,每个参数收集组对应至少一个参数分发组,至少一个所述参数收集组包含的节点与所述参数收集组对应的参数分发组包含的节点不相同。
其中,每个参数收集组对应至少一个参数分发组是指一个参数收集组可以对应一个参数分发组,或者对应多个参数分发组。
另外,参数收集组包含的节点与该参数收集组对应的参数分发组包含的节点不相同,即指至少有一个参数收集组包含的节点与所对应的参数分 发组包含的节点不完全相同,可以是指参数收集组中包括至少一个节点和该参数收集组对应的参数分发组中的节点不同,也可以是指该参数收集组中包括的所有节点和该参数收集组对应的参数分发组包括的所有节点不同。
结合第四方面,在第四方面的第一种可能的实现方式中,不同参数收集组的节点个数相同或者不同;和/或,
不同参数分发组的节点个数相同或者不同;和/或,
一个参数收集组的节点个数与所述参数收集组对应的参数分发组的节点的节点个数相同或者不同。
结合第四方面的第一种可能的实现方式,在第四方面的第二种可能的实现方式中,所述机器学习***还包括参数服务器,一个参数收集组以及对应的参数分发组对应同一个参数服务器,不同参数收集组以及对应的参数分发组对应不同参数服务器。
结合第四方面的第二种可能的实现方式,在第四方面的第三种可能的实现方式中,所述参数服务器包括Y层,且第j+1层的一个参数服务器对应第j层的至少一个参数服务器,所述参数收集组、以及所述参数收集组对应的参数分发组与第1层参数服务器对应,其中,1≤j<j+1≤Y。
结合第四方面至第四方面的第三种可能的实现方式中的任一种可能的实现方式,在第四方面的第四种可能的实现方式中,所述第二分组单元具体包括:
第一确定模块,用于确定节点标识与节点编号之间的对应关系;
第二确定模块,用于确定所述参数收集组的个数、以及所述参数分发组的个数;
第三确定模块,用于基于所述节点标识与节点编号之间的对应关系、所述参数收集组个数和所述参数分发组个数,确定参数收集组和参数分发组;
第四确定模块,用于确定所述参数收集组与所述参数分发组的对应关系。
其中,节点标识用于唯一标识该节点,比如,节点标识可以是节点的IP地址,节点的序列码等等,本发明对此不作限定。节点编号可以是随 机分配给节点的序号,也可以是随机分配给节点的任一数值等,本发明同样对此不作限定。
当满足预设条件,通过节点分组方法进行重新分组时,每个节点的节点编号是可以变化的,参数收集组和参数分发组的个数也可以变化,同时参数收集组与参数分发组的对应关系也可以相应的发生变化。
结合第四方面的第四种可能的实现方式,在第四方面的第五种可能的实现方式中,所述第三确定模块具体用于:
用所述节点标识对应的节点编号除以所述参数收集组的个数,得到所述节点的收集组余数;
用所述节点标识对应的节点编号除以所述参数分发组的个数,得到所述节点的分发组余数;
将所述收集组余数相同的节点确定为同一参数收集组,以及将所述分发组余数相同的节点确定为同一参数分发组。
第五方面,提供一种模型参数融合装置,所述模型参数融合装置包括处理器和存储器,所述存储器中存储代码和数据,所述处理器可运行存储器中的代码,所述处理器用于执行上述第一方面至第一方面的第七种可能的实现方式中任一项所述的模型参数融合方法。
结合第五方面,在第五方面的第一种可能的实现方式中,所述模型参数融合装置为参数服务器,所述参数服务器独立于所述节点设置,或者配置在所述节点上。
第六方面,提供一种控制器,所述控制器包括处理器和存储器,所述存储器中存储代码和数据,所述处理器可运行存储器中的代码,所述处理器用于执行第二方面至第二方面的第五方面可能的实现方式中任一项所述的节点分组方法。
第七方面,提供一种机器学习***,所述机器学习***包括上述第五方面至第五方面的第一种可能的实现方式中任一项所述的模型参数融合装置、以及第六方面所述的一种控制器。
本发明实施例提供的一种模型参数融合方法及装置,通过参数收集组进行组内融合得到第一模型参数,将第一模型参数发送给参数收集组对应的参数分发组,解决了模型参数融合中对参数服务器性能要求 高、数据传输量大的问题。
附图说明
为了更清楚地说明本发明实施例的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本发明实施例提供的一种机器学习***的结构示意图;
图2为本发明实施例提供的一种模型参数融合方法的流程示意图;
图3为本发明实施例提供的一种参数服务器的结构示意图；
图4为本发明实施例提供的一种节点分组方法的流程示意图;
图5为本发明实施例提供的一种模型参数融合装置的结构示意图;
图6为本发明实施例提供的另一种模型参数融合装置的结构示意图;
图7为本发明实施例提供的又一种模型参数融合装置的结构示意图;
图8为本发明实施例提供的一种模型参数融合装置的结构示意图;
图9为本发明实施例提供的一种控制器的结构示意图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
实施例一
本发明的实施例所应用的机器学习系统如图1所示，包括数据存储设备101、模型参数训练平台102和模型参数存储设备103。
其中,数据存储设备101可以为数据存储服务器101,该数据存 储服务器101可以用来存储用于模型参数训练的原始数据,数据存储服务器101的存储容量远大于模型训练平台102中计算服务器1021的存储容量。该原始数据可以是语言数据、图像数据、以及视频数据等,且原始数据由多个数据集组成,且每个数据集又包括多个类型子集组成,每个类型子集带有用于表示类别的数据标签,同一个数据集中包括的类型子集的标签是相同的,比如,该数据集可以是包含带有人物标签的多张人物图像,也可以是包含带有动物标签的多张动物图像,或者其它类别的图像等等。
模型参数训练平台102包括用于迭代计算的计算服务器1021,也可以称为节点,具体可以为普通的计算机、移动终端、工作站或通用服务器、专用服务器等,以及用于负责计算服务器间进行数据通信的交换机1022。计算服务器1021有本地的存储,其容量小于数据存储服务器101。在模型训练时,每个计算服务器通过采样的方式从数据存储服务器101中读取一定的数据到本地的存储设备中用于模型参数训练。模型参数训练平台102通过将带有数据标签的数据集进行模型参数训练融合,可以得到最终融合输出的一个总的模型参数,通过这个总的模型参数就可以识别出新数据的数据类型。比如,用带有人物标签的图像数据集进行模型参数融合,就可以通过最终输出的模型参数识别出新图像数据中的人物图像,用带有动物标签的图像数据集进行模型参数融合,就可以通过最终输出的模型参数识别出新图像数据中的动物图像等。
模型参数存储服务器103用于存储训练得到的模型参数,当模型参数训练平台102训练融合完成时,可以将最终融合得到的模型参数发送给模型参数存储服务器103,使模型参数存储服务器103进行存储,以方便后续的使用。另外,模型参数平台102中计算服务器1021最初用于进行模型参数训练融合的模型参数也可以是从模型参数存储服务器103中获取的。
实施例二
图2为本发明实施例提供的一种模型参数融合方法,该方法应用于机器学习***,该机器学习***包括至少一个参数收集组和至少一个参数分发组,每个参数收集组对应至少一个参数分发组,每个参数收集组中包含 至少一个节点,每个参数分发组中包含至少一个节点,至少一个参数收集组包含的节点与所对应的参数分发组包含的节点不相同,该方法包括以下几个步骤。
步骤201:用于进行模型参数融合的节点获取数据集中的数据子集。
其中,该数据集是指用于进行模型参数迭代计算的数据集,该数据集可以是语言数据、图像数据、以及视频数据等,且该数据集由多个类型子集组成,每个类型子集带有用于表示类别的数据标签,同一个数据集中包括的类型子集的标签是相同的。
另外,该数据集可以事先存储在硬盘、磁盘等存储设备上,也可以事先存储在数据存储服务器上,当节点从数据集中获取数据子集时,可以将存储设备直接与节点所在的设备进行连接来获取数据子集,或者从数据存储服务器上来获取数据等。
需要说明的是,由于进行模型参数融合的数据集远远大于实际模型参数用到的数据量,因此,当节点获取数据集中的数据子集时,节点可以从数据集中抽取一定量的数据,如果事先知道每个节点的计算能力,可以按照该节点的计算能力分配该节点获取的数据子集的数据量。
另外,至少一个参数收集组包含的节点与所对应的参数分发组包含的节点不相同,是指至少有一个参数收集组包含的节点与所对应的参数分发组包含的节点不完全相同,即指至少有一个参数收集组中包括至少一个节点和该参数收集组对应的参数分发组中的节点不同,也可以是指该参数收集组中包括的所有节点和该参数收集组对应的参数分发组包括的所有节点不同。
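Step 201 can be sketched as follows, assuming only for illustration that the amount of data drawn by each node is made proportional to a known relative compute capability and that sampling is uniform; the function names allocate_quota and sample_subset are assumptions, not part of this application.

```python
import random
from typing import Dict, List, Sequence

def allocate_quota(total_samples: int, capability: Dict[int, float]) -> Dict[int, int]:
    """Split a sampling budget across nodes in proportion to relative compute capability."""
    cap_sum = sum(capability.values())
    return {node: int(total_samples * cap / cap_sum) for node, cap in capability.items()}

def sample_subset(dataset: Sequence, quota: int) -> List:
    """Each node samples its own data subset from the shared data set (e.g., a storage server)."""
    return random.sample(list(dataset), min(quota, len(dataset)))

# Example: 6 nodes, nodes 0 and 1 are twice as fast as the others.
dataset = list(range(100_000))          # stand-in for labelled image/speech/video records
capability = {0: 2.0, 1: 2.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}
quotas = allocate_quota(12_000, capability)
subsets = {node: sample_subset(dataset, q) for node, q in quotas.items()}
assert sum(len(s) for s in subsets.values()) <= 12_000
```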
步骤202:各节点基于数据子集和当前的模型参数进行迭代计算。
当第一次进行模型参数迭代计算时,每个节点可以基于获取的数据子集和初始的模型参数进行迭代计算,当完成迭代计算时,每个节点可以基于数据子集和当前得到的模型参数进行下次的迭代计算。
其中,初始的模型参数是指每个节点最开始的模型参数,且每个节点初始的模型参数可以是相同的。当前得到的模型参数是指每个节点完成当 前迭代计算得到的模型参数,或者当前接收到的模型参数,也即是,当前最新的模型参数。
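A minimal sketch of step 202 is given below. It assumes, purely for illustration, that the model parameter is the weight vector of a linear model trained by gradient descent on a squared loss; this application does not prescribe any particular model or optimizer, so the learning rate, loss and the name local_iterate are assumptions.

```python
import numpy as np

def local_iterate(params: np.ndarray, subset_x: np.ndarray, subset_y: np.ndarray,
                  learning_rate: float = 0.01, steps: int = 10) -> np.ndarray:
    """One round of iterative computation on a node: start from the current (latest
    received) model parameter and run a few gradient-descent steps on the local subset."""
    w = params.copy()
    for _ in range(steps):
        pred = subset_x @ w
        grad = subset_x.T @ (pred - subset_y) / len(subset_y)   # gradient of 0.5 * MSE
        w -= learning_rate * grad
    return w

# Example: every node starts from the same initial model parameter.
rng = np.random.default_rng(0)
initial_params = np.zeros(4)
x, y = rng.normal(size=(50, 4)), rng.normal(size=50)
updated_params = local_iterate(initial_params, x, y)
```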
步骤203:在任一参数收集组满足组内融合条件时,融合该满足条件的参数收集组中M个节点的模型参数,得到该满足条件的参数收集组的第一模型参数,其中,该满足条件的参数收集组最低融合节点数s≤M≤满足条件的参数收集组包含节点的总个数。
其中,组内融合条件是指该参数收集组中完成当前模型参数迭代计算的节点个数达到预设数值,即最低融合节点数s。
由于机器学习***包括至少一个参数收集组,每个参数收集组可以包括一个或者多个节点,因此,当任一参数收集组中的节点满足当前模型参数迭代计算的节点个数达到预设数值时,可以从该参数收集组中选取已完成当前模型参数计算的M个节点,并将该M个节点计算得到的模型参数进行融合,得到第一模型参数。
需要说明的是,最低融合节点数s和M可以事先设置,且s≤M≤参数收集组包含节点的总个数。
另外,机器学习***包括的参数收集组的个数,每个参数收集组包括的节点个数、以及与每参数收集组对应的参数分发组的个数、每个参数分发组包括的节点个数可以事先确定,也可以在每个节点获取数据子集之后进行确定,也即是在步骤201之后确定,本发明实施例对此不作限定。
进一步,融合该参数收集组中M个节点的模型参数,得到该参数收集组融合后的第一模型参数可以根据执行主体的不同分为两种不同的方法,如下所述。
第一种方法、接收该满足条件的参数收集组中完成迭代的M个节点发送的M个节点模型参数;根据接收的M个节点的模型参数进行融合,得到该满足条件的参数收集组的第一模型参数。
其中,该方法可以由独立于参数收集组之外的设备完成,例如,参数服务器,该参数服务器可以是由固定节点来担当。具体地,该参数收集组中完成迭代的M个节点分别将当前迭代计算得到的模型参数发送给参数服务器,当该参数服务器接收到该M个节点发送的模型参数时,该参数 服务器可以通过多种不同的融合方式将该M个节点对应的模型参数进行融合,得到第一模型参数。
比如,多种不同的融合方式可以为:参数服务器一次性地将该M个节点对应的模型参数进行融合,得到第一模型参数;或者,每个节点在完成迭代后将参数发送给参数服务器,参数服务器接收来自节点的模型参数并进行融合,经过多次接收、融合的过程,直到该M个节点都完成融合,得到第一模型参数等等,本发明实施例对此不作限定。
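The fusion operator itself is not fixed by this application; the sketch below assumes simple averaging and contrasts the two fusion timings mentioned above, fusing the M model parameters in one pass once all of them have arrived, or folding each arriving parameter into a running fusion result as nodes finish their iterations. The class name GroupParameterServer and the running-average bookkeeping are illustrative assumptions.

```python
import numpy as np
from typing import List, Optional

def fuse_all_at_once(param_list: List[np.ndarray]) -> np.ndarray:
    """One-pass fusion: average the model parameters of the M finished nodes."""
    return np.mean(param_list, axis=0)

class GroupParameterServer:
    """Incremental fusion: each node sends its parameter when it finishes an iteration,
    and the server keeps an equally weighted running fusion result."""
    def __init__(self, min_fusion_nodes: int):
        self.s = min_fusion_nodes            # minimum fusion node quantity s
        self.fused: Optional[np.ndarray] = None
        self.count = 0

    def receive(self, params: np.ndarray) -> None:
        self.count += 1
        if self.fused is None:
            self.fused = params.astype(float).copy()
        else:
            # running average: fused += (params - fused) / count
            self.fused += (params - self.fused) / self.count

    def first_model_parameter(self) -> Optional[np.ndarray]:
        """Available once at least s node parameters have been fused."""
        return self.fused if self.count >= self.s else None

# Example: M = 3 nodes report to the server, s = 2.
server = GroupParameterServer(min_fusion_nodes=2)
node_params = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
for p in node_params:
    server.receive(p)
assert np.allclose(server.first_model_parameter(), fuse_all_at_once(node_params))
```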
第二种方法、获取该满足条件的参数收集组中节点的状态信息,该节点状态信息可以包括节点标识和完成迭代的节点顺序;根据满足条件的参数收集组中节点的状态信息,指示该满足条件的参数收集组中完成迭代的M个节点进行模型参数融合,得到该参数收集组的第一模型参数。
其中,该方法可以由参数收集组内的一个节点完成,该节点可以称为控制节点,该控制节点可以事先进行指定,也可以由参数收集组内的节点临时推荐确定。该控制节点可以统计参数收集组中节点的状态信息,并指示其他节点进行模型参数的传递和融合。
当该控制节点根据该参数收集组中节点的状态信息,指示完成迭代的M个节点进行融合时,控制节点可以指示完成迭代的M个节点通过不同的组合方式进行融合,比如,控制节点可以指示该M个节点将对应的模型参数发送给其中的一个节点,由该节点进行一次融合,得到第一模型参数,或者控制节点通过下述的实现方式进行融合,以提高该M个节点进行融合,得到第一模型参数的效率,当然,控制节点也可以通过其他的组合方式进行融合,本发明实施例对此不作限定。
可选地,当该控制节点根据该参数收集组中节点的状态信息,指示完成迭代的M个节点进行融合时,该控制节点可以根据该参数收集组中节点的状态信息,确定该参数收集组中s个完成迭代的节点,再指示完成迭代的s个节点中的一个节点融合s个节点的模型参数。
具体地,控制节点在确定参数收集组中完成迭代的s个节点之后,指示将该s个节点中的一个节点作为融合节点,其余的节点分别将当前迭代得到的模型参数发送给该融合节点,由该融合节点将该s个节点对应的模型参数进行融合。
需要说明的是,该融合节点可以是最后一个完成迭代的节点,也可以是节点编号最小的节点,本发明实施例对此不作限定。
其中,当融合节点将该s个节点对应的模型参数进行融合的过程中,若有新增节点完成迭代,可以根据新增节点的个数与s大小关系可以分为两种情况:
第一种情况、新增x个节点,在x<s时,若在完成迭代的s个节点进行模型参数融合过程中,新增x个节点完成迭代,则指示新增x个节点中的一个节点融合新增x个节点的模型参数以及s个节点融合后的模型参数。
第二种情况、新增y个节点,在y≥s时,若在完成迭代的s个节点进行模型参数融合过程中,新增y个节点完成迭代,则指示新增y个节点中的一个节点融合y个节点的模型参数,并将y个节点融合后的模型参数与s个节点融合后的模型参数再次进行融合。
需要说明的是,在上述两种情况之后,若该M个节点中存在剩余节点没有参与融合,该剩余节点可以继续通过上述两种情况提供的方法进行模型参数的融合,以提高该M节点的模型参数进行融合的效率,当然,也可以通过其他方式进行融合,本发明实施例对此不作限定。
另外,所述新增节点中的一个节点可以是新增节点中节点编号最小的节点,也可以是最晚完成迭代的节点,本发明实施例对此不作限定。
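The two cases above can be illustrated with a small simulation. It assumes, purely for illustration, that fusion is node-count-weighted averaging, so that fusing a result of s nodes with x newly finished nodes gives the same value as averaging all s + x parameters directly; the names FusedResult and cascade_fuse do not come from this application.

```python
import numpy as np
from typing import List

class FusedResult:
    """A fusion result plus the number of node parameters it already contains,
    so later merges stay correctly weighted."""
    def __init__(self, params: np.ndarray, n_nodes: int):
        self.params, self.n_nodes = params.astype(float), n_nodes

    def merge(self, other: "FusedResult") -> "FusedResult":
        total = self.n_nodes + other.n_nodes
        merged = (self.params * self.n_nodes + other.params * other.n_nodes) / total
        return FusedResult(merged, total)

def fuse_nodes(param_list: List[np.ndarray]) -> FusedResult:
    return FusedResult(np.mean(param_list, axis=0), len(param_list))

def cascade_fuse(batches: List[List[np.ndarray]], s: int) -> FusedResult:
    """Fuse the s nodes that finished first; then, for each batch of newly finished
    nodes: if fewer than s arrived, fold them into the existing result (case 1);
    if s or more arrived, fuse them among themselves first and then merge (case 2)."""
    result = fuse_nodes(batches[0][:s])
    for new_batch in batches[1:]:
        if len(new_batch) < s:
            result = result.merge(fuse_nodes(new_batch))       # case 1: x < s
        else:
            result = fuse_nodes(new_batch).merge(result)       # case 2: y >= s
    return result

# Example with s = 2: two nodes finish first, then one more, then two more.
p = [np.array([float(i)]) for i in range(1, 6)]
out = cascade_fuse([[p[0], p[1]], [p[2]], [p[3], p[4]]], s=2)
assert np.allclose(out.params, np.mean(p, axis=0))   # equals averaging all five parameters
```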
进一步,当满足组内融合条件完成M个节点的融合时,若不满足组内分发条件,则参数收集组基于融合得到的第一模型参数进行新一轮的迭代计算,即返回执行步骤202,并且每完成一次M个节点的融合,对第一模型参数进一次更新,当满足组内分发条件时,则执行步骤204。
其中,组内分发条件可以是组内融合次数达到预设次数,或者经过预设时长等,本发明实施例对此不作限定。
步骤204:向该满足条件的参数收集组对应的参数分发组中的N个节点发送该满足条件的参数收集组的第一模型参数,其中,1≤N≤满足条件的参数收集组对应的参数分发组包含节点的总个数。
由于参数收集组与参数分发组是对应的,也即是,一个参数收集组可以对应一个或者多个参数分发组,因此,当满足组内分发条件时,可以基 于参数收集组与参数分发组的对应关系,将第一模型参数发送给对应的参数分发组中的节点,可以是该参数分发组中的全部节点,也可以是部分节点。
其中,将第一模型参数发送给对应的参数分发组中的节点时,可以通过广播方式向该满足条件的参数收集组对应的参数分发组中的节点发送满足条件的参数收集组的第一模型参数;或者,通过迭代的方式向该满足条件的参数收集组对应的参数分发组中的节点发送满足条件的参数收集组的第一模型参数,也即是,向该满足条件的参数收集组对应的参数分发组中第一节点发送该满足条件的参数收集组的第一模型参数,使得第一节点通过迭代方式依次向N个节点中除第一节点之外的其余节点发送该满足条件的参数收集组的第一模型参数,如第一节点将第一模型参数发送给第二节点,第二节点再发给第三节点,依次迭代发送,直到将第一模型参数发送给N个节点中除第一节点之外的其余所有节点。
需要说明的是,第一节点可以是该参数收集组中最晚完成迭代的节点的任一个节点,也可以是该参数分发组中的节点共同推荐的一个节点,本发明实施例对此不作限定。
另外,步骤204可以由独立于参数收集组之外的设备通过上述方式完成,比如参数服务器,也可以由参数收集组内的一个节点通过上述方式完成,比如,控制节点,本发明实施例对此不作限定。
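The following sketch contrasts the two delivery patterns for the first model parameter: a broadcast from the sender to all N nodes of the corresponding parameter distribution group, and an iterative relay in which a first node receives the parameter and each node then forwards it to the next one. The send() callback and the ordered relay list are illustrative assumptions.

```python
from typing import Callable, List, Sequence

def broadcast(first_model_param, dist_nodes: Sequence[int],
              send: Callable[[int, object], None]) -> None:
    """Broadcast delivery: the sender transmits the parameter to every node of the group."""
    for node in dist_nodes:
        send(node, first_model_param)

def relay(first_model_param, dist_nodes: Sequence[int],
          send: Callable[[int, object], None]) -> None:
    """Iterative delivery: only the first node is contacted directly; each node then
    forwards the parameter to the next one until all N nodes have received it."""
    if not dist_nodes:
        return
    send(dist_nodes[0], first_model_param)          # sender -> first node
    for _src, dst in zip(dist_nodes, dist_nodes[1:]):
        send(dst, first_model_param)                # in practice issued by node `_src`

# Example: record who receives the parameter in a toy in-memory "network".
received: List[int] = []
send = lambda node, param: received.append(node)
relay({"w": [0.1, 0.2]}, dist_nodes=[1, 4], send=send)
broadcast({"w": [0.1, 0.2]}, dist_nodes=[2, 5], send=send)
assert received == [1, 4, 2, 5]
```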
当该机器学习***包括参数服务器时,一个参数收集组以及该参数收集组对应的参数分发组对应同一个参数服务器,不同参数收集组以及对应的参数分发组对应不同参数服务器。
进一步,该参数服务器包括Y层,且第j+1层的一个参数服务器对应第j层的至少一个参数服务器,参数收集组、以及参数收集组所对应的参数分发组与第1层参数服务器对应,其中,1≤j<j+1≤Y。
步骤205:在W个参数收集组之间满足组间融合条件时,分别将W个参数收集组的每个参数收集组中节点的模型参数进行整体融合,获得W个参数收集组中每个参数收集组的第二模型参数。
其中,W个参数收集组由W个参数收集组的上一层参数收集组确定,W≤上一层参数收集组包含组数的总个数。
另外,组间融合条件可以为参数收集组的组内融合次数达到预设次数,或者经过一定的时间等,本发明实施例对此不作限定。
相应的,若组间融合条件为参数收集组的组内融合次数达到预设次数,当W个参数收集组的组内融合次数达到预设次数时,对于W个参数收集组中的每个参数收集组,该参数收集组可以将组内全部节点当前的模型参数进行整体融合,得到第二模型参数,从而获得W个参数收集组中每个参数收集组的第二模型参数。
由于上述步骤203可以由独立于参数收集组之外的设备完成,也可以由参数收集组内的一个节点完成,相应的,当步骤203的执行主体不同时,步骤205也会存在一定的不同,具体如下所述。
当执行主体为独立于参数收集组之外的设备时,比如参数服务器,由参数服务器确定W个参数收集组之间是否满足组间融合条件,同时在满足组间融合条件之后,将W个参数收集组的每个参数收集组中节点的模型参数进行整体融合。
当执行主体为参数收集组内的一个节点时,比如,控制节点,由控制节点确定W个参数收集组之间是否满足组间融合条件,在满足组间融合条件时,由参数收集组中的一个节点接收其它节点发送的模型参数,并将接收到的其他节点的模型参数进行整体融合,此时,该节点可以称为融合节点。
比如,在控制节点确定W个参数收集组之间满足组间融合条件时,每个参数收集组的全部节点可以将当前的模型参数发送给组内的一个节点,由该节点将全部节点的当前模型参数进行整体融合,得到第二模型参数,当然,也可以通过其它的方式进行整体融合,本发明实施例对此不作限定。
进一步,当W个参数收集组之间不满足组间融合条件时,则返回步骤202继续执行,否则执行步骤206。
步骤206:将W个参数收集组中每个参数收集组的第二模型参数进行融合,得到第三模型参数,以及将第三模型参数发送给W个参数收集组的节点或发送给W个参数收集组上一层参数分发组的节点。
其中,将W个参数收集组的第二模型参数进行融合,得到第三模型 参数可以根据步骤203执行主体的不同具体阐述。
当执行主体为独立于参数收集组之外的设备时,比如参数服务器,由参数服务器直接将W个参数收集组的第二模型参数进行融合,得到第三模型参数。
相应的,将第三模型参数发送给W个参数收集组的节点或发送给所述W个参数收集组上一层参数分发组的节点时,参数服务器可以通过广播方式直接发送给参数融合的W个参数收集组的节点或由最后进行融合的节点或发送给W个参数收集组上一层参数分发组的节点。
进一步,该参数服务器还可以包括多层,且上层的一个参数服务器对应下层的至少一个参数服务器,参数收集组、以及该参数收集组对应的参数分发组与最低层的参数服务器对应,由下层服务器将各参数收集组的融合次数、节点标识和当前的模型参数发送给上层参数服务器,由上层参数服务器确定是否满足组间融合,在满足组间融合之后由上层参数服务器进行融合,之后,将融合得到的模型参数发送给下层参数服务器,最后,由最底层的参数服务器发送给W个参数收集组的节点。
当执行主体为参数收集组内的一个节点时,比如控制节点,由参与融合的W个参数收集组中的节点从W个参数收集组中确定一个节点作为组间融合节点;W个参数收集组中除组间融合节点所在的参数收集组之外的其它参数收集组,分别选择一个节点将对应的参数收集组的第二模型参数发送给组间融合节点,使得组间融合节点将W个参数收集组的第二模型参数进行融合,得到第三模型参数。
需要说明的是,该组间融合节点可以是由W个参数收集组中节点共同推荐的一个节点,也可以是最先完成迭代的节点,或者是节点编号最小的节点,本发明实施例对此不作限定。
另外,当从W个参数收集组中除所述组间融合节点所在的参数收集组之外的其他参数收集组中分别选择一个节点时,可以选择参数收集组内负责整体融合的节点。
或者,
分别从W个参数收集组的每个参数收集组中确定一个节点,将所述确定的节点确定为新参数收集组;
当新参数收集组满足组内融合条件时,将满足组内融合条件的W个参数收集组的第二模型参数进行融合,得到第三模型参数。
需要说明的是,将满足组内融合条件的所述W个参数收集组的第二模型参数进行融合,是指将W个参数收集组中每个参数收集组的第二模型参数都进行融合。
另外,从W个参数收集组的每个参数收集组中确定一个节点时,可以选择每个参数收集组内负责整体融合的节点,也可以选择编号最小的节点等,本发明对此不作限定。
再者,新参数收集组进行模型参数融合的方法与上述满足条件的参数收集组进行组内融合的方法类似,本发明在此不再赘述。
比如,从W个参数收集组的每个参数收集组中分别选择出负责组内整体融合的节点,得到W个节点,将该W个节点确定为新参数收集组,当该新参数收集组中的节点满足组内融合条件时,将W个节点对应的W个第二模型参数按照组内融合方式进行融合,比如,当该组内融合条件为完成整体融合的节点个达到预设个数时,若该新参数收集组中完成整体融合的节点个数达到预设个数时,可以将完成整体融合的这部分节点进行融合,之后再与完成组内融合的其它节点进行融合等,当然,也可以将W个节点对应的W个第二模型参数一次性进行融合,本发明实施例对此不作限定。
相应的,将第三模型参数发送给W个参数收集组的节点时,组间融合节点不仅可以通过广播方式进行发送,还可以通过迭代的方式进行发送,即组间融合节点将第三模型参数分别发送给W个参数收集组包括的参数收集中的一个节点,由该节点将第三模型参数依次迭代的发送给参与组间融合的其它节点。
将第三模型参数发送给W个参数收集组的节点之后,参数服务器或者每个参数收集组可以再将第三模型参数发送给W个参数收集组中每个参数收集组对应的参数分发组中的节点,其中,发送方式也可以采用广播方式,或者迭代的方式。
相应的,将第三模型参数发送给W个参数收集组上一层参数分发组的节点,不仅可以通过广播方式进行发送,还可以通过迭代的方式进行发 送,即最后完成融合的节点,将第三模型参数分别发送给上一层参数分发组的第一节点,由该节点将第三模型参数依次迭代的发送给上一层参数分发组内的其它节点,第一节点指负责接收上一层模型参数的节点。
之后,将第三模型参数发送给上一层参数分发组中每个低层参数分发组中的节点,其中,发送方式也可以采用广播方式,或者迭代的方式。
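A minimal end-to-end sketch of steps 205 and 206 follows, again assuming that every fusion is an average weighted by node count: each of the W parameter collection groups first fuses the current parameters of all of its nodes into a second model parameter, one inter-group fusion node then fuses the W second model parameters into the third model parameter, and the result is handed back to the participating groups. All function names are illustrative assumptions.

```python
import numpy as np
from typing import Dict, List

def whole_group_fusion(node_params: List[np.ndarray]) -> np.ndarray:
    """Step 205: fuse the current model parameters of all nodes of one collection group."""
    return np.mean(node_params, axis=0)

def inter_group_fusion(group_params: Dict[int, List[np.ndarray]]) -> np.ndarray:
    """Step 206: fuse the W second model parameters into the third model parameter.
    The second parameters are weighted by group size here so the result equals a
    global average over all participating nodes (an illustrative choice only)."""
    second = {gid: whole_group_fusion(p) for gid, p in group_params.items()}
    sizes = {gid: len(p) for gid, p in group_params.items()}
    total = sum(sizes.values())
    return sum(second[gid] * sizes[gid] for gid in group_params) / total

# Example: W = 2 collection groups with 3 nodes each.
group_params = {
    0: [np.array([1.0, 1.0]), np.array([2.0, 2.0]), np.array([3.0, 3.0])],
    1: [np.array([5.0, 5.0]), np.array([6.0, 6.0]), np.array([7.0, 7.0])],
}
third_model_parameter = inter_group_fusion(group_params)
assert np.allclose(third_model_parameter, np.array([4.0, 4.0]))
# The third model parameter is then sent back to the nodes of the W collection groups
# (or to upper-layer distribution groups), by broadcast or by iterative relay.
```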
步骤207:在满足预设条件时,将参数收集组和参数分发组中包括的节点进行重新分组。
其中,预设条件可以是经过一定的时间,或者完成一定次数的模型参数的融合,或者是完成一定次数的迭代计算等等,本发明实施例对此不作限定。
另外,当执行主体为独立于参数收集组之外的设备时,比如参数服务器,在满足预设条件时,由参数服务器直接将参数收集组和参数分发组中包括的节点进行重新分组;当执行主体为参数收集组内的一个节点时,比如控制节点,由控制节点将参数收集组和参数分发组中包括的节点进行重新分组。
可选的,将参数收集组和参数分发组中包括的节点进行重新分组,包括:基于预设的节点标识与节点编号之间的对应关系,以及参数收集组个数和参数分发组个数,用节点标识对应的节点编号除以所述参数收集组的个数,得到所述节点的收集组余数;
用所述节点标识对应的节点编号除以所述参数分发组的个数,得到所述节点的分发组余数;
将所述收集组余数相同的节点确定为同一参数收集组,以及将所述分发组余数相同的节点确定为同一参数分发组。
具体的,对参数收集组和参数分发组中包括的节点进行重新分组的方法可以根据下述实施例提供的节点分组方法进行重新分组,本发明实施例在此不再赘述。
当重新分组之后,可以返回步骤202基于数据子集和当前的模型参数继续进行迭代计算,直到输出最终的模型参数。
进一步,在步骤202-207的执行过程中,当有新增加的节点时,若执行主体为参数服务器,由参数服务器为新增加的节点分配最底层的参数服 务器的IP地址,由最底层参数服务器为新增加的节点发送模型参数,新增节点从存储服务器获取数据子集,新增加的节点基于接收的模型参数和数据子集进行迭代计算。
若执行主体为控制节点,由控制节点为新增加的节点分配一个之前参与迭代计算的其他节点的IP地址,由该节点为新增节点发送模型参数,新增节点从存储服务器获取数据子集,新增加的节点基于接收的模型参数和数据子集进行迭代计算。
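One way a newly added node could be brought into the computation is sketched below: it is assigned a contact point (the bottom-layer parameter server, or an existing node when a control node coordinates the group), receives the current model parameter from that contact, draws a data subset from the storage server, and then joins the normal iteration loop. The registry dictionary, the callbacks and the documentation-range IP address are assumptions for illustration.

```python
from typing import Callable, Dict

def admit_new_node(node_id: int,
                   contact_of: Callable[[int], str],
                   fetch_current_params: Callable[[str], dict],
                   fetch_data_subset: Callable[[int], list],
                   registry: Dict[int, dict]) -> dict:
    """Register a newly added node, hand it the latest model parameter and a data
    subset, and record it so that later regrouping takes it into account."""
    contact = contact_of(node_id)                  # bottom-layer parameter server IP or an existing node's IP
    params = fetch_current_params(contact)         # latest model parameter pushed by the contact
    subset = fetch_data_subset(node_id)            # drawn from the data storage server
    registry[node_id] = {"contact": contact, "params": params, "n_samples": len(subset)}
    return registry[node_id]

# Toy wiring for the sketch.
registry: Dict[int, dict] = {}
info = admit_new_node(
    node_id=6,
    contact_of=lambda nid: "192.0.2.10",                 # illustrative address only
    fetch_current_params=lambda addr: {"w": [0.3, 0.1]},
    fetch_data_subset=lambda nid: list(range(500)),
    registry=registry,
)
assert 6 in registry and info["n_samples"] == 500
```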
本发明实施例提供的一种模型参数融合方法,通过参数收集组进行组内融合得到第一模型参数,将第一模型参数发送给参数收集组对应的参数分发组,之后,将W个参数收集组中每个参数收集组的第一模型参数进行整体融合,得到第二模型参数,再将W个参数收集组进行组间融合,得到第三模型参数,并且在满足预设条件时进行节点的重新分组,解决了模型参数融合中对参数服务器性能要求高、数据传输量大和动态调整计算资源的问题。
实施例三
本发明实施例提供一种节点分组方法,应用于机器学习***,该机器学习***包含至少两个节点,该方法包括:
对所述机器学习***内的节点进行分组,使得所述机器学习***包括至少一个参数收集组和至少一个参数分发组,每个参数收集组对应至少一个参数分发组,至少一个所述参数收集组包含的节点与所述参数收集组对应的参数分发组包含的节点不相同。
其中,每个参数收集组对应至少一个参数分发组是指一个参数收集组可以对应一个参数分发组,或者对应多个参数分发组。
另外,至少一个参数收集组包含的节点与该参数收集组对应的参数分发组包含的节点不相同,即指至少有一个参数收集组包含的节点与所对应的参数分发组包含的节点不完全相同,可以是指有一个参数收集组中包括至少一个节点和该参数收集组对应的参数分发组中的节点不同,也可以是参数收集组中包括的所有节点和该参数收集组对应的参数分发组包括的所有节点不同。
可选的,不同参数收集组的节点个数相同或者不同;和/或,
不同参数分发组的节点个数相同或者不同;和/或,
一个参数收集组的节点个数与所述参数收集组对应的参数分发组的节点的节点个数相同或者不同。
可选的,所述机器学习***还可以包括参数服务器,一个参数收集组以及所述参数收集组对应的参数分发组对应同一个参数服务器,不同参数收集组以及所述参数收集组对应的参数分发组对应不同参数服务器。
可选的,参数服务器包括Y层,且第j+1层的一个参数服务器对应第j层的至少一个参数服务器,所述参数收集组、以及所述参数收集组对应的参数分发组与第1层参数服务器对应,其中,1≤j<j+1≤Y。
比如,如图3所示为Y等于2时的一种参数服务器的结构示意图,如图3所示,参数服务器1对应参数服务器2和参数服务器3,由节点1、节点2、节点3、节点4和节点5组成的参数收集组,以及与参数收集组对应的参数分发组与第1层的参数服务器1和2对应。
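The layered parameter-server correspondence can be represented as a simple tree: each server at layer j+1 corresponds to one or more servers at layer j, and every parameter collection group (together with its corresponding distribution group) is attached to a layer-1 server. The sketch below encodes one two-layer reading of the example around FIG. 3, with parameter server 1 above parameter servers 2 and 3; the dictionary layout and the attachment of groups to layer-1 servers are assumptions.

```python
from typing import Dict, List

# children[layer][server_id] -> servers of the layer below that this server corresponds to.
# Layer-1 servers correspond to parameter collection groups instead of lower servers.
children: Dict[int, Dict[str, List[str]]] = {
    2: {"ps1": ["ps2", "ps3"]},              # layer-2 server 1 over layer-1 servers 2 and 3
    1: {"ps2": ["collection_group_0"],       # each layer-1 server serves one collection group
        "ps3": ["collection_group_1"]},      # (and that group's corresponding distribution group)
}

def path_to_top(group: str, children: Dict[int, Dict[str, List[str]]], layers: int) -> List[str]:
    """Return the chain of parameter servers a group's fused parameter would climb,
    from layer 1 up to layer Y, when inter-group fusion happens at higher layers."""
    path, target = [], group
    for layer in range(1, layers + 1):
        for server, below in children[layer].items():
            if target in below:
                path.append(server)
                target = server
                break
    return path

assert path_to_top("collection_group_0", children, layers=2) == ["ps2", "ps1"]
```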
可选的,如图4所示,对所述机器学习***内的节点进行分组,具体包括以下步骤。
步骤301:建立节点标识与节点编号之间的对应关系。
其中,节点标识用于唯一标识该节点,比如,节点标识可以是节点的IP地址,节点的序列码等等,本发明对此不作限定。节点编号可以是随机分配给节点的序号,也可以是随机分配给节点的任一数值等,本发明同样对此不作限定。
比如,有6个节点参与模型参数融合的计算,且节点标识为节点的IP地址,每个节点的IP地址如下表1所示,建立如下表1所示的节点标识与节点编号之间的对应关系。
表1
节点标识         节点编号
192.168.1.1      2
192.168.1.2      0
192.168.1.3      3
192.168.1.4      1
192.168.1.5      5
192.168.1.6      4
步骤302:确定所述参数收集组的个数、以及所述参数分发组的个数。
比如,参数收集组的个数为2,参数分发组的个数为3。
步骤303:基于所述节点标识与节点编号之间的对应关系、所述参数收集组个数和所述参数分发组个数,确定参数收集组和参数分发组。
具体的,基于节点标识与节点编号之间的对应关系、参数收集组个数和参数分发组个数,确定参数收集组和参数分发组,可以包括:用节点标识对应的节点编号除以所述参数收集组的个数,得到节点的收集组余数;所述节点标识对应的节点编号除以所述参数分发组的个数,得到所述节点的分发组余数;将所述收集组余数相同的节点确定为同一参数收集组,以及将所述分发组余数相同的节点确定为同一参数分发组。
比如,将上述表1所示的每个节点除以参数收集组的个数2,得到节点的收集组余数分别为:节点编号2、0、4的收集组余数为0,节点编号3、1、5的收集组余数为1;将上述表1所示的每个节点除以参数分发组的个数3,得到节点的收集组余数分别为:节点编号0、3的分发组余数为0,节点编号为1、4的分发组余数为1,节点编号为2、5的分发组余数为2;将收集组余数为0的节点确定为参数收集组0,收集组余数为1的节点确定为参数收集组1,同理,得到参数分发组0、参数分发组1和参数分发组2。
步骤304:确定所述参数收集组与所述参数分发组的对应关系。
在确定参数收集组和参数分发组之后,可以基于确定的参数收集组和参数分发组确定二者的对应关系。比如,确定参数收集组0与参数分发组1、参数分发组2对应,参数收集组1与参数分发组0对应。
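Steps 301 to 304 can be written down directly. The sketch below reproduces the worked example above (6 nodes, 2 parameter collection groups, 3 parameter distribution groups) using the node numbers from Table 1; grouping by the remainder of the node number follows the text, while the node identifier strings and the convention used in the final comment for deriving the group correspondence are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List

def group_nodes(node_numbers: Dict[str, int], n_collect: int, n_distribute: int):
    """Step 303: node_number % n_collect selects the parameter collection group,
    node_number % n_distribute selects the parameter distribution group."""
    collect: Dict[int, List[str]] = defaultdict(list)
    distribute: Dict[int, List[str]] = defaultdict(list)
    for node_id, number in node_numbers.items():
        collect[number % n_collect].append(node_id)
        distribute[number % n_distribute].append(node_id)
    return dict(collect), dict(distribute)

# Step 301: correspondence between node identifiers and node numbers (cf. Table 1).
node_numbers = {"192.168.1.1": 2, "192.168.1.2": 0, "192.168.1.3": 3,
                "192.168.1.4": 1, "192.168.1.5": 5, "192.168.1.6": 4}
# Step 302: 2 parameter collection groups and 3 parameter distribution groups.
collect, distribute = group_nodes(node_numbers, n_collect=2, n_distribute=3)

# Collection group 0 holds node numbers {2, 0, 4}; collection group 1 holds {3, 1, 5}.
assert {node_numbers[n] for n in collect[0]} == {0, 2, 4}
assert {node_numbers[n] for n in collect[1]} == {1, 3, 5}
# Distribution groups 0, 1, 2 hold node numbers {0, 3}, {1, 4}, {2, 5} respectively.
assert {node_numbers[n] for n in distribute[1]} == {1, 4}
# Step 304: a correspondence is then fixed on top of these groups, e.g.
# collection group 0 <-> distribution groups 1 and 2, collection group 1 <-> distribution group 0.
```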
需要说明的是,通过本发明实施例提供的节点分组方法在每一次进行节点分组时,每个节点的节点编号是可以变化的,参数收集组和参数分发组的个数也可以变化,同时参数收集组与参数分发组的对应关系也可以相应的发生变化。
本发明实施例提供一种节点分组方法,通过对机器学习***内的节点进行分组,使得所述机器学习***包括至少一个参数收集组和至少一个参数分发组,每个参数收集组对应至少一个参数分发组,至少一个所述参数收集组包含的节点与所述参数收集组对应的参数分发组包含的节点不相 同,从而解决了模型参数融合中对参数服务器性能要求高和动态调整计算资源的问题。
实施例四
图5为本发明实施例提供一种模型参数融合装置,该装置应用于机器学习***,所述机器学习***包括至少一个参数收集组和至少一个参数分发组,每个参数收集组对应至少一个参数分发组,每个参数收集组中包含至少一个节点,每个参数分发组中包含至少一个节点,至少一个所述参数收集组包含的节点与所对应的参数分发组包含的节点不相同,该装置包括:
第一融合单元401,用于在任一参数收集组满足组内融合条件时,融合所述满足条件的参数收集组中M个节点的模型参数,得到所述满足条件的参数收集组的第一模型参数,其中,所述满足条件的参数收集组最低融合节点数s≤所述M≤所述满足条件的参数收集组包含节点的总个数;
第一发送单元402,用于向所述满足条件的参数收集组对应的参数分发组中的N个节点发送所述满足条件的参数收集组的所述第一模型参数,其中,1≤所述N≤所述满足条件的参数收集组对应的参数分发组包含节点的总个数。
其中,组内融合条件可以是该参数收集组中完成当前模型参数迭代计算的节点个数达到预设数值,即最低融合节点数s。
具体地,当任一参数收集组完成当前模型参数迭代计算的节点个数达到最低融合节点数s时,第一融合单元从该参数收集组中选取已完成当前模型参数计算的M个节点,并将该M个节点计算得到的模型参数进行融合,得到第一模型参数。之后,当满足组内分发条件时,第一发送单元基于参数收集组与参数分发组的对应关系,将第一模型参数发送给对应的参数分发组中的全部节点,或者部分节点。
需要说明的是,最低融合节点数s、M和N可以事先设置,且s≤M≤参数收集组包含节点的总个数,1≤N≤参数收集组对应的参数分发组包含节点的总个数。
另外,机器学习***包括的参数收集组的个数,每个参数收集组包括的节点个数、以及与每参数收集组对应的参数分发组的个数、每个参数分 发组包括的节点个数可以事先确定。
再者,至少一个参数收集组包含的节点与所对应的参数分发组包含的节点不相同,即指至少有一个参数收集组包含的节点与所对应的参数分发组包含的节点不完全相同,可以是参数收集组中包括至少一个节点和该参数收集组对应的参数分发组中的节点不同,也可以是该参数收集组中包括的所有节点和该参数收集组对应的参数分发组包括的所有节点不同。
进一步,将第一模型参数发送给参数收集组对应的参数分发组时,也可以将参与第一模型参数融合的地址信息发送给参数分发组中的节点,该地址信息可以是节点的IP地址或者节点编号等等,本发明对此不作限定。
可选的,第一融合单元401包括:
接收模块,用于接收所述满足条件的参数收集组中完成迭代的M个节点发送的M个节点的模型参数;
融合模块,用于根据接收的M个节点的模型参数进行融合,得到该M个节点参数收集组的第一模型参数。
其中,融合模块可以通过多种不同的融合方式将该M个节点对应的模型参数进行融合,得到第一模型参数,比如,融合模块一次性地将该M个节点对应的模型参数进行融合,得到第一模型参数;或者每个节点在完成迭代后将模型参数发送给融合模块,融合模块接收来自节点的参数并融合,经过多次接收、融合的过程,直到该M个节点都完成融合,得到第一模型参数等等,本发明实施例对此不作限定。
可选的,第一融合单元401包括:
获取模块,用于获取所述满足条件的参数收集组中节点的状态信息;其中,该节点状态信息可以包括节点标识和完成迭代的节点顺序。
指示模块,用于根据所述满足条件的参数收集组中节点的状态信息,指示所述满足条件的参数收集组中完成迭代的M个节点进行模型参数融合,得到所述满足条件的参数收集组的所述第一模型参数。
其中,指示模块可以指示完成迭代的M个节点通过不同的组合方式进行融合,比如,可以指示该M个节点将对应的模型参数发送给其中的一个节点,由该节点进行一次融合,得到第一模型参数,或者指示模块通过下述可选的具体方式指示进行融合,以提高该M个节点进行融合,得 到第一模型参数的效率,当然,指示模块也可以通过其他的组合方式指示进行融合,本发明实施例对此不作限定。
可选的,指示模块具体用于:
根据所述满足条件的参数收集组中节点的状态信息,确定所述满足条件的参数收集组中s个完成迭代的节点;
指示完成迭代的s个节点中的一个节点融合所述s个节点的模型参数;此时,该节点可以称为融合节点。
也即是,在确定参数收集组中完成迭代的s个节点之后,指示模块将该s个节点中的一个节点作为融合节点,其余的节点分别将当前迭代得到的模型参数发送给该融合节点,由该融合节点将该s个节点对应的模型参数进行融合。
需要说明的是,该融合节点可以是最后一个完成迭代的节点,也可以是节点编号最小的节点,本发明实施例对此不作限定。
其中,当融合节点将该s个节点对应的模型参数进行融合的过程中,若有新增节点完成迭代,可以根据新增节点的个数与所述s大小关系可以分为两种情况:
第一种情况、新增x个节点,在所述x<所述s时,若在所述完成迭代的s个节点进行模型参数融合过程中,新增x个节点完成迭代,则指示所述新增x个节点中的一个节点融合所述新增x个节点的模型参数以及所述s个节点融合后的模型参数;
第二种情况、新增y个节点,在所述y≥所述s时,若在所述完成迭代的s个节点进行模型参数融合过程中,新增y个节点完成迭代,则指示所述新增y个节点中的一个节点融合所述y个节点的模型参数,并将所述y个节点融合后的模型参数与所述s个节点融合后的模型参数再次进行融合。
需要说明的是,在上述两种情况之后,若该M个节点中存在剩余节点没有参与融合,指示模块可以指示该剩余节点可以继续通过上述两种情况提供的方法进行模型参数的融合,以提高该M节点的模型参数进行融合的效率,当然,也可以通过其他方式进行融合,本发明实施例对此不作限定。
另外,所述新增节点中的一个节点可以是新增节点中节点编号最小的节点,也可以是最晚完成迭代的节点,本发明实施例对此不作限定。
可选的,如图6所示,该装置还包括:
第二融合单元403,用于在W个参数收集组之间满足组间融合条件时,分别将所述W个参数收集组的每个参数收集组中节点的模型参数进行整体融合,获得所述W个参数收集组中每个参数收集组的第二模型参数;
其中,W个参数收集组由W个参数收集组的上一层参数收集组确定,W≤上一层参数收集组包含组数的总个数。
其中,组间融合条件可以为参数收集组的组内融合次数达到预设次数。相应的,当W个参数收集组的组内融合次数达到预设次数时,W个参数收集组中的每个参数收集组可以将当前的模型参数发送给第二融合单元,由第二融合单元将该参数收集组内全部节点当前的模型参数进行整体融合,得到第二模型参数,从而获得W个参数收集组中每个参数收集组的第二模型参数。
第三融合单元404,用于将所述W个参数收集组中每个参数收集组的第二模型参数进行融合,得到第三模型参数;
第二发送单元405,用于将所述第三模型参数发送给所述W个参数收集组的节点或发送给所述W个参数收集组上一层参数分发组的节点。
其中,第二发送单元405不仅可以通过广播方式进行发送,还可以通过迭代的方式进行发送,即第二发送单元将第三模型参数分别发送给W个参数收集组包括的每个参数收集中的一个节点,由该节点将第三模型参数依次迭代的发送给组内的其它节点。
之后,第二发送单元还可以将第三模型参数发送给W个参数收集组中每个参数收集组对应的参数分发组中的节点,其中,发送方式可以采用广播方式,或者迭代的方式。
可选的,第三融合单元404具体用于:
从所述W个参数收集组中确定一个节点作为组间融合节点;
在所述W个参数收集组中除所述组间融合节点所在的参数收集组之外的其他参数收集组中分别选择一个节点将对应的参数收集组的第二模 型参数发送给所述组间融合节点,使得所述组间融合节点将所述W个参数收集组的第二模型参数进行融合,得到所述第三模型参数。
其中,该组间融合节点可以是由W个参数收集组中节点共同推荐的一个节点,也可以是最先完成迭代的节点,或者是节点编号最小的节点,本发明实施例对此不作限定。
另外,当从W个参数收集组中除所述组间融合节点所在的参数收集组之外的其他参数收集组中分别选择一个节点时,可以选择参数收集组内负责整体融合的节点,当然,在实际应用中,第三融合单元也可以选择其他参数收集组中的其他节点,本发明实施例对此不作限定。
或者,
分别从W个参数收集组的每个参数收集组中确定一个节点,将所述确定的节点确定为新参数收集组;
当新参数收集组满足组内融合条件时,将满足组内融合条件的W个参数收集组的第二模型参数进行融合,得到第三模型参数。
需要说明的是,将满足组内融合条件的所述W个参数收集组的第二模型参数进行融合,是指将W个参数收集组中每个参数收集组的第二模型参数都进行融合。
另外,从W个参数收集组的每个参数收集组中确定一个节点时,可以选择每个参数收集组内负责整体融合的节点,也可以选择编号最小的节点等,本发明对此不作限定。
再者,新参数收集组进行模型参数融合的方法与上述满足条件的参数收集组进行组内融合的方法类似,本发明在此不再赘述。
比如,从W个参数收集组的每个参数收集组中分别选择出负责组内整体融合的节点,得到W个节点,将该W个节点确定为新参数收集组,当该新参数收集组中的节点满足组内融合条件时,将W个节点对应的W个第二模型参数按照组内融合方式进行融合,比如,当该组内融合条件为完成整体融合的节点个达到预设个数时,若该新参数收集组中完成整体融合的节点个数达到预设个数时,可以将完成整体融合的这部分节点进行融合,之后再与完成组内融合的其它节点进行融合等,当然,也可以将W个节点对应的W个第二模型参数一次性进行融合,本发明实施例对此不 作限定。
可选的,第一发送单元402具体用于:
通过广播方式向所述满足条件的参数收集组对应的参数分发组中的节点发送所述满足条件的参数收集组的第一模型参数;或者,
向所述满足条件的参数收集组对应的参数分发组中第一节点发送所述满足条件的参数收集组的所述第一模型参数，使得所述第一节点通过迭代方式依次向所述N个节点中除所述第一节点之外的其余节点发送所述满足条件的参数收集组的所述第一模型参数。
在将第三模型参数发送给W个参数收集组的节点之后,可以再将第三模型参数发送给W个参数收集组中每个参数收集组对应的参数分发组中的节点或由最后进行融合的节点发送给W个参数收集上一层的参数分发组的节点。
可选的,如图7所示,该装置还包括:
第一分组单元406,用于在满足预设条件时,将所述参数收集组和所述参数分发组中包括的节点进行重新分组。
其中,预设条件可以是经过一定的时间,或者完成一定次数的模型参数的融合,或者是完成一定次数的迭代等等,本发明实施例对此不作限定。
另外,对参数收集组和参数分发组中包括的节点进行重新分组的步骤可以由本发明第四方面提供的节点分组装置进行重新分组,本发明在此不做赘述。
可选的,当该机器学习***还包括参数服务器,一个参数收集组以及该参数收集组对应的参数分发组对应同一个参数服务器,不同参数收集组以及对应的参数分发组对应不同参数服务器。
进一步,该参数服务器包括Y层,且第j+1层的一个参数服务器对应第j层的至少一个参数服务器,参数收集组、以及参数收集组所对应的参数分发组与第1层参数服务器对应,其中,1≤j<j+1≤Y。
可选的,第一分组单元,用于基于预设的节点标识与节点编号之间的对应关系,以及参数收集组个数和参数分发组个数,用节点标识对应的节点编号除以所述参数收集组的个数,得到所述节点的收集组余数;
用所述节点标识对应的节点编号除以所述参数分发组的个数,得到所 述节点的分发组余数;
将所述收集组余数相同的节点确定为同一参数收集组,以及将所述分发组余数相同的节点确定为同一参数分发组。
具体的,对参数收集组和参数分发组中包括的节点进行重新分组还可以通过下述实施例五提供的节点分组装置进行重新分组,本发明实施例在此不再赘述。
本发明实施例提供的一种模型参数融合装置,通过参数收集组进行组内融合得到第一模型参数,将第一模型参数发送给参数收集组对应的参数分发组,之后,将W个参数收集组中每个参数收集组的第一模型参数进行整体融合,得到第二模型参数,再将W个参数收集组进行组间融合,得到第三模型参数,并且在满足预设条件时进行节点的重新分组,解决了模型参数融合中对参数服务器性能要求高、数据传输量大和动态调整计算资源的问题。
实施例五
本发明实施例提供一种节点分组装置,应用于机器学习***,机器学习***包含至少两个节点,该装置包括:
第二分组单元,用于对所述机器学习***内的节点进行分组,使得所述机器学习***包括至少一个参数收集组和至少一个参数分发组,每个参数收集组对应至少一个参数分发组,至少一个所述参数收集组包含的节点与所述参数收集组对应的参数分发组包含的节点不相同。
其中,每个参数收集组对应至少一个参数分发组是指一个参数收集组可以对应一个参数分发组,或者对应多个参数分发组。
另外,至少一个参数收集组包含的节点与该参数收集组对应的参数分发组包含的节点不相同,即指至少有一个参数收集组包含的节点与所对应的参数分发组包含的节点不完全相同,可以是指有一个参数收集组中包括至少一个节点和该参数收集组对应的参数分发组中的节点不同,也可以是指该参数收集组中包括的所有节点和该参数收集组对应的参数分发组包括的所有节点不同。
可选的,不同参数收集组的节点个数相同或者不同;和/或,
不同参数分发组的节点个数相同或者不同;和/或,
一个参数收集组的节点个数与所述参数收集组对应的参数分发组的节点的节点个数相同或者不同。
可选的,机器学习***还包括参数服务器,一个参数收集组以及该参数收集组对应的参数分发组对应同一个参数服务器,不同参数收集组以及对应的参数分发组对应不同参数服务器。
可选的,参数服务器包括Y层,且第j+1层的一个参数服务器对应第j层的至少一个参数服务器,所述参数收集组、以及所述参数收集组对应的参数分发组与第1层参数服务器对应,其中,1≤j<j+1≤Y。
进一步,当参数服务器包括Y层时,可以确定每层参数服务器的个数,以及下层参数服务器与上层参数服务器之间的对应关系。
其中,下层参数服务器与上层参数服务器之间的对应关系可以事先设置,也可以在节点分组过程中进行确定,比如,可以通过下述确定参数收集组或者参数分发组的方法来确定下层参数服务器与上层参数服务器之间的对应关系,具体方法可以参考下述确定参数收集组或者参数分发组的方法,本发明实施例在此不再赘述。
可选的,第二分组单元具体包括:
第一确定模块,用于确定节点标识与节点编号之间的对应关系;
第二确定模块,用于确定所述参数收集组的个数、以及所述参数分发组的个数;
第三确定模块,用于基于所述节点标识与节点编号之间的对应关系、所述参数收集组个数和所述参数分发组个数,确定参数收集组和参数分发组;
第四确定模块,用于确定所述参数收集组与所述参数分发组的对应关系。
其中,节点标识用于唯一标识该节点,比如,节点标识可以是节点的IP地址,节点的序列码等等,本发明对此不作限定。节点编号可以是随机分配给节点的序号,也可以是随机分配给节点的任一数值等,本发明同样对此不作限定。
当满足预设条件,通过本发明实施例提供的节点分组方法进行重新分组时,每个节点的节点编号是可以变化的,参数收集组和参数分发组的个 数也可以变化,同时参数收集组与参数分发组的对应关系也可以相应的发生变化。
可选的,第三确定模块具体用于:
用所述节点标识对应的节点编号除以所述参数收集组的个数,得到所述节点的收集组余数;
用所述节点标识对应的节点编号除以所述参数分发组的个数,得到所述节点的分发组余数;
将所述收集组余数相同的节点确定为同一参数收集组,以及将所述分发组余数相同的节点确定为同一参数分发组。
本发明实施例提供一种节点分组装置,通过对机器学习***内的节点进行分组,使得所述机器学习***包括至少一个参数收集组和至少一个参数分发组,每个参数收集组对应至少一个参数分发组,至少一个所述参数收集组包含的节点与所述参数收集组对应的参数分发组包含的节点不相同,从而解决了模型参数融合中对参数服务器性能要求高和动态调整计算资源的问题。
实施例六
图8为本发明实施例提供一种模型参数融合装置,所述模型参数融合装置包括存储器801、处理器802、电源组件803、输入\输出接口804和通信组件805等,所述处理器802用于执行上述实施例二所述的模型参数融合方法。
本领域普通技术人员可以理解,图8所示的结构仅为示意,其并不对模型参数融合装置的结构造成限定。例如,该模型参数融合装置还可包括比图8中所示更多或者更少的组件,或者具有与图8所示不同的配置。
下面对模型参数融合装置的各个构成部件进行具体的介绍:
存储器801可用于存储数据、软件程序以及模块;主要包括存储程序区和存储数据区,其中,存储程序区可存储操作***、至少一个功能所需的应用程序等;存储数据区可存储根据模型参数融合装置的使用所创建的数据等。此外,存储器可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存 储器件。
处理器802是模型参数融合装置的控制中心，利用各种接口和线路连接整个模型参数融合装置的各个部分，通过运行或执行存储在存储器801内的软件程序和/或模块，以及调用存储在存储器801内的数据，执行模型参数融合装置的各种功能和处理数据，从而对模型参数融合装置进行整体监控。可选的，处理器802可包括一个或多个处理单元；优选的，处理器802可集成应用处理器和调制解调处理器，其中，应用处理器主要处理操作系统、用户界面和应用程序等，调制解调处理器主要处理无线通信。可以理解的是，上述调制解调处理器也可以不集成到处理器802中。
电源组件803用于为模型参数融合装置的各个组件提供电源，电源组件803可以包括电源管理系统，一个或多个电源，及其他与模型参数融合装置生成、管理和分配电力相关联的组件。
输入\输出接口804为处理器802和外围接口模块之间提供接口，比如，外围接口模块可以是键盘、鼠标等。
通信组件805被配置为便于模型参数融合装置和其他设备之间有线或无线方式的通信。该模型参数融合装置可以接入基于通信标准的无线网络,如WiFi,2G或3G,或它们的组合等。
尽管未示出,该模型参数融合装置还可以包括音频组件和多媒体组件等,本发明实施例在此不再赘述。
可选的,所述模型参数融合装置为参数服务器,所述参数服务器独立于所述节点设置,或者配置在所述节点上。
本发明实施例提供的一种模型参数融合装置,通过参数收集组进行组内融合得到第一模型参数,将第一模型参数发送给参数收集组对应的参数分发组,之后,将W个参数收集组中每个参数收集组的第一模型参数进行整体融合,得到第二模型参数,再将W个参数收集组进行组间融合,得到第三模型参数,并且在满足预设条件时进行节点的重新分组,解决了模型参数融合中对参数服务器性能要求高、数据传输量大的问题。
实施例七
图9为本发明实施例提供一种控制器，所述控制器包括存储器901、处理器902、电源组件903、输入\输出接口904和通信组件905等，所述处理器902用于执行上述实施例三所述的节点分组方法。
本领域普通技术人员可以理解,图9所示的结构仅为示意,其并不对控制器的结构造成限定。例如,该控制器还可包括比图9中所示更多或者更少的组件,或者具有与图9所示不同的配置。
下面对控制器的各个构成部件进行具体的介绍:
存储器901可用于存储数据、软件程序以及模块;主要包括存储程序区和存储数据区,其中,存储程序区可存储操作***、至少一个功能所需的应用程序等;存储数据区可存储根据模型参数融合装置的使用所创建的数据等。此外,存储器可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件。
处理器902是控制器的控制中心，利用各种接口和线路连接整个控制器的各个部分，通过运行或执行存储在存储器901内的软件程序和/或模块，以及调用存储在存储器901内的数据，执行控制器的各种功能和处理数据，从而对控制器进行整体监控。可选的，处理器902可包括一个或多个处理单元；优选的，处理器902可集成应用处理器和调制解调处理器，其中，应用处理器主要处理操作系统、用户界面和应用程序等，调制解调处理器主要处理无线通信。可以理解的是，上述调制解调处理器也可以不集成到处理器902中。
电源组件903用于为控制器的各个组件提供电源，电源组件903可以包括电源管理系统，一个或多个电源，及其他与控制器生成、管理和分配电力相关联的组件。
输入\输出接口904为处理器902和外围接口模块之间提供接口，比如，外围接口模块可以是键盘、鼠标等。
通信组件905被配置为便于控制器和其他设备之间有线或无线方式的通信。该控制器可以接入基于通信标准的无线网络,如WiFi,2G或3G,或它们的组合等。
尽管未示出,该控制器还可以包括音频组件和多媒体组件等,本发明 实施例在此不再赘述。
本发明实施例提供的一种控制器,通过对机器学习***内的节点进行分组,使得所述机器学习***包括至少一个参数收集组和至少一个参数分发组,每个参数收集组对应至少一个参数分发组,至少一个所述参数收集组包含的节点与所述参数收集组对应的参数分发组包含的节点不相同,从而解决了模型参数融合中对参数服务器性能要求高和动态调整计算资源的问题。
实施例八
本发明实施例提供一种机器学习***,所述机器学习***包括实施例六所述的模型参数融合装置,以及实施例七所述的控制器。
本发明实施例提供的一种机器学习***,模型参数融合装置通过参数收集组进行组内融合得到第一模型参数,将第一模型参数发送给参数收集组对应的参数分发组,之后,将W个参数收集组中每个参数收集组的第一模型参数进行整体融合,得到第二模型参数,再将W个参数收集组进行组间融合,得到第三模型参数,并且在满足预设条件时,通过控制器对参数收集组和参数分发组中的节点进行重新分组,解决了模型参数融合中对参数服务器性能要求高、数据传输量大和动态调整计算资源的问题。
最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。

Claims (24)

  1. 一种模型参数融合方法,其特征在于,所述方法应用于机器学习***,所述机器学习***包括至少一个参数收集组和至少一个参数分发组,每个参数收集组对应至少一个参数分发组,每个参数收集组中包含至少一个节点,每个参数分发组中包含至少一个节点,至少有一个所述参数收集组包含的节点与所对应的参数分发组包含的节点不相同,所述方法包括:
    在任一参数收集组满足组内融合条件时,融合所述满足条件的参数收集组中M个节点的模型参数,得到所述满足条件的参数收集组的第一模型参数,其中,所述满足条件的参数收集组最低融合节点数s≤所述M≤所述满足条件的参数收集组包含节点的总个数;
    向所述满足条件的参数收集组对应的参数分发组中的N个节点发送所述满足条件的参数收集组的所述第一模型参数,其中,1≤所述N≤所述满足条件的参数收集组对应的参数分发组包含节点的总个数。
  2. 根据权利要求1所述方法,其特征在于,
    所述融合所述满足条件的参数收集组中M个节点的模型参数,得到所述满足条件的参数收集组融合后的第一模型参数,包括:
    接收所述满足条件的参数收集组中完成迭代的M个节点发送的所述M个节点的模型参数;
    根据接收的所述M个节点的模型参数进行融合,得到所述满足条件的参数收集组的第一模型参数。
  3. 根据权利要求1所述的方法,其特征在于,
    所述融合所述满足条件的参数收集组中M个节点的模型参数,得到所述满足条件的参数收集组融合后的第一模型参数,包括:
    获取所述满足条件的参数收集组中节点的状态信息;
    根据所述满足条件的参数收集组中节点的状态信息,指示所述满足条件的参数收集组中完成迭代的M个节点进行模型参数融合,得到所述满足条件的参数收集组的所述第一模型参数。
  4. 根据权利要求3所述的方法,其特征在于,所述根据所述满足条件的参数收集组中节点的状态信息,指示所述满足条件的参数收集组中完成迭代的M个节点进行模型参数融合,包括:
    根据所述满足条件的参数收集组中节点的状态信息,确定所述满足条件的参数收集组中s个完成迭代的节点;
    指示完成迭代的s个节点中的一个节点融合所述s个节点的模型参数;
    若在所述完成迭代的s个节点进行模型参数融合过程中,新增x个节点完成迭代,则指示所述新增x个节点中的一个节点融合所述新增x个节点的模型参数以及所述s个节点融合后的模型参数,其中,所述x<所述s;
    若在所述完成迭代的s个节点进行模型参数融合过程中,新增y个节点完成迭代,则指示所述新增y个节点中的一个节点融合所述y个节点的模型参数,并将所述y个节点融合后的模型参数与所述s个节点融合后的模型参数再次进行融合,其中,所述y≥所述s。
  5. 根据权利要求1-4任一项所述的方法,其特征在于,所述方法还包括:
    在W个参数收集组之间满足组间融合条件时,分别将所述W个参数收集组的每个参数收集组中节点的模型参数进行融合,获得所述W个参数收集组中每个参数收集组的第二模型参数;
    将所述W个参数收集组中每个参数收集组的第二模型参数进行融合,得到第三模型参数;
    将所述第三模型参数发送给所述W个参数收集组的节点或发送给所述W个参数收集组上一层参数分发组的节点。
  6. 根据权利要求5所述的方法,其特征在于,所述将所述W个参数收集组中每个参数收集组的第二模型参数进行融合,得到第三模型参数,包括:
    从所述W个参数收集组中确定一个节点作为组间融合节点;
    在所述W个参数收集组中除所述组间融合节点所在的参数收集组之外的其他参数收集组中分别选择一个节点将对应的参数收集组的第二模型参数发送给所述组间融合节点,使得所述组间融合节点将所述W个参数收集组的第二模型参数进行融合,得到所述第三模型参数;
    或者,
    分别从W个参数收集组中的每个参数收集组确定一个节点,将所述确定的节点确定为新参数收集组;
    当所述新参数收集组满足组内融合条件时,将满足组内融合条件的所述W个参数收集组的第二模型参数进行融合,得到第三模型参数。
  7. 根据权利要求1所述的方法,其特征在于,所述向所述满足条件的参数收集组对应的参数分发组中的N个节点发送所述满足条件的参数收集组的所述第一模型参数,包括:
    通过广播方式向所述满足条件的参数收集组对应的参数分发组中的节点发送所述满足条件的参数收集组的第一模型参数;或者,
    向所述满足条件的参数收集组对应的参数分发组中第一节点发送所述满足条件的参数收集组的所述第一模型参数,使得所述第一节点通过迭代方式依次向所述N个节点中除所述第一节点之外的其余节点发送所述满足条件的参数收集组的所述第一模型参数。
  8. 根据权利要求1或2所述的方法,其特征在于,所述机器学习***还包括参数服务器,一个参数收集组以及对应的参数分发组对应同一个参数服务器,不同参数收集组以及对应的参数分发组对应不同参数服务器。
  9. 根据权利要求8所述的方法,其特征在于,所述参数服务器包括Y层,且第j+1层的一个参数服务器对应第j层的至少一个参数服务器,所述参数收集组、以及所述参数收集组对应的参数分发组与第1层参数服务器对应,其中,1≤j<j+1≤Y。
  10. 根据权利要求1-9任一项所述的方法,其特征在于,所述方法还包括:
    在满足预设条件时,将所述参数收集组和所述参数分发组中包括的节点进行重新分组。
  11. 根据权利要求10所述的方法,其特征在于,所述将所述参数收集组和所述参数分发组中包括的节点进行重新分组,包括:
    基于预设的节点标识与节点编号之间的对应关系,以及参数收集组个数和参数分发组个数,用节点标识对应的节点编号除以所述参数收集组的个数,得到所述节点的收集组余数;
    用所述节点标识对应的节点编号除以所述参数分发组的个数,得到所述节点的分发组余数;
    将所述收集组余数相同的节点确定为同一参数收集组,以及将所述分 发组余数相同的节点确定为同一参数分发组。
  12. 一种模型参数融合装置,其特征在于,所述装置应用于机器学习***,所述机器学习***包括至少一个参数收集组和至少一个参数分发组,每个参数收集组对应至少一个参数分发组,每个参数收集组中包含至少一个节点,每个参数分发组中包含至少一个节点,至少一个所述参数收集组包含的节点与所对应的参数分发组包含的节点不相同,所述装置包括:
    第一融合单元,用于在任一参数收集组满足组内融合条件时,融合所述满足条件的参数收集组中M个节点的模型参数,得到所述满足条件的参数收集组的第一模型参数,其中,所述满足条件的参数收集组最低融合节点数s≤所述M≤所述满足条件的参数收集组包含节点的总个数;
    第一发送单元,用于向所述满足条件的参数收集组对应的参数分发组中的N个节点发送所述满足条件的参数收集组的所述第一模型参数,其中,1≤所述N≤所述满足条件的参数收集组对应的参数分发组包含节点的总个数。
  13. 根据权利要求12所述装置,其特征在于,所述第一融合单元包括:
    接收模块,用于接收所述满足条件的参数收集组中完成迭代的M个节点发送的所述M个节点的模型参数;
    融合模块,用于根据接收的所述M个节点发送的模型参数进行融合,得到所述满足条件的参数收集组的第一模型参数。
  14. 根据权利要求12所述的装置,其特征在于,所述第一融合单元包括:
    获取模块,用于获取所述满足条件的参数收集组中节点的状态信息;
    指示模块,用于根据所述满足条件的参数收集组中节点的状态信息,指示所述满足条件的参数收集组中完成迭代的M个节点进行模型参数融合,得到所述参数收集组的第一模型参数。
  15. 根据权利要求14所述的装置,其特征在于,所述指示模块具体用于:
    根据所述参数收集组中节点的状态信息,确定所述参数收集组中s个完成迭代的节点;
    指示完成迭代的s个节点中的一个节点融合所述s个节点的模型参数;
    若在所述完成迭代的s个节点进行模型参数融合过程中,新增x个节点完成迭代,则指示所述新增x个节点中的一个节点融合所述新增x个节点的模型参数以及所述s个节点融合后的模型参数,其中,所述x<所述s;
    若在所述完成迭代的s个节点进行模型参数融合过程中,新增y个节点完成迭代,则指示所述新增y个节点中的一个节点融合所述y个节点的模型参数,并将所述y个节点融合后的模型参数与所述s个节点融合后的模型参数再次进行融合,其中,所述y≥所述s。
  16. 根据权利要求12-15任一项所述的装置,其特征在于,所述装置还包括:
    第二融合单元,用于在W个参数收集组之间满足组间融合条件时,分别将所述W个参数收集组的每个参数收集组中节点的模型参数进行整体融合,获得所述W个参数收集组中每个参数收集组的第二模型参数;
    第三融合单元,用于将所述W个参数收集组中每个参数收集组的第二模型参数进行融合,得到第三模型参数;
    第二发送单元,用于将所述第三模型参数发送给所述W个参数收集组的节点或发送给所述W个参数收集组上一层参数分发组的节点。
  17. 根据权利要求16所述的装置,其特征在于,所述第三融合单元具体用于:
    从所述W个参数收集组中确定一个节点作为组间融合节点;
    在所述W个参数收集组中除所述组间融合节点所在的参数收集组之外的其他参数收集组中分别选择一个节点将对应的参数收集组的第二模型参数发送给所述组间融合节点,使得所述组间融合节点将所述W个参数收集组的第二模型参数进行融合,得到所述第三模型参数;
    或者,
    分别从所述W个参数收集组包括的每个参数收集组确定一个节点,将所述确定的节点确定为新参数收集组;
    当所述新参数收集组满足组内融合条件时,将满足组内融合条件的所述W个参数收集组的第二模型参数进行融合,得到第三模型参数。
  18. 根据权利要求12所述的装置,其特征在于,所述第一发送单元具体用于:
    通过广播方式向所述满足条件的参数收集组对应的参数分发组中的节点发送所述参数收集组的第一模型参数;或者,
    向所述满足条件的参数收集组对应的参数分发组中第一节点发送所述满足条件的参数收集组的所述第一模型参数,使得所述第一节点通过迭代方式依次向所述N个节点中除所述第一节点之外的其余节点发送所述满足条件的参数收集组的第一模型参数。
  19. 根据权利要求12或13所述的装置,其特征在于,所述机器学习***还包括参数服务器,一个参数收集组以及对应的参数分发组对应同一个参数服务器,不同参数收集组以及对应的参数分发组对应不同参数服务器。
  20. 根据权利要求19所述的装置,其特征在于,所述参数服务器包括Y层,且第j+1层的一个参数服务器对应第j层的至少一个参数服务器,所述参数收集组、以及所述参数收集组对应的参数分发组与第1层参数服务器对应,其中,1≤j<j+1≤Y。
  21. 根据权利要求12-20任一项所述的装置,其特征在于,所述装置还包括:
    第一分组单元,用于在满足预设条件时,将所述参数收集组和所述参数分发组中包括的节点进行重新分组。
  22. 根据权利要求21所述的装置,其特征在于,所述第一分组单元具体用于:
    基于预设的节点标识与节点编号之间的对应关系、以及参数收集组个数和参数分发组个数,用所述节点标识对应的节点编号除以所述参数收集组的个数,得到所述节点的收集组余数;
    用所述节点标识对应的节点编号除以所述参数分发组的个数,得到所述节点的分发组余数;
    将所述收集组余数相同的节点确定为同一参数收集组,以及将所述分发组余数相同的节点确定为同一参数分发组。
  23. 一种模型参数融合装置,其特征在于,所述模型参数融合装置包括处理器和存储器,所述存储器中存储代码和数据,所述处理器可运行存储器中的代码,所述处理器用于执行上述权利要求1-11任一项 所述的模型参数融合方法。
  24. 根据权利要求23所述的装置,其特征在于,所述模型参数融合装置为参数服务器,所述参数服务器独立于所述节点设置,或者配置在所述节点上。
PCT/CN2015/094722 2015-11-16 2015-11-16 模型参数融合方法及装置 WO2017084016A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
KR1020187017016A KR102118073B1 (ko) 2015-11-16 2015-11-16 모델 파라미터 조합 방법 및 장치
EP15908513.3A EP3370159A4 (en) 2015-11-16 2015-11-16 Model parameter fusion method and apparatus
PCT/CN2015/094722 WO2017084016A1 (zh) 2015-11-16 2015-11-16 模型参数融合方法及装置
CN201580001411.5A CN107209746B (zh) 2015-11-16 2015-11-16 模型参数融合方法及装置
US15/980,866 US11386350B2 (en) 2015-11-16 2018-05-16 Model parameter combination method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/094722 WO2017084016A1 (zh) 2015-11-16 2015-11-16 模型参数融合方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/980,866 Continuation US11386350B2 (en) 2015-11-16 2018-05-16 Model parameter combination method and apparatus

Publications (1)

Publication Number Publication Date
WO2017084016A1 true WO2017084016A1 (zh) 2017-05-26

Family

ID=58717192

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/094722 WO2017084016A1 (zh) 2015-11-16 2015-11-16 模型参数融合方法及装置

Country Status (5)

Country Link
US (1) US11386350B2 (zh)
EP (1) EP3370159A4 (zh)
KR (1) KR102118073B1 (zh)
CN (1) CN107209746B (zh)
WO (1) WO2017084016A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447274A (zh) * 2017-08-30 2019-03-08 第四范式(北京)技术有限公司 用于执行机器学习的分布式***及其方法
US20210271975A1 (en) * 2019-04-10 2021-09-02 Tencent Technology (Shenzhen) Company Limited User tag generation method and apparatus, storage medium, and computer device
US11373116B2 (en) * 2015-11-16 2022-06-28 Huawei Technologies Co., Ltd. Model parameter fusion method and apparatus

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10521332B1 (en) * 2018-09-28 2019-12-31 Dspace Digital Signal Processing And Control Engineering Gmbh Parametrization of a simulation model
US11829888B2 (en) 2019-03-27 2023-11-28 International Business Machines Corporation Modifying artificial intelligence models using model fragments
CN110705177B (zh) * 2019-09-29 2023-05-16 支付宝(杭州)信息技术有限公司 基于机器学习的终端风险评估模型的生成方法及其***
KR102295948B1 (ko) * 2019-11-26 2021-08-30 한전케이디엔주식회사 연합 학습을 통한 인공지능 기반 보안관제 시스템 및 방법
CN111191792B (zh) * 2019-12-11 2022-07-15 深圳平安医疗健康科技服务有限公司 数据分发方法、装置和计算机设备
CN111178443B (zh) * 2019-12-31 2023-10-31 东软集团股份有限公司 模型参数选择、图像分类、信息识别方法及装置、设备
WO2021097494A2 (en) * 2020-05-30 2021-05-20 Futurewei Technologies, Inc. Distributed training of multi-modal machine learning models
CN114221871A (zh) * 2021-04-09 2022-03-22 无锡江南计算技术研究所 一种网格化流水的全收集方法

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011129819A1 (en) * 2010-04-13 2011-10-20 Empire Technology Development Llc Combined-model data compression
CN104463324A (zh) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 一种基于大规模高性能集群的卷积神经网络并行处理方法
CN104699894A (zh) * 2015-01-26 2015-06-10 江南大学 基于实时学习的高斯过程回归多模型融合建模方法

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012093899A (ja) * 2010-10-26 2012-05-17 Hitachi Ltd 計算機システム、シミュレーション方法、及びプログラム
US9633315B2 (en) 2012-04-27 2017-04-25 Excalibur Ip, Llc Method and system for distributed machine learning
CN104463424A (zh) 2014-11-11 2015-03-25 上海交通大学 众包中任务最优分配方法及其***
CN104834709B (zh) * 2015-04-29 2018-07-31 南京理工大学 一种基于负载均衡的并行余弦模式挖掘方法
EP3745284A1 (en) * 2015-11-16 2020-12-02 Huawei Technologies Co., Ltd. Model parameter fusion method and apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011129819A1 (en) * 2010-04-13 2011-10-20 Empire Technology Development Llc Combined-model data compression
CN104463324A (zh) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 一种基于大规模高性能集群的卷积神经网络并行处理方法
CN104699894A (zh) * 2015-01-26 2015-06-10 江南大学 基于实时学习的高斯过程回归多模型融合建模方法

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SEBASTIAN RIEDEL ET AL., MODEL COMBINATION FOR EVENT EXTRACTION IN BIONLP 2011, 31 December 2011 (2011-12-31), pages 51 - 55, XP055382437 *
See also references of EP3370159A4 *
WANG, YANG. ET AL.: "Multiple Rank Aggregation Based on Directly Optimizing Performace Measure", CHINESE JOURNAL OF COMPUTER, vol. 37, no. 8, 31 August 2014 (2014-08-31), pages 1658 - 1668, XP009506019 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11373116B2 (en) * 2015-11-16 2022-06-28 Huawei Technologies Co., Ltd. Model parameter fusion method and apparatus
CN109447274A (zh) * 2017-08-30 2019-03-08 第四范式(北京)技术有限公司 用于执行机器学习的分布式***及其方法
US20210271975A1 (en) * 2019-04-10 2021-09-02 Tencent Technology (Shenzhen) Company Limited User tag generation method and apparatus, storage medium, and computer device

Also Published As

Publication number Publication date
CN107209746B (zh) 2019-10-22
KR102118073B1 (ko) 2020-06-02
EP3370159A1 (en) 2018-09-05
CN107209746A (zh) 2017-09-26
EP3370159A4 (en) 2018-12-26
KR20180082577A (ko) 2018-07-18
US11386350B2 (en) 2022-07-12
US20180260739A1 (en) 2018-09-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15908513

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2015908513

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 20187017016

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 1020187017016

Country of ref document: KR