CN112990422A - Parameter server, client and weight parameter processing method and system - Google Patents

Parameter server, client and weight parameter processing method and system

Info

Publication number
CN112990422A
Authority
CN
China
Prior art keywords
parameter
weight parameter
weight
client
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911273613.1A
Other languages
Chinese (zh)
Inventor
Inventor not announced (non-publication of the inventor's name was requested)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN201911273613.1A
Publication of CN112990422A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/565 Conversion or adaptation of application format or content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present application relates to a parameter server, a client, and a weight parameter processing method and system. The parameter server comprises a processor and a communication interface. The communication interface is configured to receive the gradient parameters sent by each client. The processor is configured to update a locally stored first weight parameter of a neural network model to be trained according to each gradient parameter to obtain a second weight parameter, and to perform quantization processing and data format conversion processing on the second weight parameter according to a preset quantization algorithm and a preset data format conversion algorithm to obtain a third weight parameter. The communication interface is further configured to send the third weight parameter to each client, so that each client updates the weight parameter of its local neural network model to be trained to the third weight parameter. With the present application, the training time of the neural network can be shortened.

Description

Parameter server, client and weight parameter processing method and system
Technical Field
The present application relates to the field of computer technologies, and in particular, to a parameter server, a client, and a weight parameter processing method and system.
Background
Neural network models trained on large data sets have achieved remarkable results in many fields, such as speech recognition, image recognition, and natural language processing. At present, the conventional way to train a neural network model is single-machine training, that is, a single training machine trains the neural network model to be trained. Because a large amount of sample data is needed to obtain a neural network model that meets the requirements, single-machine training takes a long time. For example, completing one round of training on the ImageNet benchmark data set (a large visual database for visual object recognition research) can take up to a week on a training machine equipped with a GPU (Graphics Processing Unit). Therefore, a neural network training scheme that can reduce the training time is needed.
Disclosure of Invention
In view of the above, it is desirable to provide a parameter server, a client, and a weight parameter processing method and system that can reduce the training time of a neural network.
In a first aspect, a parameter server is provided, the parameter server comprising a processor and a communication interface;
the communication interface is used for receiving the gradient parameters sent by each client;
the processor is used for updating the first weight parameter of the neural network model to be trained, which is stored locally, according to each gradient parameter to obtain a second weight parameter;
the processor is further configured to perform quantization processing and data format conversion processing on the second weight parameter according to a preset quantization algorithm and a preset data format conversion algorithm to obtain a third weight parameter;
the communication interface is further configured to send the third weight parameter to each client, so that each client updates the weight parameter of the local neural network model to be trained to the third weight parameter.
As an optional implementation manner, the length of the data type corresponding to the second weight parameter is greater than the length of the data type corresponding to the third weight parameter.
As an optional implementation manner, the data type corresponding to the second weight parameter is a single-precision data type or a double-precision data type, and the data type corresponding to the third weight parameter is an integer data type.
As an optional implementation manner, the communication interface is further configured to send the third weight parameter to each client in a broadcast manner.
In a second aspect, there is provided a client comprising a processor and a communication interface;
the communication interface is used for receiving a third weight parameter sent by the parameter server;
the processor is configured to update the weight parameter of the neural network model to be trained to the third weight parameter to obtain an updated neural network model, input sample data to the updated neural network model to obtain an output result, and determine a gradient parameter according to a preset gradient algorithm and the output result;
the communication interface is further configured to send the gradient parameter to the parameter server, so that the parameter server updates the second weight parameter stored in the parameter server according to the gradient parameter sent by each client.
In a third aspect, a system for processing weight parameters is provided, the system comprising a parameter server according to the first aspect or any implementation thereof and a plurality of clients according to the second aspect.
In a fourth aspect, a method for processing weight parameters is provided, where the method is applied to a parameter server, and the method includes:
receiving gradient parameters sent by each client;
updating a first weight parameter of a locally stored neural network model to be trained according to each gradient parameter to obtain a second weight parameter;
according to a preset quantization algorithm and a data format conversion algorithm, performing quantization processing and data format conversion processing on the second weight parameter to obtain a third weight parameter;
and sending the third weight parameter to each client so that each client updates the weight parameter of the local neural network model to be trained into the third weight parameter.
In a fifth aspect, a method for processing weight parameters is provided, where the method is applied to a client, and the method includes:
receiving a third weight parameter sent by the parameter server;
updating the weight parameter of the neural network model to be trained into the third weight parameter to obtain an updated neural network model, and inputting sample data into the updated neural network model to obtain an output result;
and determining gradient parameters according to a preset gradient algorithm and the output result, and sending the gradient parameters to the parameter server so that the parameter server updates the second weight parameters stored in the parameter server according to the gradient parameters sent by each client.
In a sixth aspect, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method according to the fourth aspect.
In a seventh aspect, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method according to the fifth aspect.
The embodiments of the present application provide a parameter server, a client, and a weight parameter processing method and system. The parameter server includes a processor and a communication interface. The communication interface is configured to receive the gradient parameters sent by each client. The processor is configured to update a locally stored first weight parameter of a neural network model to be trained according to each gradient parameter to obtain a second weight parameter, and is further configured to perform quantization processing and data format conversion processing on the second weight parameter according to a preset quantization algorithm and a preset data format conversion algorithm to obtain a third weight parameter. The communication interface is further configured to send the third weight parameter to each client, so that each client updates the weight parameter of its local neural network model to be trained to the third weight parameter. In this way, training the neural network model to be trained on a plurality of clients simultaneously reduces the training time of the neural network model. In addition, because the parameter server performs the quantization processing and the data format conversion processing on the second weight parameter and sends the resulting third weight parameter to each client through the communication interface, each client does not need to perform these processing steps itself, which improves the processing performance of each client. Moreover, converting the second weight parameter, whose data type has a greater length, into the third weight parameter, whose data type has a smaller length, reduces the bandwidth occupied by the transmission.
Drawings
Fig. 1 is an architecture diagram of a system for processing weight parameters according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a method for processing a weight parameter according to an embodiment of the present application;
fig. 3 is a flowchart of a method for processing a weight parameter according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, not all embodiments of the present disclosure. All other embodiments, which can be derived by one skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the scope of protection of the present disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, description, and drawings of the present disclosure are used to distinguish between different objects and are not used to describe a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
An embodiment of the present application provides a weight parameter processing system. As shown in fig. 1, the weight parameter processing system includes a parameter server (PS) 110 and a plurality of clients (workers) 120. The parameter server 110 may store a first weight parameter of the neural network model to be trained. When the parameter server 110 receives the gradient parameters sent by each client 120, the parameter server 110 may update the first weight parameter according to each received gradient parameter to obtain a second weight parameter. Then, the parameter server 110 may perform quantization processing and data format conversion processing on the second weight parameter according to a preset quantization algorithm and a data format conversion algorithm to obtain a third weight parameter, and send the third weight parameter to each client 120. Each client 120 stores the same neural network model to be trained, but different sample data from the same sample data set. For each client 120, after receiving the third weight parameter, the client 120 may update the weight parameter of its neural network model to be trained to the third weight parameter. The client 120 may then input its sample data into the neural network model to be trained to obtain an output result, determine a gradient parameter according to the preset gradient algorithm and the output result, and send the gradient parameter to the parameter server 110. In this way, training the same neural network model to be trained on a plurality of clients 120 reduces the training time. The parameter server 110 includes a processor 111 and a communication interface 112; the client 120 includes a processor 121 and a communication interface 122. The processor 111 and the processor 121 may each be an IPU (Intelligent Processing Unit), a CPU (Central Processing Unit), or a GPU (Graphics Processing Unit). The functions of the processor 111 and the communication interface 112 in the parameter server 110 are described as follows:
a communication interface 112 for receiving the gradient parameters sent by each client 120.
In implementation, during the training of the neural network, each client 120 may input sample data into the neural network model to obtain an output result. Each client 120 may then determine a gradient parameter according to a preset gradient algorithm and the output result, and send the gradient parameter to the parameter server 110. Accordingly, the communication interface 112 in the parameter server 110 can receive the gradient parameter sent by each client 120 and pass each gradient parameter to the processor 111.
And the processor 111 is configured to update the locally stored first weight parameter of the neural network model to be trained according to each gradient parameter, so as to obtain a second weight parameter.
In an implementation, the parameter server 110 may store the weight parameters of the neural network model to be trained. After receiving the gradient parameters, the processor 111 may update the locally stored first weight parameter of the neural network model to be trained according to each gradient parameter to obtain a second weight parameter. The first weight parameter may be an initial weight parameter of the neural network model to be trained, or the weight parameter obtained when the processor 111 last updated the weight parameters of the neural network model to be trained. Optionally, after receiving the gradient parameters, the processor 111 may determine an average value of the gradient parameters and update the locally stored first weight parameter according to the average value to obtain the second weight parameter.
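For illustration, the following is a minimal sketch of this update, assuming the optional gradient averaging described above, a plain gradient-descent rule, and a hypothetical learning rate `lr`; the embodiments do not limit the specific update rule:

```python
import numpy as np

def update_weights(first_weight, client_gradients, lr=0.01):
    """Update the first weight parameter with the clients' gradient
    parameters to obtain the second weight parameter (a sketch)."""
    # Optional averaging of the gradient parameters received from the clients.
    avg_gradient = np.mean(client_gradients, axis=0)
    # One gradient-descent step; `lr` is an assumed hyperparameter.
    second_weight = first_weight - lr * avg_gradient
    return second_weight
```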
The processor 111 is further configured to perform quantization processing and data format conversion processing on the second weight parameter according to a preset quantization algorithm and a preset data format conversion algorithm, so as to obtain a third weight parameter.
In implementation, during each iteration of training, the parameter server 110 updates the first weight parameter according to the gradient parameters of a first data type to obtain the second weight parameter, whose data type is also the first data type. In order to increase the operation rate of each client 120, after obtaining the second weight parameter, the processor 111 may further perform quantization processing and data format conversion processing on the second weight parameter according to the preset quantization algorithm and data format conversion algorithm to obtain a third weight parameter of a second data type, and send the third weight parameter to the communication interface 112. The length of the data type corresponding to the second weight parameter is greater than the length of the data type corresponding to the third weight parameter, that is, the length of the first data type is greater than the length of the second data type. In this way, the parameter server 110 quantizes the second weight parameter with the longer data type into the third weight parameter with the shorter data type, thereby increasing the operation rate of each client 120.
In addition, since the parameter server 110 performs quantization processing and data format conversion processing on the second weight parameter to obtain a third weight parameter, and then sends the third weight parameter to each client 120 through the communication interface 112, each client 120 does not need to perform the above-mentioned quantization processing and data format conversion processing on the second weight parameter, thereby improving the processing performance of each client 120.
The communication interface 112 is further configured to send the third weight parameter to each client 120, so that each client 120 updates the weight parameter of the local neural network model to be trained to the third weight parameter.
In an implementation, the communication interface 112 sends the third weight parameter to each client 120, so that each client 120 updates the weight parameter of its neural network model to be trained to the third weight parameter. Each client 120 may then continue iterative training based on the updated neural network model. Optionally, the communication interface 112 is further configured to send the third weight parameter to each client 120 by broadcast.
Meanwhile, by performing the quantization processing and data format conversion processing on the second weight parameter, whose data type is longer, the processor 111 obtains the third weight parameter, whose data type is shorter. Converting the second weight parameter into the shorter third weight parameter therefore also reduces the bandwidth occupied by the transmission. For example, the processor 111 may perform quantization processing and data format conversion processing on a second weight parameter whose data type is 32 bits long according to the preset quantization algorithm and data format conversion algorithm, to obtain a third weight parameter whose data type is 8 bits long. Optionally, the data type corresponding to the second weight parameter is a single-precision (float) or double-precision (double) data type, and the data type corresponding to the third weight parameter is an integer (int) data type; the second weight parameter and the third weight parameter may also correspond to other data types, which is not limited in the embodiments of the present application.
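As an illustrative sketch only: the embodiments do not specify the preset quantization algorithm or data format conversion algorithm, but one common choice consistent with the float-to-int example above is symmetric linear quantization with a shared scale factor. The scale factor and the `quantize_to_int8`/`dequantize` helpers below are assumptions for illustration:

```python
import numpy as np

def quantize_to_int8(second_weight):
    """Quantize 32-bit float weights to 8-bit integers (a sketch)."""
    # Choose a scale so the largest-magnitude weight maps to the int8 range [-127, 127].
    scale = float(np.max(np.abs(second_weight))) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero weights quantize to zero for any scale
    # Round to the nearest integer step and convert the data format to int8.
    third_weight = np.clip(np.round(second_weight / scale), -127, 127).astype(np.int8)
    return third_weight, scale

def dequantize(third_weight, scale):
    """Recover approximate float32 weights from the int8 representation."""
    return third_weight.astype(np.float32) * scale
```

Note that under this scheme each client would also need the scale factor (or an agreed fixed-point format) to interpret the integer weights; the embodiments leave such details to the preset algorithms.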
It should be noted that those skilled in the art may provide a low-precision operation unit in the parameter server 110 to implement the functions of the processor 111.
The functions of the processor 121 and the communication interface 122 in the client 120 are described as follows:
and a communication interface 122, configured to receive the third weight parameter sent by the parameter server 110.
In an implementation, for each client 120, the communication interface 122 in the client 120 may receive the third weight parameter sent by the parameter server 110 and send the received third weight parameter to the processor 121.
The processor 121 is configured to update the weight parameter of the neural network model to be trained to a third weight parameter, to obtain an updated neural network model, input sample data to the updated neural network model, to obtain an output result, and determine a gradient parameter according to a preset gradient algorithm and the output result.
In implementation, after receiving the third weight parameter, the processor 121 may update the weight parameter of the neural network model to be trained to the third weight parameter, thereby obtaining an updated neural network model. Then, the processor 121 may input sample data into the updated neural network model to obtain an output result, and determine a gradient parameter according to a preset gradient algorithm and the output result. The sample data may be image sample data or voice sample data, which is not limited in the embodiments of the present application; the neural network model may be a convolutional neural network model or another type of neural network model, which is likewise not limited. The processor 121 may then send the determined gradient parameter to the communication interface 122. In this way, training the neural network model to be trained on the plurality of clients 120 simultaneously reduces the training time of the neural network model.
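The following is a minimal sketch of one client iteration, standing in for the neural network model with a single linear layer and for the preset gradient algorithm with a mean-squared-error gradient; both substitutions are assumptions for illustration:

```python
import numpy as np

def client_step(third_weight, scale, sample_x, sample_y):
    """One client iteration: adopt the broadcast weights, run a forward
    pass, and compute the gradient parameter to return to the server."""
    # Update the local weights to the (dequantized) third weight parameter.
    w = third_weight.astype(np.float32) * scale
    # Forward pass of the stand-in linear model to obtain the output result.
    output = sample_x @ w
    # Mean-squared-error gradient (up to a constant factor), standing in
    # for the preset gradient algorithm.
    gradient = sample_x.T @ (output - sample_y) / len(sample_x)
    return gradient
```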
The communication interface 122 is further configured to send the gradient parameter to the parameter server 110, so that the parameter server 110 updates the second weight parameter stored in the parameter server 110 according to the gradient parameter sent by each client 120.
In an implementation, after receiving the gradient parameters, the communication interface 122 may send the gradient parameters to the parameter server 110, so that the parameter server 110 updates the second weight parameters stored in the parameter server 110 according to the gradient parameters sent by each client 120.
The embodiment of the present application further provides a method for processing weight parameters, where the method is applied to the parameter server 110, and as shown in fig. 2, the processing procedure of the method is as follows:
step 201, receiving the gradient parameters sent by each client 120.
Step 202, updating the first weight parameter of the neural network model to be trained, which is stored locally, according to each gradient parameter, to obtain a second weight parameter.
And 203, performing quantization processing and data format conversion processing on the second weight parameter according to a preset quantization algorithm and a data format conversion algorithm to obtain a third weight parameter.
Step 204, sending the third weight parameter to each client 120, so that each client 120 updates the weight parameter of the local neural network model to be trained to the third weight parameter.
The processing procedures of steps 201 to 204 are the same as the corresponding processing procedures of the parameter server 110 described above and are not repeated here.
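Reusing the hypothetical helpers sketched above, one server-side synchronization round covering steps 201 to 204 could look as follows:

```python
def server_round(first_weight, client_gradients, lr=0.01):
    """One synchronization round of the parameter server (a sketch)."""
    # Step 201: `client_gradients` holds the gradient parameters received
    # from each client via the communication interface.
    # Step 202: update the locally stored first weight parameter.
    second_weight = update_weights(first_weight, client_gradients, lr)
    # Step 203: quantization processing and data format conversion processing.
    third_weight, scale = quantize_to_int8(second_weight)
    # Step 204: broadcast `third_weight` (and the assumed scale factor)
    # to each client.
    return second_weight, third_weight, scale
```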
The embodiment of the present application further provides a method for processing weight parameters, which is applied to the client 120, as shown in fig. 3, the processing procedure of the method is as follows:
step 301, receiving the third weight parameter sent by the parameter service 110.
Step 302, updating the weight parameter of the neural network model to be trained to a third weight parameter to obtain an updated neural network model, and inputting sample data to the updated neural network model to obtain an output result.
Step 303, determining a gradient parameter according to a preset gradient algorithm and an output result, and sending the gradient parameter to the parameter server 110, so that the parameter server 110 updates the second weight parameter stored in the parameter server 110 according to the gradient parameter sent by each client 120.
The processing procedures of steps 301 to 303 are the same as the corresponding processing procedures of the client 120 described above and are not repeated here.
In one embodiment, a computer readable storage medium has a computer program stored thereon, and the computer program is used for implementing the steps of the method for processing the weight parameter when being executed by a processor.
It is noted that, while for simplicity of explanation the foregoing method embodiments are described as a series of acts or combinations of acts, those skilled in the art will appreciate that the present disclosure is not limited by the order of the acts, as some steps may, in accordance with the present disclosure, be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules referred to are not necessarily required by the disclosure.
It is further noted that, although the steps in the flowcharts of fig. 2-3 are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2-3 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and these sub-steps or stages are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but any combination should be considered to be within the scope of this specification as long as it contains no contradiction.
The foregoing may be better understood in light of the following clauses:
clause a1, corresponding to right 1; clause a2, corresponding to right 2; clause a3, corresponding to right 3; clause a4, corresponding to right 4; clause a5, corresponding to right 5; clause a6, corresponding to right 6; clause a7, corresponding to claim 7; clause A8, corresponding to right 8; clause a9, corresponding to right 9; clause a10, corresponding to claim 10.
For example, clause A1, a parameter server, wherein the parameter server comprises a processor and a communication interface;
the communication interface is used for receiving the gradient parameters sent by each client;
the processor is used for updating the first weight parameter of the neural network model to be trained, which is stored locally, according to each gradient parameter to obtain a second weight parameter;
the processor is further configured to perform quantization processing and data format conversion processing on the second weight parameter according to a preset quantization algorithm and a preset data format conversion algorithm to obtain a third weight parameter;
the communication interface is further configured to send the third weight parameter to each client, so that each client updates the weight parameter of the local neural network model to be trained to the third weight parameter.
Clause A2, the parameter server according to clause A1, wherein the length of the data type corresponding to the second weight parameter is greater than the length of the data type corresponding to the third weight parameter.
Clause A3, the parameter server according to clause A2, wherein the data type corresponding to the second weight parameter is a single-precision data type or a double-precision data type, and the data type corresponding to the third weight parameter is an integer data type.
Clause A4, the parameter server according to clause A1, wherein the communication interface is further configured to send the third weight parameter to each client by broadcast.
Clause A5, a client, wherein the client comprises a processor and a communication interface;
the communication interface is used for receiving a third weight parameter sent by the parameter server;
the processor is configured to update the weight parameter of the neural network model to be trained to the third weight parameter to obtain an updated neural network model, input sample data to the updated neural network model to obtain an output result, and determine a gradient parameter according to a preset gradient algorithm and the output result;
the communication interface is further configured to send the gradient parameter to the parameter server, so that the parameter server updates the second weight parameter stored in the parameter server according to the gradient parameter sent by each client.
Clause A6, a system for processing weight parameters, the system comprising a parameter server according to any one of clauses A1 to A4 and a plurality of clients according to clause A5.
Clause A7, a method for processing weight parameters, wherein the method is applied to a parameter server, and the method comprises:
receiving gradient parameters sent by each client;
updating a first weight parameter of a locally stored neural network model to be trained according to each gradient parameter to obtain a second weight parameter;
according to a preset quantization algorithm and a data format conversion algorithm, performing quantization processing and data format conversion processing on the second weight parameter to obtain a third weight parameter;
and sending the third weight parameter to each client so that each client updates the weight parameter of the local neural network model to be trained into the third weight parameter.
Clause A8, a method for processing weight parameters, wherein the method is applied to a client, and the method comprises:
receiving a third weight parameter sent by the parameter server;
updating the weight parameter of the neural network model to be trained into the third weight parameter to obtain an updated neural network model, and inputting sample data into the updated neural network model to obtain an output result;
and determining gradient parameters according to a preset gradient algorithm and the output result, and sending the gradient parameters to the parameter server so that the parameter server updates the second weight parameters stored in the parameter server according to the gradient parameters sent by each client.
Clause A9, a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method of clause A7.
Clause A10, a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method of clause A8.
The foregoing detailed description of the embodiments of the present disclosure is intended to be illustrative only and is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Those skilled in the art may make changes or modifications to the specific embodiments and application scope according to the ideas of the present disclosure. In view of the above, the content of this description should not be construed as limiting the present disclosure.

Claims (10)

1. A parameter server, comprising a processor and a communication interface;
the communication interface is used for receiving the gradient parameters sent by each client;
the processor is used for updating the first weight parameter of the neural network model to be trained, which is stored locally, according to each gradient parameter to obtain a second weight parameter;
the processor is further configured to perform quantization processing and data format conversion processing on the second weight parameter according to a preset quantization algorithm and a preset data format conversion algorithm to obtain a third weight parameter;
the communication interface is further configured to send the third weight parameter to each client, so that each client updates the weight parameter of the local neural network model to be trained to the third weight parameter.
2. The parameter server of claim 1, wherein the length of the data type corresponding to the second weight parameter is greater than the length of the data type corresponding to the third weight parameter.
3. The parameter server of claim 2, wherein the data type corresponding to the second weight parameter is a single-precision data type or a double-precision data type, and the data type corresponding to the third weight parameter is an integer data type.
4. The parameter server of claim 1, wherein the communication interface is further configured to send the third weight parameter to each client via a broadcast.
5. A client, wherein the client comprises a processor and a communication interface;
the communication interface is used for receiving a third weight parameter sent by the parameter server;
the processor is configured to update the weight parameter of the neural network model to be trained to the third weight parameter to obtain an updated neural network model, input sample data to the updated neural network model to obtain an output result, and determine a gradient parameter according to a preset gradient algorithm and the output result;
the communication interface is further configured to send the gradient parameter to the parameter server, so that the parameter server updates the second weight parameter stored in the parameter server according to the gradient parameter sent by each client.
6. A system for processing weight parameters, the system comprising a parameter server according to any of claims 1 to 4 and a plurality of clients according to claim 5.
7. A method for processing weight parameters is applied to a parameter server, and comprises the following steps:
receiving gradient parameters sent by each client;
updating a first weight parameter of a locally stored neural network model to be trained according to each gradient parameter to obtain a second weight parameter;
according to a preset quantization algorithm and a data format conversion algorithm, performing quantization processing and data format conversion processing on the second weight parameter to obtain a third weight parameter;
and sending the third weight parameter to each client so that each client updates the weight parameter of the local neural network model to be trained into the third weight parameter.
8. A method for processing weight parameters, which is applied to a client, and comprises the following steps:
receiving a third weight parameter sent by the parameter server;
updating the weight parameter of the neural network model to be trained into the third weight parameter to obtain an updated neural network model, and inputting sample data into the updated neural network model to obtain an output result;
and determining gradient parameters according to a preset gradient algorithm and the output result, and sending the gradient parameters to the parameter server so that the parameter server updates the second weight parameters stored in the parameter server according to the gradient parameters sent by each client.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as claimed in claim 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method as claimed in claim 8.
CN201911273613.1A 2019-12-12 2019-12-12 Parameter server, client and weight parameter processing method and system Pending CN112990422A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911273613.1A CN112990422A (en) 2019-12-12 2019-12-12 Parameter server, client and weight parameter processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911273613.1A CN112990422A (en) 2019-12-12 2019-12-12 Parameter server, client and weight parameter processing method and system

Publications (1)

Publication Number Publication Date
CN112990422A 2021-06-18

Family

ID=76331570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911273613.1A Pending CN112990422A (en) 2019-12-12 2019-12-12 Parameter server, client and weight parameter processing method and system

Country Status (1)

Country Link
CN (1) CN112990422A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150324690A1 (en) * 2014-05-08 2015-11-12 Microsoft Corporation Deep Learning Training System
CN109032630A (en) * 2018-06-29 2018-12-18 电子科技大学 The update method of global parameter in a kind of parameter server
CN109685198A (en) * 2017-10-19 2019-04-26 三星电子株式会社 Method and apparatus for quantifying the parameter of neural network
CN109993301A (en) * 2017-12-29 2019-07-09 北京中科寒武纪科技有限公司 Neural metwork training device and Related product
CN109993277A (en) * 2017-12-29 2019-07-09 英特尔公司 Calculation optimization mechanism for deep neural network
EP3528179A1 (en) * 2018-02-15 2019-08-21 Koninklijke Philips N.V. Training a neural network
CN110533178A (en) * 2018-05-25 2019-12-03 杭州海康威视数字技术股份有限公司 A kind of neural network model training method, apparatus and system


Similar Documents

Publication Publication Date Title
CN108038546B (en) Method and apparatus for compressing neural networks
CN107301170B (en) Method and device for segmenting sentences based on artificial intelligence
CN111401550A (en) Neural network model quantification method and device and electronic equipment
CN109993298B (en) Method and apparatus for compressing neural networks
CN110830807B (en) Image compression method, device and storage medium
CN112527649A (en) Test case generation method and device
CN111694926A (en) Interactive processing method and device based on scene dynamic configuration and computer equipment
CN111325322A (en) Deep learning method, system, server and storage medium based on privacy protection
CN110633717A (en) Training method and device for target detection model
CN112101543A (en) Neural network model determination method and device, electronic equipment and readable storage medium
CN114418086B (en) Method and device for compressing neural network model
CN108108299B (en) User interface testing method and device
CN110570877B (en) Sign language video generation method, electronic device and computer readable storage medium
CN115759209B (en) Quantification method and device of neural network model, electronic equipment and medium
CN112990422A (en) Parameter server, client and weight parameter processing method and system
CN112990046B (en) Differential information acquisition method, related device and computer program product
CN115660991A (en) Model training method, image exposure correction method, device, equipment and medium
CN113590447B (en) Buried point processing method and device
CN112988366A (en) Parameter server, master client, and weight parameter processing method and system
CN116933189A (en) Data detection method and device
CN114048863A (en) Data processing method, data processing device, electronic equipment and storage medium
CN114171043A (en) Echo determination method, device, equipment and storage medium
CN113792804A (en) Training method of image recognition model, image recognition method, device and equipment
CN113157911A (en) Service verification method and device
CN115035911B (en) Noise generation model training method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination