CN111767989A - Neural network training method and device - Google Patents

Neural network training method and device

Info

Publication number
CN111767989A
CN111767989A (Application CN202010601427.2A)
Authority
CN
China
Prior art keywords
neural network
learning rate
attenuation
training
iteration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010601427.2A
Other languages
Chinese (zh)
Inventor
希滕
张刚
温圣召
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010601427.2A priority Critical patent/CN111767989A/en
Publication of CN111767989A publication Critical patent/CN111767989A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a neural network training method and device, relating to the technical fields of artificial intelligence and deep learning, and applicable to image processing. A specific implementation comprises the following steps: acquiring a neural network to be trained and starting its training, where the training is performed in a distributed manner across different subjects and comprises at least two rounds of iteration; in response to reaching any preset iteration round during training, performing one round of a learning rate decay process at at least two decay speeds across the different subjects, and iterating with the decayed learning rates; and in response to completing the at least two rounds of iteration, obtaining a trained neural network based on the neural network after those iterations. By adopting different decay speeds, asynchronous decay of the learning rate is achieved, which promotes convergence of the neural network during training. Moreover, the distributed training mode improves training speed.

Description

Neural network training method and device
Technical Field
Embodiments of the application relate to the field of computer technology, in particular to artificial intelligence and deep learning, and specifically to a neural network training method and device.
Background
The learning rate is an important hyperparameter of a neural network in deep learning and determines the network's convergence behavior. In general, the larger the learning rate, the faster the neural network converges. However, if the learning rate is too large, the gradient updates may oscillate back and forth around a minimum, and convergence may never be reached.
To overcome this problem, the related art gradually decreases the learning rate during training (learning rate decay), thereby facilitating convergence of the neural network.
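As background, a common related-art decay schedule is step decay, where the learning rate is multiplied by a fixed factor every few iterations. A minimal sketch (the function name and all constants are illustrative, not from the patent):

```python
def step_decay(base_lr, decay_rate, step_size, iteration):
    """Multiply base_lr by decay_rate once every step_size iterations."""
    return base_lr * decay_rate ** (iteration // step_size)

# Base learning rate 0.1, halved every 10 iterations:
schedule = [step_decay(0.1, 0.5, 10, i) for i in (0, 10, 20)]
# schedule -> [0.1, 0.05, 0.025]
```

The monotonically shrinking step size damps the oscillation described above while still allowing fast early progress.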
Disclosure of Invention
A training method and apparatus for a neural network, an electronic device, and a storage medium are provided.
According to a first aspect, there is provided a neural network training method, comprising: acquiring a neural network to be trained and starting its training, where the training is performed in a distributed manner across different subjects and comprises at least two rounds of iteration; in response to reaching any preset iteration round during training, performing one round of a learning rate decay process at at least two decay speeds across the different subjects, and iterating with the decayed learning rates; and in response to completing the at least two rounds of iteration, obtaining a trained neural network based on the neural network after those iterations.
According to a second aspect, there is provided a neural network training apparatus, comprising: a network acquisition unit configured to acquire a neural network to be trained and start its training, where the training is performed in a distributed manner across different subjects and comprises at least two rounds of iteration; an iteration unit configured to, in response to reaching any preset iteration round during training, perform one round of a learning rate decay process at at least two decay speeds across the different subjects and iterate with the decayed learning rates; and a determining unit configured to, in response to completing the at least two rounds of iteration, obtain a trained neural network based on the neural network after those iterations.
According to a third aspect, there is provided an electronic device comprising: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any embodiment of the neural network training method.
According to a fourth aspect, there is provided a computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements the method of any embodiment of the neural network training method.
According to the scheme of the application, different decay speeds can be adopted to achieve asynchronous decay of the learning rate, which promotes convergence of the neural network during training and increases the convergence speed. Moreover, the distributed training mode further improves the training speed. When the scheme is used for image processing, the faster convergence can also speed up image processing. Furthermore, the method helps reduce the latency of the executing device and the occupation and consumption of computing resources such as memory.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram to which some embodiments of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of training a neural network according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a training method of a neural network according to the present application;
FIG. 4 is a flow diagram of yet another embodiment of a method of training a neural network according to the present application;
FIG. 5 is a schematic diagram of an embodiment of a training apparatus for a neural network according to the present application;
fig. 6 is a block diagram of an electronic device for implementing a training method of a neural network according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the neural network training method or the neural network training apparatus of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as video applications, live applications, instant messaging tools, mailbox clients, social platform software, and the like, may be installed on the terminal devices 101, 102, and 103.
Here, the terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.
The server 105 may be a server providing various services, such as a background server providing support for the terminal devices 101, 102, 103. The background server may analyze and perform other processing on the received data such as the neural network to be trained, and feed back a processing result (e.g., the trained neural network) to the terminal device.
It should be noted that the training method of the neural network provided in the embodiment of the present application may be executed by the server 105 or the terminal devices 101, 102, and 103, and accordingly, the training apparatus of the neural network may be disposed in the server 105 or the terminal devices 101, 102, and 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method of training a neural network in accordance with the present application is shown. The training method of the neural network comprises the following steps:
step 201, obtaining a neural network to be trained, and starting training of the neural network, wherein the training of the neural network is executed in a distributed manner in different subjects, and includes at least two iterations.
In this embodiment, an execution subject on which the neural network training method runs (for example, the server or a terminal device shown in fig. 1) may acquire the neural network to be trained and initiate its training. The neural network may be any of various neural networks for detecting images, such as deep neural networks for image detection, including convolutional neural networks and residual neural networks. The training process may be performed in a distributed manner, where a subject refers to an entity, such as an electronic device, that can participate in the distributed training of the neural network. Any subject needs at least two rounds of iteration to complete training of the neural network.
Step 202, in response to reaching any preset iteration round during training, performing one round of a learning rate decay process at at least two decay speeds across the different subjects, and iterating with the decayed learning rates.
In this embodiment, the execution subject may, in response to reaching a preset iteration round among the at least two rounds of iteration, perform one round of a learning rate decay process at at least two decay speeds. Each subject may then iterate the neural network with each learning rate obtained during its round of learning rate decay. That is, any subject may perform one or more decays in each decay round, and accordingly obtain one or more learning rates.
Step 203, in response to the completion of at least two iterations of the neural network, obtaining a trained neural network based on the neural network after the at least two iterations.
In this embodiment, the executing entity may obtain the trained neural network based on the neural network after the at least two iterations when the at least two iterations of the neural network are completed. In practice, the execution subject may obtain the trained neural network based on at least two iterations in various ways. For example, the executing entity may directly use the neural network after at least two iterations as the trained neural network.
The method provided by this embodiment can adopt different decay speeds to realize asynchronous decay of the learning rate, thereby promoting convergence of the neural network during training. Moreover, the distributed training mode improves training speed.
In some optional implementations of this embodiment, the computation time of the neural network to be trained is less than a preset duration, and the iteration with the decayed learning rate in step 202 may include: performing back propagation with the decayed learning rate and then forward propagating the result of the back propagation, where the learning rate is decayed at least once in each decay round of each subject.
In these optional implementations, the neural network to be trained may be one whose operation time is less than a preset duration, that is, a neural network with a fast operation speed (for example, one whose operation speed exceeds a preset speed threshold). During a decay round, a subject may obtain one or more learning rates; that is, the number of learning rate decays in each round may differ between subjects. For example, in a certain decay round, subject No. 1 performs three decays at a first, second, and third time point, while subject No. 2 performs one decay at the second time point. In both subjects, the learning rate decays to the same value a by the end of the round.
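The asynchronous decay example above (three decays versus one decay, ending at the same value) can be sketched as follows; the decay factors and the ending value 0.0125 are hypothetical:

```python
def decayed_lrs(lr_start, factor, num_decays):
    """Apply num_decays multiplicative decays by factor, returning the
    learning rate after each decay event within one decay round."""
    lrs, lr = [], lr_start
    for _ in range(num_decays):
        lr *= factor
        lrs.append(lr)
    return lrs

# Subject No. 1 decays three times by 0.5; subject No. 2 decays once by 0.125.
subject1 = decayed_lrs(0.1, 0.5, 3)    # [0.05, 0.025, 0.0125]
subject2 = decayed_lrs(0.1, 0.125, 1)  # [0.0125]
# Both subjects end the round at the same learning rate a = 0.0125.
```

Each subject iterates with every intermediate learning rate in its own list, which is what makes the decay asynchronous across subjects.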
The execution subject may perform back propagation using these learning rates to obtain a back-propagation result, and may then forward propagate that result, thereby completing one round of iteration.
These implementations select a neural network with a fast operation speed for training, which to some extent ensures that the trained neural network also runs fast. Meanwhile, training with back propagation and forward propagation can improve the precision of the trained neural network, yielding a trained network that is both efficient and accurate.
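A single iteration with a decayed learning rate can be sketched as a plain gradient-descent update on a toy one-parameter least-squares model (all names and values are illustrative, not from the patent):

```python
def sgd_step(w, x, y, lr):
    """One iteration: forward pass, gradient via back propagation
    (computed analytically here for y_hat = w * x), then update."""
    y_hat = w * x                 # forward pass
    grad = 2.0 * (y_hat - y) * x  # gradient of (y_hat - y)^2 w.r.t. w
    return w - lr * grad          # parameter update with the decayed lr

w = 0.0
for _ in range(50):
    w = sgd_step(w, x=1.0, y=3.0, lr=0.1)
# w converges toward 3.0
```

With lr = 0.1 each step contracts the error by a factor 0.8, so fifty iterations bring w within a fraction of a percent of the target.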
In some optional implementations of this embodiment, the method may further include: for each iteration round, aggregating the gradients obtained at the decayed learning rates across the different subjects; and updating the parameters of the neural network to be trained based on the aggregated gradients.
In these optional implementations, the execution subject may determine, for each subject, the gradient associated with each decay in each decay round. The execution subject may then update the parameters of the neural network to be trained based on the aggregated gradients and the learning rates. Specifically, for each subject, the execution subject may determine and update the corresponding parameters of the neural network using the learning rate adopted during training and the aggregated gradients. The parameters that different subjects update via back propagation may be the same or may differ, thereby realizing distributed training. Determining the parameters of a neural network from a gradient and a learning rate is an established technique in the field and is not described again here.
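Aggregating gradients across subjects and applying one shared update can be sketched as follows; averaging is one common aggregation choice, and the patent does not fix the exact reduction:

```python
def aggregate_and_update(param, worker_grads, lr):
    """Average the gradients reported by each subject for this round,
    then update the shared parameter with the (decayed) learning rate."""
    mean_grad = sum(worker_grads) / len(worker_grads)
    return param - lr * mean_grad

p = 1.0
p = aggregate_and_update(p, worker_grads=[0.2, 0.4, 0.6], lr=0.5)
# mean gradient 0.4 -> p = 1.0 - 0.5 * 0.4 = 0.8
```

In a real distributed setup this reduction would typically be an all-reduce across devices, but the arithmetic per parameter is the same.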
These implementations allow different subjects to participate in distributed training and aggregate the gradients contributed by each subject, thereby achieving efficient training and parameter updating.
In some optional implementations of this embodiment, the different subjects include multiple processors in at least two electronic devices.
In these optional implementations, the neural network may be trained in a multi-machine, multi-card manner: at least two electronic devices are used for training, and at least one of them contains at least two processors participating in the training.
These implementations adopt multi-machine, multi-card training, thereby further improving training efficiency.
In some optional implementations of this embodiment, after the training of the neural network is started in step 201, the method may further include: in response to reaching any of the at least two iteration rounds other than the preset iteration rounds, iterating at the same learning rate across the different subjects.
In these implementations, the execution subject may perform asynchronous decay of the learning rate only in the preset iteration rounds, that is, iterate at different learning rates in different subjects there, while in all other iteration rounds the different subjects iterate at the same learning rate.
In these implementations, iteration can continue with the decayed learning rate in rounds other than the preset iteration rounds, so that the decayed learning rate is used across multiple iterations to improve the precision of the trained neural network.
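The gating on preset rounds can be sketched as follows; the set of decay rounds and the decay factor are hypothetical:

```python
PRESET_DECAY_ROUNDS = {30, 60, 90}  # hypothetical schedule

def lr_for_round(lr, round_idx, decay_factor=0.1):
    """Decay only when a preset iteration round is reached;
    all other rounds reuse the current learning rate unchanged."""
    return lr * decay_factor if round_idx in PRESET_DECAY_ROUNDS else lr

lr = 0.1
history = []
for r in range(1, 101):
    lr = lr_for_round(lr, r)
    history.append(lr)
# lr is 0.1 until round 30, 0.01 until round 60, 0.001 until round 90, ...
```

Between preset rounds every subject sees the same, unchanged learning rate, matching the behavior described above.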
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the neural network training method of this embodiment. In the application scenario of fig. 3, an execution subject 301 obtains a neural network 302 to be trained and initiates its training, where the training is performed in a distributed manner across different subjects and comprises at least two rounds of iteration. In response to reaching any preset iteration round during training, the execution subject 301 performs one round of a learning rate decay process at at least two decay speeds across the different subjects and iterates at the decayed learning rate 303. In response to completing the at least two rounds of iteration, the execution subject 301 obtains a trained neural network 304 based on the neural network after those iterations.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method of training a neural network is shown. The process 400 includes the following steps:
step 401, obtaining a neural network to be trained, and starting training of the neural network, wherein the training of the neural network is performed in a distributed manner in different subjects, and includes at least two iterations.
In this embodiment, an execution subject on which the neural network training method runs (for example, the server or a terminal device shown in fig. 1) may acquire the neural network to be trained and initiate its training. The neural network may be any of various neural networks for detecting images, such as deep neural networks for image detection, including convolutional neural networks and residual neural networks. The training process may be performed in a distributed manner, where a subject refers to an entity, such as an electronic device, that can participate in the distributed training of the neural network. The execution subject needs to perform at least two rounds of iteration to complete the training of the neural network.
Step 402, obtaining target learning rate data set for a decay process of the learning rate, wherein each round of decay process corresponds to at least one round of iteration.
In the present embodiment, the execution subject described above may acquire target learning rate data. The learning rate decay for each round may be performed in one round of iteration, or may be performed in two or more consecutive rounds of iteration.
Specifically, the target learning rate data may be the learning rate itself to be reached in a decay round, or data that can be converted into that learning rate, such as a ratio relative to a reference (baseline) learning rate.
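Converting target learning rate data into an absolute termination value can be sketched as follows; the ratio form is one of the two options just described, and all names are illustrative:

```python
def resolve_target_lr(target_data, base_lr=None):
    """Resolve target learning rate data to an absolute termination value.
    If base_lr is given, target_data is treated as a ratio relative to
    that baseline; otherwise it is already an absolute learning rate."""
    if base_lr is None:
        return target_data
    return target_data * base_lr

absolute = resolve_target_lr(0.001)               # data is the learning rate itself
from_ratio = resolve_target_lr(0.01, base_lr=0.1)  # data is a ratio of the baseline
# Both resolve to a termination value of 0.001.
```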
Step 403, in response to reaching any preset iteration round during training, performing a learning rate decay process across the different subjects with the target learning rate data of the decay round corresponding to that preset iteration round as the decay termination value, and iterating with the decayed learning rates.
In this embodiment, the execution subject may, in response to reaching a preset iteration round among the at least two rounds of iteration, perform one round of a learning rate decay process at at least two decay speeds, using the target learning rate data as the termination value of each decay round. Each subject may then iterate the neural network with each learning rate obtained during its round of learning rate decay. That is, any subject may perform one or more decays per decay round, obtaining one or more learning rates, and the last decay in the round yields the learning rate corresponding to the target learning rate data.
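One way to make the last decay land exactly on the termination value is to interpolate geometrically from the starting learning rate to the target over however many decay events a subject performs in the round. This is a sketch; the patent does not prescribe the interpolation:

```python
def round_schedule(lr_start, lr_target, num_decays):
    """Geometric steps from lr_start to lr_target; the final decay
    event yields the target (the round's termination value)."""
    ratio = (lr_target / lr_start) ** (1.0 / num_decays)
    return [lr_start * ratio ** k for k in range(1, num_decays + 1)]

fast = round_schedule(0.1, 0.001, 4)  # subject decaying four times this round
slow = round_schedule(0.1, 0.001, 1)  # subject decaying once this round
# fast[-1] and slow[-1] are both (approximately) 0.001.
```

Subjects thus decay at different speeds within the round yet all terminate at the same learning rate, as the embodiment requires.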
Step 404, in response to completing the at least two rounds of iteration of the neural network, obtaining a trained neural network based on the neural network after those iterations.
In this embodiment, the executing entity may obtain the trained neural network based on the neural network after the at least two iterations when the at least two iterations of the neural network are completed. In practice, the execution subject may obtain the trained neural network based on at least two iterations in various ways. For example, the executing entity may directly use the neural network after at least two iterations as the trained neural network.
This embodiment sets a target value for each decay round, so that while decaying asynchronously, every subject finally reaches the same learning rate in each decay round.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of a neural network training apparatus. This apparatus embodiment corresponds to the method embodiment shown in fig. 2 and, in addition to the features described below, may include features and effects the same as or corresponding to that method embodiment. The apparatus can be applied to various electronic devices.
As shown in fig. 5, the neural network training apparatus 500 of this embodiment includes: a network acquisition unit 501, an iteration unit 502, and a determining unit 503. The network acquisition unit 501 is configured to acquire a neural network to be trained and start its training, where the training is performed in a distributed manner across different subjects and comprises at least two rounds of iteration; the iteration unit 502 is configured to, in response to reaching any preset iteration round during training, perform one round of a learning rate decay process at at least two decay speeds across the different subjects and iterate with the decayed learning rates; the determining unit 503 is configured to, in response to completing the at least two rounds of iteration, obtain a trained neural network based on the neural network after those iterations.
In this embodiment, specific processes of the network obtaining unit 501, the iteration unit 502, and the determining unit 503 of the training apparatus 500 for neural network and technical effects thereof may refer to related descriptions of step 201, step 202, and step 203 in the corresponding embodiment of fig. 2, which are not described herein again.
In some optional implementations of this embodiment, the apparatus further includes: a data acquisition unit configured to acquire target learning rate data set for the learning rate decay process, where each decay round corresponds to at least one iteration round; and the iteration unit is further configured to perform a round of the learning rate decay process at at least two decay speeds as follows: performing the decay process with the target learning rate data of the decay round corresponding to the preset iteration round as the decay termination value.
In some optional implementations of this embodiment, the computation time of the neural network to be trained is less than a preset duration, and the iteration unit is further configured to iterate with the decayed learning rate as follows: performing back propagation with the decayed learning rate and then forward propagating the result of the back propagation, where the learning rate is decayed at least once in each decay round of each subject.
In some optional implementations of this embodiment, the apparatus further includes: an aggregation unit configured to aggregate, for each iteration round, the gradients obtained at the decayed learning rates across the different subjects; and an updating unit configured to update the parameters of the neural network to be trained based on the aggregated gradients.
In some optional implementations of this embodiment, the different subjects include multiple processors in at least two electronic devices.
In some optional implementations of this embodiment, the apparatus further includes: an execution unit configured to, after the training of the neural network is started, in response to reaching any iteration round other than the preset iteration rounds among the at least two rounds, iterate at the same learning rate across the different subjects.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used with multiple memories, as desired. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the neural network training method provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the training method of a neural network provided by the present application.
The memory 602, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the program instructions/modules corresponding to the neural network training method in the embodiment of the present application (for example, the network acquisition unit 501, the iteration unit 502, and the determination unit 503 shown in fig. 5). By running the non-transitory software programs, instructions, and modules stored in the memory 602, the processor 601 executes the various functional applications and data processing of the server, that is, implements the neural network training method of the above method embodiment.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to the use of the electronic device for neural network training, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 may optionally include memory located remotely from the processor 601, and such remote memory may be connected over a network to the electronic device for neural network training. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the training method of the neural network may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for neural network training; examples of such input devices include a touch screen, a keypad, a mouse, a trackpad, a touchpad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor comprising a network acquisition unit, an iteration unit, and a determination unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the network acquisition unit may also be described as a "unit that acquires a neural network to be trained and initiates training of the neural network".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire a neural network to be trained and initiate training of the neural network, wherein the training of the neural network is performed in a distributed manner on different subjects and comprises at least two iterations; in response to reaching any preset iteration round in training, perform a round of attenuation of the learning rate at at least two attenuation speeds on the different subjects, and perform iteration at the learning rate obtained by the attenuation; and in response to completion of the at least two iterations of the neural network, obtain the trained neural network based on the neural network after the at least two iterations.
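The multi-speed decay scheme summarized above can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the function name `decay_round`, the geometric decay rule, and the specific speeds and learning-rate values are assumptions introduced for illustration; the claims only require at least two attenuation speeds and a target learning rate used as the attenuation termination value.

```python
# Illustrative sketch (not the patented implementation): at a preset
# iteration round, each subject (worker) decays the shared learning rate
# toward a common target value, but at its own speed; the target acts as
# the attenuation termination value.

def decay_round(current_lr, target_lr, speed, steps):
    """Per-step learning rates for one subject during one decay round.

    `speed` < 1 is a hypothetical per-subject geometric decay factor;
    the decay is floored at target_lr (the termination value).
    """
    lrs = []
    lr = current_lr
    for _ in range(steps):
        lr = max(lr * speed, target_lr)  # never decay past the target
        lrs.append(lr)
    return lrs

# Two subjects decaying the same learning rate at different speeds:
fast = decay_round(0.1, 0.01, speed=0.5, steps=4)  # reaches the floor
slow = decay_round(0.1, 0.01, speed=0.8, steps=4)  # decays more gently
```

Once every subject's schedule terminates at the shared target value, all subjects can continue iterating at the same learning rate, which matches the behavior recited for the non-preset iteration rounds.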
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (14)

1. A method of training a neural network, the method comprising:
acquiring a neural network to be trained, and starting training of the neural network, wherein the training of the neural network is executed in a distributed manner on different subjects and comprises at least two iterations;
in response to reaching any preset iteration round in training, performing a round of attenuation of the learning rate at at least two attenuation speeds on the different subjects, and performing iteration at the learning rate obtained by the attenuation;
in response to completing the at least two iterations of the neural network, obtaining a trained neural network based on the neural network after the at least two iterations.
2. The method of claim 1, wherein the method further comprises:
acquiring target learning rate data that is set for the attenuation of the learning rate, wherein each round of the attenuation process corresponds to at least one round of iteration; and
the performing of a round of attenuation of the learning rate at at least two attenuation speeds comprises:
performing the attenuation of the learning rate by taking, as an attenuation termination value, the target learning rate data of the round of the attenuation process corresponding to the preset iteration round.
3. The method according to claim 1, wherein the operation time of the neural network to be trained is less than a preset time;
the iterating with the attenuated learning rate includes:
carrying out backward propagation on the result of forward propagation using the learning rate obtained by the attenuation, wherein the learning rate is attenuated at least once in each round of the attenuation process on each subject.
4. The method of claim 1, wherein the method further comprises:
for each iteration, aggregating the gradients computed on the different subjects with the attenuated learning rates;
updating parameters of the neural network to be trained based on the gradient.
5. The method of any of claims 1-4, wherein the different subjects include a plurality of processors in at least two electronic devices.
6. The method of one of claims 1-4, wherein after the initiating training of the neural network, the method further comprises:
in response to reaching an iteration round of the at least two rounds other than the preset iteration round, performing iteration on the different subjects at a same learning rate.
7. An apparatus for training a neural network, the apparatus comprising:
the network acquisition unit is configured to acquire a neural network to be trained and start the training of the neural network, wherein the training of the neural network is performed in a distributed manner on different subjects and comprises at least two iterations;
an iteration unit configured to, in response to reaching any preset iteration round in training, perform a round of attenuation of the learning rate at at least two attenuation speeds on the different subjects, and perform iteration at the learning rate obtained by the attenuation;
a determining unit configured to obtain a trained neural network based on the neural network after the at least two iterations in response to completing the at least two iterations of the neural network.
8. The apparatus of claim 7, wherein the apparatus further comprises:
a data acquisition unit configured to acquire target learning rate data that is set for the attenuation of the learning rate, wherein each round of the attenuation process corresponds to at least one round of iteration; and
the iteration unit is further configured to perform the round of attenuation of the learning rate at at least two attenuation speeds as follows:
performing the attenuation of the learning rate by taking, as an attenuation termination value, the target learning rate data of the round of the attenuation process corresponding to the preset iteration round.
9. The device of claim 7, wherein the operation time of the neural network to be trained is less than a preset time;
the iteration unit is further configured to perform the iteration with the attenuated learning rate as follows:
carrying out backward propagation on the result of forward propagation using the learning rate obtained by the attenuation, wherein the learning rate is attenuated at least once in each round of the attenuation process on each subject.
10. The apparatus of claim 7, wherein the apparatus further comprises:
a summarization unit configured to aggregate, for each iteration, the gradients computed on the different subjects with the attenuated learning rates;
an updating unit configured to update a parameter of the neural network to be trained based on the gradient.
11. The apparatus of one of claims 7-10, wherein the different subjects comprise a plurality of processors in at least two electronic devices.
12. The apparatus according to one of claims 7-10, wherein the apparatus further comprises:
an execution unit configured to, after the training of the neural network is initiated, perform iteration on the different subjects at a same learning rate in response to reaching an iteration round of the at least two rounds other than the preset iteration round.
13. An electronic device, comprising:
one or more processors;
a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method according to any one of claims 1-6.
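The per-iteration gradient aggregation recited in claims 4 and 10 can be sketched as follows. This is a minimal sketch under assumed semantics, plain data-parallel averaging followed by one SGD step; the function name `aggregate_and_update` and the flat list-of-floats parameter representation are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch of the claimed gradient aggregation (assumed semantics:
# average the gradients reported by each subject, then apply one SGD
# update to the shared parameters of the network under training).

def aggregate_and_update(params, worker_grads, lr):
    """Average per-subject gradients and apply one SGD step."""
    n = len(worker_grads)
    # element-wise mean of the gradients from all subjects
    avg = [sum(g[i] for g in worker_grads) / n for i in range(len(params))]
    # one gradient-descent step at the current (possibly attenuated) rate
    return [p - lr * a for p, a in zip(params, avg)]

# Two subjects report gradients for the same two parameters:
new_params = aggregate_and_update(
    params=[1.0, -0.5],
    worker_grads=[[0.2, 0.4], [0.4, 0.0]],
    lr=0.1,
)
```

Averaging before the update keeps every subject's parameters identical after each iteration, which is what allows the subjects to share a single learning rate outside the decay rounds.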
CN202010601427.2A 2020-06-29 2020-06-29 Neural network training method and device Pending CN111767989A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010601427.2A CN111767989A (en) 2020-06-29 2020-06-29 Neural network training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010601427.2A CN111767989A (en) 2020-06-29 2020-06-29 Neural network training method and device

Publications (1)

Publication Number Publication Date
CN111767989A true CN111767989A (en) 2020-10-13

Family

ID=72722554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010601427.2A Pending CN111767989A (en) 2020-06-29 2020-06-29 Neural network training method and device

Country Status (1)

Country Link
CN (1) CN111767989A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780526A (en) * 2021-08-30 2021-12-10 北京的卢深视科技有限公司 Network training method, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108429753A (en) * 2018-03-16 2018-08-21 重庆邮电大学 A kind of matched industrial network DDoS intrusion detection methods of swift nature
US20180240010A1 (en) * 2017-02-19 2018-08-23 Intel Corporation Technologies for optimized machine learning training
CN110705691A (en) * 2019-09-24 2020-01-17 北京迈格威科技有限公司 Neural network training method and device and computer readable storage medium
CN111028306A (en) * 2019-11-06 2020-04-17 杭州电子科技大学 AR2U-Net neural network-based rapid magnetic resonance imaging method
CN111260032A (en) * 2020-01-14 2020-06-09 北京迈格威科技有限公司 Neural network training method, image processing method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180240010A1 (en) * 2017-02-19 2018-08-23 Intel Corporation Technologies for optimized machine learning training
CN108429753A (en) * 2018-03-16 2018-08-21 重庆邮电大学 A kind of matched industrial network DDoS intrusion detection methods of swift nature
CN110705691A (en) * 2019-09-24 2020-01-17 北京迈格威科技有限公司 Neural network training method and device and computer readable storage medium
CN111028306A (en) * 2019-11-06 2020-04-17 杭州电子科技大学 AR2U-Net neural network-based rapid magnetic resonance imaging method
CN111260032A (en) * 2020-01-14 2020-06-09 北京迈格威科技有限公司 Neural network training method, image processing method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HIROSHI NINOMIYA: "Dynamic sample size selection based quasi-Newton training for highly nonlinear function approximation using multilayer neural networks", IEEE, 9 January 2014 (2014-01-09) *
刘吉; 孙仁诚; 乔松林: "Research on the Application of Deep Learning in Medical Image Recognition", Journal of Qingdao University (Natural Science Edition), no. 01, 15 February 2018 (2018-02-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780526A (en) * 2021-08-30 2021-12-10 北京的卢深视科技有限公司 Network training method, electronic device and storage medium
CN113780526B (en) * 2021-08-30 2022-08-05 合肥的卢深视科技有限公司 Face recognition network training method, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111753997B (en) Distributed training method, system, device and storage medium
CN111461290B (en) Model parameter updating method and device
CN111582375B (en) Data enhancement policy searching method, device, equipment and storage medium
CN111582477B (en) Training method and device for neural network model
CN111582374A (en) Hyper-parameter searching method, device, equipment and storage medium
CN111753964A (en) Neural network training method and device
CN111563593A (en) Training method and device of neural network model
CN113238943A (en) Method and device for continuous integration test
CN111680597A (en) Face recognition model processing method, device, equipment and storage medium
CN111339187A (en) Data processing method, device, equipment and storage medium based on intelligent contract
CN112561059B (en) Method and apparatus for model distillation
CN112580723B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN112015439B (en) Embedding method, device, equipment and storage medium of user APP interest
CN112085103B (en) Data enhancement method, device, equipment and storage medium based on historical behaviors
KR20220003444A (en) Optimizer learning method and apparatus, electronic device and readable storage medium
CN111767989A (en) Neural network training method and device
CN111488972B (en) Data migration method, device, electronic equipment and storage medium
CN112529181A (en) Method and apparatus for model distillation
CN112382292A (en) Voice-based control method and device
CN111767990A (en) Neural network processing method and device
CN115879469B (en) Text data processing method, model training method, device and medium
CN111667428A (en) Noise generation method and device based on automatic search
CN111738325A (en) Image recognition method, device, equipment and storage medium
CN111177479A (en) Method and device for acquiring feature vectors of nodes in relational network graph
CN111160552A (en) Negative sampling processing method, device, equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination