CN111695689B - Natural language processing method, device, equipment and readable storage medium - Google Patents

Natural language processing method, device, equipment and readable storage medium

Info

Publication number
CN111695689B
CN111695689B (application number CN202010542341.7A)
Authority
CN
China
Prior art keywords
model
natural language
language processing
target
model parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010542341.7A
Other languages
Chinese (zh)
Other versions
CN111695689A (en)
Inventor
赖志权
杨越童
李东升
蔡蕾
张立志
冉浙江
梅松竹
王庆林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202010542341.7A priority Critical patent/CN111695689B/en
Publication of CN111695689A publication Critical patent/CN111695689A/en
Application granted granted Critical
Publication of CN111695689B publication Critical patent/CN111695689B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a natural language processing method, which comprises the following steps: receiving natural language information to be processed; inputting the natural language information to be processed into a target natural language processing model, the target natural language processing model being obtained by distributed training through a model-averaging model aggregation algorithm; and performing a corresponding natural language understanding or natural language generation operation on the natural language information to be processed by using the target natural language processing model. By applying the technical scheme provided by the embodiments of the invention, the accuracy and the processing efficiency of natural language processing by the language processing model are improved. The invention also discloses a natural language processing device, equipment and a storage medium, which have corresponding technical effects.

Description

Natural language processing method, device, equipment and readable storage medium
Technical Field
The present invention relates to the field of deep learning technologies, and in particular, to a method, an apparatus, a device, and a computer readable storage medium for processing natural language.
Background
At present, deep learning methods are widely used to solve natural language processing problems. As research on natural language processing deepens, the deep learning models adopted become more and more complex, and the training data they depend on grows ever larger. On the other hand, in natural language processing applications such as machine translation and dialogue generation, the model must be incrementally retrained frequently as the language environment and the training data change, so as to ensure the accuracy and effectiveness of natural language processing by the language processing model. On a single computing device, these deep learning algorithms therefore take a long time to train the deep learning model to convergence, which cannot meet the need for rapid model iteration in natural language processing applications. It is therefore desirable to train the language processing model on multiple computing devices in a distributed training manner, which can reduce the time spent on training.
One of the key problems in distributed training is to efficiently synchronize gradients or model parameters while reducing communication overhead, so as to improve the utilization of the computing devices and accelerate training. Most existing distributed training methods adopt synchronous gradient synchronization to keep the model parameters consistent across nodes; in this manner, every training iteration requires each distributed node to complete the transfer and synchronization of gradients. This kind of distributed training method is widely used for distributed training of image classification models, and experiments have verified that it achieves good results.
In current research, distributedly trained deep neural networks are mainly used to solve computer vision problems, typically image classification, and such networks are characterized by gradient data that can be represented entirely by dense matrices. Few studies have attempted to distributedly train the language models used in natural language processing, which are characterized by partially sparse gradient data represented by the sparse matrices commonly used in existing distributed deep learning frameworks. Gradient data represented by dense matrices can be transferred between the computing nodes of a cluster efficiently by existing collective communication libraries, whereas gradient data represented by sparse matrices is difficult to transmit efficiently within the cluster. Consequently, when sparse models in fields such as natural language processing are trained in a distributed manner, the deep neural network model takes a long time to converge to the target accuracy, and during natural language processing the accuracy and the processing efficiency of the language processing model are both low.
In summary, how to effectively solve the problems of low accuracy and low processing efficiency of the existing language processing model in performing the natural language processing is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a natural language processing method which improves the accuracy and the processing efficiency of natural language processing by a language processing model; another object of the present invention is to provide a natural language processing apparatus, device, and computer-readable storage medium.
In order to solve the technical problems, the invention provides the following technical scheme:
a natural language processing method, comprising:
receiving natural language information to be processed;
inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average;
and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model.
In one embodiment of the present invention, the process of obtaining the target natural language processing model by performing model training through the model aggregation algorithm of the model average includes:
preprocessing an original natural language processing model to obtain the update frequency of each model parameter in the original natural language processing model;
grouping the model parameters according to the update frequencies to obtain model parameter groups;
determining a synchronization interval of each model parameter set respectively;
and in the model iteration process, carrying out synchronous operation on each model parameter in a corresponding model parameter set of the original natural language processing model in each training node according to each synchronous interval to obtain the target natural language processing model.
In a specific embodiment of the present invention, preprocessing an original natural language processing model to obtain an update frequency of each model parameter in the original natural language processing model, including:
selecting a first preset number of data sets from the training data set;
sequentially inputting each data set into the original natural language processing model to perform the first preset number of model iteration operations on the original natural language processing model;
forward computing and back-propagating the original natural language processing model aiming at each data group to obtain a gradient value of each model parameter in the original natural language processing model;
carrying out statistical operation on non-zero values in the gradient values in each model iteration by using an indication function to obtain the corresponding gradient updating effective times of each model parameter in each model iteration;
and respectively calculating the proportion of the effective times of the gradient update to the preset times to obtain the update frequency corresponding to each model parameter.
In one embodiment of the present invention, grouping each of the model parameters according to each of the update frequencies to obtain each of the model parameter sets includes:
inputting each model parameter into a preset sortable container;
sorting the model parameters by using the sorting container according to the update frequency corresponding to the model parameters respectively to obtain a sorting result;
and dividing each model parameter into a second preset number of model parameter groups according to the sorting result.
In a specific embodiment of the present invention, determining the synchronization interval of each of the model parameter sets includes:
calculating an average update frequency of the update frequencies of the model parameters in each model parameter set for each model parameter set;
and calculating the synchronization interval of each model parameter group according to each average updating frequency.
In a specific embodiment of the present invention, in a model iteration process, performing a synchronization operation on each model parameter in a corresponding model parameter set of the original natural language processing model in each training node according to each synchronization interval, including:
selecting a target training node from the training nodes;
initializing each model parameter in the original natural language processing model in each target training node to obtain an initialization result;
broadcasting the initialization result to other training nodes except the target training node by utilizing the target training node;
initializing the iteration times of the model;
in the model iteration process, the residual operation is carried out on the accumulated result of the iteration times of each model by utilizing each synchronous interval;
and carrying out synchronous operation on each model parameter in the target model parameter set corresponding to the synchronous interval with zero remainder.
In one embodiment of the present invention, performing a synchronization operation on each of the model parameters in a set of target model parameters corresponding to a synchronization interval with zero remainder includes:
respectively obtaining target model parameter sets in the training nodes;
respectively carrying out average value calculation on corresponding model parameters in each target model parameter group to obtain average model parameters;
setting each model parameter in the target model parameter set in each training node as each average model parameter.
A natural language processing apparatus, comprising:
the information receiving module is used for receiving natural language information to be processed;
the information input module is used for inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average;
and the information processing module is used for carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model.
A natural language processing device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the natural language processing method as described above when executing the computer program.
A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of a natural language processing method as described above.
By applying the method provided by the embodiment of the invention, the natural language information to be processed is received; inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average; and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model. The target natural language processing model is obtained through distributed training by using a model average model aggregation algorithm, the model average model aggregation algorithm is suitable for training the language processing model which is expressed by a sparse matrix commonly used in a distributed deep learning framework, the model training time is shortened greatly, and the accuracy and the processing efficiency of the natural language processing by the language processing model are improved.
Correspondingly, the embodiment of the invention also provides a natural language processing device, a device and a computer readable storage medium corresponding to the natural language processing method, which have the technical effects and are not repeated herein.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an implementation of a natural language processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another implementation of a natural language processing method according to an embodiment of the present invention;
FIG. 3 is a block diagram illustrating a natural language processing device according to an embodiment of the present invention;
fig. 4 is a block diagram of a natural language processing device according to an embodiment of the present invention.
Detailed Description
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Embodiment one:
referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of a natural language processing method according to an embodiment of the present invention, where the method may include the following steps:
s101: and receiving the natural language information to be processed.
When natural language processing is needed, the natural language information to be processed is sent to the processing center, and the processing center receives it. The natural language information to be processed may be a sentence to be translated, a sentence for which a dialogue is to be generated, and the like.
S102: inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average.
The original natural language processing model is trained in a distributed manner in advance with the model-averaging model aggregation algorithm to obtain the target natural language processing model. After receiving the natural language information to be processed, the processing center inputs it into the target natural language processing model. When the natural language information to be processed is a sentence to be translated, an original machine translation model is distributedly trained in advance with the model-averaging model aggregation algorithm to obtain a target machine translation model, and the sentence to be translated is input into the target machine translation model; when the natural language information to be processed is a sentence of a dialogue to be generated, an original dialogue generation model is distributedly trained in advance with the model-averaging model aggregation algorithm to obtain a target dialogue generation model, and the sentence of the dialogue to be generated is input into the target dialogue generation model.
S103: and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model.
After the natural language information to be processed is input into the target natural language processing model, the target natural language processing model is used to perform the corresponding natural language understanding or natural language generation operation on it. Continuing the example above, when the natural language information to be processed is a sentence to be translated, the target machine translation model performs the translation operation on the sentence to be translated; when the natural language information to be processed is a sentence of a dialogue to be generated, the target dialogue generation model performs the dialogue generation operation on that sentence. The model-averaging model aggregation algorithm is suitable for training language processing models that are represented by the sparse matrices commonly used in distributed deep learning frameworks; it greatly shortens the model training time and improves the accuracy and processing efficiency of natural language processing by the language processing model.
It should be noted that the target natural language processing model is not limited to processing sentences to be translated and sentences of dialogues to be generated; it can also be used for information extraction, text sentiment analysis, personalized recommendation and the like, with the corresponding target natural language processing model trained in advance according to the different application scenarios.
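As an illustration of steps S101 to S103, a minimal dispatch sketch is given below. It assumes the pre-trained target models are kept in a task-indexed dictionary and are callable on the input text; the function and parameter names are hypothetical and are not part of the invention.

```python
def process_natural_language(text, task, target_models):
    # target_models: hypothetical mapping such as
    # {"translation": target_translation_model, "dialogue": target_dialogue_model},
    # each obtained in advance by model-averaging distributed training (S102).
    model = target_models[task]   # select the target natural language processing model
    return model(text)            # natural language understanding or generation (S103)
```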
By applying the method provided by the embodiment of the invention, the natural language information to be processed is received; inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average; and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model. The target natural language processing model is obtained through distributed training by using a model average model aggregation algorithm, the model average model aggregation algorithm is suitable for training the language processing model which is expressed by a sparse matrix commonly used in a distributed deep learning framework, the model training time is shortened greatly, and the accuracy and the processing efficiency of the natural language processing by the language processing model are improved.
It should be noted that, based on the first embodiment, the embodiment of the present invention further provides a corresponding improvement scheme. The following embodiments relate to the same steps as those in the first embodiment or the steps corresponding to the first embodiment, and the corresponding beneficial effects can also be referred to each other, so that the following modified embodiments will not be repeated.
Embodiment two:
referring to fig. 2, fig. 2 is a flowchart of another implementation of a natural language processing method according to an embodiment of the present invention, where the method may include the following steps:
s201: and preprocessing the original natural language processing model to obtain the update frequency of each model parameter in the original natural language processing model.
In the model iteration process, an original natural language processing model is obtained, and the original natural language processing model is preprocessed to obtain the updating frequency of each model parameter in the original natural language processing model.
In one embodiment of the present invention, step S201 may include the steps of:
step one: selecting a first preset number of data sets from the training data set;
step two: sequentially inputting each data set into the original natural language processing model to perform a first preset number of model iteration operations on the original natural language processing model;
step three: forward computing and back-propagating the original natural language processing model aiming at each data group to obtain a gradient value of each model parameter in the original natural language processing model;
step four: and carrying out statistical operation on non-zero values in gradient values in each model iteration by using an indication function to obtain the corresponding gradient updating effective times of each model parameter in each model iteration.
Step five: and respectively calculating the proportion of the effective times of each gradient update to the preset times to obtain the update frequency corresponding to each model parameter.
For convenience of description, the above five steps are described in combination.
Select a first preset number of data groups from a pre-acquired training data set, for example m data groups, and input each data group into the original natural language processing model in turn, each data group corresponding to one iteration of the original natural language processing model, so that the original natural language processing model undergoes the first preset number of model iteration operations. For each data group, perform forward computation and back-propagation on the original natural language processing model to obtain the gradient value of each model parameter in the original natural language processing model, denoted g_i^(t) for the i-th model parameter after the t-th iteration.

Using the indicator function I, perform a statistical operation on the non-zero values among the gradient values in each model iteration to obtain the number of effective gradient updates for each model parameter, i.e. whether the gradient update of each parameter in each iteration is effective. The indicator function I is defined as follows:

I(g_i^(t)) = 1 if g_i^(t) ≠ 0, and I(g_i^(t)) = 0 otherwise.

When I(g_i^(t)) = 1, the corresponding gradient update is effective.

Then calculate, for each model parameter, the proportion of its number of effective gradient updates to the preset number of iterations, which gives the update frequency α_i corresponding to each model parameter:

α_i = (1/m) · Σ_{t=1}^{m} I(g_i^(t)).
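A minimal sketch of this preprocessing step is given below, assuming a PyTorch-style model whose named parameter tensors are taken as the unit of grouping (finer granularities such as individual embedding rows are equally possible); the function names are illustrative and not taken from the patent.

```python
import torch

def estimate_update_frequencies(model, data_groups, loss_fn):
    """Over m profiling iterations, count for every parameter whether its gradient
    was non-zero (indicator function I) and return alpha_i = effective count / m."""
    m = len(data_groups)  # first preset number of data groups
    effective = {name: 0 for name, _ in model.named_parameters()}

    for inputs, targets in data_groups:
        model.zero_grad()
        loss = loss_fn(model(inputs), targets)   # forward computation
        loss.backward()                          # back-propagation -> gradients g_i^(t)
        for name, param in model.named_parameters():
            # I(g_i^(t)) = 1 when the gradient update is effective (non-zero)
            if param.grad is not None and torch.any(param.grad != 0):
                effective[name] += 1

    # update frequency alpha_i for each model parameter
    return {name: count / m for name, count in effective.items()}
```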
S202: and grouping the model parameters according to the updating frequencies to obtain the model parameter groups.
Because the language processing model is represented by a sparse matrix in the distributed deep learning framework, after the update frequency of each model parameter in the original natural language processing model is obtained, grouping operation is carried out on each model parameter according to each update frequency, so as to obtain each model parameter group.
In one embodiment of the present invention, step S202 may include the steps of:
step one: inputting each model parameter into a preset sortable container;
step two: sequencing the model parameters by utilizing a sequencing container according to the update frequency corresponding to the model parameters respectively to obtain a sequencing result;
step three: and dividing each model parameter into a second preset number of model parameter groups according to the sorting result.
For convenience of description, the above three steps are described in combination.
A sortable container is deployed in advance, and each model parameter is input into the preset sortable container. According to the sparsity degree of each model parameter, i.e. its update frequency α_i, the model parameters are sorted with the sortable container to obtain a sorting result, and the model parameters are then divided into a second preset number of model parameter groups according to the sorting result. The second preset number is denoted by a preset hyper-parameter q, and the i-th group of model parameters with similar sparsity is denoted p_i, so that the set of model parameter groups can be represented as:

P = {p_1, p_2, …, p_q}.
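A sketch of the grouping step under the same assumptions follows; splitting evenly into q groups after sorting is one simple choice among several.

```python
def group_parameters_by_frequency(update_freqs, q):
    """Sort parameter names by update frequency alpha_i (the sortable container)
    and divide them into q groups p_1..p_q of similar sparsity."""
    ranked = sorted(update_freqs, key=update_freqs.get)   # names, sparsest first
    size = max(1, len(ranked) // q)
    groups = [ranked[i:i + size] for i in range(0, len(ranked), size)]
    if len(groups) > q:                                   # fold any overflow into the last group
        groups[q - 1:] = [[name for g in groups[q - 1:] for name in g]]
    return groups                                         # P = {p_1, ..., p_q}
```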
s203: the synchronization interval for each set of model parameters is determined separately.
After each model parameter set is obtained, the synchronization interval of each model parameter set is determined separately.
In one embodiment of the present invention, step S203 may include the steps of:
step one: calculating an average update frequency of the update frequencies of the model parameters in the model parameter sets for each model parameter set;
step two: and calculating the synchronization interval of each model parameter group according to each average updating frequency.
For convenience of description, the above two steps are described in combination.
After the update frequency of each model parameter is obtained and the model parameters are grouped into model parameter groups, the average update frequency of the update frequencies of the model parameters in each group p_i, denoted ᾱ_{p_i}, is calculated for each model parameter group, and the synchronization interval of each group is then calculated from its average update frequency. The average update frequency can be calculated according to the following formula:

ᾱ_{p_i} = ( Σ_{j ∈ p_i} α_j ) / ||p_i||

where || · || denotes the 1-norm calculation (here, the number of parameters in the group).

The synchronization interval k_i of each model parameter group is then calculated from the average update frequency ᾱ_{p_i}, for example as

k_i = ⌈ λ / ᾱ_{p_i} ⌉,

so that sparser groups (with lower average update frequency) are synchronized at longer intervals. Here k_i denotes the synchronization interval corresponding to the i-th model parameter group, and the synchronization intervals of all groups form the synchronization interval set

K = {k_1, k_2, …, k_q},

where λ is a synchronization-interval setting coefficient that allows the practitioner of the invention to dynamically adjust the synchronization intervals of different sparse deep neural network models based on prior knowledge.
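Continuing the sketch, the synchronization intervals K can be derived from the per-group average update frequencies; the ceil(λ / avg) form used here follows the inverse-proportional relation described above and stands in for the patent's exact formula.

```python
import math

def compute_sync_intervals(groups, update_freqs, lam=1.0):
    """For each group p_i, average the update frequencies of its parameters and
    derive a synchronization interval k_i; sparser groups get longer intervals."""
    intervals = []
    for group in groups:
        avg_freq = sum(update_freqs[name] for name in group) / len(group)
        k_i = max(1, math.ceil(lam / max(avg_freq, 1e-8)))  # guard against all-zero groups
        intervals.append(k_i)
    return intervals  # K = {k_1, ..., k_q}
```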
S204: and in the model iteration process, performing synchronous operation on each model parameter in the corresponding model parameter set of the original natural language processing model in each training node according to each synchronous interval to obtain the target natural language processing model.
And in the model iteration process, performing synchronous operation on each model parameter in the corresponding model parameter set of the original natural language processing model in each training node according to each synchronous interval to obtain the target natural language processing model.
In one embodiment of the present invention, step S204 may include the steps of:
step one: selecting a target training node from the training nodes;
step two: initializing each model parameter in the original natural language processing model in each target training node to obtain an initialization result;
step three: broadcasting an initialization result to other training nodes except the target training node by using the target training node;
step four: initializing the iteration times of the model;
step five: in the model iteration process, the remainder operation is carried out on the accumulated results of the iteration times of each model by utilizing each synchronous interval;
step six: and carrying out synchronous operation on each model parameter in the target model parameter set corresponding to the synchronous interval with zero remainder to obtain the target natural language processing model.
For convenience of description, the above six steps are described in combination.
A target training node is selected from the training nodes, and each model parameter in the original natural language processing model on the target training node is initialized to obtain an initialization result.

The target training node may be any one of the training nodes; for example, the training nodes may be numbered 0, 1, … in advance, and the training node numbered 0 is selected as the target training node. Each model parameter in the original natural language processing model on the target training node is initialized to obtain the initialization result, and the target training node broadcasts the initialization result to the other training nodes except the target training node. The number of model iterations is initialized, for example to 0 before the iterative training of the original natural language processing model begins. During the model iteration process, a remainder operation is performed on the accumulated iteration count t with each synchronization interval; the remainder operation can be calculated by the following formula:

t mod k_i
and carrying out synchronous operation on each model parameter in the target model parameter set corresponding to the synchronous interval with zero remainder to obtain the target natural language processing model.
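The iteration loop of S204 can be sketched as below, assuming torch.distributed has already been initialized on every training node; the per-group averaging helper called at the end is sketched after the averaging sub-steps that follow.

```python
import torch.distributed as dist

def train_with_grouped_sync(model, optimizer, loss_fn, data_loader,
                            groups, intervals, root_rank=0):
    """One training node's loop: the root node's initialization is broadcast to all
    nodes, then each parameter group p_i is averaged across nodes whenever the
    iteration counter t is divisible by its synchronization interval k_i."""
    params = dict(model.named_parameters())

    # broadcast the target training node's initialization result to the other nodes
    for p in params.values():
        dist.broadcast(p.data, src=root_rank)

    t = 0  # initialize the number of model iterations
    for inputs, targets in data_loader:
        optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()
        optimizer.step()
        t += 1
        for group, k_i in zip(groups, intervals):
            if t % k_i == 0:  # remainder t mod k_i is zero -> synchronize this group
                average_group_across_nodes([params[name] for name in group])
```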
In a specific embodiment of the present invention, the synchronization operation for each model parameter in the target model parameter set corresponding to the synchronization interval with zero remainder may include the following steps:
step one: respectively acquiring target model parameter sets in all training nodes;
step two: respectively carrying out average value calculation on corresponding model parameters in each target model parameter group to obtain each average model parameter;
step three: setting each model parameter in the target model parameter set in each training node as each average model parameter to obtain a target natural language processing model.
For convenience of description, the above three steps are described in combination.
Respectively obtaining target model parameter sets in all training nodes, respectively carrying out mean value calculation on corresponding model parameters in the target model parameter sets to obtain average model parameters, setting the model parameters in the target model parameter sets in all training nodes as the average model parameters, and obtaining the target natural language processing model.
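A sketch of the model-averaging synchronization itself, again assuming an initialized torch.distributed process group; an all-reduce sum followed by division by the node count leaves every node holding the mean model parameters for the target group.

```python
import torch.distributed as dist

def average_group_across_nodes(group_params):
    """Replace each parameter in the target model parameter group with the mean of
    the corresponding parameters held by all training nodes."""
    world_size = dist.get_world_size()
    for p in group_params:
        dist.all_reduce(p.data, op=dist.ReduceOp.SUM)  # sum the parameter over all nodes
        p.data /= world_size                            # average model parameter
```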
Steps S201 to S203 complete the solution of the key parameters P and K, and they can be computed offline on a single machine using only one training node. Since the model is not being trained in steps S201 to S203, the model parameters do not need to be modified, and only the corresponding gradient values are calculated for different batches of training data, so the time consumed is much shorter than that of ordinary single-machine training. For example, a single server with an Intel Xeon CPU and four NVIDIA RTX 2080Ti GPUs can be selected, and one GPU in the server is used to complete steps S201 to S203 for the LM1b model in a short time. Empirically, the time consumed by steps S201 to S203 is negligible relative to the overall training time.
Compared with existing distributed training modes for language processing models, when the language processing model is distributedly trained with the method of the invention on a single server and on two servers, the training time is greatly shortened.
S205: and receiving the natural language information to be processed.
S206: inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average.
S207: and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model.
Corresponding to the above method embodiment, the embodiment of the present invention further provides a natural language processing device, where the natural language processing device described below and the natural language processing method described above may be referred to correspondingly.
Referring to fig. 3, fig. 3 is a block diagram illustrating a natural language processing device according to an embodiment of the present invention, where the device may include:
an information receiving module 31 for receiving natural language information to be processed;
an information input module 32 for inputting the natural language information to be processed into the target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average;
the information processing module 33 is configured to perform corresponding natural language understanding or natural language generating operation on the to-be-processed natural language information by using the target natural language processing model.
By applying the device provided by the embodiment of the invention, the natural language information to be processed is received; the natural language information to be processed is input into a target natural language processing model, the target natural language processing model being obtained by distributed training through a model-averaging model aggregation algorithm; and the corresponding natural language understanding or natural language generating operation is carried out on the natural language information to be processed by utilizing the target natural language processing model. The model-averaging model aggregation algorithm is suitable for training language processing models that are represented by the sparse matrices commonly used in distributed deep learning frameworks, so the model training time is greatly shortened, and the accuracy and the processing efficiency of natural language processing by the language processing model are improved.
In one embodiment of the present invention, the apparatus includes a model training module comprising:
the parameter updating frequency obtaining sub-module is used for preprocessing the original natural language processing model to obtain the updating frequency of each model parameter in the original natural language processing model;
the parameter set obtaining submodule is used for grouping the model parameters according to the updating frequencies to obtain the model parameter sets;
the synchronization interval determining submodule is used for respectively determining the synchronization interval of each model parameter set;
and the parameter synchronization sub-module is used for carrying out synchronous operation on each model parameter in the corresponding model parameter set of the original natural language processing model in each training node according to each synchronous interval in the model iteration process to obtain the target natural language processing model.
In a specific embodiment of the present invention, the parameter update frequency obtaining submodule includes:
a data set selecting unit for selecting a first preset number of data sets from the training data set;
the model iteration unit is used for sequentially inputting each data set into the original natural language processing model so as to perform a first preset number of model iteration operations on the original natural language processing model;
the gradient value obtaining unit is used for carrying out forward calculation and backward propagation on the original natural language processing model aiming at each data group to obtain a gradient value of each model parameter in the original natural language processing model;
the effective times obtaining unit is used for carrying out statistics operation on non-zero values in gradient values in each model iteration by using an indication function to obtain gradient updating effective times corresponding to each model parameter in each model iteration respectively;
the updating frequency obtaining unit is used for respectively calculating the proportion of the effective times of each gradient updating to the preset times to obtain the updating frequency corresponding to each model parameter.
In a specific embodiment of the present invention, the parameter set obtaining submodule includes:
a parameter input unit for inputting each model parameter into a preset sortable container;
the sequencing result obtaining unit is used for sequencing the model parameters by utilizing the sequencing container according to the update frequency corresponding to the model parameters respectively to obtain a sequencing result;
and the parameter set dividing unit is used for dividing each model parameter into a second preset number of model parameter sets according to the sorting result.
In one embodiment of the present invention, the synchronization interval determination submodule includes:
an average update frequency calculation unit configured to calculate, for each model parameter group, an average update frequency of update frequencies of model parameters in the model parameter group;
and the synchronization interval calculation unit is used for calculating the synchronization interval of each model parameter set according to each average update frequency.
In one embodiment of the present invention, the parameter synchronization sub-module includes:
the node selection unit is used for selecting a target training node from all the training nodes;
the parameter initializing unit is used for initializing each model parameter in the original natural language processing model in each target training node to obtain an initializing result;
the broadcasting unit is used for broadcasting the initialization result to other training nodes except the target training node by using the target training node;
the iteration number initializing unit is used for initializing the iteration number of the model;
the remainder taking unit is used for taking remainder of the accumulated results of the iteration times of each model by utilizing each synchronous interval in the model iteration process;
and the parameter synchronization unit is used for performing synchronization operation on each model parameter in the target model parameter set corresponding to the synchronization interval with zero remainder.
In a specific embodiment of the present invention, the parameter synchronization unit includes:
the parameter set acquisition subunit is used for respectively acquiring the target model parameter sets in the training nodes;
the parameter average value obtaining subunit is used for respectively carrying out average value calculation on corresponding model parameters in each target model parameter group to obtain each average model parameter;
and the parameter setting subunit is used for setting each model parameter in the target model parameter set in each training node as each average model parameter.
Corresponding to the above method embodiment, referring to fig. 4, fig. 4 is a schematic diagram of a natural language processing device provided by the present invention, where the device may include:
a memory 41 for storing a computer program;
the processor 42 is configured to execute the computer program stored in the memory 41, and implement the following steps:
receiving natural language information to be processed; inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average; and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model.
For the description of the apparatus provided by the present invention, please refer to the above method embodiment, and the description of the present invention is omitted herein.
Corresponding to the above method embodiments, the present invention also provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of:
receiving natural language information to be processed; inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average; and carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model.
The computer readable storage medium may include: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
For the description of the computer-readable storage medium provided by the present invention, refer to the above method embodiments, and the disclosure is not repeated here.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. The apparatus, device and computer readable storage medium of the embodiments are described more simply because they correspond to the methods of the embodiments, and the description thereof will be given with reference to the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, but the description of the examples above is only for aiding in understanding the technical solution of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.

Claims (8)

1. A method of natural language processing, comprising:
receiving natural language information to be processed;
inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average;
performing corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model;
the process of obtaining the target natural language processing model through model training by the model aggregation algorithm of the model average comprises the following steps:
preprocessing an original natural language processing model to obtain the update frequency of each model parameter in the original natural language processing model;
grouping the model parameters according to the update frequencies to obtain model parameter groups;
determining a synchronization interval of each model parameter set respectively;
in the model iteration process, performing synchronous operation on each model parameter in a corresponding model parameter set of the original natural language processing model in each training node according to each synchronous interval to obtain the target natural language processing model;
and grouping the model parameters according to the update frequencies to obtain model parameter sets, wherein the grouping comprises the following steps:
inputting each model parameter into a preset sortable container;
sorting the model parameters by using the sorting container according to the update frequency corresponding to the model parameters respectively to obtain a sorting result;
and dividing each model parameter into a second preset number of model parameter groups according to the sorting result.
2. The method according to claim 1, wherein preprocessing an original natural language processing model to obtain an update frequency of each model parameter in the original natural language processing model, comprises:
selecting a first preset number of data sets from the training data set;
sequentially inputting each data set into the original natural language processing model to perform the first preset number of model iteration operations on the original natural language processing model;
forward computing and back-propagating the original natural language processing model aiming at each data group to obtain a gradient value of each model parameter in the original natural language processing model;
carrying out statistical operation on non-zero values in the gradient values in each model iteration by using an indication function to obtain the corresponding gradient updating effective times of each model parameter in each model iteration;
and respectively calculating the proportion of the effective times of the gradient update to the preset times to obtain the update frequency corresponding to each model parameter.
3. The natural language processing method of claim 1, wherein determining the synchronization interval of each of the model parameter sets, respectively, comprises:
calculating an average update frequency of the update frequencies of the model parameters in each model parameter set for each model parameter set;
and calculating the synchronization interval of each model parameter group according to each average updating frequency.
4. A natural language processing method according to any one of claims 1 to 3, wherein in a model iteration process, performing a synchronization operation on each of the model parameters in a corresponding model parameter set of the original natural language processing model in each training node according to each of the synchronization intervals, includes:
selecting a target training node from the training nodes;
initializing each model parameter in the original natural language processing model in each target training node to obtain an initialization result;
broadcasting the initialization result to other training nodes except the target training node by utilizing the target training node;
initializing the iteration times of the model;
in the model iteration process, the residual operation is carried out on the accumulated result of the iteration times of each model by utilizing each synchronous interval;
and carrying out synchronous operation on each model parameter in the target model parameter set corresponding to the synchronous interval with zero remainder.
5. The method of claim 4, wherein synchronizing each of the model parameters in the set of target model parameters corresponding to a synchronization interval with zero remainder comprises:
respectively obtaining target model parameter sets in the training nodes;
respectively carrying out average value calculation on corresponding model parameters in each target model parameter group to obtain average model parameters;
setting each model parameter in the target model parameter set in each training node as each average model parameter.
6. A natural language processing apparatus, comprising:
the information receiving module is used for receiving natural language information to be processed;
the information input module is used for inputting the natural language information to be processed into a target natural language processing model; the target natural language processing model is obtained by performing distributed training through a model aggregation algorithm of model average;
the information processing module is used for carrying out corresponding natural language understanding or natural language generating operation on the natural language information to be processed by utilizing the target natural language processing model;
wherein the apparatus comprises a model training module comprising:
the parameter updating frequency obtaining sub-module is used for preprocessing the original natural language processing model to obtain the updating frequency of each model parameter in the original natural language processing model;
the parameter set obtaining submodule is used for grouping the model parameters according to the updating frequencies to obtain the model parameter sets;
the synchronization interval determining submodule is used for respectively determining the synchronization interval of each model parameter set;
the parameter synchronization sub-module is used for carrying out synchronous operation on each model parameter in a corresponding model parameter set of the original natural language processing model in each training node according to each synchronous interval in the model iteration process to obtain a target natural language processing model;
the process of grouping the model parameters by the parameter set obtaining submodule according to the update frequencies to obtain the model parameter sets includes: inputting each model parameter into a preset sortable container; sorting the model parameters by using the sorting container according to the update frequency corresponding to the model parameters respectively to obtain a sorting result; and dividing each model parameter into a second preset number of model parameter groups according to the sorting result.
7. A natural language processing device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the natural language processing method of any one of claims 1 to 5 when executing the computer program.
8. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the natural language processing method according to any one of claims 1 to 5.
CN202010542341.7A 2020-06-15 2020-06-15 Natural language processing method, device, equipment and readable storage medium Active CN111695689B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010542341.7A CN111695689B (en) 2020-06-15 2020-06-15 Natural language processing method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010542341.7A CN111695689B (en) 2020-06-15 2020-06-15 Natural language processing method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN111695689A CN111695689A (en) 2020-09-22
CN111695689B true CN111695689B (en) 2023-06-20

Family

ID=72480984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010542341.7A Active CN111695689B (en) 2020-06-15 2020-06-15 Natural language processing method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111695689B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112561078B (en) * 2020-12-18 2021-12-28 北京百度网讯科技有限公司 Distributed model training method and related device
CN112699686B (en) * 2021-01-05 2024-03-08 浙江诺诺网络科技有限公司 Semantic understanding method, device, equipment and medium based on task type dialogue system

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014074698A2 (en) * 2012-11-12 2014-05-15 Nuance Communications, Inc. Distributed nlu/nlp

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104035751B (en) * 2014-06-20 2016-10-12 深圳市腾讯计算机***有限公司 Data parallel processing method based on multi-graphics processor and device
CN104463324A (en) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 Convolution neural network parallel processing method based on large-scale high-performance cluster
US9715498B2 (en) * 2015-08-31 2017-07-25 Microsoft Technology Licensing, Llc Distributed server system for language understanding
CN108280522B (en) * 2018-01-03 2021-08-20 北京大学 Plug-in distributed machine learning calculation framework and data processing method thereof
CN108549692B (en) * 2018-04-13 2021-05-11 重庆邮电大学 Method for classifying text emotion through sparse multiple logistic regression model under Spark framework
JP7135743B2 (en) * 2018-11-06 2022-09-13 日本電信電話株式会社 Distributed processing system and distributed processing method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014074698A2 (en) * 2012-11-12 2014-05-15 Nuance Communications, Inc. Distributed nlu/nlp

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Parallelization Algorithm for Hierarchical Phrase-Based Machine Translation Based on Distributed Memory; Zhao Bo; Huang Shujian; Dai Xinyu; Yuan Chunfeng; Huang Yihua; Journal of Computer Research and Development (Issue 12); 2724-2732 *

Also Published As

Publication number Publication date
CN111695689A (en) 2020-09-22

Similar Documents

Publication Publication Date Title
CN110969250B (en) Neural network training method and device
CN111030861B (en) Edge calculation distributed model training method, terminal and network side equipment
CN107340993B (en) Arithmetic device and method
US20170193368A1 (en) Conditional parallel processing in fully-connected neural networks
CN111695689B (en) Natural language processing method, device, equipment and readable storage medium
Alawad et al. Stochastic-based deep convolutional networks with reconfigurable logic fabric
CN116701692B (en) Image generation method, device, equipment and medium
CN111738276A (en) Image processing method, device and equipment based on multi-core convolutional neural network
CN113705793B (en) Decision variable determination method and device, electronic equipment and medium
CN112508190A (en) Method, device and equipment for processing structured sparse parameters and storage medium
CN111260056B (en) Network model distillation method and device
CN106156142B (en) Text clustering processing method, server and system
CN114138231B (en) Method, circuit and SOC for executing matrix multiplication operation
CN111191036A (en) Short text topic clustering method, device, equipment and medium
CN115879547A (en) Open world knowledge graph complementing method and system based on LSTM and attention mechanism
CN116797850A (en) Class increment image classification method based on knowledge distillation and consistency regularization
US20230087774A1 (en) Parameter optimization method, electronic device, and storage medium
CN116384471A (en) Model pruning method, device, computer equipment, storage medium and program product
CN115909441A (en) Face recognition model establishing method, face recognition method and electronic equipment
DE102022120819A1 (en) QUANTIZED NEURAL NETWORK TRAINING AND INFERENCE
US20220138554A1 (en) Systems and methods utilizing machine learning techniques for training neural networks to generate distributions
CN109636199B (en) Method and system for matching translator for to-be-translated manuscript
CN113191486A (en) Graph data and parameter data mixed partitioning method based on parameter server architecture
Park et al. Dual-Precision Deep Neural Network
CN111008271B (en) Neural network-based key information extraction method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant