CN113947095B - Multilingual text translation method, multilingual text translation device, computer equipment and storage medium - Google Patents


Info

Publication number: CN113947095B
Application number: CN202111255219.2A
Authority: CN (China)
Prior art keywords: translation, text, layer, target, coding
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN113947095A
Inventors: 吴天博, 王健宗
Current assignee: Ping An Technology Shenzhen Co Ltd
Original assignee: Ping An Technology Shenzhen Co Ltd
Events: application filed by Ping An Technology Shenzhen Co Ltd; priority to CN202111255219.2A; publication of CN113947095A; application granted; publication of CN113947095B


Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F40/30: Semantic analysis
    • G06N: Computing arrangements based on specific computational models
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the application belongs to the field of artificial intelligence and relates to a multilingual text translation method, which comprises: obtaining a text to be translated and a preset translation model, and encoding the text to be translated according to the preset translation model to obtain a first encoding vector; inputting the first encoding vector to a first attention layer to obtain a second encoding vector; inputting the second encoding vector to a feedforward layer to obtain a target encoding vector; calculating a candidate translation text according to a target coding layer to obtain a candidate encoding vector; inputting the candidate encoding vector and the target encoding vector to a second attention layer to obtain a semantic encoding vector; and calculating the semantic encoding vector to obtain prediction probabilities, and selecting the candidate translation text with the maximum prediction probability as the target translation text. The application also provides a multilingual text translation device, a computer device and a storage medium. In addition, the application relates to blockchain technology: the target translation text can be stored in a blockchain. The application realizes intelligent translation of multilingual text.

Description

Multilingual text translation method, multilingual text translation device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a multilingual text translation method, apparatus, computer device, and storage medium.
Background
With the rapid development of information technology, artificial intelligence has become increasingly widespread; translating texts intelligently through artificial intelligence can greatly improve information processing efficiency.
Currently, neural machine translation is the most advanced form of machine translation. However, if inter-translation models are trained only by conventional methods, translating between multiple languages requires training a separate single-language inter-translation model for each language pair, which consumes considerable manpower, material resources and financial resources in training and deploying the models. Neural-network-based machine translation models mainly comprise single-language inter-translation models, many-to-one translation models and one-to-many translation models, and many-to-many translation models often cannot achieve translation accuracy and translation efficiency at the same time.
Disclosure of Invention
The embodiment of the application aims to provide a multilingual text translation method, a device, computer equipment and a storage medium, which are used for solving the technical problem that the current multilingual translation cannot ensure the translation accuracy and efficiency at the same time.
In order to solve the technical problems, the embodiment of the application provides a multilingual text translation method, which adopts the following technical scheme:
Acquiring a text to be translated and a trained preset translation model, inputting the text to be translated into the preset translation model, and coding the text to be translated according to a source coding layer of an encoder in the preset translation model to obtain a first coding vector, wherein the preset translation model comprises the encoder and a decoder, the encoder comprises a source coding layer, a first attention layer and a feedforward layer, and the decoder comprises a target coding layer, a second attention layer, a linear network layer and an activation layer;
inputting the first coding vector to a first attention layer in the encoder, and calculating to obtain a second coding vector;
Inputting the second coding vector to a feedforward layer in the encoder, and calculating to obtain a target coding vector;
obtaining a candidate translation text corresponding to the text to be translated, and calculating the candidate translation text according to a target coding layer of the decoder to obtain a candidate coding vector of the candidate translation text;
inputting the candidate coding vector and the target coding vector to a second attention layer of the decoder, and calculating to obtain a semantic coding vector;
And calculating the semantic coding vector based on a linear network layer and an activation layer of the decoder to obtain the prediction probability of the candidate translation text, and selecting the candidate translation text with the maximum prediction probability as the target translation text of the text to be translated.
Further, before the step of obtaining the text to be translated and the trained preset translation model, the method further includes:
Acquiring source corpus data and target translation data corresponding to the source corpus data, and determining training data according to the source corpus data and the target translation data, wherein the source corpus data and the target translation data comprise data of different languages;
Constructing a basic translation model, inputting the training data into the basic translation model according to the sequence of training batches, and calculating to obtain the current loss value of the basic translation model of the current training batch;
And carrying out iterative training on the basic translation model according to the current loss value to obtain an iteratively trained basic translation model, and determining the iteratively trained basic translation model as the preset translation model when the loss function calculated by the iteratively trained basic translation model converges.
Further, the step of determining training data according to the source corpus data and the target translation data includes:
acquiring language categories respectively corresponding to the source corpus data and the target translation data, and performing tag identification on the source corpus data according to the language categories to obtain tag corpus data;
and word segmentation is carried out on the label corpus data and the target translation data, so that training data of the basic translation model are obtained.
Further, the step of inputting the training data to the basic translation model according to the sequence of training batches, and calculating to obtain the current loss value of the basic translation model of the current training batch includes:
calculating language loss values of different languages of the training data in the current training batch according to the basic translation model;
And acquiring a historical loss value of a previous training batch of the current training batch, and calculating to obtain a current loss value of the current training batch according to the historical loss value and the language loss value.
Further, the step of calculating the current loss value of the current training batch according to the historical loss value and the language loss value includes:
acquiring the number of different languages in the current training batch and the weight coefficient of the basic translation model in the current training batch;
And calculating a loss average value of the basic translation model according to the number, the weight coefficient and the language loss value, and calculating the current loss value based on the loss average value and the historical loss value.
Further, the step of calculating the candidate translation text according to the target coding layer of the decoder to obtain a candidate coding vector of the candidate translation text includes:
encoding the candidate translation text according to a target encoding layer of the decoder to obtain a third encoding vector;
and performing multi-head attention calculation on the third coding vector to obtain the candidate coding vector.
Further, the step of inputting the candidate encoding vector and the target encoding vector to the second attention layer of the decoder, and calculating the semantic encoding vector includes:
acquiring a preset first weight parameter, a preset second weight parameter and a preset third weight parameter, calculating according to the target coding vector and the first weight parameter to obtain a first information matrix, calculating according to the target coding vector and the second weight parameter to obtain a second information matrix, and calculating according to the candidate coding vector and the third weight parameter to obtain a third information matrix;
And calculating the dot product of the third information matrix and the first information matrix to obtain target similarity, and calculating the semantic coding vector according to the target similarity and the second information matrix.
In order to solve the technical problems, the embodiment of the application also provides a multilingual text translation device, which adopts the following technical scheme:
the coding module is used for acquiring a text to be translated and a trained preset translation model, inputting the text to be translated into the preset translation model, and coding the text to be translated according to a source coding layer of a coder in the preset translation model to obtain a first coding vector, wherein the preset translation model comprises the coder and a decoder, the coder comprises a source coding layer, a first attention layer and a feedforward layer, and the decoder comprises a target coding layer, a second attention layer, a linear network layer and an activation layer;
the first calculation module is used for inputting the first coding vector to a first attention layer in the encoder, and calculating to obtain a second coding vector;
the second calculation module is used for inputting the second coding vector to a feedforward layer in the encoder and calculating to obtain a target coding vector;
The third calculation module is used for obtaining a candidate translation text corresponding to the text to be translated, and calculating the candidate translation text according to a target coding layer of the decoder to obtain a candidate coding vector of the candidate translation text;
A fourth calculation module, configured to input the candidate encoding vector and the target encoding vector to a second attention layer of the decoder, and calculate a semantic encoding vector;
And the prediction module is used for calculating the semantic coding vector based on a linear network layer and an activation layer of the decoder to obtain the prediction probability of the candidate translation text, and selecting the candidate translation text with the maximum prediction probability as the target translation text of the text to be translated.
In order to solve the above technical problems, the embodiment of the present application further provides a computer device, which adopts the following technical schemes:
Acquiring a text to be translated and a trained preset translation model, inputting the text to be translated into the preset translation model, and coding the text to be translated according to a source coding layer of an encoder in the preset translation model to obtain a first coding vector, wherein the preset translation model comprises the encoder and a decoder, the encoder comprises a source coding layer, a first attention layer and a feedforward layer, and the decoder comprises a target coding layer, a second attention layer, a linear network layer and an activation layer;
inputting the first coding vector to a first attention layer in the encoder, and calculating to obtain a second coding vector;
Inputting the second coding vector to a feedforward layer in the encoder, and calculating to obtain a target coding vector;
obtaining a candidate translation text corresponding to the text to be translated, and calculating the candidate translation text according to a target coding layer of the decoder to obtain a candidate coding vector of the candidate translation text;
inputting the candidate coding vector and the target coding vector to a second attention layer of the decoder, and calculating to obtain a semantic coding vector;
And calculating the semantic coding vector based on a linear network layer and an activation layer of the decoder to obtain the prediction probability of the candidate translation text, and selecting the candidate translation text with the maximum prediction probability as the target translation text of the text to be translated.
In order to solve the above technical problems, an embodiment of the present application further provides a computer readable storage medium, which adopts the following technical schemes:
Acquiring a text to be translated and a trained preset translation model, inputting the text to be translated into the preset translation model, and coding the text to be translated according to a source coding layer of an encoder in the preset translation model to obtain a first coding vector, wherein the preset translation model comprises the encoder and a decoder, the encoder comprises a source coding layer, a first attention layer and a feedforward layer, and the decoder comprises a target coding layer, a second attention layer, a linear network layer and an activation layer;
inputting the first coding vector to a first attention layer in the encoder, and calculating to obtain a second coding vector;
Inputting the second coding vector to a feedforward layer in the encoder, and calculating to obtain a target coding vector;
obtaining a candidate translation text corresponding to the text to be translated, and calculating the candidate translation text according to a target coding layer of the decoder to obtain a candidate coding vector of the candidate translation text;
inputting the candidate coding vector and the target coding vector to a second attention layer of the decoder, and calculating to obtain a semantic coding vector;
And calculating the semantic coding vector based on a linear network layer and an activation layer of the decoder to obtain the prediction probability of the candidate translation text, and selecting the candidate translation text with the maximum prediction probability as the target translation text of the text to be translated.
According to the multilingual text translation method, a text to be translated and a trained preset translation model are obtained, the text to be translated is input into the preset translation model, the text to be translated is encoded according to a source encoding layer of an encoder in the preset translation model to obtain a first encoding vector, the preset translation model comprises the encoder and a decoder, the encoder comprises a source encoding layer, a first attention layer and a feedforward layer, and the decoder comprises a target encoding layer, a second attention layer, a linear network layer and an activation layer, so that the source encoding layer of the encoder can accurately express the text to be translated; then, inputting the first coding vector to a first attention layer in the encoder, and calculating to obtain a second coding vector; inputting the second coding vector to a feedforward layer in the encoder, and calculating to obtain a target coding vector; obtaining a candidate translation text corresponding to the text to be translated, and calculating the candidate translation text according to a target coding layer of the decoder to obtain a candidate coding vector of the candidate translation text; then, inputting the candidate coding vectors and the target coding vectors to a second attention layer of the decoder, and calculating to obtain semantic coding vectors, wherein more semantic information of the text to be translated can be obtained through the second attention layer; finally, calculating the semantic coding vector based on a linear network layer and an activation layer of the decoder to obtain the prediction probability of the candidate translation text, and selecting the candidate translation text with the maximum prediction probability as the target translation text of the text to be translated, thereby realizing the purpose of integrating the multilingual text on one model for translation, saving the training time and resources of the model, and improving the multilingual text translation efficiency while ensuring the accuracy of the multilingual text translation.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings required for describing the embodiments of the present application are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and that other drawings may be obtained from these drawings by a person of ordinary skill in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a multilingual text translation method according to the present application;
FIG. 3 is a schematic diagram illustrating the construction of one embodiment of a multilingual text translation apparatus in accordance with the present application;
FIG. 4 is a schematic structural diagram of one embodiment of a computer device in accordance with the present application.
Reference numerals: the multilingual text translation apparatus 300 includes an encoding module 301, a first calculation module 302, a second calculation module 303, a third calculation module 304, a fourth calculation module 305, and a prediction module 306.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to make the person skilled in the art better understand the solution of the present application, the technical solution of the embodiment of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the multilingual text translation method provided by the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the multilingual text translation apparatus is generally disposed in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow chart of one embodiment of a multilingual text translation method according to the present application is shown. The multilingual text translation method comprises the following steps:
step S201, obtaining a text to be translated and a trained preset translation model, inputting the text to be translated into the preset translation model, and encoding the text to be translated according to a source coding layer of an encoder in the preset translation model to obtain a first coding vector, wherein the preset translation model comprises the encoder and a decoder, the encoder comprises a source coding layer, a first attention layer and a feedforward layer, and the decoder comprises a target coding layer, a second attention layer, a linear network layer and an activation layer.
In this embodiment, the text to be translated is the received text that needs to be translated, and the preset translation model is a trained multilingual translation model. When the text to be translated is received, multilingual text translation can be performed on it according to the preset translation model. The preset translation model comprises an encoder and a decoder, wherein the encoder comprises a source coding layer, a first attention layer and a feedforward layer, and the decoder comprises a target coding layer, a second attention layer, a linear network layer and an activation layer. The text to be translated is input into the preset translation model and encoded by the source coding layer of the encoder to obtain the first encoding vector. The encoding comprises position coding and fixed coding: each word in the text to be translated is position-coded and fixed-coded respectively to obtain a corresponding position coding vector and fixed coding vector, which are combined to obtain the word vector corresponding to each word; these word vectors form the first encoding vector.
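For illustration, the source coding layer described above might look like the following minimal PyTorch sketch; the module name, the embedding dimension and the choice of sinusoidal position coding are assumptions, as the patent does not fix a concrete implementation:

```python
import math
import torch
import torch.nn as nn

class SourceCodingLayer(nn.Module):
    """Turns each word of the text to be translated into a word vector by
    combining a fixed (token) embedding with a position encoding."""

    def __init__(self, vocab_size: int, d_model: int = 512, max_len: int = 512):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, d_model)  # fixed coding
        # Sinusoidal position coding, precomputed once (an assumed choice).
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                        * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> first encoding vector: (batch, seq_len, d_model)
        return self.token_embedding(token_ids) + self.pe[: token_ids.size(1)]
```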
Step S202, inputting the first encoding vector to the first attention layer in the encoder, and calculating to obtain a second encoding vector.
In this embodiment, when the first encoding vector is obtained, it is input to the first attention layer in the encoder. The first attention layer adopts a multi-head attention mechanism and performs multi-head attention calculation on the input first encoding vector to obtain the second encoding vector, which can accurately represent the semantic information between the words in the text to be translated. Specifically, three different preset weight matrices are obtained, and each is multiplied by the first encoding vector to obtain three corresponding matrices: a first weight matrix Q, a second weight matrix K and a third weight matrix V. The dot product of the first weight matrix Q and the second weight matrix K is calculated and divided by a preset value, which gives the output a more stable gradient; the output is then normalized, and the normalized result is multiplied by the third weight matrix V to obtain the sub-encoding vector of the current attention head. The sub-encoding vector of each attention head is calculated, and all sub-encoding vectors are spliced to obtain the final second encoding vector.
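A sketch of this multi-head calculation, under the assumption that the preset value is the square root of the per-head dimension (as in standard scaled dot-product attention) and that eight attention heads are used:

```python
import torch
import torch.nn.functional as F

def multi_head_attention(x, w_q, w_k, w_v, num_heads=8):
    """x: first encoding vector (batch, seq_len, d_model); w_q, w_k, w_v:
    the three preset weight matrices, each of shape (d_model, d_model)."""
    batch, seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Multiply the input by each preset weight matrix to get Q, K and V,
    # then split them into per-head slices.
    q = (x @ w_q).view(batch, seq_len, num_heads, d_head).transpose(1, 2)
    k = (x @ w_k).view(batch, seq_len, num_heads, d_head).transpose(1, 2)
    v = (x @ w_v).view(batch, seq_len, num_heads, d_head).transpose(1, 2)
    # Dot product of Q and K, divided by a preset value (sqrt(d_head) here)
    # so that the output has a more stable gradient, then normalized.
    scores = (q @ k.transpose(-2, -1)) / (d_head ** 0.5)
    weights = F.softmax(scores, dim=-1)
    # Multiply the normalized result by V to get each head's sub-encoding
    # vector, then splice all heads into the second encoding vector.
    heads = weights @ v
    return heads.transpose(1, 2).reshape(batch, seq_len, d_model)
```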
Step S203, inputting the second encoding vector to a feedforward layer in the encoder, and calculating to obtain a target encoding vector.
In this embodiment, when the second encoding vector is obtained, it is input to the feedforward layer (feed-forward network) of the encoder. The feedforward layer comprises two fully connected layers: the first layer uses the ReLU activation function, and the second layer uses no activation function. The feedforward layer performs dimension conversion on the second encoding vector, and the dimension-converted second encoding vector is the target encoding vector of the text to be translated.
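A minimal sketch of such a feedforward layer (the hidden width is an assumed value):

```python
import torch.nn as nn
import torch.nn.functional as F

class FeedForwardLayer(nn.Module):
    """Two fully connected layers: ReLU after the first, no activation after
    the second. The hidden width of 2048 is an assumed value."""

    def __init__(self, d_model: int = 512, d_hidden: int = 2048):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        # Dimension conversion of the second encoding vector; the result is
        # the target encoding vector of the text to be translated.
        return self.fc2(F.relu(self.fc1(x)))
```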
Step S204, obtaining a candidate translation text corresponding to the text to be translated, and calculating the candidate translation text according to a target coding layer of the decoder to obtain a candidate coding vector of the candidate translation text.
In this embodiment, the encoder and the decoder are parallel network structures: while the source coding layer in the encoder processes the text to be translated, the candidate translation text corresponding to the text to be translated may be input to the decoder and calculated by the target coding layer of the decoder. Alternatively, after the target encoding vector of the text to be translated is obtained, the candidate translation text corresponding to the text to be translated is obtained and calculated by the target coding layer of the decoder to obtain the candidate encoding vector corresponding to the candidate translation text. Specifically, the target coding layer in the decoder has the same structure as the source coding layer in the encoder; both may adopt a pre-trained BERT (pre-trained language model) encoding structure. The candidate translation text is position-coded and fixed-coded by the target coding layer, and the vectors obtained from the position coding and the fixed coding are spliced to obtain the candidate encoding vector corresponding to the candidate translation text.
Step S205, inputting the candidate encoding vector and the target encoding vector to the second attention layer of the decoder, and calculating to obtain a semantic encoding vector.
In this embodiment, when the candidate encoding vector and the target encoding vector are obtained, the candidate encoding vector and the target encoding vector are input to a second attention layer of the decoder, and multi-head attention calculation is performed on the candidate encoding vector and the target encoding vector according to the second attention layer, so as to obtain the semantic encoding vector.
Step S206, calculating the semantic coding vector based on a linear network layer and an activation layer of the decoder to obtain the prediction probability of the candidate translation text, and selecting the candidate translation text with the maximum prediction probability as the target translation text of the text to be translated.
In this embodiment, when the semantic encoding vector is obtained, it is calculated by the linear network layer and the activation layer of the decoder to obtain the prediction probability of each candidate translation text. Specifically, the linear function of the linear network layer is applied to the semantic encoding vector to obtain a predicted linear parameter; the predicted linear parameter is then input to the activation layer and normalized by its activation function (such as the sigmoid function) to obtain the prediction probability of the candidate translation text. Further, to improve the accuracy and efficiency of model prediction, the semantic encoding vector may first be input to a feedforward layer in the decoder, which reduces its dimension; the dimension-reduced semantic encoding vector is then passed to the linear network layer and the activation layer to obtain the prediction probability. The prediction probability is the probability that the text to be translated corresponds to a given candidate translation text; for example, if the text to be translated is "Apple" and the candidate translation texts include the target-language words for "apple" and "orange", the prediction probabilities are the probability that "Apple" translates to each of these candidates. When the prediction probabilities corresponding to the candidate translation texts are obtained, the candidate translation text with the maximum prediction probability is selected as the target translation text of the text to be translated.
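A sketch of the linear network layer and activation layer; the sigmoid activation follows the text above, although a softmax over the candidates would be an equally plausible reading:

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """Linear network layer plus activation layer: maps the semantic encoding
    vector to one prediction probability per candidate translation text."""

    def __init__(self, d_model: int, num_candidates: int):
        super().__init__()
        self.linear = nn.Linear(d_model, num_candidates)

    def forward(self, semantic_vec: torch.Tensor):
        # Predicted linear parameters, normalized by the activation function.
        probs = torch.sigmoid(self.linear(semantic_vec))
        # Candidate translation text with the maximum prediction probability.
        return probs, probs.argmax(dim=-1)
```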
It is emphasized that the target translation text may also be stored in a blockchain node in order to further ensure privacy and security of the target translation text.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a string of data blocks generated in association using cryptographic methods, each of which contains information from a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
This embodiment enables multilingual texts to be translated on a single model, which saves model training time and resources and improves the efficiency of multilingual text translation while ensuring its accuracy.
In some optional implementations of this embodiment, before the step of obtaining the text to be translated and the trained preset translation model, the method further includes:
Acquiring source corpus data and target translation data corresponding to the source corpus data, and determining training data according to the source corpus data and the target translation data, wherein the source corpus data and the target translation data comprise data of different languages;
Constructing a basic translation model, inputting the training data into the basic translation model according to the sequence of training batches, and calculating to obtain the current loss value of the basic translation model of the current training batch;
And carrying out iterative training on the basic translation model according to the current loss value to obtain an iteratively trained basic translation model, and determining the iteratively trained basic translation model as the preset translation model when the loss function calculated by the iteratively trained basic translation model converges.
In this embodiment, before a trained preset translation model is obtained, the basic translation model needs to be trained to obtain the preset translation model. Collecting source corpus data and target translation data corresponding to the source corpus data, wherein the source corpus data can be multi-language data such as Chinese or English, the target translation data is translation data corresponding to each source corpus data, and the target translation data also corresponds to multi-language data such as Chinese or English. Training data is selected from the source corpus data and the target translation data, wherein the training data comprises a plurality of groups of source corpus data and target translation data, and each source corpus data and the corresponding target translation data are one group of data. And constructing a basic translation model, wherein the basic translation model and a preset translation model have the same structure and different model parameters. And inputting the training data into the basic translation model according to the training batch, and calculating the training data according to the basic translation model to obtain the current loss value of the basic translation model of the current training batch.
Specifically, encoding source corpus data in training data according to an encoding layer of an encoder in the basic translation model to obtain a source corpus encoding vector; then, multi-head attention calculation is carried out on the source corpus coding vector through a first self-attention layer of an encoder in the basic translation model, so that semantic coding of the source corpus data is obtained; and inputting the semantic code to a feedforward layer of an encoder in the basic translation model, and performing dimension conversion on the semantic code according to the feedforward layer to obtain a target code vector. Then, inputting the target coding vector into a decoder in the basic translation model, and calculating the target coding vector and the translation coding vector according to a second self-attention layer of the decoder to obtain a translation semantic vector; the translation coding vector is a coding vector obtained by multi-head attention calculation after the decoder carries out coding calculation on target translation data. And finally, calculating the translation semantic vector through a linear conversion layer and a full connection layer of the decoder, outputting to obtain the prediction probability of the current source corpus data and the target translation data, and selecting the target translation data corresponding to the maximum prediction probability as the prediction translation data corresponding to the source corpus data. And acquiring real translation data corresponding to the source corpus data, and inputting the real translation data and the predicted translation data into a preset loss function of the basic translation model, namely calculating to obtain a current loss value of the basic translation model of the current training batch.
Performing iterative training on the basic translation model according to the current loss value, namely adjusting parameters of the basic translation model according to the current loss value to obtain an adjusted basic translation model; when the training data of the second training batch are input into the adjusted basic translation model, calculating the training data of the second training batch according to the adjusted basic translation model to obtain a loss value of the second training batch; and continuously adjusting the parameters of the adjusted basic translation model according to the loss value of the second training batch. According to the method, the parameters of the basic translation model are iteratively adjusted until the loss value calculated by a certain training batch is minimum, namely, the loss function calculated by the basic translation model of a certain training batch is converged, and the basic translation model is determined to be trained, so that the preset translation model is obtained.
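A condensed sketch of this iterative training procedure; the optimizer, learning rate, convergence tolerance and the model and loss call signatures are all assumptions for illustration:

```python
import torch

def train_basic_translation_model(model, batches, loss_fn, lr=1e-4, tol=1e-4):
    """Compute the current loss value for each training batch and iteratively
    adjust the model parameters until the loss converges."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    prev_loss = float("inf")
    for source, target in batches:  # training data in training-batch order
        prediction = model(source, target)
        loss = loss_fn(prediction, target)  # current loss value of this batch
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if abs(prev_loss - loss.item()) < tol:  # loss function has converged
            break
        prev_loss = loss.item()
    return model  # the preset translation model once training has converged
```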
In this embodiment, a single basic translation model is iteratively trained on the source corpus data and the target translation data, which improves the training efficiency of the model, and the preset translation model obtained by training can translate the text to be translated accurately and efficiently.
In some optional implementations of this embodiment, the step of determining training data according to the source corpus data and the target translation data includes:
acquiring language categories respectively corresponding to the source corpus data and the target translation data, and performing tag identification on the source corpus data according to the language categories to obtain tag corpus data;
and word segmentation is carried out on the label corpus data and the target translation data, so that training data of the basic translation model are obtained.
In this embodiment, when the source corpus data and the target translation data are obtained, the language categories corresponding to the source corpus data and the target translation data are obtained respectively, and tag identification is performed on the source corpus data according to the language categories to obtain the tag corpus data. For example, if the language category of the source corpus data is Chinese and the language category of the target translation data is English, the source corpus data is tagged with <zh-en>, where zh is the language category of the source corpus data (Chinese) and en is the language category of the corresponding target translation data (English). Word segmentation is then performed on the tag corpus data and the target translation data respectively to obtain a first phrase corresponding to the tag corpus data and a second phrase corresponding to the target translation data, and the first phrase and the second phrase are combined to obtain the training data of the basic translation model.
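Tag identification and word segmentation could be sketched as follows; the whitespace tokenizer is only a stand-in for a real segmenter (Chinese text in particular would need one):

```python
def build_training_pair(source_text, target_text, src_lang, tgt_lang):
    """Prepend the <src-tgt> tag (e.g. <zh-en>) to the source corpus data,
    then segment both sides into words."""
    tagged_source = f"<{src_lang}-{tgt_lang}> {source_text}"  # tag corpus data
    first_phrase = tagged_source.split()   # segmented tag corpus data
    second_phrase = target_text.split()    # segmented target translation data
    return first_phrase, second_phrase

# Example: build_training_pair("今天 天气 很好", "The weather is nice today", "zh", "en")
```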
In this embodiment, the source corpus data is tagged with its language category and the training data is determined from the tag corpus data and the target translation data, so that the trained basic translation model can perform multilingual translation on text, improving the usage efficiency of the model and reducing the resource and development cost of multilingual translation.
In some optional implementations of this embodiment, the step of inputting the training data into the base translation model according to the training batch order, and calculating the current loss value of the base translation model of the current training batch includes:
calculating language loss values of different languages of the training data in the current training batch according to the basic translation model;
And acquiring a historical loss value of a previous training batch of the current training batch, and calculating to obtain a current loss value of the current training batch according to the historical loss value and the language loss value.
In this embodiment, since the training data of each training batch includes source corpus data and target translation data of different languages, when calculating the loss value of the basic translation model of the current training batch, it is necessary to calculate the language loss value corresponding to the different languages in the training data input by the current training batch, and calculate the current loss value of the basic training model according to the language loss value.
Specifically, the language loss value is the loss value corresponding to each language within the same training batch, where the same language in the same training batch corresponds to the same loss value. The language loss values corresponding to the different languages in the same training batch are calculated and summed to obtain a total loss value, and the ratio of the total loss value to the number of languages gives the language loss average. The historical loss value of the training batch preceding the current training batch is then obtained, and the current loss value of the basic translation model for the current training batch is calculated from the historical loss value and the language loss average.
In this embodiment, the current loss value of the basic translation model is calculated from the language loss values and the historical loss value, so that the basic translation model can be trained accurately through the current loss value, saving model training time and resources and improving both the training efficiency and the accuracy of multilingual translation.
In some optional implementations of this embodiment, the step of calculating the current loss value of the current training batch according to the historical loss value and the language loss value includes:
acquiring the number of different languages in the current training batch and the weight coefficient of the basic translation model in the current training batch;
And calculating a loss average value of the basic translation model according to the number, the weight coefficient and the language loss value, and calculating the current loss value based on the loss average value and the historical loss value.
In this embodiment, when calculating the current loss value of the basic translation model, the current loss value may be calculated from the loss average and the historical loss value of the basic translation model in the current training batch. Specifically, the number of different languages in the current training batch and the weight coefficient corresponding to the basic translation model in the current training batch are obtained, where the weight coefficient is a preset value that can be obtained through model training. The loss average is calculated from the number of different languages, the weight coefficient and the language loss values, and the loss average and the historical loss value are summed to obtain the current loss value of the basic translation model for the current training batch. The current loss value is calculated as:

Loss_j = Loss_{j-1} + α_j · (1/m) · Σ_{i=1}^{m} Loss_{ji}

where Loss_{j-1} represents the historical loss value, α_j represents the weight coefficient, m represents the number of different languages, and Loss_{ji} represents the language loss value of the i-th language in the current training batch.
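Transcribing the formula into a small helper function (the values in the usage example are made up):

```python
def current_loss(historical_loss, language_losses, alpha):
    """Loss_j = Loss_{j-1} + alpha_j * (1/m) * sum_i Loss_ji."""
    m = len(language_losses)  # number of different languages in the batch
    loss_average = alpha * sum(language_losses) / m  # weighted loss average
    return historical_loss + loss_average  # current loss value

# Example (made-up values): current_loss(0.93, [0.41, 0.37, 0.52], alpha=0.1)
```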
In this embodiment, by calculating the loss values across languages, the trained translation model can perform multilingual translation on text, further improving the accuracy of multilingual translation.
In some optional implementations of this embodiment, the step of calculating the candidate translated text according to the target coding layer of the decoder to obtain the candidate coding vector of the candidate translated text includes:
encoding the candidate translation text according to a target encoding layer of the decoder to obtain a third encoding vector;
and performing multi-head attention calculation on the third coding vector to obtain the candidate coding vector.
In this embodiment, when the candidate translation text is obtained, the vector produced directly by the target encoding layer of the decoder could be used as the candidate encoding vector; here, however, that directly encoded vector is treated as a third encoding vector, multi-head attention calculation is performed on it, and the resulting vector is used as the candidate encoding vector.
Specifically, the candidate translation text is encoded by the target encoding layer of the decoder to obtain the third encoding vector, and multi-head attention calculation is then performed on the third encoding vector to obtain the candidate encoding vector. The target coding layer has the same structure and parameters as the source coding layer of the encoder; for example, both the source coding layer of the encoder and the target coding layer of the decoder in the basic translation model may adopt a pre-trained BERT encoding structure. The candidate translation text is position-coded and fixed-coded by the target coding layer, and the vectors obtained from the position coding and the fixed coding are spliced to obtain the third encoding vector; multi-head attention calculation is then performed on the third encoding vector to obtain the candidate encoding vector. The multi-head attention calculation for the third encoding vector is the same as that applied to the first encoding vector, and is not repeated here.
In this embodiment, multi-head attention calculation is performed on the third encoding vector corresponding to the candidate translation text, so that the calculated candidate encoding vector expresses the semantics of the text more accurately, further improving the accuracy of text translation.
In some optional implementations of this embodiment, the step of inputting the candidate encoding vector and the target encoding vector to the second attention layer of the decoder, and calculating the semantic encoding vector includes:
acquiring a preset first weight parameter, a preset second weight parameter and a preset third weight parameter, calculating according to the target coding vector and the first weight parameter to obtain a first information matrix, calculating according to the target coding vector and the second weight parameter to obtain a second information matrix, and calculating according to the candidate coding vector and the third weight parameter to obtain a third information matrix;
And calculating the dot product of the third information matrix and the first information matrix to obtain target similarity, and calculating the semantic coding vector according to the target similarity and the second information matrix.
In this embodiment, when the candidate encoding vector and the target encoding vector are obtained, attention calculation is performed on the target encoding vector and the candidate encoding vector according to the second attention layer of the decoder, so as to obtain the semantic encoding vector. Specifically, a first weight parameter, a second weight parameter and a third weight parameter which are preset are obtained, a first information matrix k is obtained through calculation according to a target coding vector and the first weight parameter, a second information matrix v is obtained through calculation according to the target coding vector and the second weight parameter, and a third information matrix q is obtained through calculation according to a candidate coding vector and the third weight parameter; then, calculating the dot product of the third information matrix and the first information matrix to obtain target similarity, and dividing the target similarity by a preset value to obtain an output value; and normalizing the output value, multiplying the normalized result by a second information matrix, and finally obtaining the attention coding vector of the current attention head. And calculating the attention code vector of each attention head, and splicing all the attention code vectors obtained through calculation to obtain the final semantic code vector.
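A single-head sketch of this step; w1, w2 and w3 stand for the preset weight parameters, and using a square-root scale factor as the preset value is an assumption:

```python
import torch.nn.functional as F

def second_attention_layer(target_enc, candidate_enc, w1, w2, w3, scale):
    """k and v are derived from the encoder's target encoding vector, q from
    the candidate encoding vector produced by the decoder's target coding layer."""
    k = target_enc @ w1      # first information matrix
    v = target_enc @ w2      # second information matrix
    q = candidate_enc @ w3   # third information matrix
    similarity = (q @ k.transpose(-2, -1)) / scale  # target similarity / preset value
    weights = F.softmax(similarity, dim=-1)         # normalized output value
    return weights @ v  # semantic encoding vector of one attention head
```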
In this embodiment, the candidate encoding vector and the target encoding vector are calculated by the second attention layer, so that the resulting semantic encoding vector contains more semantic information of the text, further improving the accuracy of multilingual translation.
Those skilled in the art will appreciate that all or part of the processes of the methods described above may be implemented by computer readable instructions stored in a computer readable storage medium, which, when executed, may include the processes of the above method embodiments. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk or a read-only memory (ROM), or a random access memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a multilingual text translation apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 3, the multilingual text translation apparatus 300 according to the present embodiment includes: an encoding module 301, a first calculation module 302, a second calculation module 303, a third calculation module 304, a fourth calculation module 305, and a prediction module 306. Wherein:
The encoding module 301 is configured to obtain a text to be translated and a trained preset translation model, input the text to be translated to the preset translation model, encode the text to be translated according to a source encoding layer of an encoder in the preset translation model to obtain a first encoding vector, where the preset translation model includes the encoder and a decoder, the encoder includes a source encoding layer, a first attention layer and a feedforward layer, and the decoder includes a target encoding layer, a second attention layer, a linear network layer and an activation layer;
In this embodiment, the text to be translated is the received text that needs to be translated, and the preset translation model is a trained multilingual translation model. When the text to be translated is received, multilingual text translation can be performed on it according to the preset translation model. The preset translation model comprises an encoder and a decoder, wherein the encoder comprises a source coding layer, a first attention layer and a feedforward layer, and the decoder comprises a target coding layer, a second attention layer, a linear network layer and an activation layer. The text to be translated is input into the preset translation model and encoded by the source coding layer of the encoder to obtain the first encoding vector. The encoding comprises position coding and fixed coding: each word in the text to be translated is position-coded and fixed-coded respectively to obtain a corresponding position coding vector and fixed coding vector, which are combined to obtain the word vector corresponding to each word; these word vectors form the first encoding vector.
A first calculation module 302, configured to input the first encoded vector to a first attention layer in the encoder, and calculate a second encoded vector;
In this embodiment, when the first encoding vector is obtained, it is input to the first attention layer in the encoder. The first attention layer adopts a multi-head attention mechanism and performs multi-head attention calculation on the input first encoding vector to obtain the second encoding vector, which can accurately represent the semantic information between the words in the text to be translated. Specifically, three different preset weight matrices are obtained, and each is multiplied by the first encoding vector to obtain three corresponding matrices: a first weight matrix Q, a second weight matrix K and a third weight matrix V. The dot product of the first weight matrix Q and the second weight matrix K is calculated and divided by a preset value, which gives the output a more stable gradient; the output is then normalized, and the normalized result is multiplied by the third weight matrix V to obtain the sub-encoding vector of the current attention head. The sub-encoding vector of each attention head is calculated, and all sub-encoding vectors are spliced to obtain the final second encoding vector.
A second calculation module 303, configured to input the second encoding vector to a feedforward layer in the encoder, and calculate a target encoding vector;
In this embodiment, when the second coding vector is obtained, it is input to the feedforward layer (feed-forward network) of the encoder. The feedforward layer comprises two fully-connected layers: the activation function of the first layer is ReLU, and the second layer uses no activation function. Dimension conversion is performed on the second coding vector according to the feedforward layer to obtain a dimension-converted second coding vector, which is the target coding vector of the text to be translated.
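A sketch of this feedforward layer, assuming the customary hidden width of four times the model dimension (the text does not specify the widths):

```python
import torch.nn as nn

class FeedForwardLayer(nn.Module):
    """Sketch: two fully-connected layers, ReLU after the first only."""

    def __init__(self, d_model: int = 512, d_hidden: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),                      # activation of the first layer
            nn.Linear(d_hidden, d_model),   # second layer: no activation
        )

    def forward(self, x):
        # Dimension conversion yielding the target coding vector.
        return self.net(x)
```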
A third calculation module 304, configured to obtain a candidate translation text corresponding to the text to be translated, and calculate the candidate translation text according to a target coding layer of the decoder, so as to obtain a candidate coding vector of the candidate translation text;
In some alternative implementations of the present embodiment, the third computing module 304 includes:
The encoding unit is used for encoding the candidate translation text according to a target encoding layer of the decoder to obtain a third encoding vector;
And the first calculation unit is used for carrying out multi-head attention calculation on the third coding vector to obtain the candidate coding vector.
In this embodiment, the encoder and the decoder are parallel network structures: while the source coding layer in the encoder processes the text to be translated, the candidate translation text corresponding to the text to be translated may be input to the decoder and calculated based on the target coding layer of the decoder. Alternatively, after the target coding vector of the text to be translated is obtained, the candidate translation text corresponding to the text to be translated is obtained and calculated according to the target coding layer of the decoder, so as to obtain the candidate coding vector corresponding to the candidate translation text. Specifically, the target coding layer in the decoder has the same structure as the source coding layer in the encoder; both may adopt a pre-trained BERT (pre-trained language model) encoding structure. Position encoding and fixed encoding are performed on the candidate translation text according to the target coding layer, and the vectors obtained from the position encoding and fixed encoding are spliced to obtain the candidate coding vector corresponding to the candidate translation text.
A fourth calculation module 305, configured to input the candidate encoding vector and the target encoding vector to the second attention layer of the decoder, and calculate a semantic encoding vector;
In some alternative implementations of the present embodiment, the fourth computing module 305 includes:
The second calculation unit is used for acquiring preset first weight parameters, second weight parameters and third weight parameters, calculating according to the target coding vector and the first weight parameters to obtain a first information matrix, calculating according to the target coding vector and the second weight parameters to obtain a second information matrix, and calculating according to the candidate coding vector and the third weight parameters to obtain a third information matrix;
And the third calculation unit is used for calculating the dot product of the third information matrix and the first information matrix to obtain target similarity, and calculating the semantic coding vector according to the target similarity and the second information matrix.
In this embodiment, when the candidate encoding vector and the target encoding vector are obtained, the candidate encoding vector and the target encoding vector are input to a second attention layer of the decoder, and multi-head attention calculation is performed on the candidate encoding vector and the target encoding vector according to the second attention layer, so as to obtain the semantic encoding vector.
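Following the second and third calculation units above, a minimal sketch of the second attention layer is given below: the first and second information matrices come from the target coding vector and the third from the candidate coding vector, with the dot-product similarity weighting the second matrix. The softmax normalization and sqrt scaling are conventional assumptions, and the weight parameters are assumed to be (d_model, d_model) tensors:

```python
import torch
import torch.nn.functional as F

def second_attention_layer(target_enc: torch.Tensor,
                           candidate_enc: torch.Tensor,
                           w1: torch.Tensor,
                           w2: torch.Tensor,
                           w3: torch.Tensor) -> torch.Tensor:
    """Sketch of the decoder's second attention layer (single head shown)."""
    first_info = target_enc @ w1      # first information matrix (from target vector)
    second_info = target_enc @ w2     # second information matrix (from target vector)
    third_info = candidate_enc @ w3   # third information matrix (from candidate vector)
    # Dot product of the third and first matrices gives the target similarity.
    similarity = F.softmax(
        third_info @ first_info.transpose(-2, -1) / first_info.size(-1) ** 0.5,
        dim=-1,
    )
    # Semantic coding vector from the similarity and the second matrix.
    return similarity @ second_info
```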
And the prediction module 306 is configured to calculate the semantic coding vector based on a linear network layer and an activation layer of the decoder, obtain a prediction probability of the candidate translation text, and select the candidate translation text with the maximum prediction probability as the target translation text of the text to be translated.
In this embodiment, when the semantic coding vector is obtained, it is calculated based on the linear network layer and the activation layer of the decoder to obtain the prediction probability of the candidate translation text. Specifically, a linear function of the linear network layer is obtained, and the semantic coding vector is calculated according to the linear function to obtain predicted linear parameters; the predicted linear parameters are then input to the activation layer and normalized based on an activation function of the activation layer (such as a sigmoid function) to obtain the prediction probability of the candidate translation text. Further, when the semantic coding vector is obtained, in order to improve the accuracy and efficiency of model prediction, the semantic coding vector can be input to a feedforward layer in the decoder and reduced in dimension according to that feedforward layer to obtain a dimension-reduced semantic coding vector; the dimension-reduced semantic coding vector is then input to the linear network layer and the activation layer to obtain the prediction probability of the candidate translation text. The prediction probability is the probability that the text to be translated corresponds to a given candidate translation text: for example, if the text to be translated is the word "apple" and the candidate translation texts include the target-language words for "apple" and "orange", the prediction probabilities are the probability of translating it as "apple" and the probability of translating it as "orange". When the prediction probabilities corresponding to the candidate translation texts are obtained, the candidate translation text with the maximum prediction probability is selected as the target translation text of the text to be translated.
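A sketch of this prediction head; the text names a sigmoid-style activation, so that is used here, though a softmax over candidates would be the other common choice — the class name and output size are assumptions:

```python
import torch
import torch.nn as nn

class PredictionHead(nn.Module):
    """Sketch: linear network layer followed by the activation layer."""

    def __init__(self, d_model: int = 512, num_candidates: int = 32000):
        super().__init__()
        self.linear = nn.Linear(d_model, num_candidates)

    def forward(self, semantic_vec: torch.Tensor) -> torch.Tensor:
        predicted_linear = self.linear(semantic_vec)  # predicted linear parameters
        return torch.sigmoid(predicted_linear)        # normalized prediction probabilities

# Usage sketch: pick the candidate with the maximum prediction probability.
# probs = PredictionHead()(semantic_vec)
# target_index = probs.argmax(dim=-1)
```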
It is emphasized that the target translation text may also be stored in a blockchain node in order to further ensure privacy and security of the target translation text.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association by cryptographic methods, each block containing information on a batch of network transactions, used to verify the validity (anti-counterfeiting) of its information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
In some optional implementations of this embodiment, the multilingual text translation apparatus 300 further includes:
The acquisition module is used for acquiring source corpus data and target translation data corresponding to the source corpus data, and determining training data according to the source corpus data and the target translation data, wherein the source corpus data and the target translation data comprise data in different languages;
The building module is used for building a basic translation model, inputting the training data into the basic translation model according to the sequence of the training batches, and calculating to obtain the current loss value of the basic translation model of the current training batch;
The training module is used for carrying out iterative training on the basic translation model according to the current loss value to obtain an iteratively trained basic translation model, and determining the iteratively trained basic translation model as the preset translation model when the loss function calculated by the iteratively trained basic translation model converges.
In some optional implementations of this embodiment, the acquisition module includes:
The obtaining unit is used for obtaining language categories corresponding to the source corpus data and the target translation data respectively, and carrying out tag identification on the source corpus data according to the language categories to obtain tag corpus data;
the word segmentation unit is used for segmenting the label corpus data and the target translation data to obtain training data of the basic translation model.
In some optional implementations of this embodiment, the building block includes:
the fourth calculation unit is used for calculating language loss values of different languages of the training data in the current training batch according to the basic translation model;
And a fifth calculation unit, configured to obtain a historical loss value of a previous training batch of the current training batch, and calculate a current loss value of the current training batch according to the historical loss value and the language loss value.
In some optional implementations of the present embodiment, the fifth computing unit includes:
The obtaining subunit is used for obtaining the number of different languages in the current training batch and the weight coefficient of the basic translation model in the current training batch;
And the calculating subunit is used for calculating the loss average value of the basic translation model according to the number, the weight coefficient and the language loss value, and calculating the current loss value based on the loss average value and the historical loss value, as sketched after this list.
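For illustration, a minimal sketch of this batch-loss calculation follows. The exact blending rule is not specified above, so the exponential-moving-average form, the function name and its arguments are all assumptions:

```python
def current_batch_loss(language_losses, weight_coefficient, historical_loss):
    """Sketch: blend the average per-language loss of the current batch
    with the previous batch's loss (one plausible reading of the text)."""
    num_languages = len(language_losses)                 # number of different languages
    loss_average = sum(language_losses) / num_languages  # loss average value
    # The weight coefficient balances the current average against the history.
    return (weight_coefficient * loss_average
            + (1.0 - weight_coefficient) * historical_loss)
```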
In this embodiment, before the trained preset translation model is obtained, the basic translation model needs to be trained to obtain the preset translation model. Source corpus data and the target translation data corresponding to the source corpus data are collected, where the source corpus data can be multilingual data such as Chinese or English, and the target translation data is the translation corresponding to each piece of source corpus data, likewise in multiple languages such as Chinese or English. Training data is selected from the source corpus data and the target translation data; the training data comprises multiple groups of source corpus data and target translation data, with each piece of source corpus data and its corresponding target translation data forming one group. A basic translation model is constructed, which has the same structure as the preset translation model but different model parameters. The training data is input into the basic translation model by training batch and calculated according to the basic translation model to obtain the current loss value of the basic translation model for the current training batch.
Specifically, the source corpus data in the training data is encoded according to the coding layer of the encoder in the basic translation model to obtain a source corpus coding vector; multi-head attention calculation is then performed on the source corpus coding vector through the first attention layer of the encoder in the basic translation model to obtain the semantic coding of the source corpus data; the semantic coding is input to the feedforward layer of the encoder in the basic translation model and dimension-converted according to the feedforward layer to obtain a target coding vector. The target coding vector is then input into the decoder of the basic translation model, and the target coding vector and the translation coding vector are calculated according to the second attention layer of the decoder to obtain a translation semantic vector, where the translation coding vector is the coding vector obtained by multi-head attention calculation after the decoder encodes the target translation data. Finally, the translation semantic vector is calculated through the linear conversion layer and the fully-connected layer of the decoder, the prediction probability of the current source corpus data with respect to the target translation data is output, and the target translation data corresponding to the maximum prediction probability is selected as the predicted translation data for the source corpus data. The real translation data corresponding to the source corpus data is obtained, and the real translation data and the predicted translation data are input into the preset loss function of the basic translation model to calculate the current loss value of the basic translation model for the current training batch.
Iterative training is performed on the basic translation model according to the current loss value; that is, the parameters of the basic translation model are adjusted according to the current loss value to obtain an adjusted basic translation model. When the training data of the second training batch is input into the adjusted basic translation model, it is calculated according to the adjusted model to obtain the loss value of the second training batch, and the parameters of the adjusted basic translation model continue to be adjusted according to that loss value. In this way the parameters of the basic translation model are iteratively adjusted until the loss value calculated for some training batch reaches a minimum, that is, until the loss function calculated by the basic translation model converges; the basic translation model is then determined to be trained, yielding the preset translation model.
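A minimal sketch of this iterative training loop, assuming an Adam optimizer and a simple loss-difference convergence test (neither is specified in the text); `model`, `batches` and `loss_fn` are caller-supplied placeholders:

```python
import torch

def train_base_model(model, batches, loss_fn, lr=1e-4, max_epochs=10, tol=1e-4):
    """Sketch: adjust parameters batch by batch until the loss converges."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    previous_loss = float("inf")
    for epoch in range(max_epochs):
        for source, target in batches:        # training data in batch order
            predicted = model(source, target)
            loss = loss_fn(predicted, target)  # current loss value of the batch
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                   # adjust the model parameters
        # Convergence test: loss no longer changes meaningfully between epochs.
        if abs(previous_loss - loss.item()) < tol:
            break
        previous_loss = loss.item()
    return model  # the trained preset translation model
```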
The multilingual text translation device provided by this embodiment enables multilingual texts to be translated by a single model, saving model training time and resources, and improves the efficiency of multilingual text translation while ensuring its accuracy.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 4, fig. 4 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 6 comprises a memory 61, a processor 62 and a network interface 63, which are communicatively connected to each other via a system bus. It is noted that only a computer device 6 having components 61-63 is shown in the figure, but it should be understood that not all of the illustrated components need be implemented, and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 61 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit and an external storage device of the computer device 6. In this embodiment, the memory 61 is typically used to store the operating system installed on the computer device 6 and various types of application software, such as the computer readable instructions of the multilingual text translation method. Further, the memory 61 may be used to temporarily store various types of data that have been output or are to be output.
The processor 62 may be, in some embodiments, a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute the computer readable instructions stored in the memory 61 or to process data, for example to run the computer readable instructions of the multilingual text translation method.
The network interface 63 may comprise a wireless network interface or a wired network interface, which network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.
The computer equipment provided by this embodiment enables multilingual texts to be translated by a single model, saving model training time and resources, and improves the efficiency of multilingual text translation while ensuring its accuracy.
The present application also provides another embodiment, namely, a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the multilingual text translation method as described above.
The computer readable storage medium provided by this embodiment enables multilingual texts to be translated by a single model, saving model training time and resources, and improves the efficiency of multilingual text translation while ensuring its accuracy.
From the above description of the embodiments, it will be clear to those skilled in the art that the method of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware, though in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
It is apparent that the above-described embodiments are only some embodiments of the present application, not all of them; the preferred embodiments of the present application are shown in the drawings, which do not limit the scope of the claims. This application may be embodied in many different forms; these embodiments are provided so that the disclosure will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described above, or equivalents may be substituted for some of their elements. All equivalent structures made according to the content of the specification and the drawings of the application, whether applied directly or indirectly to other related technical fields, likewise fall within the scope of the application.

Claims (7)

1. A method for multilingual text translation, comprising the steps of:
The method comprises the steps of obtaining a text to be translated and a trained preset translation model, inputting the text to be translated into the preset translation model, and encoding the text to be translated according to a source coding layer of an encoder in the preset translation model to obtain a first coding vector, wherein the preset translation model comprises the encoder and a decoder, the encoder comprises the source coding layer, a first attention layer and a feedforward layer, and the decoder comprises a target coding layer, a second attention layer, a linear network layer and an activation layer, and before the step of obtaining the text to be translated and the trained preset translation model, the method further comprises the steps of:
Acquiring source corpus data and target translation data corresponding to the source corpus data, and determining training data according to the source corpus data and the target translation data, wherein the source corpus data and the target translation data comprise data of different languages;
Constructing a basic translation model, inputting the training data into the basic translation model according to the sequence of training batches, and calculating to obtain the current loss value of the basic translation model of the current training batch;
performing iterative training on the basic translation model according to the current loss value to obtain an iteratively trained basic translation model, and determining the iteratively trained basic translation model as the preset translation model when a loss function calculated by the iteratively trained basic translation model converges;
inputting the first coding vector to a first attention layer in the encoder, and calculating to obtain a second coding vector;
Inputting the second coding vector to a feedforward layer in the encoder, and calculating to obtain a target coding vector;
The method comprises the steps of obtaining a candidate translation text corresponding to the text to be translated, calculating the candidate translation text according to a target coding layer of the decoder to obtain a candidate coding vector of the candidate translation text, wherein the step of calculating the candidate translation text according to the target coding layer of the decoder to obtain the candidate coding vector of the candidate translation text comprises the following steps:
encoding the candidate translation text according to a target encoding layer of the decoder to obtain a third encoding vector;
Performing multi-head attention calculation on the third coding vector to obtain the candidate coding vector;
Inputting the candidate coding vector and the target coding vector to a second attention layer of the decoder, and calculating to obtain a semantic coding vector, wherein the step of inputting the candidate coding vector and the target coding vector to the second attention layer of the decoder, and calculating to obtain the semantic coding vector comprises the following steps:
acquiring a preset first weight parameter, a preset second weight parameter and a preset third weight parameter, calculating according to the target coding vector and the first weight parameter to obtain a first information matrix, calculating according to the target coding vector and the second weight parameter to obtain a second information matrix, and calculating according to the candidate coding vector and the third weight parameter to obtain a third information matrix;
Calculating the dot product of the third information matrix and the first information matrix to obtain target similarity, and calculating the semantic coding vector according to the target similarity and the second information matrix;
And calculating the semantic coding vector based on a linear network layer and an activation layer of the decoder to obtain the prediction probability of the candidate translation text, and selecting the candidate translation text with the maximum prediction probability as the target translation text of the text to be translated.
2. The multilingual text translation method according to claim 1, wherein the step of determining training data from the source corpus data and the target translation data comprises:
acquiring language categories respectively corresponding to the source corpus data and the target translation data, and performing tag identification on the source corpus data according to the language categories to obtain tag corpus data;
and word segmentation is carried out on the label corpus data and the target translation data, so that training data of the basic translation model are obtained.
3. The multilingual text translation method according to claim 1, wherein the step of inputting the training data to the base translation model in the order of training batches, and calculating a current loss value of the base translation model of a current training batch comprises:
calculating language loss values of different languages of the training data in the current training batch according to the basic translation model;
And acquiring a historical loss value of a previous training batch of the current training batch, and calculating to obtain a current loss value of the current training batch according to the historical loss value and the language loss value.
4. The multilingual text translation method of claim 3 wherein the step of calculating a current loss value for the current training batch based on the historical loss value and the language loss value comprises:
acquiring the number of different languages in the current training batch and the weight coefficient of the basic translation model in the current training batch;
And calculating a loss average value of the basic translation model according to the number, the weight coefficient and the language loss value, and calculating the current loss value based on the loss average value and the historical loss value.
5. A multilingual text translation apparatus for implementing the steps of the multilingual text translation method as recited in any one of claims 1 to 4, the multilingual text translation apparatus comprising:
the coding module is used for acquiring a text to be translated and a trained preset translation model, inputting the text to be translated into the preset translation model, and coding the text to be translated according to a source coding layer of a coder in the preset translation model to obtain a first coding vector, wherein the preset translation model comprises the coder and a decoder, the coder comprises a source coding layer, a first attention layer and a feedforward layer, and the decoder comprises a target coding layer, a second attention layer, a linear network layer and an activation layer;
the first calculation module is used for inputting the first coding vector to a first attention layer in the encoder, and calculating to obtain a second coding vector;
the second calculation module is used for inputting the second coding vector to a feedforward layer in the encoder and calculating to obtain a target coding vector;
The third calculation module is used for obtaining a candidate translation text corresponding to the text to be translated, and calculating the candidate translation text according to a target coding layer of the decoder to obtain a candidate coding vector of the candidate translation text;
A fourth calculation module, configured to input the candidate encoding vector and the target encoding vector to a second attention layer of the decoder, and calculate a semantic encoding vector;
And the prediction module is used for calculating the semantic coding vector based on a linear network layer and an activation layer of the decoder to obtain the prediction probability of the candidate translation text, and selecting the candidate translation text with the maximum prediction probability as the target translation text of the text to be translated.
6. A computer device comprising a memory having stored therein computer readable instructions which when executed by a processor implement the steps of the multilingual text translation method of any one of claims 1 to 4.
7. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the multilingual text translation method of any one of claims 1 to 4.
CN202111255219.2A 2021-10-27 2021-10-27 Multilingual text translation method, multilingual text translation device, computer equipment and storage medium Active CN113947095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111255219.2A CN113947095B (en) 2021-10-27 2021-10-27 Multilingual text translation method, multilingual text translation device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111255219.2A CN113947095B (en) 2021-10-27 2021-10-27 Multilingual text translation method, multilingual text translation device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113947095A CN113947095A (en) 2022-01-18
CN113947095B true CN113947095B (en) 2024-07-02

Family

ID=79332757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111255219.2A Active CN113947095B (en) 2021-10-27 2021-10-27 Multilingual text translation method, multilingual text translation device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113947095B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114547285B (en) * 2022-03-03 2023-03-24 创新奇智(浙江)科技有限公司 Method and device for inferring meaning of table data, computer device and storage medium
CN114282555A (en) * 2022-03-04 2022-04-05 北京金山数字娱乐科技有限公司 Translation model training method and device, and translation method and device
CN117034967B (en) * 2023-10-08 2024-02-13 果不其然无障碍科技(苏州)有限公司 Translation method and system based on machine learning model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598224A (en) * 2019-09-23 2019-12-20 腾讯科技(深圳)有限公司 Translation model training method, text processing device and storage medium
CN110795912A (en) * 2019-09-19 2020-02-14 平安科技(深圳)有限公司 Method, device and equipment for encoding text based on neural network and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200226327A1 (en) * 2019-01-11 2020-07-16 Applications Technology (Apptek), Llc System and method for direct speech translation system
CN112699690B (en) * 2020-12-29 2024-02-13 科大讯飞股份有限公司 Translation model training method, translation method, electronic device and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795912A (en) * 2019-09-19 2020-02-14 平安科技(深圳)有限公司 Method, device and equipment for encoding text based on neural network and storage medium
CN110598224A (en) * 2019-09-23 2019-12-20 腾讯科技(深圳)有限公司 Translation model training method, text processing device and storage medium

Also Published As

Publication number Publication date
CN113947095A (en) 2022-01-18

Similar Documents

Publication Publication Date Title
CN112685565B (en) Text classification method based on multi-mode information fusion and related equipment thereof
CN113947095B (en) Multilingual text translation method, multilingual text translation device, computer equipment and storage medium
CN112860919B (en) Data labeling method, device, equipment and storage medium based on generation model
CN113987169A (en) Text abstract generation method, device and equipment based on semantic block and storage medium
CN112949320B (en) Sequence labeling method, device, equipment and medium based on conditional random field
CN112084752B (en) Sentence marking method, device, equipment and storage medium based on natural language
JP7520246B2 (en) Method and apparatus for generating text - Patents.com
CN112699213A (en) Speech intention recognition method and device, computer equipment and storage medium
CN112836521A (en) Question-answer matching method and device, computer equipment and storage medium
CN114091452A (en) Adapter-based transfer learning method, device, equipment and storage medium
CN114358023B (en) Intelligent question-answer recall method, intelligent question-answer recall device, computer equipment and storage medium
CN114780701B (en) Automatic question-answer matching method, device, computer equipment and storage medium
CN113887237A (en) Slot position prediction method and device for multi-intention text and computer equipment
CN114420107A (en) Speech recognition method based on non-autoregressive model and related equipment
CN114445832A (en) Character image recognition method and device based on global semantics and computer equipment
CN115544560A (en) Desensitization method and device for sensitive information, computer equipment and storage medium
CN112232052A (en) Text splicing method and device, computer equipment and storage medium
CN113505595A (en) Text phrase extraction method and device, computer equipment and storage medium
CN112598039B (en) Method for obtaining positive samples in NLP (non-linear liquid) classification field and related equipment
CN113723077A (en) Sentence vector generation method and device based on bidirectional characterization model and computer equipment
CN111475635B (en) Semantic completion method and device and electronic equipment
CN116127925B (en) Text data enhancement method and device based on destruction processing of text
CN113420869B (en) Translation method based on omnidirectional attention and related equipment thereof
CN115687934A (en) Intention recognition method and device, computer equipment and storage medium
CN113657104A (en) Text extraction method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant