CN112749569A - Text translation method and device - Google Patents

Text translation method and device

Info

Publication number
CN112749569A
Authority
CN
China
Prior art keywords
text
translated
vector
semantic
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911038762.XA
Other languages
Chinese (zh)
Other versions
CN112749569B (en)
Inventor
朱长峰
于恒
骆卫华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201911038762.XA priority Critical patent/CN112749569B/en
Publication of CN112749569A publication Critical patent/CN112749569A/en
Application granted granted Critical
Publication of CN112749569B publication Critical patent/CN112749569B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text translation method and device. The method comprises: acquiring a text to be translated; obtaining at least one text vector of the text to be translated, where a text vector is a vector representation of the text to be translated in a corresponding language; converting the at least one text vector into intermediate semantics; and decoding the intermediate semantics, so that the text to be translated is translated into the target language. The invention solves the technical problem in the prior art of poor multi-language compatibility of multilingual translation models caused by language uniqueness.

Description

Text translation method and device
Technical Field
The invention relates to the field of language processing, in particular to a text translation method and device.
Background
With the rapid development of translation services, more and more languages need to be supported, so the workload of model training, deployment, and operation and maintenance grows rapidly, and requests to different models cannot be batched together to use GPU computing resources, causing a waste of computing resources. For some low-resource languages, translation capability cannot be provided at all due to the lack of labeled data.
Multilingual Neural Machine Translation (NMT) uses a unified NMT model to provide translation capability for multiple language pairs simultaneously, which greatly reduces the training, deployment, and operation and maintenance work for the model as well as the cost of the online service. It also improves the translation quality of low-resource language pairs through cross-language knowledge transfer. However, because a multilingual translation model cannot model the uniqueness of each language, there is still a certain quality loss, resulting in insufficient translation accuracy.
For the problem in the prior art of poor multi-language compatibility of multilingual translation models caused by language uniqueness, no effective solution has yet been proposed.
Disclosure of Invention
The embodiment of the invention provides a text translation method and a text translation device, which at least solve the technical problem of poor multi-language compatibility of a multi-language translation model caused by language uniqueness in the prior art.
According to one aspect of an embodiment of the present invention, there is provided a text translation method, including: acquiring a text to be translated; obtaining at least one text vector of the text to be translated, wherein a text vector is a vector representation of the text to be translated in a corresponding language; converting the at least one text vector into intermediate semantics; and decoding the intermediate semantics, so that the text to be translated is translated into the target language.
According to another aspect of the embodiments of the present invention, there is also provided a text translation method, including: acquiring a text to be translated; and translating the text to be translated into a target language through a multilingual translation model, wherein the multilingual translation model acquires a text vector of the text to be translated, maps the text vector to a semantic space to obtain a language-independent semantic representation corresponding to the text to be translated, and converts the semantic representation into text in the target language.
According to another aspect of the embodiments of the present invention, there is also provided a text translation apparatus, including: a first obtaining module for acquiring a text to be translated; a second obtaining module for obtaining at least one text vector of the text to be translated, wherein a text vector is a vector representation of the text to be translated in a corresponding language; a conversion module for converting the at least one text vector into intermediate semantics; and a decoding module for decoding the intermediate semantics so that the text to be translated is translated into the target language.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium including a stored program, wherein, when the program runs, the device where the storage medium is located is controlled to perform the following steps: acquiring a text to be translated; obtaining at least one text vector of the text to be translated, wherein a text vector is a vector representation of the text to be translated in a corresponding language; converting the at least one text vector into intermediate semantics; and decoding the intermediate semantics, so that the text to be translated is translated into the target language.
According to another aspect of the embodiments of the present invention, there is also provided a processor configured to execute a program, wherein the program performs the following steps: acquiring a text to be translated; obtaining at least one text vector of the text to be translated, wherein a text vector is a vector representation of the text to be translated in a corresponding language; converting the at least one text vector into intermediate semantics; and decoding the intermediate semantics, so that the text to be translated is translated into the target language.
According to another aspect of the embodiments of the present invention, there is also provided a translation system, including: an encoder for encoding a text to be translated to obtain at least one text vector of the text to be translated, wherein a text vector is a vector representation of the text to be translated in a corresponding language; an intermediate semantic module, communicatively coupled to the encoder, for converting the at least one text vector into intermediate semantics; and a decoder, communicatively coupled to the intermediate semantic module, for decoding the intermediate semantics so that the text to be translated is translated into a target language.
In the embodiments of the invention, a unified multilingual translation model performs the encoder step, the decoder step, and the extraction of intermediate semantics. A language-aware intermediate semantic module is explicitly introduced to extract language-independent intermediate semantics from the semantic spaces of different languages, so that cross-language knowledge transfer is exploited and the problem of translation knowledge conflict caused by language uniqueness is well alleviated.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 shows a hardware configuration block diagram of a computer device (or mobile device) for implementing a text translation method;
fig. 2 is a flowchart of a text translation method according to embodiment 1 of the present application;
FIG. 3 is a diagram of a multilingual translation model in accordance with an embodiment of the present application;
fig. 4 is a flowchart of a text translation method according to embodiment 2 of the present application;
fig. 5 is a schematic view of a text translation apparatus according to embodiment 3 of the present application;
fig. 6 is a schematic diagram of a text translation apparatus according to embodiment 4 of the present application;
fig. 7 is a flowchart of a text translation apparatus according to embodiment 5 of the present application;
fig. 8 is a schematic view of a text translation apparatus according to embodiment 6 of the present application;
fig. 9 is a block diagram of a computer device according to embodiment 7 of the present application; and
fig. 10 is a schematic diagram of a translation system according to embodiment 9 of the present application.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms appearing in the description of the embodiments of the present application are explained as follows:
NMT (Neural Machine Translation): a technique that uses neural network structures to build machine translation models.
Language-aware intermediate semantic representation module: a module that converts the representations of different languages into an intermediate semantic representation (interlingua), taking the characteristics of each language into account.
Example 1
In accordance with an embodiment of the present invention, there is provided an embodiment of a text translation method. It should be noted that the steps illustrated in the flowchart of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one presented here.
The method provided in the first embodiment of the present application may be executed in a mobile terminal, a computer device, or a similar computing device. Fig. 1 shows a hardware configuration block diagram of a computer device (or mobile device) for implementing a text translation method. As shown in fig. 1, the computer device 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, …, 102n; processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, it may further include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and does not limit the structure of the electronic device. For example, computer device 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuitry may be a single, stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer device 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the text translation method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by executing the software programs and modules stored in the memory 104, that is, implementing the text translation method. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, which may be connected to computer device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission module 106 is used to receive or transmit data via a network. Specific examples of such networks may include wireless networks provided by the communications provider of computer device 10. In one example, the transmission module 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission module 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer device 10 (or mobile device).
It should be noted here that in some alternative embodiments, the computer device (or mobile device) shown in fig. 1 described above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that fig. 1 is only one example of a particular specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
Under the above operating environment, the present application provides a method for translating text as shown in fig. 2. Fig. 2 is a flowchart of a text translation method according to embodiment 1 of the present application.
And step S21, acquiring the text to be translated.
Specifically, the text to be translated is text information that needs to be translated, and when the text to be translated is translated, sentences in the text to be translated can be sequentially extracted and translated.
Step S23, obtaining at least one text vector of the text to be translated, where a text vector is a vector representation of the text to be translated in a corresponding language.
Specifically, a language here may denote a narrow-sense language type, for example Chinese, Korean, or English, and may also denote a broad language family, for example the Indo-European family or the Sino-Tibetan family. A text vector corresponding to the text to be translated in the source language can be obtained, where the source language is the language of the text to be translated.
In an alternative embodiment, at least one vector of the text to be translated may be obtained by Word2vec, TF-IDF (Term Frequency-Inverse Document Frequency), or similar methods.
In another optional embodiment, the text to be translated may be segmented to obtain a segmentation result of the text to be translated, and the segmentation result is encoded by the encoder to obtain at least one text vector corresponding to the text to be translated.
Step S25, converting the at least one text vector into an intermediate semantic.
Specifically, the intermediate semantics may be a language-independent semantic expression vector; that is, regardless of language, texts with the same semantics correspond to the same intermediate semantics.
In an alternative embodiment, at least one text vector may be converted to intermediate semantics by an intermediate semantics module. The intermediate semantic module may include a forward neural network, and the forward neural network is configured to extract semantic information irrelevant to the language from the text vector, so as to obtain an intermediate semantic corresponding to the text to be translated.
Before translation, the intermediate semantic module can be obtained by learning from samples. In an optional embodiment, the sample data used for training the intermediate semantic module may be text vectors that have the same semantics but belong to different languages, together with the intermediate semantics corresponding to those text vectors; training on such sample data yields an intermediate semantic module capable of predicting the intermediate semantics corresponding to a text vector. By explicitly modeling the intermediate semantics, this scheme improves cross-language knowledge transfer and thereby the translation quality of the whole model, in particular for low-resource or zero-resource language pairs.
And step S27, decoding the intermediate semantics, so that the text to be translated is translated to the target language.
In the above steps, the decoder decodes the intermediate semantics to obtain a translation result of the text to be translated, where the translation result is a text corresponding to the text to be translated in the target language.
It should be noted that steps S21 to S27 may be performed by a multilingual translation model, where the multilingual translation model includes an encoder, an intermediate semantic representation module, and a decoder: the encoder is configured to obtain at least one text vector from the text to be translated, the intermediate semantic representation module is configured to obtain the intermediate semantics of the text to be translated from the at least one text vector, and the decoder is configured to decode the intermediate semantics to obtain the translation result of the text to be translated.
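For illustration only, the composition of these three modules can be sketched as follows, assuming PyTorch-style modules; the class and argument names are illustrative assumptions, not code from the patent.

```python
# A minimal sketch of the encoder -> intermediate semantic module -> decoder
# pipeline (steps S23, S25, S27). All names and interfaces are assumed for
# exposition; the patent does not provide source code.
import torch.nn as nn

class MultilingualTranslationModel(nn.Module):
    def __init__(self, encoder: nn.Module, interlingua: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder          # text -> text vectors H_enc
        self.interlingua = interlingua  # H_enc -> language-independent semantics I
        self.decoder = decoder          # I + translated context -> target tokens

    def forward(self, src_tokens, src_lang_id, tgt_lang_id, tgt_context):
        h_enc = self.encoder(src_tokens, src_lang_id)         # step S23
        i_sem = self.interlingua(h_enc, src_lang_id)          # step S25
        return self.decoder(tgt_context, i_sem, tgt_lang_id)  # step S27
```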
It should also be noted that, in order to support more language-pair translations with fewer models, there are two common approaches. The first is to train models covering translation between an important pivot language (such as Chinese or English) and other languages, serve the translation needs of key language pairs directly, and serve translation between non-pivot languages by bridging through these pivot-language models. For example, if German-to-Thai translation is required (de2th), German-to-English translation (de2en) may be used first, followed by English-to-Thai translation (en2th). However, this approach is prone to error accumulation (the superposition of errors), which is especially problematic when the two bridging models are trained on very different domain data. Moreover, obtaining the translation result through two successive translations is time-consuming, so this scheme struggles to meet real-time translation scenarios with strict latency requirements. The second approach is to use a traditional neural machine translation model and simply force the corpora of all language pairs to be trained together into a unified model; but when the language features of the multilingual pairs differ greatly, especially in word order, such a unified model cannot resolve the translation knowledge conflicts caused by language uniqueness.
The embodiments of the present application use a unified multilingual translation model to perform the encoder step, the decoder step, and the extraction of intermediate semantics, and explicitly introduce a language-aware intermediate semantic module to extract language-independent intermediate semantics from the semantic spaces of different languages, thereby exploiting cross-language knowledge transfer and greatly alleviating the problem of translation knowledge conflict caused by language uniqueness.
Compared with existing multilingual NMT techniques, this scheme improves the BLEU (bilingual evaluation understudy) score by 1-2 points on two data sets. In a zero-resource translation scenario, it improves BLEU by 10 points over existing multilingual NMT techniques, a huge improvement in translation quality.
As an alternative embodiment, obtaining at least one text vector of the text to be translated includes: obtaining the text vector of the text to be translated through an encoder, the encoder comprising at least one first sub-coding layer, each first sub-coding layer comprising a first attention mechanism layer and a first forward neural network layer, wherein the step of the translation model obtaining the text vector of the text to be translated through the encoder includes: acquiring a word vector and a position vector corresponding to the text to be translated; acquiring first attention mechanism parameters according to the word vector and the position vector; and inputting the first attention mechanism parameters into the first sub-coding layers for coding to obtain the text vector of the text to be translated output by the first sub-coding layers, wherein, in each first sub-coding layer, the first attention mechanism layer operates on the parameters output by the previous first sub-coding layer and then inputs the result into the first forward neural network layer for coding.
Specifically, the word vector of the text to be translated may be a vector obtained by performing word segmentation on the text to be translated and then vectorizing, for example: a 50000 × hidden_size word-embedding matrix is initialized; the text to be translated is a word sequence, and the embedding of each word is looked up in the word-embedding matrix by its fixed word id, thereby obtaining the word vectors of the text to be translated. The position vector corresponding to the text to be translated can be formed from the position vector of each word, and the position vector of each position can be obtained by the following formula:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model)),  PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))    (1)
where PE(pos, 2i) denotes the value of the position vector at position pos and dimension 2i, and d_model denotes the dimensionality. The formula means that the size of the position vector at each position equals the number of hidden units (hidden_size); the odd dimensions of this vector are given by a cosine function and the even dimensions by a sine function.
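As a concrete illustration of formula (1), the following Python sketch computes the sinusoidal position vectors (sine on even dimensions, cosine on odd dimensions); hidden_size is assumed to be even, and the function name is illustrative.

```python
# Direct implementation of formula (1): position vectors of size hidden_size.
import numpy as np

def position_encoding(max_len: int, hidden_size: int) -> np.ndarray:
    pos = np.arange(max_len)[:, None]                  # (max_len, 1)
    i = np.arange(0, hidden_size, 2)[None, :]          # even dimension indices 2i
    angle = pos / np.power(10000.0, i / hidden_size)   # pos / 10000^(2i/d_model)
    pe = np.zeros((max_len, hidden_size))
    pe[:, 0::2] = np.sin(angle)                        # even dimensions: sine
    pe[:, 1::2] = np.cos(angle)                        # odd dimensions: cosine
    return pe
```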
The attention mechanism parameters above include Q, K and V, where Q denotes the query and K and V denote key-value pairs. The attention mechanism itself is a function that maps a query and a series of key-value pairs to an output: the output is a weighted sum of the values, and the weight for each value is computed by a compatibility function of the query and the corresponding key.
The first attention mechanism layer can implement a multi-head attention mechanism. In contrast to single-head attention, which performs a single attention computation over d_model-dimensional keys, values and queries, the multi-head attention mechanism linearly projects the queries, keys and values h times; in these h projections, an attention function is applied in parallel to each projected version of the queries, keys and values, producing d_value-dimensional outputs; the h d_value-dimensional outputs are concatenated together, and a final projection yields the final output.
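The multi-head computation just described can be sketched as follows; this is a generic Transformer-style implementation under the assumption d_value = d_model / h, not the patent's code.

```python
# Multi-head attention sketch: h projections of Q, K, V, parallel scaled
# dot-product attention, concatenation of the h outputs, final projection.
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, h: int):
        super().__init__()
        assert d_model % h == 0
        self.h, self.d_k = h, d_model // h       # d_k plays the role of d_value
        self.w_q = nn.Linear(d_model, d_model)   # the h linear maps, packed
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)   # projection after concatenation

    def forward(self, q, k, v):
        b = q.size(0)
        def split(x, w):  # (b, len, d_model) -> (b, h, len, d_k)
            return w(x).view(b, -1, self.h, self.d_k).transpose(1, 2)
        q, k, v = split(q, self.w_q), split(k, self.w_k), split(v, self.w_v)
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5
        out = torch.softmax(scores, dim=-1) @ v   # h attention functions in parallel
        out = out.transpose(1, 2).contiguous().view(b, -1, self.h * self.d_k)
        return self.w_o(out)                      # concatenate and map once more
```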
Fig. 3 is a schematic diagram of a multilingual translation model according to an embodiment of the present application. In an alternative embodiment, as shown in fig. 3, the above steps may be performed by an encoder that includes six first sub-coding layers, each of which includes a first self-attention layer (self-attention) and a first forward neural network layer (FF-layer). The word vector Emb corresponding to the text to be translated is looked up in the vector matrix of the source language (word embedding (src)), the position vector Pos_Emb of the text to be translated is obtained by computation, Q, K and V are obtained from the word vector Emb and the position vector Pos_Emb, and Q, K and V are input to the first sub-coding layer of the encoder, so that the text vector H_enc = FFN(ATT(Q, K, V)) output by the encoder is obtained, where FFN is an ordinary feed-forward network and ATT is the multi-head attention mechanism.
As an optional embodiment, obtaining the position vector corresponding to the text to be translated includes: obtaining a source language vector corresponding to the source language, where the source language is the language of the text to be translated; determining a first offset vector according to the source language vector; acquiring the position information of the text to be translated; and superimposing the first offset vector on the position information to obtain the position vector of the text to be translated.
Specifically, the position information may be calculated by formula (1). The source language is the language to which the text to be translated belongs. The first offset vector may be a preset fixed vector determined according to the source language; by superimposing the vector corresponding to the source language on the position information as the first offset vector, the language characteristics of the text to be translated can be introduced into the text vector.
In an alternative embodiment, as shown in fig. 3, a language vector obtained from initialization (language embedding) is selected according to the language of the text to be translated, giving the first offset vector. The first offset vector is then superimposed on the position information to obtain the position vector.
It should be noted that the same position in a sentence obviously has different meanings in different languages. In the above scheme, an offset generated by a language vector is added to the position information, so that the position vectors corresponding to different languages have different representations. A language-dependent position vector Pos_Emb(L_emb) is thereby established from the first offset vector determined by the source language, and the language characteristics of the text to be translated are introduced into the text vector.
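A sketch of how the language-dependent position vector Pos_Emb(L_emb) could be formed is given below, assuming a learned language embedding added as the offset to fixed sinusoidal position information per formula (1); the module name and shapes are illustrative assumptions.

```python
# Language-aware position vectors: the same position gets a different
# representation per language via a learned per-language offset.
import torch
import torch.nn as nn

class LanguageAwarePositionEmbedding(nn.Module):
    def __init__(self, num_languages: int, hidden_size: int, max_len: int = 512):
        super().__init__()
        self.lang_emb = nn.Embedding(num_languages, hidden_size)  # L_emb
        pos = torch.arange(max_len).unsqueeze(1).float()
        i = torch.arange(0, hidden_size, 2).float()
        angle = pos / torch.pow(torch.tensor(10000.0), i / hidden_size)
        pe = torch.zeros(max_len, hidden_size)
        pe[:, 0::2], pe[:, 1::2] = torch.sin(angle), torch.cos(angle)
        self.register_buffer("pe", pe)  # fixed position information, formula (1)

    def forward(self, seq_len: int, lang_id: torch.Tensor):
        # lang_id: scalar LongTensor id of the language; superimpose the
        # language offset (first offset vector) on the position information
        return self.pe[:seq_len] + self.lang_emb(lang_id)
```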
In the above scheme, the first offset vector models the language of the text to be translated during encoding, so that the differences between languages are taken into account and the unified model fusing multiple language pairs suffers less from language conflicts in translation quality. This in turn reduces training, deployment, and operation and maintenance costs, while improving the overall performance of the online service, for example, greatly improving service throughput (QPS) and response time (RT).
As an alternative embodiment, converting the at least one text vector into intermediate semantics includes: mapping the at least one text vector to a semantic space to obtain the intermediate semantics, where the intermediate semantics are a language-independent semantic representation corresponding to the text to be translated.
In the above steps, at least one text vector is mapped to a semantic space, the semantic space may include a plurality of sub-semantic spaces, and semantic information in the text vector is extracted through the sub-semantic spaces, so as to obtain an intermediate semantic corresponding to the text vector.
As an alternative embodiment, mapping the text vector to a semantic space to obtain an intermediate semantic, includes: determining a second attention mechanism parameter according to a text vector of the text to be translated, wherein a query element parameter in the second attention mechanism parameter is determined according to a source language vector of the text to be translated and a subspace vector corresponding to the query element, a key value pair in the second attention mechanism parameter is the text vector of the text to be translated, a semantic space comprises a plurality of sub-semantic spaces, and each sub-semantic space has a corresponding subspace vector; the text vector is mapped to a semantic space according to the second attention mechanism parameter.
Specifically, the query element in the second attention mechanism parameters is the query, and the key-value pairs in the second attention mechanism parameters are the keys and values.
The semantic space may include a plurality of sub-semantic spaces. Since any semantic representation can be mapped to different dimensions, each representing one type of semantic information (for example, most sentences contain a subject, a predicate, an object, and so on), the semantic space may include a subject sub-semantic space, a predicate sub-semantic space, an object sub-semantic space, etc., and the sub-semantic spaces separately extract sub-semantics of the text from the original text vector representation. Different sub-semantic spaces emphasize and reflect different semantic information, so ideally the vectors of the sub-semantic spaces are orthogonal and the semantics they express do not contain each other; this property can also serve as a training target for the intermediate semantic module.
The source language vector of the text to be translated is the vector corresponding to the source language; a subspace vector is the vector corresponding to a sub-semantic space, and each sub-semantic space corresponds to a shared subspace vector.
The above steps convert the text vector H_enc of the text to be translated, in whatever language, into a fixed-size, language-independent intermediate semantic representation I; in the conversion process, the source language of the text to be translated needs to be taken into account.
In an alternative implementation, I = FFN(ATT(Q, K, V)), where Q = FFN(L_emb, I_emb), K and V are both H_enc, L_emb is the source language vector, I_emb is a subspace vector, and each sub-semantic space corresponds to a shared I_emb. The purpose of the intermediate semantic module is to map encoder representations H_enc from different languages onto a fixed number of language-independent sub-semantic spaces. By introducing L_emb, different conversions are applied to different languages during semantic extraction.
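The following sketch illustrates one plausible reading of I = FFN(ATT(Q, K, V)) with Q = FFN(L_emb, I_emb) and K = V = H_enc, using PyTorch's built-in multi-head attention; the module structure and hyperparameters are assumptions, not the patent's implementation.

```python
# Language-aware intermediate semantic module: a fixed number of subspace
# vectors I_emb, combined with the source language vector L_emb, form the
# queries; the encoder output H_enc provides keys and values.
import torch
import torch.nn as nn

class InterlinguaModule(nn.Module):
    def __init__(self, d_model: int, num_subspaces: int, num_languages: int, h: int = 8):
        super().__init__()
        self.i_emb = nn.Embedding(num_subspaces, d_model)  # one I_emb per subspace
        self.l_emb = nn.Embedding(num_languages, d_model)  # source language vector
        self.q_ffn = nn.Linear(2 * d_model, d_model)       # Q = FFN(L_emb, I_emb)
        self.attn = nn.MultiheadAttention(d_model, h, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))

    def forward(self, h_enc: torch.Tensor, src_lang_id: torch.Tensor):
        b, k = h_enc.size(0), self.i_emb.num_embeddings
        i_emb = self.i_emb.weight.unsqueeze(0).expand(b, k, -1)
        l_emb = self.l_emb(src_lang_id).unsqueeze(1).expand(b, k, -1)
        q = self.q_ffn(torch.cat([l_emb, i_emb], dim=-1))  # language-aware queries
        out, _ = self.attn(q, h_enc, h_enc)                # ATT(Q, K=H_enc, V=H_enc)
        return self.ffn(out)                               # fixed-size semantics I
```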
As an alternative embodiment, mapping the text vector to the semantic space according to the second attention mechanism parameters includes: mapping the text vector to the semantic space according to the second attention mechanism parameters through a semantic mapping model, wherein the semantic mapping model includes at least one sub-semantic mapping layer, each sub-semantic mapping layer comprising a second attention mechanism layer and a third forward neural network layer, and the step of mapping the text vector to the semantic space through the semantic mapping model includes: inputting the second attention mechanism parameters into the semantic mapping model to obtain the intermediate semantics of the text vector output by the semantic mapping model, wherein, in each sub-semantic mapping layer, the second attention mechanism layer operates on the parameters output by the previous sub-semantic mapping layer and then inputs the result into the third forward neural network layer for operation.
In particular, the second attention mechanism may still be a multi-head attention mechanism.
In an alternative embodiment, as shown in fig. 3, the language-aware interlingua is the intermediate semantic module, which includes 3 sub-semantic mapping layers, each containing a second attention mechanism layer (enc-attention) and a third forward neural network layer (FF-layer). First, the sub-semantic vector corresponding to the query element is looked up in the semantic space vectors, and the source language vector corresponding to the source language of the text to be translated is obtained from the language vectors (language embedding); the query is then obtained from the sub-semantic vector and the source language vector through a second forward neural network layer (FF-layer); the resulting query, together with K and V output by the encoder, is input to the intermediate semantic module, and the semantic representation corresponding to the text vector is obtained through the operations of the second attention mechanism layer (enc-attention) and the third forward neural network layer (FF-layer).
As an alternative embodiment, decoding the intermediate semantics to translate the text to be translated into the target language includes: acquiring context information of a text to be translated; and decoding the intermediate semantics of the text to be translated through a decoder according to the context information to obtain the text corresponding to the text to be translated in the target language.
Specifically, the context information may belong to the same document as the text to be translated, and may be a vector corresponding to the context that has already been translated. The context information may be obtained from the context of the text to be translated; in an alternative embodiment, the already-translated context of the text to be translated can be encoded to obtain the context information. By introducing the context information of the text to be translated into the decoding process of the decoder, the translation result refers to the context of the text to be translated, avoiding the low accuracy caused by translating the text in isolation.
As an alternative embodiment, the context information of the text to be translated includes the word vector of the translated context and the position vector of the translated context, and acquiring the context information of the text to be translated includes: obtaining the word vector of the translated context; acquiring the second offset information corresponding to the target language; and superimposing the second offset information on the position information of the translated context to obtain the position vector of the translated context.
Specifically, the target language is the language into which the text finally needs to be translated. Each language has a corresponding language vector, and the vector of the target language can be used as a second offset vector for offsetting the position information of the translated context.
In an alternative embodiment, as shown in fig. 3, the word vector corresponding to the translated context is obtained through the word-embedding matrix of the target language (word embedding (tgt)), and the position information of the translated context is obtained through formula (1). The position information of the translated context is then offset by the second offset information determined by the target language, thereby obtaining the position vector of the translated context.
The above steps are similar to the process of encoding the text to be translated: encoding is carried out through a multi-head attention mechanism, and the language-dependent position vector method is likewise adopted. In this way, the translation process refers to context information whose translation has already been completed.
In the above scheme, the second offset vector models the target language during decoding, so that the differences between languages are taken into account and the unified model fusing multiple language pairs suffers less from language conflicts in translation quality. This in turn reduces training, deployment, and operation and maintenance costs, while improving the overall performance of the online service, for example, increasing QPS (queries processed per second) and decreasing RT (response time).
As an alternative embodiment, the decoder comprises at least one second sub-coding layer and a classification layer, each second sub-coding layer comprising a third attention mechanism layer, a fourth attention mechanism layer and a fourth forward neural network layer, wherein decoding, by the decoder, the intermediate semantic representation of the text to be translated according to the context information to obtain the text corresponding to the text to be translated in the target language includes: inputting the context information into the third attention mechanism layer to obtain the operation result of the third attention mechanism layer; inputting the operation result of the third attention mechanism layer and the intermediate semantics of the text to be translated into the fourth attention mechanism layer to obtain the operation result of the fourth attention mechanism layer; inputting the operation result of the fourth attention mechanism layer into the fourth forward neural network layer to obtain the operation result of the fourth forward neural network layer; and determining, by the classification layer, the text corresponding to the text to be translated in the target language according to the operation result of the at least one second sub-coding layer and the target language vector corresponding to the target language.
The decoding process is realized through the second sub-coding layers and a classification layer. The third attention mechanism layer and the fourth attention mechanism layer in each second sub-coding layer can both be multi-head attention layers: the third attention mechanism layer introduces the translated context information of the text to be translated into the decoding process, and the fourth attention mechanism layer introduces the intermediate semantics of the text to be translated. The classification layer can be a softmax mechanism: softmax takes the output of the second sub-coding layers as input, classifies over the entire vocabulary, and predicts the best translation result for the current position.
In an alternative embodiment, as shown in fig. 3, the decoder includes 6 second sub-coding layers and 1 classification layer. The third attention mechanism layer (self-attention) in each second sub-coding layer lets the translation process refer to the translated context through its word vector and position vector, and the fourth attention mechanism layer (inter-attention) operates on the output of the third attention mechanism layer together with the output of the intermediate semantic module, letting the translation process refer to the intermediate semantics of the text to be translated. Finally, after encoding by the fourth forward neural network layer (FF-layer), the result is output to the classification layer (linear & softmax), which classifies according to the output of the second sub-coding layers and the second offset vector of the target language, thereby determining the most suitable target-language word.
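A hedged sketch of one second sub-coding layer is given below (self-attention over the translated context, inter-attention over the intermediate semantics, then a feed-forward layer); residual connections and layer normalization are omitted for brevity, and all names and shapes are illustrative assumptions.

```python
# One decoder sub-layer plus the classification step, as described above.
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    def __init__(self, d_model: int, h: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, h, batch_first=True)
        self.inter_attn = nn.MultiheadAttention(d_model, h, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))

    def forward(self, ctx: torch.Tensor, i_sem: torch.Tensor, causal_mask=None):
        x, _ = self.self_attn(ctx, ctx, ctx, attn_mask=causal_mask)  # translated context
        x, _ = self.inter_attn(x, i_sem, i_sem)  # refer to intermediate semantics I
        return self.ffn(x)

# Classification layer sketch: scores over the whole vocabulary per position.
# proj = nn.Linear(d_model, vocab_size)
# probs = torch.softmax(proj(decoder_output), dim=-1)
```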
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
According to an embodiment of the present invention, an embodiment of a text translation method is further provided. Fig. 4 is a flowchart of a text translation method according to embodiment 2 of the present application; as shown in fig. 4, the method includes the following steps:
and step S41, acquiring the text to be translated.
Specifically, the text to be translated is text information that needs to be translated, and when the text to be translated is translated, sentences in the text to be translated can be sequentially extracted and translated.
And step S43, translating the text to be translated into the target language through the multilingual translation model, wherein the multilingual translation model acquires a text vector of the text to be translated, maps the text vector to a semantic space to obtain a language-independent semantic representation corresponding to the text to be translated, and converts the semantic representation into text in the target language.
Specifically, a language here may denote a narrow-sense language type, for example Chinese, Korean, or English, and may also denote a broad language family, for example the Indo-European family or the Sino-Tibetan family. A text vector corresponding to the text to be translated in the source language can be obtained, where the source language is the language of the text to be translated.
The intermediate semantics can be a language-independent semantic expression vector; that is, texts with the same semantics correspond to the same intermediate semantics no matter what language they are in.
In an alternative embodiment, at least one vector of the text to be translated may be obtained by Word2vec, TF-IDF (Term Frequency-Inverse Document Frequency), or similar methods.
In another optional embodiment, the text to be translated may be segmented to obtain a segmentation result of the text to be translated, and the segmentation result is encoded by the encoder to obtain at least one text vector corresponding to the text to be translated.
In the above steps, the decoder decodes the intermediate semantics to obtain a translation result of the text to be translated, where the translation result is a text corresponding to the text to be translated in the target language.
It should be noted that the multilingual translation model in the above embodiment of the present application may also perform other steps in embodiment 1, which is not described herein again.
It should be noted that, if the system bridging method is adopted, error propagation and error accumulation occur easily, and the latency roughly doubles. The present scheme uses a unified multilingual model, so the error propagation problem is alleviated and no extra invocation latency is introduced. Moreover, a language-aware intermediate semantic module is explicitly introduced to extract language-independent intermediate semantics from the semantic spaces of different languages, thereby exploiting cross-language knowledge transfer and greatly alleviating the problem of translation knowledge conflict caused by language uniqueness.
Since the intermediate semantics are independent of any specific language, the corpora of any two languages can be added to model training; for a multilingual model supporting N languages, the number of models is thus reduced from the original N(N-1) to 1. Meanwhile, owing to the unified model, translation requests of different language pairs can be merged when calling GPU computing resources, greatly improving service throughput (QPS) and response time (RT).
As an alternative embodiment, the method further includes obtaining the multilingual translation model, wherein the step of obtaining the multilingual translation model includes: acquiring sample data, the sample data including sample texts and the actual translation results of translating the sample texts into other languages; acquiring the loss function of the initial multilingual translation model while it learns the sample data; and adjusting the model parameters of the initial multilingual translation model with the loss function as the minimization objective.
Because the multi-language translation model is a unified model, unified training can be carried out.
As an alternative embodiment, the loss function comprises at least one of:
a. the initial multi-language translation model translates the sample text to obtain a difference value between a model translation result and an actual translation result corresponding to the sample text;
b. the initial multi-language translation model translates an actual translation result of the sample text to obtain a difference value between a model translation result and the sample text;
c. the difference value of the sample text and the converted text corresponding to the sample text, wherein the initial multilingual translation model converts the sample text into semantic representation, and then converts the semantic representation into the text with the same language as the sample text to obtain the converted text corresponding to the sample text;
d. the difference value of the actual translation result corresponding to the sample text and the converted text corresponding to the actual translation result is obtained, wherein the initial multilingual translation model converts the actual translation result of the sample text into semantic representation, and then converts the semantic representation into the text with the same language as the actual translation result to obtain the converted text corresponding to the actual translation result;
e. a distance between two semantic representations, wherein the two semantic representations comprise: the initial multi-language translation model converts the sample text to obtain semantic representation, and the initial multi-language translation model converts the actual translation result corresponding to the sample text to obtain semantic representation.
In order to learn better intermediate semantics, reduce inter-language conflicts and promote the migration of cross-language knowledge, the above scheme proposes a plurality of loss functions as training targets, which are described below:
the above-mentioned loss function a is a Translation object (Translation object) that uses cross entropy to measure the difference between the automatic Translation (i.e. the Translation translated from the initial multilingual Translation model to the sample text) and the standard Translation (i.e. the actual Translation corresponding to the sample Translation), and the difference can be expressed as the actual Translation corresponding to the sample Translation
L_s2t = -Σ_n log P(t_n | s_n)
where t denotes the automatic translation, s denotes the text to be translated, and n indexes the training examples.
The loss function b above is also a Translation Objective, in the reverse direction: cross entropy is used to measure the difference between the standard translation and the automatic translation, which can be expressed as
L_t2s = -Σ_n log P(s_n | t_n)
The loss functions c and d above are both Semantic Reconstruction Objectives: to ensure that the intermediate semantics lose as little information as possible, semantic reconstruction objectives are introduced as constraints. The translation loss of converting the original input into the intermediate semantics and then translating back into the original language is taken as the semantic reconstruction loss, denoted L_s2s and L_t2t respectively, where
L_s2s = -Σ_n log P(s_n | s_n),  L_t2t = -Σ_n log P(t_n | t_n)
the loss function e is a Semantic consistency object (Semantic consistency object), and the intermediate Semantic module converts the representation of each language into intermediate representation, so that cross-language knowledge can be migrated, and translation quality is improved. In order to measure the language independence of the intermediate semantic representations and the consistency of the semantic representations, a semantic consistency target is introduced, namely the intermediate semantic representations I _ s and I _ t generated respectively for the original text and the translated text are measured, the cosine distance of the intermediate semantic representations is measured, the smaller the distance, the more consistent the semantic representations are, the loss function can be expressed as Ldist=1-sim(Is,It)。
In an alternative embodiment, if all of the above loss functions are used as the minimization objective, the overall objective function is obtained:
L = L_s2t + L_t2s + L_s2s + L_t2t + L_dist
the scheme provides a combination of multiple training targets, the introduction of Interlingua brings extremely low semantic loss by using a semantic reconstruction target, and the sufficient semantic independence of Interlingua modeled by using a semantic consistency target for constraint modeling has good semantic consistency.
Example 3
According to an embodiment of the present invention, there is also provided a text translation apparatus for implementing the text translation method according to embodiment 1, and fig. 5 is a schematic diagram of a text translation apparatus according to embodiment 3 of the present application, and as shown in fig. 5, the apparatus 500 includes:
the first obtaining module 502 is configured to obtain a text to be translated.
The second obtaining module 504 is configured to obtain at least one text vector of the text to be translated, where a text vector is a vector representation of the text to be translated in a corresponding language.
A conversion module 506, configured to convert the at least one text vector into an intermediate semantic.
And the decoding module 508 is configured to decode the intermediate semantics, so that the text to be translated is translated to the target language.
It should be noted here that the first obtaining module 502, the second obtaining module 504, the converting module 506 and the decoding module 508 correspond to steps S21 to S27 in embodiment 1, and the four modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in the first embodiment. It should be noted that the modules described above as a part of the apparatus may be run in the computer device 10 provided in the first embodiment.
As an alternative embodiment, the second obtaining module includes: a first obtaining submodule for obtaining the text vector of the text to be translated through an encoder, the encoder comprising at least one first sub-coding layer, each first sub-coding layer comprising a first attention mechanism layer and a first forward neural network layer, wherein the first obtaining submodule includes: a first obtaining unit for acquiring a word vector and a position vector corresponding to the text to be translated; a second obtaining unit for acquiring first attention mechanism parameters according to the word vector and the position vector; and a first input unit for inputting the first attention mechanism parameters into the first sub-coding layers for coding to obtain the text vector of the text to be translated output by the first sub-coding layers, wherein, in each first sub-coding layer, the first attention mechanism layer operates on the parameters output by the previous first sub-coding layer and then inputs the result into the first forward neural network layer for coding.
As an alternative embodiment, the first obtaining unit includes: a first obtaining subunit, configured to obtain a source language vector corresponding to the source language, where the source language is the language of the text to be translated; a determining subunit, configured to determine a first offset vector according to the source language vector; a second obtaining subunit, configured to obtain position information of the text to be translated; and a superposition subunit, configured to superpose the first offset vector on the position information to obtain the position vector of the text to be translated.
As an alternative embodiment, the conversion module comprises: and the mapping submodule is used for mapping at least one text vector to a semantic space to obtain an intermediate semantic, wherein the intermediate semantic is used for representing semantic representation which is corresponding to the text to be translated and is irrelevant to the language.
As an alternative embodiment, the mapping submodule includes: the first determining unit is used for determining a second attention mechanism parameter according to a text vector of the text to be translated, wherein a query element parameter in the second attention mechanism parameter is determined according to a source language vector of the text to be translated and a subspace vector corresponding to the query element, a key value pair in the second attention mechanism parameter is the text vector of the text to be translated, a semantic space comprises a plurality of sub-semantic spaces, and each sub-semantic space has a corresponding subspace vector; and the mapping unit is used for mapping the text vector to the semantic space according to the second attention mechanism parameter.
As an alternative embodiment, the mapping unit is further configured to map the text vector to the semantic space according to the second attention mechanism parameter through a semantic mapping model, where the semantic mapping model includes at least one sub-semantic mapping layer, and each sub-semantic mapping layer includes: a second attention mechanism layer and a third forward neural network layer, the mapping unit comprising: and the input subunit is used for inputting the second attention mechanism parameter to the semantic mapping model to obtain the intermediate semantic of the text vector output by the semantic mapping model, wherein in each sub-semantic mapping layer, the second attention mechanism layer calculates the parameter output by the previous sub-semantic mapping layer and then inputs the parameter to the third forward neural network layer for calculation.
As an alternative embodiment, the decoding module comprises: the second obtaining submodule is used for obtaining the context information of the text to be translated; and the decoding submodule is used for decoding the intermediate semantics of the text to be translated through the decoder according to the context information to obtain the text corresponding to the text to be translated in the target language.
As an alternative embodiment, the context information of the text to be translated includes: the word vector of the translated context and the position vector of the translated context, and the second obtaining sub-module comprises: a third obtaining unit, configured to obtain a word vector of a translated context; the fourth obtaining unit is used for obtaining second bias information corresponding to the target language; and the superposition unit is used for superposing the second bias information on the position information of the translated context to obtain the position vector of the translated context.
As an alternative embodiment, the decoder comprises at least one second sub-coding layer and a classification layer, each second sub-coding layer comprising: a third attention mechanism layer, a fourth attention mechanism layer and a fourth forward neural network layer, wherein the decoding submodule comprises: the second input unit is used for inputting the context information to the third attention mechanism layer to obtain an operation result of the third attention mechanism layer; the third input unit is used for inputting the operation result of the third attention mechanism layer and the middle semantic meaning of the text to be translated into a fourth attention mechanism layer to obtain the operation result of the fourth attention mechanism layer; the fourth input unit is used for inputting the operation result of the fourth attention mechanism layer to the fourth forward neural network layer for encoding to obtain the operation result of the fourth forward neural network layer; and the classification unit is used for determining the corresponding text of the text to be translated in the target language by the classification layer according to the operation result of the at least one second sub-coding layer and the target language vector corresponding to the target language.
Example 4
According to an embodiment of the present invention, there is also provided a text translation apparatus for implementing the text translation method according to embodiment 2, and fig. 6 is a schematic diagram of a text translation apparatus according to embodiment 4 of the present application, and as shown in fig. 6, the apparatus 600 includes:
an obtaining module 602, configured to obtain a text to be translated.
The translation module 604 is configured to translate the text to be translated to the target language through the multi-language translation model, where the multi-language translation model obtains a text vector of the text to be translated, maps the text vector to a semantic space, obtains a semantic representation corresponding to the text to be translated and unrelated to the language, and converts the semantic representation into the text of the target language.
It should be noted here that the obtaining module 602 and the translation module 604 correspond to steps S41 to S43 in embodiment 2; the two modules are the same as the corresponding steps in their implementation examples and application scenarios, but are not limited to the disclosure of embodiment 2. It should be noted that these modules may run, as part of the apparatus, in the computer device 10 provided in embodiment 1.
As an alternative embodiment, the apparatus further comprises: the model acquisition module is used for acquiring a language translation model, wherein the model acquisition module comprises: the sample acquisition submodule is used for acquiring sample data, wherein the sample data comprises: the sample text and the actual translation result for translating the sample text to other languages; the loss function acquisition sub-module is used for acquiring a loss function of the initial multi-language translation model in the process of learning the sample data by the initial multi-language translation model; and the determining submodule is used for determining the loss function as a minimum objective function and adjusting the model parameters of the initial multi-language translation model.
As an alternative embodiment, the loss function comprises at least one of:
the initial multi-language translation model translates the sample text to obtain a difference value between a model translation result and an actual translation result corresponding to the sample text;
the initial multi-language translation model translates an actual translation result of the sample text to obtain a difference value between a model translation result and the sample text;
the difference value of the sample text and the converted text corresponding to the sample text, wherein the initial multilingual translation model converts the sample text into semantic representation, and then converts the semantic representation into the text with the same language as the sample text to obtain the converted text corresponding to the sample text;
the difference value of the actual translation result corresponding to the sample text and the converted text corresponding to the actual translation result is obtained, wherein the initial multilingual translation model converts the actual translation result of the sample text into semantic representation, and then converts the semantic representation into the text with the same language as the actual translation result to obtain the converted text corresponding to the actual translation result;
a distance between two semantic representations, wherein the two semantic representations comprise: the initial multi-language translation model converts the sample text to obtain semantic representation, and the initial multi-language translation model converts the actual translation result corresponding to the sample text to obtain semantic representation.
Example 5
According to an embodiment of the present invention, there is also provided a method for translating text. Fig. 7 is a flowchart of a method for translating text according to embodiment 5 of the present application; as shown in fig. 7, the method includes:
step S71, obtaining a word vector of the text to be translated and a source language vector of the text to be translated, and determining a text vector corresponding to the text to be translated, wherein the source language is the language of the text to be translated.
Specifically, the text to be translated is text information that needs to be translated, and when the text to be translated is translated, sentences in the text to be translated can be sequentially extracted and translated.
In an alternative embodiment, a text vector may be determined by a multi-head attention mechanism according to a word vector and a position vector corresponding to a text to be translated, wherein when determining the position vector, a source language vector is used as a bias vector to bias the position vector, so that source language information is introduced into the text vector.
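A minimal sketch of this biasing step, assuming sinusoidal position encodings and a learned per-language embedding as the bias vector; all names and shapes are illustrative:

```python
import torch

def language_biased_positions(seq_len: int, dim: int,
                              lang_emb: torch.Tensor) -> torch.Tensor:
    """Superpose a source-language offset vector on the position information.

    lang_emb: learned embedding of the source language, shape (dim,).
    Returns position vectors of shape (seq_len, dim) that differ by language.
    """
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    i = torch.arange(dim).unsqueeze(0)
    angle = pos / torch.pow(10000.0, (2 * (i // 2)).float() / dim)
    pos_emb = torch.where(i % 2 == 0, torch.sin(angle), torch.cos(angle))
    return pos_emb + lang_emb  # Pos_Emb(L_emb): language-dependent positions
```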
Step S73, the text vector is converted into an intermediate semantic.
Specifically, the intermediate semantics may be semantic expression vectors unrelated to languages, that is, regardless of language, texts having the same semantics correspond to the same intermediate semantics.
In an alternative embodiment, converting the text vector into intermediate semantics may likewise draw on source language information. Specifically, the subspace vectors corresponding to the introduced sub-semantic spaces and the source language vector of the text to be translated are used to execute a multi-head attention mechanism, so that the language information of the text to be translated is also introduced in the process of extracting the intermediate semantics.
And step S75, decoding the intermediate semantics according to the target language vector, so that the text to be translated is translated to the target language.
In an alternative embodiment, in the decoding process, a target language vector may be introduced for translation, so that the translation process refers to the target language.
In another alternative embodiment, the encoded information of the translated context of the text to be translated may be introduced before decoding, so that the decoding process refers to the translated context and the translation accuracy is further improved.
In this way, the characteristics of each language are modeled during text-vector acquisition, intermediate-semantics extraction and decoding, so the differences between languages are taken into account. This reduces the impact of merging multiple language pairs into one unified model, meets service requirements, greatly reduces training, deployment and operation-and-maintenance costs, and improves the overall performance of the online service.
As an alternative embodiment, converting the text vector into an intermediate semantic comprises: and mapping the at least one text vector to a semantic space to obtain the intermediate semantic, wherein the intermediate semantic is used for representing semantic representation which is corresponding to the text to be translated and is irrelevant to the language.
In the above steps, at least one text vector is mapped to a semantic space, the semantic space may include a plurality of sub-semantic spaces, and semantic information in the text vector is extracted through the sub-semantic spaces, so as to obtain an intermediate semantic corresponding to the text vector. For a specific implementation, see example 1, which is not described herein.
As an alternative embodiment, different source languages correspond to different intermediate semantic modules, and the converting the text vector into an intermediate semantic includes: calling an intermediate semantic module corresponding to the source language according to the source language type vector of the text to be translated; and converting the text vector into an intermediate semantic by the called intermediate semantic module, wherein the intermediate semantic is used for representing semantic representation which is corresponding to the text to be translated and is irrelevant to the language.
In the above scheme, different source languages correspond to different intermediate semantic modules, that is, the intermediate semantic module is used to convert a text vector of a language into intermediate semantics, and therefore, when performing conversion, the intermediate semantic module corresponding to the source language needs to be called according to the source language. For example, the source language of the text to be translated is german, so when the conversion is performed, the intermediate semantic module corresponding to german is called first, and the text vector corresponding to the text to be translated in german is converted to the intermediate semantic by the intermediate semantic module.
In an optional embodiment, the intermediate semantic modules corresponding to different source languages may have corresponding identifiers, the text to be translated also has an identifier indicating the source language, the intermediate semantic modules corresponding to the text to be translated may be searched according to the identifier corresponding to the text to be translated, and the searched intermediate semantic modules are called, so as to convert the text vector into the intermediate semantic.
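A minimal sketch of this dispatch, assuming the modules are registered under source-language identifiers; the registry layout and module behavior are placeholders for illustration:

```python
from typing import Callable, Dict
import torch

def make_interlingua_module(lang: str) -> Callable[[torch.Tensor], torch.Tensor]:
    """Hypothetical stand-in: each source language owns its own module that
    maps text vectors to language-independent intermediate semantics."""
    def module(text_vector: torch.Tensor) -> torch.Tensor:
        return text_vector  # a real module would attend over sub-semantic spaces
    return module

# Registry keyed by the identifier that marks the text's source language.
INTERLINGUA_MODULES: Dict[str, Callable] = {
    lang: make_interlingua_module(lang) for lang in ("de", "en", "zh")
}

def to_intermediate_semantics(text_vector: torch.Tensor,
                              source_lang: str) -> torch.Tensor:
    # Look up the module matching the source-language identifier and call it.
    return INTERLINGUA_MODULES[source_lang](text_vector)
```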
Example 6
According to an embodiment of the present invention, there is also provided a text translation apparatus for implementing the text translation method according to embodiment 5, and fig. 8 is a schematic diagram of a text translation apparatus according to embodiment 6 of the present application, and as shown in fig. 8, the apparatus 800 includes:
the obtaining module 802 is configured to obtain a word vector of a text to be translated and a source language vector of the text to be translated, and determine a text vector corresponding to the text to be translated, where the source language is a language of the text to be translated.
A conversion module 804, configured to convert the text vector into an intermediate semantic.
And a decoding module 806, configured to decode the intermediate semantic according to the target language vector, so that the text to be translated is translated to the target language.
It should be noted here that the obtaining module 802, the conversion module 804 and the decoding module 806 correspond to steps S71 to S75 in embodiment 5; the three modules are the same as the corresponding steps in their implementation examples and application scenarios, but are not limited to the disclosure of embodiment 5. It should be noted that these modules may run, as part of the apparatus, in the computer device 10 provided in embodiment 1.
As an alternative embodiment, different source languages correspond to different intermediate semantic modules, and the conversion module includes: the calling submodule is used for calling an intermediate semantic module corresponding to the source language according to the source language vector of the text to be translated; and the conversion sub-module is used for converting the text vector into an intermediate semantic through the called intermediate semantic module, wherein the intermediate semantic is used for representing semantic representation which is corresponding to the text to be translated and is irrelevant to the language.
Example 7
Embodiments of the present invention may provide a computer device that may be any one of a group of computer devices. Optionally, in this embodiment, the computer device may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer device may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer device may execute the program code of the following steps in the text translation method: acquiring a text to be translated; obtaining at least one text vector of a text to be translated, wherein the text vector is used for representing vector representation of the text to be translated corresponding to different languages; converting at least one text vector into intermediate semantics; and decoding the intermediate semantics to translate the text to be translated to the target language.
Optionally, fig. 9 is a block diagram of a computer device according to embodiment 7 of the present application. As shown in fig. 9, computer device A may include: one or more processors 902 (only one of which is shown), a memory 904, and a peripherals interface 906.
The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the text translation method and apparatus in the embodiments of the present invention, and the processor executes various functional applications and data processing by operating the software programs and modules stored in the memory, so as to implement the text translation method. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located from the processor, and these remote memories may be connected to terminal a through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring a text to be translated; obtaining at least one text vector of a text to be translated, wherein the text vector is used for representing vector representation of the text to be translated corresponding to different languages; converting at least one text vector into intermediate semantics; and decoding the intermediate semantics to translate the text to be translated to the target language.
Optionally, the processor may further execute the program code of the following steps: obtaining a text vector of the text to be translated through an encoder, the encoder comprising: at least one first sub-coding layer, each first sub-coding layer comprising: a first attention mechanism layer and a first forward neural network layer, wherein the step of obtaining the text vector of the text to be translated by the encoder comprises: acquiring a word vector and a position vector corresponding to the text to be translated; acquiring a first attention mechanism parameter according to the word vector and the position vector; and inputting the first attention mechanism parameter into the first sub-coding layer for coding to obtain the text vector of the text to be translated output by the first sub-coding layer, wherein in each first sub-coding layer, the first attention mechanism layer operates on the parameters output by the previous first sub-coding layer and then inputs the result into the first forward neural network layer for coding.
Optionally, the processor may further execute the program code of the following steps: obtaining a source language vector corresponding to the source language, wherein the source language is the language of the text to be translated; determining a first offset vector according to the source language vector; acquiring position information of the text to be translated; and superposing the first offset vector on the position information to obtain the position vector of the text to be translated.
Optionally, the processor may further execute the program code of the following steps: and mapping at least one text vector to a semantic space to obtain intermediate semantics, wherein the intermediate semantics are used for representing semantic representations which correspond to the text to be translated and are irrelevant to languages.
Optionally, the processor may further execute the program code of the following steps: determining a second attention mechanism parameter according to a text vector of the text to be translated, wherein a query element parameter in the second attention mechanism parameter is determined according to a source language vector of the text to be translated and a subspace vector corresponding to the query element, a key value pair in the second attention mechanism parameter is the text vector of the text to be translated, a semantic space comprises a plurality of sub-semantic spaces, and each sub-semantic space has a corresponding subspace vector; the text vector is mapped to a semantic space according to the second attention mechanism parameter.
Optionally, the processor may further execute the program code of the following steps: mapping the text vector to a semantic space according to the second attention mechanism parameter by a semantic mapping model, wherein the semantic mapping model comprises at least one sub-semantic mapping layer, and each sub-semantic mapping layer comprises: a second attention mechanism layer and a third forward neural network layer, the step of mapping the text vector to the semantic space according to the second attention mechanism parameter by the semantic mapping model comprises: and inputting the second attention mechanism parameter into the semantic mapping model to obtain the intermediate semantics of the text vector output by the semantic mapping model, wherein in each sub-semantic mapping layer, the second attention mechanism layer operates the parameter output by the previous sub-semantic mapping layer and inputs the parameter into the third forward neural network layer for operation.
Optionally, the processor may further execute the program code of the following steps: acquiring context information of a text to be translated; and decoding the intermediate semantics of the text to be translated through a decoder according to the context information to obtain the text corresponding to the text to be translated in the target language.
Optionally, the processor may further execute the program code of the following steps, wherein the context information of the text to be translated comprises a word vector of the translated context and a position vector of the translated context, and acquiring the context information of the text to be translated comprises: obtaining the word vector of the translated context; acquiring second bias information corresponding to the target language; and superposing the second bias information on the position information of the translated context to obtain the position vector of the translated context.
Optionally, the processor may further execute the program code of the following steps: inputting the context information into a third attention mechanism layer to obtain an operation result of the third attention mechanism layer; inputting the operation result of the third attention mechanism layer and the intermediate semantic meaning of the text to be translated into a fourth attention mechanism layer to obtain the operation result of the fourth attention mechanism layer; inputting the operation result of the fourth attention mechanism layer into a fourth forward neural network layer for encoding to obtain the operation result of the fourth forward neural network layer; and the classification layer determines the text corresponding to the text to be translated in the target language according to the operation result of the at least one second sub-coding layer and the target language vector corresponding to the target language.
The embodiment of the invention provides a text translation method in which the encoding, decoding and intermediate-semantics extraction steps are executed by a unified multi-language translation model. A language-aware intermediate semantics module is explicitly introduced to extract language-independent intermediate semantics from the semantic spaces of different languages, thereby exploiting the transfer of cross-language knowledge and well alleviating the problem of translation knowledge conflicts caused by language uniqueness.
It can be understood by those skilled in the art that the structure shown in fig. 9 is only illustrative, and the computer device may also be a terminal device such as a smartphone (e.g., an Android or iOS phone), a tablet computer, a palmtop computer, or a Mobile Internet Device (MID); fig. 9 does not limit the structure of the electronic device. For example, computer device A may include more or fewer components than shown in fig. 9 (e.g., a network interface or display device), or have a configuration different from that shown in fig. 9.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 8
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a program code executed by the text translation method provided in the first embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer devices in a computer device group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a text to be translated; obtaining at least one text vector of a text to be translated, wherein the text vector is used for representing vector representation of the text to be translated corresponding to different languages; converting at least one text vector into intermediate semantics; and decoding the intermediate semantics to translate the text to be translated to the target language.
Example 9
An embodiment of the present invention further provides a translation system. Fig. 10 is a schematic diagram of a translation system according to embodiment 9 of the present application; as shown in fig. 10, the translation system includes:
the encoder 90 is configured to encode a text to be translated to obtain at least one text vector of the text to be translated, where the text vector is used to represent vector representations of the text to be translated in different languages.
Specifically, the text to be translated is the text information that needs to be translated; when it is translated, the sentences in the text to be translated can be extracted and translated in sequence. The languages mentioned above may denote individual languages in the narrow sense, for example Chinese, Korean or English, and may also denote broad language families, such as the Indo-European language family and the Sino-Tibetan language family. The method can obtain the text vector corresponding to the text to be translated in the source language, where the source language is the language of the text to be translated.
In another optional embodiment, the text to be translated may be segmented to obtain a segmentation result of the text to be translated, and the segmentation result is encoded by the encoder to obtain at least one text vector corresponding to the text to be translated.
An intermediate semantics module 92, communicatively coupled to the encoder, is configured to convert the at least one text vector into intermediate semantics.
Specifically, the intermediate semantics may be semantic expression vectors unrelated to languages, that is, regardless of language, texts having the same semantics correspond to the same intermediate semantics.
In an alternative embodiment, at least one text vector may be converted to intermediate semantics by an intermediate semantics module. The intermediate semantic module may include a forward neural network, and the forward neural network is configured to extract semantic information irrelevant to the language from the text vector, so as to obtain an intermediate semantic corresponding to the text to be translated.
Before translation, the intermediate semantic module may be obtained by learning from samples. In an optional embodiment, the sample data used for training the intermediate semantic module may be text vectors that have the same semantics but belong to different languages, together with the intermediate semantics corresponding to those text vectors; training on such sample data yields an intermediate semantic module capable of predicting the intermediate semantics corresponding to a text vector. By explicitly modeling the intermediate semantics, the scheme facilitates the migration of cross-language knowledge, thereby improving the translation quality of the whole model, in particular for low-resource or zero-resource language pairs.
And the decoder 94 is in communication connection with the intermediate semantic module and is configured to decode the intermediate semantic, so that the text to be translated is translated into the target language.
In the above steps, the decoder decodes the intermediate semantics to obtain a translation result of the text to be translated, where the translation result is a text corresponding to the text to be translated in the target language.
It should be noted that the encoder, the intermediate semantic module and the decoder may form a multi-language translation model, where the encoder is configured to obtain at least one text vector from the text to be translated, the intermediate semantic module is configured to obtain the intermediate semantics of the text to be translated from the at least one text vector, and the decoder is configured to decode the intermediate semantics to obtain the translation result of the text to be translated.
It should also be noted that, in order to support more language-pair translations with fewer models, there are two common approaches. The first is to train models covering translation from an important pivot language (such as Chinese or English) to other languages, serving the translation needs of key language pairs, and to serve translation between non-pivot languages by bridging through these pivot-language models. If a German-to-Thai translation is required (de2th), a German-to-English translation (de2en) may be performed first, followed by an English-to-Thai translation (en2th). However, this approach is prone to error accumulation, which is especially serious when the two bridging models are trained on very different domain data. Moreover, obtaining the translation result through two passes is time-consuming, so this approach struggles to meet real-time translation scenarios with strict latency requirements. The second approach is to use a traditional neural machine translation model and simply force the corpora of all language pairs to be trained together into a unified model; but when the language features of the combined language pairs differ greatly, especially in word order, such a unified model cannot resolve the translation knowledge conflicts caused by language uniqueness.
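For illustration only, pivot bridging simply chains two single-pair translations; `translate` below is a hypothetical per-pair model, not part of the scheme:

```python
def bridge_translate(text: str, translate, src: str = "de",
                     pivot: str = "en", tgt: str = "th") -> str:
    """Pivot bridging (e.g. de2th = de2en followed by en2th).

    translate(text, src, tgt) stands for a hypothetical single-pair model;
    the two sequential calls double the latency and accumulate errors.
    """
    return translate(translate(text, src, pivot), pivot, tgt)
```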
As an alternative embodiment, the encoder includes: a plurality of first sub-coding layers, each of the first sub-coding layers comprising:
the first attention mechanism layer is used for carrying out operation according to a first attention mechanism parameter, wherein the first attention mechanism parameter is determined according to a word vector and a position vector corresponding to the text to be translated;
and the first forward neural network layer is used for coding according to the operation result of the first attention mechanism layer to obtain the text vector of the text to be translated.
Fig. 3 is a schematic diagram of a multilingual translation model according to an embodiment of the present application. In an alternative embodiment, as shown in fig. 3, the above steps may be performed by an encoder that includes six first sub-coding layers, each of which includes a first self-attention layer (self-attention) and a first forward neural network layer (FF-layer). The word vector Emb corresponding to the text to be translated is looked up in the word embedding matrix of the source language (word embedding (src)), the position vector Pos_Emb of the text to be translated is obtained by computation, Q, K and V are obtained from the word vector Emb and the position vector Pos_Emb, and Q, K and V are input to the first sub-coding layer of the encoder, yielding the text vector H_enc = FFN(ATT(Q, K, V)) output by the encoder, where FFN is an ordinary feed-forward network and ATT is a multi-head attention mechanism.
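A minimal sketch of one first sub-coding layer, assuming standard multi-head self-attention (residual connections and layer normalization are omitted); the hyperparameters are illustrative:

```python
import torch
import torch.nn as nn

class FirstSubCodingLayer(nn.Module):
    """One encoder sub-layer: H = FFN(ATT(Q, K, V))."""
    def __init__(self, dim: int = 512, heads: int = 8, ff_dim: int = 2048):
        super().__init__()
        self.att = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(),
                                 nn.Linear(ff_dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention: Q, K and V all come from the previous layer's output
        # (for the first layer, word embedding + language-biased positions).
        h, _ = self.att(x, x, x)
        return self.ffn(h)
```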
In an alternative embodiment, as shown in fig. 3, a language vector (language embedding) obtained from initialization is selected according to the language of the text to be translated, yielding the first offset vector, and the first offset vector is superposed on the position information to obtain the position vector.
It should be noted that the same position in a sentence can obviously carry different meanings in different languages. In the above scheme, an offset generated by the language vector is added to the position information, so that the position vectors corresponding to different languages have different representations; in this way a language-dependent position vector Pos_Emb(L_emb) is established from the first offset vector determined by the source language, and the language features of the text to be translated are thereby introduced into the text vector.
According to this scheme, the first offset vector models the language of the text to be translated during encoding, so that the differences between languages are taken into account and the impact of language conflicts on the translation quality of the unified multi-language model is reduced. This in turn lowers training, deployment and operation-and-maintenance costs while improving the overall performance of the online service, for example greatly improving service throughput (QPS) and response time (RT).
As an alternative embodiment, the intermediate semantic module includes:
the second forward neural network layer is used for determining query element parameters in the second attention mechanism parameters according to the source language vector of the text to be translated and the subspace vectors corresponding to the query elements, wherein the semantic space comprises a plurality of sub-semantic spaces, and each sub-semantic space is provided with a corresponding subspace vector;
a plurality of semantic mapping models, each of the semantic mapping models comprising:
the second attention mechanism layer is used for performing operation according to a second attention mechanism parameter, wherein a key value pair in the second attention mechanism parameter is a text vector of the text to be translated;
and the third forward neural network layer is used for predicting the middle semantic meaning of the text to be translated according to the operation result of the second attention mechanism layer.
In an alternative embodiment, as shown in fig. 3, the intermediate semantic module (language-aware interlingua) includes a second forward neural network layer and three sub-semantic mapping layers, and each sub-semantic mapping layer includes a second attention mechanism layer (enc-attention) and a third forward neural network layer (FF-layer(b)). First, the subspace vector corresponding to the query element is looked up in the semantic space vectors, and the source language vector corresponding to the source language of the text to be translated is obtained from the language vectors (language embedding); the query Q is then obtained from the subspace vector and the source language vector through the second forward neural network layer (FF-layer(a)). The obtained Q, together with the K and V output by the encoder, is input to the intermediate semantic module, and the semantic representation corresponding to the text vector is obtained through the operations of the second attention mechanism layer (enc-attention) and the third forward neural network layer (FF-layer(b)).
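A minimal sketch of this module, assuming FF-layer(a) fuses the subspace vectors with the source-language vector by concatenation (the fusion choice and the shapes are assumptions) and showing a single enc-attention layer rather than three:

```python
import torch
import torch.nn as nn

class LanguageAwareInterlingua(nn.Module):
    """Maps encoder output to a language-independent semantic representation."""
    def __init__(self, dim: int = 512, heads: int = 8, n_subspaces: int = 4):
        super().__init__()
        # One learned vector per sub-semantic space of the semantic space.
        self.subspace = nn.Parameter(torch.randn(n_subspaces, dim))
        self.ff_a = nn.Linear(2 * dim, dim)  # FF-layer(a): builds the query
        self.enc_att = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff_b = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())  # FF-layer(b)

    def forward(self, h_enc: torch.Tensor, lang_emb: torch.Tensor) -> torch.Tensor:
        # Query from the subspace vectors fused with the source-language vector.
        lang = lang_emb.expand_as(self.subspace)
        q = self.ff_a(torch.cat([self.subspace, lang], dim=-1)).unsqueeze(0)
        q = q.expand(h_enc.size(0), -1, -1)
        # Keys and values are the encoder's text vectors H_enc.
        i, _ = self.enc_att(q, h_enc, h_enc)
        return self.ff_b(i)  # intermediate semantics, one vector per subspace
```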
As an alternative embodiment, the decoder includes: at least one second sub-coding layer and a classification layer, each of said second sub-coding layers comprising:
the third attention mechanism layer is used for extracting context information of the text to be translated by operating the context of the text to be translated;
the fourth attention mechanism layer is used for introducing the semantic representation of the text to be translated by operating on the operation result of the third attention mechanism layer and the intermediate semantics of the text to be translated;
the fourth forward neural network layer is used for coding the intermediate semantics of the text to be translated according to the operation result of the fourth attention mechanism layer to obtain a coding result carrying context information;
the classification layer is configured to determine, according to the operation result of the at least one second sub-coding layer and a target language vector corresponding to a target language, a text corresponding to the text to be translated in the target language, where the operation result of the at least one second sub-coding layer is a coding result output by a fourth forward neural network layer in a last second sub-coding layer.
In an optional embodiment, as shown in fig. 3, the word vector corresponding to the translated context is obtained through the word embedding matrix of the target language (word embedding (tgt)), and the position information of the translated context is obtained through formula (1) in embodiment 1. The position information of the translated context is biased with second bias information determined according to the target language, thereby obtaining the position vector of the translated context.
In an alternative embodiment, as shown in fig. 3, the decoder includes six second sub-coding layers and a classification layer. The third attention mechanism layer (self-attention) in each second sub-coding layer operates on the word vector and position vector of the translated context, so that the translation process refers to the translated context; the fourth attention mechanism layer (attention) operates on the output of the third attention mechanism layer together with the output of the intermediate semantic module, so that the translation process refers to the intermediate semantics of the text to be translated. The result is finally encoded by the fourth forward neural network layer (FF-layer) and output to the classification layer, which classifies according to the output of the last second sub-coding layer and the second offset vector of the target language, so as to determine the most suitable target-language word.
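A minimal sketch of one second sub-coding layer under the same assumptions (standard multi-head attention, residuals and normalization omitted); the classification layer is not shown:

```python
import torch
import torch.nn as nn

class SecondSubCodingLayer(nn.Module):
    """Decoder sub-layer: context self-attention, then attention over the
    intermediate semantics, then a feed-forward encoding."""
    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.self_att = nn.MultiheadAttention(dim, heads, batch_first=True)   # third attention layer
        self.inter_att = nn.MultiheadAttention(dim, heads, batch_first=True)  # fourth attention layer
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())              # fourth forward layer

    def forward(self, ctx: torch.Tensor, inter: torch.Tensor) -> torch.Tensor:
        # ctx: translated-context vectors; inter: intermediate semantics.
        h, _ = self.self_att(ctx, ctx, ctx)
        h, _ = self.inter_att(h, inter, inter)
        return self.ffn(h)
```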
As an optional embodiment, different source languages correspond to different intermediate semantic modules, and the translation system further calls the corresponding intermediate semantic modules according to the source language of the text to be translated.
In the above scheme, different source languages correspond to different intermediate semantic modules, that is, the intermediate semantic module is used to convert a text vector of a language into intermediate semantics, and therefore, when performing conversion, the intermediate semantic module corresponding to the source language needs to be called according to the source language. For example, the source language of the text to be translated is german, so when the conversion is performed, the intermediate semantic module corresponding to german is called first, and the text vector corresponding to the text to be translated in german is converted to the intermediate semantic by the intermediate semantic module.
In an optional embodiment, the intermediate semantic modules corresponding to different source languages may have corresponding identifiers, the text to be translated also has an identifier indicating the source language, the intermediate semantic modules corresponding to the text to be translated may be searched according to the identifier corresponding to the text to be translated, and the searched intermediate semantic modules are called, so as to convert the text vector into the intermediate semantic.
As an alternative embodiment, the intermediate semantic module is located within the encoder or the decoder.
In the above scheme, the intermediate semantic module may be disposed in the encoder to process the text to be translated as a whole with the encoder, or disposed in the decoder to process the result output by the encoder as a whole with the decoder.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (23)

1. A method for translating text, comprising:
acquiring a text to be translated;
obtaining at least one text vector of the text to be translated, wherein the text vector is used for representing vector representations of the text to be translated corresponding to different languages;
converting the at least one text vector to intermediate semantics;
and decoding the intermediate semantics to translate the text to be translated to a target language.
2. The method of claim 1, wherein obtaining at least one text vector of the text to be translated comprises: obtaining a text vector of the text to be translated through an encoder, wherein the encoder comprises: at least one first sub-coding layer, each of the first sub-coding layers comprising: a first attention mechanism layer and a first forward neural network layer, wherein the step of obtaining a text vector of the text to be translated by the encoder comprises:
acquiring a word vector and a position vector corresponding to the text to be translated;
acquiring a first attention mechanism parameter according to the word vector and the position vector;
and inputting the first attention mechanism parameter into the first sub-coding layer for coding to obtain a text vector of the text to be translated output by the first sub-coding layer, wherein in each first sub-coding layer, the first attention mechanism layer calculates the parameter output by the previous first sub-coding layer and then inputs the parameter into the first forward neural network layer for coding.
3. The method according to claim 2, wherein obtaining the position vector corresponding to the text to be translated comprises:
obtaining a source language vector corresponding to a source language, wherein the source language is the language of the text to be translated;
determining a first offset vector according to the source language vector;
acquiring the position information of the text to be translated;
and superposing the first offset vector on the position information to obtain the position vector of the text to be translated.
4. The method of claim 1, wherein converting the at least one text vector into intermediate semantics comprises:
and mapping the at least one text vector to a semantic space to obtain the intermediate semantic, wherein the intermediate semantic is used for representing semantic representation which is corresponding to the text to be translated and is irrelevant to the language.
5. The method of claim 4, wherein mapping the text vector to a semantic space to obtain the intermediate semantics comprises:
determining a second attention mechanism parameter according to the text vector of the text to be translated, wherein a query element parameter in the second attention mechanism parameter is determined according to a source language type vector of the text to be translated and a subspace vector corresponding to the query element, a key value pair in the second attention mechanism parameter is the text vector of the text to be translated, the semantic space comprises a plurality of sub-semantic spaces, and each sub-semantic space has a corresponding subspace vector;
mapping the text vector to the semantic space according to the second attention mechanism parameter.
6. The method of claim 5, wherein mapping the text vector to the semantic space according to the second attention mechanism parameter comprises:
mapping the text vector to the semantic space according to the second attention mechanism parameter by a semantic mapping model, wherein the semantic mapping model comprises at least one sub-semantic mapping layer, each sub-semantic mapping layer comprises: a second attention mechanism layer and a third forward neural network layer, the step of mapping the text vector to the semantic space according to the second attention mechanism parameter by the semantic mapping model comprises:
and inputting the second attention mechanism parameter into the semantic mapping model to obtain the intermediate semantics of the text vector output by the semantic mapping model, wherein in each sub-semantic mapping layer, the second attention mechanism layer calculates the parameter output by the previous sub-semantic mapping layer and inputs the parameter into the third forward neural network layer for calculation.
7. The method according to claim 1, wherein decoding the intermediate semantics such that the text to be translated is translated to a target language comprises:
acquiring context information of the text to be translated;
and decoding the intermediate semantics of the text to be translated through a decoder according to the context information to obtain the text corresponding to the text to be translated in the target language.
8. The method of claim 7, wherein the context information of the text to be translated comprises: a word vector of the translated context and a position vector of the translated context, and obtaining the context information of the text to be translated comprises:
obtaining a word vector of the translated context;
acquiring second bias information corresponding to the target language;
and superposing the second bias information on the position information of the translated context to obtain a position vector of the translated context.
9. The method of claim 7, wherein the decoder comprises at least one second sub-coding layer and a classification layer, each of the second sub-coding layers comprising: a third attention mechanism layer, a fourth attention mechanism layer and a fourth forward neural network layer, wherein the decoding is performed on the intermediate semantic of the text to be translated according to the context information through a decoder to obtain a text corresponding to the text to be translated in the target language, and the method comprises the following steps:
inputting the context information to the third attention mechanism layer to obtain an operation result of the third attention mechanism layer;
inputting the operation result of the third attention mechanism layer and the intermediate semantic meaning of the text to be translated into the fourth attention mechanism layer to obtain the operation result of the fourth attention mechanism layer;
inputting the operation result of the fourth attention mechanism layer to the fourth forward neural network layer for encoding to obtain the operation result of the fourth forward neural network layer;
and the classification layer determines the text corresponding to the text to be translated in the target language according to the operation result of the at least one second sub-coding layer and the target language vector corresponding to the target language.
10. A method for translating text, comprising:
acquiring a text to be translated;
and translating the text to be translated to a target language through a multi-language translation model, wherein the multi-language translation model acquires a text vector of the text to be translated, maps the text vector to a semantic space to obtain a semantic representation which is corresponding to the text to be translated and irrelevant to the language, and converts the semantic representation into the text of the target language.
11. The method of claim 10, further comprising: obtaining the language translation model, wherein the step of obtaining the language translation model comprises:
obtaining sample data, wherein the sample data comprises: the method comprises the steps of a sample text and an actual translation result for translating the sample text to other languages;
acquiring a loss function of an initial multi-language translation model in the process of learning the sample data by the initial multi-language translation model;
and determining the loss function as a minimum objective function, and adjusting the model parameters of the initial multi-language translation model.
12. The method of claim 11, wherein the loss function comprises at least one of:
the initial multi-language translation model translates the sample text to obtain a model translation result and a difference value of an actual translation result corresponding to the sample text;
the initial multi-language translation model translates the actual translation result of the sample text to obtain a difference value between a model translation result and the sample text;
the difference value between the sample text and the converted text corresponding to the sample text, wherein the initial multilingual translation model converts the sample text into a semantic representation, and then converts the semantic representation into a text with the same language as the sample text to obtain the converted text corresponding to the sample text;
a difference value between an actual translation result corresponding to the sample text and a converted text corresponding to the actual translation result, wherein the initial multilingual translation model converts the actual translation result of the sample text into a semantic representation, and then converts the semantic representation into a text of the same language as the actual translation result to obtain the converted text corresponding to the actual translation result;
a distance between two semantic representations, wherein the two semantic representations comprise: a semantic representation obtained by the initial multi-language translation model converting the sample text, and a semantic representation obtained by the initial multi-language translation model converting the actual translation result corresponding to the sample text.
13. A method for translating text, comprising:
acquiring a word vector of a text to be translated and a source language vector of the text to be translated, and determining a text vector corresponding to the text to be translated, wherein the source language is the language of the text to be translated;
converting the text vector into an intermediate semantic;
and decoding the intermediate semantics according to the target language vector, so that the text to be translated is translated to the target language.
14. The method of claim 13, wherein different source languages correspond to different intermediate semantic modules, and converting the text vector into intermediate semantics comprises:
calling an intermediate semantic module corresponding to the source language according to the source language type vector of the text to be translated;
and converting the text vector into an intermediate semantic by the called intermediate semantic module, wherein the intermediate semantic is used for representing semantic representation which is corresponding to the text to be translated and is irrelevant to the language.
15. An apparatus for translating text, comprising:
a first acquisition module, configured to acquire a text to be translated;
a second acquisition module, configured to obtain at least one text vector of the text to be translated, wherein the text vector represents vector representations of the text to be translated corresponding to different languages;
a conversion module, configured to convert the at least one text vector into intermediate semantics;
and a decoding module, configured to decode the intermediate semantics so as to translate the text to be translated into a target language.
16. A storage medium, comprising a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to perform the following steps: acquiring a text to be translated; obtaining at least one text vector of the text to be translated, wherein the text vector represents vector representations of the text to be translated corresponding to different languages; converting the at least one text vector into intermediate semantics; and decoding the intermediate semantics to translate the text to be translated into a target language.
17. A processor, configured to run a program, wherein the program, when run, performs the following steps: acquiring a text to be translated; obtaining at least one text vector of the text to be translated, wherein the text vector represents vector representations of the text to be translated corresponding to different languages; converting the at least one text vector into intermediate semantics; and decoding the intermediate semantics to translate the text to be translated into a target language.
18. A translation system, comprising:
an encoder, configured to encode a text to be translated to obtain at least one text vector of the text to be translated, wherein the text vector represents vector representations of the text to be translated corresponding to different languages;
an intermediate semantic module, communicatively connected to the encoder and configured to convert the at least one text vector into intermediate semantics;
and a decoder, communicatively connected to the intermediate semantic module and configured to decode the intermediate semantics to translate the text to be translated into a target language.
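The three components of claim 18 form a straight pipeline. Collapsing each stage to a single toy layer gives the overall data flow; every layer here is an illustrative stand-in, not the claimed implementation:

```python
import torch
import torch.nn as nn

VOCAB, DIM = 1000, 64

encoder = nn.Embedding(VOCAB, DIM)   # text to be translated -> text vectors
intermediate = nn.Linear(DIM, DIM)   # text vectors -> intermediate semantics
decoder = nn.Linear(DIM, VOCAB)      # intermediate semantics -> target tokens

tokens = torch.randint(0, VOCAB, (12,))      # text to be translated (token ids)
semantics = intermediate(encoder(tokens))    # language-independent semantics
translation = decoder(semantics).argmax(-1)  # greedy decode into target ids
```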
19. The system of claim 18, wherein the encoder comprises: a plurality of first sub-coding layers, each of the first sub-coding layers comprising:
a first attention mechanism layer, configured to operate according to a first attention mechanism parameter, wherein the first attention mechanism parameter is determined from the word vector and the position vector corresponding to the text to be translated;
and a first forward neural network layer, configured to encode the operation result of the first attention mechanism layer to obtain the text vector of the text to be translated.
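A first sub-coding layer as described in claim 19 resembles a standard Transformer encoder block: self-attention computed from word-plus-position vectors, followed by a feed-forward network. A sketch with illustrative hyperparameters (the residual connections and layer normalization of a full Transformer block are omitted for brevity):

```python
import torch
import torch.nn as nn

DIM, HEADS, T = 64, 4, 12

attn = nn.MultiheadAttention(DIM, HEADS, batch_first=True)
ffn = nn.Sequential(nn.Linear(DIM, 4 * DIM), nn.ReLU(), nn.Linear(4 * DIM, DIM))

word_vectors = torch.randn(1, T, DIM)
position_vectors = torch.randn(1, T, DIM)

# First attention mechanism layer: its parameters are determined from the
# word vectors and position vectors of the text to be translated.
x = word_vectors + position_vectors
attn_out, _ = attn(x, x, x)

# First forward neural network layer: encodes the attention result into
# the text vector of the text to be translated.
text_vector = ffn(attn_out)
```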
20. The system of claim 18, wherein the intermediate semantic module comprises:
a second forward neural network layer, configured to determine the query parameters among second attention mechanism parameters according to the source language vector of the text to be translated and the subspace vectors corresponding to the query elements, wherein the semantic space comprises a plurality of sub-semantic spaces, and each sub-semantic space has a corresponding subspace vector;
a plurality of semantic mapping models, each of the semantic mapping models comprising:
a second attention mechanism layer, configured to operate according to the second attention mechanism parameters, wherein the key-value pairs in the second attention mechanism parameters are the text vectors of the text to be translated;
and a third forward neural network layer, configured to predict the intermediate semantics of the text to be translated according to the operation result of the second attention mechanism layer.
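In claim 20 the attention is inverted relative to the encoder: the queries are built by a feed-forward layer from the source language vector plus one subspace vector per sub-semantic space, while the keys and values are the encoder's text vectors. A sketch of one semantic mapping model under assumed shapes and sizes:

```python
import torch
import torch.nn as nn

DIM, HEADS, T, N_SUBSPACES = 64, 4, 12, 8

# Second forward neural network layer: forms the query elements.
query_ffn = nn.Linear(2 * DIM, DIM)
# Second attention mechanism layer: key-value pairs are the text vectors.
attn = nn.MultiheadAttention(DIM, HEADS, batch_first=True)
# Third forward neural network layer: predicts the intermediate semantics.
predict_ffn = nn.Sequential(nn.Linear(DIM, 4 * DIM), nn.ReLU(), nn.Linear(4 * DIM, DIM))

text_vectors = torch.randn(1, T, DIM)                    # encoder output
source_lang_vec = torch.randn(1, 1, DIM).expand(1, N_SUBSPACES, DIM)
subspace_vecs = torch.randn(1, N_SUBSPACES, DIM)         # one per sub-semantic space

# Queries from (source language vector, subspace vector) pairs; keys and
# values from the text vectors of the text to be translated.
queries = query_ffn(torch.cat([source_lang_vec, subspace_vecs], dim=-1))
attn_out, _ = attn(queries, text_vectors, text_vectors)
intermediate_semantics = predict_ffn(attn_out)           # language-independent
```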
21. The system of claim 18, wherein the decoder comprises: at least one second sub-coding layer and a classification layer, each of said second sub-coding layers comprising:
a third attention mechanism layer, configured to extract context information of the text to be translated by operating on the context of the text to be translated;
a fourth attention mechanism layer, configured to introduce the semantic representation of the text to be translated by operating on the operation result of the third attention mechanism layer and the intermediate semantics of the text to be translated;
a fourth forward neural network layer, configured to encode the intermediate semantics of the text to be translated according to the operation result of the fourth attention mechanism layer to obtain an encoding result carrying the context information;
and the classification layer, configured to determine, according to the operation result of the at least one second sub-coding layer and a target language vector corresponding to a target language, the text corresponding to the text to be translated in the target language, wherein the operation result of the at least one second sub-coding layer is the encoding result output by the fourth forward neural network layer in the last second sub-coding layer.
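A second sub-coding layer plus the classification layer of claim 21, sketched as a simplified Transformer decoder block (no causal mask, residuals, or layer normalization); conditioning the classifier on the target language vector by concatenation is an assumption for illustration, not the patent's stated mechanism:

```python
import torch
import torch.nn as nn

VOCAB, DIM, HEADS, T_OUT, T_SEM = 1000, 64, 4, 10, 12

self_attn = nn.MultiheadAttention(DIM, HEADS, batch_first=True)   # third attention layer
cross_attn = nn.MultiheadAttention(DIM, HEADS, batch_first=True)  # fourth attention layer
ffn = nn.Sequential(nn.Linear(DIM, 4 * DIM), nn.ReLU(), nn.Linear(4 * DIM, DIM))
classifier = nn.Linear(2 * DIM, VOCAB)                            # classification layer

decoder_input = torch.randn(1, T_OUT, DIM)          # context generated so far
intermediate_semantics = torch.randn(1, T_SEM, DIM)
target_lang_vec = torch.randn(1, 1, DIM).expand(1, T_OUT, DIM)

# Third attention layer: operate on the context to extract context information.
ctx, _ = self_attn(decoder_input, decoder_input, decoder_input)
# Fourth attention layer: introduce the intermediate semantics via cross-attention.
sem, _ = cross_attn(ctx, intermediate_semantics, intermediate_semantics)
# Fourth forward layer: encoding result carrying context information.
encoded = ffn(sem)
# Classification layer: pick target-language tokens, conditioned on the
# target language vector (concatenation is an assumed conditioning scheme).
logits = classifier(torch.cat([encoded, target_lang_vec], dim=-1))
next_tokens = logits.argmax(-1)
```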
22. The system of claim 18, wherein different source languages correspond to different intermediate semantic modules, and the translation system invokes the intermediate semantic module corresponding to the source language of the text to be translated.
23. The system of claim 18, wherein the intermediate semantic module is located within the encoder or the decoder.
CN201911038762.XA 2019-10-29 2019-10-29 Text translation method and device Active CN112749569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911038762.XA CN112749569B (en) 2019-10-29 2019-10-29 Text translation method and device

Publications (2)

Publication Number Publication Date
CN112749569A true CN112749569A (en) 2021-05-04
CN112749569B CN112749569B (en) 2024-05-31

Family

ID=75641634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911038762.XA Active CN112749569B (en) 2019-10-29 2019-10-29 Text translation method and device

Country Status (1)

Country Link
CN (1) CN112749569B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484682A (en) * 2015-08-25 2017-03-08 Alibaba Group Holding Ltd Statistics-based machine translation method, device, and electronic device
CN106484681A (en) * 2015-08-25 2017-03-08 Alibaba Group Holding Ltd Method, device, and electronic device for generating a candidate translation
CN108027812A (en) * 2015-09-18 2018-05-11 McAfee LLC System and method for multipath language translation
US20180124331A1 (en) * 2016-11-03 2018-05-03 Nec Laboratories America, Inc. Video retrieval system using adaptive spatiotemporal convolution feature representation with dynamic abstraction for video to language translation
CN108681539A (en) * 2018-05-07 2018-10-19 Inner Mongolia University of Technology Mongolian-Chinese neural machine translation method based on convolutional neural networks
CN109446534A (en) * 2018-09-21 2019-03-08 Tsinghua University Machine translation method and device
CN110264987A (en) * 2019-06-18 2019-09-20 Wang Zihao Chord progression generation method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HOU QIANG; HOU RUILI: "A Survey of Research and Development of Machine Translation Methods", Computer Engineering and Applications, no. 10, 7 March 2019 (2019-03-07) *
FAN WENTING; HOU HONGXU; WANG HONGBIN; WU JING; LI JINTING: "Mongolian-Chinese Neural Network Machine Translation Model Incorporating Prior Information", Journal of Chinese Information Processing, no. 06, 15 June 2018 (2018-06-15) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221581A (en) * 2021-05-13 2021-08-06 Beijing Xiaomi Mobile Software Co., Ltd. Text translation method, device and storage medium
CN113343716A (en) * 2021-05-20 2021-09-03 Beijing Sankuai Online Technology Co., Ltd. Multilingual translation method, device, storage medium and equipment
CN113343716B (en) * 2021-05-20 2022-09-30 Beijing Sankuai Online Technology Co., Ltd. Multilingual translation method, device, storage medium and equipment
CN113392658A (en) * 2021-06-18 2021-09-14 Beijing iQIYI Technology Co., Ltd. Sentence translation method and device, computer equipment and storage medium
CN113239710A (en) * 2021-06-23 2021-08-10 Hefei iFLYTEK Digital Technology Co., Ltd. Multi-language machine translation method and device, electronic equipment and storage medium
CN113539239A (en) * 2021-07-12 2021-10-22 NetEase (Hangzhou) Network Co., Ltd. Voice conversion method, device, storage medium and electronic equipment
CN113539239B (en) * 2021-07-12 2024-05-28 NetEase (Hangzhou) Network Co., Ltd. Voice conversion method and device, storage medium and electronic equipment
CN113869070A (en) * 2021-10-15 2021-12-31 Dalian University of Technology Multilingual neural machine translation method incorporating language-specific adapter modules
CN113869070B (en) * 2021-10-15 2024-05-24 Dalian University of Technology Multilingual neural machine translation method incorporating language-specific adapter modules

Also Published As

Publication number Publication date
CN112749569B (en) 2024-05-31

Similar Documents

Publication Publication Date Title
CN112749569B (en) Text translation method and device
CN111460838B (en) Pre-training method, device and storage medium of intelligent translation model
CN107368476B (en) Translation method, target information determination method and related device
CN108319888B (en) Video type identification method and device and computer terminal
CN108304376B (en) Text vector determination method and device, storage medium and electronic device
CN111695344A (en) Text labeling method and device
CN111144120A (en) Training sentence acquisition method and device, storage medium and electronic equipment
CN110852066B (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
CN112836057B (en) Knowledge graph generation method, device, terminal and storage medium
CN112906361A (en) Text data labeling method and device, electronic equipment and storage medium
CN111274813A (en) Language sequence labeling method, device, storage medium, and computer equipment
CN115129862A (en) Statement entity processing method and device, computer equipment and storage medium
CN111814496B (en) Text processing method, device, equipment and storage medium
CN109145313A (en) Sentence translation method, device, and storage medium
CN111460804B (en) Text processing method, device and system
US20200089774A1 (en) Machine Translation Method and Apparatus, and Storage Medium
JP7390442B2 (en) Training method, device, device, storage medium and program for document processing model
CN115718904A (en) Text processing method and device
CN115221897A (en) Translation model training method, information translation method and related equipment
CN110956034B (en) Word acquisition method and device and commodity search method
CN115688774A (en) Language data processing method and device, storage medium and electronic equipment
JP5787934B2 (en) Information processing apparatus, information processing method, and information processing program
CN111563387A (en) Sentence similarity determining method and device and sentence translation method and device
CN111797621A (en) Method and system for replacing terms
CN110929504A (en) Statement diagnosis method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant