CN112749569B - Text translation method and device - Google Patents


Info

Publication number
CN112749569B
CN112749569B (application CN201911038762.XA)
Authority
CN
China
Prior art keywords
text
translated
vector
layer
language
Prior art date
Legal status
Active
Application number
CN201911038762.XA
Other languages
Chinese (zh)
Other versions
CN112749569A (en)
Inventor
朱长峰
于恒
骆卫华
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201911038762.XA
Publication of CN112749569A
Application granted
Publication of CN112749569B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention discloses a text translation method and device. The method includes: acquiring a text to be translated; obtaining at least one text vector of the text to be translated, where the text vectors represent the text to be translated in different languages; converting the at least one text vector into intermediate semantics; and decoding the intermediate semantics so that the text to be translated is translated into the target language. The invention solves the technical problem in the prior art that language uniqueness causes poor multi-language compatibility of multilingual translation models.

Description

Text translation method and device
Technical Field
The invention relates to the field of language processing, in particular to a text translation method and device.
Background
With the rapid development of translation services, more and more languages need to be supported, so the workload of model training, deployment, and operation and maintenance grows rapidly, and requests to separate models cannot be batched together on GPU computing resources, which wastes those resources. For some low-resource languages, translation capability cannot be provided at all because labeled data is lacking.
Multilingual neural machine translation (NMT) uses a unified NMT model to provide translation for multiple language pairs simultaneously, which greatly reduces the training, deployment, and operation-and-maintenance workload of models, as well as online service costs. Meanwhile, through cross-language knowledge migration, the translation quality of low-resource language pairs is improved. However, because a multilingual translation model cannot model the uniqueness of each language, a certain quality loss remains, and translation accuracy is insufficient.
For the problem in the prior art of poor multi-language compatibility of multilingual translation models caused by language uniqueness, no effective solution has yet been proposed.
Disclosure of Invention
The embodiment of the invention provides a text translation method and a text translation device, which at least solve the technical problem of poor multi-language compatibility of a multi-language translation model caused by language uniqueness in the prior art.
According to an aspect of an embodiment of the present invention, there is provided a text translation method, including: acquiring a text to be translated; obtaining at least one text vector of a text to be translated, wherein the text vector is used for representing vector representations of the text to be translated in different languages; converting at least one text vector into intermediate semantics; and decoding the intermediate semantics to translate the text to be translated into the target language.
According to another aspect of the embodiment of the present invention, there is also provided a text translation method, including: acquiring a text to be translated; and translating the text to be translated into the target language through a multi-language translation model, wherein the multi-language translation model acquires a text vector of the text to be translated, maps the text vector to a semantic space, acquires semantic representations which correspond to the text to be translated and are irrelevant to the language, and converts the semantic representations into the text of the target language.
According to another aspect of the embodiment of the present invention, there is also provided a text translation apparatus, including: the first acquisition module is used for acquiring a text to be translated; the second acquisition module is used for acquiring at least one text vector of the text to be translated, wherein the text vector is used for representing vector representations of the text to be translated corresponding to different languages; the conversion module is used for converting at least one text vector into intermediate semantics; and the decoding module is used for decoding the intermediate semantics so that the text to be translated is translated into the target language.
According to another aspect of the embodiment of the present invention, there is also provided a storage medium including a stored program, wherein the program controls a device in which the storage medium is located to execute the following steps when running: acquiring a text to be translated; obtaining at least one text vector of a text to be translated, wherein the text vector is used for representing vector representations of the text to be translated in different languages; converting at least one text vector into intermediate semantics; and decoding the intermediate semantics to translate the text to be translated into the target language.
According to another aspect of the embodiment of the present invention, there is also provided a processor for running a program, wherein the program executes the following steps: acquiring a text to be translated; obtaining at least one text vector of a text to be translated, wherein the text vector is used for representing vector representations of the text to be translated in different languages; converting at least one text vector into intermediate semantics; and decoding the intermediate semantics to translate the text to be translated into the target language.
According to another aspect of the embodiment of the present invention, there is also provided a translation system including: an encoder for encoding the text to be translated to obtain at least one text vector of the text to be translated, wherein the text vector is used for representing vector representations of the text to be translated in different languages; an intermediate semantic module, in communication with the encoder, for converting the at least one text vector into intermediate semantics; and a decoder, in communication with the intermediate semantic module, for decoding the intermediate semantics so that the text to be translated is translated into the target language.
In the embodiments of the invention, a unified multilingual translation model performs the encoding, decoding, and intermediate-semantics extraction steps. By explicitly introducing a language-aware intermediate semantic module, language-independent intermediate semantics are extracted from the semantic spaces of different languages, so that cross-language knowledge migration is exploited and the problem of translation knowledge conflicts caused by language uniqueness is well alleviated.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 shows a block diagram of a hardware architecture of a computer device (or mobile device) for implementing a method of translation of text;
FIG. 2 is a flow chart of a method for translating text according to embodiment 1 of the present application;
FIG. 3 is a schematic diagram of a multilingual translation model according to an embodiment of the present application;
FIG. 4 is a flow chart of a method for translating text according to embodiment 2 of the present application;
FIG. 5 is a schematic diagram of a text translation device according to embodiment 3 of the present application;
FIG. 6 is a schematic diagram of a text translation device according to embodiment 4 of the present application;
FIG. 7 is a flow chart of a text translation device according to embodiment 5 of the present application;
FIG. 8 is a schematic diagram of a text translation device according to embodiment 6 of the present application;
FIG. 9 is a block diagram of a computer apparatus according to embodiment 7 of the present application; and
FIG. 10 is a schematic diagram of a translation system according to embodiment 9 of the present application.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms or terminology appearing in the description of the embodiments of the application are explained as follows:
NMT: a technology for constructing machine translation models using neural network structures.
Language-aware Interlingua: a language-aware intermediate semantic representation module; the characteristics of each language are taken into account when converting representations in different languages into the intermediate semantic representation (interlingua).
Example 1
According to an embodiment of the present invention, an embodiment of a text translation method is also provided. It should be noted that the steps shown in the flowcharts of the figures may be performed in a computer system, such as with a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one described here.
The method according to the first embodiment of the present application may be implemented in a mobile terminal, a computer device, or a similar computing device. Fig. 1 shows a block diagram of a hardware architecture of a computer device (or mobile device) for implementing the text translation method. As shown in fig. 1, the computer device 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n), which may include but are not limited to a microprocessor (MCU) or a programmable logic device (FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, the computer device may further include: a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 1 is merely illustrative and does not limit the configuration of the electronic device described above. For example, computer device 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuits described above may be referred to generally herein as "data processing circuits". A data processing circuit may be embodied in whole or in part in software, hardware, firmware, or any other combination. Furthermore, the data processing circuit may be a single stand-alone processing module, or it may be incorporated in whole or in part into any of the other elements in the computer device 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a kind of processor control (e.g., selection of the path of the variable resistor terminal connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the text translation method in the embodiment of the present invention, and the processor 102 executes the software programs and modules stored in the memory 104, thereby executing various functional applications and data processing, that is, implementing the text translation method described above. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 104 may further include memory located remotely from processor 102, which may be connected to computer device 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission module 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communications provider of the computer device 10. In one example, the transmission module 106 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission module 106 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer device 10 (or mobile device).
It should be noted here that, in some alternative embodiments, the computer device (or mobile device) shown in fig. 1 may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that fig. 1 is only one specific example, intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
In the above-described operating environment, the present application provides a method for translating text as shown in fig. 2. Fig. 2 is a flowchart of a text translation method according to embodiment 1 of the present application.
Step S21, obtaining a text to be translated.
Specifically, the text to be translated is text information to be translated, and when translation is performed, sentences in the text to be translated can be sequentially extracted for translation.
Step S23, at least one text vector of the text to be translated is obtained, wherein the text vector is used for representing vector representations of the text to be translated in different languages.
Specifically, the above language may refer, in a narrow sense, to an individual language, such as Chinese, Korean, or English, or may refer, in a broad sense, to a language family, such as the Indo-European family or the Sino-Tibetan family. The text vector corresponding to the text to be translated in the source language, i.e., the language of the text to be translated, can be obtained.
In an alternative embodiment, at least one text vector of the text to be translated may be obtained by Word2vec, TF-IDF (term frequency-inverse document frequency), and similar methods.
In another alternative embodiment, firstly, the text to be translated may be segmented to obtain a segmentation result of the text to be translated, and then the segmentation result is encoded by an encoder to obtain at least one text vector corresponding to the text to be translated.
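As an illustration of the vectorization alternatives above, the following is a minimal sketch using scikit-learn's TfidfVectorizer on a hypothetical toy corpus; it shows the kind of per-text vectors such methods produce, not the vectors used by the patented model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for texts to be translated.
corpus = ["the cat sat on the mat", "the dog chased the cat"]

vectorizer = TfidfVectorizer()
text_vectors = vectorizer.fit_transform(corpus)  # one sparse TF-IDF vector per text
print(text_vectors.shape)  # (2, vocabulary size)
```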
Step S25, converting at least one text vector into intermediate semantics.
Specifically, the intermediate semantics may be semantic expression vectors that are independent of languages, that is, texts having the same semantics correspond to the same intermediate semantics regardless of languages.
In an alternative embodiment, the at least one text vector may be converted to intermediate semantics by an intermediate semantics module. The intermediate semantic module may include a forward neural network for extracting language independent semantic information from the text vector, thereby obtaining intermediate semantics corresponding to the text to be translated.
Prior to translation, the sample may be learned to obtain the intermediate semantic module described above. In an alternative embodiment, the sample data used for training the intermediate semantic module may be text vectors having the same semantic meaning and belonging to different languages, and the intermediate semantic meaning corresponding to the text vectors, and training is performed based on the sample data, so as to obtain the intermediate semantic module capable of predicting the intermediate semantic meaning corresponding to the text vectors. The scheme is favorable for promoting the migration of cross-language knowledge by explicitly modeling the intermediate semantics, so that the translation quality of the whole model, especially the translation quality of low-resource or zero-resource language pairs, is improved.
And step S27, decoding the intermediate semantics so that the text to be translated is translated into the target language.
In the above steps, the intermediate semantics are decoded by the decoder, and the translation result of the text to be translated is obtained, wherein the translation result is the text corresponding to the text to be translated in the target language.
It should be noted that steps S21 to S27 may be performed by a multilingual translation model, where the multilingual translation model includes an encoder, an intermediate semantic representation module, and a decoder: the encoder is configured to obtain at least one text vector from the text to be translated, the intermediate semantic representation module is configured to obtain the intermediate semantics of the text to be translated from the at least one text vector, and the decoder is configured to decode the intermediate semantics to obtain the translation result of the text to be translated.
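A minimal runnable sketch of this three-stage structure is given below. It is an illustration only: the linear "layers", the pooling step, and all shapes are stand-ins assumed for the example, not the patented network.

```python
import numpy as np

rng = np.random.default_rng(0)

class MultilingualTranslationModel:
    def __init__(self, d_model=8, n_interlingua=4):
        # Illustrative random "weights"; real layers would be trained.
        self.enc_w = rng.normal(size=(d_model, d_model))
        self.intl_w = rng.normal(size=(d_model, d_model))
        self.dec_w = rng.normal(size=(d_model, d_model))
        self.n_interlingua = n_interlingua

    def encode(self, src_vectors):
        # Encoder: turn source-language text vectors into hidden states (S23).
        return np.tanh(src_vectors @ self.enc_w)

    def to_interlingua(self, h_enc):
        # Intermediate semantic module: a fixed number of language-independent
        # semantic slots; a crude mean-pooled projection as a stand-in (S25).
        pooled = h_enc.mean(axis=0, keepdims=True)
        return np.repeat(np.tanh(pooled @ self.intl_w), self.n_interlingua, axis=0)

    def decode(self, interlingua):
        # Decoder: turn intermediate semantics into target-language states (S27).
        return np.tanh(interlingua @ self.dec_w)

model = MultilingualTranslationModel()
src = rng.normal(size=(5, 8))  # 5 source tokens, d_model = 8
out = model.decode(model.to_interlingua(model.encode(src)))
print(out.shape)  # (4, 8): one state per semantic slot
```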
It should also be noted that, in order to support more language pairs with fewer models, there are two common approaches. The first is to train models covering translation between important pivot languages (e.g., Chinese, English) and other languages to serve the translation needs of important language pairs, and at the same time to serve translation between non-pivot languages by bridging through these models. For example, if translation from German to Thai is required (de2th), the German-to-English model (de2en) can be called first and the English-to-Thai model (en2th) second. However, this approach is prone to error superposition, which is particularly serious when the two bridged models are trained on data from very different domains. Moreover, obtaining a translation result through two translation passes doubles the time consumed, so this scheme has difficulty meeting real-time translation scenarios with strict latency requirements. The second approach is to use a traditional neural machine translation model and simply force all language pairs to be trained together into one unified model; but language characteristics differ greatly across the language pairs, especially when word orders differ greatly, and such a unified model cannot solve the problem of translation knowledge conflicts caused by language uniqueness.
The embodiment of the application uses a unified multi-language translation model to execute the steps of an encoder, a decoder and extracting intermediate semantics, and extracts the intermediate semantics irrelevant to languages from the semantic space of different languages by explicitly introducing an intermediate semantics module for language perception, thereby utilizing the migration of cross-language knowledge and well relieving the problem of translation knowledge conflict caused by language uniqueness.
Compared with existing multilingual NMT technology, this scheme improves the BLEU (bilingual evaluation understudy) score by 1-2 points on two data sets. In a zero-resource translation scenario, this scheme improves BLEU by 10 points over existing multilingual NMT technology, a huge improvement in translation quality.
As an alternative embodiment, obtaining at least one text vector of the text to be translated includes: obtaining the text vectors of the text to be translated by an encoder, the encoder comprising at least one first sub-coding layer, each first sub-coding layer comprising a first attention mechanism layer and a first forward neural network layer, wherein the step of the translation model obtaining the text vectors of the text to be translated by the encoder includes: acquiring the word vectors and position vectors corresponding to the text to be translated; acquiring a first attention mechanism parameter according to the word vectors and the position vectors; and inputting the first attention mechanism parameter into the first sub-coding layers for encoding to obtain the text vectors of the text to be translated output by the first sub-coding layers, wherein, in each first sub-coding layer, the first attention mechanism layer operates on the parameters output by the previous first sub-coding layer and then inputs the result into the first forward neural network layer for encoding.
Specifically, the word vectors of the text to be translated may be obtained by segmenting the text to be translated into words and then vectorizing them, for example: a 50000 × hidden_size matrix of word embeddings may be initialized; the text to be translated is a word sequence, and each word takes its own embedding from the matrix according to its fixed word id, thereby yielding the word vectors of the text to be translated. The position vector corresponding to the text to be translated can be formed from the position vector of each word, and the position vector of each word can be obtained by the following formula:

$$PE_{(pos,2i)} = \sin\left(pos/10000^{2i/d_{model}}\right), \qquad PE_{(pos,2i+1)} = \cos\left(pos/10000^{2i/d_{model}}\right) \tag{1}$$

where $PE_{(pos,2i)}$ denotes the value at dimension $2i$ of the position vector for position $pos$, and $d_{model}$ denotes the dimension information. Under this formula, the position vector at each position has as many dimensions as the hidden layer (hidden_size), with the odd dimensions of the vector given by a cosine function and the even dimensions by a sine function.
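A minimal sketch of formula (1) in code (NumPy assumed; max_len and d_model are illustrative parameters):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    # Sinusoidal position encoding per formula (1): even dimensions use
    # sine, odd dimensions use cosine.
    pos = np.arange(max_len)[:, None]      # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]  # (1, d_model / 2)
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

print(positional_encoding(50, 512).shape)  # (50, 512)
```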
The attention mechanism parameters described above include Q, K, and V, where Q represents the query and K and V represent key-value pairs. The attention mechanism itself is a function that maps a query and a series of key-value pairs to an output; the output is obtained by a weighted sum of the values, and the weight for each value is calculated by a compatibility function from the query and the key.
The first attention mechanism layer may be used to implement a multi-head attention mechanism. Relative to a single attention mechanism, which performs one attention computation over the keys, values, and queries of d_model dimensions, multi-head attention linearly maps the queries, keys, and values h times; the attention function is performed in parallel on each mapped set of queries, keys, and values, producing output values of d_v dimensions; the h outputs of d_v dimensions are concatenated together and mapped once more to obtain the final output.
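The following sketch illustrates this multi-head computation (NumPy assumed). For brevity the learned per-head and output linear mappings are replaced by simple slicing, so it shows the head-splitting structure rather than a full implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(Q, K, V, h=8):
    # Split d_model into h heads, run scaled dot-product attention per head
    # in parallel, then concatenate the h outputs.
    d_k = Q.shape[-1] // h
    heads = []
    for i in range(h):
        q = Q[:, i * d_k:(i + 1) * d_k]
        k = K[:, i * d_k:(i + 1) * d_k]
        v = V[:, i * d_k:(i + 1) * d_k]
        att = softmax(q @ k.T / np.sqrt(d_k))  # compatibility of query and key
        heads.append(att @ v)                  # weighted sum of the values
    return np.concatenate(heads, axis=-1)

x = np.random.default_rng(0).normal(size=(5, 64))  # 5 tokens, d_model = 64
print(multi_head_attention(x, x, x).shape)         # (5, 64)
```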
Fig. 3 is a schematic diagram of a multilingual translation model according to an embodiment of the present application. In an alternative embodiment, the above steps may be performed by an encoder including six first sub-encoding layers, each including a first attention mechanism layer (self-attention) and a first forward neural network layer (FF-layer). As shown in fig. 3, the word vector Emb corresponding to the text to be translated is looked up in the word vector matrix of the source language (word embedding (src)), and the position vector pos_emb of the text to be translated is obtained by computation; then Q, K, and V are obtained from the word vector Emb and the position vector pos_emb and input to the first sub-encoding layers of the encoder, yielding the text vector h_enc = FFN(ATT(Q, K, V)) output by the encoder, where FFN is a conventional feed-forward network and ATT is multi-head attention.
As an optional embodiment, obtaining a position vector corresponding to the text to be translated includes: obtaining a source language vector corresponding to a source language, wherein the source language is the language of a text to be translated; determining a first bias vector according to the source language vector; acquiring position information of a text to be translated; and superposing the first offset vector on the position information to obtain a position vector of the text to be translated.
Specifically, the above positional information may be calculated by the above formula (1). The source language is a language to which the text to be translated belongs, the first offset vector may be a preset fixed vector, the first offset vector is determined according to the source language, and the vector corresponding to the source language is used as the first offset vector to be superimposed on the position information, so that language features of the text to be translated can be introduced into the text vector.
In an alternative embodiment, as shown in fig. 3, the first bias vector is obtained by selecting, according to the language of the text to be translated, the corresponding vector from the initialized language vectors (language embedding); the first offset vector is then superimposed on the position information to obtain the position vector.
It should be noted that the same position in a sentence can have clearly different significance in different languages. By adding an offset generated from a language vector to the position information, the above scheme makes the position vector representations corresponding to different languages different, so that a language-dependent position vector pos_emb(L_emb) is established from the first offset vector determined by the source language, thereby introducing language features of the text to be translated into the text vector.
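A minimal sketch of this language-dependent position vector (NumPy assumed; the language inventory and the randomly initialized language embeddings are hypothetical stand-ins for learned parameters):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    # Sinusoidal position information per formula (1), as in the earlier sketch.
    pos = np.arange(max_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

rng = np.random.default_rng(1)
d_model, max_len = 64, 10

# Hypothetical language inventory; in the model, the language vectors
# (language embedding) would be learned parameters.
LANG_IDS = {"zh": 0, "en": 1, "de": 2}
language_embedding = rng.normal(size=(len(LANG_IDS), d_model))

def language_aware_positions(pos_info, lang):
    # First offset vector: the source-language vector, superimposed on the
    # position information, yielding the language-dependent pos_emb(L_emb).
    return pos_info + language_embedding[LANG_IDS[lang]]

pos_emb = language_aware_positions(positional_encoding(max_len, d_model), "de")
print(pos_emb.shape)  # (10, 64)
```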
According to the scheme, the language of the text to be translated is modeled in the encoding process through the first bias vector, so that the difference of different languages is considered, and the influence of language conflict on the translation quality is reduced by the unified model of the multi-language fusion. The training, deployment and operation costs are further reduced, and the overall performance of the online service is improved, for example: the service throughput (QPS) and Response Time (RT) are greatly improved.
As an alternative embodiment, converting at least one text vector into intermediate semantics comprises: and mapping at least one text vector to a semantic space to obtain intermediate semantics, wherein the intermediate semantics are used for representing semantic representations which correspond to the text to be translated and are irrelevant to languages.
In the above step, at least one text vector is mapped to a semantic space, and the semantic space may include a plurality of sub-semantic spaces, and semantic information in the text vector is extracted through the sub-semantic spaces, so as to obtain intermediate semantics corresponding to the text vector.
As an alternative embodiment, mapping the text vector to a semantic space, resulting in intermediate semantics, includes: determining a second attention mechanism parameter according to the text vector of the text to be translated, wherein query element parameters in the second attention mechanism parameter are determined according to the source language vector of the text to be translated and subspace vectors corresponding to the query elements, key value pairs in the second attention mechanism parameter are text vectors of the text to be translated, and the semantic space comprises a plurality of sub-semantic spaces, and each sub-semantic space is provided with a corresponding subspace vector; the text vector is mapped to semantic space according to the second attention mechanism parameter.
Specifically, the query element in the second attention mechanism parameter is the parameter query, and the key-value pair in the second attention mechanism is the key-value.
The semantic space may include a plurality of sub-semantic spaces, since any semantic representation may be mapped to different dimensions, each dimension representing one piece of semantic information; for example, since most sentences contain subjects, predicates, objects, and so on, the semantic space may include a subject sub-semantic space, a predicate sub-semantic space, an object sub-semantic space, etc., each extracting its sub-semantics from the original text vector representation. Different sub-semantic spaces focus on different semantic information, so ideally the vectors of the sub-semantic spaces are orthogonal, indicating that their semantics do not contain one another; this property can also be used as a training target for training the intermediate semantic module.
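The patent does not give a formula for this orthogonality training target; the following sketch shows one common way such a constraint could be expressed (an assumption, NumPy assumed): penalize the off-diagonal entries of the Gram matrix of the normalized sub-semantic vectors.

```python
import numpy as np

def orthogonality_penalty(I):
    # I: (num_subspaces, d) matrix of sub-semantic vectors. Penalize the
    # off-diagonal entries of the cosine Gram matrix so that subspaces
    # encode semantics that do not contain one another.
    I_norm = I / np.linalg.norm(I, axis=1, keepdims=True)
    gram = I_norm @ I_norm.T
    off_diag = gram - np.eye(I.shape[0])
    return np.sum(off_diag ** 2)

I = np.random.default_rng(2).normal(size=(4, 64))  # 4 sub-semantic slots
print(orthogonality_penalty(I))
```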
The source language vector of the text to be translated is the vector corresponding to the source language, the subspace vector is the vector corresponding to a sub-semantic space, and each sub-semantic space corresponds to a universal subspace vector.
The above steps are used to convert the text vector h_enc of the text to be translated in each language into a fixed-size, language-independent intermediate semantic I, and the source language of the text to be translated needs to be considered during the conversion process.
In an alternative implementation, I = FFN(ATT(Q, K, V)), where Q = FFN(L_emb, I_emb), K and V are both h_enc, L_emb is the source language vector, and I_emb is a subspace vector, each subspace corresponding to a generic I_emb. The purpose of the intermediate semantic module is to map the encoder representations h_enc from different languages onto a fixed number of language-independent sub-semantic spaces. By introducing L_emb into the semantic extraction process, different conversions are applied for different languages.
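A minimal sketch of this extraction step (NumPy assumed; the dimensions, the single-layer stand-in for FFN, and the omitted final FFN are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 64
num_slots = 4  # fixed number of language-independent sub-semantic spaces

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

L_emb = rng.normal(size=(d,))            # source-language vector
I_emb = rng.normal(size=(num_slots, d))  # one generic vector per subspace
h_enc = rng.normal(size=(7, d))          # encoder output for 7 tokens
W_q = rng.normal(size=(2 * d, d))        # single-layer stand-in for the FFN giving Q

# Q = FFN(L_emb, I_emb): queries depend on both the source language and the
# subspace, so the semantic extraction is language-aware.
Q = np.tanh(np.concatenate([np.broadcast_to(L_emb, I_emb.shape), I_emb], axis=-1) @ W_q)

# K and V are both h_enc; attention pools token states into each semantic slot.
att = softmax(Q @ h_enc.T / np.sqrt(d))
I_repr = att @ h_enc  # I = ATT(Q, K, V); the final FFN is omitted here
print(I_repr.shape)   # (4, 64): fixed-size, language-independent interlingua
```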
As an alternative embodiment, mapping the text vector to the semantic space according to the second attention mechanism parameter comprises: mapping the text vector to a semantic space according to a second attention mechanism parameter by a semantic mapping model, wherein the semantic mapping model comprises at least one sub-semantic mapping layer, each sub-semantic mapping layer comprising: a second attention mechanism layer and a third forward neural network layer, the step of mapping text vectors to semantic space according to second attention mechanism parameters by the semantic mapping model comprising: and inputting the second attention mechanism parameters into the semantic mapping model to obtain intermediate semantics of the text vector output by the semantic mapping model, wherein in each sub-semantic mapping layer, the second attention mechanism layer operates the parameters output by the previous sub-semantic mapping layer and then inputs the parameters to a third forward neural network layer for operation.
In particular, the second attention mechanism may still be a multi-head attention mechanism.
In an alternative embodiment, as shown in connection with FIG. 3, language-aware interlingua is an intermediate semantic module that includes 3 sub-semantic mapping layers, each of which includes a second attention mechanism layer (enc-attention) and a third forward neural network layer (FF-layer). Firstly, sub-semantic vectors corresponding to query elements are searched from semantic space vectors (interlingua embedding), source language vectors corresponding to source languages of a text to be translated are obtained from language vectors (language embedding), a query is obtained through a second forward neural network layer (FF-layer) based on the sub-semantic vectors and the source language vectors, the obtained query and K and V output by an encoder are input to an intermediate semantic module, and operation is carried out through a second attention mechanism layer (enc-attention) and a third forward neural network layer (FF-layer) of the intermediate semantic module, so that semantic representation corresponding to the text vectors is obtained.
As an alternative embodiment, decoding the intermediate semantics such that the text to be translated is translated to the target language includes: acquiring context information of a text to be translated; and decoding the intermediate semantics of the text to be translated through a decoder according to the context information to obtain the text corresponding to the text to be translated in the target language.
Specifically, the context information may belong to the same file as the text to be translated, and the context information may also be a vector corresponding to the translated context. The context information may be obtained according to the context of the document to be translated, and in an alternative embodiment, the translated context of the text to be translated may be encoded to obtain the context information of the text to be translated. In the decoding process of the decoder, the context information of the text to be translated is introduced, so that the translation result refers to the context of the text to be translated, and the problem of low accuracy caused by isolated translation of the text to be translated is avoided.
As an alternative embodiment, the context information of the text to be translated includes the word vectors of the translated context and the position vectors of the translated context, and acquiring the context information of the text to be translated includes: acquiring the word vectors of the translated context; acquiring second offset information corresponding to the target language; and superposing the second offset information on the position information of the translated context to obtain the position vectors of the translated context.
Specifically, the target language is the language that is finally required to translate the text to be translated into, each language has a corresponding language vector, and the vector of the target language can be used as a second offset vector for biasing the position information of the translated context.
The context information of the text to be translated includes word vectors of the translated context and position vectors of the translated context, and in an alternative embodiment, as shown in fig. 3, word vectors corresponding to the translated context are obtained through the word matrix word embedding (tgt) of the target language, and the position information of the translated context is obtained through the above formula (1). The position information of the translated context is biased by using second bias information determined based on the target language, thereby obtaining a position vector of the translated context.
The above steps are similar to the process of encoding the text to be translated: encoding is performed with a multi-head attention mechanism, and the language-dependent position vector method is likewise adopted. In this way, the translation process is made to reference context information whose translation has already been completed.
According to the scheme, the target languages are modeled in the decoding process through the second bias vector, so that the difference of different languages is considered, and the influence of language conflict on translation quality is reduced by the unified model of the fusion of multiple languages. The training, deployment and operation costs are further reduced, and the overall performance of the online service is improved, for example: increase QPS (number of requests processed per second), decrease RT (response time).
As an alternative embodiment, the decoder comprises at least one second sub-coding layer and one classification layer, each second sub-coding layer comprising: a third attention mechanism layer, a fourth attention mechanism layer and a fourth forward neural network layer, wherein decoding, by a decoder, the intermediate semantic representation of the text to be translated according to the context information to obtain a text corresponding to the text to be translated in the target language, comprises: inputting the context information into a third attention mechanism layer to obtain an operation result of the third attention mechanism layer; inputting the operation result of the third attention mechanism layer and the intermediate semantics of the text to be translated into the fourth attention mechanism layer to obtain the operation result of the fourth attention mechanism layer; inputting the operation result of the fourth attention mechanism layer into a fourth forward neural network layer to obtain the operation result of the fourth forward neural network layer; and the classification layer determines the text corresponding to the text to be translated in the target language according to the operation result of the at least one second sub-coding layer and the target language vector corresponding to the target language.
The decoding process is implemented through the second sub-coding layers and a classification layer, wherein the third attention mechanism layer and the fourth attention mechanism layer in each second sub-coding layer may be multi-head attention mechanism layers; the third attention mechanism layer introduces the translated context information of the text to be translated into the decoding process, and the fourth attention mechanism layer introduces the intermediate semantics of the text to be translated into the decoding process. The classification layer may use a softmax mechanism, in which softmax takes the output of the second sub-coding layer as input, classifies over the entire vocabulary, and predicts the best translation result at the current position.
In an alternative embodiment, as shown in fig. 3, the decoder includes 6 second sub-coding layers and 1 classification layer. The third attention mechanism layer (self-attention) in each second sub-coding layer references the translated context information through the word vectors and position vectors of the translated context, and the fourth attention mechanism layer (intl-attention) operates on the output of the third attention mechanism layer and the intermediate semantics, meaning that the translation references the intermediate semantics of the text to be translated. Finally, the result is encoded by a fourth forward neural network layer (FF-layer) and output to the classification layer (linear & softmax), which classifies according to the output of the second sub-coding layers and the second offset vector of the target language, thereby determining the most suitable text in the target language.
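A minimal sketch of one pass through this decoder structure (NumPy assumed; the shapes, single-layer attention stand-ins, and random parameters are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
d, vocab = 64, 1000

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

ctx = rng.normal(size=(3, d))          # translated-context states (word + pos vectors)
interlingua = rng.normal(size=(4, d))  # output of the intermediate semantic module
W_ff = rng.normal(size=(d, d))
W_out = rng.normal(size=(d, vocab))    # classification layer over the word list

h = attention(ctx, ctx, ctx)                # third attention layer: self-attention
h = attention(h, interlingua, interlingua)  # fourth attention layer: intl-attention
h = np.tanh(h @ W_ff)                       # fourth forward neural network layer
probs = softmax(h @ W_out)                  # linear & softmax over the vocabulary
print(probs[-1].argmax())                   # predicted word id at the current position
```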
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method of the various embodiments of the present invention.
Example 2
According to an embodiment of the present application, there is further provided an embodiment of a text translation method, and fig. 4 is a flowchart of a text translation method according to embodiment 2 of the present application, and in combination with fig. 4, the method includes the following steps:
Step S41, obtaining a text to be translated.
Specifically, the text to be translated is text information to be translated, and when translation is performed, sentences in the text to be translated can be sequentially extracted for translation.
Step S43, translating the text to be translated into the target language according to the text vector through a multi-language translation model, wherein the multi-language translation model obtains the text vector of the text to be translated, maps the text vector to a semantic space, obtains semantic representation which corresponds to the text to be translated and is irrelevant to the language, and converts the semantic representation into the text of the target language.
Specifically, the above language may refer, in a narrow sense, to an individual language, such as Chinese, Korean, or English, or may refer, in a broad sense, to a language family, such as the Indo-European family or the Sino-Tibetan family. The text vector corresponding to the text to be translated in the source language, i.e., the language of the text to be translated, can be obtained.
The intermediate semantics can be language-independent semantic representation vectors, i.e., texts with the same semantics correspond to the same intermediate semantics regardless of their language.
In an alternative embodiment, at least one text vector of the text to be translated may be obtained by Word2vec, TF-IDF (term frequency-inverse document frequency), and similar methods.
In another alternative embodiment, firstly, the text to be translated may be segmented to obtain a segmentation result of the text to be translated, and then the segmentation result is encoded by an encoder to obtain at least one text vector corresponding to the text to be translated.
In the above steps, the intermediate semantics are decoded by the decoder, and the translation result of the text to be translated is obtained, wherein the translation result is the text corresponding to the text to be translated in the target language.
It should be noted that, the multilingual translation model in the above embodiment of the present application may also perform other steps in embodiment 1, which is not described herein.
It should be noted that if system bridging is adopted, error propagation and error superposition easily occur, and the latency is doubled. This scheme alleviates the problem of error propagation by using a unified multilingual model, without increasing invocation latency. Moreover, a language-aware intermediate semantic module is explicitly introduced, so that language-independent intermediate semantics are extracted from the semantic spaces of different languages, and the problem of translation knowledge conflicts caused by language uniqueness is well alleviated through cross-language knowledge migration.
Since the intermediate semantics are independent of any specific language, corpora for any two languages can be added to the model training, so that one multilingual model supports N languages and the number of models is reduced from the original N x (N-1) to 1. Meanwhile, thanks to the unification of the model, translation requests for different language pairs can be batched together when calling GPU computing resources, greatly improving service throughput (QPS) and response time (RT).
As an alternative embodiment, the method further comprises obtaining the multilingual translation model, wherein the step of obtaining the multilingual translation model includes: obtaining sample data, the sample data comprising sample texts and the actual translation results of translating the sample texts into other languages; acquiring the loss function of an initial multilingual translation model in the process of the initial multilingual translation model learning the sample data; and adjusting model parameters of the initial multilingual translation model with minimization of the loss function as the objective.
Since the multilingual translation model is a unified model, unified training can be performed.
As an alternative embodiment, the loss function comprises at least one of:
a. The initial multilingual translation model translates the sample text to obtain a difference value between a model translation result and an actual translation result corresponding to the sample text;
b. The initial multilingual translation model translates the actual translation result of the sample text to obtain a difference value between the model translation result and the sample text;
c. The difference between a sample text and the reconstructed text corresponding to that sample text, where the initial multilingual translation model converts the sample text into a semantic representation and then converts the semantic representation back into text in the same language as the sample text, obtaining the reconstructed text corresponding to the sample text;
d. The difference between the actual translation result corresponding to a sample text and the reconstructed text corresponding to that actual translation result, where the initial multilingual translation model converts the actual translation result of the sample text into a semantic representation and then converts the semantic representation back into text in the same language as the actual translation result, obtaining the reconstructed text corresponding to the actual translation result;
e. A distance between two semantic representations, wherein the two semantic representations comprise: the method comprises the steps of converting a sample text by an initial multi-language translation model to obtain semantic representation and converting an actual translation result corresponding to the sample text by the initial multi-language translation model to obtain semantic representation.
In order to learn better intermediate semantics, reduce conflicts between languages, and promote the migration of cross-language knowledge, the above scheme proposes several loss functions as training targets, described below in turn:
The loss function a is a translation objective (Translation objective) that uses cross entropy to measure the difference between the automatic translation (i.e., the translation obtained by the initial multilingual translation model translating the sample text) and the standard translation (i.e., the actual translation text corresponding to the sample text). This difference may be expressed as

$$L_{s2t} = -\sum_{n} \log P\left(t^{(n)} \mid s^{(n)}\right)$$

where t denotes the automatic translation target, s denotes the text to be translated, and n indexes the training samples.
The loss function b is also a translation objective (Translation objective), in the reverse direction; cross entropy measures the difference between the standard translation and the automatic translation, which may be expressed as

$$L_{t2s} = -\sum_{n} \log P\left(s^{(n)} \mid t^{(n)}\right)$$
The loss functions c and d are semantic reconstruction objectives (Reconstruction objective): so that the conversion to intermediate semantics loses as little information as possible, a semantic reconstruction target is introduced as a constraint; the translation loss of converting the original input into intermediate semantics and then translating back into text in the original language is taken as the semantic reconstruction loss, expressed as L_s2s and L_t2t respectively:

$$L_{s2s} = -\sum_{n} \log P\left(s^{(n)} \mid s^{(n)}\right), \qquad L_{t2t} = -\sum_{n} \log P\left(t^{(n)} \mid t^{(n)}\right)$$
The loss function e is a semantic consistency objective (Semantic consistency objective). The intermediate semantic module converts the representation of each language into an intermediate representation, allowing cross-language knowledge to migrate and improving translation quality. To measure the language independence of the intermediate semantic representation and the consistency of the semantic representations, a semantic consistency target is introduced: the distance between the intermediate semantic representations I_s and I_t generated for the source text and the target text respectively is measured, and the smaller the distance, the more consistent the semantic representations. This loss function can be expressed as $L_{dist} = 1 - \mathrm{sim}(I_s, I_t)$.
In an alternative embodiment, if the total of the loss functions described above is used as the minimization objective, the objective function is

$$L = L_{s2t} + L_{t2s} + L_{s2s} + L_{t2t} + L_{dist}$$
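A minimal sketch combining these objectives (NumPy assumed; the component loss values are placeholders, cosine similarity is assumed for sim, and the equal weighting mirrors the summed form above, which is itself an assumption):

```python
import numpy as np

def total_loss(l_s2t, l_t2s, l_s2s, l_t2t, I_s, I_t):
    # Loss e: semantic consistency, L_dist = 1 - sim(I_s, I_t), with
    # cosine similarity assumed as the sim function.
    cos = np.dot(I_s, I_t) / (np.linalg.norm(I_s) * np.linalg.norm(I_t))
    l_dist = 1.0 - cos
    # Losses a-d are the cross-entropy translation / reconstruction terms;
    # equal weighting is an assumption, not stated by the patent.
    return l_s2t + l_t2s + l_s2s + l_t2t + l_dist

I_s = np.random.default_rng(5).normal(size=64)                # interlingua of source
I_t = I_s + 0.01 * np.random.default_rng(6).normal(size=64)   # interlingua of target
print(total_loss(2.1, 2.3, 0.9, 1.0, I_s, I_t))
```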
This scheme proposes a combination of multiple training targets: the semantic reconstruction targets ensure that introducing the interlingua incurs minimal semantic loss, and the semantic consistency target constrains the modeled interlingua to be sufficiently language-independent, so that semantic consistency is good.
Example 3
According to an embodiment of the present application, there is further provided a text translating apparatus for implementing the text translating method of embodiment 1, and fig. 5 is a schematic diagram of a text translating apparatus according to embodiment 3 of the present application, as shown in fig. 5, the apparatus 500 includes:
A first obtaining module 502, configured to obtain text to be translated.
A second obtaining module 504, configured to obtain at least one text vector of the text to be translated, where the text vector is used to represent a vector representation of the text to be translated corresponding to different languages.
A conversion module 506 for converting the at least one text vector into intermediate semantics.
The decoding module 508 is configured to decode the intermediate semantics, so that the text to be translated is translated into the target language.
It should be noted that the first acquisition module 502, the second acquisition module 504, the conversion module 506, and the decoding module 508 correspond to steps S21 to S27 in embodiment 1; the examples and application scenarios implemented by the four modules are the same as those of the corresponding steps, but are not limited to the disclosure of embodiment 1. It should be noted that the above modules may be implemented as part of the apparatus in the computer device 10 provided in embodiment 1.
As an alternative embodiment, the second acquisition module includes: a first acquisition sub-module, configured to obtain, by an encoder, the text vectors of the text to be translated, where the encoder includes at least one first sub-coding layer, each first sub-coding layer comprising a first attention mechanism layer and a first forward neural network layer, and the first acquisition sub-module includes: a first acquisition unit for acquiring the word vectors and position vectors corresponding to the text to be translated; a second acquisition unit for acquiring the first attention mechanism parameter according to the word vectors and position vectors; and a first input unit for inputting the first attention mechanism parameter into the first sub-coding layers for encoding to obtain the text vectors of the text to be translated output by the first sub-coding layers, where, in each first sub-coding layer, the first attention mechanism layer operates on the parameters output by the previous first sub-coding layer and then inputs the result into the first forward neural network layer for encoding.
As an alternative embodiment, the first acquisition unit comprises: the first acquisition subunit is used for acquiring a source language vector corresponding to a source language, wherein the source language is the language of the text to be translated; a determining subunit, configured to determine a first bias vector according to the source language vector; the second acquisition subunit is used for acquiring the position information of the text to be translated; and the superposition subunit is used for superposing the first offset vector on the position information to obtain a position vector of the text to be translated.
As an alternative embodiment, the conversion module comprises: and the mapping sub-module is used for mapping at least one text vector to a semantic space to obtain intermediate semantics, wherein the intermediate semantics are used for representing semantic representations which correspond to the text to be translated and are irrelevant to languages.
As an alternative embodiment, the mapping submodule includes: the first determining unit is used for determining a second attention mechanism parameter according to the text vector of the text to be translated, wherein query element parameters in the second attention mechanism parameter are determined according to the source language vector of the text to be translated and subspace vectors corresponding to the query elements, key value pairs in the second attention mechanism parameter are text vectors of the text to be translated, the semantic space comprises a plurality of sub-semantic spaces, and each sub-semantic space is provided with a corresponding subspace vector; and the mapping unit is used for mapping the text vector to the semantic space according to the second attention mechanism parameter.
As an alternative embodiment, the mapping unit is further configured to map the text vector to the semantic space according to the second attention mechanism parameter by means of a semantic mapping model, wherein the semantic mapping model comprises at least one sub-semantic mapping layer, each sub-semantic mapping layer comprising: a second attention mechanism layer and a third forward neural network layer, the mapping unit comprising: and the input subunit is used for inputting the second attention mechanism parameters into the semantic mapping model to obtain intermediate semantics of the text vector output by the semantic mapping model, wherein in each sub-semantic mapping layer, the second attention mechanism layer carries out operation on the parameters output by the previous sub-semantic mapping layer and then inputs the parameters into the third forward neural network layer for operation.
As an alternative embodiment, the decoding module comprises: the second acquisition submodule is used for acquiring the context information of the text to be translated; and the decoding submodule is used for decoding the intermediate semantics of the text to be translated according to the context information through the decoder to obtain the text corresponding to the text to be translated in the target language.
As an alternative embodiment, the context information of the text to be translated includes: the word vector of the translated context and the location vector of the translated context, the second acquisition submodule comprising: a third obtaining unit for obtaining word vectors of translated contexts; the fourth acquisition unit is used for acquiring second offset information corresponding to the target language; and the superposition unit is used for superposing the second offset information on the position information of the translated context to obtain a position vector of the translated context.
As an alternative embodiment, the decoder comprises at least one second sub-coding layer and one classification layer, each second sub-coding layer comprising: a third attention mechanism layer, a fourth attention mechanism layer, and a fourth forward neural network layer, wherein the decoding submodule includes: the second input unit is used for inputting the context information to the third attention mechanism layer to obtain an operation result of the third attention mechanism layer; the third input unit is used for inputting the operation result of the third attention mechanism layer and the intermediate semantics of the text to be translated into the fourth attention mechanism layer to obtain the operation result of the fourth attention mechanism layer; the fourth input unit is used for inputting the operation result of the fourth attention mechanism layer into the fourth forward neural network layer for coding to obtain the operation result of the fourth forward neural network layer; and the classifying unit is used for determining the text corresponding to the text to be translated in the target language according to the operation result of the at least one second sub-coding layer and the target language vector corresponding to the target language by the classifying layer.
Example 4
There is further provided a text translating apparatus for implementing the text translating method of embodiment 2 according to an embodiment of the present application, and fig. 6 is a schematic diagram of a text translating apparatus according to embodiment 4 of the present application, as shown in fig. 6, and the apparatus 600 includes:
an obtaining module 602, configured to obtain text to be translated.
And the translation module 604 is configured to translate the text to be translated into the target language through a multilingual translation model, wherein the multilingual translation model obtains a text vector of the text to be translated, maps the text vector to a semantic space, obtains a semantic representation corresponding to the text to be translated, which is irrelevant to the language, and converts the semantic representation into the text of the target language.
It should be noted that the obtaining module 602 and the translation module 604 correspond to steps S41 to S43 in embodiment 2; the two modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the disclosure of that embodiment. It should also be noted that the above modules may be implemented as part of the apparatus in the computer device 10 provided in the first embodiment.
As an alternative embodiment, the above device further comprises: the model acquisition module is used for acquiring a language translation model, wherein the model acquisition module comprises: the sample acquisition submodule is used for acquiring sample data, wherein the sample data comprises: sample text and actual translation results of translating the sample text into other languages; the loss function acquisition sub-module is used for acquiring a loss function of the initial multi-language translation model in the process of learning the sample data by the initial multi-language translation model; and the determining submodule is used for determining the loss function as a minimum objective function and adjusting model parameters of the initial multilingual translation model.
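A minimal sketch of the training step implied by the model acquisition module, assuming a PyTorch-style model exposing a hypothetical compute_losses method that returns the loss terms enumerated below:

```python
def train_step(model, optimizer, sample_text, actual_translation):
    """One parameter update: evaluate the loss terms on a
    (sample text, actual translation) pair and minimize their sum."""
    optimizer.zero_grad()
    # compute_losses is an assumed API returning a dict of loss tensors.
    losses = model.compute_losses(sample_text, actual_translation)
    total = sum(losses.values())
    total.backward()   # adjust model parameters toward the minimum objective
    optimizer.step()
    return float(total)
```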
As an alternative embodiment, the loss function comprises at least one of:
The initial multilingual translation model translates the sample text to obtain a difference value between a model translation result and an actual translation result corresponding to the sample text;
the initial multilingual translation model translates the actual translation result of the sample text to obtain a difference value between the model translation result and the sample text;
a difference value between the sample text and the converted text corresponding to the sample text, wherein the initial multi-language translation model converts the sample text into a semantic representation and then converts the semantic representation back into text in the same language as the sample text, thereby obtaining the converted text corresponding to the sample text;
a difference value between the actual translation result corresponding to the sample text and the converted text corresponding to the actual translation result, wherein the initial multi-language translation model converts the actual translation result into a semantic representation and then converts the semantic representation back into text in the same language as the actual translation result, thereby obtaining the converted text corresponding to the actual translation result;
a distance between two semantic representations, wherein the two semantic representations comprise: the method comprises the steps of converting a sample text by an initial multi-language translation model to obtain semantic representation and converting an actual translation result corresponding to the sample text by the initial multi-language translation model to obtain semantic representation.
Example 5
According to an embodiment of the present application, there is further provided a text translation method, and fig. 7 is a flowchart of a text translation method according to embodiment 5 of the present application. As shown in fig. 7, the method includes:
step S71, obtaining word vectors of the text to be translated and source language vectors of the text to be translated, and determining text vectors corresponding to the text to be translated, wherein the source language is the language of the text to be translated.
Specifically, the text to be translated is text information to be translated, and when translation is performed, sentences in the text to be translated can be sequentially extracted for translation.
In an alternative embodiment, the text vector may be determined by a multi-head attention mechanism according to the word vector and the position vector corresponding to the text to be translated, where when determining the position vector, the position vector is biased using the source language vector as a bias vector, so that the source language information is introduced into the text vector.
Step S73, converting the text vector into intermediate semantics.
Specifically, the intermediate semantics may be semantic expression vectors that are independent of languages, that is, texts having the same semantics correspond to the same intermediate semantics regardless of languages.
In an alternative embodiment, the conversion of the text vector into intermediate semantics may also draw on the source language information. Specifically, a multi-head attention mechanism is executed by introducing the subspace vectors corresponding to the sub-semantic spaces and the source language vector of the text to be translated, so that language information of the text to be translated is also introduced in the process of producing the intermediate semantics.
And step S75, decoding the intermediate semantics according to the target language vector so that the text to be translated is translated into the target language.
In an alternative embodiment, the target language vector may be introduced for translation during decoding, such that the translation process references the target language.
In an alternative embodiment, the decoding process may also refer to the translated context by encoding information that introduces the translated context of the text to be translated prior to decoding, to further improve the translation accuracy.
Therefore, the scheme models the characteristics of each language during text vector acquisition, intermediate semantic extraction, and decoding, taking the differences between languages into account. This reduces the impact of language conflicts on the fused unified model, meets serviceability requirements, greatly reduces the costs of training, deployment, and operation, and improves the overall performance of the online service.
As an alternative embodiment, converting the text vector into intermediate semantics includes: mapping the at least one text vector to a semantic space to obtain the intermediate semantics, wherein the intermediate semantics are used for representing semantic representations which correspond to the text to be translated and are irrelevant to languages.
In the above step, at least one text vector is mapped to a semantic space, and the semantic space may include a plurality of sub-semantic spaces, and semantic information in the text vector is extracted through the sub-semantic spaces, so as to obtain intermediate semantics corresponding to the text vector. The specific embodiment is described in example 1, and is not repeated here.
As an alternative embodiment, different source languages correspond to different intermediate semantic modules, and converting the text vector into intermediate semantics includes: calling the intermediate semantic module corresponding to the source language according to the source language vector of the text to be translated; and converting the text vector into intermediate semantics through the called intermediate semantic module, wherein the intermediate semantics are used for representing semantic representations which correspond to the text to be translated and are irrelevant to languages.
In the above scheme, different source languages correspond to different intermediate semantic modules, that is, the intermediate semantic modules are used for converting text vectors of one language into intermediate semantics, so that when conversion is performed, the intermediate semantic modules corresponding to the source languages need to be called according to the source languages. For example, the source language of the text to be translated is german, so when the text to be translated is converted, an intermediate semantic module corresponding to german is called first, and the text vector corresponding to the text to be translated in german is converted into intermediate semantics by the intermediate semantic module.
In an alternative embodiment, the intermediate semantic modules corresponding to different source languages may have corresponding identifiers, the text to be translated also has an identifier representing the source language thereof, the corresponding intermediate semantic module may be searched according to the identifier corresponding to the text to be translated, and the searched intermediate semantic module may be called, so as to convert the text vector into the intermediate semantic.
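As an illustrative sketch of this identifier-based dispatch (the registry, language identifiers, and module type below are hypothetical; the description only requires that each source language map to its own intermediate semantic module):

```python
from typing import Callable, Dict

import torch

# Hypothetical registry: one intermediate semantic module per source language.
InterlinguaModule = Callable[[torch.Tensor], torch.Tensor]
interlingua_registry: Dict[str, InterlinguaModule] = {}

def to_interlingua(text_vector: torch.Tensor, source_lang_id: str) -> torch.Tensor:
    """Look up the module by the source-language identifier carried by the
    text to be translated (e.g. "de" for German) and convert the text
    vector into the language-independent intermediate semantics."""
    module = interlingua_registry[source_lang_id]
    return module(text_vector)
```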
Example 6
There is further provided a text translating apparatus for implementing the text translating method of embodiment 5 according to an embodiment of the present application, and fig. 8 is a schematic diagram of a text translating apparatus according to embodiment 6 of the present application, as shown in fig. 8, and the apparatus 800 includes:
The obtaining module 802 is configured to obtain a word vector of the text to be translated and a source language vector of the text to be translated, and determine a text vector corresponding to the text to be translated, where the source language is a language of the text to be translated.
A conversion module 804 for converting the text vector into intermediate semantics.
The decoding module 806 is configured to decode the intermediate semantics according to the target language vector, so that the text to be translated is translated into the target language.
It should be noted that the obtaining module 802, the conversion module 804, and the decoding module 806 correspond to steps S71 to S75 in embodiment 5; the three modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the disclosure of that embodiment. It should also be noted that the above modules may be implemented as part of the apparatus in the computer device 10 provided in the first embodiment.
As an alternative embodiment, the conversion module includes: the calling sub-module is used for calling an intermediate semantic module corresponding to the source language according to the source language vector of the text to be translated; and the conversion sub-module is used for converting the text vector into intermediate semantics through the called intermediate semantics module, wherein the intermediate semantics are used for representing semantic representations which correspond to the text to be translated and are irrelevant to languages.
Example 7
Embodiments of the present invention may provide a computer device, which may be any one of a group of computer devices. Alternatively, in the present embodiment, the above-mentioned computer device may be replaced with a terminal device such as a mobile terminal.
Alternatively, in this embodiment, the above-mentioned computer device may be located in at least one network device among a plurality of network devices of the computer network.
In this embodiment, the above-mentioned computer device may execute the program code of the following steps in the text translation method: acquiring a text to be translated; obtaining at least one text vector of a text to be translated, wherein the text vector is used for representing vector representations of the text to be translated in different languages; converting at least one text vector into intermediate semantics; and decoding the intermediate semantics to translate the text to be translated into the target language.
Alternatively, fig. 9 is a block diagram of a computer device according to embodiment 7 of the present application. As shown in fig. 9, the computer device A may include: one or more (only one shown) processors 902, a memory 904, and a peripheral interface 906.
The memory may be used to store software programs and modules, such as program instructions/modules corresponding to the text translation method and apparatus in the embodiments of the present invention; the processor executes the software programs and modules stored in the memory, thereby performing various functional applications and data processing, that is, implementing the text translation method described above. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located relative to the processor, which may be connected to the computer device A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may call the information and the application program stored in the memory through the transmission device to perform the following steps: acquiring a text to be translated; obtaining at least one text vector of a text to be translated, wherein the text vector is used for representing vector representations of the text to be translated in different languages; converting at least one text vector into intermediate semantics; and decoding the intermediate semantics to translate the text to be translated into the target language.
Optionally, the above processor may further execute program code for: obtaining a text vector of the text to be translated by an encoder, the encoder comprising: at least one first sub-coding layer, each first sub-coding layer comprising: a first attention mechanism layer and a first forward neural network layer, wherein the step of obtaining the text vector of the text to be translated by the encoder comprises: acquiring word vectors and position vectors corresponding to the text to be translated; acquiring a first attention mechanism parameter according to the word vector and the position vector; and inputting the first attention mechanism parameter into the first sub-coding layers for coding, to obtain the text vector of the text to be translated output by the first sub-coding layers, wherein in each first sub-coding layer, the first attention mechanism layer operates on the parameters output by the previous first sub-coding layer and then inputs the result into the first forward neural network layer for coding.
Optionally, the above processor may further execute program code for: obtaining a source language vector corresponding to a source language, wherein the source language is the language of a text to be translated; determining a first bias vector according to the source language vector; acquiring position information of a text to be translated; and superposing the first offset vector on the position information to obtain a position vector of the text to be translated.
Optionally, the above processor may further execute program code for: and mapping at least one text vector to a semantic space to obtain intermediate semantics, wherein the intermediate semantics are used for representing semantic representations which correspond to the text to be translated and are irrelevant to languages.
Optionally, the above processor may further execute program code for: determining a second attention mechanism parameter according to the text vector of the text to be translated, wherein query element parameters in the second attention mechanism parameter are determined according to the source language vector of the text to be translated and subspace vectors corresponding to the query elements, key value pairs in the second attention mechanism parameter are text vectors of the text to be translated, and the semantic space comprises a plurality of sub-semantic spaces, and each sub-semantic space is provided with a corresponding subspace vector; the text vector is mapped to semantic space according to the second attention mechanism parameter.
Optionally, the above processor may further execute program code for: mapping the text vector to a semantic space according to a second attention mechanism parameter by a semantic mapping model, wherein the semantic mapping model comprises at least one sub-semantic mapping layer, each sub-semantic mapping layer comprising: a second attention mechanism layer and a third forward neural network layer, the step of mapping text vectors to semantic space according to second attention mechanism parameters by the semantic mapping model comprising: and inputting the second attention mechanism parameters into the semantic mapping model to obtain intermediate semantics of the text vector output by the semantic mapping model, wherein in each sub-semantic mapping layer, the second attention mechanism layer operates the parameters output by the previous sub-semantic mapping layer and then inputs the parameters to a third forward neural network layer for operation.
Optionally, the above processor may further execute program code for: acquiring context information of a text to be translated; and decoding the intermediate semantics of the text to be translated through a decoder according to the context information to obtain the text corresponding to the text to be translated in the target language.
Optionally, the above processor may further execute program code for the following, where the context information of the text to be translated includes the word vector of the translated context and the position vector of the translated context: acquiring the word vector of the translated context; acquiring second offset information corresponding to the target language; and superposing the second offset information on the position information of the translated context to obtain the position vector of the translated context.
Optionally, the above processor may further execute program code for: inputting the context information into a third attention mechanism layer to obtain an operation result of the third attention mechanism layer; inputting the operation result of the third attention mechanism layer and the intermediate semantics of the text to be translated into the fourth attention mechanism layer to obtain the operation result of the fourth attention mechanism layer; inputting the operation result of the fourth attention mechanism layer into a fourth forward neural network layer for coding to obtain the operation result of the fourth forward neural network layer; and the classification layer determines the text corresponding to the text to be translated in the target language according to the operation result of the at least one second sub-coding layer and the target language vector corresponding to the target language.
The embodiment of the invention provides a text translation method in which the encoder, the decoder, and the extraction of intermediate semantics are executed by a unified multi-language translation model. A language-aware intermediate semantic module is explicitly introduced to extract language-independent intermediate semantics from the semantic spaces of different languages, so that the migration of cross-language knowledge effectively alleviates the translation knowledge conflicts caused by language uniqueness.
It will be appreciated by those skilled in the art that the configuration shown in fig. 9 is only illustrative, and the computer device may be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palm computer, a mobile internet device (Mobile Internet Device, MID), a PAD, etc. Fig. 9 does not limit the structure of the above electronic device. For example, computer device A may also include more or fewer components (e.g., a network interface, a display device, etc.) than shown in fig. 9, or have a different configuration from that shown in fig. 9.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program for instructing a terminal device to execute in association with hardware, the program may be stored in a computer readable storage medium, and the storage medium may include: flash disk, read-Only Memory (ROM), random-access Memory (Random Access Memory, RAM), magnetic disk or optical disk, etc.
Example 8
The embodiment of the invention also provides a storage medium. Alternatively, in this embodiment, the storage medium may be used to store program code executed by the text translation method provided in the first embodiment.
Alternatively, in this embodiment, the storage medium may be located in any one of a group of computer devices in a computer network, or in any one of a group of mobile terminals.
Alternatively, in the present embodiment, the storage medium is configured to store program code for performing the steps of: acquiring a text to be translated; obtaining at least one text vector of a text to be translated, wherein the text vector is used for representing vector representations of the text to be translated in different languages; converting at least one text vector into intermediate semantics; and decoding the intermediate semantics to translate the text to be translated into the target language.
Example 9
Embodiments of the present application also provide a translation system, and fig. 10 is a schematic diagram of a translation system according to embodiment 9 of the present application, and in combination with fig. 10, the system includes:
And the encoder 90 is configured to encode the text to be translated to obtain at least one text vector of the text to be translated, where the text vector is used to represent a vector representation of the text to be translated in different languages.
Specifically, the text to be translated is the text information to be translated; when translating, sentences in the text to be translated can be extracted sequentially for translation. The above language may denote a type of language in the narrow sense, such as Chinese, Korean, English, etc., and may also denote a broad language family, such as the Indo-European family, the Sino-Tibetan family, and the like. The text vector corresponding to the text to be translated in the source language, i.e., the language of the text to be translated, can be obtained.
In another alternative embodiment, firstly, the text to be translated may be segmented to obtain a segmentation result of the text to be translated, and then the segmentation result is encoded by an encoder to obtain at least one text vector corresponding to the text to be translated.
An intermediate semantic module 92, in communication with the encoder, for converting the at least one text vector into intermediate semantics.
Specifically, the intermediate semantics may be semantic expression vectors that are independent of languages, that is, texts having the same semantics correspond to the same intermediate semantics regardless of languages.
In an alternative embodiment, the at least one text vector may be converted to intermediate semantics by an intermediate semantics module. The intermediate semantic module may include a forward neural network for extracting language independent semantic information from the text vector, thereby obtaining intermediate semantics corresponding to the text to be translated.
Prior to translation, sample data may be learned to obtain the intermediate semantic module described above. In an alternative embodiment, the sample data used for training the intermediate semantic module may be text vectors that have the same semantics but belong to different languages, together with the intermediate semantics corresponding to those text vectors; training on such sample data yields an intermediate semantic module capable of predicting the intermediate semantics corresponding to a text vector. By explicitly modeling the intermediate semantics, the scheme promotes the migration of cross-language knowledge, improving the translation quality of the whole model, especially for low-resource or zero-resource language pairs.
And the decoder 94 is in communication connection with the intermediate semantic module and is used for decoding the intermediate semantic so that the text to be translated is translated into the target language.
In the above steps, the intermediate semantics are decoded by the decoder, and the translation result of the text to be translated is obtained, wherein the translation result is the text corresponding to the text to be translated in the target language.
It should be noted that the above-mentioned encoder, intermediate semantic module, and decoder may constitute a multi-language translation model, where the encoder is configured to obtain at least one text vector from the text to be translated, the intermediate semantic representation module is configured to obtain the intermediate semantics of the text to be translated from the at least one text vector, and the decoder is configured to decode the intermediate semantics to obtain the translation result of the text to be translated.
It should also be noted that, to support more language pairs with fewer models, there are two common approaches. The first is to train models covering translation between important pivot languages (e.g., Chinese, English) and other languages, serving the translation needs of important language pairs directly and, through bridging, the translation needs between non-pivot languages. For example, translation from German to Thai (de2th) can be served by first translating German to English (de2en) and then calling English to Thai (en2th). However, this approach is prone to error accumulation, which is especially severe when the two bridging models are trained on data from very different domains. Moreover, obtaining a result through two successive translations is time-consuming, so this scheme struggles to meet real-time translation scenarios with strict latency requirements. The second approach is to use a traditional neural machine translation model and simply force all language pairs to be trained together into a unified model; but language characteristics differ greatly across the constituent language pairs, especially where word order differs substantially, and such a unified model cannot resolve the translation knowledge conflicts caused by language uniqueness.
As an alternative embodiment, the encoder includes: a plurality of first sub-coding layers, each of the first sub-coding layers comprising:
The first attention mechanism layer is used for carrying out operation according to a first attention mechanism parameter, wherein the first attention mechanism parameter is determined according to a word vector and a position vector corresponding to the text to be translated;
and the first forward neural network layer is used for encoding according to the operation result of the first attention mechanism layer to obtain the text vector of the text to be translated.
Fig. 3 is a schematic diagram of a multi-language translation model according to an embodiment of the present application. In an alternative embodiment, the above steps may be performed by an encoder comprising six first sub-coding layers, each including a first attention mechanism layer (self-attention) and a first forward neural network layer (FF-layer). As shown in fig. 3, the word vector Emb corresponding to the text to be translated is looked up from the vector matrix (word embedding (src)) of the source language, and the position vector pos_emb of the text to be translated is obtained through operation; then Q, K, V are obtained from the word vector Emb and the position vector pos_emb and input to the first sub-coding layers of the encoder, yielding the text vector output by the encoder, h_enc = FFN(ATT(Q, K, V)), where FFN is a conventional feed-forward network and ATT is multi-head attention (Multi-head attention).
In an alternative embodiment, as shown in fig. 3, the first bias vector is obtained by selecting, according to the language of the text to be translated, the corresponding vector from the initialized language vectors (language embedding); the first bias vector is then superposed on the position information to obtain the position vector.
It should be noted that the same position in a sentence can have clearly different meanings in different languages. By adding an offset generated from the language vector to the position information, the above scheme makes the position vector representations corresponding to different languages different: a language-dependent position vector pos_emb(l_emb) is established from the first bias vector determined by the source language, thereby introducing language features of the text to be translated into the text vector.
According to the scheme, the language of the text to be translated is modeled in the encoding process through the first bias vector, so that the difference of different languages is considered, and the influence of language conflict on the translation quality is reduced by the unified model of the multi-language fusion. The training, deployment and operation costs are further reduced, and the overall performance of the online service is improved, for example: the service throughput (QPS) and Response Time (RT) are greatly improved.
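As an illustrative sketch of one first sub-coding layer and the language-biased encoder input (assuming PyTorch; residual connections and layer normalization are omitted for brevity, and all dimensions are illustrative):

```python
import torch
import torch.nn as nn

class FirstSubCodingLayer(nn.Module):
    """Self-attention followed by a feed-forward network:
    h = FFN(ATT(Q, K, V)), with Q = K = V = the previous layer's output."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.self_attn(x, x, x)
        return self.ffn(attn_out)

def encoder_input(word_emb, pos_info, src_lang_emb):
    """Language-aware position vector pos_emb(l_emb): the source-language
    vector is superposed on the position information as a bias, so the
    same position is represented differently in different languages."""
    return word_emb + pos_info + src_lang_emb
```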
As an alternative embodiment, the above intermediate semantic module includes:
The second forward neural network layer is used for determining query element parameters in the second attention mechanism parameters according to the source language vectors of the text to be translated and subspace vectors corresponding to the query elements, wherein the semantic space comprises a plurality of sub-semantic spaces, and each sub-semantic space is provided with a corresponding subspace vector;
A plurality of semantic mapping models, each semantic mapping model comprising:
the second attention mechanism layer is used for carrying out operation according to a second attention mechanism parameter, wherein key value pairs in the second attention mechanism parameter are text vectors of the text to be translated;
and the third forward neural network layer is used for predicting the intermediate semantics of the text to be translated according to the operation result of the second attention mechanism layer.
In an alternative embodiment, as shown in fig. 3, the intermediate semantic module (language-aware interlingua) includes a second forward neural network layer and three sub-semantic mapping layers, each sub-semantic mapping layer including a second attention mechanism layer (enc-attention) and a third forward neural network layer (FF-layer (b)). First, the subspace vectors corresponding to the query elements are looked up from the semantic space vectors (interlingua embedding), and the source language vector corresponding to the source language of the text to be translated is obtained from the language vectors (language embedding). A query is computed from the subspace vectors and the source language vector through a second forward neural network layer (FF-layer (a)); the resulting query, together with the K and V output by the encoder, is input to the intermediate semantic module, and the semantic representation corresponding to the text vector is obtained through the operation of the second attention mechanism layer (enc-attention) and the third forward neural network layer (FF-layer (b)).
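The following is a hedged sketch of the language-aware interlingua module, collapsed to a single sub-semantic mapping layer for brevity (layer sizes and the exact forms of FF-layer (a) and FF-layer (b) are assumptions):

```python
import torch
import torch.nn as nn

class LanguageAwareInterlingua(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8,
                 n_subspaces: int = 64, n_langs: int = 16):
        super().__init__()
        self.interlingua_emb = nn.Embedding(n_subspaces, d_model)  # subspace vectors
        self.lang_emb = nn.Embedding(n_langs, d_model)             # language vectors
        self.ff_a = nn.Linear(2 * d_model, d_model)                # FF-layer (a)
        self.enc_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff_b = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())  # FF-layer (b)

    def forward(self, h_enc: torch.Tensor, src_lang: torch.Tensor) -> torch.Tensor:
        """h_enc: encoder output [batch, seq, d_model]; src_lang: [batch]."""
        batch = h_enc.size(0)
        sub = self.interlingua_emb.weight.unsqueeze(0).expand(batch, -1, -1)
        lang = self.lang_emb(src_lang).unsqueeze(1).expand_as(sub)
        query = self.ff_a(torch.cat([sub, lang], dim=-1))  # language-aware query
        attn_out, _ = self.enc_attn(query, h_enc, h_enc)   # K = V = encoder output
        return self.ff_b(attn_out)                         # intermediate semantics
```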
As an alternative embodiment, the decoder includes: at least one second sub-coding layer and a classification layer, each of said second sub-coding layers comprising:
The third attention mechanism layer is used for extracting the context information of the text to be translated by operating the context of the text to be translated;
a fourth attention mechanism layer, configured to introduce the semantic representation of the text to be translated by operating on the operation result of the third attention mechanism layer and the intermediate semantics of the text to be translated;
The fourth forward neural network layer is used for encoding the intermediate semantics of the text to be translated according to the operation result of the fourth attention mechanism layer to obtain an encoding result carrying context information;
the classifying layer is configured to determine, according to an operation result of the at least one second sub-coding layer and a target language vector corresponding to a target language, a text corresponding to the text to be translated in the target language, where the operation result of the at least one second sub-coding layer is a coding result output by a fourth forward neural network layer in the last second sub-coding layer.
The context information of the text to be translated includes the word vector of the translated context and the position vector of the translated context, and in an alternative embodiment, as shown in fig. 3, the word vector corresponding to the translated context is obtained through the word matrix word embedding (tgt) of the target language, and the position information of the translated context is obtained through the formula (1) in embodiment 1. The position information of the translated context is biased by using second bias information determined based on the target language, thereby obtaining a position vector of the translated context.
In an alternative embodiment, as shown in fig. 3, the decoder includes six second sub-coding layers and a classification layer. The third attention mechanism layer (self-attention) in each second sub-coding layer references the translated context information according to the word vector and the position vector of the translated context, and the fourth attention mechanism layer (intl attention) operates on the output of the third attention mechanism layer and the intermediate semantics, so that the translation references the intermediate semantics of the text to be translated. Finally, the result is encoded by a fourth forward neural network layer (FF-layer) and output to the classification layer, which classifies according to the output of the second sub-coding layers and the second offset vector of the target language, thereby determining the most suitable target-language word.
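A corresponding sketch of one second sub-coding layer and the classification layer (residuals, normalization, and causal masking are omitted; how the target language vector enters the classifier is an assumption):

```python
import torch
import torch.nn as nn

class SecondSubCodingLayer(nn.Module):
    """Self-attention over the translated context, interlingua attention
    over the intermediate semantics, then a feed-forward network."""
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.intl_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))

    def forward(self, ctx: torch.Tensor, interlingua: torch.Tensor) -> torch.Tensor:
        c, _ = self.self_attn(ctx, ctx, ctx)                # translated context
        i, _ = self.intl_attn(c, interlingua, interlingua)  # reference interlingua
        return self.ffn(i)

class ClassificationLayer(nn.Module):
    """Scores target-language words from the last sub-coding layer's
    output combined with the target language vector."""
    def __init__(self, d_model: int, vocab_size: int, n_langs: int = 16):
        super().__init__()
        self.lang_emb = nn.Embedding(n_langs, d_model)
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, h: torch.Tensor, tgt_lang: torch.Tensor) -> torch.Tensor:
        return self.proj(h + self.lang_emb(tgt_lang).unsqueeze(1))
```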
As an alternative embodiment, different source languages correspond to different intermediate semantic modules, and the translation system further invokes the corresponding intermediate semantic modules according to the source language of the text to be translated.
In the above scheme, different source languages correspond to different intermediate semantic modules, that is, the intermediate semantic modules are used for converting text vectors of one language into intermediate semantics, so that when conversion is performed, the intermediate semantic modules corresponding to the source languages need to be called according to the source languages. For example, the source language of the text to be translated is german, so when the text to be translated is converted, an intermediate semantic module corresponding to german is called first, and the text vector corresponding to the text to be translated in german is converted into intermediate semantics by the intermediate semantic module.
In an alternative embodiment, the intermediate semantic modules corresponding to different source languages may have corresponding identifiers, the text to be translated also has an identifier representing the source language thereof, the corresponding intermediate semantic module may be searched according to the identifier corresponding to the text to be translated, and the searched intermediate semantic module may be called, so as to convert the text vector into the intermediate semantic.
As an alternative embodiment, the intermediate semantic module is located within the encoder or the decoder.
In the above scheme, the intermediate semantic module may be disposed in the encoder, and process the text to be translated as a whole with the encoder, or may be disposed in the decoder, and process the result output by the encoder as a whole with the decoder.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described apparatus embodiments are merely exemplary; for example, the division of the units is merely a logical function division, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings or direct couplings or communication connections shown or discussed may be through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which are intended to be comprehended within the scope of the present invention.

Claims (22)

1. A method for translating text, comprising:
Acquiring a text to be translated;
acquiring at least one text vector of the text to be translated, wherein the text vector is used for representing vector representations of the text to be translated in different languages;
converting the at least one text vector into intermediate semantics;
Decoding the intermediate semantics to enable the text to be translated into a target language;
Wherein the obtaining at least one text vector of the text to be translated includes: obtaining a text vector of the text to be translated by an encoder, the encoder comprising: at least one first sub-coding layer, each of the first sub-coding layers comprising: a first attention mechanism layer and a first forward neural network layer;
decoding the intermediate semantics to enable the text to be translated into a target language, including:
Acquiring context information of the text to be translated; decoding the intermediate semantics of the text to be translated according to the context information by a decoder to obtain a text corresponding to the text to be translated in the target language, wherein the decoder comprises at least one second sub-coding layer and one classifying layer, and each second sub-coding layer comprises: a third attention mechanism layer, a fourth attention mechanism layer, and a fourth forward neural network layer.
2. The method of claim 1, wherein the step of obtaining, by an encoder, the text vector for the text to be translated comprises:
Acquiring word vectors and position vectors corresponding to the text to be translated;
acquiring a first attention mechanism parameter according to the word vector and the position vector;
And inputting the first attention mechanism parameters into the first sub-coding layers to code, and obtaining the text vector of the text to be translated output by the first sub-coding layers, wherein in each first sub-coding layer, the first attention mechanism layer calculates the parameters output by the previous first sub-coding layer and then inputs the parameters to the first forward neural network layer to code.
3. The method of claim 2, wherein obtaining the location vector corresponding to the text to be translated comprises:
Obtaining a source language vector corresponding to a source language, wherein the source language is the language of the text to be translated;
determining a first offset vector according to the source language vector;
Acquiring the position information of the text to be translated;
And superposing the first offset vector on the position information to obtain the position vector of the text to be translated.
4. The method of claim 1, wherein converting the at least one text vector into intermediate semantics comprises:
Mapping the at least one text vector to a semantic space to obtain the intermediate semantics, wherein the intermediate semantics are used for representing semantic representations which correspond to the text to be translated and are irrelevant to languages.
5. The method of claim 4, wherein mapping the text vector to semantic space results in the intermediate semantics, comprising:
Determining a second attention mechanism parameter according to the text vector of the text to be translated, wherein query element parameters in the second attention mechanism parameter are determined according to a source language vector of the text to be translated and subspace vectors corresponding to the query elements, key value pairs in the second attention mechanism parameter are text vectors of the text to be translated, and the semantic space comprises a plurality of sub-semantic spaces, each sub-semantic space being provided with a corresponding subspace vector;
Mapping the text vector to the semantic space according to the second attention mechanism parameter.
6. The method of claim 5, wherein mapping the text vector to the semantic space according to the second attention mechanism parameter comprises:
mapping the text vector to the semantic space according to the second attention mechanism parameter by a semantic mapping model, wherein the semantic mapping model comprises at least one sub-semantic mapping layer, each sub-semantic mapping layer comprising: a second attention mechanism layer and a third forward neural network layer, the step of mapping the text vector to the semantic space according to the second attention mechanism parameters by a semantic mapping model comprising:
And inputting the second attention mechanism parameters into the semantic mapping model to obtain intermediate semantics of the text vector output by the semantic mapping model, wherein in each sub-semantic mapping layer, the second attention mechanism layer operates the parameters output by the previous sub-semantic mapping layer and then inputs the parameters to the third forward neural network layer for operation.
7. The method of claim 1, wherein the context information of the text to be translated comprises: the word vector of the translated context and the position vector of the translated context, and acquiring the context information of the text to be translated comprises the following steps:
Acquiring word vectors of the translated context;
Acquiring second offset information corresponding to the target language;
and superposing the second offset information on the position information of the translated context to obtain a position vector of the translated context.
8. The method of claim 1, wherein decoding, by a decoder, the intermediate semantics of the text to be translated according to the context information to obtain a text corresponding to the text to be translated in the target language, comprises:
inputting the context information to the third attention mechanism layer to obtain an operation result of the third attention mechanism layer;
Inputting the operation result of the third attention mechanism layer and the intermediate semantics of the text to be translated to the fourth attention mechanism layer to obtain the operation result of the fourth attention mechanism layer;
inputting the operation result of the fourth attention mechanism layer to the fourth forward neural network layer for coding to obtain the operation result of the fourth forward neural network layer;
And the classification layer determines the text corresponding to the text to be translated in the target language according to the operation result of the at least one second sub-coding layer and the target language vector corresponding to the target language.
9. A method for translating text, comprising:
Acquiring a text to be translated;
Translating the text to be translated into a target language through a multi-language translation model, wherein the multi-language translation model acquires a text vector of the text to be translated, maps the text vector to a semantic space, acquires semantic representations which correspond to the text to be translated and are irrelevant to languages, and converts the semantic representations into texts of the target language;
The multi-language translation model is used for acquiring the text vector of the text to be translated through the following steps: the multilingual translation model obtains a text vector of the text to be translated through an encoder, the encoder comprising: at least one first sub-coding layer, each of the first sub-coding layers comprising: a first attention mechanism layer and a first forward neural network layer;
the multi-language translation model is further configured to convert the semantic representation into text in a target language by: acquiring context information of the text to be translated; decoding the intermediate semantics of the text to be translated according to the context information by a decoder to obtain a text corresponding to the text to be translated in the target language, wherein the decoder comprises at least one second sub-coding layer and one classifying layer, and each second sub-coding layer comprises: a third attention mechanism layer, a fourth attention mechanism layer, and a fourth forward neural network layer.
10. The method according to claim 9, wherein the method further comprises: obtaining the multi-language translation model, wherein the step of obtaining the multi-language translation model comprises the following steps:
Obtaining sample data, wherein the sample data comprises: sample text and the actual translation result of translating the sample text into other languages;
acquiring a loss function of the initial multi-language translation model in the process of learning the sample data by the initial multi-language translation model;
And determining the loss function as a minimum objective function, and adjusting model parameters of the initial multi-language translation model.
11. The method of claim 10, wherein the loss function comprises at least one of:
The initial multi-language translation model translates the sample text to obtain a difference value between a model translation result and an actual translation result corresponding to the sample text;
The initial multi-language translation model translates the actual translation result of the sample text to obtain a difference value between the model translation result and the sample text;
The sample text is converted into semantic representation by the initial multi-language translation model, and then the semantic representation is converted into a text which is the same as the language of the sample text, so that the converted text corresponding to the sample text is obtained;
the method comprises the steps that an actual translation result corresponding to a sample text and a difference value of a conversion text corresponding to the actual translation result are obtained, wherein the initial multi-language translation model converts the actual translation result of the sample text into semantic representation, and then converts the semantic representation into a text identical to the language of the actual translation result, so that the conversion text corresponding to the actual translation result is obtained;
A distance between two semantic representations, wherein the two semantic representations comprise: the initial multi-language translation model converts the sample text to obtain semantic representation and the initial multi-language translation model converts the actual translation result corresponding to the sample text to obtain semantic representation.
12. A method for translating text, comprising:
acquiring a word vector of a text to be translated and a source language vector of the text to be translated, and determining a text vector corresponding to the text to be translated, wherein the source language is the language of the text to be translated;
Converting the text vector into intermediate semantics;
decoding the intermediate semantics according to the target language vector so that the text to be translated is translated into the target language;
The determining the text vector corresponding to the text to be translated includes: obtaining a text vector of the text to be translated by an encoder, the encoder comprising: at least one first sub-coding layer, each of the first sub-coding layers comprising: a first attention mechanism layer and a first forward neural network layer;
Decoding the intermediate semantics according to a target language vector, so that the text to be translated is translated into the target language, including: acquiring context information of the text to be translated; decoding the intermediate semantics of the text to be translated according to the context information by a decoder to obtain a text corresponding to the text to be translated in the target language, wherein the decoder comprises at least one second sub-coding layer and one classifying layer, and each second sub-coding layer comprises: a third attention mechanism layer, a fourth attention mechanism layer, and a fourth forward neural network layer.
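To make the three steps of claim 12 concrete, here is a minimal greedy-decoding sketch that chains a hypothetical encoder, intermediate semantic module, decoder and classifying layer (layer-level sketches follow claims 18-20 below). The BOS/EOS ids, the shared word embedding, and the 128-step cap are illustrative assumptions, not part of the claim.

```python
import torch

@torch.no_grad()
def translate(word_emb, encoder, interlingua, decoder, classifier,
              src_ids, src_lang_vec, tgt_lang_vec,
              bos_id=1, eos_id=2, max_len=128):
    # determine the text vector of the text to be translated
    text_vectors = encoder(word_emb(src_ids))
    # convert the text vector into the language-independent intermediate semantics
    semantics = interlingua(text_vectors, src_lang_vec)

    # the already-generated target-side prefix serves as the context information
    out_ids = [bos_id]
    for _ in range(max_len):
        prefix = word_emb(torch.tensor([out_ids]))
        hidden = decoder(prefix, semantics)
        # the classifying layer picks the next token, conditioned on the
        # target language vector
        logits = classifier(hidden[:, -1], tgt_lang_vec)
        next_id = int(logits.argmax(-1))
        if next_id == eos_id:
            break
        out_ids.append(next_id)
    return out_ids[1:]
```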
13. The method of claim 12, wherein different source languages correspond to different intermediate semantic modules, and the converting the text vector into intermediate semantics comprises:
calling the intermediate semantic module corresponding to the source language according to the source language vector of the text to be translated; and
converting the text vector into the intermediate semantics through the called intermediate semantic module, wherein the intermediate semantics are used for representing a semantic representation that corresponds to the text to be translated and is independent of language.
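A minimal sketch of the per-language routing in claim 13, assuming one intermediate semantic module per supported source language. The use of nn.TransformerEncoderLayer as the conversion module and the string language keys are stand-in assumptions; claim 19 below describes the module's actual internal structure.

```python
import torch.nn as nn

class RoutedInterlingua(nn.Module):
    """Selects the intermediate semantic module that matches the source language."""

    def __init__(self, d_model, languages, n_heads=8):
        super().__init__()
        # a separate conversion module for each supported source language
        self.by_lang = nn.ModuleDict({
            lang: nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for lang in languages
        })

    def forward(self, text_vectors, source_lang):
        # call the module for this source language, then convert the text
        # vector into the language-independent intermediate semantics
        return self.by_lang[source_lang](text_vectors)

# usage: route Chinese input through the Chinese-specific module
# interlingua = RoutedInterlingua(d_model=512, languages=["zh", "en", "fr"])
# semantics = interlingua(text_vectors, "zh")
```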
14. A text translation device, comprising:
a first acquisition module, used for acquiring a text to be translated;
a second acquisition module, used for acquiring at least one text vector of the text to be translated, wherein the text vector is used for representing vector representations of the text to be translated corresponding to different languages;
a conversion module, used for converting the at least one text vector into intermediate semantics; and
a decoding module, used for decoding the intermediate semantics so that the text to be translated is translated into a target language;
wherein the second acquisition module is further configured to acquire the at least one text vector of the text to be translated by: obtaining a text vector of the text to be translated by an encoder, the encoder comprising: at least one first sub-coding layer, each of the first sub-coding layers comprising: a first attention mechanism layer and a first forward neural network layer;
and the decoding module is further configured to decode the intermediate semantics so that the text to be translated is translated into the target language by: acquiring context information of the text to be translated; and decoding, by a decoder, the intermediate semantics of the text to be translated according to the context information to obtain a text corresponding to the text to be translated in the target language, wherein the decoder comprises at least one second sub-coding layer and one classifying layer, and each second sub-coding layer comprises: a third attention mechanism layer, a fourth attention mechanism layer, and a fourth forward neural network layer.
15. A storage medium comprising a stored program, wherein the program, when run, controls a device on which the storage medium resides to perform the following steps: acquiring a text to be translated; acquiring at least one text vector of the text to be translated, wherein the text vector is used for representing vector representations of the text to be translated in different languages; converting the at least one text vector into intermediate semantics; and decoding the intermediate semantics so that the text to be translated is translated into a target language; wherein the acquiring at least one text vector of the text to be translated comprises: obtaining a text vector of the text to be translated by an encoder, the encoder comprising: at least one first sub-coding layer, each of the first sub-coding layers comprising: a first attention mechanism layer and a first forward neural network layer; and the decoding the intermediate semantics so that the text to be translated is translated into the target language comprises: acquiring context information of the text to be translated; and decoding, by a decoder, the intermediate semantics of the text to be translated according to the context information to obtain a text corresponding to the text to be translated in the target language, wherein the decoder comprises at least one second sub-coding layer and one classifying layer, and each second sub-coding layer comprises: a third attention mechanism layer, a fourth attention mechanism layer, and a fourth forward neural network layer.
16. A processor for running a program, wherein the program, when run, performs the following steps: acquiring a text to be translated; acquiring at least one text vector of the text to be translated, wherein the text vector is used for representing vector representations of the text to be translated in different languages; converting the at least one text vector into intermediate semantics; and decoding the intermediate semantics so that the text to be translated is translated into a target language; wherein the acquiring at least one text vector of the text to be translated comprises:
obtaining a text vector of the text to be translated by an encoder, the encoder comprising: at least one first sub-coding layer, each of the first sub-coding layers comprising: a first attention mechanism layer and a first forward neural network layer; and the decoding the intermediate semantics so that the text to be translated is translated into the target language comprises: acquiring context information of the text to be translated; and decoding, by a decoder, the intermediate semantics of the text to be translated according to the context information to obtain a text corresponding to the text to be translated in the target language, wherein the decoder comprises at least one second sub-coding layer and one classifying layer, and each second sub-coding layer comprises: a third attention mechanism layer, a fourth attention mechanism layer, and a fourth forward neural network layer.
17. A translation system, comprising:
an encoder, used for encoding a text to be translated to obtain at least one text vector of the text to be translated, wherein the text vector is used for representing vector representations of the text to be translated in different languages;
an intermediate semantic module, in communication with the encoder, used for converting the at least one text vector into intermediate semantics; and
a decoder, in communication with the intermediate semantic module, used for decoding the intermediate semantics so that the text to be translated is translated into a target language;
wherein the encoder comprises: at least one first sub-coding layer, each of the first sub-coding layers comprising: a first attention mechanism layer and a first forward neural network layer;
and the decoder is configured to decode the intermediate semantics so that the text to be translated is translated into the target language by: acquiring context information of the text to be translated; and decoding the intermediate semantics of the text to be translated according to the context information to obtain a text corresponding to the text to be translated in the target language, wherein the decoder comprises at least one second sub-coding layer and one classifying layer, and each second sub-coding layer comprises: a third attention mechanism layer, a fourth attention mechanism layer, and a fourth forward neural network layer.
18. The system of claim 17, wherein the encoder comprises: a plurality of first sub-coding layers, each of the first sub-coding layers comprising:
a first attention mechanism layer, used for performing an operation according to a first attention mechanism parameter, wherein the first attention mechanism parameter is determined according to a word vector and a position vector corresponding to the text to be translated; and
a first forward neural network layer, used for encoding according to the operation result of the first attention mechanism layer to obtain the text vector of the text to be translated.
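A minimal PyTorch sketch of one first sub-coding layer as recited in claim 18: self-attention whose parameters derive from the word-plus-position representation, followed by a forward neural network. The dimensions, learned position embeddings, residual connections and layer normalization are conventional Transformer assumptions not spelled out in the claim.

```python
import torch
import torch.nn as nn

class FirstSubCodingLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        # first attention mechanism layer
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # first forward neural network layer
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # attention parameters are determined by the word + position vectors in x
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # encode the attention result into the text vector
        return self.norm2(x + self.ffn(x))

# usage: word vector + position vector in, text vector out
vocab, max_len, d_model = 32000, 256, 512
word_emb = nn.Embedding(vocab, d_model)
pos_emb = nn.Embedding(max_len, d_model)
tokens = torch.randint(0, vocab, (1, 20))
x = word_emb(tokens) + pos_emb(torch.arange(20).unsqueeze(0))
text_vector = FirstSubCodingLayer()(x)
```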
19. The system of claim 17, wherein the intermediate semantic module comprises:
a second forward neural network layer, used for determining the query element parameter among second attention mechanism parameters according to the source language vector of the text to be translated and the subspace vector corresponding to the query element, wherein the semantic space comprises a plurality of sub-semantic spaces, and each sub-semantic space has a corresponding subspace vector; and
a plurality of semantic mapping models, each of the semantic mapping models comprising:
a second attention mechanism layer, used for performing an operation according to the second attention mechanism parameters, wherein the key-value pairs in the second attention mechanism parameters are the text vectors of the text to be translated; and
a third forward neural network layer, used for predicting the intermediate semantics of the text to be translated according to the operation result of the second attention mechanism layer.
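A minimal sketch of the intermediate semantic module of claim 19. Queries are built by a forward network from the source language vector plus learned sub-semantic-space vectors; keys and values are the encoder's text vectors; a final forward network predicts the intermediate semantics. Folding the plurality of semantic mapping models into one multi-head attention call, and the additive query construction, are simplifying assumptions.

```python
import torch
import torch.nn as nn

class IntermediateSemanticModule(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_subspaces=16, d_ff=2048):
        super().__init__()
        # one learned vector per sub-semantic space
        self.subspace = nn.Parameter(torch.randn(n_subspaces, d_model))
        # second forward neural network layer: builds the query element parameters
        self.query_ffn = nn.Linear(d_model, d_model)
        # second attention mechanism layer
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # third forward neural network layer: predicts the intermediate semantics
        self.out_ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, text_vectors, source_lang_vector):
        batch = text_vectors.size(0)
        # query = f(source language vector + subspace vector), one per sub-space
        queries = self.query_ffn(self.subspace + source_lang_vector)
        queries = queries.unsqueeze(0).expand(batch, -1, -1)
        # the key-value pairs are the text vectors of the text to be translated
        sem, _ = self.attn(queries, text_vectors, text_vectors)
        return self.out_ffn(sem)
```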
20. The system of claim 17, wherein the decoder comprises: at least one second sub-coding layer and a classification layer, each of said second sub-coding layers comprising:
a third attention mechanism layer, used for extracting the context information of the text to be translated by performing an operation on the context of the text to be translated;
a fourth attention mechanism layer, used for introducing the semantic representation of the text to be translated by performing an operation on the operation result of the third attention mechanism layer and the intermediate semantics of the text to be translated;
a fourth forward neural network layer, used for encoding the intermediate semantics of the text to be translated according to the operation result of the fourth attention mechanism layer to obtain an encoding result carrying the context information; and
the classifying layer, used for determining, according to the operation result of the at least one second sub-coding layer and a target language vector corresponding to the target language, the text corresponding to the text to be translated in the target language, wherein the operation result of the at least one second sub-coding layer is the encoding result output by the fourth forward neural network layer in the last second sub-coding layer.
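A minimal sketch of one second sub-coding layer and the classifying layer of claim 20: self-attention extracts context from the generated prefix, cross-attention injects the intermediate semantics, a forward network re-encodes the result, and the classifier combines the last layer's output with the target language vector. Additive language conditioning and the omitted causal mask (harmless here because the whole prefix is re-encoded each step) are assumptions.

```python
import torch
import torch.nn as nn

class SecondSubCodingLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        # third attention mechanism layer: context over the generated prefix
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # fourth attention mechanism layer: brings in the intermediate semantics
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # fourth forward neural network layer
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(3)])

    def forward(self, prefix, semantics):
        ctx, _ = self.self_attn(prefix, prefix, prefix)
        x = self.norms[0](prefix + ctx)
        sem, _ = self.cross_attn(x, semantics, semantics)
        x = self.norms[1](x + sem)
        # encoding result carrying the context information
        return self.norms[2](x + self.ffn(x))

class ClassifyingLayer(nn.Module):
    def __init__(self, d_model=512, vocab=32000):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab)

    def forward(self, decoder_out, target_lang_vector):
        # combine the last sub-coding layer's output with the target
        # language vector before predicting the output token
        return self.proj(decoder_out + target_lang_vector)
```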
21. The system of claim 17, wherein different source languages correspond to different intermediate semantic modules, and the translation system further invokes the intermediate semantic module corresponding to the source language of the text to be translated.
22. The system of claim 17, wherein the intermediate semantic module is located within the encoder or the decoder.
CN201911038762.XA 2019-10-29 2019-10-29 Text translation method and device Active CN112749569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911038762.XA CN112749569B (en) 2019-10-29 2019-10-29 Text translation method and device


Publications (2)

Publication Number Publication Date
CN112749569A (en) 2021-05-04
CN112749569B (en) 2024-05-31

Family

ID=75641634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911038762.XA Active CN112749569B (en) 2019-10-29 2019-10-29 Text translation method and device

Country Status (1)

Country Link
CN (1) CN112749569B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221581A (en) * 2021-05-13 2021-08-06 北京小米移动软件有限公司 Text translation method, device and storage medium
CN113343716B (en) * 2021-05-20 2022-09-30 北京三快在线科技有限公司 Multilingual translation method, device, storage medium and equipment
CN113392658A (en) * 2021-06-18 2021-09-14 北京爱奇艺科技有限公司 Statement translation method and device, computer equipment and storage medium
CN113239710A (en) * 2021-06-23 2021-08-10 合肥讯飞数码科技有限公司 Multi-language machine translation method and device, electronic equipment and storage medium
CN113539239B (en) * 2021-07-12 2024-05-28 网易(杭州)网络有限公司 Voice conversion method and device, storage medium and electronic equipment
CN113869070B (en) * 2021-10-15 2024-05-24 大连理工大学 Multi-language neural machine translation method integrating specific language adapter modules


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10366292B2 (en) * 2016-11-03 2019-07-30 Nec Corporation Translating video to language using adaptive spatiotemporal convolution feature representation with dynamic abstraction

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484681A (en) * 2015-08-25 2017-03-08 阿里巴巴集团控股有限公司 A kind of method generating candidate's translation, device and electronic equipment
CN106484682A (en) * 2015-08-25 2017-03-08 阿里巴巴集团控股有限公司 Based on the machine translation method of statistics, device and electronic equipment
CN108027812A (en) * 2015-09-18 2018-05-11 迈克菲有限责任公司 System and method for multipath language translation
CN108681539A (en) * 2018-05-07 2018-10-19 内蒙古工业大学 A kind of illiteracy Chinese nerve interpretation method based on convolutional neural networks
CN109446534A (en) * 2018-09-21 2019-03-08 清华大学 Machine translation method and device
CN110264987A (en) * 2019-06-18 2019-09-20 王子豪 Chord based on deep learning carries out generation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Survey of Research and Development of Machine Translation Methods; Hou Qiang; Hou Ruili; Computer Engineering and Applications; 2019-03-07 (10); full text *
A Mongolian-Chinese Neural Network Machine Translation Model Incorporating Prior Information; Fan Wenting; Hou Hongxu; Wang Hongbin; Wu Jing; Li Jinting; Journal of Chinese Information Processing; 2018-06-15 (06); full text *

Also Published As

Publication number Publication date
CN112749569A (en) 2021-05-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant