CN111160049A - Text translation method, device, machine translation system and storage medium - Google Patents

Text translation method, device, machine translation system and storage medium Download PDF

Info

Publication number
CN111160049A
CN111160049A
Authority
CN
China
Prior art keywords
constraint
translation
candidate
target
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911244875.5A
Other languages
Chinese (zh)
Other versions
CN111160049B (en)
Inventor
李良友
王龙跃
刘群
陈晓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201911244875.5A priority Critical patent/CN111160049B/en
Publication of CN111160049A publication Critical patent/CN111160049A/en
Application granted granted Critical
Publication of CN111160049B publication Critical patent/CN111160049B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a text translation method and a device in the field of artificial intelligence, wherein the method comprises the following steps: acquiring a candidate translation; selecting constraints in a preset constraint set according to the attention weight calculated by the text translation model; and when a target constraint for expanding the candidate translation is selected, expanding the candidate translation according to the target constraint, or when no target constraint for expanding the candidate translation is selected, expanding the candidate translation according to a preset candidate word set. When the candidate translation is expanded, the constraints in the preset constraint set are selected or filtered, so that using all constraints each time the candidate translation is expanded can be avoided, the expansion of the candidate translation can be accelerated, and the translation speed is increased.

Description

Text translation method, device, machine translation system and storage medium
Technical Field
The present application relates to the field of machine translation technologies, and more particularly, to a text translation method, apparatus, machine translation system, and storage medium.
Background
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Research in artificial intelligence covers the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning and decision making.
With the continuous development of artificial intelligence technology, natural language human-computer interaction systems, which enable human-computer interaction through natural language, become more and more important. Human-computer interaction through natural language requires a system capable of recognizing specific meanings of human natural language. Typically, systems identify the specific meaning of a sentence by employing key information extraction on the sentence in natural language.
In recent years, neural machine translation has developed rapidly, has surpassed traditional statistical machine translation, and has become the mainstream machine translation technology. Many companies, such as Baidu and Microsoft, have applied neural machine translation to their translation products. In order to enable certain phrases or words in an input source language sentence to be translated correctly, current neural machine translation supports manual intervention in the translation result. One way is to add known correct translations as constraints to the neural machine translation and to ensure that the target words of the constraints will certainly appear in the final output translation.
Therefore, how to use these constraints efficiently and accurately becomes an urgent problem to be solved.
Disclosure of Invention
The application provides a text translation method, a text translation device, a machine translation system and a storage medium, which can use constraints efficiently and accurately when performing machine translation, thereby improving the translation speed.
In a first aspect, the present application provides a text translation method, including: acquiring a candidate translation corresponding to a source text; selecting constraints in a preset constraint set according to attention weights calculated by a text translation model, wherein the constraints represent correct translations of at least part of the source text; when target constraints for expanding the candidate translations are selected, expanding the candidate translations according to the target constraints; or when target constraints for expanding the candidate translation are not selected, expanding the candidate translation according to a preset candidate word set, wherein the candidate word set comprises a plurality of words of a target language, and the target language is the language to which the candidate translation belongs.
Optionally, when a target constraint for expanding the candidate translation is selected, expanding the candidate translation according to the target constraint and the preset candidate word set.
The candidate translation is an intermediate result or a final result of translating the source text. For example, in the process of translating a Chinese source sentence meaning "I graduated from Hefei University of Technology" into "I graduated from Hefei University of Technology", "I", "I graduated", "I graduated from", etc. are all intermediate results of the translation. When "I" is taken as a candidate translation, the new candidate translations obtained by expanding "I" may include "I graduated"; further, the obtained candidate translation "I graduated" may be continuously expanded, and the new candidate translations obtained by expanding "I graduated" may include "I graduated from". After the candidate translation "I graduated from Hefei University of Technology" is obtained, the candidate translation is not expanded any more, and this candidate translation is the final result of translating the source text.
It should be understood that, in the present application, there may be one or more candidate translations, and when each candidate translation in the one or more candidate translations is expanded, one or more new candidate translations may be obtained.
The text translation model of the present application may be a neural network translation model based on a neural network; the neural network translation model includes a portion related to an attention mechanism, and the portion related to the attention mechanism may calculate corresponding attention weights during the translation process. Optionally, the neural network model may include an encoder, a decoder and the attention mechanism related portion, wherein the encoder is used for reading the source text and generating a digitized representation for each of the source language words included in the source text, the decoder is used for generating a translation of the source text, i.e., a sentence in the target language, and the attention mechanism related portion is used for dynamically providing attention weights to the decoder when generating the target words at different times, based on the output of the encoder and the state of the decoder.
The attention weight may be a value that characterizes how relevant each source language word in the source text is to the decoder state at the current time. For example, at a certain time, if the attention weights of 4 source language words corresponding to the source text are 0.5, 0.3, 0.1, and 0.1, respectively, it may indicate that the correlation degrees of the 4 source language words with the decoder state are 0.5, 0.3, 0.1, and 0.1, respectively, and the correlation degree of the first source language word with the decoder state is the highest, and the probability that the decoder is currently generating the target word corresponding to the first source language word is the highest.
The candidate word set includes a plurality of words of the target language, and the target language is the language to which the candidate translation belongs. The candidate word set can be a preset candidate word library, and the text translation model scores the candidate words in the candidate word library at each moment so as to determine, according to the scores of the candidate words, which candidate words are used for expanding the candidate translation. For example, candidate translations may be expanded using candidate words whose scores exceed a preset threshold.
Constraints in the present application may characterize a correct translation of at least a portion of the source text. Optionally, the at least part of the source text may be a source language word or a source language phrase comprised by the source text. Optionally, the constraint may include source end position information and a target word corresponding to the source end position information, where the source end refers to the source text input end, and the target word is the correct translation of the source word at the source end position indicated by the source end position information. Alternatively, the constraint may be in the form [position of the source language word in the source text]: target word, e.g., [4]: Hefei University of Technology. Alternatively, the constraint may be in the form source language word: target word, e.g., 合工大: Hefei University of Technology.
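As an illustrative sketch only (the dictionary layout and helper function below are assumptions for exposition, not the patent's implementation), a constraint of the form [source position]: target phrase can be kept in a simple mapping from source position to target-language tokens:

```python
# Illustrative sketch only: one possible in-memory form of a constraint set.
# Keys are assumed source-side positions, values are the required target phrases.
constraint_set = {
    4: ["Hefei", "University", "of", "Technology"],  # correct translation of the source word at position 4
    6: ["Sushma", "Swaraj"],                         # correct translation of the source word at position 6
}

def constraint_tokens(position: int) -> list:
    """Return the target-language tokens that must appear for this source position."""
    return constraint_set.get(position, [])

print(constraint_tokens(4))  # -> ['Hefei', 'University', 'of', 'Technology']
```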
Both the source text and the candidate translation may belong to natural languages, where a natural language is generally a language that naturally evolves with culture. Optionally, the source text belongs to a first natural language, the candidate translation belongs to a second natural language, and the first natural language and the second natural language are different natural languages. The source text belonging to the first natural language may mean that the source text is a piece of text expressed in the first natural language, and the candidate translation belonging to the second natural language may mean that the candidate translation is a piece of text expressed in the second natural language. The source text and the candidate translation may belong to any two different natural languages.
It should be understood that the execution subject of the text translation method of the present application may be a text translation apparatus or a machine translation system.
In the above technical solution, when the candidate translation is expanded, the constraints in the preset constraint set are selected or filtered according to the attention weight calculated by the text translation model. When a target constraint is selected, the candidate translation is expanded using the target constraint; when no target constraint is selected, the constraints in the preset constraint set are not used to expand the candidate translation, that is, the candidate translation is expanded only according to the preset candidate word set. Therefore, using all constraints every time the candidate translation is expanded can be avoided, and the expansion of the candidate translation can be accelerated. Moreover, because the attention weight can represent the degree of correlation between each source language word in the source text and the decoder state at the current moment, selecting the preset constraints according to the attention weight allows constraints with a lower degree of correlation with the current decoder state to be ignored, reducing the influence on the quality of the candidate translation. Therefore, the technical solution can ensure the quality of the candidate translation and accelerate the expansion of the candidate translation, thereby improving the translation speed.
In one possible implementation, the selecting the constraint in the preset constraint set according to the attention weight calculated by the text translation model includes: acquiring attention weights respectively corresponding to each constraint from the text translation model according to a source end position corresponding to each constraint in the preset constraint set, wherein the source end position is a position of a word corresponding to each constraint in the source text; selecting constraints in the preset constraint set according to the attention weight corresponding to each constraint.
It should be understood that the words to which the constraints correspond here are the source language words corresponding to the constraints.
The source position corresponding to the constraint can be determined by the source position information in the constraint.
In a possible implementation manner, the selecting, according to the attention weight corresponding to each constraint, a constraint in the preset set of constraints includes: processing the attention weight corresponding to each constraint to obtain a heuristic signal of each constraint, wherein the heuristic signal is used for indicating whether to use the constraint corresponding to the heuristic signal when expanding the candidate translation; and selecting constraints in the preset constraint set according to the heuristic signal corresponding to each constraint.
In one possible implementation, the selecting the constraint in the preset constraint set according to the attention weight calculated by the text translation model includes: selecting a target attention weight meeting a preset requirement from the attention weights; and selecting constraints in a preset constraint set according to a source end position corresponding to the target attention weight and a source end position corresponding to each constraint in the preset constraint set, wherein the source end position corresponding to the target attention weight is the position of a word corresponding to the target attention weight in the source text, and the source end position corresponding to each constraint is the position of the word corresponding to each constraint in the source text.
It should be understood that the term corresponding to the target attention weight is the source language term corresponding to the target attention weight, and similarly, the term corresponding to the constraint is the source language term corresponding to the constraint.
In a possible implementation manner, the selecting constraints in a preset constraint set according to the attention weight calculated by the text translation model includes: selecting constraints in a preset constraint set according to attention weights calculated by a text translation model and the state of the candidate translation, wherein the state of the candidate translation includes being in a constraint and not being in a constraint, and in the case that the candidate translation is obtained by extension using part of the words of a target phrase, the state of the candidate translation is in the constraint, where the target phrase is a target-end phrase corresponding to a constraint in the preset constraint set; the target constraint satisfies at least one of the following conditions: the attention weight corresponding to the target constraint meets a preset requirement; the state of the candidate translation is in a constraint.
In some cases, a constraint may include multiple words; for example, the constraint [4]: Hefei University of Technology corresponding to "合工大" includes the 4 words Hefei, University, of and Technology. In a scenario where the candidate translation is expanded word by word, the candidate translation to be expanded at the current moment may already contain some words of a certain constraint, and the state of the candidate translation at this time may be referred to as being in the constraint; for example, the candidate translation at the current moment is "I graduated from Hefei", which has already used "Hefei" from the constraint [4]: Hefei University of Technology. In view of the above situation, the above technical solution may combine the attention weight calculated by the text translation model and the state of the current candidate translation to select the constraints in the preset constraint set, so that the selection result is more accurate.
Alternatively, when the candidate translation is within the constraints, the candidate translation may be expanded only according to the target constraints.
Optionally, when the candidate translation is in the constraint, the candidate translation may be expanded according to a preset candidate word set and a target constraint.
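A minimal sketch of the situation described above, assuming a hypothetical helper that checks whether a candidate translation currently ends inside a multi-word constraint and, if so, returns the next forced word (the function name and token-level matching are assumptions for illustration, not the patent's code):

```python
from typing import Optional

# Illustrative sketch: continuing a partially matched multi-word constraint.
# `candidate` is the token list of the candidate translation so far,
# `constraint` is the target phrase of the constraint being applied.
def next_constraint_token(candidate: list, constraint: list) -> Optional[str]:
    """If the candidate ends inside `constraint`, return the next forced token, else None."""
    for matched in range(len(constraint) - 1, 0, -1):
        if candidate[-matched:] == constraint[:matched]:
            return constraint[matched]   # state "in constraint": force the next word
    return None                          # state "not in constraint"

# Example: "I graduated from Hefei" is inside [Hefei, University, of, Technology].
print(next_constraint_token(["I", "graduated", "from", "Hefei"],
                            ["Hefei", "University", "of", "Technology"]))  # -> "University"
```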
In a second aspect, the present application provides a text translation apparatus, comprising a memory for storing a program; a processor for executing the memory-stored program, the processor being configured to, when the memory-stored program is executed by the processor: acquiring a candidate translation corresponding to a source text; selecting constraints in a preset constraint set according to attention weights calculated by a text translation model, wherein the constraints represent correct translations of at least part of the source text; when target constraints for expanding the candidate translations are selected, expanding the candidate translations according to the target constraints; or when target constraints for expanding the candidate translation are not selected, expanding the candidate translation according to a preset candidate word set, wherein the candidate word set comprises a plurality of words of a target language, and the target language is the language to which the candidate translation belongs.
In the above technical solution, when the candidate translation is expanded, the constraints in the preset constraint set are selected or filtered according to the attention weight calculated by the text translation model. When a target constraint is selected, the candidate translation is expanded using the target constraint; when no target constraint is selected, the constraints in the preset constraint set are not used to expand the candidate translation, that is, the candidate translation is expanded only according to the preset candidate word set. Therefore, using all constraints every time the candidate translation is expanded can be avoided, and the expansion of the candidate translation can be accelerated. Moreover, because the attention weight can represent the degree of correlation between each source language word in the source text and the decoder state at the current moment, selecting the preset constraints according to the attention weight allows constraints with a lower degree of correlation with the current decoder state to be ignored, reducing the influence on the quality of the candidate translation. Therefore, the technical solution can ensure the quality of the candidate translation and accelerate the expansion of the candidate translation, thereby improving the translation speed.
In one possible implementation, the processor is specifically configured to: acquiring attention weights respectively corresponding to each constraint from the text translation model according to a source end position corresponding to each constraint in the preset constraint set, wherein the source end position is a position of a word corresponding to each constraint in the source text; selecting constraints in the preset constraint set according to the attention weight corresponding to each constraint.
It should be understood that the words to which the constraints correspond here are the source language words corresponding to the constraints.
The source position corresponding to the constraint can be determined by the source position information in the constraint.
In one possible implementation, the processor is specifically configured to: processing the attention weight corresponding to each constraint to obtain a heuristic signal of each constraint, wherein the heuristic signal is used for indicating whether to use the constraint corresponding to the heuristic signal when expanding the candidate translation; and selecting constraints in the preset constraint set according to the heuristic signal corresponding to each constraint.
In one possible implementation, the processor is specifically configured to: selecting a target attention weight meeting a preset requirement from the attention weights; and selecting constraints in a preset constraint set according to a source end position corresponding to the target attention weight and a source end position corresponding to each constraint in the preset constraint set, wherein the source end position corresponding to the target attention weight is the position of a word corresponding to the target attention weight in the source text, and the source end position corresponding to each constraint is the position of the word corresponding to each constraint in the source text.
It should be understood that the term corresponding to the target attention weight is the source language term corresponding to the target attention weight, and similarly, the term corresponding to the constraint is the source language term corresponding to the constraint.
In one possible implementation, the processor is specifically configured to: selecting constraints in a preset constraint set according to attention weights calculated by a text translation model and the state of the candidate translation, wherein the state of the candidate translation includes being in a constraint and not being in a constraint, and in the case that the candidate translation is obtained by extension using part of the words of a target phrase, the state of the candidate translation is in the constraint, where the target phrase is a target-end phrase corresponding to a constraint in the preset constraint set; the target constraint satisfies at least one of the following conditions: the attention weight corresponding to the target constraint meets a preset requirement; the state of the candidate translation is in a constraint.
In some cases, a constraint may include multiple words; for example, the constraint [4]: Hefei University of Technology corresponding to "合工大" includes the 4 words Hefei, University, of and Technology. In a scenario where the candidate translation is expanded word by word, the candidate translation to be expanded at the current moment may already contain some words of a certain constraint, and the state of the candidate translation at this time may be referred to as being in the constraint; for example, the candidate translation at the current moment is "I graduated from Hefei", which has already used "Hefei" from the constraint [4]: Hefei University of Technology. In view of the above situation, the above technical solution may combine the attention weight calculated by the text translation model and the state of the current candidate translation to select the constraints in the preset constraint set, so that the selection result is more accurate.
In a third aspect, the present application provides a text translation apparatus, comprising a memory for storing a program; a processor configured to execute the program stored in the memory, wherein when the program stored in the memory is executed by the processor, the text translation apparatus performs the method of the first aspect or any one of the possible implementation manners of the first aspect.
Optionally, the apparatus further comprises a data interface, and the processor reads the program stored on the memory through the data interface.
In a fourth aspect, the present application provides a machine translation system including the text translation apparatus in the second aspect or any one of the possible implementations of the second aspect, where the text translation apparatus is configured to perform the method in the first aspect or any one of the possible implementations of the first aspect.
The text translation apparatus may be an electronic device (or a module located in the electronic device), and the electronic device may specifically be a mobile terminal (e.g., a smart phone), a computer, a personal digital assistant, a wearable device, an in-vehicle device, an internet of things device, or another device capable of performing natural language processing.
In a fifth aspect, the present application provides a computer readable medium storing program code for execution by a device, the program code comprising instructions for performing the method of the first aspect or any one of the possible implementations of the first aspect.
In a sixth aspect, the present application provides a computer program product containing instructions that, when run on a computer, cause the computer to perform the method of the first aspect or any one of the possible implementations of the first aspect.
In a seventh aspect, the present application provides a chip, where the chip includes a processor and a data interface, and the processor reads an instruction stored in a memory through the data interface to execute the first aspect or the method in any one of the possible implementation manners of the first aspect.
Optionally, as an implementation manner, the chip may further include a memory, where instructions are stored in the memory, and the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in the first aspect.
In an eighth aspect, the present application provides an electronic device, where the electronic device includes the text translation apparatus in the second aspect or any one of the possible implementations of the second aspect, or the text translation apparatus in the third aspect, or the machine translation system in the fourth aspect.
Drawings
Fig. 1 is a schematic view of an application scenario of natural language processing according to an embodiment of the present application.
Fig. 2 is a schematic view of an application scenario of another natural language processing provided in an embodiment of the present application.
Fig. 3 is a schematic diagram of a device related to natural language processing provided in an embodiment of the present application.
Fig. 4 is a schematic diagram of a system architecture according to an embodiment of the present application.
Fig. 5 is a schematic diagram of an RNN model provided in an embodiment of the present application.
Fig. 6 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present disclosure.
Fig. 7 is a schematic diagram of a neural network-based text translation model according to an embodiment of the present application.
Fig. 8 is a flowchart of a text translation process provided in an embodiment of the present application.
Fig. 9 is a schematic diagram of another text translation process provided in an embodiment of the present application.
Fig. 10 is a schematic flow chart of a text translation method provided in an embodiment of the present application.
Fig. 11 is a schematic flow chart of a method for selecting constraints provided by an embodiment of the present application.
Fig. 12 is a schematic configuration diagram of a text translation apparatus according to an embodiment of the present application.
Fig. 13 is a schematic configuration diagram of a text translation apparatus according to another embodiment of the present application.
Fig. 14 is a schematic diagram of a machine translation system provided in an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
For better understanding of the solution of the embodiment of the present application, a brief description is given below of possible application scenarios of the embodiment of the present application with reference to fig. 1 to 3. The technical solution of the embodiment of the present application can be applied to any scenario in which a sequence generation task with constrained decoding needs to be carried out, such as machine translation, automatic generation of a text summary, and the like. The following describes the technical solution of the present application by taking a machine translation scenario as an example.
Fig. 1 shows a natural language processing system comprising a user device and a data processing device. The user equipment comprises a mobile phone, a personal computer or an intelligent terminal such as an information processing center. The user equipment is an initiating end of natural language data processing, and is used as an initiator of requests such as language question answering or query, and usually a user initiates the requests through the user equipment.
The data processing device may be a device or a server having a data processing function, such as a cloud server, a network server, an application server, or a management server. The data processing device receives query statements/voice/text and the like from the intelligent terminal through the interactive interface, and then performs language data processing by means of machine learning, deep learning, searching, reasoning, decision making and the like, using a memory for storing data and a processor for data processing. The memory in the data processing device may be a generic term that includes a database storing local and historical data, and the database may be on the data processing device or on another network server.
In the natural language processing system shown in fig. 1, a user device may receive a user's instruction to request a machine translation of a source text (e.g., the source text may be a piece of Chinese input by the user) to obtain a machine translation (e.g., the machine translation may be the English text obtained by machine translation), and then send the source text to a data processing device, so that the data processing device translates the source text to obtain the machine translation.
In fig. 1, a data processing apparatus may execute the text translation method according to the embodiment of the present application.
Fig. 2 shows another natural language processing system, in fig. 2, the user equipment directly serves as a data processing device, and the user equipment can directly receive input from a user and directly perform processing by hardware of the user equipment itself, and a specific process is similar to that in fig. 1, and reference may be made to the above description, and details are not repeated here.
In the natural language processing system shown in fig. 2, the user device may receive an instruction from a user, and perform machine translation on the source text by the user device itself to obtain a machine translation.
In fig. 2, the user equipment itself can execute the text translation method according to the embodiment of the present application.
Fig. 3 is a schematic diagram of a device related to natural language processing provided in an embodiment of the present application.
The user device in fig. 1 and fig. 2 may specifically be the local device 301 or the local device 302 in fig. 3, and the data processing device in fig. 1 may specifically be the execution device 210 in fig. 3, where the data storage system 250 may store data to be processed of the execution device 210, and the data storage system 250 may be integrated on the execution device 210, or may be disposed on a cloud or other network server.
The processor of fig. 1 and 2 may perform data training/machine learning/deep learning through a neural network model or other models (e.g., models based on a support vector machine), and translate the source text using the model finally trained or learned by the data, thereby obtaining a machine translation.
Fig. 4 illustrates a system architecture 100 provided by an embodiment of the present application. In fig. 4, the data collection device 160 is used for collecting training data, which in the embodiment of the present application includes a training source text and a training machine translation (a translation obtained by translating the training source text by a machine translation system).
After the training data is collected, data collection device 160 stores the training data in database 130, and training device 120 trains target model/rule 101 based on the training data maintained in database 130.
The following describes how the training device 120 obtains the target model/rule 101 based on the training data: the training device 120 processes the input training source text and compares the output machine translation with the training machine translation until the difference between the machine translation output by the training device 120 and the training machine translation is smaller than a certain threshold, thereby completing the training of the target model/rule 101.
The target model/rule 101 can be used to implement the text translation method according to the embodiment of the present application, that is, the source text is input into the target model/rule 101 after being subjected to relevant preprocessing (which may be processed by the preprocessing module 113 and/or the preprocessing module 114), so that a machine translation text can be obtained. The target model/rule 101 in the embodiment of the present application may specifically be a neural network. It should be noted that, in practical applications, the training data maintained in the database 130 may not necessarily all come from the acquisition of the data acquisition device 160, and may also be received from other devices. It should be noted that, the training device 120 does not necessarily perform the training of the target model/rule 101 based on the training data maintained by the database 130, and may also obtain the training data from the cloud or other places for performing the model training.
The target model/rule 101 obtained by training according to the training device 120 may be applied to different systems or devices, for example, the execution device 110 shown in fig. 4, where the execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server or a cloud. In fig. 4, the execution device 110 configures an input/output (I/O) interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through the client device 140, where the input data may include: the source text input by the client device.
The preprocessing module 113 and the preprocessing module 114 are configured to perform preprocessing (specifically, processing a source text to obtain a word vector) according to input data (such as the source text) received by the I/O interface 112, and in this embodiment of the application, the preprocessing module 113 and the preprocessing module 114 may not be provided (or only one of the preprocessing modules may be provided), and the computing module 111 is directly used to process the input data.
In the process that the execution device 110 preprocesses the input data or in the process that the calculation module 111 of the execution device 110 executes the calculation or other related processes, the execution device 110 may call the data, the code, and the like in the data storage system 150 for corresponding processes, and may store the data, the instruction, and the like obtained by corresponding processes in the data storage system 150.
Finally, the I/O interface 112 feeds back the results of the processing, e.g., the machine translation, to the client device 140.
It should be noted that the training device 120 may generate the target model/rule 101 corresponding to the downstream system for different downstream systems, and the corresponding target model/rule 101 may be used to achieve the above target or complete the above task, so as to provide the user with the required result.
In the case shown in fig. 4, the user may manually give input data (e.g., input a piece of text), which may be operated through an interface provided by the I/O interface 112. Alternatively, the client device 140 may automatically send input data (e.g., a piece of text) to the I/O interface 112; if the client device 140 is required to obtain the user's authorization before automatically sending the input data, the user may set the corresponding permissions in the client device 140. The user may view the results output by the execution device 110 at the client device 140 in a particular presentation form, such as display, sound, or action (e.g., the output result may be a machine translated text). The client device 140 may also serve as a data collection terminal, collecting input data of the input I/O interface 112 and output results of the output I/O interface 112 as new sample data, and storing the new sample data in the database 130. Of course, the input data input to the I/O interface 112 and the output result output from the I/O interface 112 as shown in the figure may also be directly stored in the database 130 as new sample data by the I/O interface 112 without being collected by the client device 140.
It should be noted that fig. 4 is only a schematic diagram of a system architecture provided in the embodiment of the present application, and the position relationship between the devices, modules, and the like shown in the diagram does not constitute any limitation. For example, in FIG. 4, the data storage system 150 is an external memory with respect to the execution device 110, in other cases, the data storage system 150 may be disposed in the execution device 110.
As shown in fig. 4, a target model/rule 101 is obtained by training according to a training device 120, where the target model/rule 101 may be a neural machine translation model in this embodiment, and specifically, a neural network provided in this embodiment may be a Convolutional Neural Network (CNN), a Deep Convolutional Neural Network (DCNN), a Recurrent Neural Network (RNN), or the like.
Since RNN is a very common neural network, the structure of RNN will be described in detail below with reference to fig. 5.
Fig. 5 is a schematic structural diagram of an RNN model provided in an embodiment of the present application. Each circle can be regarded as a unit, and every unit does the same thing, so the network can be folded into the form shown on the left half of the figure. In one sentence, an RNN is the repeated use of a single unit structure.
RNN is a sequence-to-sequence model. Suppose x_(t-1), x_t, x_(t+1) is an input corresponding to the sentence "我是中国" ("I am Chin-..."), then o_(t-1), o_t should correspond to "是" ("am") and "中国" ("China"). What is the next word most likely to be? The probability that o_(t+1) is "人" (completing "我是中国人", "I am Chinese") is relatively large.
Therefore, we can make such a definition:
x_t: the input at time t; o_t: the output at time t; s_t: the memory at time t. The output at the current moment is determined by the memory and the input at the current moment. By analogy, if you are now a senior in college, your knowledge is a combination of the knowledge learned in your senior year (the current input) and the knowledge learned in your junior year and earlier (the memory). RNN is similar in this regard; a neural network is best at integrating much content together through a series of parameters and then learning these parameters, which defines the basis of the RNN:
s_t = f(U*x_t + W*s_(t-1))
the f () function is an activation function in a neural network, but why is it added? For example, if a very good solution method is learned at university, then the solution method used during that time? Is obviously not used. The RNN has the same idea, since it can remember important information, but it can forget it certainly when it doesn't. But what is most appropriate to filter information in the neural network? It is certainly the activation function, and therefore an activation function, which may be tanh or ReLU, or others, is used herein to do a non-linear mapping to filter information.
Suppose you are a senior about to graduate and are preparing for the postgraduate entrance examination. When you take the exam, do you rely on what you remember of what you have learned, or do you bring a pile of books into the exam? Obviously the former. The idea of the RNN is the same: it uses the memory s_t at the current moment to make the prediction. If you want to predict the next word of "我是中国", softmax is clearly used to predict the probability of each word, but the prediction cannot be made directly from s_t; it is first multiplied by a weight matrix V, which is expressed by the formula:
o_t = softmax(V*s_t)
where o_t denotes the output at time t.
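A minimal numerical sketch of the two formulas above, with arbitrary small matrix sizes and random values chosen purely for illustration (tanh is assumed as the activation function f):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Arbitrary small dimensions for illustration only.
d_in, d_hidden, vocab = 3, 4, 5
rng = np.random.default_rng(0)
U = rng.normal(size=(d_hidden, d_in))      # input-to-hidden weights
W = rng.normal(size=(d_hidden, d_hidden))  # hidden-to-hidden (memory) weights
V = rng.normal(size=(vocab, d_hidden))     # hidden-to-output weights

s = np.zeros(d_hidden)                     # s_0: initial memory
for x in rng.normal(size=(3, d_in)):       # three input steps x_1, x_2, x_3
    s = np.tanh(U @ x + W @ s)             # s_t = f(U*x_t + W*s_(t-1)), f assumed to be tanh
    o = softmax(V @ s)                     # o_t = softmax(V*s_t)
print(o)                                   # probability distribution over the vocabulary
```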
It should be noted that the RNN shown in fig. 5 is only an example of a recurrent neural network, and in a specific application, the recurrent neural network may also exist in the form of other network models.
Fig. 6 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present disclosure. The chip includes a neural Network Processor (NPU) 50. The chip may be provided in the execution device 110 as shown in fig. 4 to complete the calculation work of the calculation module 111. The chip may also be disposed in the training apparatus 120 as shown in fig. 4 to complete the training work of the training apparatus 120 and output the target model/rule 101. The algorithm in the recurrent neural network shown in fig. 5 can be implemented in a chip as shown in fig. 6.
The text translation method according to the embodiment of the present application may be specifically executed in the arithmetic circuit 503 and/or the vector calculation unit 507 in the NPU50, so as to obtain a machine translation.
The various modules and units in the NPU50 are briefly described below.
The NPU 50, as a coprocessor, may be mounted on a host CPU, and tasks are allocated by the host CPU. The core of the NPU 50 is the arithmetic circuit 503; when the NPU 50 is in operation, the controller 504 in the NPU 50 can control the arithmetic circuit 503 to fetch data from memory (the weight memory or the input memory) and perform arithmetic.
In some implementations, the arithmetic circuit 503 includes a plurality of processing units (PEs) therein. In some implementations, the operational circuitry 503 is a two-dimensional systolic array. The arithmetic circuit 503 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 503 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 502 and buffers it in each PE of the arithmetic circuit. The arithmetic circuit takes the matrix A data from the input memory 501 and performs a matrix operation with matrix B, and the partial or final result of the obtained matrix is stored in an accumulator (accumulator) 508.
The vector calculation unit 507 may further process the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 507 may be used for network calculation of non-convolution/non-fully connected layers (FC) in a neural network, such as pooling (pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector calculation unit 507 can store the processed output vector to the unified buffer 506. For example, the vector calculation unit 507 may apply a non-linear function to the output of the arithmetic circuit 503, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit 507 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuitry 503, for example for use in subsequent layers in a neural network.
The unified memory 506 is used to store input data as well as output data.
A direct memory access controller 505 (DMAC) is configured to transfer input data in the external memory to the input memory 501 and/or the unified memory 506, store the weight data from the external memory in the weight memory 502, and store the data in the unified memory 506 into the external memory.
A Bus Interface Unit (BIU) 510, configured to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 509 through a bus.
An instruction fetch buffer 509 connected to the controller 504 for storing instructions used by the controller 504.
The controller 504 is configured to call the instruction cached in the instruction storage 509 to implement controlling the working process of the operation accelerator.
Generally, the unified memory 506, the input memory 501, the weight memory 502, and the instruction fetch memory 509 may all be on-chip memories. The external memory of the NPU may be a memory external to the NPU, and the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a High Bandwidth Memory (HBM), or other readable and writable memory.
FIG. 7 is a schematic diagram of a neural network based text translation model. As shown in fig. 7, the text translation model includes an encoder 710, a decoder 720, and a portion 730 related to the attention mechanism. The encoder 710 is used for reading the source text (e.g., I am a student) and generating a numerical representation for each source language word (e.g., I, am, a, student) included in the source text; via the decoder 720, the source text is translated into a translation, i.e. a sentence in the target language (e.g. Je suis étudiant); the attention mechanism related portion 730 is used to dynamically provide attention weights for the decoder in generating the target word at different times based on the output of the encoder and the state of the decoder, wherein the decoder state can be characterized by the intermediate output of the decoder.
Wherein the attention weight may be a value used to characterize how relevant each source language word in the source text is to the decoder state at the current time. For example, at a certain time, if the attention weights of 4 source language words corresponding to the source text are 0.5, 0.3, 0.1, and 0.1, respectively, it may indicate that the correlation degrees of the 4 source language words with the decoder state are 0.5, 0.3, 0.1, and 0.1, respectively, and the correlation degree of the first source language word with the decoder state is the highest, and the probability that the decoder is currently generating the target word corresponding to the first source language word is the highest.
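As a simplified illustration of how such attention weights could be computed from the encoder outputs and the current decoder state, the sketch below uses a generic dot-product attention; this particular formula is an assumption for exposition, and the patent does not prescribe it:

```python
import numpy as np

def attention_weights(encoder_outputs: np.ndarray, decoder_state: np.ndarray) -> np.ndarray:
    """One weight per source word: softmax over dot-product scores."""
    scores = encoder_outputs @ decoder_state          # shape: (num_source_words,)
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()                    # sums to 1, e.g. [0.5, 0.3, 0.1, 0.1]

# Toy example with 4 source words and a hidden size of 8 (arbitrary random values).
enc = np.random.default_rng(1).normal(size=(4, 8))    # encoder output per source word
dec = np.random.default_rng(2).normal(size=(8,))      # current decoder state
print(attention_weights(enc, dec))
```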
In order to enable some source language words in an input source text to be translated correctly, the current neural machine translation supports manual intervention on the translation result of the neural machine translation. One way is to add known correct translations as constraints to the neural-machine translation and to ensure that the target words included in the constraints will definitely appear in the final output translated text.
FIG. 8 is a flow diagram of a text translation process. The text translation model shown in fig. 8 may be the text translation model shown in fig. 7, and in this flow, known correct translations may be added as constraints to the neural machine translation. Specifically, K candidate translations are used as input at each moment, and each candidate translation is decoded by the text translation model to obtain a score for each candidate word in a preset candidate word set; the candidate translations are expanded according to the obtained scores of the candidate words and a preset constraint set to obtain a new candidate translation set; then a certain number of candidate translations are selected from the new candidate translation set as the input of the next step; if the decoding termination condition is met, the translation result, namely the translation of the source text, is output, and if the decoding termination condition is not met, the selected candidate translations are further expanded.
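A highly simplified sketch of one expansion step of this flow is given below; the scorer, the scoring of constraint words and the selection stage are assumptions for illustration, and the constraint-coverage bookkeeping of the real system is omitted:

```python
from typing import Callable

def expand_step(candidates: list,
                score_next_words: Callable[[list], dict],
                constraint_set: dict,
                beam_size: int) -> list:
    """One step of the fig. 8 style flow: every candidate is expanded with top-scoring
    vocabulary words AND with the first word of every constraint."""
    new_candidates = []
    for cand in candidates:
        scores = score_next_words(cand)                              # scores over the candidate word set
        top_words = sorted(scores, key=scores.get, reverse=True)[:beam_size]
        for w in top_words:                                          # ordinary expansion
            new_candidates.append((scores[w], cand + [w]))
        for phrase in constraint_set.values():                       # constrained expansion (all constraints)
            new_candidates.append((scores.get(phrase[0], 0.0), cand + [phrase[0]]))
    new_candidates.sort(key=lambda t: t[0], reverse=True)            # simplistic selection stage
    return [c for _, c in new_candidates[:beam_size]]

# Toy usage with a dummy scorer, for illustration only.
dummy = lambda cand: {"graduated": 0.6, "from": 0.3, "Hefei": 0.1}
print(expand_step([["I"]], dummy, {4: ["Hefei", "University", "of", "Technology"]}, beam_size=2))
```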
In this flow, the constraint set includes one or more constraints that may characterize a correct translation of at least a portion of the source text. Optionally, the at least part of the source text may be a source language word or a source language phrase comprised by the source text. The constraint set may be given manually by the user. The candidate translation is an intermediate result or a final result of translating the source text. For example, when a Chinese source sentence meaning "I graduated from Hefei University of Technology" is translated into English, the constraint set contains the correct translation "Hefei University of Technology" of the source word 合工大. In the translation process, "I", "I graduated from" and the like are intermediate results of the translation; when "I" is taken as a candidate translation, the new candidate translations obtained by expanding "I" may include "I graduated", "I Hefei", and the like, and the obtained new candidate translations may be further expanded; after the candidate translation "I graduated from Hefei University of Technology" is obtained, that candidate translation is the final result of translating the source text.
According to the above method for generating a machine translation, the candidate translations are expanded under the constraints in the constraint set, and the coverage of the constraint set by the new candidate translations is considered in the selection stage, so that the constrained target words are ensured to appear in the finally output translation. However, in the above method, each candidate translation may be expanded using all the constraints in the constraint set; for example, "Hefei" may be added to the new candidate translation set while "I" is being generated, and "I Hefei", "Hefei University" and the like may be generated while "I graduated" is being generated, which obviously results in a waste of time and space. Therefore, how to use these constraints efficiently and accurately becomes an urgent problem to be solved.
In view of the above problems, embodiments of the present application provide a method and an apparatus for text translation, which can efficiently and accurately use constraints when performing machine translation, thereby increasing translation speed and reducing waste of space.
The text translation method according to the embodiment of the present application is described in detail below with reference to the drawings. The text translation method according to the embodiment of the present application may be executed by devices such as the data processing device in fig. 1, the user device in fig. 2, the execution device 210 in fig. 3, and the execution device 110 in fig. 4.
Fig. 9 is a schematic diagram of a text translation process provided in an embodiment of the present application. As shown in fig. 9, compared to the text translation process shown in fig. 8, a step of selecting a constraint is added for filtering the constraint set. In the process shown in fig. 8, a complete constraint set corresponding to the source text is used for all candidate translations at all times, and in the process shown in fig. 9, constraints related to the candidate translation to be expanded at the current time may be selected at the current time to form a new constraint set, where the new constraint set is a subset of the complete constraint set, and the new constraint set is used to expand the candidate translation to be expanded at the current time. Thus, the process shown in fig. 9 can avoid using all constraints every time the candidate translation is expanded, and can accelerate the expansion of the candidate translation, thereby improving the translation speed.
Fig. 10 is a schematic flow chart of a text translation method provided in an embodiment of the present application. The text translation method according to the embodiment of the present application may be executed by devices such as the data processing device in fig. 1, the user device in fig. 2, the execution device 210 in fig. 3, and the execution device 110 in fig. 4.
At 1010, candidate translations corresponding to the source text are obtained.
The candidate translation is a candidate translation to be expanded at the current moment, and may be described by using a target language.
In 1020, constraints in a preset set of constraints are selected based on attention weights output by the text translation model.
The attention weight may be a value characterizing how relevant each source language word in the source text is to the decoder state. For example, at a certain time, if the attention weights of 4 source language words corresponding to the source text are 0.5, 0.3, 0.1, and 0.1, respectively, it may indicate that the correlation degrees of the 4 source language words with the decoder state are 0.5, 0.3, 0.1, and 0.1, respectively, and the correlation degree of the first source language word with the decoder state is the highest, and the probability that the decoder is currently generating the target word corresponding to the first source language word is the highest.
The attention weight is calculated by the text translation model at the current moment. That is, the constraints in the preset constraint set are selected according to the attention weight output by the text translation model at the current moment.
A constraint in the present application may characterize the correct translation of at least a portion of the source text. Optionally, the at least part of the source text may be a source language word or a source language phrase comprised in the source text.
In some embodiments, the constraints may include source side information and destination side information, where the source side may be a source text input side and the destination side may be a translation output side.
For example, a constraint may take the form "source language word: target word(s)", for example, the source language name of Hefei University of Technology: Hefei University of Technology.
For another example, the source side information included in the constraint is source position information, and the target side information is the target word(s) corresponding to the source position information. For example, the constraint may take the form "[position of the source language word in the source text]: target word(s)", e.g., [4]: Hefei University of Technology.
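For illustration only, the two constraint forms described above can be represented with a minimal data structure such as the following sketch (Python is used here purely as pseudocode; the key names and tokenization are assumptions, not the patent's actual data structures):

# Form 1: keyed by the source language word (shown only as a placeholder string).
word_constraints = {
    "<source word for Hefei University of Technology>": ["Hefei", "University", "of", "Technology"],
}

# Form 2: keyed by the position of the source language word in the source text
# (1-based, matching the [4] / [6] notation used above).
position_constraints = {
    4: ["Hefei", "University", "of", "Technology"],
    6: ["Sushma", "Swaraj"],
}

The position-keyed form is the one used in the sketches below, since it lets a constraint be looked up directly from the source end position attended to by the decoder.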
In the present application, there are many ways to select the constraints in the preset constraint set according to the attention weight output by the text translation model, and the present application is not limited specifically.
In some embodiments, according to a source location corresponding to each constraint in the preset constraint set, an attention weight corresponding to each constraint is obtained from the text translation model, and constraints in the preset constraint set are selected according to the obtained attention weight. And the source end position corresponding to the constraint refers to a position in the source text corresponding to the constraint. The source location information included with the constraint may indicate where in the source text the constraint characterizes the correct translation of the source word.
For example, the constraint set is {[4]: Hefei University of Technology, [6]: Sushma Swaraj}; the two constraints correspond to source end position 4 and source end position 6 respectively, the attention weights corresponding to source end position 4 and source end position 6 are obtained from the text translation model, and selection between the two constraints is performed according to the obtained weights.
Optionally, after the attention weight corresponding to each constraint in the constraint set is obtained from the text translation model, the obtained attention weight may be processed to obtain a heuristic signal of each constraint, and the constraints in the preset constraint set are selected according to the heuristic signal of each constraint. A heuristic signal is used to indicate whether the constraint corresponding to the heuristic signal is used when expanding the candidate translation. For example, the acquired attention weights may be respectively compared with a preset threshold: the heuristic signal of a constraint whose attention weight is greater than or equal to the preset threshold indicates that the constraint is used when expanding the candidate translation at the current moment, and the heuristic signal of a constraint whose attention weight is smaller than the preset threshold indicates that the constraint is not used when expanding the candidate translation at the current moment. For another example, when the text translation model calculates a plurality of attention weights for the same source end position at each moment, the acquired plurality of attention weights corresponding to the same source end position may also be processed, for example, by summing, averaging, taking the largest top-Q values, or performing other more complex processing, and the heuristic signal of each constraint is determined according to the processing result.
In other embodiments, the attention weight corresponding to each source end position at the current time may be obtained from the text translation model, and then a target attention weight meeting a preset requirement is selected from the obtained attention weights, so as to select constraints in the preset constraint set according to the source end position corresponding to the target attention weight and the source end position corresponding to each constraint in the preset constraint set. The source end position referred to herein is a position in the source text, specifically, the source end position corresponding to the target attention weight is a position of a word in the source text corresponding to the target attention weight, and the source end position corresponding to the constraint is a position of a word in the source text corresponding to the constraint.
For example, the constraint set is {[4]: Hefei University of Technology, [6]: Sushma Swaraj}. The attention weights at the current time corresponding to source end position 1 to source end position 6, obtained from the text translation model, are for example 0.01, 0.01, 0.01, 0.95, 0.01, and 0.01 respectively. The 6 attention weights are compared with a preset threshold 0.5, and the obtained target attention weight comprises the attention weight corresponding to source end position 4; the constraint whose source end position is 4, namely [4]: Hefei University of Technology, is selected from the constraint set, and the constraint [4]: Hefei University of Technology is then used to expand the candidate translation.
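As a non-authoritative sketch of this position-based selection, the following code keeps only the constraints anchored at source positions whose attention weight at the current step meets the preset requirement; the threshold 0.5, the function name, and the reconstructed weight vector are assumptions borrowed from the example above:

def select_constraints_by_position(attention, position_constraints, threshold=0.5):
    # attention[i] is the weight of source end position i + 1 (1-based positions)
    target_positions = {i + 1 for i, w in enumerate(attention) if w >= threshold}
    return {pos: phrase for pos, phrase in position_constraints.items()
            if pos in target_positions}

# With the weight concentrated on source end position 4, only constraint [4] is kept:
selected = select_constraints_by_position(
    [0.01, 0.01, 0.01, 0.95, 0.01, 0.01],
    {4: ["Hefei", "University", "of", "Technology"], 6: ["Sushma", "Swaraj"]},
)
# selected == {4: ["Hefei", "University", "of", "Technology"]}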
In some embodiments, some constraints may include multiple qualifiers. For example, the constraint [4]: Hefei University of Technology, which corresponds to the 4th source language word, includes the 4 qualifiers Hefei, University, of, and Technology. In a scenario of expanding the candidate translation word by word, the candidate translation to be expanded at the current time may already use some of the qualifiers of a certain constraint, and the state of the candidate translation at this time may be referred to as being in the constraint. For example, if the candidate translation at the current time is "I graduated from Hefei", the qualifier "Hefei" of the constraint [4]: Hefei University of Technology has already been used. In view of this, the present application may combine the attention weight calculated by the text translation model and the state of the current candidate translation to select the constraints in the preset constraint set.
In a possible implementation manner, constraints in a preset constraint set may be selected according to the attention weight calculated by the text translation model and the state of the candidate translation, and a target constraint for expanding the candidate translation at the current time satisfies at least one of the following conditions: the attention weight corresponding to the target constraint meets the preset requirement; the candidate translations are within the constraints.
When the candidate translation to be expanded at the current moment is obtained by expansion using part of the words of a target phrase, the candidate translation to be expanded is in the constraint, where the target phrase is the phrase corresponding to a certain constraint in the preset constraint set.
The preset requirement may be, for example, that the heuristic signal derived from the attention weight of the corresponding constraint is greater than a preset threshold, so as to indicate that the constraint is used when the candidate translation at the current time is expanded.
The method for selecting the constraint provided by the embodiment of the present application is described below with reference to specific examples. Fig. 11 is a schematic flow chart of a method for selecting constraints provided by an embodiment of the present application. N in fig. 11 represents the number of attention weights, and M represents the number of constraints in a preset constraint set.
As shown in FIG. 11, at 1110, a confidence is computed for each constraint in a preset set of constraints.
Specifically, according to the source end position information included in the k-th constraint of the M constraints, the attention weights of the corresponding positions {aw_{i,j}} are extracted from the N attention weights {aw_1, ..., aw_N}, where 1 ≤ i ≤ N, 1 ≤ j ≤ N, i represents the i-th attention weight, j represents the j-th source end position, and 1 ≤ k ≤ M; the L confidences {c_1, ..., c_L} are then obtained according to the formula {c_1, ..., c_L} = f({aw_{i,j}}).
Wherein the function f may be a simple function. For example, the function f is a summation function, an averaging function, or the like. The function f may also be a complex function. For example, the function f may be a neural network or the like.
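As one concrete reading of step 1110 (a sketch only, taking f to be an average; the shape of the attention input is an assumption), one confidence can be computed per source end position covered by constraint k:

def constraint_confidences(attention_rows, source_positions):
    # attention_rows: the N attention weight vectors at the current step
    # (for example, one per attention head); source_positions: the 1-based
    # source end positions covered by constraint k.
    confidences = []
    for j in source_positions:
        weights_j = [row[j - 1] for row in attention_rows]   # the {aw_{i,j}} for position j
        confidences.append(sum(weights_j) / len(weights_j))  # f: average over i
    return confidences                                       # the L confidences {c_1, ..., c_L}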
At 1120, a heuristic for constraint k is computed based on the L confidences obtained at 1110.
Specifically, the heuristic signal of constraint k may be calculated according to the formula h_k = g({c_1, ..., c_L}), where h_k represents the heuristic signal of constraint k. The heuristic signal h_k may take two values, which indicate whether the constraint k is used when the current candidate translation is extended. For example, the two values of the heuristic signal are 1 and 0: when the value of the heuristic signal is 1, it indicates that the constraint k is used when the current candidate translation is extended, and when the value is 0, it indicates that the constraint k is not used when the current candidate translation is extended.
Wherein the function g may be a simple function. E.g., summing, averaging, etc., and then comparing with a preset threshold, returning a 1 if greater than the preset threshold, otherwise returning a 0. The function g may also be a complex function. E.g., a neural network that outputs a 1 or 0, etc.
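A matching sketch of step 1120, taking g to be "average the confidences, then compare with a preset threshold"; the threshold 0.5 is an assumption borrowed from the examples below:

def heuristic_signal(confidences, threshold=0.5):
    # g: average the L confidences of constraint k and threshold the result
    return 1 if sum(confidences) / len(confidences) >= threshold else 0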
At 1130, a predetermined set of constraints is filtered or selected based on the heuristics for each constraint obtained at 1120 and the state of the candidate translation.
Specifically, assume that s_k indicates the state of the candidate translation with respect to the k-th constraint: s_k = 0 means that the current candidate translation is in the k-th constraint, and s_k = 1 means that the current candidate translation is not in the k-th constraint. The new constraint set may then be expressed as (see the sketch after the two conditions below):

{k | s_k = 0 or h_k = 1}
That is, if the kth constraint satisfies any one of the following two conditions, the kth constraint is a target constraint for expanding the candidate translation at the current time, and is added to the new constraint set:
condition 1: the current candidate translation is in this constraint;
condition 2: the heuristic for this constraint is 1.
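A minimal sketch of this filtering step (step 1130), with s and h given as per-constraint dictionaries following the convention above:

def filter_constraints(s, h):
    # s[k] == 0 means the current candidate translation is in constraint k;
    # h[k] is the heuristic signal of constraint k (0 or 1).
    # Returns the new constraint set {k | s_k = 0 or h_k = 1}.
    return {k for k in s if s[k] == 0 or h[k] == 1}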
According to the technical scheme, before the candidate translation is expanded, the constraints in the preset constraint set are selected or filtered, the constraints with low correlation degree with the current decoder state can be ignored, and therefore the translation speed is accelerated while the quality of the candidate translation is not influenced.
At 1030, when a target constraint for expanding the candidate translation is selected, the candidate translation is expanded according to the target constraint. Or when the target constraint for expanding the candidate translation is not selected, expanding the candidate translation according to a preset candidate word set.
The candidate word set includes words in a plurality of target languages, and the target languages are languages to which the candidate translations belong. The candidate word set can be a preset candidate word library, and the text translation model scores candidate words in the candidate word library at each moment so as to determine candidate words for expanding the candidate translation according to scores of the candidate words. For example, candidate translations may be expanded using candidate words whose scores exceed a preset threshold.
Optionally, when a target constraint for expanding the candidate translation is selected, the candidate translation may be expanded only according to the target constraint, or the candidate translation may be expanded according to the target constraint and the preset candidate word set, which is not specifically limited in the embodiment of the present application.
For example, the candidate translation at the current time is "I graduated from", and the constraint set is {[4]: Hefei University of Technology, [6]: Sushma Swaraj}. A target constraint [4]: Hefei University of Technology for expanding the candidate translation is selected according to the attention weight and the state of the candidate translation; the candidate translation "I graduated from" is then expanded according to the preset candidate word set and the target constraint [4]: Hefei University of Technology, and the constraint [6]: Sushma Swaraj is not used to expand the candidate translation "I graduated from".
For another example, the candidate translation at the current time is "I graduated", and the constraint set is {[4]: Hefei University of Technology, [6]: Sushma Swaraj}. If no target constraint for expanding the candidate translation is selected according to the attention weight and the state of the candidate translation, the candidate translation "I graduated" is expanded only according to the preset candidate word set.
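For illustration, step 1030 can be sketched as follows; beam search, scoring, and the exact mix of constraint qualifiers and ordinary candidate words are simplified, and all names are assumptions rather than the patent's implementation:

def expand(candidate_tokens, selected_constraints, progress, candidate_words):
    # candidate_tokens: the target words generated so far
    # progress[k]: number of qualifiers of constraint k already emitted
    expansions = []
    for k, phrase in selected_constraints.items():
        used = progress.get(k, 0)
        if used < len(phrase):
            # extend with the next qualifier of the selected constraint
            expansions.append(candidate_tokens + [phrase[used]])
    if not expansions:
        # no target constraint selected: use only the preset candidate word set
        expansions = [candidate_tokens + [w] for w in candidate_words]
    return expansions

A real system would score each expansion with the text translation model and keep only the best ones, and, as noted above, may also expand with ordinary candidate words even when a target constraint has been selected.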
It should be understood that the source text, the candidate translation, and the translation may all belong to natural languages, where a natural language generally refers to a language that evolves naturally with culture. Optionally, the source text belongs to a first natural language, the candidate translation and the translation belong to a second natural language, and the first natural language and the second natural language are different natural languages. That the source text belongs to the first natural language may mean that the source text is a piece of text expressed in the first natural language, and that the candidate translation and the translation belong to the second natural language may mean that the candidate translation and the translation are pieces of text expressed in the second natural language. The source text and the candidate translation may belong to any two different natural languages.
The text translation method of the present application is described in more detail below with reference to specific examples.
Example 1
The source text is a sentence in the source language meaning "I graduated from Hefei University of Technology", and it comprises 4 source language words, corresponding to "I", "graduated", "from", and the source language name of Hefei University of Technology.
The constraint set is {[4]: Hefei University of Technology}, which contains only one constraint, corresponding to the correct translation of the 4th source language word (the source language name of Hefei University of Technology).
The generation of a correct translation "I graduated from Hefei University of Technology" may be as follows.
1) When target words corresponding to the first three source language words are generated, constraint heuristic signals are all 0, and constraint expansion is not used.
For example, when the input candidate translation "I graduated" is expanded, the attention weights are acquired; the attention weights corresponding to the 4 source end positions are [0.01, 0.01, 0.97, 0.01]; the confidence of the constraint [4] is 0.01, which is lower than the preset threshold 0.5, so a heuristic signal 0 is generated; "I graduated" is expanded according to the preset candidate word set, and the new candidate translation "I graduated from" is obtained after selection.
2) The input candidate translation "I graduated from" is expanded.
The attention weights are acquired; the attention weights corresponding to the 4 source end positions are [0.01, 0.01, 0.01, 0.97]; the confidence of the constraint [4] is 0.97, which is higher than the preset threshold 0.5, so a heuristic signal 1 is generated; the candidate translation "I graduated from" is expanded according to the preset candidate word set and the first qualifier "Hefei" of the constraint [4], and "I graduated from Hefei" is obtained after selection.
3) The input candidate translation "I graduated from Hefei" is expanded.
The attention weights are acquired; the attention weights corresponding to the 4 source end positions are [0.25, 0.25, 0.25, 0.25]; the confidence of the constraint [4] is 0.25, which is lower than the preset threshold 0.5, so a heuristic signal 0 is generated. However, because the current candidate translation is in the constraint, the candidate translation "I graduated from Hefei" is still expanded according to the second qualifier "University" of the constraint [4], and "I graduated from Hefei University" is obtained after selection.
4) Continuing the expansion, because the candidate translation "I graduated from Hefei University" is in the constraint, the candidate translation is expanded using the third qualifier "of" and the fourth qualifier "Technology" in the constraint [4] in sequence, resulting in "I graduated from Hefei University of Technology".
5) The decoding is terminated, and the translation result "I graduated from Hefei University of Technology" is output.
Example 2
The source text is a sentence in the source language meaning "I graduated from Hefei University of Technology", and it comprises 4 source language words, corresponding to "I", "graduated", "from", and the source language name of Hefei University of Technology.
The constraint set is {[4]: Hefei University of Technology}, which contains only one constraint, corresponding to the correct translation of the 4th source language word (the source language name of Hefei University of Technology).
The generation of a correct translation "I graduated from Hefei University of Technology" may be as follows.
1) When the first three target words are generated, the heuristic signals of the constraint are all 0, which indicates that constraint expansion is not used.
For example, when the input candidate translation "I graduated" is expanded, the attention weights are acquired; the attention weights corresponding to the 4 source end positions are [0.01, 0.01, 0.97, 0.01]; the confidence of the constraint [4] is 0.01, which is lower than the preset threshold 0.5, so a heuristic signal 0 is generated; "I graduated" is expanded according to the preset candidate word set, and the new candidate translation "I graduated from" is obtained after selection.
2) The input candidate translation "I graduated from" is expanded.
The attention weights are acquired; the attention weights corresponding to the 4 source end positions are [0.01, 0.01, 0.01, 0.97]; the confidence of the constraint [4] is 0.97, which is higher than the preset threshold 0.5, so a heuristic signal 1 is generated; the candidate translation "I graduated from" is expanded according to the preset candidate word set and all the qualifiers of the constraint [4], and "I graduated from Hefei University of Technology" is obtained after selection.
3) The decoding is terminated, and the translation result "I graduated from Hefei University of Technology" is output.
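Putting the pieces together, the two examples above follow a loop of roughly the following shape. This is only a greedy, single-candidate sketch under the same assumptions as the earlier sketches (one attention vector per step, threshold 0.5); model.step and the end-of-sentence token are assumed interfaces, not an API defined by the patent or any particular library:

def constrained_decode(model, source, position_constraints, threshold=0.5, max_len=50):
    candidate = []                                   # target words generated so far
    progress = {k: 0 for k in position_constraints}  # qualifiers already emitted per constraint
    for _ in range(max_len):
        # assumed model interface: per-position attention weights and a dict of word scores
        attention, word_scores = model.step(source, candidate)
        selected = {}
        for k, phrase in position_constraints.items():
            if progress[k] >= len(phrase):                    # constraint already satisfied
                continue
            in_constraint = progress[k] > 0                   # s_k == 0: inside the constraint
            heuristic = 1 if attention[k - 1] >= threshold else 0
            if in_constraint or heuristic == 1:               # {k | s_k = 0 or h_k = 1}
                selected[k] = phrase
        if selected:                                          # expand with a selected constraint
            k, phrase = next(iter(selected.items()))
            candidate.append(phrase[progress[k]])
            progress[k] += 1
        else:                                                 # expand with the candidate word set
            candidate.append(max(word_scores, key=word_scores.get))
        if candidate[-1] == "</s>":                           # assumed end-of-sentence token
            break
    return candidate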
According to the above technical scheme, when the candidate translation is expanded, it is expanded according to the target constraint obtained after selection or filtering, so that using all constraints every time the candidate translation is expanded can be avoided; and when no target constraint is selected, the candidate translation is expanded only according to the preset candidate word set, so that using constraints when they are not needed can be avoided. Therefore, the technical scheme of the present application can accelerate the expansion of the candidate translation, thereby further improving the translation speed.
The method embodiments of the present application have been described in detail above with reference to the accompanying drawings, and the apparatus embodiments of the present application are described below with reference to fig. 12 to 14. It should be understood that the apparatuses described in fig. 12 to 14 are capable of performing the steps of the text translation method of the embodiments of the present application, and repeated descriptions are appropriately omitted when the apparatus embodiments are described below.
Fig. 12 is a schematic configuration diagram of a text translation apparatus according to an embodiment of the present application. The text translation apparatus 1200 may correspond to the data processing device shown in fig. 1 or the user device shown in fig. 2. The text translation apparatus 1200 may also correspond to the execution device 210 shown in fig. 3 or the execution device 110 shown in fig. 4.
The apparatus 1200 may include an acquisition module 1210 and a processing module 1220. The modules included in the apparatus 1200 may be implemented by software and/or hardware.
Optionally, the obtaining module 1210 may be a communication interface, or the obtaining module 1210 and the processing module 1220 may be the same module.
In the present application, the apparatus 1200 may be used to perform the steps in the method described in fig. 10.
For example:
an obtaining module 1210, configured to obtain a candidate translation corresponding to a source text;
the processing module 1220 is configured to select a constraint in a preset constraint set according to the attention weight calculated by the text translation model, where the constraint represents a correct translation of at least a part of the source text;
the processing module 1220 is further configured to, when a target constraint for expanding the candidate translation is selected, expand the candidate translation according to the target constraint; or when the target constraint for expanding the candidate translation is not selected, expanding the candidate translation according to a preset candidate word set.
Optionally, the processing module 1220 is specifically configured to obtain, from the text translation model, attention weights respectively corresponding to each constraint according to a source end position corresponding to each constraint in the preset constraint set, where the source end position is a position of a word corresponding to each constraint in the source text; selecting constraints in the preset constraint set according to the attention weight corresponding to each constraint.
Optionally, the processing module 1220 is specifically configured to process the attention weight corresponding to each constraint to obtain a heuristic for each constraint, where the heuristic is used to indicate whether to use a constraint corresponding to the heuristic when expanding the candidate translation; and selecting constraints in the preset constraint set according to the heuristic signal corresponding to each constraint.
Optionally, the processing module 1220 is specifically configured to select a target attention weight meeting a preset requirement from the attention weights; and selecting constraints in a preset constraint set according to a source end position corresponding to the target attention weight and a source end position corresponding to each constraint in the preset constraint set, wherein the source end position corresponding to the target attention weight is the position of a word corresponding to the target attention weight in the source text, and the source end position corresponding to each constraint is the position of the word corresponding to each constraint in the source text.
Optionally, the processing module 1220 is specifically configured to select constraints in a preset constraint set according to the attention weight calculated by the text translation model and a state of the candidate translation, where the state of the candidate translation includes being in a constraint and not being in a constraint; in a case that the candidate translation is obtained by expansion using part of the words of a target phrase, the state of the candidate translation is in the constraint, and the target phrase is a target end phrase corresponding to a constraint in the preset constraint set;
the target constraint satisfies at least one of the following conditions:
the attention weight corresponding to the target constraint meets a preset requirement;
the state of the candidate translation is in a constraint.
It should be understood that the text translation apparatus 1200 shown in fig. 12 is only an example, and the apparatus of the embodiment of the present application may further include other modules or units.
The obtaining module 1210 may be implemented by a communication interface or a processor. The processing module 1220 may be implemented by a processor. For specific functions and beneficial effects of the obtaining module 1210 and the processing module 1220, reference may be made to the related description of the method embodiment, and details are not described herein again.
Fig. 13 is a schematic configuration diagram of a text translation apparatus according to another embodiment of the present application. The text translation apparatus 1300 may be a data processing device shown in fig. 1 or a user device shown in fig. 2. The text translation apparatus 1300 may also correspond to the execution device 210 shown in fig. 3 and the execution device 110 shown in fig. 4.
As shown in fig. 13, text translation apparatus 1300 may include a memory 1310 and a processor 1320. Only one memory and processor are shown in fig. 13. In an actual text translation device product, there may be one or more processors and one or more memories. The memory may also be referred to as a storage medium or a storage device, etc. The memory may be provided independently of the processor, or may be integrated with the processor, which is not limited in this embodiment.
The memory 1310 and the processor 1320 communicate with each other, passing control and/or data signals, through internal connection paths.
Specifically, a memory 1310 for storing a program;
a processor 1320 for executing the program stored in the memory 1310, wherein when the program stored in the memory 1310 is executed by the processor 1320, the processor 1320 is configured to:
acquiring a candidate translation corresponding to a source text;
selecting constraints in a preset constraint set according to attention weights calculated by a text translation model, wherein the constraints represent correct translations of at least part of the source text;
when target constraints for expanding the candidate translations are selected, expanding the candidate translations according to the target constraints; or when the target constraint for expanding the candidate translation is not selected, expanding the candidate translation according to a preset candidate word set.
The text translation apparatus 1300 may further include an input/output interface 1330, where the text translation apparatus 1300 may obtain the source text through the input/output interface 1330, specifically, obtain the source text from another device (for example, a terminal device) through the input/output interface 1330, and after obtaining the source text, obtain the machine translation through the processing of the processor 1320. The text translation apparatus 1300 can transmit the machine translation to another device through the input-output interface 1330.
It should be understood that the text translation apparatus 1300 shown in fig. 13 is only an example, and the apparatus of the embodiment of the present application may further include other modules or units.
For specific working processes and beneficial effects of the text translation apparatus 1300, reference may be made to the related description in the method embodiment, and details are not described herein again.
Fig. 14 is a schematic diagram of a machine translation system 1400 provided by an embodiment of the present application.
The machine translation system 1400 may correspond to the data processing device shown in fig. 1 or the user device shown in fig. 2, among others. The machine translation system 1400 may also correspond to the execution device 210 shown in fig. 3, the execution device 110 shown in fig. 4.
As shown in fig. 14, machine translation system 1400 may include a memory 1410 and a processor 1420. Only one memory and processor are shown in fig. 14. In an actual machine translation system product, there may be one or more processors and one or more memories. The memory may also be referred to as a storage medium or a storage device, etc. The memory may be provided independently of the processor, or may be integrated with the processor, which is not limited in this embodiment.
The memory 1410 and the processor 1420 communicate with each other, passing control and/or data signals, through internal connection paths.
Specifically, a memory 1410 for storing programs;
a processor 1420 configured to execute the programs stored in the memory 1410, wherein when the programs stored in the memory 1410 are executed by the processor 1420, the processor 1420 is configured to:
acquiring a candidate translation corresponding to a source text;
selecting constraints in a preset constraint set according to attention weights calculated by a text translation model, wherein the constraints represent correct translations of at least part of the source text;
when target constraints for expanding the candidate translations are selected, expanding the candidate translations according to the target constraints; or when the target constraint for expanding the candidate translation is not selected, expanding the candidate translation according to a preset candidate word set.
The machine translation system 1400 may further include an input/output interface 1430, where the machine translation system 1400 may obtain the source text through the input/output interface 1430; specifically, the source text may be obtained from another device (for example, a terminal device) through the input/output interface 1430, and after the source text is obtained, the machine translation is obtained through the processing of the processor 1420. The machine translation system 1400 may transmit the machine translation to other devices through the input/output interface 1430.
It should be understood that the machine translation system 1400 shown in fig. 14 is only an example, and the machine translation system of the embodiments of the present application may further include other modules or units.
The specific operation and beneficial effects of the machine translation system 1400 can be referred to the related description in the method embodiment, and are not described herein again.
It should be understood that the processor in the embodiments of the present application may be a Central Processing Unit (CPU), and the processor may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will also be appreciated that the memory in the embodiments of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of random access memory (RAM) are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used for implementation, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or the computer program are loaded or executed on a computer, the procedures or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (for example, infrared, radio, or microwave) manner. The computer readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or a data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone, wherein A and B can be singular or plural. In addition, the "/" in this document generally indicates that the former and latter associated objects are in an "or" relationship, but may also indicate an "and/or" relationship, which may be understood with particular reference to the former and latter text.
In the present application, "at least one" means one or more, "a plurality" means two or more. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or multiple.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method of text translation, comprising:
acquiring a candidate translation corresponding to a source text;
selecting constraints in a preset constraint set according to attention weights calculated by a text translation model, wherein the constraints represent correct translations of at least part of the source text;
when target constraints for expanding the candidate translations are selected, expanding the candidate translations according to the target constraints; or,
when the target constraint for expanding the candidate translation is not selected, expanding the candidate translation according to a preset candidate word set, wherein the candidate word set comprises a plurality of words of a target language, and the target language is the language to which the candidate translation belongs.
2. The method of claim 1, wherein selecting constraints from a preset set of constraints based on attention weights calculated by a text translation model comprises:
acquiring attention weights respectively corresponding to each constraint from the text translation model according to a source end position corresponding to each constraint in the preset constraint set, wherein the source end position is a position of a word corresponding to each constraint in the source text;
selecting constraints in the preset constraint set according to the attention weight corresponding to each constraint.
3. The method of claim 2, wherein said selecting constraints of said preset set of constraints according to said attention weight corresponding to said each constraint comprises:
processing the attention weight corresponding to each constraint to obtain a heuristic signal of each constraint, wherein the heuristic signal is used for indicating whether to use the constraint corresponding to the heuristic signal when expanding the candidate translation;
and selecting constraints in the preset constraint set according to the heuristic signal corresponding to each constraint.
4. The method of claim 1, wherein selecting constraints from a preset set of constraints based on attention weights calculated by a text translation model comprises:
selecting a target attention weight meeting a preset requirement from the attention weights;
and selecting constraints in a preset constraint set according to a source end position corresponding to the target attention weight and a source end position corresponding to each constraint in the preset constraint set, wherein the source end position corresponding to the target attention weight is the position of a word corresponding to the target attention weight in the source text, and the source end position corresponding to each constraint is the position of the word corresponding to each constraint in the source text.
5. The method of claim 1, wherein selecting constraints from a preset set of constraints according to the attention weights calculated by the text translation model comprises:
selecting constraints in a preset constraint set according to attention weights calculated by a text translation model and a state of the candidate translation, wherein the state of the candidate translation includes being in a constraint and not being in a constraint; under the condition that the candidate translation is obtained by extension using part of the words of a target phrase, the state of the candidate translation is in the constraint, and the target phrase is a target end phrase corresponding to a constraint in the preset constraint set;
the target constraint satisfies at least one of the following conditions:
the attention weight corresponding to the target constraint meets a preset requirement;
the state of the candidate translation is in a constraint.
6. A text translation apparatus, comprising:
a memory for storing a program;
a processor for executing the memory-stored program, the processor being configured to, when the memory-stored program is executed by the processor:
acquiring a candidate translation corresponding to a source text;
selecting constraints in a preset constraint set according to attention weights calculated by a text translation model, wherein the constraints represent correct translations of at least part of the source text;
when target constraints for expanding the candidate translations are selected, expanding the candidate translations according to the target constraints; or,
when the target constraint for expanding the candidate translation is not selected, expanding the candidate translation according to a preset candidate word set, wherein the candidate word set comprises a plurality of words of a target language, and the target language is the language to which the candidate translation belongs.
7. The apparatus of claim 6, wherein the processor is specifically configured to:
acquiring attention weights respectively corresponding to each constraint from the text translation model according to a source end position corresponding to each constraint in the preset constraint set, wherein the source end position is a position of a word corresponding to each constraint in the source text;
selecting constraints in the preset constraint set according to the attention weight corresponding to each constraint.
8. The apparatus of claim 7, wherein the processor is specifically configured to:
processing the attention weight corresponding to each constraint to obtain a heuristic signal of each constraint, wherein the heuristic signal is used for indicating whether to use the constraint corresponding to the heuristic signal when expanding the candidate translation;
and selecting constraints in the preset constraint set according to the heuristic signal corresponding to each constraint.
9. The apparatus of claim 6, wherein the processor is specifically configured to:
selecting a target attention weight meeting a preset requirement from the attention weights;
and selecting constraints in a preset constraint set according to a source end position corresponding to the target attention weight and a source end position corresponding to each constraint in the preset constraint set, wherein the source end position corresponding to the target attention weight is the position of a word corresponding to the target attention weight in the source text, and the source end position corresponding to each constraint is the position of the word corresponding to each constraint in the source text.
10. The apparatus of claim 6, wherein the processor is specifically configured to:
selecting constraints in a preset constraint set according to attention weights calculated by a text translation model and a state of the candidate translation, wherein the state of the candidate translation includes being in a constraint and not being in a constraint; under the condition that the candidate translation is obtained by extension using part of the words of a target phrase, the state of the candidate translation is in the constraint, and the target phrase is a target end phrase corresponding to a constraint in the preset constraint set;
the target constraint satisfies at least one of the following conditions:
the attention weight corresponding to the target constraint meets a preset requirement;
the state of the candidate translation is in a constraint.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program code comprising instructions for performing part or all of the steps of the method according to any one of claims 1 to 5.
CN201911244875.5A 2019-12-06 2019-12-06 Text translation method, apparatus, machine translation system, and storage medium Active CN111160049B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911244875.5A CN111160049B (en) 2019-12-06 2019-12-06 Text translation method, apparatus, machine translation system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911244875.5A CN111160049B (en) 2019-12-06 2019-12-06 Text translation method, apparatus, machine translation system, and storage medium

Publications (2)

Publication Number Publication Date
CN111160049A true CN111160049A (en) 2020-05-15
CN111160049B CN111160049B (en) 2023-06-06

Family

ID=70556511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911244875.5A Active CN111160049B (en) 2019-12-06 2019-12-06 Text translation method, apparatus, machine translation system, and storage medium

Country Status (1)

Country Link
CN (1) CN111160049B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111859991A (en) * 2020-07-29 2020-10-30 中国平安财产保险股份有限公司 Language translation processing model training method and language translation processing method
WO2022102364A1 (en) * 2020-11-13 2022-05-19 株式会社Nttドコモ Text generation model generating device, text generation model, and text generating device
WO2023079911A1 (en) * 2021-11-04 2023-05-11 株式会社Nttドコモ Sentence generation model generator, sentence generation model, and sentence generator

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190205761A1 (en) * 2017-12-28 2019-07-04 Adeptmind Inc. System and method for dynamic online search result generation
CN109992629A (en) * 2019-02-28 2019-07-09 中国科学院计算技术研究所 A kind of neural network Relation extraction method and system of fusion entity type constraint
CN110020440A (en) * 2018-01-09 2019-07-16 深圳市腾讯计算机***有限公司 A kind of machine translation method, device, server and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190205761A1 (en) * 2017-12-28 2019-07-04 Adeptmind Inc. System and method for dynamic online search result generation
CN110020440A (en) * 2018-01-09 2019-07-16 深圳市腾讯计算机***有限公司 A kind of machine translation method, device, server and storage medium
CN109992629A (en) * 2019-02-28 2019-07-09 中国科学院计算技术研究所 A kind of neural network Relation extraction method and system of fusion entity type constraint

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KAI SONG: "Code-Switching for Enhancing NMT with Pre-Specified Translation" *
YONG CHENG: "Agreement-based Joint Training for Bidirectional Attention-based Neural Machine Translation" *
毛曦; 颜闻; 马维军; 殷红梅: "Machine translation of English place names based on the attention mechanism" *

Also Published As

Publication number Publication date
CN111160049B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
US20220180202A1 (en) Text processing model training method, and text processing method and apparatus
WO2022007823A1 (en) Text data processing method and device
CN112580369B (en) Sentence repeating method, method and device for training sentence repeating model
CN110378346B (en) Method, device and equipment for establishing character recognition model and computer storage medium
CN111160049B (en) Text translation method, apparatus, machine translation system, and storage medium
CN111128391B (en) Information processing apparatus, method and storage medium
CN111898636B (en) Data processing method and device
CN113449859A (en) Data processing method and device
CN111368656A (en) Video content description method and video content description device
EP4060526A1 (en) Text processing method and device
US20240135174A1 (en) Data processing method, and neural network model training method and apparatus
CN113065633A (en) Model training method and associated equipment
CN111785385A (en) Disease classification method, device, equipment and storage medium
CN113627422A (en) Image classification method and related equipment thereof
CN110083842B (en) Translation quality detection method, device, machine translation system and storage medium
CN113326383B (en) Short text entity linking method, device, computing equipment and storage medium
CN110852066B (en) Multi-language entity relation extraction method and system based on confrontation training mechanism
CN114579718A (en) Text feature generation method, device, equipment and storage medium combining RPA and AI
CN111738021B (en) Word vector processing method, device and storage medium of neural machine translation model
CN116703659A (en) Data processing method and device applied to engineering consultation and electronic equipment
CN115795025A (en) Abstract generation method and related equipment thereof
CN116109449A (en) Data processing method and related equipment
CN110334359B (en) Text translation method and device
CN116312489A (en) Model training method and related equipment thereof
CN114707643A (en) Model segmentation method and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant