CN109933808B - Neural machine translation method based on dynamic configuration decoding - Google Patents

Neural machine translation method based on dynamic configuration decoding

Info

Publication number
CN109933808B
CN109933808B (application CN201910095193.6A)
Authority
CN
China
Prior art keywords
decoding
decision model
model
sentence
configuration
Prior art date
Legal status
Active
Application number
CN201910095193.6A
Other languages
Chinese (zh)
Other versions
CN109933808A (en)
Inventor
Wang Qiang
Li Yanyang
Current Assignee
Shenyang Yayi Network Technology Co ltd
Original Assignee
Shenyang Yayi Network Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenyang Yayi Network Technology Co ltd
Priority to CN201910095193.6A
Publication of CN109933808A
Application granted
Publication of CN109933808B

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to a neural machine translation method based on dynamic configuration decoding. A decision model based on a convolutional neural network is added to a Transformer model; the encoding information produced by the encoder is fed into the decision model as input, and the decision model carries out convolution, pooling and normalization processing on the encoding information and outputs a corresponding decoding configuration. A trained decoder decodes according to this decoding configuration, and the selected decoding configuration is scored; based on the scoring result, the decision model is improved by a reinforcement learning method to obtain a trained decision model. Translation is then performed with the trained improved self-attention mechanism model, producing translations with higher accuracy. The invention uses a small decision model with low training cost, which is trained end to end on top of the already trained machine translation model, without retraining the whole machine translation model.

Description

Neural machine translation method based on dynamic configuration decoding
Technical Field
The invention belongs to the technical field of machine translation, and relates to a neural machine translation method based on dynamic configuration decoding.
Background
Neural machine translation techniques currently employ neural networks based on an encoder-decoder framework for modeling. First, the encoder of the network maps the input source sentence to fixed-dimensional vectors, and then the decoder of the network uses these vectors to generate the corresponding translation word by word. This approach has achieved the best translation performance for translation between many different languages.
When the decoder of a neural network generates translation results, there are typically many parameters that control its behavior. For example, the decoder may generate several possible translation results together with corresponding scores. Generally we pick the translation result with the highest score, but in many cases the network is not good enough, and we need to adjust these scores using a length ratio parameter to prevent translation results that are too short or too long from being picked. An example of score adjustment using the length ratio is as follows:
correct answer: she had many beautiful clothes
Translation result 1: she had many beautiful clothes
Result 1 scores: -0.1, -0.2, -0.15, -0.13, -0.1
Translation result 2: there are many clothes
Result 2 scores: -0.12, -0.15, -0.1
For translation result 1, the total score is (-0.1 + -0.2 + -0.15 + -0.13 + -0.1)/5 = -0.68/5 = -0.136, where 5 is the length of translation result 1, and the total score of translation result 2 is (-0.12 + -0.15 + -0.1)/3 = -0.37/3 ≈ -0.123. Because translation result 2 scores higher than translation result 1, the decoder will pick translation result 2 as the final output. Obviously, translation result 1 is closer to the correct answer, while translation result 2 is too short by comparison. The length ratio parameter takes the length of the translation result into account on top of the total score. With a length ratio equal to 1.5, the score of translation result 1 becomes -0.68/5^1.5 ≈ -0.06, where the base 5 is the length of translation result 1, i.e. the number of words. The score of translation result 2 correspondingly becomes -0.37/3^1.5 ≈ -0.07. Picking on the basis of these scores, the decoder will select translation result 1 as the final output.
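As a minimal sketch of this length-ratio adjustment (assuming the score is the sum of per-word log-probabilities divided by the length raised to the length ratio, as in the worked example above; the function name is illustrative):

```python
def length_normalized_score(word_scores, length_ratio=1.5):
    """Total score divided by length**length_ratio (length_ratio = 1 is a plain average)."""
    return sum(word_scores) / (len(word_scores) ** length_ratio)

result_1 = [-0.1, -0.2, -0.15, -0.13, -0.1]  # "she had many beautiful clothes"
result_2 = [-0.12, -0.15, -0.1]              # "there are many clothes"

# Plain average: result 2 wins (-0.123 > -0.136), although it is too short.
print(length_normalized_score(result_1, 1.0), length_normalized_score(result_2, 1.0))
# With a length ratio of 1.5: result 1 wins (about -0.06 > about -0.07).
print(length_normalized_score(result_1, 1.5), length_normalized_score(result_2, 1.5))
```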
In addition to the length ratio, the decoder has many other parameters that control different aspects of its behavior, such as the beam size, which controls the range of the decoder's search, and the decoding length, which limits the number of words in the final translation result. In practice, the decoder usually uses a globally uniform parameter configuration to generate translation results, i.e. the parameter configuration used by the decoder does not change regardless of the source sentence. In fact, the optimal parameter configurations of different source sentences differ; for example, some sentences tend to require short translations, while others tend to require long translations. An example of using different length ratio settings for different source sentences is as follows:
source language 1: concerning about
Target language 1: take care of
Source language 2: is easier to be
Target language 2: easier
For source language 1, it has only one word and its correct translation has three words, so the decoder should be inclined to generate long translations, i.e. a larger length ratio, when generating translations. Whereas for source language 2, there are two words and the correct translation is only one, so the decoder should tend to generate short translations, i.e. smaller length ratios.
Therefore, a decision method is needed to select the corresponding optimal parameter configuration according to different source sentences.
Disclosure of Invention
The invention aims to provide a neural machine translation method based on dynamic configuration decoding, so as to solve the problem in prior-art neural machine translation decoding that the network produces wrong translation results because different parameter configurations cannot be set for different input source sentences.
The invention provides a neural machine translation method based on dynamic configuration decoding, which comprises the following steps:
step 1: adding a decision model between an encoder and a decoder of a Transformer model of the self-attention mechanism to form an improved self-attention mechanism model, wherein the decision model is established on the basis of a convolutional neural network;
Step 2: inputting bilingual sentence-level parallel data, performing word segmentation processing on a source language and a target language respectively to obtain bilingual parallel sentence pairs after word segmentation, and training an encoder and a decoder of an improved self-attention mechanism model;
Step 3: coding a source language sentence of the bilingual parallel sentence pair after word segmentation by using a trained coder according to a time sequence to obtain the state of each time sequence on a hidden layer, namely the coding information of different layers under each time sequence;
Step 4: the obtained coding information is used as input and sent into a decision model, the decision model carries out convolution, pooling and normalization processing on the coding information, and corresponding decoding configuration is output;
Step 5: decoding by using a trained decoder according to the decoding configuration output by the decision model, and scoring the selected decoding configuration;
step 6: according to the score given by the evaluation standard, improving the decision model by adopting a reinforcement learning method to obtain a trained decision model;
Step 7: inputting a source sentence into the encoder of the improved self-attention mechanism model, sending the obtained encoding information into the decision model, and translating by the decoder according to the decoding configuration output by the decision model.
In the neural machine translation method based on dynamic configuration decoding, the bilingual sentence-level parallel data input in the step 2 is a set of bilingual inter-translated sentence pairs, and each sentence pair consists of a source language sentence and a target language sentence.
In the neural machine translation method based on dynamic configuration decoding, a maximum likelihood method is adopted in step 2 to train an encoder and a decoder of an improved self-attention mechanism model.
In the neural machine translation method based on dynamic configuration decoding of the present invention, the step 3 specifically is:
given a source sentence, the encoder uses N nonlinear transformation layers for encoding, finally obtaining the following encoding information:
H ∈ R^(N×T×C)
where N is the number of nonlinear transformation layers included in the encoder, T is the length of the input source sentence, and each element of H is a word vector of length C.
In the neural machine translation method based on dynamic configuration decoding of the present invention, the step 4 is specifically:
step 4.1: carrying out convolution operation on input coding information H;
step 4.2: performing pooling operation on the output of the convolution;
step 4.3: repeating the convolution and pooling operations multiple times to output a three-dimensional tensor
U ∈ R^(T_1×N_1×C_1)
where T_1 < T and N_1 < N; the max-over-time pooling method is applied to the T_1 dimension of the three-dimensional tensor U for dimension reduction, obtaining a two-dimensional matrix
U_1 ∈ R^(N_1×C_1);
step 4.4: reshaping U_1 into a one-dimensional vector
U_2 ∈ R^L
where L = N_1 × C_1; U_2 is then fed into the fully connected layer, which performs the following calculation:
Z = W_2 · f(W_1 · U_2 + b_1) + b_2
where W_1 is a real matrix of shape (D, L), b_1 is a real vector of length D, W_2 is a real matrix of shape (O, D), b_2 is a real vector of length O, Z is a real vector of length O, O is the number of all selectable configurations, and f is a nonlinear activation function;
step 4.5: substituting Z into the softmax function to obtain a real vector P of length O, where each element of P represents the probability of choosing the corresponding configuration, and selecting the configuration with the highest probability as the decoding configuration output.
In the neural machine translation method based on dynamic configuration decoding of the present invention, the step 5 specifically is:
step 5.1: decoding by adopting a beam search method;
step 5.2: and scoring the translation result by adopting a BLEU evaluation index.
In the neural machine translation method based on dynamic configuration decoding of the present invention, the step 6 specifically adopts a policy gradient method or a Q learning method to improve the decision model.
The neural machine translation method based on dynamic configuration decoding at least has the following beneficial effects:
1. The method introduces a new decision model into the machine translation model, which can automatically generate a suitable decoding configuration according to different source language inputs.
2. The invention uses a small decision model with low training cost, which is trained end to end on top of the already trained machine translation model, without retraining the whole machine translation model.
Drawings
FIG. 1 is a flow chart of a method for neural machine translation based on dynamic configuration decoding of the present invention;
FIG. 2 is a schematic diagram of the improved self-attention mechanism model of the present invention;
FIG. 3 is a block diagram of a decision model of the present invention.
Detailed Description
Fig. 1 shows a neural machine translation method based on dynamic configuration decoding, which includes the following steps:
step 1: adding a decision model between an encoder and a decoder of a Transformer model of the self-attention mechanism to form an improved self-attention mechanism model, wherein the decision model is built on the basis of a convolutional neural network.
Fig. 2 is a schematic structural diagram of an improved self-attention mechanism model of the present invention, in which a decision model is added on the basis of a Transformer model, a suitable decoding configuration is automatically generated by the decision model according to different source language inputs, and a decoder performs a decoding operation according to the decoding configuration, so that the accuracy of translation can be improved.
Step 2: inputting bilingual sentence-level parallel data, performing word segmentation processing on a source language and a target language respectively to obtain bilingual parallel sentence pairs after word segmentation, and training an encoder and a decoder of an improved self-attention mechanism model;
the input bilingual sentence-level parallel data is a set of bilingual inter-translation sentence pairs, and each sentence pair consists of a source language sentence and a target language sentence.
In specific implementation, a maximum likelihood method is adopted to train an encoder and a decoder of the improved self-attention mechanism model.
Step 3: coding a source language sentence of the bilingual parallel sentence pair after word segmentation by using a trained coder according to a time sequence, and acquiring the state of each time sequence on a hidden layer, namely the coding information of different layers under each time sequence;
Given a source sentence, the encoder encodes it using N nonlinear transformation layers, finally obtaining the encoding information
H ∈ R^(N×T×C)
where H is a matrix of shape (N, T) whose elements are word vectors of length C, N is the number of nonlinear transformation layers included in the encoder, and T is the length of the input source sentence.
Step 4: the obtained coding information is used as input and sent into the decision model, which carries out convolution, pooling and normalization processing on the coding information and outputs a corresponding decoding configuration. FIG. 3 is a block diagram of the decision model of the present invention; the decision model is built on a convolutional neural network and includes several convolutional layers and pooling layers. The decision model is modeled as a multi-class discriminator, with each class corresponding to a different decoding configuration. The following is an example of the decoding configurations a decision model may generate:
Beam size: 5, 10
Length ratio: 0.9, 1, 1.1
Possible decoding configurations/categories: (5, 0.9), (5, 1), (5, 1.1), (10, 0.9), (10, 1), (10, 1.1)
In this example, the decoding configuration has two parameters, a beam size and a length ratio, each with several possible values: a beam size of 5 or 10 and a length ratio of 0.9, 1 or 1.1. For the decision model, there are 2 × 3 = 6 selectable decoding configurations, i.e. 6 classes. When the decision model selects one of the classes, it is equivalent to picking one particular decoding configuration.
For the decision model, given the output H of the encoder as its input, the model performs classification and outputs the probabilities P of choosing the different classes, where O is the number of classes. Because the decoder can only accept one configuration for decoding, the decision model picks the configuration corresponding to the class with the highest probability as its final output.
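A small illustrative sketch of how such a configuration space can be enumerated into classes (the parameter values are those of the example above; the helper itself is an assumption, not part of the patent):

```python
from itertools import product

beam_sizes = [5, 10]
length_ratios = [0.9, 1.0, 1.1]

# Each class index corresponds to one concrete decoding configuration.
configs = list(product(beam_sizes, length_ratios))
# [(5, 0.9), (5, 1.0), (5, 1.1), (10, 0.9), (10, 1.0), (10, 1.1)]
O = len(configs)  # number of classes the decision model chooses among (6 here)

def class_to_config(class_index):
    """Map the class picked by the decision model back to (beam_size, length_ratio)."""
    beam_size, length_ratio = configs[class_index]
    return {"beam_size": beam_size, "length_ratio": length_ratio}
```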
The invention designs a dedicated network structure for the decision model. The input H of the decision model has shape (N, T, C), where T is the source language sentence length, N is the number of encoder layers, and C is the length of the continuous vector corresponding to each word at each layer. H can be viewed as an image, where T and N are the length and width of the image and C is the number of color channels. Based on this observation, the task of the decision model is modeled as an image classification problem, so general model structures from image classification tasks can be borrowed; the invention builds on LeNet-5. Step 4 generates the decoding configuration through the following sub-steps:
step 4.1: carrying out convolution operation on input coding information H;
In particular, the decision model includes J convolution kernels, each with its own weight matrix W and bias b, where W has shape (3, C) and b is a real vector of length C. The input to each convolution kernel is the model input H. The convolution operation outputs a matrix A of shape (T-3+1, N-3+1, J), as shown in the following equation:
A[t, n, j] = Σ_{i=1..3} Σ_{k=1..3} W_j[i, k] · H[t+i-1, n+k-1] + b_j
where W_j[i, k] · H[t+i-1, n+k-1] denotes a dot product over the C channels. After the convolution output A is obtained, the model applies a nonlinear transformation to each of its elements, typically the ReLU activation function, as shown in the following formula:
A′ = max(A, 0)
where A is the result of the convolution operation and A′ is a real matrix of the same shape as A.
Step 4.2: after the convolution operation, the decision model performs a pooling operation on the convolution output A′. Pooling takes the maximum of A′ over a window of shape (3, 3) to obtain the pooling output M, as shown in the following equation:
M[t, n, j] = max_{1≤i,k≤3} A′[t+i-1, n+k-1, j]
where M is a real matrix of shape (T-6+2, N-6+2, J). The model then repeats the convolution-pooling process several times.
Step 4.3: the convolution and pooling operations are repeated several times, outputting a three-dimensional tensor
U ∈ R^(T_1×N_1×C_1)
where T_1 < T and N_1 < N. Because T_1 depends on T and the length T differs between source language sentences, the decision model applies the max-over-time pooling method to the T_1 dimension of the three-dimensional tensor U to obtain a fixed-size two-dimensional matrix
U_1 ∈ R^(N_1×C_1)
as shown in the following equation:
U_1[n, c] = max_{1≤t≤T_1} U[t, n, c]
Step 4.4: U_1 is reshaped into a one-dimensional vector
U_2 ∈ R^L
where L = N_1 × C_1, i.e. U_2 is formed by concatenating all row vectors of U_1. U_2 is then fed into the fully connected layer, which performs the following calculation:
Z = W_2 · f(W_1 · U_2 + b_1) + b_2
where W_1 is a real matrix of shape (D, L), b_1 is a real vector of length D, W_2 is a real matrix of shape (O, D), b_2 is a real vector of length O, Z is a real vector of length O, O is the number of all selectable configurations, and f is a nonlinear activation function.
In a specific implementation, the same ReLU activation function as in the convolution operation is used.
Step 4.5: Z is substituted into the softmax function for normalization, obtaining a real vector P of length O. Each element of P represents the probability of choosing the corresponding configuration, and the configuration with the highest probability is selected as the decoding configuration output Config.
The probability distribution over the different decoding configurations is computed as:
P_i = e^(Z_i) / Σ_{j=1..O} e^(Z_j)
where e is the natural base, P_i is the probability value corresponding to the i-th element of P, and Z_i is the i-th element of the real vector Z.
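A minimal PyTorch-style sketch of the decision model described in steps 4.1 to 4.5 (a single convolution-pooling block, the kernel count, the hidden size and the use of LazyLinear are illustrative assumptions, not the patented dimensions):

```python
import torch
import torch.nn as nn

class DecisionModel(nn.Module):
    """CNN mapping encoder output H of shape (N, T, C) to a distribution over O configurations."""
    def __init__(self, num_configs, channels=512, kernels=64, hidden_dim=128):
        super().__init__()
        # Step 4.1 + 4.2: 3x3 convolution, ReLU (A' = max(A, 0)), then 3x3 max pooling.
        # The patent repeats this conv-pool block several times (step 4.3); one block here.
        self.features = nn.Sequential(
            nn.Conv2d(channels, kernels, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=1),
        )
        # Step 4.4: fully connected layers, Z = W2 . f(W1 . U2 + b1) + b2.
        # LazyLinear infers the flattened input size L = N1 * C1 from the first batch.
        self.fc = nn.Sequential(
            nn.LazyLinear(hidden_dim),           # W1, b1
            nn.ReLU(),                           # f
            nn.Linear(hidden_dim, num_configs),  # W2, b2 -> Z of length O
        )

    def forward(self, H):
        # H: (batch, N_layers, T, C); treat (T, N_layers) as the image plane, C as channels.
        x = H.permute(0, 3, 2, 1)                # (batch, C, T, N_layers)
        U = self.features(x)                     # (batch, J, T1, N1)
        U1 = U.max(dim=2).values                 # max-over-time pooling over the T1 dimension
        U2 = U1.flatten(start_dim=1)             # reshape to a one-dimensional vector (step 4.4)
        Z = self.fc(U2)
        return torch.softmax(Z, dim=-1)          # step 4.5: probabilities P over the O configs
```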
Step 5: decoding by using a trained decoder according to the decoding configuration output by the decision model, and scoring the selected decoding configuration with a common evaluation criterion; step 5 specifically comprises the following steps:
step 5.1: decoding by adopting a beam searching method;
the decoder uses the coding information H output by the coder, the decoding configuration Config output by the decision model and a special sentence start symbol<sos>(i.e., the 0 th word S of the target language 0 ) As input, a probability distribution vector of a first word of a target language is output
Figure BDA0001964331930000091
Where V is the vocabulary size of the target language, pr 1 Each element of (a) represents a probability of selecting a corresponding target language vocabulary. Decoder according to Pr 1 Scoring all vocabularies of the target language in a scoring mode Score specified by Config, and selecting the first B vocabularies with the highest scores as the first words S of the target language 1 Candidate set Q of 1 ={Q 1,1 ,...,Q 1,B Where B is the number of candidates specified by the decoding configuration Config, i.e. the bundle size. Q 1 Each element in (a) is respectively equal to S 0 Combine to generate B sentences Y 1 ={S 0 Q 1,1 ,...,S 0 Q 1,B In which Y is 1 Each element in the first word is used as a second word S of the calculation target language 2 Corresponding probability distributionVector Pr 2 B different Pr of length V are obtained 2 The vector is reconstructed into a matrix with the shape of (B, V)
Figure BDA0001964331930000092
The decoder is based on Score and
Figure BDA0001964331930000093
scoring and selecting the first B target words with the highest scores as the second words S 2 Candidate set Q of 2 ={Q 2,1 ,...,Q 2,B In which Q 2 Each element of the sentence being associated with the input sentence Y used to calculate it 1 Combining to obtain the probability distribution vector Pr of the third word of the calculated target language 3 Required input sentence set Y 2 ={S 0 Q 1,1 Q 2,1 ,...,S 0 Q 1, B Q 2,B }. By analogy, the decoder generates translation results continuously word by word until the sentence end symbol<eos>The length of the sentence in the selected or input sentence set Y reaches the limit specified by Config, at which time the decoder returns the sentence X with the highest score in Y as the final translation and finishes decoding.
Step 5.2: and scoring the translation result by adopting a BLEU evaluation index.
Given the decoding result X and a reference translation Ref, a common evaluation criterion can be used to score the quality of the translation result X. The most common evaluation criterion is the BLEU value. It calculates the precision of the different n-grams shared by the translation result and the reference answer, where an n-gram is a short sequence of n words. The following is an example of calculating n-gram precision:
Source text: weather today
Translation result: is is is is a
Reference answer: today is a nice day
Now the 1-gram precision is calculated. The 1-grams that occur in both the translation result and the reference answer are "is" and "a"; they occur 4 times and 1 time in the translation result, and the smaller of their counts in the translation result and the reference answer (the clipped counts) are 1 and 1 respectively. The final 1-gram precision is therefore (1+1)/(4+1) = 2/5. BLEU is the average of the 1-gram, 2-gram, 3-gram and 4-gram precisions.
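A small sketch of the clipped 1-gram precision worked out above (a full BLEU computation also combines the 2- to 4-gram precisions and a brevity penalty; this snippet only reproduces the example):

```python
from collections import Counter

def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision: matched counts are capped by the reference counts."""
    cand_ngrams = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref_ngrams = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return matched / total if total else 0.0

candidate = "is is is is a".split()
reference = "today is a nice day".split()
print(ngram_precision(candidate, reference, n=1))  # 0.4, i.e. 2/5 as in the example
```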
Step 6: according to the score given by the evaluation standard, improving the decision model by adopting a reinforcement learning method to obtain a trained decision model;
In a specific implementation, a policy gradient method or a Q-learning method is adopted to improve the decision model.
After obtaining the decision model's output P (the probabilities of the different configurations), Config (the finally selected configuration), and the score R given by the evaluation criterion, the decision model uses this information for learning. Here, the objective function of the decision model is max E_P[R], meaning that the expected score R of the decisions made by the decision model according to its own output P is to be maximized.
A general neural network can be trained directly in an end-to-end manner, but end-to-end training requires every operation in the computation to be differentiable, and some operations in the decision model are not, such as obtaining Config from P. The usual solution is the score-function method of the policy gradient approach, which transforms the objective function of the decision model into max R × log P_Config, where P_Config is the probability that the decision model picks Config. The meaning of the transformed objective is that if R is high, the decision model adjusts its parameters so that Config is selected with higher probability the next time the same input is encountered; if R is low, the decision model lowers the corresponding probability of Config at the next prediction so as to avoid selecting it.
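A minimal sketch of this score-function (policy gradient) update, continuing the decision-model sketch above (the single-sentence batch, the optimizer, and the use of a BLEU score as the reward R are assumptions):

```python
import torch

def reinforce_step(decision_model, optimizer, H, reward_fn):
    """One policy-gradient update for the decision model: maximize R * log P_Config."""
    P = decision_model(H)                            # (1, O) probabilities over configurations
    dist = torch.distributions.Categorical(P)
    config_idx = dist.sample()                       # pick Config (the non-differentiable step)
    R = reward_fn(config_idx.item())                 # e.g. BLEU of the sentence decoded with Config
    loss = -(R * dist.log_prob(config_idx)).mean()   # gradient of -R * log P_Config
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return R
```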
Step 7: inputting a source sentence into the encoder of the improved self-attention mechanism model, sending the obtained encoding information into the decision model, and translating by the decoder according to the decoding configuration output by the decision model.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the scope of the present invention, which is defined by the appended claims.

Claims (7)

1. A neural machine translation method based on dynamic configuration decoding is characterized by comprising the following steps:
step 1: adding a decision model between an encoder and a decoder of a Transformer model of the self-attention mechanism to form an improved self-attention mechanism model, wherein the decision model is established based on a convolutional neural network;
Step 2: inputting bilingual sentence-level parallel data, performing word segmentation processing on a source language and a target language respectively to obtain bilingual parallel sentence pairs after word segmentation, and training an encoder and a decoder of an improved self-attention mechanism model;
Step 3: coding a source language sentence of the bilingual parallel sentence pair after word segmentation by using a trained coder according to a time sequence to obtain the state of each time sequence on a hidden layer, namely the coding information of different layers under each time sequence;
Step 4: the obtained coding information is used as input and sent into a decision model, the decision model carries out convolution, pooling and normalization processing on the coding information, and corresponding decoding configuration is output;
Step 5: decoding by using a trained decoder according to the decoding configuration output by the decision model, and scoring the selected decoding configuration;
step 6: according to the score given by the evaluation standard, improving the decision model by adopting a reinforcement learning method to obtain a trained decision model;
Step 7: inputting a source sentence into the encoder of the improved self-attention mechanism model, sending the obtained encoding information into the decision model, and translating by the decoder according to the decoding configuration output by the decision model.
2. The neural-machine translation method based on dynamic configuration decoding of claim 1, wherein the bilingual sentence-level parallel data input in step 2 is a set of bilingual inter-translated sentence pairs, each sentence pair consisting of a source language sentence and a target language sentence.
3. The neural-machine translation method based on dynamic configuration decoding of claim 1, wherein the maximum likelihood method is used in step 2 to train the encoder and decoder of the improved self-attention mechanism model.
4. The neural-machine translation method based on dynamic configuration decoding as claimed in claim 1, wherein said step 3 is specifically:
given a source sentence, the encoder uses N nonlinear transformation layers for encoding, finally obtaining the following encoding information:
H ∈ R^(N×T×C)
where N is the number of nonlinear transformation layers included in the encoder, T is the length of the input source sentence, and each element of H is a word vector of length C.
5. The neural-machine translation method based on dynamic configuration decoding as claimed in claim 1, wherein said step 4 is specifically:
step 4.1: carrying out convolution operation on input coding information H;
step 4.2: performing pooling operation on the output of the convolution;
step 4.3: repeating the convolution and pooling operations multiple times to output a three-dimensional tensor
U ∈ R^(T_1×N_1×C_1)
where T_1 < T and N_1 < N; the max-over-time pooling method is applied to the T_1 dimension of the three-dimensional tensor U for dimension reduction, obtaining a two-dimensional matrix
U_1 ∈ R^(N_1×C_1);
step 4.4: reshaping U_1 into a one-dimensional vector
U_2 ∈ R^L
where L = N_1 × C_1; U_2 is then fed into the fully connected layer, which performs the following calculation:
Z = W_2 · f(W_1 · U_2 + b_1) + b_2
where W_1 is a real matrix of shape (D, L), b_1 is a real vector of length D, W_2 is a real matrix of shape (O, D), b_2 is a real vector of length O, Z is a real vector of length O, O is the number of all selectable configurations, and f is a nonlinear activation function;
step 4.5: substituting Z into the softmax function to obtain a real vector P of length O, where each element of P represents the probability of choosing the corresponding configuration, and selecting the configuration with the highest probability as the decoding configuration output.
6. The neural-machine translation method based on dynamic configuration decoding as claimed in claim 1, wherein said step 5 is specifically:
step 5.1: decoding by adopting a beam search method;
step 5.2: and scoring the translation result by adopting a BLEU evaluation index.
7. The neural-machine translation method based on dynamic configuration decoding of claim 1, wherein said step 6 employs a policy gradient method or a Q-learning method to improve the decision model.
CN201910095193.6A 2019-01-31 2019-01-31 Neural machine translation method based on dynamic configuration decoding Active CN109933808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910095193.6A CN109933808B (en) 2019-01-31 2019-01-31 Neural machine translation method based on dynamic configuration decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910095193.6A CN109933808B (en) 2019-01-31 2019-01-31 Neural machine translation method based on dynamic configuration decoding

Publications (2)

Publication Number Publication Date
CN109933808A CN109933808A (en) 2019-06-25
CN109933808B true CN109933808B (en) 2022-11-22

Family

ID=66985322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910095193.6A Active CN109933808B (en) 2019-01-31 2019-01-31 Neural machine translation method based on dynamic configuration decoding

Country Status (1)

Country Link
CN (1) CN109933808B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762408A (en) * 2019-07-09 2021-12-07 北京金山数字娱乐科技有限公司 Translation model and data processing method
CN110489766B (en) * 2019-07-25 2020-07-10 昆明理工大学 Chinese-lower resource neural machine translation method based on coding induction-decoding deduction
CN110457710B (en) * 2019-08-19 2022-08-02 电子科技大学 Method and method for establishing machine reading understanding network model based on dynamic routing mechanism, storage medium and terminal
CN110765785B (en) * 2019-09-19 2024-03-22 平安科技(深圳)有限公司 Chinese-English translation method based on neural network and related equipment thereof
CN110765966B (en) * 2019-10-30 2022-03-25 哈尔滨工业大学 One-stage automatic recognition and translation method for handwritten characters
CN111191785B (en) * 2019-12-20 2023-06-23 沈阳雅译网络技术有限公司 Structure searching method based on expansion search space for named entity recognition
CN111178091B (en) * 2019-12-20 2023-05-09 沈阳雅译网络技术有限公司 Multi-dimensional Chinese-English bilingual data cleaning method
CN111382582B (en) * 2020-01-21 2023-04-07 沈阳雅译网络技术有限公司 Neural machine translation decoding acceleration method based on non-autoregressive
CN111581988B (en) * 2020-05-09 2022-04-29 浙江大学 Training method and training system of non-autoregressive machine translation model based on task level course learning
CN117392168B (en) * 2023-08-21 2024-06-04 浙江大学 Method for performing nerve decoding by utilizing single photon calcium imaging video data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038159A (en) * 2017-03-09 2017-08-11 Tsinghua University Neural network machine translation method based on unsupervised domain adaptation
CN108845994A (en) * 2018-06-07 2018-11-20 Nanjing University Neural machine translation system utilizing external information and training method of the translation system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038159A (en) * 2017-03-09 2017-08-11 Tsinghua University Neural network machine translation method based on unsupervised domain adaptation
CN108845994A (en) * 2018-06-07 2018-11-20 Nanjing University Neural machine translation system utilizing external information and training method of the translation system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Large-scale Uyghur-Chinese neural network machine translation model based on multiple encoders and decoders; Zhang Jinchao et al.; Journal of Chinese Information Processing (《中文信息学报》); 2018-09-30; full text *

Also Published As

Publication number Publication date
CN109933808A (en) 2019-06-25

Similar Documents

Publication Publication Date Title
CN109933808B (en) Neural machine translation method based on dynamic configuration decoding
CN111324744B (en) Data enhancement method based on target emotion analysis data set
CN109492202B (en) Chinese error correction method based on pinyin coding and decoding model
CN109887484B (en) Dual learning-based voice recognition and voice synthesis method and device
CN109359294B (en) Ancient Chinese translation method based on neural machine translation
CN110069790B (en) Machine translation system and method for contrasting original text through translated text retranslation
CN108829684A (en) Mongolian-Chinese neural machine translation method based on transfer learning strategy
CN110929030A (en) Text abstract and emotion classification combined training method
CN110750959A (en) Text information processing method, model training method and related device
CN111767731A (en) Training method and device of grammar error correction model and grammar error correction method and device
CN113158665A (en) Method for generating text abstract and generating bidirectional corpus-based improved dialog text
CN111078866B (en) Chinese text abstract generation method based on sequence-to-sequence model
CN111753557A (en) Chinese-Vietnamese unsupervised neural machine translation method fusing an EMD-minimized bilingual dictionary
CN111783423A (en) Training method and device of problem solving model and problem solving method and device
CN115293138B (en) Text error correction method and computer equipment
CN110457459B (en) Dialog generation method, device, equipment and storage medium based on artificial intelligence
CN115293139B (en) Training method of speech transcription text error correction model and computer equipment
CN110298046B (en) Translation model training method, text translation method and related device
CN116663578A (en) Neural machine translation method based on strategy gradient method improvement
CN116168401A (en) Training method of text image translation model based on multi-mode codebook
CN116861929A (en) Machine translation system based on deep learning
CN111428518B (en) Low-frequency word translation method and device
CN112528168B (en) Social network text emotion analysis method based on deformable self-attention mechanism
CN114281954A (en) Multi-round dialog reply generation system and method based on relational graph attention network
US11586833B2 (en) System and method for bi-directional translation using sum-product networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Wang Qiang

Inventor after: Li Yanyang

Inventor before: Wang Qiang

Inventor before: Li Yanyang

Inventor before: Xiao Tong

Inventor before: Zhu Jingbo

CB03 Change of inventor or designer information
GR01 Patent grant
GR01 Patent grant