CN108932232A - Mongolian-Chinese mutual translation method based on an LSTM neural network - Google Patents
Mongolian-Chinese mutual translation method based on an LSTM neural network
- Publication number
- CN108932232A (application CN201810428619.0A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- input
- hidden layer
- word
- lstm neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
Abstract
To change the relatively backward state of machine translation development in Inner Mongolia, the present invention studies a Mongolian-Chinese mutual translation method based on an LSTM neural network. Compared with existing statistics-based machine translation methods, the invention first adopts LSTM-based neural machine translation, which fully considers the connections between the entire sentence and its context and improves the efficiency of machine translation; second, it uses an encoder and a decoder that are essentially bidirectional LSTM neural networks, improving the quality and efficiency of translation; finally, it trains on the data using mini-batch stochastic gradient descent as the optimization algorithm, further improving translation quality.
Description
Technical field
The invention belongs to the field of machine translation technology, and in particular relates to a Mongolian-Chinese mutual translation method based on an LSTM neural network.
Background technique
The Mongolian people are one of the 56 ethnic groups that make up the Chinese nation, a typical representative of grassland nomads and an important inheritor of grassland culture. Mongolian is the dominant language of Mongolian compatriots and the official language of Mongolia; it is widely used and holds an important position in the world. With the rapid development of China's economy, economic and cultural exchanges between the Mongolian and Han peoples have become increasingly extensive. Such exchanges require translation, and manual translation is costly, which brings considerable inconvenience to the economic development of the Mongolian people.
Fortunately, with the arrival of the big-data era and the rapid development of artificial intelligence, machine translation has grown increasingly mature, and translation between Mongolian and Chinese by computer has become feasible. For example, given a Chinese article, a machine can automatically and quickly generate the corresponding Mongolian article. In recent years, research on Mongolian-Chinese machine translation has played a large role in the mutual penetration of the two peoples' cultures, in establishing and promoting good ethnic relations, and in promoting foreign trade and cultural exchanges with Mongolia.
In the more than sixty years since the late 1940s, researchers have followed the steps of machine translation's development and have never stopped studying its application to Mongolian-Chinese translation.
The sentence constituents of Chinese differ from those of Mongolian, and the two languages also differ greatly in word order, which makes Mongolian-Chinese machine translation very difficult.
Earlier translation systems used phrase-based machine translation, i.e. PBMT (Phrase-Based Machine Translation). PBMT divides an input sentence into a group of words or phrases and translates them individually. This is clearly not an optimal translation strategy, as it completely ignores the connections between the entire sentence and its context.
Summary of the invention
To overcome the above shortcomings of the prior art, the purpose of the present invention is to provide a Mongolian-Chinese mutual translation method based on an LSTM neural network: moving from earlier statistics-based Mongolian-Chinese machine translation, which completely ignores the connections between the entire sentence and its context, to LSTM-based neural machine translation, which fully considers those connections. This improves the quality and efficiency of machine translation, improves on traditional NMT (Neural Machine Translation), and changes the relatively backward state of machine translation development in Inner Mongolia.
To achieve the above goals, the present invention adopts the following technical solution:
A Mongolian-Chinese mutual translation method based on an LSTM neural network uses an encoder-decoder structure: the encoder reads a source-language sentence and encodes it into a vector of fixed dimension, and the decoder reads that vector and generates the target language sequentially. Both the encoder and the decoder use LSTM neural networks. In the encoder, the input source-language sentence is encoded by a bidirectional LSTM neural network, with each sentence represented as a context semantic vector, forming a group of context semantic vectors; these context semantic vectors serve as the encoding of the user's intent. In the decoder, the LSTM neural network runs cyclically to generate each word of the target language; while generating each word, it considers the context semantic vector corresponding to the input source-language sentence, so that the generated content is consistent with the meaning of the source language.
The encoder uses a bidirectional LSTM model. The output of a unidirectional LSTM neural network at a given time step depends only on the current and earlier inputs, whereas the output of a bidirectional LSTM neural network at a given time step depends not only on the current and earlier inputs but also on later ones, fully taking into account the connections between the entire sentence and its context.
The forward LSTM neural network reads the input x = (x_1, ..., x_I) in the order of the source-language sentence, where x_1 denotes the 1st input word and x_I the I-th, and computes the forward hidden states (h→_1, ..., h→_I), where h→_1 denotes the 1st forward hidden-state semantic vector element and h→_I the I-th. The backward LSTM neural network reads the sequence in the order opposite to the input, (x_I, ..., x_1), and computes the backward hidden states (h←_1, ..., h←_I), where h←_1 denotes the 1st backward hidden-state semantic vector element and h←_I the I-th. The forward and backward hidden states are concatenated to obtain the explanation vector of each word:

h_j = [h→_j^T; h←_j^T]^T, for j from 1 to I,

where h→_j^T denotes the transpose of the forward hidden-state semantic vector and h←_j^T the transpose of the backward one.
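For illustration only, here is a minimal numpy sketch of the bidirectional encoding described above. The single-matrix gate parameterization, the weight shapes, and the names lstm_step and bidirectional_encode are assumptions made for the sketch, not details taken from the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    # W maps the concatenated [x; h_prev] to the four gate pre-activations.
    z = W @ np.concatenate([x, h_prev]) + b
    f, i, o, g = np.split(z, 4)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # forget old, add new
    h = sigmoid(o) * np.tanh(c)                        # output gate scales tanh(c)
    return h, c

def bidirectional_encode(xs, W_fwd, b_fwd, W_bwd, b_bwd, hidden):
    # Forward pass reads x_1 ... x_I; backward pass reads x_I ... x_1.
    n = len(xs)
    fwd, bwd = [None] * n, [None] * n
    h, c = np.zeros(hidden), np.zeros(hidden)
    for j in range(n):
        h, c = lstm_step(xs[j], h, c, W_fwd, b_fwd)
        fwd[j] = h
    h, c = np.zeros(hidden), np.zeros(hidden)
    for j in reversed(range(n)):
        h, c = lstm_step(xs[j], h, c, W_bwd, b_bwd)
        bwd[j] = h
    # Explanation vector of each word: both directions concatenated.
    return [np.concatenate([fwd[j], bwd[j]]) for j in range(n)]

# Toy usage: 5 words, 8-dim embeddings, 16-dim hidden states, random weights.
rng = np.random.default_rng(0)
emb, hid = 8, 16
xs = [rng.normal(size=emb) for _ in range(5)]
W1, b1 = rng.normal(scale=0.1, size=(4 * hid, emb + hid)), np.zeros(4 * hid)
W2, b2 = rng.normal(scale=0.1, size=(4 * hid, emb + hid)), np.zeros(4 * hid)
hs = bidirectional_encode(xs, W1, b1, W2, b2, hid)
print(len(hs), hs[0].shape)  # 5 explanation vectors of dimension 32
```

Each word ends up with an explanation vector of twice the hidden size, matching the concatenation h_j = [h→_j^T; h←_j^T]^T above.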
Decoder:
Given the source-language sentence x and the already generated target words {y_1, ..., y_{t-1}}, the decoder sequentially generates the next target-language word y_t. The conditional probability defined at the decoding layer is:

p(y_t | y_1, ..., y_{t-1}, x) = g(y_{t-1}, s_t, c_t)

where g is the sigmoid activation function, s_t is the hidden state of the decoder's LSTM neural network at time t, and c_t denotes the external input information in the generation process.
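The following sketch shows one decoding step consistent with this formula. Feeding the concatenated [y_{t-1}; s_{t-1}; c_t] through a single LSTM layer, and the names decoder_step, W_out, and b_out, are assumptions; the softmax over the output mirrors the training section below:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def decoder_step(y_prev, s_prev, c_prev, ctx, W, b, W_out, b_out):
    # One LSTM step over [y_{t-1}; s_{t-1}; c_t]: the previous target word,
    # the previous hidden state, and the context semantic vector.
    z = W @ np.concatenate([y_prev, s_prev, ctx]) + b
    f, i, o, g = np.split(z, 4)
    cell = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    s_t = sigmoid(o) * np.tanh(cell)              # hidden state s_t at time t
    p = softmax(W_out @ s_t + b_out)              # p(y_t | y_1..y_{t-1}, x)
    return s_t, cell, p
```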
Guarantee of translation accuracy
In the generation process above, each Chinese sentence produces one context semantic vector, which is used to translate one source sentence; the target sentence after translation therefore does not depart from the original meaning of the text. At the same time, the generation of each word depends on the previously generated word, which guarantees the continuity of the entire sentence and thus the accuracy of the translation.
Training
During training, the output vector of the decoder is fed into a softmax regression layer, and the cross entropy between the actual probability distribution produced by the softmax regression layer and the desired output is chosen as the loss function:

C = -(1/n) Σ_x [ y ln a + (1 - y) ln(1 - a) ]

where y is the desired output, a is the actual output, and n is the batch size.
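A minimal sketch of this loss function, assuming elementwise desired outputs y and actual outputs a in [0, 1]; the clipping constant eps is an implementation detail added here to avoid log(0), not something specified by the patent:

```python
import numpy as np

def cross_entropy(y, a, eps=1e-12):
    # C = -(1/n) * sum over the batch of [ y*ln(a) + (1-y)*ln(1-a) ]
    y = np.asarray(y, dtype=float)
    a = np.clip(np.asarray(a, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    n = y.shape[0]
    return -np.sum(y * np.log(a) + (1.0 - y) * np.log(1.0 - a)) / n

print(cross_entropy([1.0, 0.0, 1.0], [0.9, 0.2, 0.8]))  # small loss for good predictions
```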
To speed up training and avoid falling into local optima, the present invention adopts the mini-batch stochastic gradient descent algorithm as the optimization algorithm:

w := w - η · ∂C/∂w
b := b - η · ∂C/∂b

where := denotes a synchronized update and η is the learning rate; w and b are updated continuously, the process ending when their values stabilize, which completes gradient descent.
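As a runnable illustration of these update rules, the sketch below fits a toy one-parameter linear model with mini-batch SGD. The squared-error gradients stand in for the network's cross-entropy gradients, and the learning rate, batch size, and stopping tolerance are assumptions:

```python
import numpy as np

def minibatch_sgd(w, b, xs, ys, lr=0.1, batch=4, tol=1e-6, max_epochs=1000):
    # Applies w := w - lr * dC/dw and b := b - lr * dC/db per mini-batch,
    # stopping once the values of w and b stabilize.
    rng = np.random.default_rng(0)
    n = len(xs)
    for _ in range(max_epochs):
        w_old, b_old = w, b
        order = rng.permutation(n)                 # shuffle for stochasticity
        for k in range(0, n, batch):
            idx = order[k:k + batch]
            err = w * xs[idx] + b - ys[idx]        # residuals on this mini-batch
            dw = 2.0 * np.mean(err * xs[idx])      # dC/dw for squared error
            db = 2.0 * np.mean(err)                # dC/db
            w, b = w - lr * dw, b - lr * db        # synchronized update
        if abs(w - w_old) < tol and abs(b - b_old) < tol:
            break                                  # w and b have stabilized
    return w, b

# Toy usage: recover w = 2, b = -1 from noisy samples.
rng = np.random.default_rng(1)
xs = rng.uniform(-1.0, 1.0, 64)
ys = 2.0 * xs - 1.0 + rng.normal(scale=0.01, size=64)
print(minibatch_sgd(0.0, 0.0, xs, ys))
```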
Compared with existing statistics-based machine translation methods, the present invention fully considers the connections between the entire sentence and its context, improving the efficiency of machine translation; second, the encoder and decoder, which are essentially LSTMs, ensure higher translation quality; finally, residual learning improves the accuracy of translation and reduces gradient vanishing.
Detailed description of the invention
Fig. 1 is a schematic diagram of the computation of the hidden layer state.
Fig. 2 is a schematic diagram of Chinese-to-Mongolian translation according to the invention.
Fig. 3 is a schematic diagram of Mongolian-to-Chinese translation according to the invention.
Specific embodiment
The present invention is described in detail below with reference to the drawings and embodiments.
The Mongolian-Chinese mutual translation method based on an LSTM neural network uses an encoder-decoder structure; both the encoder and the decoder are implemented with LSTMs.
The encoder uses a bidirectional LSTM model. The forward LSTM neural network reads the input x = (x_1, ..., x_I) in the order of the source-language sentence, where x_1 denotes the 1st input word and x_I the I-th, and computes the forward hidden states (h→_1, ..., h→_I). The backward LSTM neural network reads the sequence in the order opposite to the input, (x_I, ..., x_1), and computes the backward hidden states (h←_1, ..., h←_I). The forward and backward hidden states are concatenated to obtain the explanation vector of each word, h_j = [h→_j^T; h←_j^T]^T for j from 1 to I, where h→_j^T denotes the transpose of the forward hidden-state semantic vector and h←_j^T that of the backward one.
Fig. 1 shows the computation principle of the hidden layer state. The structure is called a memory block; it mainly contains three gates (a forget gate, an input gate, and an output gate) and a memory cell. The horizontal line across the top of the box is called the cell state; like a conveyor belt, it controls how information is passed on to the next time step.
The hidden layer state is computed as follows:

Step 1: decide what information may pass through the cell state.

This decision is controlled by the "forget gate" layer through a sigmoid. Based on the output h_{j-1} of the previous time step and the current input x_j, it produces a value f_j between 0 and 1 that decides whether the information learned at the previous time step passes through fully or partially: 0 means nothing passes, a value between 0 and 1 means partial passage, and 1 means everything passes. The formula is:

f_j = sigmoid(W_f · x_j + W_f · h_{j-1} + b_f)

where W_f and b_f denote the trained parameters of the neurons in the forget gate.

Step 2: generate the new information to be written.

This step has two parts. The first part is the "input gate" layer, which uses a sigmoid to determine which values to update: a value i_j of 1 indicates that no update is needed, while 0 or a value between 0 and 1 calls for an update. The second part is a tanh layer that generates the new candidate value C̃_j, which may be added to the cell state as the candidate produced by the current layer. The values produced by these two parts are combined to perform the update:

i_j = sigmoid(W_i · x_j + W_i · h_{j-1} + b_i)
C̃_j = tanh(W_C · x_j + W_C · h_{j-1} + b_C)

where W_i and b_i denote the trained parameters of the neurons in the input gate, and W_C and b_C those of the memory cell.

Combining step 1 and step 2, that is, dropping unneeded information and adding new information, updates the cell state:

C_j = f_j * C_{j-1} + i_j * C̃_j

Step 3: decide the output of the model.

First an initial output o_j is obtained through a sigmoid layer; then tanh scales the value of C_j to between -1 and 1, and the result is multiplied by the sigmoid output to obtain the forward hidden state h→_j:

o_j = sigmoid(W_o · x_j + W_o · h_{j-1} + b_o)
h→_j = o_j * tanh(C_j)

where W_o and b_o denote the trained parameters of the neurons in the output gate.

The backward hidden state h←_j is computed in the same way.
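A compact Python sketch of this three-step memory-block update follows. Applying one weight matrix to the concatenation [x_j; h_{j-1}] is an equivalent reparameterization of the two products written above; the parameter dictionary and all dimensions are assumptions for the sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def memory_block_step(x_j, h_prev, c_prev, p):
    v = np.concatenate([x_j, h_prev])            # the block sees x_j and h_{j-1}
    f_j = sigmoid(p["W_f"] @ v + p["b_f"])       # step 1: forget gate
    i_j = sigmoid(p["W_i"] @ v + p["b_i"])       # step 2a: input gate
    c_cand = np.tanh(p["W_C"] @ v + p["b_C"])    # step 2b: candidate value C~_j
    c_j = f_j * c_prev + i_j * c_cand            # drop old info, add new info
    o_j = sigmoid(p["W_o"] @ v + p["b_o"])       # step 3: output gate
    h_j = o_j * np.tanh(c_j)                     # tanh scales c_j into (-1, 1)
    return h_j, c_j

# Toy usage: 8-dim input, 16-dim hidden state, random parameters.
rng = np.random.default_rng(0)
emb, hid = 8, 16
p = {k: rng.normal(scale=0.1, size=(hid, emb + hid)) for k in ("W_f", "W_i", "W_C", "W_o")}
p.update({k: np.zeros(hid) for k in ("b_f", "b_i", "b_C", "b_o")})
h, c = memory_block_step(rng.normal(size=emb), np.zeros(hid), np.zeros(hid), p)
print(h.shape, c.shape)  # (16,) (16,)
```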
Given the source-language sentence x and the already generated target words {y_1, ..., y_{t-1}}, the decoder sequentially generates the next target-language word y_t. The conditional probability defined at the decoding layer is:

p(y_t | y_1, ..., y_{t-1}, x) = g(y_{t-1}, s_t, c_t)

where g is the sigmoid activation function, s_t is the hidden state of the decoder's LSTM neural network at time t, and c_t denotes the external input information in the generation process.
Guarantee of translation accuracy
In the generation process above, each Chinese sentence produces one context semantic vector, which is used to translate one source sentence; the target sentence after translation therefore does not depart from the original meaning of the text. At the same time, the generation of each word depends on the previously generated word, which guarantees the continuity of the entire sentence and thus the accuracy of the translation.
Training
During training, the vector obtained by the decoder is fed into a softmax regression layer, yielding a probability distribution over the possible outcomes. The cross entropy between the actual probability distribution obtained by softmax and the desired output is chosen as the loss function:

C = -(1/n) Σ_x [ y ln a + (1 - y) ln(1 - a) ]

where y is the desired output, a is the actual output, and n is the batch size.
To speed up training and avoid falling into local optima, the parameters w and b inside the neurons are adjusted; this process is called gradient descent. The present invention adopts mini-batch stochastic gradient descent as the optimization algorithm:

w := w - η · ∂C/∂w
b := b - η · ∂C/∂b

where := denotes a synchronized update and η is the learning rate; w and b are updated continuously, the process ending when their values stabilize, which completes gradient descent.
Two specific embodiments of Mongolian-Chinese mutual translation follow.
Embodiment 1, Chinese to Mongolian:
Referring to Fig. 2: in the lower half of Fig. 2, (x_1, ..., x_t) denotes the t words of a Chinese sentence. The t words entered by the user are encoded in order by a bidirectional LSTM network, forming a group of context semantic vectors (h→_1, ..., h→_t); they are then encoded again by the bidirectional LSTM network in reverse order, forming a group of context semantic vectors (h←_1, ..., h←_t). These context semantic vectors then serve as the encoding of the user's intent. During generation (the upper half of Fig. 2), the decoder's LSTM neural network runs cyclically: it first generates the hidden state s_i at time i and then each target-language word y_i; while generating each word, it considers the context semantic vector corresponding to the input, so that the generated content is consistent with the meaning of the source language.
The specific translation steps are as follows:
1. The encoder reads the input source-language sentence x = (x_1, ..., x_I);
2. The encoder encodes the x it has read into hidden states using a recurrent neural network, forming a group of context semantic vectors (h→_1, ..., h→_I);
3. The encoder encodes the x it has read in reverse, in the same way as step 2, to obtain the backward hidden states, forming a group of context semantic vectors (h←_1, ..., h←_I);
4. The encoder concatenates the forward and backward states to obtain the explanation vector of each word, h_j = [h→_j^T; h←_j^T]^T;
5. The decoder's LSTM neural network runs cyclically and generates the hidden state s_t at time t;
6. Given the source-language sentence x and the target words {y_1, ..., y_{t-1}}, the decoder sequentially generates the target-language word y_t: the encoding vectors of the input source language computed by the encoder are fed to the decoder's RNN unit, the decoder computes a probability vector from its recurrent unit, i.e. a probability is computed for every word of the target-language sentence, and the target language is finally generated by sampling according to the computed probabilities. A sketch assembling these steps appears below.
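The sketch below assembles steps 1 through 6, reusing bidirectional_encode and decoder_step from the earlier sketches. Collapsing the explanation vectors to a single context vector, decoding greedily with argmax instead of the sampling mentioned in step 6, and the embedding matrix E with bos_id/eos_id markers are all simplifying assumptions:

```python
import numpy as np
# Illustrative only: assumes bidirectional_encode and decoder_step as defined
# in the earlier sketches; enc is their parameter tuple (W_fwd, b_fwd, W_bwd,
# b_bwd, hidden) and dec holds the decoder weights.

def translate(source_ids, E, enc, dec, bos_id, eos_id, max_len=50):
    xs = [E[i] for i in source_ids]              # step 1: read x_1 ... x_I
    hs = bidirectional_encode(xs, *enc)          # steps 2-4: explanation vectors
    ctx = hs[-1]                                 # simplified single context vector
    hid = dec["W_out"].shape[1]
    s, cell = np.zeros(hid), np.zeros(hid)
    y_prev, out = E[bos_id], []
    for _ in range(max_len):                     # steps 5-6: generate word by word
        s, cell, p = decoder_step(y_prev, s, cell, ctx,
                                  dec["W"], dec["b"], dec["W_out"], dec["b_out"])
        y_t = int(np.argmax(p))                  # greedy; the patent samples from p
        if y_t == eos_id:
            break
        out.append(y_t)
        y_prev = E[y_t]
    return out
```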
Embodiment 2, Mongolian to Chinese:
The method is the same as for Chinese-to-Mongolian. Referring to Fig. 3: in the lower half of Fig. 3, (x_1, ..., x_t) denotes the t words of a Mongolian sentence. The system encodes the t words entered by the user in order through a bidirectional LSTM network, forming a group of context semantic vectors (h→_1, ..., h→_t); it then encodes them again through the bidirectional LSTM network in reverse order, forming a group of context semantic vectors (h←_1, ..., h←_t). These context semantic vectors then serve as the encoding of the user's intent. During generation (the upper half of Fig. 3), the decoder's LSTM neural network runs cyclically: it first generates the hidden state s_i at time i and then each Mongolian word y_i; while generating each word, it considers the context semantic vector corresponding to the input, so that the generated content is consistent with the meaning of the source language.
Claims (6)
1. A Mongolian-Chinese mutual translation method based on an LSTM neural network, using an encoder-decoder structure, wherein the encoder reads a source-language sentence and encodes it into a vector of fixed dimension, and the decoder reads the vector and generates the target language sequentially, characterized in that both the encoder and the decoder use LSTM neural networks; in the encoder, the input source-language sentence is encoded by a bidirectional LSTM neural network, each sentence being represented as a context semantic vector so as to form a group of context semantic vectors, and the context semantic vectors serve as the encoding of the user's intent; in the decoder, the LSTM neural network runs cyclically to generate each word of the target language, and while generating each word it considers the context semantic vector corresponding to the input source-language sentence, so that the generated content is consistent with the meaning of the source language.
2. The Mongolian-Chinese mutual translation method based on an LSTM neural network according to claim 1, characterized in that the encoder uses a bidirectional LSTM model: the forward LSTM neural network reads the input x = (x_1, ..., x_I) in the order of the source-language sentence, where x_1 denotes the 1st input word and x_I the I-th, and computes the forward hidden states (h→_1, ..., h→_I), where h→_1 denotes the 1st forward hidden-state semantic vector element and h→_I the I-th; the backward LSTM neural network reads the sequence in the order opposite to the input, (x_I, ..., x_1), and computes the backward hidden states (h←_1, ..., h←_I); the forward and backward hidden states are concatenated to obtain the explanation vector of each word, h_j = [h→_j^T; h←_j^T]^T for j from 1 to I, where h→_j^T denotes the transpose of the forward hidden-state semantic vector and h←_j^T that of the backward one.
3. The Mongolian-Chinese mutual translation method based on an LSTM neural network according to claim 2, characterized in that the hidden layer state is computed with a memory block architecture, comprising the following steps:

Step 1: decide what information may pass through the cell state.

This decision is controlled by the forget gate layer through a sigmoid, which, based on the output h_{j-1} of the previous time step and the current input x_j, produces a value f_j between 0 and 1 that decides whether the information learned at the previous time step passes through fully or partially, as follows:

f_j = sigmoid(W_f · x_j + W_f · h_{j-1} + b_f)

where W_f and b_f denote the trained parameters of the neurons in the forget gate;

Step 2: generate the new information to be written.

This step has two parts: the first part is the input gate layer, which uses a sigmoid to determine which values to update; the second part is a tanh layer used to generate the new candidate value C̃_j, which may be added to the cell state as the candidate produced by the current layer; the values produced by the two parts are combined to perform the update, as follows:

i_j = sigmoid(W_i · x_j + W_i · h_{j-1} + b_i)
C̃_j = tanh(W_C · x_j + W_C · h_{j-1} + b_C)

where W_i and b_i denote the trained parameters of the neurons in the input gate, and W_C and b_C those of the memory cell;

Combining step 1 and step 2, that is, dropping unneeded information and adding new information:

C_j = f_j * C_{j-1} + i_j * C̃_j

Step 3: decide the output of the model.

First an initial output o_j is obtained through a sigmoid layer; then tanh scales the value of C_j to between -1 and 1, and the result is multiplied by the sigmoid output to obtain the forward hidden state h→_j:

o_j = sigmoid(W_o · x_j + W_o · h_{j-1} + b_o)
h→_j = o_j * tanh(C_j)

where W_o and b_o denote the trained parameters of the neurons in the output gate;

The backward hidden state h←_j is computed in the same way.
4. The Mongolian-Chinese mutual translation method based on an LSTM neural network according to claim 1, characterized in that in the decoder, given the source-language sentence x and the target words {y_1, ..., y_{t-1}}, the decoder sequentially generates the target-language word y_t, the conditional probability defined at the decoding layer being:

p(y_t | y_1, ..., y_{t-1}, x) = g(y_{t-1}, s_t, c_t)

where g is the sigmoid activation function, s_t is the hidden state of the decoder's LSTM neural network at time t, c_t denotes the external input information in the generation process, and y_1, ..., y_{t-1} are the 1st through (t-1)-th target-language words already generated.
5. The Mongolian-Chinese mutual translation method based on an LSTM neural network according to claim 1, characterized in that model training is carried out as follows: the output vector of the decoder is fed into a softmax regression layer, and the cross entropy between the actual probability distribution obtained by the softmax regression layer and the desired output is chosen as the loss function:

C = -(1/n) Σ_x [ y ln a + (1 - y) ln(1 - a) ]

where y is the desired output, a is the actual output, and n is the batch size.
6. The Mongolian-Chinese mutual translation method based on an LSTM neural network according to claim 5, characterized in that mini-batch stochastic gradient descent is used as the optimization algorithm to speed up training and avoid falling into local optima, the parameters being adjusted according to:

w := w - η · ∂C/∂w
b := b - η · ∂C/∂b

where := denotes a synchronized update and η is the learning rate; w and b are updated continuously, the process ending when their values stabilize, which completes gradient descent.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810428619.0A CN108932232A (en) | 2018-05-07 | 2018-05-07 | Mongolian-Chinese mutual translation method based on an LSTM neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810428619.0A CN108932232A (en) | 2018-05-07 | 2018-05-07 | Mongolian-Chinese mutual translation method based on an LSTM neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108932232A true CN108932232A (en) | 2018-12-04 |
Family
ID=64448397
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810428619.0A Pending CN108932232A (en) | 2018-05-07 | 2018-05-07 | Mongolian-Chinese mutual translation method based on an LSTM neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108932232A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109558605A (en) * | 2018-12-17 | 2019-04-02 | 北京百度网讯科技有限公司 | Method and apparatus for translating sentence |
CN109740168A (en) * | 2019-01-09 | 2019-05-10 | 北京邮电大学 | A kind of classic of TCM ancient Chinese prose interpretation method based on knowledge of TCM map and attention mechanism |
CN109740169A (en) * | 2019-01-09 | 2019-05-10 | 北京邮电大学 | A kind of Chinese medical book interpretation method based on dictionary and seq2seq pre-training mechanism |
CN110414012A (en) * | 2019-07-29 | 2019-11-05 | 腾讯科技(深圳)有限公司 | A kind of encoder construction method and relevant device based on artificial intelligence |
CN110489766A (en) * | 2019-07-25 | 2019-11-22 | 昆明理工大学 | The Chinese-weighed based on coding conclusion-decoding gets over low-resource nerve machine translation method |
CN110598221A (en) * | 2019-08-29 | 2019-12-20 | 内蒙古工业大学 | Method for improving translation quality of Mongolian Chinese by constructing Mongolian Chinese parallel corpus by using generated confrontation network |
CN110717345A (en) * | 2019-10-15 | 2020-01-21 | 内蒙古工业大学 | Translation realignment recurrent neural network cross-language machine translation method |
CN112016332A (en) * | 2020-08-26 | 2020-12-01 | 华东师范大学 | Multi-modal machine translation method based on variational reasoning and multi-task learning |
CN112395892A (en) * | 2020-12-03 | 2021-02-23 | 内蒙古工业大学 | Mongolian Chinese machine translation method for realizing placeholder disambiguation based on pointer generation network |
WO2021082518A1 (en) * | 2019-11-01 | 2021-05-06 | 华为技术有限公司 | Machine translation method, machine translation model training method and device, and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107870902A (en) * | 2016-09-26 | 2018-04-03 | 谷歌公司 | Neural machine translation system |
CN107967262A (en) * | 2017-11-02 | 2018-04-27 | 内蒙古工业大学 | A kind of neutral net covers Chinese machine translation method |
- 2018
  - 2018-05-07 CN CN201810428619.0A patent/CN108932232A/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107870902A (en) * | 2016-09-26 | 2018-04-03 | 谷歌公司 | Neural machine translation system |
CN107967262A (en) * | 2017-11-02 | 2018-04-27 | 内蒙古工业大学 | A kind of neutral net covers Chinese machine translation method |
Non-Patent Citations (2)
Title |
---|
COLAH: "Understanding LSTM Networks", 《COLAH.HITHUB.IO/POSTS/2015-08-UNDERSTANDING-LSTMS》 * |
WEPON_: "交叉熵代价函数", 《HTTPS://BLOG.CSDN.NET/U012162613/ARTICLE/DETAILS/44239919》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109558605B (en) * | 2018-12-17 | 2022-06-10 | 北京百度网讯科技有限公司 | Method and device for translating sentences |
CN109558605A (en) * | 2018-12-17 | 2019-04-02 | 北京百度网讯科技有限公司 | Method and apparatus for translating sentence |
CN109740168A (en) * | 2019-01-09 | 2019-05-10 | 北京邮电大学 | A kind of classic of TCM ancient Chinese prose interpretation method based on knowledge of TCM map and attention mechanism |
CN109740169A (en) * | 2019-01-09 | 2019-05-10 | 北京邮电大学 | A kind of Chinese medical book interpretation method based on dictionary and seq2seq pre-training mechanism |
CN109740168B (en) * | 2019-01-09 | 2020-10-13 | 北京邮电大学 | Traditional Chinese medicine classical book and ancient sentence translation method based on traditional Chinese medicine knowledge graph and attention mechanism |
CN109740169B (en) * | 2019-01-09 | 2020-10-13 | 北京邮电大学 | Traditional Chinese medicine ancient book translation method based on dictionary and seq2seq pre-training mechanism |
CN110489766A (en) * | 2019-07-25 | 2019-11-22 | 昆明理工大学 | The Chinese-weighed based on coding conclusion-decoding gets over low-resource nerve machine translation method |
CN110414012A (en) * | 2019-07-29 | 2019-11-05 | 腾讯科技(深圳)有限公司 | A kind of encoder construction method and relevant device based on artificial intelligence |
CN110414012B (en) * | 2019-07-29 | 2022-12-09 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based encoder construction method and related equipment |
CN110598221A (en) * | 2019-08-29 | 2019-12-20 | 内蒙古工业大学 | Method for improving translation quality of Mongolian Chinese by constructing Mongolian Chinese parallel corpus by using generated confrontation network |
CN110717345A (en) * | 2019-10-15 | 2020-01-21 | 内蒙古工业大学 | Translation realignment recurrent neural network cross-language machine translation method |
WO2021082518A1 (en) * | 2019-11-01 | 2021-05-06 | 华为技术有限公司 | Machine translation method, machine translation model training method and device, and storage medium |
CN112016332B (en) * | 2020-08-26 | 2021-05-07 | 华东师范大学 | Multi-modal machine translation method based on variational reasoning and multi-task learning |
CN112016332A (en) * | 2020-08-26 | 2020-12-01 | 华东师范大学 | Multi-modal machine translation method based on variational reasoning and multi-task learning |
CN112395892A (en) * | 2020-12-03 | 2021-02-23 | 内蒙古工业大学 | Mongolian Chinese machine translation method for realizing placeholder disambiguation based on pointer generation network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108932232A (en) | Mongolian-Chinese mutual translation method based on an LSTM neural network | |
CN110334361B (en) | Neural machine translation method for the Chinese language | |
CN110717334B (en) | Text sentiment analysis method based on a BERT model and dual-channel attention | |
CN106126507B (en) | Deep neural translation method and system based on character encoding | |
CN107368475B (en) | Machine translation method and system based on generative adversarial neural networks | |
Chen et al. | Neural machine translation with source dependency representation | |
CN110069790B (en) | Machine translation system and method for checking the source text by back-translating the translation | |
CN108829684A (en) | Mongolian-Chinese neural machine translation method based on a transfer learning strategy | |
CN108897740A (en) | Mongolian-Chinese machine translation method based on adversarial neural networks | |
CN110033008B (en) | Image description generation method based on modal transformation and text induction | |
Feng et al. | Joint extraction of entities and relations using reinforcement learning and deep learning | |
CN108153864A (en) | Method for generating text summaries based on neural networks | |
CN107766320A (en) | Chinese pronoun resolution model building method and device | |
Tang et al. | Deep sequential fusion LSTM network for image description | |
CN110162789A (en) | Word representation method and device based on Chinese pinyin | |
CN110427616A (en) | Text sentiment analysis method based on deep learning | |
CN110188348A (en) | Chinese language processing model and method based on a deep neural network | |
CN114398976A (en) | Machine reading comprehension method based on BERT and a gated attention-enhancement network | |
CN113033189B (en) | Semantic coding method for long short-term memory networks based on attention dispersion | |
CN112883722B (en) | Distributed text summarization method based on a cloud data center | |
CN108388944B (en) | Automatic chat method and robot based on deep neural networks | |
Zhao et al. | Synchronously improving multi-user English translation ability by using AI | |
CN110334196A (en) | Neural network Chinese question generation system based on strokes and a self-attention mechanism | |
CN114691858B (en) | Improved UNILM-based summary generation method | |
CN116663578A (en) | Neural machine translation method improved with a policy-gradient method | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20181204 |