CN110705253A

CN110705253A - Burmese dependency parsing method and device based on transfer learning

Info

Publication number: CN110705253A
Application number: CN201910808117.5A
Authority: CN
Inventors: 毛存礼; 满志博; 余正涛; 王红斌; 王振晗; 马文举
Original assignee: Kunming University of Science and Technology
Current assignee: Kunming University of Science and Technology
Priority date: 2019-08-29
Filing date: 2019-08-29
Publication date: 2020-01-17

Abstract

The invention relates to a Burmese dependency parsing method and device based on transfer learning, and belongs to the technical field of natural language processing. The method comprises the following steps. Burmese data preprocessing: English-Burmese bilingual word vectors are trained and represented in the same semantic space. English dependency parsing corpus transfer: the dependency arc, position, and part-of-speech information of English is transferred to Burmese, and a Burmese dependency parsing model is trained on the result. Prediction: an input Burmese sentence is vectorized by the pre-trained Burmese dependency parsing model, which then predicts its dependency parse. The Burmese dependency parsing device based on transfer learning is built from functional modules that correspond to these steps. It performs dependency parsing on Burmese sentences, addresses the poor performance caused by the lack of Burmese dependency parsing data, and has important theoretical and practical value.

Description

Burmese dependency parsing method and device based on transfer learning

Technical Field

The invention relates to a Burmese dependency parsing method and device based on transfer learning, and belongs to the technical field of natural language processing.

Background

Using transfer learning to address the shortage of corpora for low-resource languages is a current research focus in natural language processing. A large amount of accurate dependency parsing corpora exists for English, whereas annotated data for resource-scarce Burmese is hard to come by: only a small annotated Burmese dataset can be obtained through corpus collection and manual annotation, and such a small training set inevitably degrades Burmese dependency parsing. When no Burmese dependency parsing corpus is available, transferring the accurate English dependency parsing corpus to Burmese can yield good results.

Disclosure of Invention

The invention provides a Burmese dependency parsing method and device based on transfer learning, which address the scarcity of annotated data for Burmese dependency parsing, the small scale of the training data, and the resulting poor performance of models trained on that data.

The technical scheme of the invention is as follows. The Burmese dependency parsing method based on transfer learning comprises the following specific steps:

Step1, Burmese data preprocessing: training English-Burmese bilingual word vectors and representing them in the same semantic space;

Step2, English dependency parsing corpus transfer: transferring the dependency arc, position, and part-of-speech information of English to Burmese, and training on the result to obtain a Burmese dependency parsing model;

Step3, vectorizing the input Burmese sentence with the pre-trained Burmese dependency parsing model, and then predicting its dependency parse.

As a preferred scheme of the invention, Step1 comprises the following specific steps:

Step1.1, obtaining 20106 word-segmented English-Burmese parallel sentence pairs from the Asian Language Treebank (ALT) website (http://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/);

Step1.2, using a convolutional neural network (CNN) to fuse the syllable feature information and syllable position feature information of Burmese, and training Burmese monolingual word vectors;

Step1.3, training bilingual word vectors on the English-Burmese bilingual corpus, and then combining them with the monolingual word vectors in a set proportion so that the English-Burmese bilingual word vectors are mapped into the same semantic space.

As a preferred scheme of the invention, Step1.2 comprises the following specific steps:

Step1.2.1, initializing the vectors of Burmese words at the input layer of the convolutional neural network: a syllable vector is randomly initialized for each syllable, so that a Burmese word is represented by its syllables. Each input Burmese word is split into its constituent syllables. Let $d$ be the dimension of the syllable vectors and $C$ the set of Burmese syllables; the randomly initialized syllable vectors then form the matrix $Q \in \mathbb{R}^{d \times |C|}$, and the initial vector of a Burmese word is a combination of several syllable vectors from $Q$. Suppose that a Burmese word $k \in V$ consists of the syllable sequence $[c_1, c_2, c_3, c_4, \ldots, c_l]$, where $l$ is the length of $k$. The syllable-level representation of $k$ is then given by the matrix $C^k \in \mathbb{R}^{d \times l}$, whose $j$-th column is the syllable vector of $c_j$, i.e. the $c_j$-th column of $Q$;
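The construction of the syllable-level word matrix in Step1.2.1 can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: the function names, the uniform initialization range, and the romanized placeholder syllables are all hypothetical, and $C^k$ is stored as a list of columns.

```python
import random

def build_syllable_table(syllables, d, seed=0):
    """Randomly initialise one d-dimensional vector per syllable (the columns of Q).

    The uniform range [-0.1, 0.1] is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    return {s: [rng.uniform(-0.1, 0.1) for _ in range(d)] for s in syllables}

def word_matrix(word_syllables, table):
    """Return C^k as a list of l columns, one syllable vector per syllable of the word."""
    return [table[s] for s in word_syllables]

# Hypothetical romanised placeholders; real input would be Burmese-script syllables.
table = build_syllable_table(["mran", "ma", "ca", "ka"], d=4)
C_k = word_matrix(["mran", "ma", "ca"], table)  # a word of length l = 3
```

Each column of `C_k` is looked up rather than copied, which mirrors the statement that the $j$-th column of $C^k$ is a column of $Q$.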

Step1.2.2, extracting Burmese syllable features in the convolutional layer of the convolutional neural network: a filter $H \in \mathbb{R}^{d \times w}$ of width $w$ is convolved with $C^k$, then a bias is added and a nonlinearity is applied to obtain the feature map $f^k \in \mathbb{R}^{l-w+1}$. Specifically, the $i$-th element of $f^k$ is given by:

$$f^k[i] = \tanh\left(\langle C^k[*, i:i+w-1], H \rangle + b\right)$$

where $C^k[*, i:i+w-1]$ denotes columns $i$ through $i+w-1$ of $C^k$ and $\langle \cdot, \cdot \rangle$ is the Frobenius inner product. Finally, max pooling extracts the highest value of $f^k$ as the feature for the filter:

$$y^k = \max_i f^k[i]$$
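The convolution and max pooling of Step1.2.2 can be illustrated with a small pure-Python sketch. It assumes that $C^k$ and the filter $H$ are both stored as lists of columns and that the angle brackets denote a Frobenius inner product; the function names and toy values are hypothetical.

```python
import math

def conv_feature_map(C_k, H, b):
    """Return f^k, where f^k[i] = tanh(<C^k[*, i:i+w-1], H> + b).

    C_k is a list of l syllable vectors (the columns of C^k) and H is a
    list of w filter columns, each of dimension d.
    """
    l, w = len(C_k), len(H)
    feature_map = []
    for i in range(l - w + 1):
        # Frobenius inner product between the w-column window and the filter.
        inner = sum(
            C_k[i + j][r] * H[j][r]
            for j in range(w)
            for r in range(len(H[j]))
        )
        feature_map.append(math.tanh(inner + b))
    return feature_map

def max_pool(feature_map):
    """Keep the single highest activation produced by the filter."""
    return max(feature_map)

# Toy word with l = 3 syllables of dimension d = 2, and one filter of width w = 2.
C_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
H = [[1.0, 0.0], [0.0, 1.0]]
f_k = conv_feature_map(C_k, H, b=0.0)
y_k = max_pool(f_k)
```

The feature map has $l - w + 1 = 2$ entries here, one per window of two adjacent syllables.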

Step1.2.3, further extracting features in the convolutional neural network with a gate-structure network: the relationships among the syllables of a Burmese word, and the syllable position features, are extracted further, using the formula:

$$z = g(Wy + b) + y$$

where $g$ is the nonlinear activation function tanh, $y$ is the output of the previous network layer, $W$ is a weight matrix, $b$ is a parameter randomly generated in the training process, and $z$ represents the extracted relationships among syllables, i.e. the syllable features;

because of the linguistic characteristics of Burmese, the position features correspond to the syllable features, so once the syllable features of a Burmese word have been extracted, the corresponding position features are obtained as well.
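As a minimal sketch of the formula $z = g(Wy + b) + y$ with $g = \tanh$: as written, the layer applies a nonlinear transform and adds the input back as a skip connection. The function name and the toy weights below are assumptions for illustration; in training, $W$ and $b$ would be learned.

```python
import math

def gated_layer(y, W, b):
    """Compute z = tanh(W y + b) + y for a feature vector y of length n.

    W is an n x n weight matrix (list of rows) and b a bias vector of length n.
    """
    n = len(y)
    return [
        math.tanh(sum(W[i][j] * y[j] for j in range(n)) + b[i]) + y[i]
        for i in range(n)
    ]

y = [0.5, -1.0]                      # pooled syllable features from the CNN
W = [[1.0, 0.0], [0.0, 1.0]]         # hypothetical weights
b = [0.0, 0.0]                       # hypothetical bias
z = gated_layer(y, W, b)
```

With all-zero weights and bias the layer returns its input unchanged, which shows how the skip term preserves the original syllable features.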

As a preferred scheme of the invention, Step1.3 comprises the following specific steps:

Step1.3.1, monolingual word vector representation for English and Burmese: the prediction model takes the current word $w$ as input and predicts its context. The embedding of the current word $w$ is denoted $v_w$ and the embedding of a context $c$ is denoted $v'_c$; the probability of context $c$ given word $w$ is expressed as a softmax function of the form:

$$p(c \mid w; \theta) = \frac{\exp(v'_c \cdot v_w)}{\sum_{c' \in V} \exp(v'_{c'} \cdot v_w)}$$

where $V$ is the vocabulary, $c'$ ranges over the words in $V$, and the parameters $\theta$ comprise the word embedding matrix and the context embedding matrix. The prediction model is trained by maximizing the log-likelihood over the training dataset $D$, the set of word-context pairs $(w, c)$:

$$J(\theta) = \sum_{(w, c) \in D} \log p(c \mid w; \theta)$$

where $J(\theta)$ is the objective from which the English and Burmese monolingual word vectors are obtained;
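The softmax over contexts and the log-likelihood objective of Step1.3.1 can be sketched as below. This assumes the standard skip-gram form, with a dot product between $v_w$ and $v'_c$ normalized over the vocabulary; the function names and the toy two-dimensional embeddings are hypothetical.

```python
import math

def context_probability(v_w, context_vectors, c):
    """p(c | w) = exp(v'_c . v_w) / sum over c' in V of exp(v'_c' . v_w)."""
    scores = {
        ctx: sum(a * b for a, b in zip(vec, v_w))
        for ctx, vec in context_vectors.items()
    }
    shift = max(scores.values())  # subtract the max score for numerical stability
    exp_scores = {ctx: math.exp(s - shift) for ctx, s in scores.items()}
    return exp_scores[c] / sum(exp_scores.values())

def log_likelihood(pairs, word_vectors, context_vectors):
    """J(theta): sum of log p(c | w) over the word-context pairs in D."""
    return sum(
        math.log(context_probability(word_vectors[w], context_vectors, c))
        for w, c in pairs
    )

# Toy embeddings; English placeholders stand in for either language.
word_vectors = {"eat": [0.2, -0.1]}
context_vectors = {"rice": [0.5, 0.0], "I": [0.1, 0.3], "sky": [-0.4, 0.2]}
probs = {
    c: context_probability(word_vectors["eat"], context_vectors, c)
    for c in context_vectors
}
objective = log_likelihood([("eat", "rice"), ("eat", "I")], word_vectors, context_vectors)
```

Training would adjust both embedding tables to increase `objective`; practical systems typically replace the full normalization with negative sampling or a hierarchical softmax.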

Step1.3.2, after the monolingual word vectors of both English and Burmese have been obtained, bilingual word vectors are trained on the English-Burmese bilingual corpus. Let the English-Burmese language set be $L$. A joint objective is used so that the English-Burmese bilingual word vectors are mapped into the same semantic space, with the proportion of the bilingual word vectors adjusted through $\alpha$ and $\beta$;

where $J$ is the joint objective that yields the bilingual and monolingual word vectors, $\alpha$ and $\beta$ are the proportion coefficients of the monolingual word vectors and the English-Burmese bilingual word vectors, and the remaining terms, whose symbols are missing from this text, denote the bilingual word vectors, the English monolingual word vectors obtained, the dataset for the bilingual word vectors, and the dataset for the English monolingual word vectors.

As a preferred scheme of the invention, Step2 comprises the following specific steps:

Step2.1, constructing part of the Burmese dependency parsing corpus by word mapping;

using the existing English dependency parsing corpus, an English-Burmese dictionary is generated from the English-Burmese parallel sentence pairs obtained above, and the Burmese dependency parsing corpus is constructed by word mapping;

after the English words are mapped, the Burmese dependency parsing corpus contains the position, part-of-speech, and dependency arc information of each Burmese word;
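The word-mapping construction of Step2.1 can be sketched as follows. This is a simplified illustration under assumptions: each English token is a dictionary with hypothetical field names (`id`, `form`, `pos`, `head`, `rel`), the bilingual dictionary is one-to-one, sentences with unmapped words are skipped, and the Burmese entries are placeholder strings rather than real Burmese words.

```python
def project_sentence(english_tokens, en_to_my):
    """Map each English token to Burmese, keeping position, POS tag, head and relation.

    Returns None when any word is missing from the bilingual dictionary.
    """
    projected = []
    for tok in english_tokens:
        my_word = en_to_my.get(tok["form"].lower())
        if my_word is None:
            return None  # assumption: drop sentences that cannot be fully mapped
        projected.append({**tok, "form": my_word})
    return projected

# English parse: "I eat rice", with head indices (0 = root) and relation labels.
english = [
    {"id": 1, "form": "I", "pos": "PRP", "head": 2, "rel": "nsubj"},
    {"id": 2, "form": "eat", "pos": "VBP", "head": 0, "rel": "root"},
    {"id": 3, "form": "rice", "pos": "NN", "head": 2, "rel": "dobj"},
]
# Placeholder dictionary entries; a real dictionary would hold Burmese-script words.
dictionary = {"i": "my_I", "eat": "my_eat", "rice": "my_rice"}
projected = project_sentence(english, dictionary)
```

Each projected token keeps the position, part of speech, and dependency arc of its English source, which are the three kinds of information the mapped corpus is said to contain.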

Step2.2, transfer of English dependency arcs: for the parallel, aligned English-Burmese bilingual corpus, a weight matrix $W_{arc}$ relates the dependency relations of English to those of Burmese. $E_{ENarc}$ and $E_{MYarc}$ are the vectors representing the dependency relations in English and Burmese respectively; the dependency arc vectors of English and Burmese are concatenated (the symbol for the concatenation is missing from this text), with $i$ and $j$ denoting the $i$-th and $j$-th Burmese words;

$$E_{MYarc} = W_{arc} \cdot E_{ENarc}$$

Step2.3, transfer of English part-of-speech information: a relation matrix $W_{pos}$ is established between Burmese words and the corresponding English parts of speech. $E_{ENpos}$ and $E_{MYpos}$ are the part-of-speech vectors of English and Burmese respectively; the relation matrix $W_{pos}$ transfers the part-of-speech information of English to that of Burmese, so that the Burmese parts of speech carry more information. The part-of-speech vectors of English and Burmese are concatenated (the symbol for the concatenation is missing from this text), with $i$ and $j$ denoting the $i$-th and $j$-th Burmese words;

$$E_{MYpos} = W_{pos} \cdot E_{ENpos}$$

Step2.4, transfer of English position information: the position information is added to the word vectors, and a relation matrix $W_{loc}$ is established from the mapping between the dictionaries. $E_{ENloc}$ and $E_{MYloc}$ are the position vectors of English and Burmese respectively; the relation matrix $W_{loc}$ transfers the word position information of English and Burmese into one representation space, which reduces the difference between Burmese and English and lets the Burmese dependency parsing model learn English dependency parsing knowledge during training. The position vectors of English and Burmese are concatenated (the symbol for the concatenation is missing from this text), with $i$ and $j$ denoting the $i$-th and $j$-th Burmese words;

$$E_{MYloc} = W_{loc} \cdot E_{ENloc}$$
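The three transfer formulas share one shape: a relation matrix multiplies an English feature vector to give the Burmese one. A minimal sketch of that linear map follows; the function name, matrix values, and vector sizes are hypothetical, and how the relation matrices are learned is not shown.

```python
def migrate(W, e_en):
    """Compute E_MY = W . E_EN for one English feature vector.

    W is a list of rows; each row produces one component of the Burmese vector.
    """
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, e_en)) for row in W]

# Hypothetical 2 x 3 relation matrix mapping a 3-dimensional English arc vector
# to a 2-dimensional Burmese arc vector.
W_arc = [[1.0, 0.0, 0.0],
         [0.0, 0.5, 0.5]]
E_ENarc = [2.0, 4.0, 6.0]
E_MYarc = migrate(W_arc, E_ENarc)
```

The same `migrate` call would apply to the part-of-speech and position vectors with their own relation matrices $W_{pos}$ and $W_{loc}$.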

Step2.5, training the Burmese dependency parsing model on the transferred Burmese dependency parsing corpus with the Stanford Parser tool.

A Burmese dependency parsing device based on transfer learning comprises the following modules:

the English-Burmese bilingual word vector representation module, used for Burmese data preprocessing: training English-Burmese bilingual word vectors and representing them in the same semantic space;

the English dependency parsing transfer module, used for English dependency parsing corpus transfer: transferring the dependency arc, position, and part-of-speech information of English to Burmese, and training on the result to obtain a Burmese dependency parsing model;

and the Burmese dependency parsing prediction module, used for vectorizing the input Burmese sentence with the pre-trained Burmese dependency parsing model and then predicting its dependency parse.

The beneficial effects of the invention are:

1. the method fuses Burmese syllable features with syllable position feature information, which strengthens the representational power of the Burmese monolingual word vectors and improves Burmese dependency parsing;

2. the method transfers the parts of speech, positions, and dependency arcs of English to Burmese, which addresses the poor Burmese dependency parsing performance caused by an insufficient dependency parsing corpus.

Drawings

FIG. 1 is a diagram of the model based on Burmese-English word mapping in the present invention;

FIG. 2 is a diagram of the transfer process in the present invention;

FIG. 3 is a diagram of the information transferred in the transfer-learning-based Burmese dependency parsing of the present invention;

FIG. 4 is an overall flow chart of the present invention;

FIG. 5 is a diagram of the Burmese dependency parsing device based on transfer learning according to the present invention.

Detailed Description

Example 1: as shown in FIGS. 1-5, the Burmese dependency parsing method based on transfer learning comprises the following specific steps:

Step1.1, obtaining 20106 word-segmented English-Burmese parallel sentence pairs from the Asian Language Treebank (ALT) website (http://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/); the format of the resulting English-Burmese parallel sentences is shown in Table 1:

Table 1. Format of the English-Burmese parallel sentences obtained

Step1.2.1, initializing the vectors of Burmese words at the input layer of the convolutional neural network: a syllable vector is randomly initialized for each syllable, so that a Burmese word is represented by its syllables. Each input Burmese word is split into its constituent syllables. Let $d$ be the dimension of the syllable vectors and $C$ the set of Burmese syllables; the randomly initialized syllable vectors then form the matrix $Q \in \mathbb{R}^{d \times |C|}$, and the initial vector of a Burmese word is a combination of several syllable vectors from $Q$. Suppose that a Burmese word $k \in V$ consists of the syllable sequence $[c_1, c_2, c_3, c_4, \ldots, c_l]$, where $l$ is the length of $k$. The syllable-level representation of $k$ is then given by the matrix $C^k \in \mathbb{R}^{d \times l}$, whose $j$-th column is the syllable vector of $c_j$, i.e. the $c_j$-th column of $Q$;

$$f^k[i] = \tanh\left(\langle C^k[*, i:i+w-1], H \rangle + b\right)$$

The maximum over positions, $y^k = \max_i f^k[i]$, is taken as the feature corresponding to filter $H$ (when applied to the Burmese word $k$). The role of max pooling is to capture the most important feature for a given filter, namely the one with the highest value. A filter essentially selects an n-gram of syllables in a Burmese word, where the size of the n-gram corresponds to the filter width; in this way the convolutional neural network extracts the features of Burmese words. Multiple filters of different widths are used to obtain the feature vector of $k$. So if there are $n$ filters $H_1, \ldots, H_n$ in total, then $y^k = [y^k_1, \ldots, y^k_n]$ is the input representation of the word $k$;

$$z = g(Wy + b) + y$$

Step1.3.1, monolingual word vector representation for English and Burmese: the prediction model takes the current word $w$ as input and predicts its context. The embedding of the current word $w$ is denoted $v_w$ and the embedding of a context $c$ is denoted $v'_c$; the probability of context $c$ given word $w$ is expressed as a softmax function of the form:

$$p(c \mid w; \theta) = \frac{\exp(v'_c \cdot v_w)}{\sum_{c' \in V} \exp(v'_{c'} \cdot v_w)}$$

In the joint objective of Step1.3.2, $J$ yields the bilingual and monolingual word vectors, $\alpha$ and $\beta$ are the proportion coefficients of the monolingual word vectors and the English-Burmese bilingual word vectors, and the remaining terms, whose symbols are missing from this text, denote the bilingual word vectors, the English monolingual word vectors obtained, the dataset for the bilingual word vectors, and the dataset for the English monolingual word vectors.

after the English words are mapped, the Burmese dependency parsing corpus contains the position, part-of-speech, and dependency arc information of each Burmese word; FIG. 1 illustrates the mapping with an example sentence (not reproduced in this text), and the format of the corpus is shown in Table 2;

Table 2. Corpus format required for dependency parsing

Step2.2, as shown in FIGS. 2 and 3, transfer of English dependency arcs: for the parallel, aligned English-Burmese bilingual corpus, a weight matrix $W_{arc}$ relates the dependency relations of English to those of Burmese. $E_{ENarc}$ and $E_{MYarc}$ are the vectors representing the dependency relations in English and Burmese respectively; the dependency arc vectors of English and Burmese are concatenated (the symbol for the concatenation is missing from this text), with $i$ and $j$ denoting the $i$-th and $j$-th Burmese words;

$$E_{MYarc} = W_{arc} \cdot E_{ENarc}$$

Step2.3, transfer of English part-of-speech information: a relation matrix $W_{pos}$ is established between Burmese words and the corresponding English parts of speech. $E_{ENpos}$ and $E_{MYpos}$ are the part-of-speech vectors of English and Burmese respectively; the relation matrix $W_{pos}$ transfers the part-of-speech information of English to that of Burmese, so that the Burmese parts of speech carry more information;

$$E_{MYpos} = W_{pos} \cdot E_{ENpos}$$

Step2.4, transfer of English position information: the position information is added to the word vectors, and a relation matrix $W_{loc}$ is established from the mapping between the dictionaries. $E_{ENloc}$ and $E_{MYloc}$ are the position vectors of English and Burmese respectively; the relation matrix $W_{loc}$ transfers the word position information of English and Burmese into one representation space, which reduces the difference between Burmese and English and lets the Burmese dependency parsing model learn English dependency parsing knowledge during training. The position vectors of English and Burmese are concatenated (the symbol for the concatenation is missing from this text), with $i$ and $j$ denoting the $i$-th and $j$-th Burmese words;

$$E_{MYloc} = W_{loc} \cdot E_{ENloc}$$

Specifically, to verify the performance of the method, the experimental data were taken from the Burmese dataset of the Asian low-resource language treebank, which contains 20011 word-segmented Burmese sentences; after duplicates are removed, segmenting the sentences yields 75498 Burmese words.

The evaluation metrics for dependency parsing are the unlabeled attachment score (UAS, the dependency arc accuracy) and the labeled attachment score (LAS, the dependency label accuracy). UAS is the ratio of the number of words with the correct dependency arc to the total number of words in the sentence; LAS is the ratio of the number of words with both the correct dependency arc and the correct dependency relation to the total number of words in the sentence:

$$\mathrm{UAS} = \frac{\text{number of words with the correct dependency arc}}{\text{total number of words}}$$

$$\mathrm{LAS} = \frac{\text{number of words with the correct dependency arc and the correct dependency relation}}{\text{total number of words}}$$
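The two metrics can be computed as in the following sketch, which assumes each word's analysis is given as a (head index, relation label) pair; the function name and the toy parses are hypothetical.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for parallel lists of (head, label) pairs, one pair per word."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted parses must cover the same words")
    total = len(gold)
    # UAS counts words whose predicted head matches the gold head.
    uas_hits = sum(1 for (g_head, _), (p_head, _) in zip(gold, predicted) if g_head == p_head)
    # LAS additionally requires the relation label to match.
    las_hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    return uas_hits / total, las_hits / total

gold = [(2, "nsubj"), (0, "root"), (2, "dobj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod"), (1, "amod")]
uas, las = attachment_scores(gold, pred)
```

In this toy case three of four heads are correct and two of four words have both the correct head and label, so LAS can never exceed UAS.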

Deep-learning-based methods can effectively improve model performance, and different neural network models affect the experimental results differently; the neural network models currently in common use for dependency parsing include LSTM and Bi-LSTM networks. The specific experimental results are shown in Table 3.

Table 3. Effect of different neural network models on the experimental results

The experimental results show that Burmese dependency parsing based on the LSTM and Bi-LSTM deep learning methods does not achieve good results: deep learning needs a large amount of data, whereas the Burmese dependency parsing data are of low quality and small in volume.

Table 4 compares the performance of the transfer learning model across bilingual word vectors biased by different ratios of α to β in English-Burmese bilingual word vector training. The specific experimental results are shown in Table 4.

Table 4. Effect of different ratios of α to β on the experimental results

The experimental results show that mixing the Burmese and English word vectors in different proportions gives different results, and the best result is obtained when they are mixed in a 1:1 ratio.

Table 5 compares Burmese dependency parsing model training using transfer-learning-based shared network parameters with training that uses the syllable features and the word position feature information, respectively.

Table 5. Performance of Burmese dependency parsing models fusing multiple kinds of semantic information

The experimental results show that fusing different semantic information affects the results. Fusing Burmese syllable information works better than the method based on shared network parameters: because the smallest unit of Burmese is the syllable, incorporating syllable features into Burmese dependency parsing captures the characteristics of Burmese more accurately. The method that transfers the dependency arcs, positions, and parts of speech in the training set performs best, because transferring the dependency arc, position, and part-of-speech information of English lets Burmese be represented better.

According to the concept of the present invention, the invention further provides a Burmese dependency parsing device based on transfer learning; as shown in FIG. 5, the device comprises the following integrated modules:

While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art.

Claims

1. A Burmese dependency parsing method based on transfer learning, characterized in that the method comprises the following specific steps:

2. The Burmese dependency parsing method based on transfer learning of claim 1, characterized in that Step1 comprises the following specific steps:

Step1.1, obtaining 20106 word-segmented English-Burmese parallel sentence pairs from the Asian Language Treebank website;

3. The Burmese dependency parsing method based on transfer learning of claim 1, characterized in that Step1.2 comprises the following specific steps:

the syllable vector of $c_j$, i.e. the $c_j$-th column of $Q$;

$$f^k[i] = \tanh\left(\langle C^k[*, i:i+w-1], H \rangle + b\right)$$

$$z = g(Wy + b) + y$$

4. The Burmese dependency parsing method based on transfer learning of claim 1, characterized in that Step1.3 comprises the following specific steps:

where the remaining terms, whose symbols are missing from this text, denote the bilingual word vectors, the dataset for the bilingual word vectors, and the dataset for the English monolingual word vectors.

5. The Burmese dependency parsing method based on transfer learning of claim 1, characterized in that Step2 comprises the following specific steps:

$$E_{MYarc} = W_{arc} \cdot E_{ENarc}$$

Step2.3, transfer of English part-of-speech information: a relation matrix $W_{pos}$ is established between Burmese words and the corresponding English parts of speech. $E_{ENpos}$ and $E_{MYpos}$ are the part-of-speech vectors of English and Burmese respectively; the relation matrix $W_{pos}$ transfers the part-of-speech information of English to that of Burmese, so that the Burmese parts of speech carry more information;

$$E_{MYpos} = W_{pos} \cdot E_{ENpos}$$

$$E_{MYloc} = W_{loc} \cdot E_{ENloc}$$

6. A Burmese dependency parsing device based on transfer learning, characterized in that the device comprises the following modules: