CN107239444A - Term vector training method and system fusing part of speech and position information - Google Patents


Info

Publication number
CN107239444A
CN107239444A (application CN201710384135.6A)
Authority
CN
China
Prior art keywords
speech
word
matrix
term vector
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710384135.6A
Other languages
Chinese (zh)
Other versions
CN107239444B (en)
Inventor
文坤梅
李瑞轩
刘其磊
李玉华
辜希武
昝杰
杨琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201710384135.6A
Publication of CN107239444A
Application granted
Publication of CN107239444B
Legal status: Active
Anticipated expiration: not listed

Classifications

    • G Physics
    • G06 Computing; Calculating or Counting
    • G06F Electric Digital Data Processing
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G Physics
    • G06 Computing; Calculating or Counting
    • G06F Electric Digital Data Processing
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a term vector training method and system that fuse part-of-speech (POS) and position information. The method comprises: preprocessing the data to obtain a target text; performing word segmentation and POS tagging on the target text; modeling the POS information and the position information; and fusing the POS and position information into a skip-gram model with a negative-sampling strategy to learn the target term vectors, which are then evaluated on word-analogy and word-similarity tasks. The invention takes the POS and position information of words into account; on the basis of modeling them, it makes full use of the POS information of words and the position relations between parts of speech to assist the training of the term vectors, and the parameter updates during training are also more reasonable.

Description

Term vector training method and system fusing part of speech and position information
Technical field
The invention belongs to the field of natural language processing, and more particularly relates to a term vector training method and system that fuse part-of-speech and position information.
Background art
In recent years, the rapid development of mobile Internet technology has caused the scale of data on the Internet to grow rapidly and its complexity to increase sharply. Processing and analyzing this massive amount of unstructured, unlabeled data has therefore become a major challenge.
Traditional machine learning methods use feature engineering to build symbolic representations of the data, commonly bag-of-words representations such as one-hot vectors, for modeling and solving. However, as the complexity of the data grows, the feature dimensionality in feature engineering also increases sharply, leading to the curse of dimensionality, and one-hot representations additionally suffer from the semantic gap. With the proposal of the distributional hypothesis ("if two words have similar contexts, then their meanings are also similar"), distributed word representations based on it have been proposed continually, chiefly matrix-based, cluster-based, and term-vector-based distributed representations. Matrix-based and cluster-based representations can express simple contextual information when the feature dimensionality is small, but when the dimensionality is high they are powerless to express context, especially complex context. Term-vector-based representations, by contrast, avoid the curse of dimensionality both when representing individual words and when representing a word's context as a linear combination; and because the distance between words can be measured by the cosine or Euclidean distance between their term vectors, they also largely eliminate the semantic gap of traditional bag-of-words models.
However, most existing term vector research focuses on reducing model complexity by simplifying the structure of the neural network. Some work has fused information such as sentiment or topic, but work fusing part-of-speech information is rare; the POS granularity in that small body of work is coarse, the POS information is exploited very insufficiently, and its update rule is not very reasonable.
Summary of the invention
In view of the above defects or improvement needs of the prior art, the object of the present invention is to provide a term vector training method and system fusing part of speech and position information, thereby solving the technical problems of the prior art that work fusing part-of-speech (POS) information uses a coarse POS granularity, exploits the POS information very insufficiently, and updates it in a not very reasonable way.
To achieve the above object, according to one aspect of the present invention, there is provided a term vector training method fusing part of speech and position information, comprising the following steps:
S1, preprocessing the original text to obtain a target text;
S2, performing POS tagging on the words in the target text according to their contextual information, using the parts of speech in a POS tag set;
S3, modeling the tagged POS information to build a POS association weight matrix M, and modeling, for each POS pair, the relative position i of the corresponding word pair to build position-specific POS association weight matrices M'_i, wherein the row and column dimensions of M equal the number of POS categories in the tag set, the element of M indexed by a row and a column is the co-occurrence probability of the row's POS and the column's POS, the dimensions of M'_i equal those of M, and the element of M'_i indexed by a row and a column is the co-occurrence probability of the row's POS and the column's POS at relative position i;
S4, fusing the modeled matrices M and M'_i into a skip-gram term vector model to build a target model, and performing term vector learning with the target model to obtain target term vectors, wherein the target term vectors are used for word-analogy and word-similarity tasks.
Preferably, step S2 specifically comprises the following sub-steps:
S2.1, segmenting the target text into words, so as to distinguish all the words in the target text;
S2.2, for each sentence in the target text, performing POS tagging on each word according to its contextual information in the sentence, using the parts of speech in the POS tag set.
Preferably, step S3 specifically comprises the following sub-steps:
S3.1, for each word in the target text, generating the word-POS pair formed by the word and its tagged part of speech, and building the POS association weight matrix M from these pairs, wherein the row and column dimensions of M equal the number of POS categories in the tag set and the element of M indexed by a row and a column is the co-occurrence probability of the row's POS and the column's POS;
S3.2, modeling, for each POS pair, the relative position i of the corresponding word pair, and building the position-specific POS association weight matrices M'_i, wherein the dimensions of M'_i equal those of M and the element of M'_i indexed by a row and a column is the co-occurrence probability of the row's POS and the column's POS at relative position i.
Preferably, step S4 specifically comprises the following sub-steps:
S4.1, building the initial objective function L = Σ_{w∈C} Σ_{u∈Context(w)} log p(u|w), wherein C denotes the vocabulary of the whole training corpus, Context(w) denotes the set of context words formed by the c words on each side of the target word w, and c denotes the window size;
S4.2, fusing the modeled matrices M and M'_i into the skip-gram term vector model based on negative sampling to build the target model, and building the new objective function of the target model from the initial objective function: L = Σ_{w∈C} Σ_{w̃∈Context(w)} Σ_{u∈{w}∪NEG(w)} M'_i(T_u, T_w̃)·{L^w(u)·log σ(v(w̃)^T θ^u) + [1 − L^w(u)]·log σ(−v(w̃)^T θ^u)}, wherein NEG(w) is the set of negative samples drawn for the target word w; L^w(u) is the label of sample u, 1 for a positive sample and 0 for a negative sample; θ^u is the auxiliary vector used for the sample word u during model training; v(w̃)^T is the transpose of the term vector v(w̃) of the context word w̃; and M'_i(T_u, T_w̃) is the co-occurrence probability of the two parts of speech T_u and T_w̃ when their relative position is i;
S4.3, optimizing the new objective function so that its value is maximized, performing gradient computation on and updating the parameters θ^u, v(w̃) and M'_i(T_u, T_w̃), and obtaining the target term vectors once the traversal of the whole training corpus is complete.
According to another aspect of the present invention, there is provided a term vector training system fusing part of speech and position information, comprising:
a preprocessing module, for preprocessing the original text to obtain a target text;
a POS tagging module, for performing POS tagging on the words in the target text according to their contextual information, using the parts of speech in the POS tag set;
a position-POS fusion module, for modeling the tagged POS information to build a POS association weight matrix M, and modeling, for each POS pair, the relative position i of the corresponding word pair to build position-specific POS association weight matrices M'_i, wherein the row and column dimensions of M equal the number of POS categories in the tag set, the element of M indexed by a row and a column is the co-occurrence probability of the row's POS and the column's POS, the dimensions of M'_i equal those of M, and the element of M'_i indexed by a row and a column is the co-occurrence probability of the row's POS and the column's POS at relative position i;
a term vector learning module, for fusing the modeled matrices M and M'_i into a skip-gram term vector model to build a target model, and performing term vector learning with the target model to obtain target term vectors, wherein the target term vectors are used for word-analogy and word-similarity tasks.
Preferably, the POS tagging module comprises:
a word segmentation sub-module, for segmenting the target text into words, so as to distinguish all the words in the target text;
a POS tagging sub-module, for performing, for each sentence in the target text, POS tagging on each word according to its contextual information in the sentence, using the parts of speech in the POS tag set.
Preferably, the position-POS fusion module comprises:
a POS information modeling module, for generating, for each word in the target text, the word-POS pair formed by the word and its tagged part of speech, and building the POS association weight matrix M from these pairs, wherein the row and column dimensions of M equal the number of POS categories in the tag set and the element of M indexed by a row and a column is the co-occurrence probability of the row's POS and the column's POS;
a position information modeling module, for modeling, for each POS pair, the relative position i of the corresponding word pair, and building the position-specific POS association weight matrices M'_i, wherein the dimensions of M'_i equal those of M and the element of M'_i indexed by a row and a column is the co-occurrence probability of the row's POS and the column's POS at relative position i.
Preferably, the term vector learning module comprises:
an initial objective function building module, for building the initial objective function L = Σ_{w∈C} Σ_{u∈Context(w)} log p(u|w), wherein C denotes the vocabulary of the whole training corpus, Context(w) denotes the set of context words formed by the c words on each side of the target word w, and c denotes the window size;
a new objective function building module, for fusing the modeled matrices M and M'_i into the skip-gram term vector model based on negative sampling to build the target model, and building the new objective function of the target model from the initial objective function: L = Σ_{w∈C} Σ_{w̃∈Context(w)} Σ_{u∈{w}∪NEG(w)} M'_i(T_u, T_w̃)·{L^w(u)·log σ(v(w̃)^T θ^u) + [1 − L^w(u)]·log σ(−v(w̃)^T θ^u)}, wherein NEG(w) is the set of negative samples drawn for the target word w; L^w(u) is the label of sample u, 1 for a positive sample and 0 for a negative sample; θ^u is the auxiliary vector used for the sample word u during model training; v(w̃)^T is the transpose of the term vector v(w̃) of the context word w̃; and M'_i(T_u, T_w̃) is the co-occurrence probability of the two parts of speech T_u and T_w̃ when their relative position is i;
a term vector learning sub-module, for optimizing the new objective function so that its value is maximized, performing gradient computation on and updating the parameters θ^u, v(w̃) and M'_i(T_u, T_w̃), and obtaining the target term vectors once the traversal of the whole training corpus is complete.
In general, compared with the prior art, the method of the present invention can achieve the following beneficial effects:
(1) By building association matrices based on the POS association relations and the position association relations, the POS and position information between words can be modeled well.
(2) By fusing the modeled association matrices based on POS and position information into the skip-gram term vector learning model based on negative sampling, better term vector results can be obtained on the one hand, and on the other hand the association weights between parts of speech in the corpus used for model training can also be obtained.
(3) Because the model adopts the negative-sampling optimization strategy, its training speed is also quite fast.
Brief description of the drawings
Fig. 1 is a schematic flow chart of a term vector training method fusing part of speech and position information disclosed in an embodiment of the present invention;
Fig. 2 is a modeling diagram of part-of-speech and position information disclosed in an embodiment of the present invention;
Fig. 3 is a simplified overall flow chart disclosed in an embodiment of the present invention;
Fig. 4 is a schematic flow chart of another term vector training method fusing part of speech and position information disclosed in an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only serve to explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments of the invention described below can be combined with one another as long as they do not conflict.
Because existing term vector learning methods ignore parts of speech and their importance in natural language, the present invention provides a term vector learning method fusing part of speech and position information. On the basis of the original skip-gram model, the method considers the POS association relations and position relations between words, so that the model can train term vectors that fuse more information, and the learned term vectors better complete word-analogy and word-similarity tasks.
Fig. 1 is a schematic flow chart of a term vector learning method fusing part of speech and position information disclosed in an embodiment of the present invention; the method shown in Fig. 1 comprises the following steps:
S1, preprocessing the original text to obtain a target text;
The acquired original text contains a large amount of useless information such as XML tags, web page links, image links and symbols like "[", "@", "&" and "#". This useless information is not only unhelpful for term vector training but may even become noise data that disturbs the learning of the term vectors, so it must be filtered out; a Perl script, for example, can be used to do so.
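The filtering described above can be sketched in a few lines. The patent mentions Perl scripts, so the following Python equivalent is purely illustrative; the tag, link and symbol patterns are assumptions, not the inventors' actual rules:

```python
import re

def clean_text(raw: str) -> str:
    """Remove XML/HTML tags, web and image links, and noise symbols."""
    text = re.sub(r"<[^>]+>", " ", raw)        # XML/HTML tags
    text = re.sub(r"https?://\S+", " ", text)  # web page / image links
    text = re.sub(r"[\[\]@&#]", " ", text)     # symbols like [, ], @, &, #
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace

print(clean_text("<doc>Apple announced http://apple.com today # news</doc>"))
# -> Apple announced today news
```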
S2, performing POS tagging on the words in the target text according to their contextual information, using the parts of speech in the POS tag set;
Because the method proposed in the present invention uses the POS information of words, a POS tagging tool must be used to tag the text. A word can have multiple parts of speech depending on the context in which it occurs; to resolve this, the text is tagged in advance, and the tag of each word is determined from its contextual information. Step S2 specifically comprises the following sub-steps:
S2.1, segmenting the target text into words, so as to distinguish all the words in the target text;
For example, the text can be segmented with the tokenize tool in openNLP. In "I buy an apple.", without segmentation the common word "apple" would become the non-existent token "apple.", which would disturb the learning of the term vectors.
S2.2, for each sentence in the target text, performing POS tagging on each word according to its contextual information in the sentence, using the parts of speech in the POS tag set.
Here a whole sentence is tagged at once, so the multiple parts of speech that the same word can take in different contexts can be distinguished. The parts of speech assigned to the words belong to the Penn Treebank POS tag set.
Such as " i love you. " and " two after she give her son too much love. " progress word marks Sentence just turns into:
I_PRP (pronoun) love_VBP (verb) you_PRP (pronoun) ._.;
She_PRP (pronoun) give_VB (verb) her_PRP $ (pronoun) son_NN (noun) too_RB (adverbial word) much_ JJ (adjective) love_NN (noun) ._..
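The later steps consume such annotations as word-tag pairs. A minimal parser for the word_TAG format shown above (the function name is ours, not from the patent):

```python
def parse_tagged(sentence: str):
    """Split a 'word_TAG' annotated sentence (Penn Treebank tags, as in the
    example above) into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("_")   # split on the LAST underscore
        pairs.append((word, tag))
    return pairs

print(parse_tagged("she_PRP give_VB her_PRP$ son_NN ._."))
# -> [('she', 'PRP'), ('give', 'VB'), ('her', 'PRP$'), ('son', 'NN'), ('.', '.')]
```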
S3, modeling the tagged POS information to build a POS association weight matrix M, and modeling, for each POS pair, the relative position i of the corresponding word pair to build position-specific POS association weight matrices M'_i, wherein the row and column dimensions of M equal the number of POS categories in the tag set, the element of M indexed by a row and a column is the co-occurrence probability of the row's POS and the column's POS, the dimensions of M'_i equal those of M, and the element of M'_i indexed by a row and a column is the co-occurrence probability of the row's POS and the column's POS at relative position i. Fig. 2 is a modeling diagram of part-of-speech and position information disclosed in an embodiment of the present invention, in which T_0 to T_N in the rows and columns denote parts of speech, and M'_i(T_t, T_{t-2}) denotes the co-occurrence probability of POS T_t and POS T_{t-2} at relative position i.
After the parts of speech of the words are obtained, the POS information must first be modeled so that it can participate in the term vector learning model and the new model can be solved. The goal of the modeling is to build a POS association matrix whose row and column dimensions both equal the number of POS categories in the tag set and whose elements are the probabilities that two parts of speech co-occur. Besides this, the position relation must also be modeled, because the relative position of two co-occurring parts of speech is also very important. Step S3 specifically comprises the following sub-steps:
S3.1, for each word in the target text, generating the word-POS pair formed by the word and its tagged part of speech, and building the POS association weight matrix M from these pairs, wherein the row and column dimensions of M equal the number of POS categories in the tag set and the element of M indexed by a row and a column is the co-occurrence probability of the row's POS and the column's POS;
For example, for the word "son" in "she give her son too much love.", its POS is NN and the POS of the word "her" is PRP; the element of the matrix at the row corresponding to PRP and the column corresponding to NN is then the co-occurrence probability (i.e. the weight) of the two parts of speech.
S3.2, modeling, for each POS pair, the relative position i of the corresponding word pair, and building the position-specific POS association weight matrices M'_i, wherein the dimensions of M'_i equal those of M and the element of M'_i indexed by a row and a column is the co-occurrence probability (weight) of the row's POS and the column's POS at relative position i.
For example, if the window size is 2c, then i ∈ [−c, c]. When the window size is 6, the six matrices M'_{-3}, M'_{-2}, M'_{-1}, M'_{1}, M'_{2}, M'_{3} are built.
For example, for "son" and "her" in "she give her son too much love.", when "son" is the target word, the POS-and-position association weight for the two words is M'_{-1}(PRP, NN).
S4, fusing the modeled matrices M and M'_i into a skip-gram term vector model to build a target model, and performing term vector learning with the target model to obtain target term vectors, wherein the target term vectors are used for word-analogy and word-similarity tasks.
Step S4 specifically comprises the following sub-steps:
S4.1, building the initial objective function L = Σ_{w∈C} Σ_{u∈Context(w)} log p(u|w), wherein C denotes the vocabulary of the whole training corpus, Context(w) denotes the set of context words formed by the c words on each side of the target word w, and c denotes the window size;
As in the skip-gram model, the target word w_t is used to predict each word w_{t+i} in its context, where i denotes the relative position between w_{t+i} and w_t. Take a sample (Context(w_t), w_t) as an example, where |Context(w_t)| = 2c and Context(w_t) consists of the c words on each side of w_t. The final optimization target of the target model is still, over the whole training corpus, to maximize the probability of predicting the context words from the target word, namely to maximize the initial objective function.
For example, in the sample "she give her son too much love.", with "son" as the target word w_t and c = 3, Context(w_t) = {she, give, her, too, much, love}.
S4.2, fusing the modeled matrices M and M'_i into the skip-gram term vector model based on negative sampling to build the target model, and building the new objective function of the target model from the initial objective function: L = Σ_{w∈C} Σ_{w̃∈Context(w)} Σ_{u∈{w}∪NEG(w)} M'_i(T_u, T_w̃)·{L^w(u)·log σ(v(w̃)^T θ^u) + [1 − L^w(u)]·log σ(−v(w̃)^T θ^u)}, wherein NEG(w) is the set of negative samples drawn for the target word w; L^w(u) is the label of sample u, 1 for a positive sample and 0 for a negative sample; θ^u is the auxiliary vector used for the sample word u during model training; v(w̃)^T is the transpose of the term vector v(w̃) of the context word w̃; and M'_i(T_u, T_w̃) is the co-occurrence probability of the two parts of speech T_u and T_w̃ when their relative position is i;
For example, in the sample "she give her son too much love.", the word "son" is a positive sample with label 1; any other word, such as "dog" or "flower", is a negative sample with label 0.
Fig. 3 is a simplified overall flow chart disclosed in an embodiment of the present invention. The built target model has three layers: an input layer, a projection layer and an output layer. Wherein:
the input of the input layer is the center word w(t), and the output is the term vector corresponding to the center word w(t);
the projection layer mainly projects the output of the input layer; in this model, both the input and the output of the projection layer are the term vector of the center word w(t);
the output layer mainly uses the center word w(t) to predict the term vectors of the context words such as w(t-2), w(t-1), w(t+1) and w(t+2).
The main intent of the present invention is to consider the POS and position relations between the center word and its context words while using the center word w(t) to predict its context words.
S4.3, optimizing the new objective function so that its value is maximized, performing gradient computation on and updating the parameters θ^u, v(w̃) and M'_i(T_u, T_w̃), and obtaining the target term vectors once the traversal of the whole training corpus is complete.
For example, stochastic gradient ascent (SGA) can be used to optimize the new objective function so that its value is maximized, with gradient computation and updates applied to the parameters θ^u, v(w̃) and M'_i(T_u, T_w̃); the target term vectors are then obtained once the whole training corpus has been traversed.
Optionally, the updates and gradient computation can be performed as follows to obtain the target term vectors:
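Under the standard stochastic-gradient-ascent derivation for negative sampling, with the position-POS weight M'_i treated as a per-sample coefficient and η as the learning rate, the updates would take roughly the following form (a reconstruction under those assumptions, not the patent's verbatim equations):

```latex
% For each pair (target w, context word \tilde{w}) and sample u \in \{w\} \cup NEG(w):
e = L^{w}(u) - \sigma\!\big(v(\tilde{w})^{\top}\theta^{u}\big) \quad \text{(prediction error)}
\theta^{u} \leftarrow \theta^{u} + \eta \, e \, M'_{i}(T_{u}, T_{\tilde{w}}) \, v(\tilde{w})
v(\tilde{w}) \leftarrow v(\tilde{w}) + \eta \sum_{u \in \{w\} \cup NEG(w)} e \, M'_{i}(T_{u}, T_{\tilde{w}}) \, \theta^{u}
M'_{i}(T_{u}, T_{\tilde{w}}) \leftarrow M'_{i}(T_{u}, T_{\tilde{w}})
  + \eta \Big[ L^{w}(u) \log \sigma\!\big(v(\tilde{w})^{\top}\theta^{u}\big)
  + \big(1 - L^{w}(u)\big) \log \sigma\!\big(-v(\tilde{w})^{\top}\theta^{u}\big) \Big]
```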
Fig. 4 is a schematic flow chart of another term vector training method fusing part of speech and position information provided in an embodiment of the present invention. The method shown in Fig. 4 comprises five steps: data preprocessing, segmentation and POS tagging, POS and position information modeling, term vector training, and task evaluation. The data preprocessing, segmentation and POS tagging, POS and position information modeling, and term vector training steps are as described in Embodiment 1. For task evaluation, once the target term vectors fusing POS and position information have been learned, they can be used in tasks such as word analogy and word similarity, mainly comprising the following two steps:
performing the word-analogy task with the learned target term vectors. For example, for the two word pairs <king, queen> and <man, woman>, computing with the corresponding term vectors reveals a relation of the form v(king) − v(queen) = v(man) − v(woman);
performing the word-similarity task with the learned target term vectors. For example, given a word such as "dog", computing the cosine or Euclidean distance between the other words and "dog" yields the top N words most closely related to "dog", such as "puppy" and "cat".
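Both evaluations reduce to vector arithmetic plus cosine similarity. A self-contained sketch with toy vectors (the vectors below are illustrative, not trained output of the model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def analogy(vecs, a, b, c):
    """Find the word whose vector is closest to v(a) - v(b) + v(c),
    excluding the three query words."""
    target = [x - y + z for x, y, z in zip(vecs[a], vecs[b], vecs[c])]
    cands = {w: cosine(target, v) for w, v in vecs.items() if w not in (a, b, c)}
    return max(cands, key=cands.get)

vecs = {
    "king":  (0.9, 0.8, 0.1),
    "queen": (0.9, 0.1, 0.8),
    "man":   (0.1, 0.9, 0.1),
    "woman": (0.1, 0.2, 0.8),
}
print(analogy(vecs, "king", "queen", "woman"))  # v(king)-v(queen)+v(woman) ~ v(man)
# -> man
```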
It will be readily understood by those skilled in the art that the foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit it; any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention shall all be included within its scope of protection.

Claims (8)

1. A word vector training method fusing part-of-speech and position information, characterized by comprising the following steps:
S1, preprocessing an original text to obtain a target text;
S2, performing part-of-speech tagging on the words in the target text according to the contextual information of each word, using the parts of speech in a part-of-speech tag set;
S3, modeling the tagged part-of-speech information to build a part-of-speech association weight matrix M, and modeling the relative position i of the word pairs corresponding to each part-of-speech pair to build a position-specific part-of-speech association weight matrix M'_i for each position, wherein the row and column dimensions of the matrix M equal the number of part-of-speech categories in the tag set, each element of M is the co-occurrence probability of the part of speech corresponding to that element's row and the part of speech corresponding to that element's column, the row and column dimensions of the matrix M'_i are identical to those of M, and each element of M'_i is the co-occurrence probability of the part of speech corresponding to that element's row and the part of speech corresponding to that element's column at relative position i;
S4, fusing the modeled matrices M and M'_i into a skip-gram word vector model to build a target model, and performing word vector learning with the target model to obtain target word vectors, wherein the target word vectors are used for word analogy tasks and word similarity tasks.
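One plausible way to estimate a part-of-speech association weight matrix M of the kind required by step S3 is to count how often two parts of speech co-occur within a context window and normalise the counts into probabilities. The tag set, window size, and normalisation below are illustrative assumptions, not the patent's exact procedure.

```python
def build_pos_matrix(tagged_sentences, tagset, window=2):
    """Build a part-of-speech association weight matrix M whose (r, c) entry
    estimates the co-occurrence probability of POS r and POS c within `window`
    words of each other. `tagged_sentences` is a list of sentences, each a
    list of (word, pos) pairs."""
    index = {pos: k for k, pos in enumerate(tagset)}
    counts = [[0] * len(tagset) for _ in tagset]
    total = 0
    for sent in tagged_sentences:
        for j, (_, pos_w) in enumerate(sent):
            for k in range(max(0, j - window), min(len(sent), j + window + 1)):
                if k == j:
                    continue
                _, pos_c = sent[k]
                counts[index[pos_w]][index[pos_c]] += 1
                total += 1
    # Normalise raw counts into co-occurrence probabilities.
    return [[c / total if total else 0.0 for c in row] for row in counts]

# Toy tagged corpus: one sentence, Penn-style tags assumed for illustration.
tagged = [[("the", "DT"), ("dog", "NN"), ("barks", "VB")]]
M = build_pos_matrix(tagged, ["DT", "NN", "VB"], window=1)
# M[DT][NN] == 0.25: (DT, NN) is one of four ordered POS pairs in the window.
```

The matrix is square with one row and one column per tag in the tag set, matching the dimension constraint stated in the claim.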
2. The method according to claim 1, wherein step S2 comprises the following sub-steps:
S2.1, segmenting the target text into words, so as to distinguish all the words in the target text;
S2.2, for each sentence in the target text, performing part-of-speech tagging on each word using the parts of speech in the tag set, according to the contextual information of the word within the sentence.
3. The method according to claim 1 or 2, wherein step S3 comprises the following sub-steps:
S3.1, for each word in the target text, generating a word-part-of-speech pair consisting of the word and its corresponding part of speech, and building the part-of-speech association weight matrix M from the word-part-of-speech pairs, wherein the row and column dimensions of M equal the number of part-of-speech categories in the tag set, and each element of M is the co-occurrence probability of the part of speech corresponding to that element's row and the part of speech corresponding to that element's column;
S3.2, modeling the relative position i of the word pairs corresponding to each part-of-speech pair, and building the position-specific part-of-speech association weight matrix M'_i, wherein the row and column dimensions of M'_i are identical to those of M, and each element of M'_i is the co-occurrence probability of the part of speech corresponding to that element's row and the part of speech corresponding to that element's column at relative position i.
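Sub-step S3.2 can be pictured as building one part-of-speech co-occurrence matrix per signed relative position i within the window. The sketch below is a hypothetical implementation under that reading; the tag set, window size, and per-position normalisation are illustrative assumptions.

```python
def build_position_pos_matrices(tagged_sentences, tagset, window=2):
    """Build one matrix M'_i per relative position i in [-window, window], i != 0.
    M'_i[r][c] estimates the probability that POS c occurs at offset i from POS r."""
    index = {pos: k for k, pos in enumerate(tagset)}
    n = len(tagset)
    mats = {i: [[0] * n for _ in range(n)]
            for i in range(-window, window + 1) if i != 0}
    totals = {i: 0 for i in mats}
    for sent in tagged_sentences:
        for j, (_, pos_w) in enumerate(sent):
            for i in mats:
                k = j + i
                if 0 <= k < len(sent):
                    _, pos_c = sent[k]
                    mats[i][index[pos_w]][index[pos_c]] += 1
                    totals[i] += 1
    # Normalise each position's counts into co-occurrence probabilities.
    return {i: [[c / totals[i] if totals[i] else 0.0 for c in row] for row in m]
            for i, m in mats.items()}

tagged = [[("the", "DT"), ("dog", "NN"), ("barks", "VB")]]
mats = build_position_pos_matrices(tagged, ["DT", "NN", "VB"], window=1)
```

Each M'_i has the same dimensions as M, one row and column per tag, as the claim requires.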
4. The method according to claim 3, wherein step S4 comprises the following sub-steps:
S4.1, building the initial objective function
L = Σ_{w∈C} Σ_{u∈Context(w)} log p(u|w),
where C denotes the vocabulary of the entire training corpus, Context(w) denotes the set of context words consisting of the c words before and after the target word w, and c denotes the window size;
S4.2, fusing the modeled matrices M and M'_i into a negative-sampling-based skip-gram word vector model to build the target model, and building the new objective function of the target model from the initial objective function:
L = Σ_{w∈C} Σ_{w̃∈Context(w)} Σ_{u∈{w}∪NEG(w)} M'_i(T_u, T_w̃) · { L^w(u) · log σ(v(w̃)^T θ^u) + [1 − L^w(u)] · log(1 − σ(v(w̃)^T θ^u)) },
where NEG(w) is the set of negative samples drawn for the target word w, L^w(u) is the label of sample u (1 for a positive sample, 0 for a negative sample), θ^u is the auxiliary vector of the sample word used during model training, v(w̃)^T is the transpose of the word vector v(w̃) of the context word w̃, and M'_i(T_u, T_w̃) is the co-occurrence probability of the two parts of speech T_u and T_w̃ when their relative position is i;
S4.3, optimizing the new objective function so as to maximize its value, performing gradient computation and updates on the parameters θ^u and v(w̃), and obtaining the target word vectors after the entire training corpus has been traversed.
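Sub-steps S4.2 and S4.3 follow the usual negative-sampling update scheme for skip-gram models. The sketch below shows one such gradient step for a single context word, with a scalar `weight` standing in for the part-of-speech/position factor M'_i(T_u, T_w̃). The variable names, learning rate, and placement of the weight are illustrative assumptions, not the patent's exact formulation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgns_update(v_ctx, samples, lr=0.025, weight=1.0):
    """One negative-sampling update for a context word vector v_ctx (in place).
    samples: list of (theta_u, label) pairs, where label is 1 for the positive
    target word and 0 for each negative sample; theta_u is that sample word's
    auxiliary vector. `weight` stands in for the M'_i part-of-speech factor."""
    e = [0.0] * len(v_ctx)
    for theta, label in samples:
        q = sigmoid(sum(a * b for a, b in zip(v_ctx, theta)))
        g = lr * weight * (label - q)        # scaled gradient of the log-loss
        for d in range(len(v_ctx)):
            e[d] += g * theta[d]             # accumulate the update for v_ctx
            theta[d] += g * v_ctx[d]         # update auxiliary vector theta_u
    for d in range(len(v_ctx)):              # apply accumulated update to v_ctx
        v_ctx[d] += e[d]
    return v_ctx
```

A positive sample pushes the dot product v(w̃)·θ^u upward (toward σ ≈ 1) and a negative sample pushes it downward, which is exactly the maximisation of the bracketed term in the objective; looping this update over every (target, context) pair in the corpus corresponds to sub-step S4.3.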
5. A word vector training system fusing part-of-speech and position information, characterized by comprising:
a preprocessing module, for preprocessing an original text to obtain a target text;
a part-of-speech tagging module, for performing part-of-speech tagging on the words in the target text according to the contextual information of each word, using the parts of speech in a part-of-speech tag set;
a position part-of-speech fusion module, for modeling the tagged part-of-speech information to build a part-of-speech association weight matrix M, and for modeling the relative position i of the word pairs corresponding to each part-of-speech pair to build a position-specific part-of-speech association weight matrix M'_i, wherein the row and column dimensions of the matrix M equal the number of part-of-speech categories in the tag set, each element of M is the co-occurrence probability of the part of speech corresponding to that element's row and the part of speech corresponding to that element's column, the row and column dimensions of M'_i are identical to those of M, and each element of M'_i is the co-occurrence probability of the part of speech corresponding to that element's row and the part of speech corresponding to that element's column at relative position i;
a word vector learning module, for fusing the modeled matrices M and M'_i into a skip-gram word vector model to build a target model, and for performing word vector learning with the target model to obtain target word vectors, wherein the target word vectors are used for word analogy tasks and word similarity tasks.
6. The system according to claim 5, wherein the part-of-speech tagging module comprises:
a word segmentation module, for segmenting the target text into words, so as to distinguish all the words in the target text;
a part-of-speech tagging submodule, for performing, for each sentence in the target text, part-of-speech tagging on each word using the parts of speech in the tag set, according to the contextual information of the word within the sentence.
7. The system according to claim 5 or 6, wherein the position part-of-speech fusion module comprises:
a part-of-speech information modeling module, for generating, for each word in the target text, a word-part-of-speech pair consisting of the word and its corresponding part of speech, and for building the part-of-speech association weight matrix M from the word-part-of-speech pairs, wherein the row and column dimensions of M equal the number of part-of-speech categories in the tag set, and each element of M is the co-occurrence probability of the part of speech corresponding to that element's row and the part of speech corresponding to that element's column;
a position information modeling module, for modeling the relative position i of the word pairs corresponding to each part-of-speech pair and building the position-specific part-of-speech association weight matrix M'_i, wherein the row and column dimensions of M'_i are identical to those of M, and each element of M'_i is the co-occurrence probability of the part of speech corresponding to that element's row and the part of speech corresponding to that element's column at relative position i.
8. The system according to claim 7, wherein the word vector learning module comprises:
an initial objective function building module, for building the initial objective function
L = Σ_{w∈C} Σ_{u∈Context(w)} log p(u|w),
where C denotes the vocabulary of the entire training corpus, Context(w) denotes the set of context words consisting of the c words before and after the target word w, and c denotes the window size;
a new objective function building module, for fusing the modeled matrices M and M'_i into a negative-sampling-based skip-gram word vector model to build the target model, and for building the new objective function of the target model from the initial objective function:
L = Σ_{w∈C} Σ_{w̃∈Context(w)} Σ_{u∈{w}∪NEG(w)} M'_i(T_u, T_w̃) · { L^w(u) · log σ(v(w̃)^T θ^u) + [1 − L^w(u)] · log(1 − σ(v(w̃)^T θ^u)) },
where NEG(w) is the set of negative samples drawn for the target word w, L^w(u) is the label of sample u (1 for a positive sample, 0 for a negative sample), θ^u is the auxiliary vector of the sample word used during model training, v(w̃)^T is the transpose of the word vector v(w̃) of the context word w̃, and M'_i(T_u, T_w̃) is the co-occurrence probability of the two parts of speech T_u and T_w̃ when their relative position is i;
a word vector learning submodule, for optimizing the new objective function so as to maximize its value, performing gradient computation and updates on the parameters θ^u and v(w̃), and obtaining the target word vectors after the entire training corpus has been traversed.
CN201710384135.6A 2017-05-26 2017-05-26 A word vector training method and system fusing part-of-speech and position information Active CN107239444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710384135.6A CN107239444B (en) A word vector training method and system fusing part-of-speech and position information


Publications (2)

Publication Number Publication Date
CN107239444A true CN107239444A (en) 2017-10-10
CN107239444B CN107239444B (en) 2019-10-08

Family

ID=59985183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710384135.6A Active CN107239444B (en) A word vector training method and system fusing part-of-speech and position information

Country Status (1)

Country Link
CN (1) CN107239444B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101866337A (en) * 2009-04-14 2010-10-20 日电(中国)有限公司 Part-or-speech tagging system, and device and method thereof for training part-or-speech tagging model
CN105243129A (en) * 2015-09-30 2016-01-13 清华大学深圳研究生院 Commodity property characteristic word clustering method
CN106649275A (en) * 2016-12-28 2017-05-10 成都数联铭品科技有限公司 Relation extraction method based on part-of-speech information and convolutional neural network


Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108305612A (en) * 2017-11-21 2018-07-20 腾讯科技(深圳)有限公司 Text-processing, model training method, device, storage medium and computer equipment
CN108305612B (en) * 2017-11-21 2020-07-31 腾讯科技(深圳)有限公司 Text processing method, text processing device, model training method, model training device, storage medium and computer equipment
CN108229818B (en) * 2017-12-29 2021-07-13 网智天元科技集团股份有限公司 Method and device for establishing talent value measuring and calculating coordinate system
CN108229818A (en) * 2017-12-29 2018-06-29 网智天元科技集团股份有限公司 Talent Value calculates the method for building up and device of coordinate system
CN110348001A (en) * 2018-04-04 2019-10-18 腾讯科技(深圳)有限公司 A kind of term vector training method and server
CN110348001B (en) * 2018-04-04 2022-11-25 腾讯科技(深圳)有限公司 Word vector training method and server
CN108628834A (en) * 2018-05-14 2018-10-09 国家计算机网络与信息安全管理中心 A kind of word lists dendrography learning method based on syntax dependence
CN108628834B (en) * 2018-05-14 2022-04-15 国家计算机网络与信息安全管理中心 Word expression learning method based on syntactic dependency relationship
CN108733653A (en) * 2018-05-18 2018-11-02 华中科技大学 A kind of sentiment analysis method of the Skip-gram models based on fusion part of speech and semantic information
CN108733653B (en) * 2018-05-18 2020-07-10 华中科技大学 Sentiment analysis method of Skip-gram model based on fusion of part-of-speech and semantic information
CN108875810A (en) * 2018-06-01 2018-11-23 阿里巴巴集团控股有限公司 The method and device of negative example sampling is carried out from word frequency list for training corpus
CN109308353A (en) * 2018-09-17 2019-02-05 北京神州泰岳软件股份有限公司 The training method and device of word incorporation model
CN109308353B (en) * 2018-09-17 2023-08-15 鼎富智能科技有限公司 Training method and device for word embedding model
CN109190126A (en) * 2018-09-17 2019-01-11 北京神州泰岳软件股份有限公司 The training method and device of word incorporation model
CN109271636B (en) * 2018-09-17 2023-08-11 鼎富智能科技有限公司 Training method and device for word embedding model
CN109271636A (en) * 2018-09-17 2019-01-25 北京神州泰岳软件股份有限公司 The training method and device of word incorporation model
CN109190126B (en) * 2018-09-17 2023-08-15 北京神州泰岳软件股份有限公司 Training method and device for word embedding model
CN109271422A (en) * 2018-09-20 2019-01-25 华中科技大学 A kind of social networks subject matter expert's lookup method driven by not firm information
CN109344403A (en) * 2018-09-20 2019-02-15 中南大学 A kind of document representation method of enhancing semantic feature insertion
CN109271422B (en) * 2018-09-20 2021-10-08 华中科技大学 Social network subject matter expert searching method driven by unreal information
CN109344403B (en) * 2018-09-20 2020-11-06 中南大学 Text representation method for enhancing semantic feature embedding
CN109325231A (en) * 2018-09-21 2019-02-12 中山大学 A kind of method that multi task model generates term vector
CN109639452A (en) * 2018-10-31 2019-04-16 深圳大学 Social modeling training method, device, server and storage medium
CN109858024A (en) * 2019-01-04 2019-06-07 中山大学 A kind of source of houses term vector training method and device based on word2vec
US20200265297A1 (en) * 2019-02-14 2020-08-20 Beijing Xiaomi Intelligent Technology Co., Ltd. Method and apparatus based on neural network modeland storage medium
CN109858031A (en) * 2019-02-14 2019-06-07 北京小米智能科技有限公司 Neural network model training, context-prediction method and device
EP3696710A1 (en) * 2019-02-14 2020-08-19 Beijing Xiaomi Intelligent Technology Co., Ltd. Method and apparatus based on neural network model and storage medium
CN109858031B (en) * 2019-02-14 2023-05-23 北京小米智能科技有限公司 Neural network model training and context prediction method and device
US11615294B2 (en) * 2019-02-14 2023-03-28 Beijing Xiaomi Intelligent Technology Co., Ltd. Method and apparatus based on position relation-based skip-gram model and storage medium
CN110276052A (en) * 2019-06-10 2019-09-24 北京科技大学 A kind of archaic Chinese automatic word segmentation and part-of-speech tagging integral method and device
CN110287236B (en) * 2019-06-25 2024-03-19 平安科技(深圳)有限公司 Data mining method, system and terminal equipment based on interview information
CN110287236A (en) * 2019-06-25 2019-09-27 平安科技(深圳)有限公司 A kind of data digging method based on interview information, system and terminal device
CN110534087B (en) * 2019-09-04 2022-02-15 清华大学深圳研究生院 Text prosody hierarchical structure prediction method, device, equipment and storage medium
CN110534087A (en) * 2019-09-04 2019-12-03 清华大学深圳研究生院 A kind of text prosody hierarchy Structure Prediction Methods, device, equipment and storage medium
CN111506726A (en) * 2020-03-18 2020-08-07 大箴(杭州)科技有限公司 Short text clustering method and device based on part-of-speech coding and computer equipment
CN111506726B (en) * 2020-03-18 2023-09-22 大箴(杭州)科技有限公司 Short text clustering method and device based on part-of-speech coding and computer equipment
CN111859910B (en) * 2020-07-15 2022-03-18 山西大学 Word feature representation method for semantic role recognition and fusing position information
CN111859910A (en) * 2020-07-15 2020-10-30 山西大学 Word feature representation method for semantic role recognition and fusing position information
CN111832282A (en) * 2020-07-16 2020-10-27 平安科技(深圳)有限公司 External knowledge fused BERT model fine adjustment method and device and computer equipment
CN113010670A (en) * 2021-02-22 2021-06-22 腾讯科技(深圳)有限公司 Account information clustering method, account information detection method, account information clustering device and account information detection device, and storage medium
CN113010670B (en) * 2021-02-22 2023-09-19 腾讯科技(深圳)有限公司 Account information clustering method, detection method, device and storage medium

Also Published As

Publication number Publication date
CN107239444B (en) 2019-10-08

Similar Documents

Publication Publication Date Title
CN107239444B (en) A word vector training method and system fusing part-of-speech and position information
Mathews et al. Semstyle: Learning to generate stylised image captions using unaligned text
CN107133211B (en) Composition scoring method based on attention mechanism
CN108519890A (en) A kind of robustness code abstraction generating method based on from attention mechanism
Zaman et al. HTSS: A novel hybrid text summarisation and simplification architecture
CN110750959A (en) Text information processing method, model training method and related device
CN108733653A (en) A kind of sentiment analysis method of the Skip-gram models based on fusion part of speech and semantic information
CN107729309A (en) A kind of method and device of the Chinese semantic analysis based on deep learning
CN110807328A (en) Named entity identification method and system oriented to multi-strategy fusion of legal documents
CN106980608A (en) A kind of Chinese electronic health record participle and name entity recognition method and system
CN112417880A (en) Court electronic file oriented case information automatic extraction method
CN108388560A (en) GRU-CRF meeting title recognition methods based on language model
CN106844345B (en) A kind of multitask segmenting method based on parameter linear restriction
Hu et al. PLANET: Dynamic content planning in autoregressive transformers for long-form text generation
CN111966820B (en) Method and system for constructing and extracting generative abstract model
CN111710428B (en) Biomedical text representation method for modeling global and local context interaction
CN111651973A (en) Text matching method based on syntax perception
CN114756681A (en) Evaluation text fine-grained suggestion mining method based on multi-attention fusion
Johns Distributional social semantics: Inferring word meanings from communication patterns
CN114254645A (en) Artificial intelligence auxiliary writing system
CN109033073A (en) Text contains recognition methods and device
CN112069827A (en) Data-to-text generation method based on fine-grained subject modeling
CN115906816A (en) Text emotion analysis method of two-channel Attention model based on Bert
Zhao Research and design of automatic scoring algorithm for English composition based on machine learning
Advaith et al. Parts of Speech Tagging for Kannada and Hindi Languages using ML and DL models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant