CN107894977A - Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary - Google Patents

Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary

Info

Publication number
CN107894977A
Authority
CN
China
Prior art keywords
speech
conversion
parts
vietnamese
dictionary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711056063.9A
Other languages
Chinese (zh)
Inventor
郭剑毅
赵晨
余正涛
王红斌
文永华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN201711056063.9A
Publication of CN107894977A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a Vietnamese part-of-speech tagging method that combines a multi-category word part-of-speech disambiguation model with a dictionary, and belongs to the technical field of natural language processing. Multi-category words are words that can take more than one part of speech. First, a single-category word dictionary and a multi-category word dictionary are compiled from a Vietnamese dictionary. Second, part-of-speech tagging features are selected according to the characteristics of Vietnamese, and a multi-category word part-of-speech disambiguation model is built. The multi-category words and the single-category words in the test corpus are then tagged separately, using the disambiguation model and the single-category word dictionary respectively. Finally, the two sets of tagging results are merged to obtain the final tagging result. The invention specifically accounts for the influence of multi-category words on part-of-speech tagging and effectively improves the accuracy of Vietnamese part-of-speech tagging.

Description

Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary
Technical field
The invention relates to a Vietnamese part-of-speech tagging method that combines a multi-category word part-of-speech disambiguation model with a dictionary, and belongs to the technical field of natural language processing.
Background
Part-of-speech tagging is a typical sequence labeling task in natural language processing: each word in a sentence is assigned the correct part-of-speech tag. It is widely used in many stages of natural language processing, such as chunk parsing, syntactic analysis, named entity recognition, noun phrase recognition, semantic analysis, and machine translation, where it plays an important role. Research on Vietnamese part-of-speech tagging supports subsequent Vietnamese language information processing work. It can be applied to Vietnamese machine translation, information retrieval, and speech recognition, and it is an indispensable basis for tools such as chunk recognizers and Vietnamese syntactic parsers. However, existing tagging methods have low accuracy and do not consider the influence of multi-category words. A Vietnamese part-of-speech tagging method that takes multi-category words into account is therefore needed.
Summary of the invention
The invention provides a Vietnamese part-of-speech tagging method that combines a multi-category word part-of-speech disambiguation model with a dictionary. It specifically accounts for the influence of multi-category words on part-of-speech tagging, effectively improves the accuracy of Vietnamese part-of-speech tagging, and addresses the relatively low accuracy of traditional tagging methods.
The technical solution of the invention is a Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary, with the following specific steps:
Step1: first, manually compile a Vietnamese dictionary;
Step2: next, derive a single-category word dictionary and a multi-category word dictionary from the manually compiled Vietnamese dictionary;
Step3: then, according to the linguistic characteristics of Vietnamese, select a Vietnamese part-of-speech tagging feature set and build the multi-category word part-of-speech disambiguation model;
Step4: then, using the disambiguation model and the single-category word dictionary respectively, automatically tag the multi-category words and the single-category words in a test corpus obtained from Vietnamese news websites;
Step5: finally, automatically merge the two sets of tagging results to obtain the final tagging result.
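As an illustration of Step2, the split of one dictionary into the two lookup tables might be sketched as follows. This is a minimal sketch, not the patent's implementation: it assumes the compiled dictionary is available as (word, tag) pairs, and the function name `split_dictionary`, the tag labels N/V/A, and the sample entries are hypothetical.

```python
from collections import defaultdict


def split_dictionary(entries):
    """Split (word, tag) dictionary entries into two lookup tables.

    Returns (multi, single): `multi` maps each word listed with more than
    one tag to its sorted candidate tags; `single` maps each word listed
    with exactly one tag to that tag.
    """
    tags_by_word = defaultdict(set)
    for word, tag in entries:
        tags_by_word[word].add(tag)
    multi = {w: sorted(t) for w, t in tags_by_word.items() if len(t) > 1}
    single = {w: next(iter(t)) for w, t in tags_by_word.items() if len(t) == 1}
    return multi, single


# Hypothetical entries; the tag labels are illustrative only.
entries = [("bàn", "N"), ("bàn", "V"), ("sách", "N"), ("đẹp", "A")]
multi, single = split_dictionary(entries)
```

Each word lands in exactly one of the two tables, so the tables are disjoint by construction, which is the property Step5 relies on.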
The specific steps of Step3 are:
Step3.1: first, crawl corpora of different types with a web crawler and preprocess them; preprocessing includes data denoising and word segmentation with a word segmentation tool;
Step3.2: next, write a program that matches the corpus against the Vietnamese dictionary and automatically identifies the set of multi-category words in the corpus;
Step3.3: then, select multi-category word features according to the characteristics of Vietnamese multi-category words, and incorporate the selected features into the training corpus;
Step3.4: finally, perform statistical computation with a maximum entropy model, combining the multi-category word features from Step3.3 with contextual features, to generate the Vietnamese multi-category word part-of-speech disambiguation model.
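To make Step3.4 concrete, a maximum entropy classifier over binary features can be sketched in plain Python as below. This is a hedged illustration only: the text does not say which toolkit or training algorithm was used, so batch gradient ascent is an assumed stand-in, and the names `train_maxent` and `predict_proba`, the feature strings, and the four training samples are all hypothetical.

```python
import math
from collections import defaultdict


def predict_proba(weights, tags, feats):
    """Return P(tag | feats) under a log-linear (maximum entropy) model."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in tags}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}


def train_maxent(samples, epochs=200, lr=0.5):
    """Fit weights by batch gradient ascent on the log-likelihood.

    samples: list of (feature_list, gold_tag) pairs.
    Returns (weights, tags) with weights keyed by (feature, tag).
    """
    tags = sorted({tag for _, tag in samples})
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in samples:
            probs = predict_proba(weights, tags, feats)
            for tag in tags:
                # observed count minus expected count for each (feature, tag)
                delta = (1.0 if tag == gold else 0.0) - probs[tag]
                for f in feats:
                    grad[(f, tag)] += delta
        for key, g in grad.items():
            weights[key] += lr * g / len(samples)
    return weights, tags


# Hypothetical training data: the same word with different left contexts.
samples = [
    (["w=bàn", "prev_pos=V"], "N"),
    (["w=bàn", "prev_pos=P"], "V"),
    (["w=bàn", "prev_pos=V"], "N"),
    (["w=bàn", "prev_pos=P"], "V"),
]
weights, tags = train_maxent(samples)
probs = predict_proba(weights, tags, ["w=bàn", "prev_pos=P"])
best = max(probs, key=probs.get)
```

The word feature alone cannot separate the two tags here, so the model has to rely on the context feature, which is the situation a multi-category word creates.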
The specific steps of Step3.1 are:
Step3.1.1: collect articles from Vietnamese news websites, including news, entertainment, and economics articles;
Step3.1.2: first clean the collected text, including tidying and noise removal, to form a sentence-level text corpus;
Step3.1.3: next, segment the sentence-level corpus with a Vietnamese word segmentation tool and have Vietnamese language experts proofread it manually, forming a sentence-level word-segmented corpus;
Step3.1.4: then manually annotate the segmented corpus with part-of-speech tags and chunk labels;
Step3.1.5: finally, obtain the multi-category word dictionary by compiling the Vietnamese dictionary; based on this dictionary, programmatically extract the Vietnamese multi-category word domain corpus from the part-of-speech-tagged corpus already built, for use in building the multi-category word part-of-speech disambiguation model.
In Step3.3, the features chosen for the multi-category word part-of-speech disambiguation model are mainly: the word and its word-context information; part-of-speech context information; the chunk and its chunk-context information; and the sentence constituent (grammatical role) of the word in the sentence.
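The four feature groups can be pictured as a small feature template applied at each token position. The sketch below is an assumption-laden illustration: the window size of one token on each side, the feature string format, the function name `extract_features`, and the tag, chunk, and role labels are all hypothetical, since the text names the feature groups but not their exact templates.

```python
def extract_features(words, pos_tags, chunks, roles, i):
    """Build feature strings for the token at position i.

    words, pos_tags, chunks, roles are parallel lists for one sentence.
    Neighbours outside the sentence are represented by "<PAD>".
    """
    def at(seq, j):
        return seq[j] if 0 <= j < len(seq) else "<PAD>"

    return [
        # word and word-context features
        f"w0={words[i]}",
        f"w-1={at(words, i - 1)}", f"w+1={at(words, i + 1)}",
        # part-of-speech context features (neighbours only)
        f"pos-1={at(pos_tags, i - 1)}", f"pos+1={at(pos_tags, i + 1)}",
        # chunk and chunk-context features
        f"chunk0={chunks[i]}",
        f"chunk-1={at(chunks, i - 1)}", f"chunk+1={at(chunks, i + 1)}",
        # sentence constituent of the word
        f"role={roles[i]}",
    ]


# Hypothetical annotated sentence.
words = ["họ", "bàn", "việc"]
pos_tags = ["P", "V", "N"]
chunks = ["NP", "VP", "NP"]
roles = ["SUBJ", "PRED", "OBJ"]
feats = extract_features(words, pos_tags, chunks, roles, 1)
```

The target token's own tag is deliberately left out of the features, because it is the label the model has to predict.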
The specific steps of Step4 are:
Step4.1: first, based on the Vietnamese multi-category word dictionary, extract the multi-category words and the single-category words from the test corpus to be tagged;
Step4.2: then disambiguate the multi-category words with the multi-category word part-of-speech disambiguation model, obtaining the tagging result for the disambiguated multi-category words;
Step4.3: finally, match the extracted single-category words against the single-category word part-of-speech dictionary, obtaining the tagging result for the single-category words.
In Step5, once the part-of-speech tags of the multi-category words and those of the single-category words have been obtained, the two are combined by direct replacement. Because the multi-category word dictionary and the single-category word dictionary are both derived from the same Vietnamese dictionary, direct replacement cannot cause conflicts.
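Step4 and Step5 together amount to a partition, two independent tagging passes, and a position-wise merge. The following sketch shows one way this could look. It is not the patent's code: `tag_sentence` is a hypothetical name, `disambiguate` is a stand-in callable for the trained model, and leaving out-of-dictionary words untagged (`None`) is an assumption, because the text does not say how such words are handled.

```python
def tag_sentence(words, multi_dict, single_dict, disambiguate):
    """Tag one segmented sentence and merge both results by position.

    multi_dict: word -> list of candidate tags (multi-category words).
    single_dict: word -> its only tag (single-category words).
    disambiguate: callable (words, index, candidates) -> chosen tag.
    """
    tags = [None] * len(words)
    # Step4.3: single-category words are tagged by dictionary lookup.
    for i, word in enumerate(words):
        if word in single_dict:
            tags[i] = single_dict[word]
    # Step4.2 and Step5: each multi-category position is filled in directly
    # with the model's choice; the two dictionaries share no words, so no
    # lookup result is ever overwritten.
    for i, word in enumerate(words):
        if word in multi_dict:
            tags[i] = disambiguate(words, i, multi_dict[word])
    return list(zip(words, tags))


# Hypothetical dictionaries and a trivial stand-in for the trained model.
multi_dict = {"bàn": ["N", "V"]}
single_dict = {"họ": "P", "việc": "N"}
pick_last = lambda words, i, candidates: candidates[-1]
result = tag_sentence(["họ", "bàn", "việc"], multi_dict, single_dict, pick_last)
```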
The beneficial effects of the invention are:
In Vietnamese part-of-speech tagging, the invention specifically accounts for the influence of multi-category words: the corpus is divided into multi-category words and single-category words, which are tagged separately, and a single-category word dictionary and a multi-category word dictionary are compiled from a Vietnamese dictionary. For single-category words, tagging by lookup in a part-of-speech dictionary can achieve close to 100% accuracy, which is much better than the results of statistical algorithms; it also avoids the tagging errors that can occur when a corpus is annotated by hand and reduces the annotation workload. For multi-category words, the invention draws on the linguistic characteristics of Vietnamese, builds a multi-category word corpus, and selects the multi-category word features described above, effectively improving the accuracy of Vietnamese part-of-speech tagging.
Brief description of the drawings
Fig. 1 is the overall flow chart of the invention;
Fig. 2 is the flow chart for building the multi-category word disambiguation model in the invention;
Fig. 3 shows the results of the ten-fold cross-validation experiment on four models in the embodiment of the invention;
Fig. 4 shows the results of the comparison experiment on three models in the embodiment of the invention.
Detailed description of the embodiments
Embodiment 1: As shown in Figs. 1-4, a Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary, with the following specific steps:
Step1: first, manually compile a Vietnamese dictionary; it can be built from entries crawled from the website http://vdict.com/, for example 30565 entries;
Step2: next, derive a single-category word dictionary and a multi-category word dictionary from the manually compiled Vietnamese dictionary; the resulting multi-category word dictionary contains 2659 entries;
A Vietnamese dictionary is chosen because it is comparatively comprehensive and has wide coverage: it covers the vast majority of real corpora, so the resulting multi-category and single-category word dictionaries also cover the multi-category and single-category words found in real corpora.
Step3: then, according to the linguistic characteristics of Vietnamese, select a Vietnamese part-of-speech tagging feature set and build the multi-category word part-of-speech disambiguation model;
Step4: then, using the disambiguation model and the single-category word dictionary respectively, automatically tag the multi-category words and the single-category words in a test corpus obtained from Vietnamese news websites;
Step5: finally, automatically merge the two sets of tagging results to obtain the final tagging result.
Further, the specific steps of Step3 are:
Step3.1: first, crawl corpora of different types with a web crawler and preprocess them; preprocessing includes data denoising and word segmentation with a word segmentation tool;
Step3.2: next, write a program that matches the corpus against the Vietnamese dictionary and automatically identifies the set of multi-category words in the corpus;
Step3.3: then, select multi-category word features according to the characteristics of Vietnamese multi-category words, and incorporate the selected features into the training corpus;
Step3.4: finally, perform statistical computation with a maximum entropy model, combining the multi-category word features from Step3.3 with contextual features, to generate the Vietnamese multi-category word part-of-speech disambiguation model.
Further, the specific steps of Step3.1 are:
Step3.1.1: collect articles from Vietnamese news websites, including news, entertainment, and economics articles;
Step3.1.2: first clean the collected text, including tidying and noise removal, to form a sentence-level text corpus;
Step3.1.3: next, segment the sentence-level corpus with a Vietnamese word segmentation tool and have Vietnamese language experts proofread it manually, forming a sentence-level word-segmented corpus;
Step3.1.4: then manually annotate the segmented corpus with part-of-speech tags and chunk labels;
Step3.1.5: finally, obtain the multi-category word dictionary by compiling the Vietnamese dictionary; based on this dictionary, programmatically extract the Vietnamese multi-category word domain corpus from the part-of-speech-tagged corpus already built, for use in building the multi-category word part-of-speech disambiguation model.
The Vietnamese multi-category word domain corpus is extracted from Vietnamese news websites because no such corpus is available elsewhere and no related data can be reused directly; the invention therefore builds it from Vietnamese news websites.
Further, in Step3.3, the features chosen for the multi-category word part-of-speech disambiguation model are mainly: the word and its word-context information; part-of-speech context information; the chunk and its chunk-context information; and the sentence constituent of the word in the sentence.
(1) Word and word-context information features (the word form carries rich morphological information);
The form of a word follows certain regularities with respect to the word's own part of speech and to its context. For example, some reduplicated forms behave predictably: AABB forms generally describe things or actions, ABAB forms generally denote repeated actions, and ABB forms generally express the state, quantity, or sound of things;
(2) Part-of-speech context information features (parts of speech reflect the modification relations between words);
The parts of speech of the words surrounding a word constrain that word's part of speech. For example, a sentence generally contains a verb and a noun; a pronoun is usually followed by a verb, adverb, or adjective, and a verb is usually followed by a noun or an adverb.
(3) Chunk and chunk-context information features (indicating the role the word plays in the sentence, its modification relations, and similar information);
These capture the part-of-speech regularities within a chunk and between chunks. For example, a noun chunk generally consists of an adjective and a noun, and the chunk preceding a noun chunk is usually a verb chunk.
(4) Sentence constituent features of the word in the sentence (subject, predicate, adverbial, etc.).
These capture the regularity between the constituent a word fills in the sentence and the word's part of speech. For example, the predicate is generally a verb, and the subject is generally a pronoun or a noun.
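The reduplication regularities mentioned under feature (1) could be turned into a word-shape feature along the following lines. This is only a sketch under stated assumptions: it treats each whitespace-separated syllable of a segmented word as one unit, and the function name `reduplication_pattern`, the return labels, and the example word are illustrative rather than taken from the patent.

```python
def reduplication_pattern(syllables):
    """Classify a word's syllable sequence as AABB, ABAB, ABB, or NONE."""
    s = syllables
    if len(s) == 4 and s[0] == s[1] and s[2] == s[3] and s[0] != s[2]:
        return "AABB"
    if len(s) == 4 and s[0] == s[2] and s[1] == s[3] and s[0] != s[1]:
        return "ABAB"
    if len(s) == 3 and s[1] == s[2] and s[0] != s[1]:
        return "ABB"
    return "NONE"


# Illustrative input: a four-syllable reduplicated form split on spaces.
pattern = reduplication_pattern("vui vui vẻ vẻ".split())
```

The returned label can be added to the feature list as one more string, for example `shape=AABB`, alongside the word and context features.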
Further, the specific steps of Step4 are:
Step4.1: first, based on the Vietnamese multi-category word dictionary, extract the multi-category words and the single-category words from the test corpus to be tagged;
Step4.2: then disambiguate the multi-category words with the multi-category word part-of-speech disambiguation model, obtaining the tagging result for the disambiguated multi-category words;
Step4.3: finally, match the extracted single-category words against the single-category word part-of-speech dictionary, obtaining the tagging result for the single-category words.
Further, in Step5, once the part-of-speech tags of the multi-category words and those of the single-category words have been obtained, the two are combined by direct replacement. Because the multi-category word dictionary and the single-category word dictionary are both derived from the same Vietnamese dictionary, direct replacement cannot cause conflicts.
This embodiment uses Vietnamese sentences crawled from Vietnamese news websites as the training corpus and the test corpus. The crawled web pages are turned into a text corpus through rule-based extraction, deduplication, and manual annotation, yielding a multi-category word domain corpus of 27878 sentences and 396,946 words, which provides the corpus support for the invention;
To verify the part-of-speech tagging performance of the invention, a unified evaluation criterion is used: accuracy (Precision) serves as the evaluation metric for measuring the performance of the invention.
To verify the validity of the invention, the following experiments were designed:
Experiment 1: This experiment is intended to show that adding multi-category word model disambiguation effectively improves the accuracy of Vietnamese part-of-speech tagging. The 27878 annotated part-of-speech-tagged sentences are divided into ten parts, and ten-fold cross-validation is carried out with each of the recently popular MEM, CRF, and SVM models and with the part-of-speech dictionary + multi-category word disambiguation model; the average accuracies are then compared. The experimental results are shown in Fig. 3.
Table 1: Ten-fold cross-validation experiment
The experimental data in Table 1 show that the average accuracies of MEM, CRF++, SVMmulticlass, and the part-of-speech dictionary + multi-category word disambiguation model are 91.62%, 93.71%, 94.67%, and 95.22% respectively. SVMmulticlass is 0.96 percentage points more accurate than CRF++, CRF++ is 2.09 percentage points more accurate than MEM, and the part-of-speech dictionary + multi-category word disambiguation model is 0.55 percentage points more accurate than SVMmulticlass. As Fig. 4 also shows, this demonstrates that adding multi-category word model disambiguation effectively improves the accuracy of Vietnamese part-of-speech tagging.
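The ten-fold protocol used in Experiment 1 can be sketched as follows. The sketch makes several assumptions that are not in the text: folds are formed by striding over the sentence list, accuracy is counted per token, every fold contains at least one token, and the most-frequent-tag baseline (`train_baseline`, `tag_baseline`) is a hypothetical placeholder rather than any of the four systems compared.

```python
from collections import Counter, defaultdict


def ten_fold_accuracy(sentences, train_fn, tag_fn, k=10):
    """Average token-level accuracy over k train/test splits.

    sentences: list of sentences, each a list of (word, gold_tag) pairs.
    train_fn(train_sentences) -> model; tag_fn(model, words) -> tag list.
    """
    folds = [sentences[i::k] for i in range(k)]
    scores = []
    for held_out in range(k):
        test = folds[held_out]
        train = [s for j, fold in enumerate(folds) if j != held_out for s in fold]
        model = train_fn(train)
        correct = total = 0
        for sent in test:
            words = [w for w, _ in sent]
            gold = [t for _, t in sent]
            pred = tag_fn(model, words)
            correct += sum(p == g for p, g in zip(pred, gold))
            total += len(gold)
        scores.append(correct / total)
    return sum(scores) / k


def train_baseline(train):
    """Placeholder model: remember each word's most frequent tag."""
    counts = defaultdict(Counter)
    for sent in train:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def tag_baseline(model, words):
    """Tag with the remembered tag; unseen words default to "N"."""
    return [model.get(w, "N") for w in words]


# Hypothetical toy corpus: the same unambiguous sentence repeated 20 times.
data = [[("họ", "P"), ("bàn", "V"), ("việc", "N")]] * 20
avg = ten_fold_accuracy(data, train_baseline, tag_baseline)
```

Swapping in a different `train_fn` and `tag_fn` pair is all that is needed to compare systems under the same splits.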
Experiment 2: To verify the effectiveness of the proposed system, the part-of-speech tagging model of the invention is compared with the existing part-of-speech tagging tool VietTagger and with the SVMmulticlass model. The experimental results are shown in Table 2.
Table 2: Comparison of part-of-speech tagging results
System | Precision
VietTagger | 92.13%
SVMmulticlass | 94.67%
Proposed method | 95.22%
As Table 2 shows, the proposed tagging method, which combines a part-of-speech dictionary with a multi-category word disambiguation model, achieves good tagging results: 3.09 percentage points higher than VietTagger and 0.55 percentage points higher than SVMmulticlass, which demonstrates that the method of the invention is effective and feasible.
The embodiments of the invention have been explained in detail above with reference to the accompanying drawings, but the invention is not limited to these embodiments; within the knowledge possessed by a person of ordinary skill in the art, various changes can be made without departing from the concept of the invention.

Claims (6)

1. A Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary, characterized in that:
the specific steps of the method are:
Step1: first, manually compile a Vietnamese dictionary;
Step2: next, derive a single-category word dictionary and a multi-category word dictionary from the manually compiled Vietnamese dictionary;
Step3: then, according to the linguistic characteristics of Vietnamese, select a Vietnamese part-of-speech tagging feature set and build the multi-category word part-of-speech disambiguation model;
Step4: then, using the disambiguation model and the single-category word dictionary respectively, automatically tag the multi-category words and the single-category words in a test corpus obtained from Vietnamese news websites;
Step5: finally, automatically merge the two sets of tagging results to obtain the final tagging result.
2. The Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary according to claim 1, characterized in that the specific steps of Step3 are:
Step3.1: first, crawl corpora of different types with a web crawler and preprocess them; preprocessing includes data denoising and word segmentation with a word segmentation tool;
Step3.2: next, write a program that matches the corpus against the Vietnamese dictionary and automatically identifies the set of multi-category words in the corpus;
Step3.3: then, select multi-category word features according to the characteristics of Vietnamese multi-category words, and incorporate the selected features into the training corpus;
Step3.4: finally, perform statistical computation with a maximum entropy model, combining the multi-category word features from Step3.3 with contextual features, to generate the Vietnamese multi-category word part-of-speech disambiguation model.
3. The Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary according to claim 2, characterized in that the specific steps of Step3.1 are:
Step3.1.1: collect articles from Vietnamese news websites, including news, entertainment, and economics articles;
Step3.1.2: first clean the collected text, including tidying and noise removal, to form a sentence-level text corpus;
Step3.1.3: next, segment the sentence-level corpus with a Vietnamese word segmentation tool and have Vietnamese language experts proofread it manually, forming a sentence-level word-segmented corpus;
Step3.1.4: then manually annotate the segmented corpus with part-of-speech tags and chunk labels;
Step3.1.5: finally, obtain the multi-category word dictionary by compiling the Vietnamese dictionary; based on this dictionary, programmatically extract the Vietnamese multi-category word domain corpus from the part-of-speech-tagged corpus already built, for use in building the multi-category word part-of-speech disambiguation model.
4. The Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary according to claim 2, characterized in that in Step3.3 the features chosen for the multi-category word part-of-speech disambiguation model are mainly: the word and its word-context information; part-of-speech context information; the chunk and its chunk-context information; and the sentence constituent of the word in the sentence.
5. The Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary according to claim 1, characterized in that the specific steps of Step4 are:
Step4.1: first, based on the Vietnamese multi-category word dictionary, extract the multi-category words and the single-category words from the test corpus to be tagged;
Step4.2: then disambiguate the multi-category words with the multi-category word part-of-speech disambiguation model, obtaining the tagging result for the disambiguated multi-category words;
Step4.3: finally, match the extracted single-category words against the single-category word part-of-speech dictionary, obtaining the tagging result for the single-category words.
6. The Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary according to claim 1, characterized in that in Step5, once the part-of-speech tags of the multi-category words and those of the single-category words have been obtained, the two are combined by direct replacement.
CN201711056063.9A 2017-11-01 2017-11-01 Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary Pending CN107894977A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711056063.9A CN107894977A (en) 2017-11-01 2017-11-01 Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711056063.9A CN107894977A (en) 2017-11-01 2017-11-01 Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary

Publications (1)

Publication Number Publication Date
CN107894977A true CN107894977A (en) 2018-04-10

Family

ID=61803950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711056063.9A Pending CN107894977A (en) 2017-11-01 2017-11-01 Vietnamese part-of-speech tagging method combining a multi-category word part-of-speech disambiguation model and a dictionary

Country Status (1)

Country Link
CN (1) CN107894977A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344406A (en) * 2018-09-30 2019-02-15 阿里巴巴集团控股有限公司 Part-of-speech tagging method, apparatus and electronic equipment
CN110457715A (en) * 2019-07-15 2019-11-15 昆明理工大学 Incorporate the outer word treatment method of the more neural machine translation set of the Chinese of classified dictionary
CN114707489A (en) * 2022-03-29 2022-07-05 马上消费金融股份有限公司 Method and device for acquiring marked data set, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110137636A1 (en) * 2009-12-02 2011-06-09 Janya, Inc. Context aware back-transliteration and translation of names and common phrases using web resources
CN102646100A (en) * 2011-02-21 2012-08-22 腾讯科技(深圳)有限公司 Domain term obtaining method and system
CN103902525A (en) * 2012-12-28 2014-07-02 新疆电力信息通信有限责任公司 Uygur language part-of-speech tagging method
CN104978311A (en) * 2015-07-15 2015-10-14 昆明理工大学 Vietnamese word segmentation method based on conditional random fields
CN104991890A (en) * 2015-07-15 2015-10-21 昆明理工大学 Method for constructing Vietnamese dependency tree bank on basis of Chinese-Vietnamese vocabulary alignment corpora
CN106202039A (en) * 2016-06-30 2016-12-07 昆明理工大学 Vietnamese portmanteau word disambiguation method based on condition random field

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张一哲: "汉语词类划分与词性标注方法的研究", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344406A (en) * 2018-09-30 2019-02-15 阿里巴巴集团控股有限公司 Part-of-speech tagging method, apparatus and electronic equipment
CN110457715A (en) * 2019-07-15 2019-11-15 昆明理工大学 Incorporate the outer word treatment method of the more neural machine translation set of the Chinese of classified dictionary
CN110457715B (en) * 2019-07-15 2022-12-13 昆明理工大学 Method for processing out-of-set words of Hanyue neural machine translation fused into classification dictionary
CN114707489A (en) * 2022-03-29 2022-07-05 马上消费金融股份有限公司 Method and device for acquiring marked data set, electronic equipment and storage medium
CN114707489B (en) * 2022-03-29 2023-08-18 马上消费金融股份有限公司 Method and device for acquiring annotation data set, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Yu Zhengtao

Inventor after: Zhao Chen

Inventor after: Guo Jianyi

Inventor after: Wang Hongbin

Inventor after: Wen Yonghua

Inventor before: Guo Jianyi

Inventor before: Zhao Chen

Inventor before: Yu Zhengtao

Inventor before: Wang Hongbin

Inventor before: Wen Yonghua

RJ01 Rejection of invention patent application after publication

Application publication date: 20180410