CN107943786A - Chinese named entity recognition method and system - Google Patents

Chinese named entity recognition method and system

Info

Publication number
CN107943786A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711137581.3A
Other languages
Chinese (zh)
Other versions
CN107943786B (en)
Inventor
吴远辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Wanlong Securities Advisory Consultants Co Ltd
Original Assignee
Guangzhou Wanlong Securities Advisory Consultants Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-11-16
Filing date
2017-11-16
Publication date
2018-04-20
2017-11-16 Application filed by Guangzhou Wanlong Securities Advisory Consultants Co Ltd
2017-11-16 Priority to CN201711137581.3A
2018-04-20 Publication of CN107943786A
Application granted
2021-12-07 Publication of CN107943786B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a Chinese named entity recognition method and system. The method comprises the following steps: S1, performing rule-matching-based entity recognition on a target text to obtain a first named entity set; S2, performing entity recognition on the target text using a statistical algorithm to obtain a second named entity set; S3, cleaning the first named entity set and the second named entity set to obtain a recognition result. The invention recognizes entities in the target text separately by rule matching and by a statistical algorithm. It then cleans both recognition results and derives the final Chinese entity recognition result from them. This greatly improves the recall of Chinese entity recognition while preserving its accuracy. Because the method recognizes Chinese entities automatically, recognition is fast, and the method can be widely applied in text information processing.

Description

Chinese named entity recognition method and system
Technical field
The present invention relates to the fields of computer applications and information processing, and more particularly to a Chinese named entity recognition method and system.
Background art
Named entities are the basic information elements of a target text and the foundation for understanding it correctly. Chinese named entity recognition is an important basic tool for applications such as information extraction, syntactic analysis and machine learning. It plays a key role as natural language processing technology moves toward practical use. The task of Chinese named entity recognition is to judge whether a character string represents a named entity. In information extraction research, Chinese named entity recognition is currently one of the most practically valuable technologies. Common methods rely purely on hidden Markov models or maximum entropy models.
At present, Chinese company names follow weak word-formation rules and are used rather freely, often appearing as abbreviations. For example, "Bank of China Co., Ltd." often appears in abbreviated form, such as "Bank of China" or "中行". This makes company names difficult to recognize and use. In general, recognizing Chinese named entities such as abbreviated Chinese company names involves the following difficulties:
1. Abbreviated variants of names keep multiplying across domains and scenarios.
2. Certain types of entity names change frequently and follow no strict rules.
3. The forms of expression are diverse.
4. The number of names is enormous, so they cannot be enumerated and are hard to include fully in a dictionary.
Overall, when processing Chinese target text, the quality of Chinese word segmentation strongly affects Chinese named entity recognition. This in turn affects the analysis and processing of the target text, leading to low recall and slow recognition.
Summary of the invention
To solve the above technical problems, an object of the present invention is to provide a Chinese named entity recognition method and system.
The technical solution adopted by the present invention to solve its technical problem is as follows:
A Chinese named entity recognition method, comprising the following steps:
S1, performing rule-matching-based entity recognition on a target text to obtain a first named entity set;
S2, performing entity recognition on the target text using a statistical algorithm to obtain a second named entity set;
S3, cleaning the first named entity set and the second named entity set to obtain a recognition result.
Further, step S1 specifically includes:
S11, splitting the content of the target text into sentences;
S12, extracting content from the split target text based on punctuation rules;
S13, extracting content from the split target text based on syntactic template rules;
S14, extracting content from the split target text based on table features;
S15, generating the first named entity set from all the extracted named entities.
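Step S11 does not say which characters end a sentence. The following minimal Python sketch assumes the delimiters are the full-width marks 。！？； and their ASCII counterparts. This choice is an assumption and is not taken from the patent.

```python
import re

# Assumed delimiters: full-width 。！？； and ASCII !?; (not specified by the patent).
# The zero-width lookbehind keeps each mark attached to its own sentence.
_SENTENCE_END = re.compile(r"(?<=[。！？；!?;])")


def split_sentences(text: str) -> list[str]:
    """S11 sketch: split the target text into sentences and drop empty pieces."""
    return [piece.strip() for piece in _SENTENCE_END.split(text) if piece.strip()]


print(split_sentences("中国银行称，利润增长。中行发布公告！"))
# ['中国银行称，利润增长。', '中行发布公告！']
```

The full-width comma is deliberately left out of the delimiters, so clauses joined by "，" stay in one sentence for the later template rules.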
Further, step S2 specifically includes:
S21, performing word segmentation on the target text;
S22, performing part-of-speech tagging on the segmentation result based on a preset part-of-speech database;
S23, performing statistical analysis on the part-of-speech tagging result using a hidden Markov model statistical learning method, and generating the second named entity set from the named entities obtained by the analysis.
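The patent does not name the segmenter used in S21. It also does not describe the format of the "preset part-of-speech database" in S22. The sketch below swaps in forward maximum matching against a small hypothetical dictionary, which also serves as the part-of-speech database. The tag strings ("nt", "v", "n", "x") are illustrative placeholders.

```python
# Hypothetical preset part-of-speech database: word -> tag (illustrative only).
POS_DB = {
    "中国银行": "nt",  # organization name
    "中行": "nt",
    "发布": "v",
    "公告": "n",
    "称": "v",
}


def segment(text, lexicon, max_len=4):
    """S21 sketch: forward maximum matching. At each position take the longest
    dictionary word of at most max_len characters, else a single character."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def pos_tag(words, pos_db):
    """S22 sketch: dictionary lookup; unknown words get the placeholder tag 'x'."""
    return [(word, pos_db.get(word, "x")) for word in words]


print(pos_tag(segment("中行发布公告", POS_DB), POS_DB))
# [('中行', 'nt'), ('发布', 'v'), ('公告', 'n')]
```

Forward maximum matching was chosen here only because it is short. A dictionary-only segmenter may split unseen abbreviations into single characters, which is exactly the weakness the background section attributes to word segmentation.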
Further, step S3 specifically includes:
S31, performing data cleaning on the first named entity set and the second named entity set respectively according to a preset noise lexicon, to remove noise words;
S32, taking the union of the cleaned first named entity set and second named entity set as the named entity recognition result.
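Steps S31 and S32 amount to set filtering followed by a set union. The following minimal Python sketch assumes the noise lexicon is a plain set of strings removed by exact match. The patent does not say how matching is done.

```python
def clean(entities, noise_lexicon):
    """S31 sketch: drop every candidate that appears verbatim in the noise lexicon."""
    return {entity for entity in entities if entity not in noise_lexicon}


def recognition_result(first_set, second_set, noise_lexicon):
    """S32 sketch: union of the two cleaned named entity sets."""
    return clean(first_set, noise_lexicon) | clean(second_set, noise_lexicon)


noise = {"公司", "本公司"}  # hypothetical noise words
print(sorted(recognition_result({"中国银行", "公司"}, {"中行", "本公司"}, noise)))
# ['中国银行', '中行']
```

The union contains each cleaned set, so the recall of the result is at least that of either cleaned set alone. Precision then depends on how complete the noise lexicon is.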
Another technical solution adopted by the present invention to solve its technical problem is as follows:
A Chinese named entity recognition system, comprising the following modules:
First recognition module, configured to perform rule-matching-based entity recognition on a target text to obtain a first named entity set;
Second recognition module, configured to perform entity recognition on the target text using a statistical algorithm to obtain a second named entity set;
Cleaning module, configured to clean the first named entity set and the second named entity set to obtain a recognition result.
Further, the first recognition module specifically includes:
Sentence splitting unit, configured to split the content of the target text into sentences;
First extraction unit, configured to extract content from the split target text based on punctuation rules;
Second extraction unit, configured to extract content from the split target text based on syntactic template rules;
Third extraction unit, configured to extract content from the split target text based on table features;
Generation unit, configured to generate the first named entity set from all the extracted named entities.
Further, the second recognition module specifically includes:
Word segmentation unit, configured to perform word segmentation on the target text;
Part-of-speech tagging unit, configured to perform part-of-speech tagging on the segmentation result based on a preset part-of-speech database;
Statistical analysis unit, configured to perform statistical analysis on the part-of-speech tagging result using a hidden Markov model statistical learning method, and to generate the second named entity set from the named entities obtained by the analysis.
Further, the cleaning module specifically includes:
Data cleaning unit, configured to perform data cleaning on the first named entity set and the second named entity set respectively according to a preset noise lexicon, to remove noise words;
Computing unit, configured to take the union of the cleaned first named entity set and second named entity set as the named entity recognition result.
Beneficial effects of the method and system of the present invention: the present invention performs entity recognition on the target text separately by rule matching and by a statistical algorithm. It then cleans both recognition results and derives the final Chinese entity recognition result from them. This greatly improves the recall of Chinese entity recognition while preserving its accuracy. Because Chinese entities are recognized automatically by this method, recognition is fast.
Brief description of the drawings
Fig. 1 is a flowchart of the Chinese named entity recognition method of the present invention;
Fig. 2 is a structural block diagram of the Chinese named entity recognition system of the present invention.
Detailed description of the embodiments
Referring to Fig. 1, the present invention provides a Chinese named entity recognition method, comprising the following steps:
S1, performing rule-matching-based entity recognition on a target text to obtain a first named entity set;
S2, performing entity recognition on the target text using a statistical algorithm to obtain a second named entity set;
S3, cleaning the first named entity set and the second named entity set to obtain a recognition result.
Here, the target text refers to the text on which Chinese named entity recognition is to be performed.
The method performs entity recognition on the target text separately by rule matching and by a statistical algorithm. It then cleans both recognition results and derives the final Chinese entity recognition result from them. This greatly improves the recall of Chinese entity recognition while preserving its accuracy. Because the method recognizes Chinese entities automatically, recognition can be relatively fast.
As a further preferred embodiment, step S1 specifically includes:
S11, splitting the content of the target text into sentences;
S12, extracting content from the split target text based on punctuation rules. For example, some documents customarily enclose entity names in double quotation marks or in Chinese title marks (《》); in that case, the name inside the quotation marks or title marks is extracted. Punctuation rules can therefore be created according to people's usage habits. These rules record the punctuation marks associated with Chinese entity names and the corresponding extraction rules, and the content extracted according to them serves as candidate Chinese entity names.
S13, extracting content from the split target text based on syntactic template rules. For example, the subject preceding verbs such as "declare", "state" and "say" is usually an entity name. Syntactic template rules are therefore created according to these language habits. They record the words associated with Chinese entity names and the corresponding extraction rules, so that entity names can be extracted from the target text according to the syntactic template rules.
S14, extracting content from the split target text based on table features;
S15, generating the first named entity set from all the extracted named entities.
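The rules in S12 and S13 are described only by example. The sketch below encodes those examples as regular expressions, and several details are assumptions:

- the rule patterns themselves;
- the verb list 声明, 称, 说, offered as guesses at the Chinese originals of "declare", "state" and "say";
- the 2 to 12 character subject length.

The template is crude. It takes the whole run of Chinese characters between a clause boundary and the verb, so an adverbial such as 今日 would be captured together with the subject. S14 is omitted because the patent gives no detail on table features.

```python
import re

# S12: hypothetical punctuation rules. Text inside full-width “ ”, ASCII " ",
# or Chinese title marks 《 》 becomes a candidate entity name.
PUNCTUATION_RULES = [
    re.compile(r"“([^”]+)”"),
    re.compile(r'"([^"]+)"'),
    re.compile(r"《([^》]+)》"),
]

# S13: hypothetical syntactic template. A 2-12 character CJK run that starts a
# clause and ends right before a reporting verb is treated as the subject.
REPORTING_VERBS = ("声明", "称", "说")
SUBJECT_TEMPLATE = re.compile(
    r"(?:^|[，,：:；;\s])([\u4e00-\u9fff]{2,12}?)(?:%s)" % "|".join(REPORTING_VERBS)
)


def extract_by_punctuation(sentence):
    return {m.group(1) for rule in PUNCTUATION_RULES for m in rule.finditer(sentence)}


def extract_by_template(sentence):
    return {m.group(1) for m in SUBJECT_TEMPLATE.finditer(sentence)}


def first_named_entity_set(sentences):
    """S15 sketch: union of all rule hits over the already split sentences."""
    found = set()
    for sentence in sentences:
        found |= extract_by_punctuation(sentence)
        found |= extract_by_template(sentence)
    return found


print(sorted(first_named_entity_set(["中国银行称，将发布《年度报告》。", "据“中行”消息。"])))
# ['中国银行', '中行', '年度报告']
```

The hit 年度报告 ("annual report") is a document title, not an organization. Over-capture of this kind by punctuation rules is one reason the method follows recognition with a cleaning step.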
As a further preferred embodiment, step S2 specifically includes:
S21, performing word segmentation on the target text;
S22, performing part-of-speech tagging on the segmentation result based on a preset part-of-speech database;
S23, performing statistical analysis on the part-of-speech tagging result using a hidden Markov model statistical learning method, and generating the second named entity set from the named entities obtained by the analysis. This step is based on the hidden Markov model statistical learning method. First, using known, correct entity names, the probability with which each keyword appears before them is counted. Entity names are then inferred from the high-probability keywords. Recall is thus greatly improved without affecting the accuracy of the recognized Chinese entity names, and the Chinese entity names in the text can be identified more comprehensively. Because the names are obtained by automatic recognition, recognition is fast.
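Although S23 names a hidden Markov model, the statistic it actually describes is simpler: how often a keyword immediately precedes a known entity name. The sketch below implements only that counting heuristic on already segmented sentences. It has no hidden states or Viterbi decoding, so it is not an HMM. The corpus, the known-entity set and the 0.5 threshold are hypothetical.

```python
from collections import Counter


def keyword_probabilities(tokenized_sentences, known_entities):
    """Estimate P(next token is a known entity | current token) from a corpus."""
    seen = Counter()
    before_entity = Counter()
    for tokens in tokenized_sentences:
        for prev, nxt in zip(tokens, tokens[1:]):
            seen[prev] += 1
            if nxt in known_entities:
                before_entity[prev] += 1
    return {word: before_entity[word] / seen[word] for word in seen}


def infer_entities(tokenized_sentences, probs, threshold=0.5):
    """Propose the token that follows any keyword whose probability >= threshold."""
    found = set()
    for tokens in tokenized_sentences:
        for prev, nxt in zip(tokens, tokens[1:]):
            if probs.get(prev, 0.0) >= threshold:
                found.add(nxt)
    return found


corpus = [["据", "中行", "消息"], ["据", "中国银行", "公告"], ["据", "悉"]]
probs = keyword_probabilities(corpus, known_entities={"中行", "中国银行"})
print(round(probs["据"], 3))                               # 0.667
print(infer_entities([["据", "招商银行", "报道"]], probs))  # {'招商银行'}
```

In this toy corpus, 据 ("according to") precedes a known entity in two of its three occurrences. The unseen name 招商银行 that follows it is therefore proposed as a candidate. This sketch ignores the part-of-speech tags from S22, which a fuller implementation would presumably use.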
As a further preferred embodiment, step S3 specifically includes:
S31, performing data cleaning on the first named entity set and the second named entity set respectively according to a preset noise lexicon, to remove noise words;
S32, taking the union of the cleaned first named entity set and second named entity set as the named entity recognition result.
Referring to Fig. 2, the present invention provides a Chinese named entity recognition system, comprising the following modules:
First recognition module 100, configured to perform rule-matching-based entity recognition on a target text to obtain a first named entity set;
Second recognition module 200, configured to perform entity recognition on the target text using a statistical algorithm to obtain a second named entity set;
Cleaning module 300, configured to clean the first named entity set and the second named entity set to obtain a recognition result.
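The three modules map naturally onto injected callables. The following is a hypothetical Python wiring of modules 100, 200 and 300. The class and attribute names are illustrative and do not come from the patent, and the lambdas are toy stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class ChineseNERSystem:
    """Hypothetical wiring of the three modules described for Fig. 2."""

    first_module: Callable[[str], Set[str]]                    # 100: rule-based recognition
    second_module: Callable[[str], Set[str]]                   # 200: statistical recognition
    cleaning_module: Callable[[Set[str], Set[str]], Set[str]]  # 300: cleaning and union

    def recognize(self, text: str) -> Set[str]:
        first_set = self.first_module(text)
        second_set = self.second_module(text)
        return self.cleaning_module(first_set, second_set)


system = ChineseNERSystem(
    first_module=lambda text: {"中国银行"},           # stand-in recognizers
    second_module=lambda text: {"中行", "公司"},
    cleaning_module=lambda a, b: (a | b) - {"公司"},  # toy noise lexicon {"公司"}
)
print(sorted(system.recognize("任意文本")))  # ['中国银行', '中行']
```

Keeping the modules as separate callables lets the rule set or the statistical model be swapped without touching the cleaning step. This mirrors the module boundaries shown in Fig. 2.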
As a further preferred embodiment, the first recognition module 100 specifically includes:
Sentence splitting unit, configured to split the content of the target text into sentences;
First extraction unit, configured to extract content from the split target text based on punctuation rules;
Second extraction unit, configured to extract content from the split target text based on syntactic template rules;
Third extraction unit, configured to extract content from the split target text based on table features;
Generation unit, configured to generate the first named entity set from all the extracted named entities.
As a further preferred embodiment, the second recognition module 200 specifically includes:
Word segmentation unit, configured to perform word segmentation on the target text;
Part-of-speech tagging unit, configured to perform part-of-speech tagging on the segmentation result based on a preset part-of-speech database;
Statistical analysis unit, configured to perform statistical analysis on the part-of-speech tagging result using a hidden Markov model statistical learning method, and to generate the second named entity set from the named entities obtained by the analysis.
As a further preferred embodiment, the cleaning module 300 specifically includes:
Data cleaning unit, configured to perform data cleaning on the first named entity set and the second named entity set respectively according to a preset noise lexicon, to remove noise words;
Computing unit, configured to take the union of the cleaned first named entity set and second named entity set as the named entity recognition result.
The Chinese named entity recognition system of the present invention can execute the Chinese named entity recognition method provided above. It can implement any combination of the steps of the method embodiments and has the corresponding functions and beneficial effects of the method.
The above is a specific description of preferred implementations of the present invention, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the invention. All such equivalent modifications or substitutions fall within the scope defined by the claims of this application.

Claims (8)

1. A Chinese named entity recognition method, characterized by comprising the following steps:
S1, performing rule-matching-based entity recognition on a target text to obtain a first named entity set;
S2, performing entity recognition on the target text using a statistical algorithm to obtain a second named entity set;
S3, cleaning the first named entity set and the second named entity set to obtain a recognition result.
2. The Chinese named entity recognition method according to claim 1, characterized in that step S1 specifically includes:
S11, splitting the content of the target text into sentences;
S12, extracting content from the split target text based on punctuation rules;
S13, extracting content from the split target text based on syntactic template rules;
S14, extracting content from the split target text based on table features;
S15, generating the first named entity set from all the extracted named entities.
3. The Chinese named entity recognition method according to claim 1, characterized in that step S2 specifically includes:
S21, performing word segmentation on the target text;
S22, performing part-of-speech tagging on the segmentation result based on a preset part-of-speech database;
S23, performing statistical analysis on the part-of-speech tagging result using a hidden Markov model statistical learning method, and generating the second named entity set from the named entities obtained by the analysis.
4. The Chinese named entity recognition method according to claim 1, characterized in that step S3 specifically includes:
S31, performing data cleaning on the first named entity set and the second named entity set respectively according to a preset noise lexicon, to remove noise words;
S32, taking the union of the cleaned first named entity set and second named entity set as the named entity recognition result.
5. A Chinese named entity recognition system, characterized by comprising the following modules:
First recognition module, configured to perform rule-matching-based entity recognition on a target text to obtain a first named entity set;
Second recognition module, configured to perform entity recognition on the target text using a statistical algorithm to obtain a second named entity set;
Cleaning module, configured to clean the first named entity set and the second named entity set to obtain a recognition result.
6. The Chinese named entity recognition system according to claim 5, characterized in that the first recognition module specifically includes:
Sentence splitting unit, configured to split the content of the target text into sentences;
First extraction unit, configured to extract content from the split target text based on punctuation rules;
Second extraction unit, configured to extract content from the split target text based on syntactic template rules;
Third extraction unit, configured to extract content from the split target text based on table features;
Generation unit, configured to generate the first named entity set from all the extracted named entities.
7. The Chinese named entity recognition system according to claim 5, characterized in that the second recognition module specifically includes:
Word segmentation unit, configured to perform word segmentation on the target text;
Part-of-speech tagging unit, configured to perform part-of-speech tagging on the segmentation result based on a preset part-of-speech database;
Statistical analysis unit, configured to perform statistical analysis on the part-of-speech tagging result using a hidden Markov model statistical learning method, and to generate the second named entity set from the named entities obtained by the analysis.
8. The Chinese named entity recognition system according to claim 5, characterized in that the cleaning module specifically includes:
Data cleaning unit, configured to perform data cleaning on the first named entity set and the second named entity set respectively according to a preset noise lexicon, to remove noise words;
Computing unit, configured to take the union of the cleaned first named entity set and second named entity set as the named entity recognition result.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711137581.3A CN107943786B (en) 2017-11-16 2017-11-16 Chinese named entity recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711137581.3A CN107943786B (en) 2017-11-16 2017-11-16 Chinese named entity recognition method and system

Publications (2)

Publication Number Publication Date
CN107943786A (en) 2018-04-20
CN107943786B (en) 2021-12-07

Family

ID=61931531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711137581.3A Active CN107943786B (en) 2017-11-16 2017-11-16 Chinese named entity recognition method and system

Country Status (1)

Country Link
CN (1) CN107943786B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060047500A1 (en) * 2004-08-31 2006-03-02 Microsoft Corporation Named entity recognition using compiler methods
CN1910573A (en) * 2003-12-31 2007-02-07 新加坡科技研究局 System for identifying and classifying named entities
EP1783744A1 (en) * 2005-11-03 2007-05-09 Robert Bosch Corporation Unified treatment of data-sparseness and data-overfitting in maximum entropy modeling
CN102103594A (en) * 2009-12-22 2011-06-22 北京大学 Character data recognition and processing method and device
CN102314417A (en) * 2011-09-22 2012-01-11 西安电子科技大学 Method for identifying Web named entity based on statistical model
CN103268348A (en) * 2013-05-28 2013-08-28 中国科学院计算技术研究所 Method for identifying user query intention
CN103942347A (en) * 2014-05-19 2014-07-23 焦点科技股份有限公司 Word separating method based on multi-dimensional comprehensive lexicon
CN103995885A (en) * 2014-05-29 2014-08-20 百度在线网络技术(北京)有限公司 Method and device for recognizing entity names
CN105302794A (en) * 2015-10-30 2016-02-03 苏州大学 Chinese coreferent event recognition method and system
CN105808523A (en) * 2016-03-08 2016-07-27 浪潮软件股份有限公司 Method and apparatus for identifying document
CN105843875A (en) * 2016-03-18 2016-08-10 北京光年无限科技有限公司 Smart robot-oriented question and answer data processing method and apparatus
CN106055545A (en) * 2015-04-10 2016-10-26 穆西格马交易方案私人有限公司 Text mining system and tool

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
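TIAN-FANG YAO et al.: "Repairing errors for Chinese word segmentation and part-of-speech tagging", 《PROCEEDINGS. INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS》 *
YI-CHENG PAN et al.: "Named entity recognition from spoken documents using global evidences and external knowledge sources with applications on Mandarin Chinese", 《IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, 2005.》 *
何炎祥 et al.: "Geographic named entity recognition method combining CRF and rules" (基于CRF和规则相结合的地理命名实体识别方法), 《计算机应用与软件》 *
刘豹 et al.: "Research on automatic extraction of scientific and technical terms combining statistics and rules" (基于统计和规则相结合的科技术语自动抽取研究), 《计算机工程与应用》 *
张宏生: "Using an HMM model to improve the performance of a named entity recognition *** with automatically generated rules" (使用HMM模型改进规则自动生成的命名实体识别***性能), 《中小企业管理与科技(下旬刊)》 *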

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647194A (en) * 2018-04-28 2018-10-12 北京神州泰岳软件股份有限公司 information extraction method and device
CN108647194B (en) * 2018-04-28 2022-04-19 北京神州泰岳软件股份有限公司 Information extraction method and device
WO2020133291A1 (en) * 2018-12-28 2020-07-02 深圳市优必选科技有限公司 Text entity recognition method and apparatus, computer device, and storage medium
CN111382570A (en) * 2018-12-28 2020-07-07 深圳市优必选科技有限公司 Text entity recognition method and device, computer equipment and storage medium
CN111382570B (en) * 2018-12-28 2024-05-03 深圳市优必选科技有限公司 Text entity recognition method, device, computer equipment and storage medium
CN110008307A (en) * 2019-01-18 2019-07-12 中国科学院信息工程研究所 Method and device for recognizing deformed entities based on rules and statistical learning
CN110750991A (en) * 2019-09-18 2020-02-04 平安科技(深圳)有限公司 Entity identification method, device, equipment and computer readable storage medium
WO2021051872A1 (en) * 2019-09-18 2021-03-25 平安科技(深圳)有限公司 Entity identification method, device, apparatus, and computer readable storage medium
CN110750991B (en) * 2019-09-18 2022-04-15 平安科技(深圳)有限公司 Entity identification method, device, equipment and computer readable storage medium
CN111488467A (en) * 2020-04-30 2020-08-04 北京建筑大学 Construction method and device of geographical knowledge graph, storage medium and computer equipment
CN111488467B (en) * 2020-04-30 2022-04-05 北京建筑大学 Construction method and device of geographical knowledge graph, storage medium and computer equipment

Also Published As

Publication number Publication date
CN107943786B (en) 2021-12-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant