CN108959240A - System and method for automatically generating a proprietary ontology - Google Patents

System and method for automatically generating a proprietary ontology

Publication number: CN108959240A
Authority: CN (China)
Prior art keywords: phrase, sentence, proprietary, ontology, relationship
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201710383135.4A
Other languages: Chinese (zh)
Inventors: 雷晓军, 周京
Current Assignee: Shanghai Poly Mdt Infotech Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Shanghai Poly Mdt Infotech Ltd
Priority date: 2017-05-26 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2017-05-26
Publication date: 2018-12-07
Application filed by Shanghai Poly Mdt Infotech Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a system and method for automatically generating a proprietary ontology. The system comprises: a text database for storing text data; a natural language understanding module, whose input is connected to the text database, for splitting the text data into several sentences and analyzing the sentences to obtain the syntactic-semantic structure of each sentence; a phrase analysis module, whose input is connected to the output of the natural language understanding module, for obtaining the corresponding phrases and phrase relationships from the syntactic-semantic structure of the sentences; and an identification and suggestion module together with a proprietary ontology library to be established, the input of the identification and suggestion module being connected to the phrase analysis module, for identifying the phrases and phrase relationships as classes and properties of the proprietary ontology to be established and placing them in the proprietary ontology library to be established.

Description

System and method for automatically generating a proprietary ontology
Technical field
The present invention relates to semantic technology and semantic search in the field of artificial intelligence, and in particular to a system and method for automatically generating a proprietary ontology.
Background art
The combination of computers and the Internet has produced a vast amount of information, and it quickly leaves us feeling submerged. This is in fact the case: even as we cope with an unprecedented volume of information, we keep producing more, and the volume grows geometrically. People want computers to process this mass of information effectively, hoping not only to be freed from the flood of information but also to make better use of it.
Computer information processing was at first confined to structurally simple data: the volume might be large, but the structure was simple. As hardware capabilities grew rapidly, computers were applied to more complex problems, and the complexity of data structures increased greatly. As data of different kinds accumulated over the Internet, data from different sources began to be brought together, making data processing still more complex. In computer science and the artificial intelligence community, ontologies and proprietary ontologies emerged to cope with this complexity. They are the basis of the third-generation Internet, the Semantic Web, and also the cornerstone of semantic search. The third-generation Internet and semantic search are in turn the basis of big data processing.
Traditionally, proprietary ontologies are written by hand. Using an ontology editor, the author creates the classes (Class), entities (Entity), and properties (Property) of a proprietary domain, and also has to consult other existing proprietary ontologies and absorb parts of them. This process is very time-consuming and prone to inconsistency.
Summary of the invention
The object of the present invention is to provide a system and method for automatically generating a proprietary ontology. Documents of a proprietary domain are processed with natural language understanding technology to obtain a large number of phrases in that domain, and the proprietary ontology is learned and established automatically from these phrases and the relationships between them, which solves the problems of time consumption and inconsistency.
To achieve the above object, the present invention is realized by the following technical solution:
A proprietary ontology automatic generation system, characterized by comprising:
a text database for storing text data;
a natural language understanding module, whose input is connected to the text database, for splitting the text data into several sentences and analyzing the sentences to obtain the syntactic-semantic structure of each sentence;
a phrase analysis module, whose input is connected to the output of the natural language understanding module, for obtaining the corresponding phrases and phrase relationships from the syntactic-semantic structure of the sentences;
an identification and suggestion module and a proprietary ontology library to be established, the input of the identification and suggestion module being connected to the phrase analysis module, for identifying the phrases and phrase relationships as classes and properties of the proprietary ontology to be established and placing them in the proprietary ontology library to be established.
The proprietary ontology automatic generation system further comprises other proprietary ontology libraries, connected to the identification and suggestion module, for storing by default phrases that have already been established.
The natural language understanding module comprises:
a sentence splitting unit for splitting the text into several sentences;
a sentence analysis unit for performing syntactic and semantic analysis on the input sentences to obtain the corresponding syntactic-semantic structure of each sentence.
The phrase analysis module comprises:
a phrase semantic analysis and filtering unit for extracting all phrases in the syntactic-semantic structure, performing semantic analysis on them, filtering out the phrases that have a counterpart in the other proprietary ontology libraries, and keeping the phrases that have none;
an inter-phrase relationship analysis unit for analyzing the relationships among the phrases kept after filtering to obtain the phrase relationships.
A proprietary ontology automatic generation method, characterized in that the method comprises the following steps:
S1, storing text data;
S2, splitting the text data into several sentences and analyzing the sentences to obtain the syntactic-semantic structure of each sentence;
S3, obtaining the corresponding phrases and phrase relationships from the syntactic-semantic structure of the sentences;
S4, identifying the phrases and phrase relationships as classes and properties of the proprietary ontology to be established and placing them in the proprietary ontology library to be established.
The step S2 comprises:
S2.1, splitting the text into several sentences;
S2.2, performing syntactic and semantic analysis on the input sentences to obtain the corresponding syntactic-semantic structure of each sentence.
The step S3 comprises:
S3.1, extracting all phrases in the syntactic-semantic structure, performing semantic analysis on them, filtering out the phrases that have a counterpart in the other proprietary ontology libraries, and keeping the phrases that have none;
S3.2, analyzing the relationships among the phrases kept after filtering to obtain the phrase relationships.
Compared with the prior art, the present invention has the following advantage:
Documents of a proprietary domain are processed with natural language understanding technology to obtain a large number of phrases in that domain, and the proprietary ontology is learned and established automatically from these phrases and the relationships between them, which solves the problems of time consumption and inconsistency.
Brief description of the drawings
Fig. 1 is a block diagram of a proprietary ontology automatic generation system according to the present invention;
Fig. 2 is a flow chart of a proprietary ontology automatic generation method according to the present invention.
Detailed description of the embodiments
The present invention is further explained below through a detailed description of a preferred specific embodiment with reference to the drawings.
As shown in Fig. 1, a proprietary ontology automatic generation system comprises: a text database 100 for storing text data; a natural language understanding module 200, whose input is connected to the text database 100, for splitting the text data into several sentences and analyzing the sentences to obtain the syntactic-semantic structure of each sentence; a phrase analysis module 300, whose input is connected to the output of the natural language understanding module 200, for obtaining the corresponding phrases and phrase relationships from the syntactic-semantic structure of the sentences; and an identification and suggestion module 400 and a proprietary ontology library 500 to be established, the input of the identification and suggestion module 400 being connected to the phrase analysis module 300, for identifying the phrases and phrase relationships as classes and properties of the proprietary ontology to be established and placing them in the proprietary ontology library 500 to be established.
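The wiring of modules 100 to 500 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `OntologyLibrary`, `OntologyGenerator`, `toy_understand`, and `toy_analyze` are hypothetical and do not come from the patent, and the two toy functions merely stand in for a real natural language understanding module and phrase analysis module.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

# (phrase, relation, phrase)
Triple = Tuple[str, str, str]


@dataclass
class OntologyLibrary:
    """Stand-in for the proprietary ontology library to be established (500)."""
    classes: Set[str] = field(default_factory=set)
    properties: Set[Triple] = field(default_factory=set)


@dataclass
class OntologyGenerator:
    understand: Callable[[str], List[dict]]                # stands in for module 200
    analyze_phrases: Callable[[List[dict]], List[Triple]]  # stands in for module 300
    library: OntologyLibrary = field(default_factory=OntologyLibrary)

    def run(self, text_database: List[str]) -> OntologyLibrary:
        for text in text_database:                         # text database 100
            structures = self.understand(text)
            for left, relation, right in self.analyze_phrases(structures):
                # Role of module 400: phrases become classes,
                # phrase relationships become properties.
                self.library.classes.update((left, right))
                self.library.properties.add((left, relation, right))
        return self.library


# Hypothetical toy stand-ins so the sketch runs end to end.
def toy_understand(text: str) -> List[dict]:
    return [{"sentence": s.strip()} for s in text.split(".") if s.strip()]


def toy_analyze(structures: List[dict]) -> List[Triple]:
    triples = []
    for structure in structures:
        words = structure["sentence"].split()
        if len(words) == 3:  # treat "X verb Y" as phrase, relation, phrase
            triples.append((words[0], words[1], words[2]))
    return triples


generator = OntologyGenerator(toy_understand, toy_analyze)
result = generator.run(["Residents enjoy insurance. Too short."])
print(sorted(result.classes))
```

Passing the two analysis stages in as callables mirrors the block diagram: each module only depends on the output of the module before it.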
In a specific embodiment, the proprietary ontology automatic generation system further comprises other proprietary ontology libraries 600, connected to the identification and suggestion module, for storing by default phrases that have already been established. When a proprietary ontology is being generated, many proprietary ontologies have already been established, and these existing ontologies strongly influence the ontology to be established: there is no need to create an already existing class again.
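A minimal sketch of this reuse check follows. It assumes that each existing library can be represented as a set of phrase strings and that matching is done on normalized text; the patent describes semantic analysis for this step, so plain string normalization is a deliberate simplification. The function names `normalize` and `filter_known_phrases` are hypothetical.

```python
from typing import Iterable, List, Set


def normalize(phrase: str) -> str:
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(phrase.lower().split())


def filter_known_phrases(candidates: Iterable[str],
                         other_libraries: Iterable[Set[str]]) -> List[str]:
    """Keep only candidates that no existing library already contains."""
    known = {normalize(p) for library in other_libraries for p in library}
    kept: List[str] = []
    seen: Set[str] = set()
    for phrase in candidates:
        key = normalize(phrase)
        if key not in known and key not in seen:
            seen.add(key)
            kept.append(phrase)
    return kept


existing = [{"medical insurance", "residence permit"}]
candidates = ["Medical  Insurance", "household registration", "household registration"]
new_phrases = filter_known_phrases(candidates, existing)
print(new_phrases)
```

Only phrases absent from every existing library survive, so classes that already exist elsewhere are not created a second time.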
The natural language understanding module 200 comprises: a sentence splitting unit 201 for splitting the text into several sentences, the sentence being the basic object of analysis of a natural language understanding system; and a sentence analysis unit 202 for performing syntactic and semantic analysis on the input sentences to obtain the corresponding syntactic-semantic structure of each sentence, which represents both the meaning structure of the sentence and the structure of its constituents. Take the example sentence "A citizen calls to inquire: the parents hold household registration in another province or city and bought medical insurance in Shanghai after the child was born; the caller asks whether the child needs to apply for a residence permit." After analysis, the syntactic-semantic structure of this sentence is:
[consult] (Agent citizen) (Content
    { [be] (Theme parents) (Situation household registration in another province or city) }
    { [purchase] (Agent citizen) (Object medical insurance) (Time after the child's birth) (Location Shanghai) (Recipient child) }
    { [consult] (Content
        { [handle] (Agent citizen) (Patient residence permit) (Recipient child) }
    ) }
).
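One way to hold such a bracketed structure in memory is a small recursive record: a predicate plus a list of (role, filler) pairs, where a filler is either a phrase or a nested structure. The sketch below is an assumption about representation, not the patent's data format; `Frame` and `iter_phrases` are hypothetical names, the predicates are English renderings, and only part of the example structure is reproduced.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class Frame:
    """A predicate with its role fillers; a filler is a phrase or a nested Frame."""
    predicate: str
    roles: List[Tuple[str, Union[str, "Frame"]]] = field(default_factory=list)


def iter_phrases(frame: Frame) -> Iterator[str]:
    """Yield every phrase filler in the structure, depth first."""
    for _role, filler in frame.roles:
        if isinstance(filler, Frame):
            yield from iter_phrases(filler)
        else:
            yield filler


handle = Frame("handle", [("Agent", "citizen"),
                          ("Patient", "residence permit"),
                          ("Recipient", "child")])
purchase = Frame("purchase", [("Agent", "citizen"),
                              ("Object", "medical insurance"),
                              ("Time", "after the child's birth"),
                              ("Location", "Shanghai"),
                              ("Recipient", "child")])
inquiry = Frame("consult", [("Agent", "citizen"),
                            ("Content", purchase),
                            ("Content", Frame("consult", [("Content", handle)]))])

phrases = list(iter_phrases(inquiry))
print(phrases)
```

A list of pairs is used instead of a dictionary because the same role label, here Content, can occur more than once under one predicate.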
The phrase analysis module 300 comprises: a phrase semantic analysis and filtering unit 301 for extracting all phrases in the syntactic-semantic structure, performing semantic analysis on them, filtering out the phrases that have a counterpart in the other proprietary ontology libraries, and keeping the phrases that have none; and an inter-phrase relationship analysis unit 302 for analyzing the relationships among the phrases kept after filtering to obtain the phrase relationships. For example, in the sentence "Persons from other provinces and cities who are employed in this city can enjoy the medical insurance policies of this city", "persons from other provinces and cities who are employed in this city" and "the medical insurance policies of this city" are two phrases, and "enjoy" is the relationship connecting the two phrases.
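The "enjoy" example can be expressed as a (phrase, relation, phrase) triple. The sketch below assumes the analyzed sentence is available as a predicate plus a mapping from role labels to phrases, and that the two related phrases fill the Agent and Object roles; both the role choice and the function name `relation_between_phrases` are illustrative assumptions, and the phrases are English renderings of the example.

```python
from typing import Dict, Optional, Tuple

Triple = Tuple[str, str, str]


def relation_between_phrases(predicate: str,
                             roles: Dict[str, str],
                             left_role: str = "Agent",
                             right_role: str = "Object") -> Optional[Triple]:
    """Return (left phrase, relation, right phrase), or None if a role is missing."""
    left = roles.get(left_role)
    right = roles.get(right_role)
    if left is None or right is None:
        return None
    return (left, predicate, right)


triple = relation_between_phrases(
    "enjoy",
    {
        "Agent": "persons from other provinces and cities who are employed in this city",
        "Object": "the medical insurance policies of this city",
    },
)
print(triple)
```

Returning `None` when either role is absent keeps incomplete structures from producing half-formed relationships.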
As shown in Fig. 2, a proprietary ontology automatic generation method comprises the following steps:
S1, storing text data;
S2, splitting the text data into several sentences and analyzing the sentences to obtain the syntactic-semantic structure of each sentence;
S3, obtaining the corresponding phrases and phrase relationships from the syntactic-semantic structure of the sentences;
S4, identifying the phrases and phrase relationships as classes and properties of the proprietary ontology to be established and placing them in the proprietary ontology library to be established.
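Step S4 can be illustrated as a mapping from phrase triples to class names and properties. The sketch assumes each property is recorded as (name, domain class, range class) and that class names are derived by CamelCasing the phrase; neither convention is stated in the patent, and `to_identifier` and `build_ontology` are hypothetical helpers.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]          # (phrase, relation, phrase)
Property = Tuple[str, str, str]        # (name, domain class, range class)


def to_identifier(phrase: str) -> str:
    """Turn a phrase into a CamelCase name, e.g. 'residence permit' -> 'ResidencePermit'."""
    return "".join(word.capitalize() for word in phrase.split())


def build_ontology(triples: Iterable[Triple]) -> Tuple[List[str], List[Property]]:
    """Collect classes from phrases and properties from phrase relationships."""
    classes: List[str] = []
    properties: List[Property] = []
    for left, relation, right in triples:
        for phrase in (left, right):
            name = to_identifier(phrase)
            if name not in classes:
                classes.append(name)
        prop = (relation, to_identifier(left), to_identifier(right))
        if prop not in properties:
            properties.append(prop)
    return classes, properties


classes, properties = build_ontology([
    ("migrant worker", "enjoys", "medical insurance policy"),
    ("migrant worker", "holds", "residence permit"),
])
print(classes)
print(properties)
```

Lists are used instead of sets so that classes and properties keep the order in which they were first seen in the text.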
The above step S2 comprises:
S2.1, splitting the text into several sentences;
S2.2, performing syntactic and semantic analysis on the input sentences to obtain the corresponding syntactic-semantic structure of each sentence.
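Step S2.1 can be approximated by splitting on sentence-ending punctuation. The sketch below assumes that Chinese and Western terminators (。！？ and . ! ?) mark sentence boundaries; this is a simplification that would, for example, wrongly split decimal numbers, and `split_sentences` is a hypothetical name. The syntactic and semantic analysis of step S2.2 needs a real parser and is not sketched here.

```python
import re
from typing import List

# One run of non-terminator characters followed by any terminators.
_SENTENCE = re.compile(r"[^。！？!?.]+[。！？!?.]*")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, keeping the terminating punctuation."""
    pieces = _SENTENCE.findall(text)
    return [piece.strip() for piece in pieces if piece.strip()]


chinese = split_sentences("市民来电咨询。孩子是否需要办理居住证？")
english = split_sentences("A citizen calls. Does the child need a permit? ")
print(chinese)
print(english)
```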
The above step S3 comprises:
S3.1, extracting all phrases in the syntactic-semantic structure, performing semantic analysis on them, filtering out the phrases that have a counterpart in the other proprietary ontology libraries, and keeping the phrases that have none;
S3.2, analyzing the relationships among the phrases kept after filtering to obtain the phrase relationships.
In summary, the system and method for automatically generating a proprietary ontology according to the present invention process the documents of a proprietary domain with natural language understanding technology to obtain a large number of phrases in that domain, and learn and establish the proprietary ontology automatically from these phrases and the relationships between them, which solves the problems of time consumption and inconsistency.
Although the content of the present invention has been described in detail through the above preferred embodiment, it should be understood that the above description is not to be considered a limitation of the present invention. Various modifications and substitutions will be apparent to those skilled in the art after reading the above content. Therefore, the scope of protection of the present invention should be defined by the appended claims.

Claims (7)

1. A proprietary ontology automatic generation system, characterized by comprising:
a text database for storing text data;
a natural language understanding module, whose input is connected to the text database, for splitting the text data into several sentences and analyzing the sentences to obtain the syntactic-semantic structure of each sentence;
a phrase analysis module, whose input is connected to the output of the natural language understanding module, for obtaining the corresponding phrases and phrase relationships from the syntactic-semantic structure of the sentences;
an identification and suggestion module and a proprietary ontology library to be established, the input of the identification and suggestion module being connected to the phrase analysis module, for identifying the phrases and phrase relationships as classes and properties of the proprietary ontology to be established and placing them in the proprietary ontology library to be established.
2. The proprietary ontology automatic generation system according to claim 1, characterized by further comprising other proprietary ontology libraries, connected to the identification and suggestion module, for storing by default phrases that have already been established.
3. The proprietary ontology automatic generation system according to claim 1, characterized in that the natural language understanding module comprises:
a sentence splitting unit for splitting the text into several sentences;
a sentence analysis unit for performing syntactic and semantic analysis on the input sentences to obtain the corresponding syntactic-semantic structure of each sentence.
4. The proprietary ontology automatic generation system according to claim 3, characterized in that the phrase analysis module comprises:
a phrase semantic analysis and filtering unit for extracting all phrases in the syntactic-semantic structure, performing semantic analysis on them, filtering out the phrases that have a counterpart in the other proprietary ontology libraries, and keeping the phrases that have none;
an inter-phrase relationship analysis unit for analyzing the relationships among the phrases kept after filtering to obtain the phrase relationships.
5. A proprietary ontology automatic generation method, characterized in that the method comprises the following steps:
S1, storing text data;
S2, splitting the text data into several sentences and analyzing the sentences to obtain the syntactic-semantic structure of each sentence;
S3, obtaining the corresponding phrases and phrase relationships from the syntactic-semantic structure of the sentences;
S4, identifying the phrases and phrase relationships as classes and properties of the proprietary ontology to be established and placing them in the proprietary ontology library to be established.
6. The proprietary ontology automatic generation method according to claim 5, characterized in that the step S2 comprises:
S2.1, splitting the text into several sentences;
S2.2, performing syntactic and semantic analysis on the input sentences to obtain the corresponding syntactic-semantic structure of each sentence.
7. The proprietary ontology automatic generation method according to claim 6, characterized in that the step S3 comprises:
S3.1, extracting all phrases in the syntactic-semantic structure, performing semantic analysis on them, filtering out the phrases that have a counterpart in the other proprietary ontology libraries, and keeping the phrases that have none;
S3.2, analyzing the relationships among the phrases kept after filtering to obtain the phrase relationships.
CN201710383135.4A 2017-05-26 2017-05-26 System and method for automatically generating a proprietary ontology Pending CN108959240A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710383135.4A CN108959240A (en) 2017-05-26 2017-05-26 System and method for automatically generating a proprietary ontology

Publications (1)

Publication Number Publication Date
CN108959240A (en) 2018-12-07

Family

ID=64494529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710383135.4A Pending CN108959240A (en) 2017-05-26 2017-05-26 System and method for automatically generating a proprietary ontology

Country Status (1)

Country Link
CN (1) CN108959240A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101446942A (en) * 2008-12-10 2009-06-03 苏州大学 Semantic character labeling method of natural language sentence
CN101710343A (en) * 2009-12-11 2010-05-19 北京中机科海科技发展有限公司 Body automatic build system and method based on text mining
CN102439590A (en) * 2009-03-13 2012-05-02 发明机器公司 System and method for automatic semantic labeling of natural language texts
CN102609512A (en) * 2012-02-07 2012-07-25 北京中机科海科技发展有限公司 System and method for heterogeneous information mining and visual analysis
CN102609402A (en) * 2012-01-12 2012-07-25 北京航空航天大学 Device and method for generation and management of ontology model based on real-time strategy
CN103119585A (en) * 2010-12-17 2013-05-22 北京交通大学 Device for acquiring knowledge and method thereof
US20140163955A1 (en) * 2012-12-10 2014-06-12 General Electric Company System and Method For Extracting Ontological Information From A Body Of Text
CN105808525A (en) * 2016-03-29 2016-07-27 国家计算机网络与信息安全管理中心 Domain concept hypernym-hyponym relation extraction method based on similar concept pairs
CN106155999A (en) * 2015-04-09 2016-11-23 科大讯飞股份有限公司 Semantics comprehension on natural language method and system
US20160364377A1 (en) * 2015-06-12 2016-12-15 Satyanarayana Krishnamurthy Language Processing And Knowledge Building System

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李蓉蓉: "Research on Patent Ontology Construction Methods for Complex Semantics", China Doctoral Dissertations Full-text Database *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20181207)