CN109145071A - Automated construction method and system for a geophysics-domain knowledge graph - Google Patents

Automated construction method and system for a geophysics-domain knowledge graph

Info

Publication number
CN109145071A
Authority
CN
China
Prior art keywords
relationship
knowledge
geophysics
geophysics field
indicator words
Prior art date
2018-08-06
Legal status
Granted
Application number
CN201810883507.4A
Other languages
Chinese (zh)
Other versions
CN109145071B (en)
Inventor
董理君 (Dong Lijun)
姚宏 (Yao Hong)
赵东阳 (Zhao Dongyang)
康晓军 (Kang Xiaojun)
李新川 (Li Xinchuan)
郑坤 (Zheng Kun)
Current Assignee
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date
2018-08-06
Filing date
2018-08-06
Publication date
2019-01-04
Application filed by China University of Geosciences
Priority to CN201810883507.4A
Publication of CN109145071A
Application granted
Publication of CN109145071B
Status: Active
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an automated construction method for a geophysics-domain knowledge graph. The method proceeds in the following order:

1. A concept knowledge base of the geophysics domain is established.
2. A relation indicator dictionary is established that lists the indicator words for each kind of relation in the geophysics domain.
3. A geophysics-domain knowledge data set is obtained and processed with NLP tools.
4. The labeled geophysics concepts are used to recognize entities in the text, and candidate entity pairs are identified based on word distance and entity distance.
5. Part-of-speech tags and position information are used to generate a set of candidate relation indicator words, which still contains noise. This set is filtered with the relation indicator dictionary.
6. The predefined indicator words of each relation are converted into vectors and compared for similarity with vectors built from the candidate indicator words. The relation whose indicator words are most similar is selected.
7. The resulting structured data are imported into the graph database Neo4j to build the geophysics-domain knowledge graph.

Description

Automated construction method and system for a geophysics-domain knowledge graph
Technical field
The present invention relates to an automated construction method and system for a geophysics-domain knowledge graph.
Background art
Theoretical research in geophysics keeps deepening and its fields of application keep expanding, so the body of knowledge in the discipline grows steadily. However, this knowledge is scattered across many sources, and geophysics knowledge therefore lacks systematic organization. In addition, storing knowledge as linear text hinders its rapid flow between people and the outside world, and it fails to meet people's need for quick access to knowledge. The tension is sharpest in the big-data era. People want rapid access to massive amounts of knowledge. Against this stand two obstacles: the scattered distribution of knowledge makes information hard to acquire, and its linear representation makes it slow to understand. This contradiction has become increasingly prominent.
To solve these problems, this patent proposes an automated method for building a knowledge graph of the geophysics domain. The input is unstructured geophysics text. The output is structured knowledge data, namely the commonly used triple form.
Many automated methods for building knowledge graphs already exist, but most of them extract triples only for a set of pre-specified relations. That approach is unsuitable for specialized domains with many, more complex relations. Open triple extraction has been studied mostly for English, and research on open triple extraction for Chinese is comparatively scarce. Moreover, Chinese and English differ considerably as languages. English methods therefore cannot be transplanted directly to Chinese, and when they are, their precision is low.
Summary of the invention
The technical problem addressed by the invention is the shortcomings of current open triple extraction techniques described above. The invention provides a method and system for automatically building a geophysics-domain knowledge graph. It combines three elements: the structural characteristics of theoretical knowledge in geophysics, an established "concept knowledge base" and "relation indicator dictionary", and a similarity-matching algorithm between the generated "candidate relation indicator phrases" and each "relation indicator phrase".
An automated construction method for a geophysics-domain knowledge graph comprises:
Step 1: establishing a concept knowledge base containing the specialized vocabulary of the geophysics domain;
Step 2: establishing a knowledge data set containing unstructured geophysics text;
Step 3: from the knowledge data set established in step 2, obtaining all relations contained in the data set and the relation indicator words corresponding to each relation, and establishing the geophysics relation indicator dictionary;
Step 4: performing NLP processing on the knowledge data set using the concept knowledge base, including word segmentation, part-of-speech tagging and geophysics-domain entity recognition;
Step 5: determining whether a relation exists between any two entities recognized in step 4 and, if one exists, obtaining the relation between the two entities;
Step 6: extracting the nouns and verbs located between the two entities and after them as candidate relation indicator words, which can express the relation between the two entities obtained in step 5;
Step 7: denoising the candidate relation indicator words extracted in step 6 with the relation indicator dictionary established in step 3, obtaining high-precision candidate relation indicator words;
Step 8: converting the relation indicator dictionary and the high-precision candidate relation indicator words obtained in step 7 into vectors and computing their mutual similarity. The relation whose indicator word is most similar to the high-precision candidates is chosen as the relation between the two entities, finally yielding structured knowledge data;
Step 9: importing the structured knowledge data obtained in step 8 into a graph database to build the geophysics-domain knowledge graph automatically.
Further, in step 2 the knowledge data set is established with the Scrapy crawler framework.
Further, in step 3 all relations contained in the knowledge data set and their corresponding relation indicator words are obtained by exhaustive enumeration.
Further, in step 5 a relation is determined to exist between two entities when two conditions hold: the word distance between them does not exceed a preset maximum distance, and the number of entities between them is less than a preset minimum distance;
Further, in step 8 the high-precision candidate relation indicator words are converted into vectors with the Bag-of-words method;
Further, the structured knowledge data finally obtained in step 8 are triples.
An automated construction system for a geophysics-domain knowledge graph comprises:
Vocabulary acquisition module: for establishing a concept knowledge base containing the specialized vocabulary of the geophysics domain;
Text collection module: for establishing a knowledge data set containing unstructured geophysics text;
Relation acquisition module: for obtaining, from the knowledge data set established in step 2, all relations contained in the data set and their corresponding relation indicator words, and establishing the geophysics relation indicator dictionary;
Entity recognition module: for performing NLP processing on the knowledge data set using the concept knowledge base, including word segmentation, part-of-speech tagging and geophysics-domain entity recognition;
Relation recognition module: for determining whether a relation exists between any two entities recognized in step 4 and, if one exists, obtaining the relation between the two entities;
Indicator word extraction module: for extracting the nouns or verbs located between the two entities and after them as candidate relation indicator words, which can express the relation between the two entities obtained in step 5;
Indicator word denoising module: for denoising, with the relation indicator dictionary established in step 3, the candidate relation indicator words extracted in step 6, obtaining high-precision candidate relation indicator words;
Relation computation module: for converting the relation indicator dictionary and the high-precision candidate relation indicator words obtained in step 7 into vectors and computing their mutual similarity. The module chooses the relation whose indicator word is most similar to the high-precision candidates as the relation between the two entities, finally yielding structured knowledge data;
Automatic construction module: for importing the structured knowledge data obtained in step 8 into a graph database to build the geophysics-domain knowledge graph automatically.
The knowledge graph of specialized theory built by the invention can speed up the flow of knowledge between people, and between people and machines. The structured geophysics knowledge data lay a foundation for two goals: letting machines understand human knowledge through representation learning, and providing intelligent knowledge services such as intelligent question answering and intelligent dialogue.
Brief description of the drawings
The invention is further described below with reference to the accompanying drawings and embodiments, in which:
Fig. 1 is a flow chart of the automated construction method for a geophysics-domain knowledge graph of the invention;
Fig. 2 shows the geophysics knowledge graph produced by the invention.
Detailed description of the embodiments
For a clearer understanding of the technical features, objects and effects of the invention, specific embodiments are now described in detail with reference to the drawings.
The automated construction method for a geophysics-domain knowledge graph is implemented in the following steps:
Step 1: Establish the concept knowledge base of the geophysics domain, which contains the specialized vocabulary of the field. Load it into the Language Technology Platform (LTP) of Harbin Institute of Technology.
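As a minimal sketch of step 1, the concept knowledge base can be held as a de-duplicated term list and exported as a plain-text user lexicon. The sketch assumes the LTP build in use accepts such a file with one term per line; check the LTP documentation for the actual loading call, which is not shown here. The helper names and the file name are hypothetical. English glosses stand in for the Chinese terms a real lexicon would contain.

```python
from pathlib import Path
import tempfile


def build_concept_base(raw_terms):
    """Strip, de-duplicate and sort domain terms, longest first."""
    terms = {t.strip() for t in raw_terms if t and t.strip()}
    # Longest-first order suits later longest-match lookups; ties sort alphabetically.
    return sorted(terms, key=lambda t: (-len(t), t))


def write_lexicon(terms, path):
    """Write one term per line in UTF-8 (assumed user-lexicon format)."""
    Path(path).write_text("\n".join(terms) + "\n", encoding="utf-8")


# English glosses of geophysics concepts (illustrative input, not from the patent's base).
raw = ["gravity anomaly", " earth gravitational field", "geophysics", "gravity anomaly", ""]
concepts = build_concept_base(raw)

with tempfile.TemporaryDirectory() as tmp:
    lexicon_path = Path(tmp) / "geophysics_lexicon.txt"  # hypothetical file name
    write_lexicon(concepts, lexicon_path)
    lines = lexicon_path.read_text(encoding="utf-8").splitlines()

print(lines)  # ['earth gravitational field', 'gravity anomaly', 'geophysics']
```

Sorting longest-first is a convenience for dictionary matching. It is not something the patent specifies.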
Step 2: Establish the geophysics knowledge data set, which contains unstructured geophysics text, using the Scrapy crawler framework. With the concept knowledge base from step 1, multiple entities can be extracted from the data set, for example concepts such as "gravitational field", "gravity anomaly" and the "Ke its Hough interface". Each entity, such as "earth gravitational field" or "geophysics", can be recognized by using the step 1 concept knowledge base as supervision data. The relation between two such entities, such as "research branch", cannot be recognized this way. Relations between entities are contained in the knowledge data set itself, for example in the sentence "The earth gravitational field is one of the important branches of geophysics research". They must be mined automatically by the method of step 3 rather than by hand.
Step 3: From the knowledge data set established in step 2, obtain by exhaustive enumeration all relations contained in the data set and the relation indicator words for each. Use them to establish the geophysics relation indicator dictionary. For example, the relation between the entity "geophysics" and the entity "earth gravitational field" is "research branch", and its relation indicator words can be "research" and "branch". Consider the unstructured text "The earth gravitational field is one of the important branches of geophysics research". From step 5 onward, the two entities are recognized and the indicator words "research" and "branch" are found in the text. The relation between the two entities is therefore determined to be "research branch", yielding the triple (geophysics, research branch, earth gravitational field). The relation indicator dictionary thus provides the basis for step 8, which infers the relation backward from the indicator words found in the unstructured text.
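The relation indicator dictionary of step 3 maps each relation to its indicator words. The sketch below also builds the reverse lookup (indicator word to relations), which is what the backward inference described above relies on. Only the "research branch" entry comes from the patent's example; the second entry and the function name are hypothetical. English glosses replace the Chinese words.

```python
from collections import defaultdict

# Hypothetical excerpt of the dictionary (English glosses); only "research branch"
# comes from the patent's example, the other entry is illustrative.
RELATION_INDICATORS = {
    "research branch": ["research", "branch"],
    "component of": ["component", "part"],
}


def build_reverse_index(relation_indicators):
    """Map each indicator word to the set of relations it can signal."""
    index = defaultdict(set)
    for relation, words in relation_indicators.items():
        for word in words:
            index[word].add(relation)
    return dict(index)  # plain dict, so lookups of unknown words do not add keys


index = build_reverse_index(RELATION_INDICATORS)
print(index.get("branch"))   # {'research branch'}
print(index.get("gravity"))  # None: not an indicator word
```

Using a set per word allows one indicator word to signal several relations. Resolving that ambiguity is left to the similarity step.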
Step 4: Use the HIT Language Technology Platform (LTP), loaded with the concept knowledge base, to perform NLP processing on the knowledge data set. This covers word segmentation, part-of-speech tagging and geophysics-domain entity recognition.
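The patent performs step 4 with LTP, and the sketch below does not call LTP. It assumes segmentation and POS tagging have already been done. It illustrates only the dictionary-lookup part of entity recognition, using a greedy longest match so that a domain term split into several tokens is still recognized. The tokens are English glosses of the step 3 example sentence in Chinese word order, with LTP-style tags (assumed). The names `recognize_entities`, `joiner` and `max_span` are hypothetical.

```python
def recognize_entities(tokens, concept_base, joiner=" ", max_span=4):
    """Greedy longest-match lookup of concept-base terms over segmented tokens.

    tokens: list of (word, pos) pairs from a segmenter/POS tagger.
    Returns (start, end, term) triples with `end` exclusive.
    Use joiner="" for Chinese, where a term is the concatenation of its tokens.
    """
    words = [word for word, _pos in tokens]
    entities, i = [], 0
    while i < len(words):
        # Try the longest span first so multi-token terms win over their parts.
        for span in range(min(max_span, len(words) - i), 0, -1):
            candidate = joiner.join(words[i:i + span])
            if candidate in concept_base:
                entities.append((i, i + span, candidate))
                i += span
                break
        else:
            i += 1  # no term starts here; move on one token
    return entities


concept_base = {"earth gravitational field", "geophysics", "gravity anomaly"}
# Illustrative tokens for the example sentence, with assumed LTP-style tags.
tokens = [("earth", "n"), ("gravitational", "n"), ("field", "n"), ("is", "v"),
          ("geophysics", "n"), ("research", "v"), ("in", "nd"), ("of", "u"),
          ("important", "a"), ("branch", "n"), ("one-of", "r")]
print(recognize_entities(tokens, concept_base))
# [(0, 3, 'earth gravitational field'), (4, 5, 'geophysics')]
```

When the lexicon is loaded into the segmenter, most domain terms should already arrive as single tokens. In that case the lookup reduces to a set-membership test per token.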
Step 5: Determine whether a relation exists between any two entities recognized in step 4. A relation is assumed to exist between two entities when two conditions hold: the word distance between them does not exceed a preset maxDistance, and the number of entities between them is less than a preset maxEntityDistance. The rationale is that the shorter the word distance and the fewer the entities between two entities, the more likely they are related.
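The step 5 filter can be sketched as a loop over all ordered entity pairs in a sentence. The patent names maxDistance and maxEntityDistance but gives no values, so the defaults below are placeholders. The sketch also makes two interpretive assumptions. "Word distance" is taken as the number of tokens strictly between the two entities. "Number of entities" is taken as the count of other recognized entities lying between them.

```python
from itertools import combinations


def candidate_pairs(entities, max_distance=10, max_entity_distance=2):
    """Keep entity pairs that are close enough to be assumed related.

    entities: (start, end, term) tuples sorted by start, `end` exclusive.
    max_distance / max_entity_distance mirror maxDistance / maxEntityDistance;
    the default values are placeholders, not values from the patent.
    """
    pairs = []
    for (i, head), (j, tail) in combinations(enumerate(entities), 2):
        word_gap = tail[0] - head[1]        # tokens strictly between the entities
        entities_between = j - i - 1        # other entities lying in between
        if word_gap <= max_distance and entities_between < max_entity_distance:
            pairs.append((head[2], tail[2]))
    return pairs


entities = [(0, 3, "earth gravitational field"), (4, 5, "geophysics"),
            (30, 32, "gravity anomaly")]
print(candidate_pairs(entities))
# [('earth gravitational field', 'geophysics')]
```

Both thresholds trade recall for precision. Larger values admit more pairs and push more noise into the later steps.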
Step 6: Extract the nouns and verbs located between the entity pair and after it as candidate relation indicator words, which can express the relation between the two entities identified in step 5. The candidates are distributed as follows:
- About 70% lie between the two entities.
- 10%-20% lie after the two entities.
- The small remainder lie before the first entity or are absent altogether.

These candidates mostly appear as nouns or verbs.
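A minimal sketch of step 6 collects nouns and verbs from two regions: between the pair, and a window after the second entity. The sketch assumes LTP-style tags, where noun tags begin with "n" and "v" marks verbs. The window size `after_window` is a hypothetical parameter; the patent does not say how far past the second entity to look. The tokens gloss the step 3 example sentence in Chinese word order, with domain terms already merged into single tokens.

```python
def candidate_indicators(tokens, head, tail, after_window=6):
    """Collect nouns and verbs between an entity pair and just after it.

    tokens: (word, pos) pairs; head/tail: (start, end) token offsets, end exclusive.
    after_window is a hypothetical cap on how far past the tail entity to look.
    """
    def is_noun_or_verb(pos):
        return pos.startswith("n") or pos == "v"  # LTP-style tags assumed

    between = tokens[head[1]:tail[0]]
    after = tokens[tail[1]:tail[1] + after_window]
    return [word for word, pos in between + after if is_noun_or_verb(pos)]


# Illustrative tokens after domain terms have been merged into single tokens.
tokens = [("earth gravitational field", "n"), ("is", "v"), ("geophysics", "n"),
          ("research", "v"), ("in", "nd"), ("of", "u"), ("important", "a"),
          ("branch", "n"), ("one-of", "r")]
print(candidate_indicators(tokens, head=(0, 1), tail=(2, 3)))
# ['is', 'research', 'in', 'branch']
```

The output still contains noise such as "is" and the direction noun "in". This is the noisy candidate set that step 7 is meant to clean.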
Step 7: Denoise the candidate relation indicator words extracted in step 6 with the relation indicator dictionary established in step 3, obtaining high-precision candidate relation indicator words.
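The patent does not spell out the denoising rule. One minimal interpretation, assumed here, keeps only candidates that occur somewhere in the relation indicator dictionary, preserving their order and dropping repeats. The dictionary contents and the function name are illustrative.

```python
def denoise(candidates, relation_indicators):
    """Keep candidates that occur in the relation indicator dictionary.

    Assumed rule (not specified in the patent): dictionary membership.
    Order of first appearance is preserved and duplicates are dropped.
    """
    vocabulary = {w for words in relation_indicators.values() for w in words}
    kept, seen = [], set()
    for word in candidates:
        if word in vocabulary and word not in seen:
            seen.add(word)
            kept.append(word)
    return kept


# Illustrative dictionary excerpt (English glosses).
relation_indicators = {"research branch": ["research", "branch"],
                       "component of": ["component", "part"]}
print(denoise(["is", "research", "in", "branch", "research"], relation_indicators))
# ['research', 'branch']
```

A stricter filter could also weight candidates by position, given the distribution described in step 6. That would be an extension, not the patent's stated method.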
Step 8: Convert the relation indicator dictionary (the indicator words of each relation) and the high-precision candidates from step 7 into vectors with the Bag-of-words method, and compute their mutual similarity. Choose the relation whose indicator word is most similar to the high-precision candidates as the relation between the two entities. This finally yields structured knowledge data, namely triples.
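The patent names Bag-of-words vectors but neither the vector space nor the similarity measure. The sketch below makes two assumptions:
- Each relation's indicator words form one count vector, and the filtered candidates form another, over a shared vocabulary.
- Cosine similarity is used, and the best-scoring relation is returned.

The relation names and function names are illustrative.

```python
import math
from collections import Counter


def bow_vector(words, vocabulary):
    """Bag-of-words count vector over a fixed, ordered vocabulary."""
    counts = Counter(words)
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_relation(candidates, relation_indicators):
    """Return (relation, score) for the best-matching relation, or (None, 0.0)."""
    vocabulary = sorted({w for ws in relation_indicators.values() for w in ws}
                        | set(candidates))
    query = bow_vector(candidates, vocabulary)
    scores = {rel: cosine(query, bow_vector(ws, vocabulary))
              for rel, ws in relation_indicators.items()}
    best = max(scores, key=scores.get)
    return (best, scores[best]) if scores[best] > 0 else (None, 0.0)


# Illustrative dictionary excerpt (English glosses).
relation_indicators = {"research branch": ["research", "branch"],
                       "component of": ["component", "part"],
                       "measured by": ["measure", "instrument"]}
print(best_relation(["research", "branch"], relation_indicators))
# ('research branch', 1.0)
```

Returning `None` when nothing overlaps keeps unrelated pairs out of the triple output instead of forcing an arbitrary relation.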
Step 9: Import the triples obtained in step 8 into the graph database Neo4j to build the geophysics-domain knowledge graph automatically.
Once the structured triples are imported into Neo4j, the knowledge graph can be visualized, as shown in Fig. 2.
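One possible way to perform the step 9 import is to turn each triple into a parameterized Cypher `MERGE` statement. Cypher cannot parameterize relationship types, so this sketch stores the relation name as a property on a generic edge. The `Concept` label, the `RELATION` type and the `name` property are hypothetical choices, not taken from the patent.

```python
def triple_to_cypher(head, relation, tail):
    """Build a parameterized Cypher MERGE statement for one (head, relation, tail) triple.

    Relationship types cannot be passed as parameters, so the relation name is a
    property on a generic :RELATION edge. Labels and property names are hypothetical.
    """
    query = (
        "MERGE (h:Concept {name: $head}) "
        "MERGE (t:Concept {name: $tail}) "
        "MERGE (h)-[:RELATION {name: $relation}]->(t)"
    )
    return query, {"head": head, "relation": relation, "tail": tail}


# The triple from the patent's step 3 example.
triples = [("geophysics", "research branch", "earth gravitational field")]
statements = [triple_to_cypher(*t) for t in triples]
for query, params in statements:
    print(query, params)
```

With the official `neo4j` Python driver, each statement could be executed as `session.run(query, params)` inside a `driver.session()` block. Because `MERGE` matches existing nodes and edges before creating them, re-running the import should not duplicate the graph.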
The embodiments of the invention have been described above with reference to the drawings, but the invention is not limited to these specific embodiments, which are illustrative rather than restrictive. Inspired by the invention, those skilled in the art can devise many other forms without departing from the purpose of the invention and the scope protected by the claims. All such forms fall within the protection of the invention.

Claims (7)

1. An automated construction method for a geophysics-domain knowledge graph, characterized by comprising:
Step 1: establishing a concept knowledge base containing the specialized vocabulary of the geophysics domain;
Step 2: establishing a knowledge data set containing unstructured geophysics text;
Step 3: from the knowledge data set established in step 2, obtaining all relations contained in the data set and the relation indicator words corresponding to each relation, and establishing the geophysics relation indicator dictionary;
Step 4: performing NLP processing on the knowledge data set using the concept knowledge base, including word segmentation, part-of-speech tagging and geophysics-domain entity recognition;
Step 5: determining whether a relation exists between any two entities recognized in step 4 and, if one exists, obtaining the relation between the two entities;
Step 6: extracting the nouns or verbs located between the two entities and after them as candidate relation indicator words, which can express the relation between the two entities obtained in step 5;
Step 7: denoising the candidate relation indicator words extracted in step 6 with the relation indicator dictionary established in step 3, obtaining high-precision candidate relation indicator words;
Step 8: converting the relation indicator dictionary and the high-precision candidate relation indicator words obtained in step 7 into vectors and computing their mutual similarity. The relation whose indicator word is most similar to the high-precision candidates is chosen as the relation between the two entities, finally yielding structured knowledge data;
Step 9: importing the structured knowledge data obtained in step 8 into a graph database to build the geophysics-domain knowledge graph automatically.
2. The automated construction method for a geophysics-domain knowledge graph according to claim 1, characterized in that, in step 2, the knowledge data set is established with the Scrapy crawler framework.
3. The automated construction method for a geophysics-domain knowledge graph according to claim 1, characterized in that, in step 3, all relations contained in the knowledge data set and their corresponding relation indicator words are obtained by exhaustive enumeration.
4. The automated construction method for a geophysics-domain knowledge graph according to claim 1, characterized in that, in step 5, a relation is determined to exist between two entities when two conditions hold: the word distance between them does not exceed a preset maximum distance, and the number of entities between them is less than a preset minimum distance.
5. The automated construction method for a geophysics-domain knowledge graph according to claim 1, characterized in that, in step 8, the high-precision candidate relation indicator words are converted into vectors with the Bag-of-words method.
6. The automated construction method for a geophysics-domain knowledge graph according to claim 1, characterized in that the structured knowledge data finally obtained in step 8 are triples.
7. An automated construction system for a geophysics-domain knowledge graph, characterized by comprising:
Vocabulary acquisition module: for establishing a concept knowledge base containing the specialized vocabulary of the geophysics domain;
Text collection module: for establishing a knowledge data set containing unstructured geophysics text;
Relation acquisition module: for obtaining, from the knowledge data set established in step 2, all relations contained in the data set and their corresponding relation indicator words, and establishing the geophysics relation indicator dictionary;
Entity recognition module: for performing NLP processing on the knowledge data set using the concept knowledge base, including word segmentation, part-of-speech tagging and geophysics-domain entity recognition;
Relation recognition module: for determining whether a relation exists between any two entities recognized in step 4 and, if one exists, obtaining the relation between the two entities;
Indicator word extraction module: for extracting the nouns or verbs located between the two entities and after them as candidate relation indicator words, which can express the relation between the two entities obtained in step 5;
Indicator word denoising module: for denoising, with the relation indicator dictionary established in step 3, the candidate relation indicator words extracted in step 6, obtaining high-precision candidate relation indicator words;
Relation computation module: for converting the relation indicator dictionary and the high-precision candidate relation indicator words obtained in step 7 into vectors and computing their mutual similarity. The module chooses the relation whose indicator word is most similar to the high-precision candidates as the relation between the two entities, finally yielding structured knowledge data;
Automatic construction module: for importing the structured knowledge data obtained in step 8 into a graph database to build the geophysics-domain knowledge graph automatically.
CN201810883507.4A 2018-08-06 2018-08-06 Automatic construction method and system for geophysical field knowledge graph Active CN109145071B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810883507.4A CN109145071B (en) 2018-08-06 2018-08-06 Automatic construction method and system for geophysical field knowledge graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810883507.4A CN109145071B (en) 2018-08-06 2018-08-06 Automatic construction method and system for geophysical field knowledge graph

Publications (2)

Publication Number Publication Date
CN109145071A true CN109145071A (en) 2019-01-04
CN109145071B CN109145071B (en) 2021-08-27

Family

ID=64791709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810883507.4A Active CN109145071B (en) 2018-08-06 2018-08-06 Automatic construction method and system for geophysical field knowledge graph

Country Status (1)

Country Link
CN (1) CN109145071B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933789A (en) * 2019-02-27 2019-06-25 中国地质大学(武汉) A kind of judicial domain Relation extraction method and system neural network based
CN110222198A (en) * 2019-06-18 2019-09-10 卓尔智联(武汉)研究院有限公司 Non-ferrous metal industry knowledge mapping construction method, electronic device and storage medium
CN110222196A (en) * 2019-06-18 2019-09-10 卓尔智联(武汉)研究院有限公司 Fishery knowledge mapping construction device, method and computer readable storage medium
CN112559765A (en) * 2020-12-11 2021-03-26 中电科大数据研究院有限公司 Multi-source heterogeneous database semantic integration method

Citations (7)

Publication number Priority date Publication date Assignee Title
CN105760425A (en) * 2016-01-17 2016-07-13 曲阜师范大学 Ontology data storage method
CN105760495A (en) * 2016-02-17 2016-07-13 扬州大学 Method for carrying out exploratory search for bug problem based on knowledge map
US20160210372A1 (en) * 2013-09-29 2016-07-21 Peking University Founder Group Co., Ltd. Method and system for obtaining knowledge point implicit relationship
US20160292304A1 (en) * 2015-04-01 2016-10-06 Tata Consultancy Services Limited Knowledge representation on action graph database
CN106844658A (en) * 2017-01-23 2017-06-13 中山大学 A kind of Chinese text knowledge mapping method for auto constructing and system
EP3051434A4 (en) * 2013-09-29 2017-06-14 Peking University Founder Group Co., Ltd Method and system for measurement of knowledge point relationship strength
CN107609152A (en) * 2017-09-22 2018-01-19 百度在线网络技术(北京)有限公司 Method and apparatus for expanding query formula

Patent Citations (8)

Publication number Priority date Publication date Assignee Title
US20160210372A1 (en) * 2013-09-29 2016-07-21 Peking University Founder Group Co., Ltd. Method and system for obtaining knowledge point implicit relationship
EP3051435A1 (en) * 2013-09-29 2016-08-03 Peking University Founder Group Co., Ltd Method and system for obtaining a knowledge point implicit relationship
EP3051434A4 (en) * 2013-09-29 2017-06-14 Peking University Founder Group Co., Ltd Method and system for measurement of knowledge point relationship strength
US20160292304A1 (en) * 2015-04-01 2016-10-06 Tata Consultancy Services Limited Knowledge representation on action graph database
CN105760425A (en) * 2016-01-17 2016-07-13 曲阜师范大学 Ontology data storage method
CN105760495A (en) * 2016-02-17 2016-07-13 扬州大学 Method for carrying out exploratory search for bug problem based on knowledge map
CN106844658A (en) * 2017-01-23 2017-06-13 中山大学 A kind of Chinese text knowledge mapping method for auto constructing and system
CN107609152A (en) * 2017-09-22 2018-01-19 百度在线网络技术(北京)有限公司 Method and apparatus for expanding query formula

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN109933789A (en) * 2019-02-27 2019-06-25 中国地质大学(武汉) A kind of judicial domain Relation extraction method and system neural network based
CN109933789B (en) * 2019-02-27 2021-04-13 中国地质大学(武汉) Neural network-based judicial domain relation extraction method and system
CN110222198A (en) * 2019-06-18 2019-09-10 卓尔智联(武汉)研究院有限公司 Non-ferrous metal industry knowledge mapping construction method, electronic device and storage medium
CN110222196A (en) * 2019-06-18 2019-09-10 卓尔智联(武汉)研究院有限公司 Fishery knowledge mapping construction device, method and computer readable storage medium
CN112559765A (en) * 2020-12-11 2021-03-26 中电科大数据研究院有限公司 Multi-source heterogeneous database semantic integration method
CN112559765B (en) * 2020-12-11 2023-06-16 中电科大数据研究院有限公司 Semantic integration method for multi-source heterogeneous database

Also Published As

Publication number Publication date
CN109145071B (en) 2021-08-27

Similar Documents

Publication Publication Date Title
CN109145071A (en) A kind of automated construction method and system towards geophysics field knowledge mapping
CN106980858B (en) Language text detection and positioning system and language text detection and positioning method using same
CN109065021B (en) End-to-end dialect identification method for generating countermeasure network based on conditional deep convolution
WO2022198854A1 (en) Method and apparatus for extracting multi-modal poi feature
CN106777275A (en) Entity attribute and property value extracting method based on many granularity semantic chunks
CN104077447B (en) Urban three-dimensional space vector modeling method based on paper plane data
CN107862300A (en) A kind of descending humanized recognition methods of monitoring scene based on convolutional neural networks
CN109658271A (en) A kind of intelligent customer service system and method based on the professional scene of insurance
CN113239210A (en) Water conservancy literature recommendation method and system based on automatic completion knowledge graph
CN108804608A (en) A kind of microblogging rumour position detection method based on level attention
CN102929860B (en) Chinese clause emotion polarity distinguishing method based on context
CN112465144B (en) Multi-mode demonstration intention generation method and device based on limited knowledge
CN110188359B (en) Text entity extraction method
WO2019127102A1 (en) Information processing method and apparatus, cloud processing device, and computer program product
CN110647632A (en) Image and text mapping technology based on machine learning
CN108446278A (en) A kind of semantic understanding system and method based on natural language
Srihari et al. Show&tell: A semi-automated image annotation system
CN109871449A (en) A kind of zero sample learning method end to end based on semantic description
CN107885719A (en) Vocabulary classification method for digging, device and storage medium based on artificial intelligence
CN105389303A (en) Automatic heterogenous corpus fusion method
CN110472655A (en) A kind of marker machine learning identifying system and method for border tourism
CN110069771A (en) A kind of control order information processing method based on semantic chunking
CN112395954A (en) Power transmission line specific fault recognition system based on combination of natural language model and target detection algorithm
CN116485943A (en) Image generation method, electronic device and storage medium
CN111159411A (en) Knowledge graph fused text position analysis method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant