CN109062894A - Automatic identification algorithm for entity semantic relationships in Chinese natural language - Google Patents
Automatic identification algorithm for entity semantic relationships in Chinese natural language
- Publication number
- CN109062894A CN201810796558.3A
- Authority
- CN
- China
- Prior art keywords
- entity
- text
- relationship
- natural language
- sentence
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses an algorithm for automatically identifying semantic relationships between entities in Chinese natural language. "Entity relationship" training texts are first extracted from the input raw natural language texts and stored in an "entity relationship" training text library. Texts are then read from the library; for each text, the entity set is extracted, related entity pairs are identified, and "entity relationship" sentences are constructed and stored in a training "entity relationship" statement library. Each sentence in the statement library is manually annotated, and machine learning is performed on the annotated statement library to build a model, which yields the "entity relationship" identification model. The invention also proposes an algorithm that uses this identification algorithm to generate "entity relationship" triples from a given Chinese natural language text. Because the invention identifies and constructs relationships between entities with an automatic machine learning algorithm, it overcomes the limitation that Chinese knowledge graphs can only search structured data.
Description
Technical field
The invention belongs to the field of natural language recognition and machine learning, and in particular relates to an algorithm for automatically identifying semantic relationships between entities in Chinese natural language.
Background art
In recent years, with the development of the Internet, online data content has grown explosively. Because Internet content is large in scale, heterogeneous and diverse, and loosely organized, it is challenging for people to obtain information and knowledge effectively. With its powerful semantic processing and open organization capabilities, the knowledge graph (Knowledge Graph) lays a foundation for knowledge-based organization and intelligent applications in the Internet era.
Specifically, a knowledge graph aims to describe the various entities (concepts) that exist in the real world and the relationships among them, forming a huge semantic network graph in which nodes represent entities (concepts) and edges consist of attributes or relationships. The term knowledge graph is now used to refer to various large-scale knowledge bases.
The construction of large-scale knowledge graphs, as the starting point of knowledge graph work, has attracted considerable attention in academia and industry. Knowledge extraction is the first step of knowledge graph construction, and it typically requires extracting knowledge elements such as entities, relationships, and attributes from publicly available unstructured text.
When building a Chinese knowledge graph, the unstructured text usually takes the form of Chinese natural language text, so Chinese natural language understanding becomes an important tool for building Chinese knowledge graphs. Much has been achieved in Chinese natural language understanding so far: automatic word segmentation, part-of-speech tagging, syntactic analysis, entity extraction, and so on are all supported by many software packages in China and abroad. Although these techniques have greatly strengthened the construction of Chinese knowledge graphs, identifying the relationships between entities remains an unsolved key problem in Chinese natural language understanding and a key technical obstacle to building Chinese knowledge graphs.
To understand this key technology better, one first needs to understand the concept of an entity in a knowledge graph. An entity can be something that actually exists, such as a person, a book, or a building. An entity can also be an abstract concept, such as Marxism. Chinese natural language processing tools can identify entities in Chinese natural language text, including people, times, places, organizations, and so on. However, these tools cannot identify the relationships between those entities, and identifying relationships between entities is a key step in building a Chinese knowledge graph.
For example, in a Chinese natural language text, a natural language processing tool can pick out the two entities "US Open Tennis Championships" (event) and "New York" (place), but it cannot tell how the two are associated. In fact, the US Open is held in New York. As another example, a Chinese natural language tool can identify Roger Federer as the name of a person and Shanghai as the name of a city, but it cannot identify the relationship between Federer and the city of Shanghai. In fact, the relationship is that Federer came to Shanghai to take part in the annual Masters Cup tennis tournament. So far, Chinese natural language understanding has not been able to identify such relationships, yet they are exactly what matters for building a knowledge graph.
In summary, because the relationships between entities cannot be identified, the capabilities of application systems built on such a knowledge graph, such as artificial intelligence and automatic question answering systems, are greatly restricted. If a user asks "Which cities has Roger Federer played matches in?", the knowledge graph cannot answer the question: although it can see that Federer is related to Shanghai and New York, it has no way of identifying the specific reason he is associated with those cities.
Because of these difficulties and limitations, the industry avoids extracting entity relationships when building Chinese knowledge graphs. For example, the "Baidu knowledge graph" (created by Baidu) is built on data obtained by searching structured data and does not search unstructured data (natural language text). Another well-known one, the "Sogou knowledge graph", likewise searches structured data and avoids searching unstructured data.
For English-based knowledge graphs, early entity relation extraction mainly identified entity relationships through manually constructed semantic rules and templates. These methods require a great deal of manual intervention, are cumbersome, and are not flexible enough. Later, relational models between entities gradually replaced manually predefined grammars and rules, but the relationship types between entities still had to be defined in advance. In recent years, open-domain information extraction (Open Information Extraction, OIE) has become the main research direction and has itself gone through several stages and results, such as open entity relation extraction and entity relation extraction based on joint inference. So far, however, this approach has proved unsuitable for extracting entity relationships from Chinese natural language text.
In conclusion, extracting entity relationships from Chinese natural language text is a problem that urgently needs to be solved in the field of Chinese knowledge graph construction.
Summary of the invention
The present invention aims to provide a novel algorithm that can effectively identify the relationships between entities in Chinese natural language text. The algorithm combines existing machine learning with the latest results in Chinese natural language understanding, provides reliable identification, and overcomes the limitation that Chinese knowledge graphs can only search structured data, thereby opening new possibilities for building Chinese knowledge graphs.
To achieve the above object, the technical solution adopted by the invention is an algorithm for automatically identifying semantic relationships between entities in Chinese natural language, which comprises the following steps:
S1: input raw natural language texts;
S2: extract "entity relationship" training texts from the raw natural language texts and store them in the "entity relationship" training text library;
S3: read a text from the "entity relationship" training text library;
S4: extract the entity set from the text;
S5: identify related entity pairs, extract their relational statements, and construct "entity relationship" sentences;
S6: store the constructed "entity relationship" sentences in the training "entity relationship" statement library;
S7: if every text has been read, manually annotate each sentence in the "entity relationship" statement library; otherwise return to step S3;
S8: perform machine learning on the annotated "entity relationship" statement library and build a model;
S9: the "entity relationship" identification model is established.
To improve efficiency, existing Chinese natural language processing software can be used in step S4 to extract all Chinese entity sets from the text.
As a criterion, when related entity pairs are identified in step S5, two entities can form a related entity pair only if they appear in the same sentence.
Constructing an "entity relationship" sentence in step S5 specifically means removing the entities and retaining all other content.
Preferably, the manual annotation in step S7 appends the manually annotated semantic relationship to the end of each sentence.
Preferably, the machine learning in step S8 can use a Bayesian algorithm or an SVM.
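The loop formed by steps S3 to S6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the gazetteer lookup `gazetteer_extract` is a hypothetical stand-in for a real Chinese entity extraction tool, the sample text is English for readability, and the sentence splitter is deliberately naive.

```python
import re
from itertools import combinations

def split_sentences(text):
    """Split on Chinese or Western sentence-ending punctuation (naive)."""
    return [s.strip() for s in re.split(r"[。！？.!?]", text) if s.strip()]

def build_statement_library(texts, extract_entities):
    """Steps S3-S6: collect (entity_a, entity_b, statement) records."""
    library = []
    for text in texts:                                  # S3: read a text
        entities = extract_entities(text)               # S4: entity set
        for sentence in split_sentences(text):
            present = [e for e in entities if e in sentence]
            for a, b in combinations(present, 2):       # S5: same-sentence pair
                # S5: remove the two entities, keep everything else
                statement = sentence.replace(a, "").replace(b, "")
                statement = " ".join(statement.split())
                library.append((a, b, statement))       # S6: store
    return library

# Hypothetical extractor: a fixed gazetteer stands in for a real NER tool.
GAZETTEER = ["Federer", "Shanghai", "New York"]

def gazetteer_extract(text):
    return [e for e in GAZETTEER if e in text]

library = build_statement_library(
    ["Federer came to Shanghai for the Masters Cup. New York hosts the US Open."],
    gazetteer_extract,
)
print(library)
```

Only the first sentence yields a record, because the second sentence contains a single known entity. The manual annotation of S7 and the learning of S8 would then operate on the `statement` field of each record.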
The invention also proposes an algorithm that uses the above automatic identification algorithm to generate "entity relationship" triples from a given Chinese natural language text, which comprises the following steps:
S21: input a raw natural language text;
S22: call the "text type" identification model to identify the text type and generate a text type triple;
S23: extract the entity set from the text;
S24: identify related entity pairs, extract their relational statements, and construct "entity relationship" sentences;
S25: call the "entity relationship" identification model to identify the entity relationships and generate entity relationship triples;
S26: collect all generated triples.
The "text type" identification model called in step S22 is built by the following steps:
S31: input a set of raw natural language texts;
S32: extract "text type" training texts and store them in the "text type" training text library;
S33: manually annotate the type of each text;
S34: form the annotated training text set;
S35: perform machine learning and build a model;
S36: the "text type" identification model is complete.
Compared with the prior art, the invention has the following advantages:
1. The invention proposes an algorithm that identifies and constructs relationships between entities through automatic machine learning. In the invention, the machine learning process builds the identification model by analyzing all the training "entity relationship" sentences. Because these sentences, once manually annotated, accurately express the semantic relationships between entities, the identification (classification) model trained in this way can, when faced with an entity pair it has never seen, judge the most likely semantic relationship of that unfamiliar pair with reliable accuracy.
2. Current experimental results confirm the validity and scalability of the algorithm, which fills a gap in Chinese natural language entity relation extraction.
3. The proposed algorithm overcomes the limitation that Chinese knowledge graphs can only search structured data, thereby opening new possibilities for building Chinese knowledge graphs.
Brief description of the drawings
Fig. 1 is a schematic diagram of the generation process of the "text type" identification model;
Fig. 2 is a schematic diagram of the generation process of the "entity relationship" identification model;
Fig. 3 is a schematic diagram of the process of generating "entity relationship" triples from a given natural language text.
Specific embodiments
The invention is now described in further detail with reference to the drawings.
The invention provides an algorithm that can effectively identify the relationships between entities in Chinese natural language text. Its working principle is as follows.
Algorithm input: a large number of Chinese natural language texts. Each text has only one subject: a text describing a building such as Tiananmen describes only Tiananmen, and a text describing a person describes only that person. Encyclopedia entries, for example, are texts that meet this condition.
Algorithm output: a large amount of structured data in the form of triples that conform to the international Semantic Web standard (Resource Description Framework, RDF). These triples describe the different entities and the relationships between them. The ontology used when constructing the triples is schema.org, the most widely used internationally (adopted by Google, Twitter, and other companies), but the user can also specify another one.
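To make the output format concrete, the following sketch serializes one triple as a line of N-Triples, one of the standard RDF text syntaxes. It is an illustration only: the `http://example.org/entity/` IRIs are placeholders invented here, and the source does not say which RDF serialization the invention uses. Note that schema.org spells the property `birthPlace`.

```python
from typing import NamedTuple

SCHEMA = "http://schema.org/"

class Triple(NamedTuple):
    subject: str    # IRI of the subject entity
    predicate: str  # IRI of the relationship
    obj: str        # IRI, or plain literal text

def to_ntriples(t: Triple) -> str:
    """Serialize one triple as an N-Triples line."""
    def term(value: str) -> str:
        if value.startswith(("http://", "https://")):
            return f"<{value}>"                      # IRI reference
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'                        # plain string literal
    return f"{term(t.subject)} {term(t.predicate)} {term(t.obj)} ."

# Placeholder IRIs (assumptions), not identifiers taken from the patent.
t = Triple(
    "http://example.org/entity/DengJiaxian",
    SCHEMA + "birthPlace",
    "http://example.org/entity/Huaining",
)
print(to_ntriples(t))
```

A real deployment would more likely rely on an RDF library for serialization and validation; the hand-written formatter here only shows the shape of the data.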
The implementation design and logic of the algorithm are shown in Figs. 1, 2, and 3.
Fig. 1 describes how the "text type" identification model is generated. The input is a set of raw natural language texts, each with a single subject. Part of the set is extracted to serve as "text type" training texts, which are stored in the "text type" training text library. Then, under the guidance of experts, the type of each training text in the library is manually annotated. Once annotation is complete, the annotated training text set is formed. A machine learning method then reads the annotated training text set and learns from it, which yields the "text type" identification model.
Fig. 2 describes how the "entity relationship" identification model is generated. The input is likewise a set of single-subject raw natural language texts, part of which is extracted to serve as "entity relationship" training texts. These training texts are stored in the "entity relationship" training text library, and the following operations are then performed for each training text in the library:
1. Read the current training text;
2. Extract all Chinese entity sets from the training text;
3. Among the extracted entities, identify all related entity pairs. For each related entity pair, extract its relational statement and construct the "entity relationship" sentence;
4. Store the constructed "entity relationship" sentences in the training "entity relationship" statement library.
These operations are performed on every text in the "entity relationship" training text library, which produces a (huge) training "entity relationship" statement library. Next, under the guidance of experts, the specific entity relationship of each sentence in the statement library is manually annotated. Once annotation is complete, the annotated "entity relationship" statement library is formed. A machine learning method then reads the annotated statement library and learns from it, which yields the "entity relationship" identification model.
At this point, two core models have been obtained that cover numerous text types and describe the Chinese natural language entity relationships within them: the "text type" identification model and the "entity relationship" identification model. For any given Chinese natural language text, these two core models allow a machine to extract the subject type of the text, all entities in the text and, more importantly, the semantic relationships between those entities. This process is described in detail by the algorithm in Fig. 3.
Specifically, for any given Chinese natural language text, the first step shown in Fig. 3 is to call and run the "text type" identification model to determine the basic semantic type of the subject of the text. (It is worth repeating that each text has only one subject: a text describing a building such as Tiananmen describes only Tiananmen, and a text describing a person describes only that person.) The basic semantic type obtained from running the model is recorded in the form of an RDF triple and held temporarily in machine memory.
The subsequent steps in Fig. 3 are mainly used to extract the entity relationships in the text. First, the machine extracts all Chinese entity sets from the given text and identifies all related entity pairs among the extracted entities. For each related entity pair identified, its relational statement is extracted and the corresponding "entity relationship" sentence is constructed. This sentence is used as input to call and run the "entity relationship" identification model, which extracts the semantic relationship between the given related entities. The result of running the model expresses the relationship between the related entities; it is likewise recorded in the form of an RDF triple and held temporarily in machine memory.
Finally, once the model has been called for every related entity pair identified and the corresponding semantic relationship of each pair has been extracted, the algorithm in Fig. 3 performs its last step: it collects all generated triples and stores them in the relevant database.
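The flow of Fig. 3 (steps S21 to S26) can be sketched as a single function that takes the two trained models as callables. Everything below the function is a hard-coded stand-in invented for illustration: the stub models, the `ex:playedIn` label, the use of `rdf:type` as the predicate of the text type triple, and the way the topic entity is passed in are all assumptions, since the source does not specify them.

```python
def generate_triples(text, topic, classify_type, extract_pairs, classify_relation):
    """Steps S21-S26: return (subject, predicate, object) tuples for one text."""
    triples = [(topic, "rdf:type", classify_type(text))]            # S22
    for head, tail, sentence in extract_pairs(text):                # S23-S24
        triples.append((head, classify_relation(sentence), tail))   # S25
    return triples                                                  # S26

# Hard-coded stand-ins for the two trained models and the pair extractor.
def stub_type_model(text):
    return "schema:Person"

def stub_pair_extractor(text):
    # (entity, entity, "entity relationship" sentence with entities removed)
    return [("Roger Federer", "Shanghai", "came to to play in the Masters Cup")]

def stub_relation_model(sentence):
    return "ex:playedIn" if "play" in sentence else "ex:unknown"

text = "Roger Federer came to Shanghai to play in the Masters Cup."
triples = generate_triples(text, "Roger Federer", stub_type_model,
                           stub_pair_extractor, stub_relation_model)
for triple in triples:
    print(triple)
```

Passing the models in as parameters keeps the control flow of Fig. 3 separate from the choice of classifier, which the source leaves open (Bayesian algorithm or SVM).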
The extraction process described in Fig. 3 is illustrated with a single given natural language text. In actual use, a large number of natural language texts serve as input, and the algorithm in Fig. 3 is applied to each input text in turn, generating a large number of RDF triples. These RDF triples form the core elements of a knowledge graph; in this way, a knowledge graph that can express entities and the relationships between them is successfully constructed.
The invention fills a gap in Chinese natural language entity relation extraction and, at the same time, greatly promotes the building of Chinese knowledge graphs, especially knowledge graphs based on Chinese natural language text.
As noted above, because algorithms for Chinese natural language entity relation extraction are currently lacking, the industry's basic approach when building Chinese knowledge graphs is to avoid extracting natural language entity relationships. For example, the "Baidu knowledge graph" is built on data obtained by searching structured data and does not search unstructured data (natural language). The "Sogou knowledge graph" likewise searches structured data and avoids searching unstructured data.
As also noted above, English-based knowledge graphs have in recent years used open-domain information extraction (Open Information Extraction, OIE) to extract the semantic relationships between entities, but so far this approach is not suitable for extracting entity relationships from Chinese natural language text.
Description of a specific embodiment:
In the following description, assume there is a collection of Chinese natural language texts, say 10,000 texts in total, referred to as the "original text set". For ease of description, assume the original text set covers the following categories: persons, events, buildings, and countries. Further types can be handled by analogy, and the description here applies to them equally.
The algorithm described in Fig. 1 can be implemented as follows:
1. From the original text set, randomly select 100 texts about persons, 100 texts about events, 100 texts about buildings, and 100 texts about countries;
2. The resulting 400 texts form the "text type" training text library;
3. Under the guidance of experts, manually annotate the type of each text in the "text type" training text library. Specifically:
For each text about a person, manually label it as schema:Person
For each text about an event, manually label it as schema:Event
For each text about a building, manually label it as schema:CivicStructure
For each text about a country, manually label it as schema:Country;
4. Once the manual annotation above is complete, the annotated training text set is formed;
5. A machine learning method reads the annotated training text set and learns from it. Different learning algorithms can be chosen here, such as a Bayes classifier or an SVM;
6. The result of machine learning is the "text type" identification model, which is saved to persistent storage for later use.
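As an illustration of step 5, the sketch below implements a small multinomial naive Bayes classifier with add-one smoothing, one of the two algorithm families the text names. It is a toy under stated assumptions: the four training documents are invented English token lists, whereas the described system would train on 400 word-segmented Chinese texts, and the source does not say which Bayes variant or feature set is used.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over token lists."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)        # log prior
            for t in tokens:
                if t in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy training data; labels follow the schema.org types above.
train_docs = [
    ["born", "physicist", "academician"],
    ["born", "mathematician", "professor"],
    ["tournament", "held", "annual"],
    ["championship", "held", "final"],
]
train_labels = ["schema:Person", "schema:Person", "schema:Event", "schema:Event"]
model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["annual", "championship"]))
```

The same class could serve for the "entity relationship" model by training on the tokens of annotated "entity relationship" sentences instead of whole documents.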
The algorithm described in Fig. 2 is the main content of the invention, and its details can be implemented as follows:
The implementation of the algorithm in Fig. 2 is described using the training text set above as an example (that is, the training text set above serves as the "entity relationship" training text library), with person as the concrete type. Other types can be implemented by analogy.
Assume an article is now taken from the "entity relationship" training text library, and that it is about Deng Jiaxian (a person):
Deng Jiaxian (1924-1986), member of the Jiusan Society, academician of the Chinese Academy of Sciences, renowned nuclear physicist, and a pioneer and founder of China's nuclear weapons development, made significant contributions to the research and development of China's nuclear and atomic weapons. He was born in 1924 into a scholarly family in Anhui Huaining County. In 1935 he was admitted to Zhicheng Middle School, and during his school years he was deeply influenced by the patriotic national salvation movement. After Beijing fell in 1937, he secretly took part in anti-Japanese gatherings. Later, as arranged by his father Deng Yizhe, he went to Kunming with his elder sister and was admitted to the Department of Physics of National Southwestern Associated University in 1941. From 1948 to 1950 he studied at Purdue University in the United States and earned his doctoral degree; in the year he graduated, he resolutely returned home. Deng Jiaxian was a principal organizer and leader of the research and development of China's atomic weapons. He was always on the front line of China's weapons manufacturing, led many scholars and technicians, successfully designed China's atomic bomb and hydrogen bomb, and brought China's defensive weaponry up to an advanced international level. Deng Jiaxian was exposed to nuclear radiation during an experiment, developed rectal cancer, and died in Beijing on July 29, 1986, aged 62.
The "entity relationship" identification model is generated in the following steps:
1. First, extract all Chinese entity sets from the "Deng Jiaxian" text.
Existing Chinese natural language processing software can be used to extract all Chinese entities from this article about Deng Jiaxian. For example, the entities that can be extracted include: Deng Jiaxian (person), Anhui Huaining County (place), Department of Physics of National Southwestern Associated University (organization), Purdue University (organization), Beijing (place), and so on.
2. Among the extracted entities, identify all related entity pairs. For each related entity pair, extract its relational statement and construct the "entity relationship" sentence.
For example, the related entity pairs that can be identified include: (Deng Jiaxian, Anhui Huaining County), (Deng Jiaxian, Department of Physics of National Southwestern Associated University), (Deng Jiaxian, Purdue University), and (Deng Jiaxian, Beijing). Two entities can form a related entity pair only if they appear in the same sentence. By this criterion, all of the above are related entity pairs. (Purdue University, Beijing) is not a related entity pair, because the two never appear in the same sentence.
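The same-sentence criterion can be sketched as a small function. This is an illustration under assumptions: the sample text is a simplified English paraphrase in which the subject is spelled out in two of the sentences, and the source does not say how a pair is formed when the subject of a sentence is only implied, as it often is in Chinese.

```python
import re
from itertools import combinations

def related_pairs(text, entities):
    """Return the set of entity pairs that co-occur in at least one sentence."""
    pairs = set()
    for sentence in re.split(r"[。！？.!?]", text):
        present = sorted(e for e in entities if e in sentence)
        pairs.update(combinations(present, 2))
    return pairs

# Simplified English paraphrase of the example article (an assumption).
text = (
    "Deng Jiaxian was born in Huaining County. "
    "Deng Jiaxian studied at Purdue University. "
    "He died in Beijing."
)
entities = ["Deng Jiaxian", "Huaining County", "Purdue University", "Beijing"]
pairs = related_pairs(text, entities)
print(sorted(pairs))
```

With this literal rule, (Purdue University, Beijing) is correctly excluded, but (Deng Jiaxian, Beijing) is also missed because the last sentence says only "He"; handling such implied subjects would need an extra step that the source does not describe.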
Next, for each related entity pair, its relational statement is extracted and the "entity relationship" sentence is constructed. Take the related entity pair (Deng Jiaxian, Anhui Huaining County) as an example. The pair appears in the following sentence:
born in 1924 into a scholarly family in Anhui Huaining County
Removing the entities and retaining all other content gives the desired "entity relationship" sentence:
born in 1924 into a scholarly family
The sentence above has the entity part removed. After processing by existing Chinese natural language processing software (word segmentation), the "entity relationship" sentence is expressed as:
1924 | born | in | a | scholarly family | | family
At this point, the "entity relationship" sentence corresponding to the related entity pair (Deng Jiaxian, Anhui Huaining County) has been constructed. Next, the related entity pair (Deng Jiaxian, Department of Physics of National Southwestern Associated University) is used as an example to describe again how the corresponding "entity relationship" sentence is constructed.
The entity pair (Deng Jiaxian, Department of Physics of National Southwestern Associated University) appears in the following sentence:
and was admitted to the Department of Physics of National Southwestern Associated University in 1941
Removing the entities and retaining all other content gives the desired "entity relationship" sentence:
and was admitted in 1941
The sentence above has the entity part removed. After processing by existing Chinese natural language processing software, the "entity relationship" sentence is expressed as:
and | in | 1941 | admitted to
As the last example, consider the entity pair (Deng Jiaxian, Beijing). The other entity relationships can be analyzed in exactly the same way and are not described in detail.
The entity pair (Deng Jiaxian, Beijing) appears in the following sentence:
unfortunately died in Beijing on July 29, 1986
Removing the entities and retaining all other content gives the desired "entity relationship" sentence:
unfortunately died on July 29, 1986
The sentence above has the entity part removed. After processing by existing Chinese natural language processing software, the "entity relationship" sentence is expressed as:
on | 1986 | July | 29 | | unfortunately | died
At this point, three entity pairs have been analyzed and the following three "entity relationship" sentences obtained:
1924 | born | in | a | scholarly family | | family
and | in | 1941 | admitted to
on | 1986 | July | 29 | | unfortunately | died
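The construction just walked through (remove the two entities, keep everything else, then segment into words) can be sketched as follows. The regular-expression tokenizer is an assumption that only suits space-delimited text; Chinese input would need a real word segmentation tool in its place, and the English sample sentence is a stand-in for the Chinese original.

```python
import re

def relation_statement(sentence, pair):
    """Remove both entities of the pair and join the remaining words with ' | '."""
    for entity in pair:
        sentence = sentence.replace(entity, " ")
    tokens = re.findall(r"\w+", sentence)  # stand-in for Chinese segmentation
    return " | ".join(tokens)

statement = relation_statement(
    "He died on July 29, 1986 in Beijing",
    ("Deng Jiaxian", "Beijing"),
)
print(statement)
```

An entity of the pair that does not literally occur in the sentence, such as an implied subject, is simply left alone by `str.replace`.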
3. Store the constructed "entity relationship" sentences in the training "entity relationship" statement library.
From the example above, the "entity relationship" statement library contains at least the following sentences:
1924 | born | in | a | scholarly family | | family
and | in | 1941 | admitted to
on | 1986 | July | 29 | | unfortunately | died
In fact, a single text can contain many entity pairs and can therefore produce many "entity relationship" sentences.
4. Repeat the operations above for each text in the "entity relationship" training text library; as a result, a huge training "entity relationship" statement library is constructed.
5. Under the guidance of experts, manually annotate the specific entity relationship of each sentence in the training "entity relationship" statement library.
Taking the three sentences above as an example, the following annotations can be obtained:
1924 | born | in | a | scholarly family | | family [schema:birthplace]
and | in | 1941 | admitted to [schema:alumniOf]
on | 1986 | July | 29 | | unfortunately | died [schema:deathplace]
The manually annotated semantic relationship is appended to the end of each sentence above. The example here uses schema.org as the ontology. In other applications, users can choose an ontology that suits them better.
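Before learning, each annotated line has to be split back into its tokens and its label. The sketch below assumes the on-disk format is exactly the one shown above, with tokens separated by `|` and the label in square brackets at the end; the source shows this layout but does not state that it is the storage format. Labels are kept as written in the annotation.

```python
import re

ANNOTATED = re.compile(r"^(?P<tokens>.*?)\s*\[(?P<label>[^\]]+)\]\s*$")

def parse_annotated(line):
    """Split 'tok | tok | ... [label]' into (token list, label)."""
    m = ANNOTATED.match(line)
    if m is None:
        raise ValueError(f"no label found in: {line!r}")
    # Empty tokens (such as the '| |' gap in the examples) are dropped here.
    tokens = [t.strip() for t in m.group("tokens").split("|") if t.strip()]
    return tokens, m.group("label")

tokens, label = parse_annotated("and | in | 1941 | admitted to [schema:alumniOf]")
print(tokens, label)
```

The resulting `(tokens, label)` pairs are the kind of input a Bayes classifier or an SVM would be trained on in the next step.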
Six, after the completion of artificial mark as described above, " entity relationship " statement library after being marked.At this point, using machine
The method of device study reads " entity relationship " statement library after mark, carries out machine learning, as a result, " entity relationship " recognizes
The foundation of model.Here it is possible to select using bayesian algorithm, SVM can also be selected to carry out specific machine learning.
With the implementation above in place, relationships can be extracted from an unknown text (one taken from the original text collection but not included in the training texts). Fig. 3 describes this process in detail; a concrete example is used here to illustrate the algorithm of Fig. 3. The example uses a natural language text describing a person; other kinds of text can be handled by analogy.
Assume that this article, which is not included in the training text library, is about Chen Jingrun and reads as follows:
Chen Jingrun, born on May 22, 1933 in Fuzhou, Fujian, was a modern mathematician. In September 1953 he was assigned to teach at Beijing No. 4 Middle School. In February 1955, on the recommendation of Mr. Wang Yanan, then president of Xiamen University, he returned to the Department of Mathematics of his alma mater, Xiamen University, as a teaching assistant. In October 1957, thanks to the appreciation of Professor Hua Luogeng, Chen Jingrun was transferred to the Institute of Mathematics of the Chinese Academy of Sciences. In 1973 he published the detailed proof of (1+2), which is recognized as a major contribution to research on Goldbach's conjecture. In March 1981 he was elected a member (academician) of the Chinese Academy of Sciences. He served as a member of the Mathematics Discipline Group of the State Scientific and Technological Commission. In 1992 he became editor-in-chief of "Mathematics Journal". At 1:10 p.m. on March 19, 1996, Chen Jingrun died at Beijing Hospital, aged only 63.
The goal now is for the machine to understand this article. First, the machine must determine that this is an article about a person. Second, the machine must identify the entities contained in the article and the relationships between them. All extracted content is expressed as RDF statements that conform to international semantic standards; these statements are also the basic elements for further constructing a knowledge graph.
First, with this text as input, the "text type" identification model is called and run to determine the basic semantic type of the subject of the natural language text.
If the "text type" identification model is accurate enough, it assigns the correct type to the text: this is an article about schema:Person, that is, an article about a person. The following triple statement is generated at the same time:
ex:Chen_Jingrun rdf:type schema:Person.
Second, the machine extracts the set of all Chinese entities from the article about Chen Jingrun; existing Chinese natural language processing software can be used for this. For example, the extracted entities include: Chen Jingrun (person), Fuzhou, Fujian (place), the Institute of Mathematics of the Chinese Academy of Sciences (organization), Beijing (place), and so on.
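The patent leaves entity extraction to existing Chinese natural language processing software and does not name a tool. Purely to make the input and output of this step concrete, the sketch below substitutes a hypothetical dictionary lookup (`GAZETTEER`) for a real named entity recognizer; the English text is a stand-in for the Chinese article.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer standing in for a real named entity recognizer.
GAZETTEER: Dict[str, str] = {
    "Chen Jingrun": "person",
    "Fuzhou, Fujian": "place",
    "Institute of Mathematics of the Chinese Academy of Sciences": "organization",
    "Beijing": "place",
}


def extract_entities(text: str, gazetteer: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (entity, type) pairs found in the text, ordered by first occurrence."""
    found = []
    for name, kind in gazetteer.items():
        if name in text:
            found.append((text.index(name), name, kind))
    found.sort()
    return [(name, kind) for _, name, kind in found]


text = ("Chen Jingrun was born on May 22, 1933 in Fuzhou, Fujian. "
        "Chen Jingrun died at Beijing Hospital.")
entities = extract_entities(text, GAZETTEER)
```

A production system would replace `extract_entities` with a trained recognizer, but the shape of the result, a set of typed entities, stays the same.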
Third, from the extracted entity set, the machine picks out all related entity pairs. For each related entity pair, it extracts the sentence that expresses their relationship and constructs an "entity relationship" sentence.
Here the machine can identify the following related entity pairs: Chen Jingrun and Fuzhou, Fujian; Chen Jingrun and the Institute of Mathematics of the Chinese Academy of Sciences; Chen Jingrun and Beijing. Next, the machine extracts the relationship sentence for each related entity pair and constructs the corresponding "entity relationship" sentence.
Take the related entity pair "Chen Jingrun" and "Fuzhou, Fujian" as an example. The pair appears in the following sentence:
Chen Jingrun, born on May 22, 1933 in Fuzhou, Fujian
Removing the entities and keeping all other content gives the desired "entity relationship" sentence:
born on May 22, 1933 in
The sentence above has the entity parts removed. After processing by existing Chinese natural language processing software, this "entity relationship" sentence is expressed as:
1933 | May | 22 | born | in
Next, take the related entity pair "Chen Jingrun" and "Institute of Mathematics of the Chinese Academy of Sciences". The pair appears in the following sentence:
Chen Jingrun was transferred to the Institute of Mathematics of the Chinese Academy of Sciences
Removing the entities and keeping all other content gives the desired "entity relationship" sentence:
was transferred to
The sentence above has the entity parts removed. After processing by existing Chinese natural language processing software, this "entity relationship" sentence is expressed as:
was | transferred to
Finally, take the related entity pair "Chen Jingrun" and "Beijing". The pair appears in the following sentence:
Chen Jingrun died at Beijing Hospital
Removing the entities and keeping all other content gives the desired "entity relationship" sentence:
died at hospital
The sentence above has the entity parts removed. After processing by existing Chinese natural language processing software, this "entity relationship" sentence is expressed as:
at | hospital | died
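The construction walked through above (pair two entities that occur in the same sentence, then delete both entities and keep the remainder) can be sketched as follows. This is a simplified illustration: `split_sentences` and `relation_sentences` are hypothetical names, the punctuation-based sentence splitter is a naive assumption, and English text stands in for the Chinese source.

```python
import re
from itertools import combinations


def split_sentences(text):
    """Naive split on Chinese or Western sentence-final punctuation."""
    return [s.strip() for s in re.split(r"[。！？.!?]", text) if s.strip()]


def relation_sentences(text, entities):
    """Yield ((entity_a, entity_b), remainder) for entities sharing a sentence."""
    results = []
    for sentence in split_sentences(text):
        present = [e for e in entities if e in sentence]
        for a, b in combinations(present, 2):
            # Remove both entities and keep every other part of the sentence.
            remainder = sentence.replace(a, " ").replace(b, " ")
            remainder = " ".join(remainder.split())
            results.append(((a, b), remainder))
    return results


text = ("Chen Jingrun was born on May 22, 1933 in Fuzhou. "
        "Chen Jingrun died at Beijing Hospital.")
pairs = relation_sentences(text, ["Chen Jingrun", "Fuzhou", "Beijing"])
```

The remainders would then be passed through a word segmenter to obtain the "|"-separated token form used in the examples.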
Fourth, the machine has now generated the following "entity relationship" sentences:
Chen Jingrun / Fuzhou, Fujian: 1933 | May | 22 | born | in
Chen Jingrun / Institute of Mathematics of the Chinese Academy of Sciences: was | transferred to
Chen Jingrun / Beijing: at | hospital | died
The first "entity relationship" sentence is used as input, and the "entity relationship" identification model is called and run. If the model is accurate enough, it should identify the relationship between Chen Jingrun and Fuzhou, Fujian as a birthplace relationship, which is recorded as the following RDF triple statement:
ex:Chen_Jingrun schema:birthPlace ex:Fuzhou_Fujian
Likewise, the second "entity relationship" sentence is used as input, and the "entity relationship" identification model is called and run. The model should identify the relationship between Chen Jingrun and the Institute of Mathematics of the Chinese Academy of Sciences as a workplace relationship, which is recorded as the following RDF triple statement:
ex:Chen_Jingrun schema:workplace ex:Institute_of_Mathematics_of_the_Chinese_Academy_of_Sciences
Finally, the third "entity relationship" sentence is used as input, and the "entity relationship" identification model is called and run. The model should identify the relationship between Chen Jingrun and Beijing as his place of death, which is recorded as the following RDF triple statement:
ex:Chen_Jingrun schema:deathPlace ex:Beijing
In this way, the semantic relationship of each entity pair identified by the machine is extracted.
Fifth, for this given unknown text, the machine obtains the following RDF triple statements:
ex:Chen_Jingrun rdf:type schema:Person.
ex:Chen_Jingrun schema:birthPlace ex:Fuzhou_Fujian
ex:Chen_Jingrun schema:workplace ex:Institute_of_Mathematics_of_the_Chinese_Academy_of_Sciences
ex:Chen_Jingrun schema:deathPlace ex:Beijing
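Collecting the text type triple and the entity relationship triples into one list can be sketched as below. The helper names `ex` and `build_triples` are hypothetical, the `classify` function is a hard-coded stand-in for the trained "entity relationship" identification model, and the way local names are minted under the `ex:` prefix is an assumption of this sketch.

```python
def ex(name):
    """Mint a local name under the illustrative "ex:" prefix."""
    return "ex:" + name.replace(", ", "_").replace(" ", "_")


def build_triples(subject, text_type, pair_sentences, classify):
    """Combine the type triple with one relation triple per entity pair."""
    lines = [f"{ex(subject)} rdf:type {text_type} ."]
    for (head, tail), tokens in pair_sentences:
        lines.append(f"{ex(head)} {classify(tokens)} {ex(tail)} .")
    return lines


def classify(tokens):
    # Stand-in for the trained identification model (assumption).
    return "schema:birthPlace" if "born" in tokens else "schema:deathPlace"


pair_sentences = [
    (("Chen Jingrun", "Fuzhou, Fujian"), ["1933", "May", "22", "born", "in"]),
    (("Chen Jingrun", "Beijing"), ["at", "hospital", "died"]),
]
triples = build_triples("Chen Jingrun", "schema:Person", pair_sentences, classify)
```

Passing the classifier in as a parameter keeps the collection step independent of whether a Bayesian model or an SVM was trained.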
This completes the automatic machine extraction of entity relationships. As noted earlier, because algorithms for extracting entity relationships from Chinese natural language are currently lacking, the industry's basic approach when building Chinese knowledge graphs has been to avoid extracting entity relationships from natural language. For example, the "Baidu knowledge graph" is built on data harvested by searching structured data and does not search unstructured data (natural language). The "Sogou knowledge graph" likewise searches structured data and avoids searching unstructured data.
As also noted earlier, English-language knowledge graphs have in recent years used Open Information Extraction (OIE) frameworks to extract the semantic relationships between entities, but so far this approach has not been suitable for extracting entity relationships from Chinese natural language text. The present invention fills this gap in Chinese natural language entity relationship extraction and at the same time greatly promotes the construction of Chinese knowledge graphs, especially knowledge graphs based on Chinese natural language text.
It should be noted that the description of the specific embodiment above is not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (8)
1. An automatic identification algorithm for Chinese natural language entity semantic relationships, characterized by comprising the following steps:
S1: inputting original natural language text;
S2: extracting "entity relationship" training texts from the original natural language text and storing them in an "entity relationship" training text library;
S3: reading a text from the "entity relationship" training text library;
S4: extracting an entity set from the text;
S5: picking out related entity pairs, extracting their relationship sentences, and constructing "entity relationship" sentences;
S6: storing the constructed "entity relationship" sentences in a training "entity relationship" sentence library;
S7: if every text has been read, manually annotating each sentence in the "entity relationship" sentence library; otherwise returning to step S3;
S8: performing machine learning and modeling on the annotated "entity relationship" sentence library;
S9: establishing the "entity relationship" identification model.
2. The automatic identification algorithm for Chinese natural language entity semantic relationships according to claim 1, characterized in that the entity set extraction in step S4 can use existing Chinese natural language processing software to extract the set of all Chinese entities from the text.
3. The automatic identification algorithm for Chinese natural language entity semantic relationships according to claim 1, characterized in that, when related entity pairs are identified in step S5, the condition for two entities to form a related entity pair is that both appear in the same sentence.
4. The automatic identification algorithm for Chinese natural language entity semantic relationships according to claim 1, characterized in that constructing the "entity relationship" sentence in step S5 specifically means removing the entities and keeping all other content.
5. The automatic identification algorithm for Chinese natural language entity semantic relationships according to claim 1, characterized in that the manual annotation in step S7 appends the manually annotated semantic relationship to the end of each sentence.
6. The automatic identification algorithm for Chinese natural language entity semantic relationships according to claim 1, characterized in that the machine learning in step S8 can be carried out using a Bayesian algorithm or an SVM.
7. An algorithm that uses the automatic identification algorithm for Chinese natural language entity semantic relationships according to claim 1 to generate "entity relationship" triples for a given Chinese natural language text, characterized by comprising the following steps:
S71: inputting original natural language text;
S72: calling the "text type" identification model to identify the text type and generating a text type triple;
S73: extracting an entity set from the text;
S74: picking out related entity pairs, extracting their relationship sentences, and constructing "entity relationship" sentences;
S75: calling the "entity relationship" identification model to identify the entity relationships and generating entity relationship triples;
S76: collecting all generated triple statements.
8. The algorithm for generating "entity relationship" triples for a given Chinese natural language text according to claim 7, characterized in that step S72 specifically comprises the following steps:
S81: inputting an original natural language text collection;
S82: extracting "text type" training texts and storing them in a "text type" training text library;
S83: manually annotating the type of each text;
S84: forming the annotated training text library;
S85: performing machine learning and modeling;
S86: completing the "text type" identification model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810796558.3A CN109062894A (en) | 2018-07-19 | 2018-07-19 | The automatic identification algorithm of Chinese natural language Entity Semantics relationship |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109062894A true CN109062894A (en) | 2018-12-21 |
Family
ID=64817342
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810796558.3A Pending CN109062894A (en) | 2018-07-19 | 2018-07-19 | The automatic identification algorithm of Chinese natural language Entity Semantics relationship |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109062894A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104133848A (en) * | 2014-07-01 | 2014-11-05 | 中央民族大学 | Tibetan language entity knowledge information extraction method |
CN104657750A (en) * | 2015-03-23 | 2015-05-27 | 苏州大学张家港工业技术研究院 | Method and device for extracting character relation |
CN108052625A (en) * | 2017-12-18 | 2018-05-18 | 清华大学 | A kind of entity sophisticated category method |
Non-Patent Citations (1)
Title |
---|
姜丽: "面向药品说明书的医疗实体关系抽取方法研究", 《万方数据库》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110032649A (en) * | 2019-04-12 | 2019-07-19 | 北京科技大学 | Relation extraction method and device between a kind of entity of TCM Document |
CN110032650A (en) * | 2019-04-18 | 2019-07-19 | 腾讯科技(深圳)有限公司 | A kind of generation method, device and the electronic equipment of training sample data |
CN110688857A (en) * | 2019-10-08 | 2020-01-14 | 北京金山数字娱乐科技有限公司 | Article generation method and device |
CN111538843A (en) * | 2020-03-18 | 2020-08-14 | 广州多益网络股份有限公司 | Knowledge graph relation matching method, model construction method and device in game field |
CN111538843B (en) * | 2020-03-18 | 2023-06-16 | 广州多益网络股份有限公司 | Knowledge-graph relationship matching method and model building method and device in game field |
CN111597812A (en) * | 2020-05-09 | 2020-08-28 | 北京合众鼎成科技有限公司 | Financial field multiple relation extraction method based on mask language model |
WO2021253238A1 (en) * | 2020-06-16 | 2021-12-23 | Baidu.Com Times Technology (Beijing) Co., Ltd. | Learning interpretable relationships between entities, relations, and concepts via bayesian structure learning on open domain facts |
CN113486189A (en) * | 2021-06-08 | 2021-10-08 | 广州数说故事信息科技有限公司 | Open knowledge graph mining method and system |
CN115827884A (en) * | 2022-07-27 | 2023-03-21 | 腾讯科技(深圳)有限公司 | Text processing method, text processing device, electronic equipment, medium and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181221 |