CN106951530A - Event type extraction method and device - Google Patents
Event type extraction method and device
- Publication number
- CN106951530A (application CN201710169761.3A)
- Authority
- CN
- China
- Prior art keywords
- word
- corpus
- candidate
- trigger word
- benchmark
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application provides an event type extraction method and device. The method includes: extracting candidate corpus words from a preset corpus; determining, based on the corpus, the association between the benchmark trigger words in a preset trigger word set and the candidate corpus words, where the benchmark trigger words are determined by Automatic Content Extraction (ACE); for any benchmark trigger word, determining the candidate corpus words whose association with that benchmark trigger word satisfies a preset requirement as target trigger words, thereby obtaining at least one target trigger word; determining the features of the target trigger words in the trigger word set; and clustering the target trigger words based on their features to obtain clusters that belong to different event categories. The method and device make it possible to improve the accuracy of event extraction and to broaden its range of application.
Description
Technical field
This application relates to the technical field of information processing, and in particular to an event type extraction method and device.
Background
As an important component of information extraction, event extraction has broad application prospects and great practical significance. The purpose of event extraction is to accurately and effectively extract the information of interest from large amounts of unordered, cluttered, unstructured information. According to the task definition of event extraction, an event is an objective fact in which specific people and things interact at a specific time and place; an event consists of a trigger word and the elements that describe the event structure. Event extraction requires automatically identifying and extracting, from unstructured source text that contains event information, structured information comprising the event type, the event elements and the event role information.
At present, existing event extraction directly uses the annotation results of Automatic Content Extraction (ACE), so research on event extraction is limited to the event types defined in ACE, that is, to event extraction in a restricted domain. In the open domain, however, event types are richer and more varied and the differences between event types are relatively small, which makes them hard to tell apart. If ACE is still used directly, event extraction cannot be carried out accurately and effectively.
Summary of the invention
In view of this, this application provides an event type extraction method and device, which make it possible to improve the accuracy of event extraction and to broaden its range of application.
To achieve the above object, this application provides the following technical solution:
An event type extraction method, including:
extracting multiple candidate corpus words from a preset corpus;
determining, based on the corpus, the association between the benchmark trigger words in a preset trigger word set and the candidate corpus words, where the benchmark trigger words are determined by Automatic Content Extraction (ACE);
for any benchmark trigger word, determining the candidate corpus words whose association with that benchmark trigger word satisfies a preset requirement as target trigger words, thereby obtaining at least one target trigger word corresponding to each benchmark trigger word;
determining the features of each target trigger word;
clustering all the target trigger words based on their features to obtain multiple clusters that belong to different event categories, where each cluster corresponds to one event category and contains at least one target trigger word.
Preferably, extracting candidate corpus words from the preset corpus includes:
determining the pending corpus words contained in the multiple corpus texts of the preset corpus;
filtering out the preset useless words contained in the pending corpus words to obtain the candidate corpus words, where the preset useless words include stop words and function words.
Preferably, determining, based on the corpus, the association between the benchmark trigger words in the preset trigger word set and the candidate corpus words includes:
for each candidate corpus word, calculating in turn the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set in every corpus text of the corpus;
for any pair of a benchmark trigger word and a candidate corpus word, summing the initial associations of the pair over all corpus texts to obtain the association between the benchmark trigger word and the candidate corpus word in the corpus.
Preferably, calculating the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set in every corpus text of the corpus includes:
for one corpus text, determining the ratio of a first count to a minimum occurrence count as the initial association between the benchmark trigger word and the candidate corpus word in that corpus text, where the first count is the number of times the benchmark trigger word and the candidate corpus word occur in the same sentence of the corpus text, and the minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the corpus text and the number of times the candidate corpus word occurs in the corpus text.
Preferably, calculating the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set in every corpus text of the corpus includes:
determining multiple preset conjunctions;
for one corpus text, determining from the corpus text the first target sentences, which contain both the benchmark trigger word and the candidate corpus word and in which a preset conjunction connects the two;
for each preset conjunction conj_i, determining the ratio of the number of first target sentences in the corpus text that contain conj_i to the minimum occurrence count as the correlation Con(conj_i) of the benchmark trigger word and the candidate corpus word with respect to that conjunction in the corpus text;
calculating the initial association R_{d_i}(seed, c) between the benchmark trigger word seed and the candidate corpus word c in the corpus text d_i using the following formula:
[formula not reproduced in this text]
where i is a natural number from 1 to k, and k is the total number of preset conjunctions found in all first target sentences of the corpus text d_i.
Preferably, calculating the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set in every corpus text of the corpus includes:
determining multiple preset relation types;
in any corpus text d_i, for any relation type rel_i, determining the ratio of a third count to the minimum occurrence count as the correlation Rel(rel_i) of the benchmark trigger word and the candidate corpus word with respect to that relation type in the corpus text d_i, where the third count is the number of times the benchmark trigger word and the candidate corpus word occur together in second target sentences; a second target sentence is a sentence that contains a specified conjunction corresponding to the relation type rel_i and in which the benchmark trigger word and the candidate corpus word are connected by that specified conjunction; and the minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the corpus text d_i and the number of times the candidate corpus word occurs in the corpus text d_i;
calculating the initial association R_{d_i}(seed, c) between the benchmark trigger word seed and the candidate corpus word c in the corpus text d_i using the following formula:
[formula not reproduced in this text]
where i is a natural number from 1 to k, and k is the maximum number of preset relation types in the corpus text d_i.
Preferably, determining the features of each target trigger word includes any one or more of the following:
obtaining the attribute features of the target trigger word;
obtaining the associated words of the target trigger word, which include its synonyms, antonyms and related words;
searching the corpus texts contained in the corpus for target corpus texts that contain the target trigger word, locating in the target corpus texts the feature words that satisfy a preset positional relation with the target trigger word, and taking the feature words obtained as the contextual features of the target trigger word;
identifying, with a FrameNet-based tool, the target trigger word and its frame type from the sentences in the corpus texts of the corpus.
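The contextual-feature option in the list above can be illustrated with a minimal sketch. It assumes that the "preset positional relation" is a fixed token window around each occurrence of the trigger word; the window size, the tokenizer and the function name `context_features` are illustrative choices, not details taken from the application.

```python
import re

def context_features(texts, trigger, window=2):
    """Collect the words found within `window` tokens of each occurrence of the trigger."""
    features = set()
    for text in texts:
        # Naive lower-cased alphabetic tokenizer (an assumption of this sketch).
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == trigger:
                start = max(0, i - window)
                # Words before and after the trigger; the trigger itself is skipped.
                features.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return features

docs = ["Rebels attack the town at dawn.", "Markets opened higher."]
print(sorted(context_features(docs, "attack")))
```

Texts that do not contain the trigger word contribute nothing, which mirrors the search for target corpus texts described above. The window in this sketch ignores sentence boundaries; a stricter variant would split sentences first.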
Preferably, after obtaining the multiple clusters that belong to different event categories, the method further includes:
for any cluster, determining, according to the term frequency-inverse document frequency (TF-IDF) algorithm, at least one target trigger word in the cluster that is suitable as a label of the cluster;
labelling the cluster with the at least one target trigger word as its label.
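As a rough illustration of the labelling step, the sketch below scores every trigger word in a cluster with TF-IDF and keeps the best-scoring ones as labels. How term frequency and document frequency are counted is not spelled out in this passage, so the sketch assumes that TF is the total count over all corpus texts and that IDF is the smoothed value log(N / (1 + df)); the function name `tfidf_labels` and the sample texts are hypothetical.

```python
import math
import re

def tfidf_labels(cluster, corpus_texts, top_n=1):
    """Return the top_n trigger words of a cluster ranked by TF-IDF over the corpus."""
    docs = [re.findall(r"[a-z]+", t.lower()) for t in corpus_texts]
    n_docs = len(docs)
    scores = {}
    for word in cluster:
        tf = sum(d.count(word) for d in docs)      # total occurrences in the corpus
        df = sum(1 for d in docs if word in d)     # number of texts containing the word
        idf = math.log(n_docs / (1 + df))          # smoothed inverse document frequency
        scores[word] = tf * idf
    # Highest score first; ties are broken alphabetically so the result is deterministic.
    return sorted(cluster, key=lambda w: (-scores[w], w))[:top_n]

corpus = [
    "Rebels attack the town and attack again.",
    "A bomb hit the market.",
    "The raid began at night.",
    "Peace talks resumed.",
]
print(tfidf_labels({"attack", "bomb", "raid"}, corpus))
```

With these sample texts "attack" occurs twice in a single text and therefore outranks the other two words.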
In another aspect, this application also provides an event type extraction device, including:
a word screening unit, configured to extract multiple candidate corpus words from a preset corpus;
an association determining unit, configured to determine, based on the corpus, the association between the benchmark trigger words in a preset trigger word set and the candidate corpus words, where the benchmark trigger words are determined by Automatic Content Extraction (ACE);
a word expansion unit, configured to determine, for any benchmark trigger word, the candidate corpus words whose association with that benchmark trigger word satisfies a preset requirement as target trigger words, thereby obtaining at least one target trigger word corresponding to each benchmark trigger word;
a feature determining unit, configured to determine the features of each target trigger word;
a type determining unit, configured to cluster all the trigger words based on the features of the target trigger words to obtain multiple clusters that belong to different event categories, where each cluster corresponds to one event category and contains at least one target trigger word.
Preferably, the association determining unit includes:
a first association calculating unit, configured to calculate in turn, for each candidate corpus word, the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set in every corpus text of the corpus;
a second association calculating unit, configured to sum, for any pair of a benchmark trigger word and a candidate corpus word, the initial associations of the pair over all corpus texts to obtain the association between the benchmark trigger word and the candidate corpus word in the corpus.
As can be seen from the above technical solution, the target trigger words in the trigger word set of this application are obtained by taking the trigger words produced by the existing automatic content extraction technique as a basis and extending them. The trigger words obtained therefore cover a wider range, which helps determine the core words that trigger events during event extraction. Clustering the extended trigger words thus ultimately yields more kinds of event types, which helps improve the accuracy of event extraction and broadens its range of application.
Brief description of the drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only embodiments of this application; those of ordinary skill in the art can obtain other drawings from the drawings provided without creative effort.
Fig. 1 is a schematic flowchart of one embodiment of an event type extraction method of this application;
Fig. 2 is a schematic flowchart of another embodiment of an event type extraction method of this application;
Fig. 3 is a schematic structural diagram of one embodiment of an event type extraction device of this application.
Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the embodiments described are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.
To make the event extraction process easier to understand, some of the terms involved in event extraction are briefly introduced below:
Entity: an object or set of objects belonging to a semantic category.
Entity description (entity mention): a phrase (usually a noun phrase) that contains an entity.
Event trigger word (event trigger): the core word that triggers the occurrence of an event (in ACE, trigger words are mainly verbs or nouns).
Event element (event argument): a participant of the event; the elements are the core components that make up the event.
Event role / element role (argument role): the relation between an event participant and the event.
Event description (event mention): a phrase or sentence that contains an event trigger word and event participants.
An event type extraction method of this application is introduced below.
Referring to Fig. 1, which shows a schematic flowchart of one embodiment of an event type extraction method of this application, the method of this embodiment may include:
101. Extract candidate corpus words from a preset corpus.
The corpus is the language resource to be processed. For example, the corpus may be material obtained with Topic Detection and Tracking (TDT) technology. The corpus contains many corpus texts, which may be news reports in multilingual text and speech form. TDT mainly covers tasks such as automatically recognizing the boundaries of news reports, locking onto and collecting breaking news topics, tracking the development of topics, and cross-language detection and tracking. News texts obtained with TDT technology describe a large number of events.
The candidate corpus words extracted from the preset corpus can be regarded as candidate trigger words, from which the words that can serve as extensions of the trigger words are subsequently selected. Specifically, the candidate trigger words can be obtained by performing word extraction on the corpus texts in the preset corpus.
102. Based on the corpus, determine the association between the benchmark trigger words in a preset trigger word set and the candidate corpus words.
The benchmark trigger words are determined by the Automatic Content Extraction (ACE) technique. A benchmark trigger word can be understood as a seed trigger word for extending the trigger words: the trigger words are extended on the basis of the benchmark trigger words, in combination with the candidate corpus words.
103. For any benchmark trigger word, determine the candidate corpus words whose association with that benchmark trigger word satisfies a preset requirement as target trigger words, obtaining at least one target trigger word corresponding to each benchmark trigger word.
Unlike existing approaches, the trigger words used for event extraction in this application are not the trigger words obtained directly by the automatic content extraction technique; instead, the trigger words are extended on the basis of the trigger words obtained by that technique.
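Step 103 can be pictured with a small sketch. The text does not say at this point what the preset requirement is, so the sketch assumes a plain score threshold; the function name `select_target_triggers`, the threshold value and the sample scores are hypothetical.

```python
def select_target_triggers(association, threshold):
    """Group the candidates that pass the threshold under their benchmark trigger word.

    association: dict {(seed, candidate): corpus-level association score}
    threshold:   hypothetical cut-off standing in for the "preset requirement"
    """
    targets = {}
    for (seed, candidate), score in association.items():
        if score >= threshold:
            targets.setdefault(seed, set()).add(candidate)
    return targets

scores = {("attack", "bomb"): 1.7, ("attack", "market"): 0.2, ("die", "kill"): 0.9}
expanded = select_target_triggers(scores, threshold=0.5)
print(expanded)
```

Other readings of the preset requirement, such as keeping the N highest-scoring candidates per benchmark trigger word, would fit the same interface.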
104. Determine the features of each target trigger word.
The features of a target trigger word characterize its own attributes, its association with the context in the corpus texts, and so on; they are the basis for determining the event category.
105. Cluster all the target trigger words based on their features to obtain multiple clusters that belong to different event categories.
Each cluster corresponds to one event category and contains at least one target trigger word.
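The passage does not name a clustering algorithm, so the following is only a sketch of one possible choice: a greedy single-pass clustering that compares feature sets with Jaccard similarity. The similarity measure, the threshold of 0.3, the function names and the sample features are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_triggers(features, threshold=0.3):
    """Greedy clustering: a word joins the first cluster whose first member is similar enough.

    features: dict {target trigger word: set of feature strings}
    """
    clusters = []
    for word in sorted(features):  # sorted for a deterministic result
        for cluster in clusters:
            if jaccard(features[word], features[cluster[0]]) >= threshold:
                cluster.append(word)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([word])
    return clusters

feats = {
    "attack": {"rebels", "town", "weapon"},
    "bomb": {"town", "weapon", "blast"},
    "marry": {"couple", "wedding"},
    "wed": {"couple", "wedding", "church"},
}
print(cluster_triggers(feats))
```

Each resulting list plays the role of one cluster, that is, one event category; any standard clustering method over richer feature vectors could be substituted.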
In this application, the target trigger words in the trigger word set are obtained by taking the trigger words produced by the existing automatic content extraction technique as a basis and extending them, so the trigger words obtained cover a wider range, which helps determine the core words that trigger events during event extraction. Clustering the extended trigger words therefore ultimately yields more kinds of event types, which helps improve the accuracy of event extraction and broadens its range of application.
Referring to Fig. 2, which shows a schematic flowchart of another embodiment of an event type extraction method of this application, the method of this embodiment may include:
201. Determine pending corpus words from the multiple corpus texts in the preset corpus.
Step 201 is equivalent to performing word extraction on the corpus texts to determine the corpus words they contain. To distinguish them from the candidate corpus words later used for extending the trigger words, the initial corpus words extracted from the corpus texts are called pending corpus words.
202. Filter out the preset useless words contained in the pending corpus words to obtain a candidate corpus word set containing multiple candidate corpus words.
The preset useless words can be set as needed; for example, they may include stop words and function words. A function word is a word that cannot serve as a sentence constituent, that is, any word other than a content word. A content word can serve as a sentence constituent on its own; it is a word that has both lexical and grammatical meaning.
Of course, besides filtering the preset useless words out of the pending corpus words, the content words remaining among the pending corpus words may also undergo preprocessing such as lemmatization, and the pending corpus words remaining after preprocessing are taken as candidate corpus words, giving the candidate corpus word set.
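Steps 201 and 202 can be sketched as follows. The tokenizer, the tiny useless-word list and the function names are illustrative assumptions; a real system would load full stop-word and function-word lists and could add lemmatization, which is omitted here.

```python
import re

# Hypothetical stand-in for the preset stop-word and function-word list.
USELESS_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to", "was", "is", "by"}

def pending_words(text):
    """Step 201: extract pending corpus words (lower-cased alphabetic tokens)."""
    return re.findall(r"[a-z]+", text.lower())

def candidate_words(corpus_texts, useless=USELESS_WORDS):
    """Step 202: drop the preset useless words and return the candidate word set."""
    candidates = set()
    for text in corpus_texts:
        candidates.update(w for w in pending_words(text) if w not in useless)
    return candidates

corpus = ["The bomb exploded in the market.", "Rebels attacked a convoy."]
print(sorted(candidate_words(corpus)))
```

Returning a set matches the idea of a candidate corpus word set: each word is kept once, however often it occurs.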
203. Obtain the preset trigger word set.
The trigger word set contains multiple benchmark trigger words determined by the automatic content extraction technique.
A benchmark trigger word can be understood as a trigger word determined according to the prior art; this application extends the trigger words on the basis of these existing trigger words.
204. For each candidate corpus word, calculate in turn the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set in every corpus text.
Association reflects whether, and how strongly, two words are correlated. It can include the correlation of two words within a single corpus text, in which case it reflects only how strongly the two words are correlated within that text; for ease of distinction, the correlation between a benchmark trigger word and a candidate corpus word within one corpus text is called the initial association. It will be appreciated that, because there are many corpus texts, a benchmark trigger word and a candidate corpus word can have multiple initial associations, one for each corpus text.
Association can also include the combined correlation over all documents in the corpus, which reflects how strongly two words are correlated across all text documents. In the embodiments of this application, the combined correlation of a benchmark trigger word and a candidate corpus word over all documents in the corpus is called the association in the corpus.
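The summary of the invention states that the association in the corpus is obtained by summing the initial associations over every corpus text. A minimal sketch of that summation, with a hypothetical function name and made-up per-text scores, could look like this:

```python
from collections import defaultdict

def corpus_association(per_text_scores):
    """Sum the initial associations of each word pair over all corpus texts.

    per_text_scores: iterable of dicts, one per corpus text, each mapping
                     (seed, candidate) -> initial association in that text.
    """
    total = defaultdict(float)
    for scores in per_text_scores:
        for pair, value in scores.items():
            total[pair] += value
    return dict(total)

per_text = [
    {("attack", "bomb"): 1.0, ("attack", "market"): 0.5},
    {("attack", "bomb"): 0.5},
]
print(corpus_association(per_text))
```

A pair that never co-occurs in a text simply has no entry for that text, which is equivalent to adding zero.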
The initial association between a candidate corpus word and a benchmark trigger word in one corpus text can be calculated in several ways. For example:
In one implementation of calculating the initial association:
The ratio of a first count, namely the number of times the benchmark trigger word and the candidate corpus word occur in the same sentence of the corpus text, to the minimum occurrence count can be determined as the initial association of the pair in that corpus text. The minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the corpus text and the number of times the candidate corpus word occurs in the corpus text. That is, writing freq_{d_i} for a count taken in the corpus text d_i, the initial association R_{d_i}(seed, c) can be expressed as:
$$R_{d_i}(seed, c) = \frac{\mathrm{freq}_{d_i}(seed, c)}{\min\bigl(\mathrm{freq}_{d_i}(seed),\ \mathrm{freq}_{d_i}(c)\bigr)} \qquad \text{(Formula 1)}$$
where the numerator is the frequency with which the benchmark trigger word seed and the candidate corpus word c co-occur in one sentence, and the denominator is the smaller of the frequencies with which seed and c each occur in the corpus text d_i.
In this implementation, words that occur in the same sentence are regarded as related words: the higher the ratio of the frequency with which two words occur in the same sentence to the number of times the two words occur, the higher the correlation between them.
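The co-occurrence ratio described above can be sketched in a few lines. The sentence splitter and tokenizer are naive assumptions, and the function names are hypothetical; the ratio itself follows the description: sentences containing both words, divided by the smaller of the two words' occurrence counts in the text.

```python
import re

def sentences(text):
    """Split a text into sentences and tokenize each one (naive splitter, an assumption)."""
    parts = re.split(r"[.!?]+", text.lower())
    return [re.findall(r"[a-z]+", p) for p in parts if p.strip()]

def initial_association(text, seed, cand):
    """Co-occurring sentence count divided by min(count of seed, count of cand)."""
    sents = sentences(text)
    co_occurrences = sum(1 for s in sents if seed in s and cand in s)
    seed_count = sum(s.count(seed) for s in sents)
    cand_count = sum(s.count(cand) for s in sents)
    lowest = min(seed_count, cand_count)
    # If either word is absent from the text, the association is taken to be zero.
    return co_occurrences / lowest if lowest else 0.0

doc = "Rebels attack the town. A bomb followed the attack. The bomb was large."
print(initial_association(doc, "attack", "bomb"))
```

In the sample text both words occur twice and share one sentence, so the ratio is 1 / 2.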
In another implementation of calculating the initial association:
First, all first target sentences are determined from the corpus text, that is, the sentences that contain both the benchmark trigger word and the candidate corpus word and in which the two are connected by a preset conjunction. Then, for each conjunction conj_i that connects the benchmark trigger word and the candidate corpus word, the correlation of the pair with respect to conj_i in the corpus text is calculated. In the corpus text, this correlation is the ratio of a second count, namely the number of times the benchmark trigger word and the candidate corpus word appear in first target sentences of the corpus text, to the minimum occurrence count, where the minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the corpus text and the number of times the candidate corpus word occurs in the corpus text.
That is, writing freq_{d_i} for a count taken in the corpus text d_i, the correlation Con(conj_i) of the benchmark trigger word and the candidate corpus word with respect to a preset conjunction in d_i can be expressed as:
$$Con(conj_i) = \frac{\mathrm{freq}_{d_i}(seed, c, conj_i)}{\min\bigl(\mathrm{freq}_{d_i}(seed),\ \mathrm{freq}_{d_i}(c)\bigr)} \qquad \text{(Formula 2)}$$
where, in Formula 2, the numerator is the number of first target sentences in the corpus text d_i that contain the benchmark trigger word seed and the candidate corpus word c and in which seed and c are connected by the conjunction conj_i; this number can also be regarded as the number of times the benchmark trigger word and the candidate corpus word are connected by the conjunction and appear together in a first target sentence. The denominator is the smaller of the number of times seed occurs in the corpus text d_i and the number of times c occurs in the corpus text d_i.
Accordingly, the initial association R_{d_i}(seed, c) of the benchmark trigger word seed and the candidate corpus word c in the corpus text d_i is:
[formula not reproduced in this text]
where i is a natural number from 1 to k, and k is the total number of preset conjunctions found in all first target sentences of the corpus text d_i.
The preset conjunctions can be set as needed; optionally, the 182 conjunctions defined in the PDTB (Penn Discourse Treebank) can be used.
In this implementation, the variety of conjunctions connecting two words affects how strongly the two words are correlated. If many kinds of conjunctions connect the two words, their correlation is considered rather chaotic, which lowers the correlation of the two words; if few kinds of conjunctions connect them, their correlation is considered relatively stable, so the correlation of the two words is larger.
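The per-conjunction correlation Con(conj_i) can be sketched as below. The sketch stops at the per-conjunction values and does not combine them into the initial association. It assumes single-token conjunctions and treats a sentence as "connecting" the two words through a conjunction whenever all three tokens appear in it; the conjunction list and function names are hypothetical.

```python
import re

# Hypothetical subset of preset conjunctions (single tokens only, for simplicity).
CONJUNCTIONS = {"because", "after", "but"}

def sentences(text):
    """Split a text into tokenized sentences (naive splitter, an assumption)."""
    parts = re.split(r"[.!?]+", text.lower())
    return [re.findall(r"[a-z]+", p) for p in parts if p.strip()]

def conjunction_correlations(text, seed, cand, conjunctions=CONJUNCTIONS):
    """Return {conjunction: Con(conjunction)} for conjunctions that link seed and cand."""
    sents = sentences(text)
    lowest = min(sum(s.count(seed) for s in sents), sum(s.count(cand) for s in sents))
    result = {}
    if lowest == 0:
        return result
    for conj in conjunctions:
        # Assumption: seed, cand and conj all occurring in one sentence counts as "connected".
        n = sum(1 for s in sents if seed in s and cand in s and conj in s)
        if n:
            result[conj] = n / lowest
    return result

doc = ("Troops retreat because rebels attack. "
       "Troops retreat after the attack. "
       "The attack ended.")
print(conjunction_correlations(doc, "attack", "retreat"))
```

The number of keys in the returned dictionary corresponds to k, the number of distinct preset conjunctions found in the first target sentences of the text.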
In the implementation of another calculating initial association:
Need to be directed to preset every kind of relationship type ji, benchmark trigger word and candidate's language material word are determined while occurring second
Third time number in target sentences, and the benchmark trigger word and candidate's language material word are calculated in the language material text on this kind pass
Set type jiCorrelation.Wherein, second target sentences are that with the corresponding specified conjunction of the relationship type, and benchmark is touched
Hair word specifies the sentence that conjunction is connected with candidate's language material word by this.The benchmark trigger word and candidate's language material word are in the language
Expect in text on this kind of relationship type jiCorrelation be the corresponding third time number of this kind of relationship type, with minimum occurrence number
Ratio, wherein, the number of times that trigger word occurs in the language material text on the basis of minimum occurrence number, and candidate's language material word
Minimum value in the number of times occurred in the language material text.
That is, in language material text diIn, the benchmark trigger word seed and candidate language material word c are on preset relationship type
jiCorrelation Rel (relji) can be expressed as follows:
Here, the numerator is the third count: the number of second target sentences in language material text d_i that contain both the benchmark trigger word seed and the candidate language material word c, with the two connected by a specified conjunction corresponding to the relationship type. Equivalently, it is the number of times the benchmark trigger word and the candidate language material word occur together in one target sentence while connected by a conjunction corresponding to the relationship type. The denominator of the fraction is the smaller of the number of times the benchmark trigger word seed occurs in language material text d_i and the number of times the candidate language material word c occurs in language material text d_i.
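The ratio described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: sentences are assumed to be pre-tokenized lists of words, "connected by a specified conjunction" is approximated as "the conjunction occurs in the same sentence as both words", and the function name `relation_correlation` and the sample data are hypothetical.

```python
from collections import Counter

def relation_correlation(sentences, seed, cand, connectives):
    """Rel for one relationship type in one text:
    (# sentences containing seed, cand and a specified conjunction)
    / min(occurrences of seed, occurrences of cand)."""
    word_counts = Counter(w for sent in sentences for w in sent)
    min_occ = min(word_counts[seed], word_counts[cand])
    if min_occ == 0:
        return 0.0  # one of the two words never occurs in this text
    # Assumption: "connected by" is approximated by co-presence in the sentence.
    hits = sum(
        1 for sent in sentences
        if seed in sent and cand in sent and any(w in connectives for w in sent)
    )
    return hits / min_occ

# Toy text of three tokenized sentences (invented for illustration).
text = [
    ["troops", "attacked", "because", "rebels", "fired"],
    ["rebels", "fired", "again"],
    ["troops", "attacked", "and", "left"],
]
print(relation_correlation(text, "attacked", "fired", {"because"}))  # 0.5
```

A real system would need a discourse parser or at least an argument-span check to decide whether the conjunction actually links the two words.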
After the correlation Rel(rel_{j_i}) of the benchmark trigger word seed and the candidate language material word c with respect to the preset relationship type j_i is obtained, the initial association R_{d_i}(seed, c) of the benchmark trigger word and the candidate language material word c in language material text d_i can be computed as:
Here, i is a natural number from 1 to k, and k denotes the maximum number of the preset relationship types in language material text d_i. For example, if there are four preset relationship types, the value of k is 4.
Because PDTB defines 182 conjunctions, the number of instances of each individual conjunction tends to be small; this implementation therefore calculates the correlation using discourse-level relationship types. Optionally, in the embodiments of the present application, the discourse relationship types may include four preset top-level relationship types: the comparison relation (Comparison), the contingency relation (Contingency), the expansion relation (Expansion), and the temporal relation (Temporal).
Some conjunctions in PDTB point to a specific relationship type; for example, the "preceding argument" and the "following argument" in a sentence connected by the conjunction "because" point to the "Causal" relation. Other conjunctions, such as "and", can point to several relationship types. The present invention therefore selects only specific conjunctions from PDTB, where a specific conjunction is one that points to a particular discourse relationship type with high probability. Based on the distribution of conjunctions in PDTB, the present invention counts the probability that each conjunction points to each relationship type. For example, the conjunction "alternatively" points to the "Expansion" relationship type with a probability of 100%. In this application, only conjunctions whose probability of pointing to a given relationship type is greater than 80% are selected as the specified conjunctions of that relationship type.
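The 80% selection rule can be illustrated with a short Python sketch. It assumes the annotated resource has already been reduced to (conjunction, relationship type) pairs; the function name `select_connectives` and the counts in the example are invented toy data, not PDTB statistics.

```python
from collections import Counter, defaultdict

def select_connectives(annotations, threshold=0.8):
    """Keep, per relationship type, the conjunctions whose probability of
    pointing to that type exceeds the threshold.

    annotations: iterable of (conjunction, relationship_type) pairs.
    """
    per_conn = defaultdict(Counter)
    for conn, rel in annotations:
        per_conn[conn][rel] += 1

    selected = defaultdict(set)
    for conn, rel_counts in per_conn.items():
        total = sum(rel_counts.values())
        for rel, n in rel_counts.items():
            if n / total > threshold:  # strictly greater than 80%
                selected[rel].add(conn)
    return dict(selected)

# Invented toy annotations: "and" is ambiguous and is therefore dropped.
toy = (
    [("because", "Contingency")] * 9 + [("because", "Temporal")]
    + [("and", "Expansion")] * 5 + [("and", "Temporal")] * 5
    + [("alternatively", "Expansion")] * 4
)
print(select_connectives(toy))
```

With this toy input, "because" is kept for Contingency (9 of 10 instances), "alternatively" for Expansion, and "and" for no type.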
Accordingly, in this implementation, the "limiting scope" within which two words are considered correlated is that the two words are in the same sentence and are connected by a specified conjunction. The correlation of the words seed and c can then be calculated separately for each of the four relationship types.
Of course, in practical applications the initial association of a benchmark trigger word and a candidate language material word in a language material text may also be calculated in other ways, which is not limited here.
205. For any pair of a benchmark trigger word and a candidate language material word, calculate the relevance of the benchmark trigger word and the candidate language material word in the corpus according to the initial association of the benchmark trigger word in the trigger word set and the candidate language material word in each language material text.
For any pair of a benchmark trigger word and a candidate language material word, summing their initial associations over every language material text in the corpus gives the relevance of the benchmark trigger word and the candidate language material word in the corpus, i.e. the final relevance of the benchmark trigger word and the candidate language material word.
That is, the relevance R(seed, c) of the benchmark trigger word seed and the candidate language material word c in the corpus is:
$$R(seed, c) = \sum_{i=1}^{n} R_{d_i}(seed, c)$$
Here, n denotes the total number of language material texts in the corpus that contain a sentence in which the benchmark trigger word seed and the candidate language material word c both occur, i is a natural number from 1 to n, and d_i denotes a language material text in which the benchmark trigger word seed and the candidate language material word c occur together in one sentence.
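The summation over texts can be sketched as follows. For the per-text score, this sketch assumes the simplest variant described in this document (same-sentence co-occurrence count divided by the minimum occurrence count); any other per-text initial association could be substituted. The function names and the toy corpus are hypothetical.

```python
from collections import Counter

def initial_association(sentences, seed, cand):
    """Per-text score: same-sentence co-occurrences / min occurrence count."""
    counts = Counter(w for sent in sentences for w in sent)
    min_occ = min(counts[seed], counts[cand])
    if min_occ == 0:
        return 0.0
    co_occurrences = sum(1 for sent in sentences if seed in sent and cand in sent)
    return co_occurrences / min_occ

def corpus_relevance(corpus, seed, cand):
    """Sum the per-text initial associations over all texts in the corpus.
    Texts without a shared sentence contribute 0, so this equals the sum
    over the n texts in which the two words co-occur."""
    return sum(initial_association(text, seed, cand) for text in corpus)

# Toy corpus: each text is a list of tokenized sentences.
corpus = [
    [["a", "b"], ["a"]],   # score 1 / min(2, 1) = 1.0
    [["a", "b"]],          # score 1 / min(1, 1) = 1.0
    [["a"], ["b"]],        # no shared sentence, score 0.0
]
print(corpus_relevance(corpus, "a", "b"))  # 2.0
```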
206. For any benchmark trigger word, determine the candidate language material words whose relevance to the benchmark trigger word satisfies a preset requirement as target trigger words, obtaining at least one target trigger word expanded from the benchmark trigger word.
One or more target trigger words can be expanded from each benchmark trigger word.
The preset requirement that a target trigger word must satisfy with respect to the benchmark trigger word can be set as needed. For example, the preset requirement may be that the relevance value is greater than a preset threshold. Optionally, for each benchmark trigger word, the candidate language material words may be sorted in descending order of their relevance to the benchmark trigger word, and a specified number of top-ranked candidate language material words are determined as target trigger words.
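Both selection rules (a relevance threshold and a top-N cut after sorting) can be sketched together. The function name `expand_trigger_words`, its parameters, and the sample scores are hypothetical and only illustrate the step.

```python
def expand_trigger_words(relevance, threshold=None, top_n=None):
    """Pick target trigger words for one benchmark trigger word.

    relevance: dict mapping candidate word -> relevance to the benchmark word.
    threshold: keep only candidates whose relevance is greater than this value.
    top_n:     keep only the top_n candidates after sorting in descending order.
    """
    ranked = sorted(relevance.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(word, score) for word, score in ranked if score > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return [word for word, _ in ranked]

scores = {"strike": 2.5, "raid": 1.0, "table": 0.1}  # invented relevance values
print(expand_trigger_words(scores, threshold=0.5))  # ['strike', 'raid']
print(expand_trigger_words(scores, top_n=1))        # ['strike']
```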
207. Obtain the features of each target trigger word.
The features of a target trigger word describe the basic characteristics of the target trigger word.
For example, the features of a target trigger word may include any one or more of the following:
The attribute features of the target trigger word;
The associated words of the target trigger word, e.g. the synonyms, antonyms, and related words of the target trigger word;
The contextual features of the target trigger word;
The frame type to which the target trigger word belongs.
The attribute features are features that the target trigger word itself has, and can be obtained by identifying the part of speech and the named entity of the target trigger word.
The associated words of the target trigger word can be obtained by calling a specified dictionary through a preset interface.
The contextual features of the target trigger word can be obtained by searching the language material texts of the corpus for target language material texts that contain the target trigger word, locating in each target language material text the feature words that satisfy a preset positional relationship with the target trigger word, and taking the resulting feature words as the contextual features of the target trigger word. For example, the contextual features may include the following:
The three words before and the three words after the target trigger word (excluding stop words);
Two or three words extracted, according to an N-Gram model, from the word sequence within a distance of three words of the target trigger word in the language material text;
The word immediately before the target trigger word and the word immediately after the target trigger word, extracted from the language material text.
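The first contextual feature in the list above (three words on each side, excluding stop words) can be sketched as below. The sketch assumes that stop words are skipped when counting the window, which is one possible reading of the text; the function name `context_window` and the stop-word list are hypothetical.

```python
def context_window(tokens, index, stop_words, size=3):
    """Return up to `size` non-stop words before and after tokens[index]."""
    before = [t for t in tokens[:index] if t not in stop_words][-size:]
    after = [t for t in tokens[index + 1:] if t not in stop_words][:size]
    return before, after

tokens = "the rebel forces suddenly attacked the small village at dawn".split()
trigger_index = tokens.index("attacked")
print(context_window(tokens, trigger_index, stop_words={"the", "at"}))
# (['rebel', 'forces', 'suddenly'], ['small', 'village', 'dawn'])
```

Setting `size=1` and an empty stop-word set gives the third feature in the list, the single word immediately before and after the trigger word.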
The frame type to which a target trigger word belongs is obtained by using the FrameNet tool to identify, in each sentence of the language material texts, the target trigger word and the frame (Frame) of the target trigger word, so that the frame type of the target trigger word is obtained when the target trigger word has a frame. The frame type of the word preceding the target trigger word and the frame type of the word following the target trigger word may further be extracted. FrameNet is a corpus-based semantic network that is grounded in the theory of frame semantics and in which lexical meanings are connected to one another through frames.
208. Based on the features of the target trigger words, cluster all the obtained target trigger words to obtain multiple cluster sets belonging to different event categories.
Each cluster set includes multiple target trigger words.
Different cluster sets correspond to different event categories, and the cluster set of an event category includes the multiple target trigger words belonging to that event category.
In the embodiments of the present application, the target trigger words may be clustered according to a preset clustering algorithm, for example the Affinity Propagation clustering algorithm, which may also be referred to as the AP clustering algorithm. This clustering algorithm treats all data points as potential cluster centers and does not require the number of clusters to be specified. During clustering, the feature vector built from the previously obtained features of each target trigger word is used as input data, so that trigger words of the same type are grouped into the same class; in other words, the target trigger words within one class of the clustering result have features of the same type. One class can be regarded as one trigger word set.
Because the features of the target trigger words determined in this application differ significantly from the features determined in the prior art, the event categories obtained by clustering all the target trigger words with the clustering algorithm are not limited to, and differ from, the event types defined in the ACE corpus.
209. For each cluster set, select at least one target trigger word from the cluster set as the label of the cluster set, so as to label the cluster set with the obtained label.
Optionally, the target trigger words in a cluster set that are suitable as the label of the cluster set are determined according to the TF-IDF algorithm. Specifically, for each event type generated by the clustering algorithm, a specified number of target trigger words with the largest TF-IDF values are chosen from the cluster set of that event category as the label of the event category.
TF-IDF is a statistical method for assessing how important a word is to one document in a document set or corpus. TF is the term frequency (Term Frequency), the frequency with which a word or phrase occurs in a document. IDF is the inverse document frequency (Inverse Document Frequency), obtained by dividing the total number of documents by the number of documents containing the word or phrase and taking the logarithm of the quotient; it measures the general importance of a word or phrase. The main idea of TF-IDF is that if a word or phrase occurs with a high frequency TF in one document and rarely occurs in other documents, the word or phrase is considered to have good class-discrimination ability and is suitable as the label of a category.
Assume that the clustering algorithm yields the cluster sets of K event categories. For each event category, the several target trigger words that best represent the event category are to be found, so the TF-IDF calculation for each event category is performed only within that event category. In the present invention, each event category contains a number of language material texts. For each target trigger word in an event category, the TF-IDF of the target trigger word in each language material text (where TF is the frequency with which the target trigger word occurs in that language material text, and IDF is based on the inverse of the number of language material texts containing the target trigger word) is defined as:
$$\mathrm{TFIDF}_{ij} = \frac{n_{ij}}{\sum_{t=1}^{m} n_{tj}} \times \log\frac{N}{n_j + 1}$$
Here, i denotes a target trigger word; n_{ij} denotes the number of times target trigger word i occurs in language material text j in the event category; $\sum_{t=1}^{m} n_{tj}$ denotes the sum of the occurrence counts of all target trigger words in language material text j under the event category; m denotes the number of all target trigger words that the event category has; N denotes the total number of language material texts corresponding to the event category (the total number of language material texts that contain any target trigger word under the event category); n_j denotes the number of language material texts in the event category that contain the target trigger word, and the added 1 is for smoothing.
It can be seen that the present invention uses the AP clustering algorithm to generate K event categories, denoted C_1, C_2, ..., C_K; for all documents d in each category C_i (i = 1, 2, ..., K), the TF-IDF value of each target trigger word in each document is calculated; and for each event category, a specified number (e.g. 100) of target trigger words with the largest TF-IDF values in the event category are taken as the label of the event category.
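The labeling step can be sketched in Python as follows. Several points are assumptions rather than statements of the source: the IDF is taken as log(N / (n + 1)) following the symbol definitions and the "plus 1" smoothing mentioned above, a word's score for the whole category is taken as its maximum TF-IDF over the texts of the category, and the function name `category_labels` and the toy texts are hypothetical.

```python
import math
from collections import Counter

def category_labels(texts, triggers, top_n=2):
    """Rank the target trigger words of one event category by TF-IDF.

    texts:    tokenized language material texts of the category.
    triggers: set of target trigger words in the category's cluster set.
    """
    # N: texts that contain at least one target trigger word of the category.
    texts = [t for t in texts if any(w in triggers for w in t)]
    n_texts = len(texts)
    # Number of texts containing each trigger word.
    doc_freq = Counter(w for t in texts for w in set(t) if w in triggers)

    best = {}
    for text in texts:
        counts = Counter(w for w in text if w in triggers)
        total = sum(counts.values())  # occurrences of all trigger words in text
        for word, n in counts.items():
            score = (n / total) * math.log(n_texts / (doc_freq[word] + 1))
            # Assumption: aggregate per-text scores by taking the maximum.
            best[word] = max(best.get(word, float("-inf")), score)
    return sorted(best, key=best.get, reverse=True)[:top_n]

texts = [
    ["attack", "attack", "bomb"],
    ["attack", "raid"],
    ["attack", "raid", "raid"],
    ["attack"],
]
print(category_labels(texts, {"attack", "bomb", "raid"}))  # ['bomb', 'raid']
```

In this toy category, "attack" occurs in every text, so its smoothed IDF is negative and it ranks last, which matches the intuition that a word common to all texts discriminates poorly.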
The present invention uses several target trigger words with high TF-IDF values as the label of an event category (the labels distinguish the types of the event categories). This method is free of the limitation of the 33 event types defined in the ACE corpus, takes all language phenomena into account, and forms an open-domain event type system.
In another aspect, the embodiments of the present application further provide an event type extraction device. Fig. 3 shows a schematic structural diagram of an embodiment of an event type extraction device of the present application. The device of this embodiment may include:
A word screening unit 301, configured to extract multiple candidate language material words from a preset corpus;
An association determining unit 302, configured to determine, based on the corpus, the relevance of the benchmark trigger words in a preset trigger word set and the candidate language material words, wherein the benchmark trigger words are determined by automatic content extraction technology;
A word expansion unit 303, configured to determine, for any benchmark trigger word, the candidate language material words whose relevance to the benchmark trigger word satisfies a preset requirement as target trigger words, obtaining at least one target trigger word corresponding to each benchmark trigger word;
A feature determining unit 304, configured to determine the features of each target trigger word respectively;
A type determining unit 305, configured to cluster all the trigger words based on the features of the target trigger words, obtaining multiple cluster sets belonging to different event categories, wherein each cluster set corresponds to one event category and includes at least one target trigger word.
Optionally, the word screening unit includes:
A pending word determining unit, configured to determine the pending language material words contained in the multiple language material texts in the preset corpus;
A word filtering unit, configured to filter out the preset useless words contained in the pending language material words to obtain the candidate language material words, wherein the preset useless words include stop words and function words.
Optionally, the association determining unit includes:
A first association computing unit, configured to calculate in turn, for each candidate language material word, the initial association of the candidate language material word and each benchmark trigger word in the trigger word set in each language material text in the corpus;
A second association computing unit, configured to sum, for any pair of a benchmark trigger word and a candidate language material word, the initial associations of the benchmark trigger word and the candidate language material word in each language material text, obtaining the relevance of the benchmark trigger word and the candidate language material word in the corpus.
Optionally, when calculating the initial association of a candidate language material word and each benchmark trigger word in the trigger word set in each language material text in the corpus, the first association computing unit is specifically configured to:
For a language material text, determine the ratio of a first count, namely the number of times the benchmark trigger word and the candidate language material word occur in the same sentence of the language material text, to the minimum occurrence count as the initial association of the benchmark trigger word and the candidate language material word in the language material text, wherein the minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the language material text and the number of times the candidate language material word occurs in the language material text.
Optionally, when calculating the initial association of a candidate language material word and each benchmark trigger word in the trigger word set in each language material text in the corpus, the first association computing unit is specifically configured to:
Determine multiple preset conjunctions;
For a language material text, determine from the language material text the first target sentences, namely the sentences that contain both the benchmark trigger word and the candidate language material word and in which the benchmark trigger word and the candidate language material word are connected by a preset conjunction;
For each preset conjunction j_i, determine the ratio of the number of first target sentences in the language material text that contain the preset conjunction j_i to the minimum occurrence count as the correlation Con(conj_{j_i}) of the benchmark trigger word and the candidate language material word with respect to the conjunction j_i in the language material text;
Calculate, using the following formula, the initial association R_{d_i}(seed, c) of the benchmark trigger word seed and the candidate language material word c in the language material text d_i:
Here, i is a natural number from 1 to k, and k denotes the total number of the preset conjunctions contained in all the first target sentences in the language material text d_i.
Optionally, when calculating the initial association of a candidate language material word and each benchmark trigger word in the trigger word set in each language material text in the corpus, the first association computing unit is specifically configured to:
Determine multiple preset relationship types;
In any language material text d_i, for any relationship type j_i, determine the ratio of a third count, namely the number of second target sentences in which the benchmark trigger word and the candidate language material word occur together, to the minimum occurrence count as the correlation Rel(rel_{j_i}) of the benchmark trigger word and the candidate language material word with respect to the relationship type j_i in the language material text d_i, wherein a second target sentence is a sentence that contains a specified conjunction corresponding to the relationship type j_i and in which the benchmark trigger word and the candidate language material word are connected by the specified conjunction, and the minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the language material text d_i and the number of times the candidate language material word occurs in the language material text d_i;
Calculate, using the following formula, the initial association R_{d_i}(seed, c) of the benchmark trigger word seed and the candidate language material word c in the language material text d_i:
Here, i is a natural number from 1 to k, and k denotes the maximum number of the preset relationship types in the language material text d_i.
Optionally, the manner in which the feature determining unit determines the features of each target trigger word may include any one or more of the following:
Obtaining the attribute features of the target trigger word;
Obtaining the associated words of the target trigger word, the associated words including the synonyms, antonyms, and related words of the target trigger word;
Searching the language material texts contained in the corpus for target language material texts that contain the target trigger word, locating in the target language material texts the feature words that satisfy a preset positional relationship with the target trigger word, and taking the resulting feature words as the contextual features of the target trigger word;
Identifying, based on the FrameNet tool, the target trigger word and the frame type of the target trigger word from the sentences in the language material texts of the corpus.
Optionally, the device further includes:
A label word determining unit, configured to determine, after the type determining unit obtains the multiple cluster sets belonging to different event categories, for any cluster set and according to the term frequency and inverse document frequency (TF-IDF) algorithm, at least one target trigger word in the cluster set that is suitable as the label of the cluster set;
An event labeling unit, configured to label the cluster set using the at least one target trigger word as the label of the cluster set.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the other embodiments, and the identical or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. An event type extraction method, characterized by comprising:
Extracting multiple candidate language material words from a preset corpus;
Determining, based on the corpus, the relevance of the benchmark trigger words in a preset trigger word set and the candidate language material words, wherein the benchmark trigger words are determined by automatic content extraction technology;
For any benchmark trigger word, determining the candidate language material words whose relevance to the benchmark trigger word satisfies a preset requirement as target trigger words, obtaining at least one target trigger word corresponding to each benchmark trigger word;
Determining the features of each target trigger word respectively;
Clustering all the target trigger words based on the features of the target trigger words, obtaining multiple cluster sets belonging to different event categories, wherein each cluster set corresponds to one event category and includes at least one target trigger word.
2. The method according to claim 1, characterized in that extracting candidate language material words from the preset corpus comprises:
Determining the pending language material words contained in the multiple language material texts in the preset corpus;
Filtering out the preset useless words contained in the pending language material words to obtain the candidate language material words, wherein the preset useless words include stop words and function words.
3. The method according to claim 1, characterized in that determining, based on the corpus, the relevance of the benchmark trigger words in the preset trigger word set and the candidate language material words comprises:
For each candidate language material word, calculating in turn the initial association of the candidate language material word and each benchmark trigger word in the trigger word set in each language material text in the corpus;
For any pair of a benchmark trigger word and a candidate language material word, summing the initial associations of the benchmark trigger word and the candidate language material word in each language material text, obtaining the relevance of the benchmark trigger word and the candidate language material word in the corpus.
4. The method according to claim 3, characterized in that calculating the initial association of the candidate language material word and each benchmark trigger word in the trigger word set in each language material text in the corpus comprises:
For a language material text, determining the ratio of a first count, namely the number of times the benchmark trigger word and the candidate language material word occur in the same sentence of the language material text, to the minimum occurrence count as the initial association of the benchmark trigger word and the candidate language material word in the language material text, wherein the minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the language material text and the number of times the candidate language material word occurs in the language material text.
5. The method according to claim 3, characterized in that calculating the initial association of the candidate language material word and each benchmark trigger word in the trigger word set in each language material text in the corpus comprises:
Determining multiple preset conjunctions;
For a language material text, determining from the language material text the first target sentences, namely the sentences that contain both the benchmark trigger word and the candidate language material word and in which the benchmark trigger word and the candidate language material word are connected by a preset conjunction;
For each preset conjunction j_i, determining the ratio of the number of first target sentences in the language material text that contain the preset conjunction j_i to the minimum occurrence count as the correlation Con(conj_{j_i}) of the benchmark trigger word and the candidate language material word with respect to the conjunction j_i in the language material text;
Calculating, using the following formula, the initial association R_{d_i}(seed, c) of the benchmark trigger word seed and the candidate language material word c in the language material text d_i:
Wherein i is a natural number from 1 to k, and k denotes the total number of the preset conjunctions contained in all the first target sentences in the language material text d_i.
6. The method according to claim 3, characterized in that calculating the initial association of the candidate language material word and each benchmark trigger word in the trigger word set in each language material text in the corpus comprises:
Determining multiple preset relationship types;
In any language material text d_i, for any relationship type j_i, determining the ratio of a third count, namely the number of second target sentences in which the benchmark trigger word and the candidate language material word occur together, to the minimum occurrence count as the correlation Rel(rel_{j_i}) of the benchmark trigger word and the candidate language material word with respect to the relationship type j_i in the language material text d_i, wherein a second target sentence is a sentence that contains a specified conjunction corresponding to the relationship type j_i and in which the benchmark trigger word and the candidate language material word are connected by the specified conjunction, and the minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the language material text d_i and the number of times the candidate language material word occurs in the language material text d_i;
Calculating, using the following formula, the initial association R_{d_i}(seed, c) of the benchmark trigger word seed and the candidate language material word c in the language material text d_i:
Wherein i is a natural number from 1 to k, and k denotes the maximum number of the preset relationship types in the language material text d_i.
7. The method according to any one of claims 1 to 3, characterized in that determining the features of each target trigger word comprises any one or more of the following:
Obtaining the attribute features of the target trigger word;
Obtaining the associated words of the target trigger word, the associated words including the synonyms, antonyms, and related words of the target trigger word;
Searching the language material texts contained in the corpus for target language material texts that contain the target trigger word, locating in the target language material texts the feature words that satisfy a preset positional relationship with the target trigger word, and taking the resulting feature words as the contextual features of the target trigger word;
Identifying, based on the FrameNet tool, the target trigger word and the frame type of the target trigger word from the sentences in the language material texts of the corpus.
8. The method according to any one of claims 1 to 3, characterized in that, after the multiple cluster sets belonging to different event categories are obtained, the method further comprises:
For any cluster set, determining, according to the term frequency and inverse document frequency (TF-IDF) algorithm, at least one target trigger word in the cluster set that is suitable as the label of the cluster set;
Labeling the cluster set using the at least one target trigger word as the label of the cluster set.
9. An event type extraction device, characterized by comprising:
A word screening unit, configured to extract multiple candidate language material words from a preset corpus;
An association determining unit, configured to determine, based on the corpus, the relevance of the benchmark trigger words in a preset trigger word set and the candidate language material words, wherein the benchmark trigger words are determined by automatic content extraction technology;
A word expansion unit, configured to determine, for any benchmark trigger word, the candidate language material words whose relevance to the benchmark trigger word satisfies a preset requirement as target trigger words, obtaining at least one target trigger word corresponding to each benchmark trigger word;
A feature determining unit, configured to determine the features of each target trigger word respectively;
A type determining unit, configured to cluster all the trigger words based on the features of the target trigger words, obtaining multiple cluster sets belonging to different event categories, wherein each cluster set corresponds to one event category and includes at least one target trigger word.
10. The device according to claim 9, characterized in that the association determining unit comprises:
A first association computing unit, configured to, for each candidate corpus word, calculate in turn the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set in every corpus text of the corpus;
A second association computing unit, configured to, for any pair of a benchmark trigger word and a candidate corpus word, sum the initial associations of the benchmark trigger word and the candidate corpus word over every corpus text, obtaining the relevance of the benchmark trigger word and the candidate corpus word in the corpus.
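The two-stage computation in claim 10 (a per-text initial association, then a sum over all texts) can be sketched as below. The claim does not define the initial association measure here, so the sketch substitutes a simple co-occurrence indicator as a stand-in; the names `initial_association` and `corpus_relevance` and the toy corpus are hypothetical.

```python
from collections import defaultdict
from itertools import product


def initial_association(benchmark, candidate, tokens):
    """Per-text association. Assumption: 1.0 if both words occur in the text, else 0.0."""
    return 1.0 if benchmark in tokens and candidate in tokens else 0.0


def corpus_relevance(benchmarks, candidates, corpus):
    """Sum the per-text initial associations of every (benchmark, candidate) pair."""
    relevance = defaultdict(float)
    for tokens in corpus:
        token_set = set(tokens)  # set membership keeps each lookup cheap
        for benchmark, candidate in product(benchmarks, candidates):
            relevance[(benchmark, candidate)] += initial_association(
                benchmark, candidate, token_set
            )
    return dict(relevance)


# Toy corpus of tokenized texts, for illustration only.
corpus = [
    ["attack", "strike", "city"],
    ["attack", "strike"],
    ["strike", "city"],
]
relevance = corpus_relevance(["attack"], ["strike", "city"], corpus)
```

Any per-text measure with the same signature could replace `initial_association` without changing the summation step.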
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710169761.3A (CN106951530B) | 2017-03-21 | 2017-03-21 | Event type extraction method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710169761.3A (CN106951530B) | 2017-03-21 | 2017-03-21 | Event type extraction method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106951530A | 2017-07-14 |
CN106951530B | 2020-01-17 |
Family
ID=59472782
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201710169761.3A | Active | CN106951530B (en) | 2017-03-21 | 2017-03-21 | Event type extraction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106951530B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108319692A (en) * | 2018-02-01 | 2018-07-24 | 北京云知声信息技术有限公司 | Abnormal punctuate cleaning method, storage medium and server |
CN110032641A (en) * | 2019-02-14 | 2019-07-19 | 阿里巴巴集团控股有限公司 | Method and device that computer executes, that event extraction is carried out using neural network |
CN110209807A (en) * | 2018-07-03 | 2019-09-06 | 腾讯科技(深圳)有限公司 | A kind of method of event recognition, the method for model training, equipment and storage medium |
CN111310461A (en) * | 2020-01-15 | 2020-06-19 | 腾讯云计算(北京)有限责任公司 | Event element extraction method, device, equipment and storage medium |
CN111382575A (en) * | 2020-03-19 | 2020-07-07 | 电子科技大学 | Event extraction method based on joint labeling and entity semantic information |
CN111522915A (en) * | 2020-04-20 | 2020-08-11 | 北大方正集团有限公司 | Extraction method, device and equipment of Chinese event and storage medium |
CN111985152A (en) * | 2020-07-28 | 2020-11-24 | 浙江大学 | Event classification method based on bipartite hypersphere prototype network |
CN112487171A (en) * | 2020-12-15 | 2021-03-12 | 中国人民解放军国防科技大学 | Event extraction system and method under open domain |
CN116611514A (en) * | 2023-07-19 | 2023-08-18 | 中国科学技术大学 | Value orientation evaluation system construction method based on data driving |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20010029560A1 (en) * | 2000-04-05 | 2001-10-11 | Hugo Delchini | Computer farm with a system for the hot insertion/extraction of processor cards |
CN104462229A (en) * | 2014-11-13 | 2015-03-25 | 苏州大学 | Event classification method and device |
Non-Patent Citations (1)
Title |
---|
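Zhao Yanyan: "Automatic Recognition of Event Types in Chinese Event Extraction", Proceedings of the 3rd Student Workshop on Computational Linguistics * |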
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108319692A (en) * | 2018-02-01 | 2018-07-24 | 北京云知声信息技术有限公司 | Abnormal punctuate cleaning method, storage medium and server |
CN108319692B (en) * | 2018-02-01 | 2021-03-19 | 云知声智能科技股份有限公司 | Abnormal punctuation cleaning method, storage medium and server |
CN110209807A (en) * | 2018-07-03 | 2019-09-06 | 腾讯科技(深圳)有限公司 | A kind of method of event recognition, the method for model training, equipment and storage medium |
US11972213B2 (en) | 2018-07-03 | 2024-04-30 | Tencent Technology (Shenzhen) Company Limited | Event recognition method and apparatus, model training method and apparatus, and storage medium |
CN110032641A (en) * | 2019-02-14 | 2019-07-19 | 阿里巴巴集团控股有限公司 | Method and device that computer executes, that event extraction is carried out using neural network |
CN110032641B (en) * | 2019-02-14 | 2024-02-13 | 创新先进技术有限公司 | Method and device for extracting event by using neural network and executed by computer |
CN111310461B (en) * | 2020-01-15 | 2023-03-21 | 腾讯云计算(北京)有限责任公司 | Event element extraction method, device, equipment and storage medium |
CN111310461A (en) * | 2020-01-15 | 2020-06-19 | 腾讯云计算(北京)有限责任公司 | Event element extraction method, device, equipment and storage medium |
CN111382575A (en) * | 2020-03-19 | 2020-07-07 | 电子科技大学 | Event extraction method based on joint labeling and entity semantic information |
CN111522915A (en) * | 2020-04-20 | 2020-08-11 | 北大方正集团有限公司 | Extraction method, device and equipment of Chinese event and storage medium |
CN111985152A (en) * | 2020-07-28 | 2020-11-24 | 浙江大学 | Event classification method based on bipartite hypersphere prototype network |
CN111985152B (en) * | 2020-07-28 | 2022-09-13 | 浙江大学 | Event classification method based on dichotomy hypersphere prototype network |
CN112487171A (en) * | 2020-12-15 | 2021-03-12 | 中国人民解放军国防科技大学 | Event extraction system and method under open domain |
CN116611514A (en) * | 2023-07-19 | 2023-08-18 | 中国科学技术大学 | Value orientation evaluation system construction method based on data driving |
CN116611514B (en) * | 2023-07-19 | 2023-10-10 | 中国科学技术大学 | Value orientation evaluation system construction method based on data driving |
Also Published As
Publication number | Publication date |
---|---|
CN106951530B (en) | 2020-01-17 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106951530A (en) | Event type extraction method and device | |
Bharti et al. | Sarcastic sentiment detection in tweets streamed in real time: a big data approach | |
US20180341871A1 (en) | Utilizing deep learning with an information retrieval mechanism to provide question answering in restricted domains | |
Gholamrezazadeh et al. | A comprehensive survey on text summarization systems | |
Liu et al. | Using wordnet to disambiguate word senses for text classification | |
Rahman et al. | Improvement of query-based text summarization using word sense disambiguation | |
Rafeeque et al. | A survey on short text analysis in web | |
Awajan | Keyword extraction from Arabic documents using term equivalence classes | |
Bagalkotkar et al. | A novel technique for efficient text document summarization as a service | |
Sharoff | Classifying Web corpora into domain and genre using automatic feature identification | |
WO2015004006A1 (en) | Method and computer server system for receiving and presenting information to a user in a computer network | |
Bohne et al. | Efficient keyword extraction for meaningful document perception | |
Bella et al. | Domain-based sense disambiguation in multilingual structured data | |
Moradi | Frequent itemsets as meaningful events in graphs for summarizing biomedical texts | |
Tembhurnikar et al. | Topic detection using BNgram method and sentiment analysis on twitter dataset | |
Jafari et al. | Unsupervised keyword extraction for hashtag recommendation in social media | |
Chin et al. | Automatic discovery of concepts from text | |
Amin et al. | Algorithm for bengali keyword extraction | |
Ullah et al. | Pattern and semantic analysis to improve unsupervised techniques for opinion target identification | |
Perrie et al. | Using *** n-grams to expand word-emotion association lexicon | |
Ma et al. | Combining n-gram and dependency word pair for multi-document summarization | |
Han et al. | Mining Technical Topic Networks from Chinese Patents. | |
Heu et al. | Multi-document summarization exploiting semantic analysis based on tag cluster | |
Gupta et al. | Document summarisation based on sentence ranking using vector space model | |
Gayen et al. | Automatic identification of Bengali noun-noun compounds using random forest |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventor after: Hong Yu, Yang Xuerong, Yao Jianmin, Zhu Qiaoming. Inventor before: Yang Xuerong, Hong Yu, Yao Jianmin, Zhu Qiaoming |
GR01 | Patent grant | |