CN106951530A - Event type extraction method and device - Google Patents

Event type extraction method and device

Info

Publication number
CN106951530A
Authority
CN
China
Prior art keywords
word
language material
candidate
trigger word
benchmark
Prior art date
Legal status
Granted
Application number
CN201710169761.3A
Other languages
Chinese (zh)
Other versions
CN106951530B (en)
Inventor
杨雪蓉
洪宇
姚建民
朱巧明
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Priority date
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201710169761.3A
Publication of CN106951530A
Application granted
Publication of CN106951530B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application provides an event type extraction method and device. The method includes: extracting candidate corpus words from a preset corpus; determining, based on the corpus, the association between the benchmark trigger words in a preset trigger word set and the candidate corpus words, where the benchmark trigger words are determined by automatic content extraction technology; for any benchmark trigger word, determining the candidate corpus words whose association with that benchmark trigger word satisfies a preset requirement as target trigger words, thereby obtaining at least one target trigger word; determining the features of the target trigger words in the trigger word set; and clustering the target trigger words based on their features to obtain cluster sets belonging to different event categories. The method and device make it possible to improve the accuracy of event extraction and to broaden its application.

Description

Event type extraction method and device
Technical field
The present application relates to the technical field of information processing, and more particularly to an event type extraction method and device.
Background
Event extraction, as an important component of information extraction, has broad application prospects and great practical significance. The purpose of event extraction is to accurately and effectively extract event information of interest from large amounts of unordered, unstructured information. According to the task definition of event extraction, an event is an objective fact in which specific people and things interact at a specific time and place, and an event consists of a trigger word and the elements that describe the event structure. Event extraction requires automatically identifying and extracting, from unstructured source text containing event information, structured information comprising the event type, the event elements, and the event role information.
At present, existing event extraction directly uses the annotation results of Automatic Content Extraction (ACE), so research on event extraction is limited to the event types defined in ACE, that is, to event extraction in a defined domain. In the open domain, however, event types are richer and more varied and the differences between event types are relatively small, which makes them hard to tell apart. If ACE is still used directly, events cannot be extracted accurately and effectively.
Summary of the invention
In view of this, the present application provides an event type extraction method and device, which make it possible to improve the accuracy of event extraction and to broaden its application.
To achieve the above object, the present application provides the following technical solution:
An event type extraction method, including:
extracting multiple candidate corpus words from a preset corpus;
determining, based on the corpus, the association between the benchmark trigger words in a preset trigger word set and the candidate corpus words, where the benchmark trigger words are determined by automatic content extraction technology;
for any benchmark trigger word, determining the candidate corpus words whose association with that benchmark trigger word satisfies a preset requirement as target trigger words, thereby obtaining at least one target trigger word corresponding to each benchmark trigger word;
determining the features of each target trigger word;
clustering all target trigger words based on their features to obtain multiple cluster sets belonging to different event categories, where each cluster set corresponds to one event category and includes at least one target trigger word.
Preferably, extracting candidate corpus words from the preset corpus includes:
determining the pending corpus words contained in the multiple corpus texts of the preset corpus;
filtering out the preset useless words contained in the pending corpus words to obtain the candidate corpus words, where the preset useless words include stop words and function words.
Preferably, determining, based on the corpus, the association between the benchmark trigger words in the preset trigger word set and the candidate corpus words includes:
for each candidate corpus word, calculating in turn the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set within each corpus text of the corpus;
for any pair of a benchmark trigger word and a candidate corpus word, summing the initial associations of the pair over all corpus texts to obtain the association of the pair in the corpus.
Preferably, calculating the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set within each corpus text of the corpus includes:
for a corpus text, determining the ratio of a first count, namely the number of times the benchmark trigger word and the candidate corpus word occur in the same sentence of the corpus text, to a minimum occurrence count as the initial association of the pair in that corpus text, where the minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the corpus text and the number of times the candidate corpus word occurs in the corpus text.
Preferably, calculating the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set within each corpus text of the corpus includes:
determining multiple preset conjunctions;
for a corpus text, determining from it the first target sentences, namely the sentences that contain both the benchmark trigger word and the candidate corpus word and in which the two are connected by a preset conjunction;
for each preset conjunction $j_i$, determining the ratio of the number of first target sentences in the corpus text that contain the preset conjunction $j_i$ to the minimum occurrence count as the correlation $Con(con_{j_i})$ of the benchmark trigger word and the candidate corpus word with respect to the conjunction $j_i$ in that corpus text;
calculating, using the following formula, the initial association $R_{d_i}(seed, c)$ of the benchmark trigger word seed and the candidate corpus word c in the corpus text $d_i$:
where i is a natural number from 1 to k, and k is the total number of preset conjunctions that occur in all first target sentences of the corpus text $d_i$.
Preferably, calculating the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set within each corpus text of the corpus includes:
determining multiple preset relation types;
in any corpus text $d_i$, for any relation type $j_i$, determining the ratio of a third count, namely the number of times the benchmark trigger word and the candidate corpus word occur together in second target sentences, to the minimum occurrence count as the correlation $Rel(rel_{j_i})$ of the pair with respect to the relation type $j_i$ in the corpus text $d_i$, where a second target sentence is a sentence that contains a specified conjunction corresponding to the relation type $j_i$ and in which the benchmark trigger word and the candidate corpus word are connected by that specified conjunction, and the minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the corpus text $d_i$ and the number of times the candidate corpus word occurs in the corpus text $d_i$;
calculating, using the following formula, the initial association $R_{d_i}(seed, c)$ of the benchmark trigger word seed and the candidate corpus word c in the corpus text $d_i$:
where i is a natural number from 1 to k, and k is the maximum number of preset relation types in the corpus text $d_i$.
Preferably, determining the features of each target trigger word includes any one or more of the following:
obtaining the attribute features of the target trigger word;
obtaining the associated words of the target trigger word, the associated words including synonyms, antonyms, and related words of the target trigger word;
searching the corpus texts contained in the corpus to find the target corpus texts that contain the target trigger word, locating in the target corpus texts the feature words that satisfy a preset positional relationship with the target trigger word, and using the obtained feature words as the contextual features of the target trigger word;
identifying, using the FrameNet tool, the target trigger word and its frame type in the sentences of the corpus texts of the corpus.
Preferably, after obtaining the multiple cluster sets belonging to different event categories, the method further includes:
for any cluster set, determining, according to the term frequency-inverse document frequency (TF-IDF) algorithm, at least one target trigger word in the cluster set that is suitable as a label of the cluster set;
labeling the cluster set with the at least one target trigger word as its label.
In another aspect, the present application also provides an event type extraction device, including:
a word screening unit, configured to extract multiple candidate corpus words from a preset corpus;
an association determining unit, configured to determine, based on the corpus, the association between the benchmark trigger words in a preset trigger word set and the candidate corpus words, where the benchmark trigger words are determined by automatic content extraction technology;
a word extension unit, configured to, for any benchmark trigger word, determine the candidate corpus words whose association with that benchmark trigger word satisfies a preset requirement as target trigger words, thereby obtaining at least one target trigger word corresponding to each benchmark trigger word;
a feature determining unit, configured to determine the features of each target trigger word;
a type determining unit, configured to cluster all target trigger words based on their features to obtain multiple cluster sets belonging to different event categories, where each cluster set corresponds to one event category and includes at least one target trigger word.
Preferably, the association determining unit includes:
a first association computing unit, configured to, for each candidate corpus word, calculate in turn the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set within each corpus text of the corpus;
a second association computing unit, configured to, for any pair of a benchmark trigger word and a candidate corpus word, sum the initial associations of the pair over all corpus texts to obtain the association of the pair in the corpus.
As the above technical solution shows, the target trigger words in the trigger word set are obtained in the present application by extending the trigger words produced by existing automatic content extraction technology, so the resulting trigger words cover a wider range, which helps determine the core words that trigger events during event extraction. Clustering the extended trigger words can therefore ultimately yield more event types, which helps improve the accuracy of event extraction and broaden its application.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below are only embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 shows a schematic flowchart of one embodiment of the event type extraction method of the present application;
Fig. 2 shows a schematic flowchart of another embodiment of the event type extraction method of the present application;
Fig. 3 shows a schematic structural diagram of one embodiment of the event type extraction device of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present application.
To make the event extraction process easier to understand, some terms involved in event extraction are briefly introduced below:
Entity: an object or set of objects belonging to a semantic category.
Entity mention: a phrase (usually a noun phrase) containing an entity.
Event trigger: the core word that triggers the occurrence of an event (in ACE, trigger words are mainly verbs or nouns).
Event arguments: the participants of an event, which are the core components that make up the event.
Argument roles (event roles): the relation between an event participant and the event.
Event mention: a phrase or sentence containing an event trigger word and event participants.
The event type extraction method of the present application is described below.
Referring to Fig. 1, which shows a schematic flowchart of one embodiment of the event type extraction method of the present application, the method of this embodiment can include:
101, extracting candidate corpus words from a preset corpus.
Here, the corpus is the language resource to be processed. For example, the corpus can be material obtained with TDT (Topic Detection and Tracking) technology. The corpus includes many corpus texts, which can be news reports in multilingual text and speech form. TDT mainly covers tasks such as automatically recognizing report boundaries, locking onto and collecting breaking news topics, tracking the development of topics, and cross-language detection and tracking. News texts obtained with TDT technology describe a large number of events.
The candidate corpus words extracted from the preset corpus can be regarded as candidate trigger words, so that words that can serve as extensions of the trigger words can later be chosen from them. Specifically, the candidate trigger words can be obtained by extracting words from the corpus texts in the preset corpus.
102, determining, based on the corpus, the association between the benchmark trigger words in a preset trigger word set and the candidate corpus words.
The benchmark trigger words are determined by Automatic Content Extraction (ACE) technology. A benchmark trigger word can be understood as a seed trigger word for trigger word extension: the trigger words are extended on the basis of the benchmark trigger words in combination with the candidate corpus words.
103, for any benchmark trigger word, determining the candidate corpus words whose association with that benchmark trigger word satisfies a preset requirement as target trigger words, thereby obtaining at least one target trigger word corresponding to each benchmark trigger word.
Unlike the prior art, the trigger words used for event extraction in the present application are not taken directly from automatic content extraction technology; instead, the trigger words obtained by that technology are extended.
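Step 103 can be sketched as follows. This is only an illustrative sketch: the text does not say what the "preset requirement" is, so a simple score threshold is assumed here, and the function name, the score values, and the example words are all hypothetical.

```python
from typing import Dict, List, Tuple


def select_target_triggers(
    association: Dict[Tuple[str, str], float],
    threshold: float,
) -> Dict[str, List[str]]:
    """Map each benchmark trigger word to the candidate words whose
    corpus-level association with it is at least `threshold`.

    The threshold is an assumed stand-in for the "preset requirement".
    """
    targets: Dict[str, List[str]] = {}
    for (seed, candidate), score in association.items():
        if score >= threshold:
            targets.setdefault(seed, []).append(candidate)
    # Sort each list so the result is deterministic.
    return {seed: sorted(words) for seed, words in targets.items()}


# Hypothetical association scores for (benchmark trigger, candidate) pairs.
scores = {
    ("attack", "bombing"): 2.5,
    ("attack", "weather"): 0.1,
    ("marry", "wedding"): 1.8,
}
expanded = select_target_triggers(scores, threshold=1.0)
print(expanded)
```

With a fixed threshold, a benchmark trigger word can end up with no target trigger word at all; a ranking-based rule (for example, keeping the top few candidates per benchmark trigger word) would be one alternative way to satisfy the "at least one" condition.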
104, determining the features of each target trigger word.
The features of a target trigger word characterize the attributes of the target trigger word itself, its association with the context in the corpus texts, and so on; these features are the basis for determining the event category.
105, clustering all target trigger words based on their features to obtain multiple cluster sets belonging to different event categories.
Each cluster set corresponds to one event category and includes at least one target trigger word.
In the present application, the target trigger words in the trigger word set are obtained by extending the trigger words produced by existing automatic content extraction technology, so the resulting trigger words cover a wider range, which helps determine the core words that trigger events during event extraction. Clustering the extended trigger words can therefore ultimately yield more event types, which helps improve the accuracy of event extraction and broaden its application.
Referring to Fig. 2, which shows a schematic flowchart of another embodiment of the event type extraction method of the present application, the method of this embodiment can include:
201, determining pending corpus words from the multiple corpus texts in a preset corpus.
Step 201 amounts to extracting words from the corpus texts to determine the corpus words they contain. To distinguish them from the candidate corpus words later used to extend the trigger words, the corpus words initially extracted from the corpus texts are called pending corpus words.
202, filtering out the preset useless words contained in the pending corpus words to obtain a candidate corpus word set that includes multiple candidate corpus words.
The preset useless words can be set as needed; for example, they can include stop words and function words. A function word is a word that cannot serve as a sentence constituent, that is, a word other than a content word. A content word can serve as a sentence constituent on its own; it is a word that has both lexical and grammatical meaning.
Of course, besides filtering the preset useless words out of the pending corpus words, preprocessing such as lemmatization can also be applied to the remaining content words, and the pending corpus words that remain after preprocessing are taken as candidate corpus words, yielding the candidate corpus word set.
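Steps 201 and 202 can be sketched as follows. The sketch assumes English text, a naive regular-expression tokenizer, and two tiny hypothetical word lists; a real system would use full stop-word and function-word lists, and the optional lemmatization step is omitted here.

```python
import re
from typing import Iterable, List, Set

# Hypothetical, deliberately tiny word lists for illustration only.
STOP_WORDS: Set[str] = {"the", "a", "an", "is", "was"}
FUNCTION_WORDS: Set[str] = {"and", "but", "of", "in", "on", "because"}


def extract_candidate_words(texts: Iterable[str]) -> List[str]:
    """Collect the distinct words of all texts (the pending corpus words)
    and drop the preset useless words, leaving the candidate corpus words."""
    useless = STOP_WORDS | FUNCTION_WORDS
    candidates: Set[str] = set()
    for text in texts:
        # Step 201: naive tokenization into lowercase alphabetic words.
        for token in re.findall(r"[a-z]+", text.lower()):
            # Step 202: filter out stop words and function words.
            if token not in useless:
                candidates.add(token)
    return sorted(candidates)


corpus = ["The army attacked the city.", "A bombing was reported in the city."]
words = extract_candidate_words(corpus)
print(words)
```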
203, obtaining a preset trigger word set.
The trigger word set includes multiple benchmark trigger words determined by automatic content extraction technology.
A benchmark trigger word can be understood as a trigger word determined according to the prior art; the present application extends the trigger words on the basis of these existing trigger words.
204, for each candidate corpus word, calculating in turn the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set within each corpus text.
Association reflects how, and how strongly, two words are related. It can include the correlation of two words within a single corpus text; in that case, the correlation only reflects how strongly the two words are related within that text. For ease of distinction, the correlation of a benchmark trigger word and a candidate corpus word within one corpus text is called the initial association. It is understood that when there are multiple corpus texts, a benchmark trigger word and a candidate corpus word can have multiple initial associations, one for each corpus text.
Association can also include the combined correlation over all documents in the corpus, which reflects how strongly two words are related across all text documents. In the embodiments of the present application, the combined correlation of a benchmark trigger word and a candidate corpus word over all documents in the corpus is called the association in the corpus.
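The relationship between the two notions can be sketched as follows: the association in the corpus is the sum, over all corpus texts, of the per-text initial association. The per-text scoring function is passed in as a parameter; `toy_initial_association` is a hypothetical placeholder used only to make the example runnable and is not one of the scoring rules described in this application.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def corpus_associations(
    seeds: Iterable[str],
    candidates: Iterable[str],
    texts: List[str],
    initial_association: Callable[[str, str, str], float],
) -> Dict[Tuple[str, str], float]:
    """Sum the per-text initial association of every
    (benchmark trigger word, candidate word) pair over all corpus texts."""
    candidate_list = list(candidates)
    totals: Dict[Tuple[str, str], float] = {}
    for seed in seeds:
        for candidate in candidate_list:
            totals[(seed, candidate)] = sum(
                initial_association(seed, candidate, text) for text in texts
            )
    return totals


def toy_initial_association(seed: str, candidate: str, text: str) -> float:
    """Placeholder score: 1.0 if both words occur in the text, else 0.0."""
    tokens = text.lower().split()
    return 1.0 if seed in tokens and candidate in tokens else 0.0


texts = ["troops attack city", "attack then bombing", "bombing follows attack"]
result = corpus_associations(
    ["attack"], ["bombing", "city"], texts, toy_initial_association
)
print(result)
```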
The initial association of a candidate corpus word and a benchmark trigger word within one corpus text can be calculated in several ways. For example:
In one implementation for calculating the initial association:
The ratio of a first count, namely the number of times the benchmark trigger word and the candidate corpus word occur in the same sentence of the corpus text, to the minimum occurrence count can be taken as the initial association of the pair in that corpus text. The minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the corpus text and the number of times the candidate corpus word occurs in the corpus text. That is, the initial association $R_{d_i}(seed, c)$ can be expressed as:
$$R_{d_i}(\mathrm{seed}, c) = \frac{\mathrm{freq}(\mathrm{seed}, c)}{\min\big(\mathrm{freq}(\mathrm{seed}),\ \mathrm{freq}(c)\big)}$$
where the numerator is the frequency with which the benchmark trigger word seed and the candidate corpus word c co-occur in one sentence, and the denominator is the smaller of the frequencies with which seed and c each occur in the corpus text $d_i$.
In this implementation, words that occur in the same sentence are regarded as related words: the higher the ratio of the frequency with which two words occur in the same sentence to the number of times the words occur, the more strongly the two words are correlated.
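The sentence co-occurrence ratio described above can be sketched as follows. The sketch assumes English text, naive sentence splitting on `.`, `!` and `?`, and counts the "first count" as the number of sentences containing both words; the function names and the example text are hypothetical.

```python
import re
from typing import List


def tokenize(sentence: str) -> List[str]:
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter on '.', '!' and '?'."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def cooccurrence_association(seed: str, candidate: str, text: str) -> float:
    """Number of sentences containing both words, divided by the smaller of
    the two words' occurrence counts in the text (0.0 if either is absent)."""
    sentences = [tokenize(s) for s in split_sentences(text)]
    co_count = sum(1 for toks in sentences if seed in toks and candidate in toks)
    seed_count = sum(toks.count(seed) for toks in sentences)
    cand_count = sum(toks.count(candidate) for toks in sentences)
    min_count = min(seed_count, cand_count)
    if min_count == 0:
        return 0.0
    return co_count / min_count


text = (
    "Rebels attack the town. The attack followed a bombing. "
    "A bombing hit the market."
)
score = cooccurrence_association("attack", "bombing", text)
print(score)  # 1 shared sentence / min(2, 2) occurrences = 0.5
```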
In another implementation for calculating the initial association:
First, all first target sentences are determined from the corpus text, namely the sentences that contain both the benchmark trigger word and the candidate corpus word and in which the two are connected by a preset conjunction. Then, for each conjunction $j_i$ that connects the benchmark trigger word and the candidate corpus word, the correlation of the pair with respect to the conjunction $j_i$ in the corpus text is calculated. This correlation is the ratio of a second count, namely the number of times the benchmark trigger word and the candidate corpus word occur together in first target sentences of the corpus text containing $j_i$, to the minimum occurrence count. The minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the corpus text and the number of times the candidate corpus word occurs in the corpus text.
That is, in the corpus text $d_i$, the correlation $Con(con_{j_i})$ of the benchmark trigger word and the candidate corpus word with respect to a preset conjunction can be expressed as follows:
$$Con(con_{j_i}) = \frac{\mathrm{count}_{con_{j_i}}(\mathrm{seed}, c)}{\min\big(\mathrm{freq}(\mathrm{seed}),\ \mathrm{freq}(c)\big)}$$
In formula two, the numerator of the fraction is the number of first target sentences in the corpus text $d_i$, that is, sentences that contain the benchmark trigger word seed and the candidate corpus word c and in which seed and c are connected by the conjunction $j_i$. This number can also be regarded as the number of times the benchmark trigger word and the candidate corpus word appear together in a first target sentence, connected by the conjunction. The denominator of the fraction is the smaller of the number of times the benchmark trigger word seed occurs in the corpus text $d_i$ and the number of times the candidate corpus word c occurs in the corpus text $d_i$.
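The per-conjunction correlation can be sketched as follows. How two words are judged to be "connected by" a conjunction is not specified in the text, so the sketch assumes the simplest rule: a single-word conjunction appears between the first occurrences of the two words in the sentence. The conjunction list, function name, and example text are hypothetical.

```python
import re
from typing import Dict, Iterable, List


def conjunction_correlations(
    seed: str, candidate: str, text: str, conjunctions: Iterable[str]
) -> Dict[str, float]:
    """For each conjunction, count the sentences in which it stands between
    the two words, and divide by the smaller occurrence count of the words."""
    sentences: List[List[str]] = [
        re.findall(r"[a-z]+", s.lower())
        for s in re.split(r"[.!?]+", text)
        if s.strip()
    ]
    seed_count = sum(toks.count(seed) for toks in sentences)
    cand_count = sum(toks.count(candidate) for toks in sentences)
    min_count = min(seed_count, cand_count)
    result: Dict[str, float] = {}
    if min_count == 0:
        return result
    for conj in conjunctions:
        matches = 0
        for toks in sentences:
            if seed in toks and candidate in toks:
                first, last = sorted((toks.index(seed), toks.index(candidate)))
                # Assumed rule: the conjunction lies between the two words.
                if conj in toks[first + 1:last]:
                    matches += 1
        if matches:
            result[conj] = matches / min_count
    return result


text = (
    "The attack came because the bombing failed. "
    "The attack ended but the bombing continued. The bombing stopped."
)
con = conjunction_correlations("attack", "bombing", text, ["because", "but", "so"])
print(con)
```

In the example, "attack" occurs twice and "bombing" three times, so the minimum occurrence count is 2, and each of "because" and "but" connects the pair in one sentence.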
Accordingly, the initial association $R_{d_i}(seed, c)$ of the benchmark trigger word seed and the candidate corpus word c in the corpus text $d_i$ is:
where i is a natural number from 1 to k, and k is the total number of preset conjunctions that occur in all first target sentences of the corpus text $d_i$.
The preset conjunctions can be set as needed; optionally, the 182 conjunctions defined in the PDTB (Penn Discourse Treebank) can be used.
In this implementation, the variety of conjunctions connecting two words affects their degree of correlation: if many kinds of conjunctions connect the two words, their relationship is considered unstable, which lowers their correlation; if few kinds of conjunctions connect them, their relationship is considered relatively stable, so their correlation is higher.
In the implementation of another calculating initial association:
Need to be directed to preset every kind of relationship type ji, benchmark trigger word and candidate's language material word are determined while occurring second Third time number in target sentences, and the benchmark trigger word and candidate's language material word are calculated in the language material text on this kind pass Set type jiCorrelation.Wherein, second target sentences are that with the corresponding specified conjunction of the relationship type, and benchmark is touched Hair word specifies the sentence that conjunction is connected with candidate's language material word by this.The benchmark trigger word and candidate's language material word are in the language Expect in text on this kind of relationship type jiCorrelation be the corresponding third time number of this kind of relationship type, with minimum occurrence number Ratio, wherein, the number of times that trigger word occurs in the language material text on the basis of minimum occurrence number, and candidate's language material word Minimum value in the number of times occurred in the language material text.
That is, in corpus text d_i, the correlation Rel(relj_i) of the benchmark trigger word seed and the candidate corpus word c with respect to the preset relation type j_i can be expressed as follows:
$$Rel(relj_i) = \frac{count_{d_i}(seed, c, relj_i)}{\min\big(count_{d_i}(seed),\ count_{d_i}(c)\big)}$$
Here, the numerator is the third count, that is, the number of second target sentences in corpus text d_i that contain both the benchmark trigger word seed and the candidate corpus word c and in which the two words are connected by a specified conjunction corresponding to the relation type; equivalently, it is the number of times the benchmark trigger word and the candidate corpus word, connected by a conjunction corresponding to the relation type, occur together in one target sentence. The denominator is the smaller of the number of times the benchmark trigger word seed occurs in corpus text d_i and the number of times the candidate corpus word c occurs in corpus text d_i.
After the correlation Rel(relj_i) of the benchmark trigger word seed and the candidate corpus word c with respect to each preset relation type j_i has been obtained, the initial association Rd_i(seed, c) of the benchmark trigger word and the candidate corpus word c in corpus text d_i can be calculated as:
$$R_{d_i}(seed, c) = -\sum_{i=1}^{k} Rel(relj_i)\,\log\big(Rel(relj_i)\big)$$
Here, i is a natural number from 1 to k, and k denotes the maximum number of preset relation types present in corpus text d_i. For example, if there are four preset relation types, the value of k is 4.
Because PDTB defines 182 conjunctions, the instances of each individual conjunction tend to be sparse; this implementation therefore calculates the association using discourse-level relation types. Optionally, in the embodiments of the present application, the discourse relation types can include four preset top-level relation types: Comparison, Contingency, Expansion and Temporal.
Some conjunctions in PDTB point to one specific relation type; for example, the "preceding argument" and the "following argument" in a sentence connected by the conjunction "because" point to the "Causal" relation. Other conjunctions can point to several relation types, for example the conjunction "and". Therefore, the present invention selects only specific conjunctions from PDTB, where a specific conjunction is one that points to a certain discourse relation type with high probability. Based on the distribution of conjunctions in PDTB, the present invention computes the probability that each conjunction points to each relation type. For example, the conjunction "alternatively" points to the "Expansion" relation type with a probability of 100%. In the present application, only conjunctions whose probability of pointing to a certain relation type is greater than 80% are selected as the specified conjunctions of that relation type.
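As a minimal sketch of the 80% selection rule, the specified conjunctions of each relation type could be derived from annotated (conjunction, relation type) pairs as shown below. The function name, the input layout and the toy annotations are assumptions made for illustration; the counts are not real PDTB statistics.

```python
from collections import Counter, defaultdict

def specified_conjunctions(annotations, min_prob=0.8):
    """Map each relation type to the conjunctions pointing to it with probability > min_prob."""
    totals = Counter(conj for conj, _ in annotations)   # occurrences per conjunction
    pair_counts = Counter(annotations)                  # occurrences per (conjunction, relation)
    selected = defaultdict(set)
    for (conj, relation), n in pair_counts.items():
        if n / totals[conj] > min_prob:
            selected[relation].add(conj)
    return dict(selected)

# Hypothetical annotations, invented for this example.
toy = (
    [("because", "Contingency")] * 9 + [("because", "Temporal")]
    + [("and", "Expansion")] * 5 + [("and", "Temporal")] * 5
    + [("alternatively", "Expansion")] * 4
)
print(specified_conjunctions(toy))
```

In this toy data "and" is split evenly between two relation types, so it is not selected for either of them.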
Accordingly, in this implementation, the "scope" within which two words are considered correlated is that the two words are in the same sentence and are connected by a specified conjunction. The correlation of the words seed and c can then be calculated separately for each of the four relation types.
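The per-relation-type ratio can be illustrated with the following sketch, which works on one corpus text given as tokenized sentences. The function name, the data layout and the example sentences are hypothetical. In particular, treating "connected by" as "the conjunction appears between the two words" is an assumption of this sketch, since the application does not define how the connection is detected.

```python
def relation_correlations(sentences, seed, cand, conj_by_relation):
    """Rel per relation type: third count / minimum occurrence count, for one corpus text."""
    seed_occ = sum(s.count(seed) for s in sentences)
    cand_occ = sum(s.count(cand) for s in sentences)
    min_occ = min(seed_occ, cand_occ)
    if min_occ == 0:
        return {}
    rel = {}
    for relation, conjs in conj_by_relation.items():
        third_count = 0
        for s in sentences:
            if seed in s and cand in s:
                lo, hi = sorted((s.index(seed), s.index(cand)))
                # Assumption: the specified conjunction lies between the two words.
                if any(tok in conjs for tok in s[lo + 1:hi]):
                    third_count += 1
        if third_count:
            rel[relation] = third_count / min_occ
    return rel

# Hypothetical corpus text and specified conjunctions.
text = [
    ["troops", "attack", "because", "rebels", "injure", "civilians"],
    ["rebels", "attack", "then", "injure", "guards"],
    ["police", "attack", "protesters"],
]
conjs = {"Contingency": {"because"}, "Temporal": {"then"}}
print(relation_correlations(text, "attack", "injure", conjs))
```

Relation types with no second target sentence are left out of the result, so they contribute nothing to a subsequent sum over relation types.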
Of course, in practical applications the initial association of the benchmark trigger word and the candidate corpus word in the corpus text can also be calculated in other ways, which are not limited here.
205, for any pair of a benchmark trigger word and a candidate corpus word, calculate the association of the pair in the corpus from the initial associations, in each corpus text, of the benchmark trigger word in the trigger word set and the candidate corpus word.
For any pair of a benchmark trigger word and a candidate corpus word, the initial associations of the pair in every corpus text of the corpus are summed to obtain the association of the benchmark trigger word and the candidate corpus word in the corpus, that is, the final association of the pair.
That is, the association R(seed, c) of the benchmark trigger word seed and the candidate corpus word c in the corpus is:
$$R(seed, c) = \sum_{i=1}^{n} R_{d_i}(seed, c)$$
Here, n denotes the total number of corpus texts in the corpus that contain a sentence in which the benchmark trigger word seed and the candidate corpus word c both occur, i is a natural number from 1 to n, and d_i denotes a corpus text in which the benchmark trigger word seed and the candidate corpus word c occur together in one sentence.
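The summation over corpus texts can be sketched as follows. The function name, the record layout and the scores are illustrative assumptions.

```python
from collections import defaultdict

def corpus_associations(per_text_scores):
    """Sum per-text initial associations for every (benchmark trigger word, candidate) pair.

    per_text_scores: iterable of (seed, candidate, text_id, initial_association).
    """
    totals = defaultdict(float)
    for seed, cand, _text_id, score in per_text_scores:
        totals[(seed, cand)] += score
    return dict(totals)

# Hypothetical per-text scores.
scores = [
    ("attack", "injure", "d1", 0.5),
    ("attack", "injure", "d2", 0.25),
    ("attack", "bomb", "d1", 0.75),
]
print(corpus_associations(scores))
```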
206, for any benchmark trigger word, determine the candidate corpus words whose association with the benchmark trigger word satisfies a preset requirement as target trigger words, obtaining at least one target trigger word expanded from the benchmark trigger word.
One or more target trigger words can be expanded from each benchmark trigger word.
The preset requirement that the association between a target trigger word and the benchmark trigger word must satisfy can be set as needed. For example, the preset requirement can be that the association value is greater than a preset threshold. Optionally, for each benchmark trigger word, the candidate corpus words can be sorted in descending order of their association with that benchmark trigger word, and a specified number of top-ranked candidate corpus words can be determined as target trigger words.
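Both variants of the preset requirement, a threshold and a top-ranked cut-off, can be sketched in one helper. The function name, its parameters and the scores are assumptions made for illustration.

```python
def expand_triggers(associations, threshold=None, top_n=None):
    """associations: {(seed, candidate): score}. Returns {seed: [target trigger words]}."""
    by_seed = {}
    for (seed, cand), score in associations.items():
        by_seed.setdefault(seed, []).append((score, cand))
    targets = {}
    for seed, scored in by_seed.items():
        # Descending association; ties are broken alphabetically for determinism.
        scored.sort(key=lambda sc: (-sc[0], sc[1]))
        if threshold is not None:
            scored = [sc for sc in scored if sc[0] > threshold]
        if top_n is not None:
            scored = scored[:top_n]
        targets[seed] = [cand for _, cand in scored]
    return targets

# Hypothetical association values.
assoc = {("attack", "injure"): 0.9, ("attack", "bomb"): 0.7, ("attack", "table"): 0.1}
print(expand_triggers(assoc, threshold=0.5))
print(expand_triggers(assoc, top_n=1))
```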
207, obtain the features of each target trigger word.
The features of a target trigger word describe its basic characteristics.
For example, the features of a target trigger word can include any one or more of the following:
The attribute features of the target trigger word;
The associated words of the target trigger word, e.g., its synonyms, antonyms and related words;
The contextual features of the target trigger word;
The frame type to which the target trigger word belongs.
The attribute features are features that the target trigger word itself has, and can be obtained by identifying the part of speech and the named entity of the target trigger word.
The associated words of the target trigger word can be obtained by calling a specified dictionary through a preset interface.
The contextual features of the target trigger word can be obtained by searching the corpus texts of the corpus for target corpus texts that contain the target trigger word, locating in each target corpus text the feature words that satisfy a preset positional relation with the target trigger word, and using the resulting feature words as the contextual features of the target trigger word. For example, the contextual features can include the following:
The three words before and the three words after the target trigger word (excluding stop words);
Two or three words extracted, according to an N-Gram model, from the word sequences in the corpus text that lie within a distance of three words of the target trigger word;
The word immediately before and the word immediately after the target trigger word, extracted from the corpus text.
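The three kinds of contextual features listed above can be sketched for a single tokenized sentence as follows. The function name, the returned keys, the stop-word list and the example sentence are hypothetical, and the N-Gram feature is interpreted here as all word bigrams and trigrams inside a window of three words on either side of the trigger word, which is only one possible reading of the description.

```python
def context_features(tokens, index, stop_words, window=3):
    """Contextual features of the trigger word at tokens[index]."""
    def content(seq):
        return [t for t in seq if t not in stop_words]

    # Up to `window` non-stop words on each side of the trigger word.
    before = content(tokens[:index])[-window:]
    after = content(tokens[index + 1:])[:window]
    # Immediately adjacent words (stop words included).
    prev_word = tokens[index - 1] if index > 0 else None
    next_word = tokens[index + 1] if index + 1 < len(tokens) else None
    # Assumed reading of the N-Gram feature: bigrams and trigrams inside the window.
    lo, hi = max(0, index - window), min(len(tokens), index + window + 1)
    span = tokens[lo:hi]
    ngrams = [tuple(span[i:i + n]) for n in (2, 3) for i in range(len(span) - n + 1)]
    return {"before": before, "after": after,
            "adjacent": (prev_word, next_word), "ngrams": ngrams}

tokens = "the rebels attacked the village near the border yesterday".split()
features = context_features(tokens, 2, stop_words={"the", "near"})
print(features["before"], features["after"], features["adjacent"])
```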
The frame type to which the target trigger word belongs is obtained by using the FrameNet tool to identify the target trigger word and its frame (Frame) in each sentence of the corpus text, so that the frame type of the target trigger word is obtained when the target trigger word has a frame. The frame type of the word preceding the target trigger word and the frame type of the word following it can additionally be extracted. FrameNet is a corpus-based semantic network built on the theory of frame semantics, in which lexical meanings are connected to each other through frames.
208, based on the features of the target trigger words, cluster all the obtained target trigger words to obtain cluster sets belonging to multiple different event categories.
Each cluster set includes multiple target trigger words.
Different cluster sets correspond to different event categories, and the cluster set of an event category includes the multiple target trigger words belonging to that event category.
In the embodiments of the present application, the target trigger words can be clustered according to a preset clustering algorithm, e.g., the affinity propagation clustering algorithm (Affinity Propagation Cluster algorithm), which can also be referred to as the AP clustering algorithm. This clustering algorithm treats all data points as potential cluster centers and does not require the number of clusters to be specified. During clustering, the vector formed from the previously obtained features of each target trigger word, that is, the constructed event trigger word feature vector, is used as input data, so that trigger words of the same type are grouped into one class; in other words, the target trigger words in one class of the clustering result have features of the same type. One class can be regarded as one trigger word set.
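For illustration, a minimal pure-Python sketch of affinity propagation on small numeric feature vectors is given below. It uses the standard responsibility and availability message updates with damping; the similarity measure (negative squared Euclidean distance), the preference value, the damping factor and the toy vectors are all assumptions of this sketch, and how trigger word features are encoded as numbers is left open. In practice a library implementation would normally be preferred.

```python
def affinity_propagation(points, preference, damping=0.5, iterations=200):
    """Return, for each point, the index of its exemplar (or -1 if no exemplar emerged)."""
    n = len(points)
    # Similarity: negative squared Euclidean distance; the diagonal holds the preference.
    s = [[-sum((x - y) ** 2 for x, y in zip(p, q)) for q in points] for p in points]
    for k in range(n):
        s[k][k] = preference
    r = [[0.0] * n for _ in range(n)]   # responsibilities
    a = [[0.0] * n for _ in range(n)]   # availabilities
    for _ in range(iterations):
        for i in range(n):
            for k in range(n):
                best_other = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        for k in range(n):
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]   # positive responsibilities from points other than k
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:
        return [-1] * n
    # Exemplars label themselves; other points join the most similar exemplar.
    return [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
            for i in range(n)]

# Hypothetical one-dimensional feature vectors forming two well-separated groups.
vectors = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
print(affinity_propagation(vectors, preference=-10.0))
```

The number of clusters is not passed in; it follows from the preference value, with lower preferences generally giving fewer clusters.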
Because the features of the target trigger words determined in the present application differ significantly from the features determined in the prior art, clustering all the target trigger words with the clustering algorithm yields event categories that are not limited to, and differ from, the event types defined in the ACE corpus.
209, for each cluster set, select at least one target trigger word from the cluster set as the label of the cluster set, and mark the cluster set with the resulting label.
Optionally, the TF-IDF algorithm is used to determine the target trigger words in a cluster set that are suitable as the label of the cluster set. Specifically, for each event type generated by the clustering algorithm, the specified number of target trigger words with the largest TF-IDF values are selected from the cluster set of that event category as the label of the event type category.
TF-IDF is a statistical method for assessing how important a word is to one document in a document set or corpus. TF is the term frequency (Term Frequency), which represents the frequency with which a word or phrase occurs in a document; IDF is the inverse document frequency (Inverse Document Frequency), obtained by dividing the total number of documents by the number of documents containing the word or phrase and taking the logarithm of the quotient, and it measures the general importance of a word or phrase. The main idea of TF-IDF is therefore: if a word or phrase occurs with a high frequency TF in one document and rarely occurs in other documents, the word or phrase is considered to have good class discrimination ability and to be suitable as the label of a category.
Suppose the clustering algorithm yields the cluster sets of K event categories. For each event category, the several target trigger words that best represent the category are computed, so the TF-IDF calculation for each event category is carried out only within that category. In the present invention, each event category contains a number of corpus texts; for each target trigger word in an event category, the TF-IDF of the target trigger word in each corpus text (TF is the frequency with which the target trigger word occurs in corpus text d_i, and IDF is the reciprocal of the number of corpus texts containing the target trigger word) is defined as:
Here, i denotes a target trigger word; n_ij denotes the number of times target trigger word i occurs in corpus text j of the event category; the denominator of the TF term is the total number of occurrences, in corpus text j, of all target trigger words of the event category; m denotes the number of target trigger words in the event category; N denotes the total number of corpus texts corresponding to the event category (the number of corpus texts that contain any target trigger word of the event category); n_j denotes the number of corpus texts that contain the target trigger word in the event category; and 1 is added for smoothing.
As can be seen, the present invention labels the K event categories generated by the AP clustering algorithm as C1, C2, ..., Ck; for all documents d in each category Ci (i = 1, 2, ..., k), the TF-IDF value of each target trigger word in each document is calculated; and for each event category, the specified number (e.g., 100) of target trigger words with the largest TF-IDF values in that category are taken as the label of the event type category.
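The label selection within one event category can be sketched as below. This is not the formula of the application: the sketch assumes the common smoothed form tf * log(N / (1 + df)) and ranks each target trigger word by its largest TF-IDF value over the corpus texts of the category; the function name, the aggregation rule and the toy texts are likewise assumptions.

```python
import math
from collections import Counter

def cluster_labels(texts, cluster_words, top_n=3):
    """Pick label words for one event category from its tokenized corpus texts."""
    words = set(cluster_words)
    docs = [Counter(t for t in text if t in words) for text in texts]
    docs = [d for d in docs if d]            # keep texts containing at least one trigger word
    n_texts = len(docs)
    doc_freq = Counter(w for d in docs for w in d)
    best = {}
    for d in docs:
        total = sum(d.values())              # occurrences of all category trigger words in the text
        for w, n in d.items():
            # Assumed smoothed form; the position of the "+1" is a guess.
            score = (n / total) * math.log(n_texts / (1 + doc_freq[w]))
            best[w] = max(best.get(w, float("-inf")), score)
    ranked = sorted(best, key=lambda w: (-best[w], w))
    return ranked[:top_n]

# Hypothetical corpus texts of one event category.
texts = [
    ["attack", "attack", "bomb"],
    ["attack", "raid"],
    ["attack", "injure", "injure", "injure"],
    ["attack"],
]
print(cluster_labels(texts, ["attack", "bomb", "raid", "injure"], top_n=2))
```

A word that occurs in every text of the category, such as "attack" here, receives a low score under this form and is therefore unlikely to be chosen as a label.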
The present invention uses several target trigger words with high TF-IDF values as the label of an event category (the labels distinguish the types of the event categories). This method is free of the limitation of the 33 event types defined in the ACE corpus and takes all language phenomena into account, forming an open-domain event type system.
In another aspect, the embodiments of the present application further provide an event type extraction device. Fig. 3 shows a schematic structural diagram of one embodiment of an event type extraction device of the present application. The device of this embodiment can include:
A word screening unit 301, configured to extract multiple candidate corpus words from a preset corpus;
An association determining unit 302, configured to determine, based on the corpus, the association between the benchmark trigger words in a preset trigger word set and the candidate corpus words, wherein the benchmark trigger words are determined by automatic content extraction technology;
A word expansion unit 303, configured to, for any benchmark trigger word, determine the candidate corpus words whose association with the benchmark trigger word satisfies a preset requirement as target trigger words, obtaining at least one target trigger word corresponding to each benchmark trigger word;
A feature determining unit 304, configured to determine the features of each target trigger word;
A type determining unit 305, configured to cluster all the trigger words based on the features of the target trigger words to obtain multiple cluster sets belonging to different event categories, wherein each cluster set corresponds to one event category and includes at least one target trigger word.
Optionally, the word screening unit includes:
A pending word determining unit, configured to determine the pending corpus words contained in the multiple corpus texts of the preset corpus;
A word filtering unit, configured to filter out the preset useless words contained in the pending corpus words to obtain the candidate corpus words, wherein the preset useless words include stop words and function words.
Optionally, the association determining unit includes:
A first association calculating unit, configured to, for each candidate corpus word, calculate in turn the initial association of the candidate corpus word and each benchmark trigger word in the trigger word set in each corpus text of the corpus;
A second association calculating unit, configured to, for any pair of a benchmark trigger word and a candidate corpus word, sum the initial associations of the benchmark trigger word and the candidate corpus word in every corpus text to obtain the association of the benchmark trigger word and the candidate corpus word in the corpus.
Optionally, when calculating the initial association of the candidate corpus word and each benchmark trigger word in the trigger word set in each corpus text of the corpus, the first association calculating unit is specifically configured to:
For one corpus text, determine the ratio of a first count, namely the number of times the benchmark trigger word and the candidate corpus word occur in the same sentence of the corpus text, to a minimum occurrence count as the initial association of the benchmark trigger word and the candidate corpus word in the corpus text, wherein the minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the corpus text and the number of times the candidate corpus word occurs in the corpus text.
Optionally, when calculating the initial association of the candidate corpus word and each benchmark trigger word in the trigger word set in each corpus text of the corpus, the first association calculating unit is specifically configured to:
Determine multiple preset conjunctions;
For one corpus text, determine from the corpus text the first target sentences, which contain both the benchmark trigger word and the candidate corpus word and in which the benchmark trigger word and the candidate corpus word are connected by a preset conjunction;
For each preset conjunction j_i, determine the ratio of the number of first target sentences in the corpus text that contain the preset conjunction j_i to the minimum occurrence count as the correlation Con(conj_i) of the benchmark trigger word and the candidate corpus word in the corpus text with respect to the conjunction j_i;
Calculate the initial association Rd_i(seed, c) of the benchmark trigger word seed and the candidate corpus word c in the corpus text d_i using the following formula:
$$R_{d_i}(seed, c) = -\sum_{i=1}^{k} Con(conj_i)\,\log\big(Con(conj_i)\big)$$
Here, i is a natural number from 1 to k, and k denotes the total number of preset conjunctions that occur in the first target sentences of the corpus text d_i.
Optionally, when calculating the initial association of the candidate corpus word and each benchmark trigger word in the trigger word set in each corpus text of the corpus, the first association calculating unit is specifically configured to:
Determine multiple preset relation types;
In any corpus text d_i, for any relation type j_i, determine the ratio of a third count, namely the number of second target sentences in which the benchmark trigger word and the candidate corpus word occur together, to the minimum occurrence count as the correlation Rel(relj_i) of the benchmark trigger word and the candidate corpus word in the corpus text d_i with respect to the relation type j_i, wherein a second target sentence is a sentence that contains a specified conjunction corresponding to the relation type j_i and in which the benchmark trigger word and the candidate corpus word are connected by the specified conjunction, and the minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the corpus text d_i and the number of times the candidate corpus word occurs in the corpus text d_i;
Calculate the initial association Rd_i(seed, c) of the benchmark trigger word seed and the candidate corpus word c in the corpus text d_i using the following formula:
$$R_{d_i}(seed, c) = -\sum_{i=1}^{k} Rel(relj_i)\,\log\big(Rel(relj_i)\big)$$
Here, i is a natural number from 1 to k, and k denotes the maximum number of preset relation types present in the corpus text d_i.
Optionally, the manner in which the feature determining unit determines the features of each target trigger word can include any one or more of the following:
Obtaining the attribute features of the target trigger word;
Obtaining the associated words of the target trigger word, the associated words including the synonyms, antonyms and related words of the target trigger word;
Searching the corpus texts contained in the corpus for target corpus texts that contain the target trigger word, locating in each target corpus text the feature words that satisfy a preset positional relation with the target trigger word, and using the resulting feature words as the contextual features of the target trigger word;
Identifying, based on the FrameNet tool, the target trigger word and the frame type of the target trigger word in the sentences of the corpus texts of the corpus.
Optionally, the device further includes:
A label word determining unit, configured to, after the type determining unit obtains the multiple cluster sets belonging to different event categories, determine for any cluster set, according to the term frequency and inverse document frequency (TF-IDF) algorithm, at least one target trigger word in the cluster set that is suitable as the label of the cluster set;
An event labeling unit, configured to label the cluster set with the at least one target trigger word as the label of the cluster set.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar among the embodiments, reference can be made between them. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An event type extraction method, characterized by comprising:
Extracting multiple candidate corpus words from a preset corpus;
Determining, based on the corpus, the association between the benchmark trigger words in a preset trigger word set and the candidate corpus words, wherein the benchmark trigger words are determined by automatic content extraction technology;
For any benchmark trigger word, determining the candidate corpus words whose association with the benchmark trigger word satisfies a preset requirement as target trigger words, obtaining at least one target trigger word corresponding to each benchmark trigger word;
Determining the features of each target trigger word;
Clustering all the target trigger words based on the features of the target trigger words to obtain multiple cluster sets belonging to different event categories, wherein each cluster set corresponds to one event category and includes at least one target trigger word.
2. The method according to claim 1, characterized in that extracting the candidate corpus words from the preset corpus comprises:
Determining the pending corpus words contained in the multiple corpus texts of the preset corpus;
Filtering out the preset useless words contained in the pending corpus words to obtain the candidate corpus words, wherein the preset useless words include stop words and function words.
3. The method according to claim 1, characterized in that determining, based on the corpus, the association between the benchmark trigger words in the preset trigger word set and the candidate corpus words comprises:
For each candidate corpus word, calculating in turn the initial association of the candidate corpus word and each benchmark trigger word in the trigger word set in each corpus text of the corpus;
For any pair of a benchmark trigger word and a candidate corpus word, summing the initial associations of the benchmark trigger word and the candidate corpus word in every corpus text to obtain the association of the benchmark trigger word and the candidate corpus word in the corpus.
4. The method according to claim 3, characterized in that calculating the initial association of the candidate corpus word and each benchmark trigger word in the trigger word set in each corpus text of the corpus comprises:
For one corpus text, determining the ratio of a first count, namely the number of times the benchmark trigger word and the candidate corpus word occur in the same sentence of the corpus text, to a minimum occurrence count as the initial association of the benchmark trigger word and the candidate corpus word in the corpus text, wherein the minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the corpus text and the number of times the candidate corpus word occurs in the corpus text.
5. The method according to claim 3, characterized in that calculating the initial association of the candidate corpus word and each benchmark trigger word in the trigger word set in each corpus text of the corpus comprises:
Determining multiple preset conjunctions;
For one corpus text, determining from the corpus text the first target sentences, which contain both the benchmark trigger word and the candidate corpus word and in which the benchmark trigger word and the candidate corpus word are connected by a preset conjunction;
For each preset conjunction j_i, determining the ratio of the number of first target sentences in the corpus text that contain the preset conjunction j_i to the minimum occurrence count as the correlation Con(conj_i) of the benchmark trigger word and the candidate corpus word in the corpus text with respect to the conjunction j_i;
Calculating the initial association Rd_i(seed, c) of the benchmark trigger word seed and the candidate corpus word c in the corpus text d_i using the following formula:
$$R_{d_i}(seed, c) = -\sum_{i=1}^{k} Con(conj_i)\,\log\big(Con(conj_i)\big);$$
Wherein i is a natural number from 1 to k, and k denotes the total number of preset conjunctions that occur in the first target sentences of the corpus text d_i.
6. The method according to claim 3, characterized in that calculating the initial association of the candidate corpus word and each benchmark trigger word in the trigger word set in each corpus text of the corpus comprises:
Determining multiple preset relation types;
In any corpus text d_i, for any relation type j_i, determining the ratio of a third count, namely the number of second target sentences in which the benchmark trigger word and the candidate corpus word occur together, to the minimum occurrence count as the correlation Rel(relj_i) of the benchmark trigger word and the candidate corpus word in the corpus text d_i with respect to the relation type j_i, wherein a second target sentence is a sentence that contains a specified conjunction corresponding to the relation type j_i and in which the benchmark trigger word and the candidate corpus word are connected by the specified conjunction, and the minimum occurrence count is the smaller of the number of times the benchmark trigger word occurs in the corpus text d_i and the number of times the candidate corpus word occurs in the corpus text d_i;
Calculating the initial association Rd_i(seed, c) of the benchmark trigger word seed and the candidate corpus word c in the corpus text d_i using the following formula:
$$R_{d_i}(seed, c) = -\sum_{i=1}^{k} Rel(relj_i)\,\log\big(Rel(relj_i)\big);$$
Wherein i is a natural number from 1 to k, and k denotes the maximum number of preset relation types present in the corpus text d_i.
7. The method according to any one of claims 1 to 3, characterized in that determining the features of each target trigger word comprises any one or more of the following:
Obtaining the attribute features of the target trigger word;
Obtaining the associated words of the target trigger word, the associated words including the synonyms, antonyms and related words of the target trigger word;
Searching the corpus texts contained in the corpus for target corpus texts that contain the target trigger word, locating in each target corpus text the feature words that satisfy a preset positional relation with the target trigger word, and using the resulting feature words as the contextual features of the target trigger word;
Identifying, based on the FrameNet tool, the target trigger word and the frame type of the target trigger word in the sentences of the corpus texts of the corpus.
8. The method according to any one of claims 1 to 3, characterized in that, after the multiple cluster sets belonging to different event categories are obtained, the method further comprises:
For any cluster set, determining, according to the term frequency and inverse document frequency (TF-IDF) algorithm, at least one target trigger word in the cluster set that is suitable as the label of the cluster set;
Labeling the cluster set with the at least one target trigger word as the label of the cluster set.
9. An event type extraction device, characterized by comprising:
A word screening unit, configured to extract multiple candidate corpus words from a preset corpus;
An association determining unit, configured to determine, based on the corpus, the association between the benchmark trigger words in a preset trigger word set and the candidate corpus words, wherein the benchmark trigger words are determined by automatic content extraction (ACE) technology;
A word expansion unit, configured to, for any benchmark trigger word, determine the candidate corpus words whose association with that benchmark trigger word satisfies a preset requirement as target trigger words, obtaining at least one target trigger word corresponding to each benchmark trigger word;
A feature determining unit, configured to determine the features of each target trigger word respectively;
A type determining unit, configured to cluster all the trigger words based on the features of the target trigger words, obtaining multiple cluster sets belonging to different event categories, wherein each cluster set corresponds to one event category and includes at least one target trigger word.
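The clustering performed by the type determining unit can be sketched as below. The claim does not name a clustering algorithm, so this example assumes a greedy single-pass scheme driven by Jaccard similarity between feature sets. The function names, the `threshold` value, and the sample features are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_triggers(features, threshold=0.3):
    """Greedy single-pass clustering (an assumed algorithm): each word joins
    the most similar existing cluster, or starts a new one if no cluster
    reaches `threshold`."""
    clusters = []
    for word in sorted(features):  # sorted for a deterministic result
        best, best_sim = None, threshold
        for cluster in clusters:
            # Similarity to a cluster = similarity to its closest member.
            sim = max(jaccard(features[word], features[other]) for other in cluster)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([word])
        else:
            best.append(word)
    return clusters


trigger_features = {
    "attack": {"troops", "village", "rebels"},
    "assault": {"troops", "village", "town"},
    "marry": {"couple", "wedding"},
    "wed": {"couple", "wedding", "church"},
}
clusters = cluster_triggers(trigger_features)
print(clusters)
```

Each resulting list plays the role of one cluster set, that is, one event category containing at least one target trigger word.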
10. The device according to claim 9, characterized in that the association determining unit comprises:
A first association computing unit, configured to, for each candidate corpus word, successively compute the initial association between the candidate corpus word and each benchmark trigger word in the trigger word set in every corpus text of the corpus;
A second association computing unit, configured to, for any pair of a benchmark trigger word and a candidate corpus word, sum the initial associations of the benchmark trigger word and the candidate corpus word over all corpus texts, obtaining the association between the benchmark trigger word and the candidate corpus word in the corpus.
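The two-stage association computation can be sketched as follows. The per-text "initial association" measure is not defined in this claim, so the sketch assumes a simple co-occurrence indicator (1 if both words appear in the same text, otherwise 0). The function names are hypothetical.

```python
from collections import defaultdict


def initial_association(tokens, trigger, candidate):
    """Assumed per-text measure: 1.0 if both words occur in the text."""
    return 1.0 if trigger in tokens and candidate in tokens else 0.0


def corpus_association(corpus_texts, triggers, candidates):
    """Sum the per-text initial associations of every (benchmark trigger,
    candidate) pair over all texts in the corpus."""
    totals = defaultdict(float)
    for text in corpus_texts:
        tokens = set(text.lower().split())  # assumed tokenizer: whitespace split
        for trigger in triggers:
            for candidate in candidates:
                totals[(trigger, candidate)] += initial_association(
                    tokens, trigger, candidate
                )
    return dict(totals)


texts = [
    "troops attack and assault the town",
    "rebels attack then assault again",
    "workers strike over pay",
]
assoc = corpus_association(texts, ["attack"], ["assault", "strike"])
print(assoc)
```

A candidate whose summed association with a benchmark trigger word satisfies the preset requirement (for example, exceeds a chosen threshold) would then be kept as a target trigger word.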
CN201710169761.3A 2017-03-21 2017-03-21 Event type extraction method and device Active CN106951530B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710169761.3A CN106951530B (en) 2017-03-21 2017-03-21 Event type extraction method and device

Publications (2)

Publication Number Publication Date
CN106951530A true CN106951530A (en) 2017-07-14
CN106951530B CN106951530B (en) 2020-01-17

Family

ID=59472782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710169761.3A Active CN106951530B (en) 2017-03-21 2017-03-21 Event type extraction method and device

Country Status (1)

Country Link
CN (1) CN106951530B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319692A (en) * 2018-02-01 2018-07-24 北京云知声信息技术有限公司 Abnormal punctuate cleaning method, storage medium and server
CN110032641A (en) * 2019-02-14 2019-07-19 阿里巴巴集团控股有限公司 Method and device that computer executes, that event extraction is carried out using neural network
CN110209807A (en) * 2018-07-03 2019-09-06 腾讯科技(深圳)有限公司 A kind of method of event recognition, the method for model training, equipment and storage medium
CN111310461A (en) * 2020-01-15 2020-06-19 腾讯云计算(北京)有限责任公司 Event element extraction method, device, equipment and storage medium
CN111382575A (en) * 2020-03-19 2020-07-07 电子科技大学 Event extraction method based on joint labeling and entity semantic information
CN111522915A (en) * 2020-04-20 2020-08-11 北大方正集团有限公司 Extraction method, device and equipment of Chinese event and storage medium
CN111985152A (en) * 2020-07-28 2020-11-24 浙江大学 Event classification method based on bipartite hypersphere prototype network
CN112487171A (en) * 2020-12-15 2021-03-12 中国人民解放军国防科技大学 Event extraction system and method under open domain
CN116611514A (en) * 2023-07-19 2023-08-18 中国科学技术大学 Value orientation evaluation system construction method based on data driving

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010029560A1 (en) * 2000-04-05 2001-10-11 Hugo Delchini Computer farm with a system for the hot insertion/extraction of processor cards
CN104462229A (en) * 2014-11-13 2015-03-25 苏州大学 Event classification method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
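Zhao Yanyan: "Automatic Recognition of Event Types in Chinese Event Extraction", Proceedings of the 3rd Student Workshop on Computational Linguistics *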

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319692A (en) * 2018-02-01 2018-07-24 北京云知声信息技术有限公司 Abnormal punctuate cleaning method, storage medium and server
CN108319692B (en) * 2018-02-01 2021-03-19 云知声智能科技股份有限公司 Abnormal punctuation cleaning method, storage medium and server
CN110209807A (en) * 2018-07-03 2019-09-06 腾讯科技(深圳)有限公司 A kind of method of event recognition, the method for model training, equipment and storage medium
US11972213B2 (en) 2018-07-03 2024-04-30 Tencent Technology (Shenzhen) Company Limited Event recognition method and apparatus, model training method and apparatus, and storage medium
CN110032641A (en) * 2019-02-14 2019-07-19 阿里巴巴集团控股有限公司 Method and device that computer executes, that event extraction is carried out using neural network
CN110032641B (en) * 2019-02-14 2024-02-13 创新先进技术有限公司 Method and device for extracting event by using neural network and executed by computer
CN111310461B (en) * 2020-01-15 2023-03-21 腾讯云计算(北京)有限责任公司 Event element extraction method, device, equipment and storage medium
CN111310461A (en) * 2020-01-15 2020-06-19 腾讯云计算(北京)有限责任公司 Event element extraction method, device, equipment and storage medium
CN111382575A (en) * 2020-03-19 2020-07-07 电子科技大学 Event extraction method based on joint labeling and entity semantic information
CN111522915A (en) * 2020-04-20 2020-08-11 北大方正集团有限公司 Extraction method, device and equipment of Chinese event and storage medium
CN111985152A (en) * 2020-07-28 2020-11-24 浙江大学 Event classification method based on bipartite hypersphere prototype network
CN111985152B (en) * 2020-07-28 2022-09-13 浙江大学 Event classification method based on dichotomy hypersphere prototype network
CN112487171A (en) * 2020-12-15 2021-03-12 中国人民解放军国防科技大学 Event extraction system and method under open domain
CN116611514A (en) * 2023-07-19 2023-08-18 中国科学技术大学 Value orientation evaluation system construction method based on data driving
CN116611514B (en) * 2023-07-19 2023-10-10 中国科学技术大学 Value orientation evaluation system construction method based on data driving

Also Published As

Publication number Publication date
CN106951530B (en) 2020-01-17

Similar Documents

Publication Publication Date Title
CN106951530A (en) A kind of event type abstracting method and device
Bharti et al. Sarcastic sentiment detection in tweets streamed in real time: a big data approach
US20180341871A1 (en) Utilizing deep learning with an information retrieval mechanism to provide question answering in restricted domains
Gholamrezazadeh et al. A comprehensive survey on text summarization systems
Liu et al. Using wordnet to disambiguate word senses for text classification
Rahman et al. Improvement of query-based text summarization using word sense disambiguation
Rafeeque et al. A survey on short text analysis in web
Awajan Keyword extraction from Arabic documents using term equivalence classes
Bagalkotkar et al. A novel technique for efficient text document summarization as a service
Sharoff Classifying Web corpora into domain and genre using automatic feature identification
WO2015004006A1 (en) Method and computer server system for receiving and presenting information to a user in a computer network
Bohne et al. Efficient keyword extraction for meaningful document perception
Bella et al. Domain-based sense disambiguation in multilingual structured data
Moradi Frequent itemsets as meaningful events in graphs for summarizing biomedical texts
Tembhurnikar et al. Topic detection using BNgram method and sentiment analysis on twitter dataset
Jafari et al. Unsupervised keyword extraction for hashtag recommendation in social media
Chin et al. Automatic discovery of concepts from text
Amin et al. Algorithm for bengali keyword extraction
Ullah et al. Pattern and semantic analysis to improve unsupervised techniques for opinion target identification
Perrie et al. Using *** n-grams to expand word-emotion association lexicon
Ma et al. Combining n-gram and dependency word pair for multi-document summarization
Han et al. Mining Technical Topic Networks from Chinese Patents.
Heu et al. Multi-document summarization exploiting semantic analysis based on tag cluster
Gupta et al. Document summarisation based on sentence ranking using vector space model
Gayen et al. Automatic identification of Bengali noun-noun compounds using random forest

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Hong Yu, Yang Xuerong, Yao Jianmin, Zhu Qiaoming
Inventor before: Yang Xuerong, Hong Yu, Yao Jianmin, Zhu Qiaoming
GR01 Patent grant