CN110298039A - Method, system, device, and computer-readable storage medium for identifying event places - Google Patents
- Publication number: CN110298039A (application CN201910539293.3A)
- Authority: CN (China)
- Prior art keywords: word, place, candidate locations, title, administrative division
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/258—Heading extraction; Automatic titling; Numbering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present invention provide a method, system, device, and computer-readable storage medium for identifying event places. The method comprises: extracting candidate place words from event information, the event information including a title and body text; and inputting each candidate place word, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears. Embodiments of the present invention can accurately identify the place where an event occurred.
Description
Technical field
Embodiments of the present invention relate to the field of communication technology, and in particular to a method, system, device, and computer-readable storage medium for identifying event places.
Background
An event graph is a network whose nodes are events and whose edges are the relationships between events. Each event node is composed of the attribute features of the event, and place is one of an event's important attributes. Identifying where an event occurred is therefore essential to building an event graph.
At present, some existing techniques can identify the occurrence place of certain events. When an event's occurrence place and event-related places appear together, however, these techniques cannot tell the two apart, so the accuracy of identifying the event occurrence place is low.
Summary of the invention
Embodiments of the present invention provide a method, system, device, and computer-readable storage medium for identifying event places, so as to improve the accuracy with which the event occurrence place is identified.
In a first aspect, an embodiment of the present invention provides a method for identifying event places, comprising: extracting candidate place words from event information, the event information including a title and body text; and inputting each candidate place word, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the pre-trained recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears.
In a second aspect, an embodiment of the present invention provides a system for identifying event places, comprising: an extraction module, configured to extract candidate place words from event information, the event information including a title and body text; and an input and recognition module, configured to input each candidate place word, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the pre-trained recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears.
In a third aspect, an embodiment of the present invention provides a device for identifying event places, comprising:
Memory;
Processor; and
Computer program;
wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the method of the first aspect.
The method, system, device, and computer-readable storage medium for identifying event places provided by embodiments of the present invention extract candidate place words from event information, the event information including a title and body text, and input each candidate place word, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the pre-trained recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears. Because the recognition model takes the title into account when identifying the event occurrence place, it can distinguish the event occurrence place from event-related places and thus identify the event occurrence place accurately.
Brief description of the drawings
Fig. 1 is a flowchart of the method for identifying event places provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the method for identifying event places provided by another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the system for identifying event places provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the device for identifying event places provided by an embodiment of the present invention.
The above drawings show specific embodiments of the present disclosure, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the disclosed concept in any way, but to explain the concept of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed description of the embodiments
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the disclosure as detailed in the appended claims.
The method for identifying event places provided by embodiments of the present invention can be applied to devices such as terminal devices, smart watches, and tablet computers.
The method for identifying event places provided by embodiments of the present invention is intended to solve the above technical problem of the prior art.
The technical solution of the present invention, and how it solves the above technical problem, are described in detail below with specific embodiments. The following specific embodiments may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention are described below with reference to the drawings.
Fig. 1 is a flowchart of the method for identifying event places provided by an embodiment of the present invention. To address the above technical problem of the prior art, this embodiment provides a method for identifying event places, whose specific steps are as follows:
Step 101: extract candidate place words from event information, the event information including a title and body text.
In this embodiment, the event information may be a piece of information such as a news item, which includes a title and body text. Extracting candidate place words from the event information means extracting all place words from the title and body text of the event information.
For example, the title of a certain news item is: "Magnitude 4.1 earthquake strikes Zhaotong, Yunnan; clearly felt in Leshan, Ya'an and other parts of Sichuan". The body text is: "At 15:26 on June 5, a magnitude 4.1 earthquake occurred in Yongshan County, Zhaotong City, Yunnan (28.11 degrees north latitude, 103.63 degrees east longitude), with a focal depth of 8 km. After the earthquake, netizens in Sichuan reported that it was clearly felt in Leshan, Yibin, Ya'an and other places." The candidate place words extracted from the title and body text then include: "Yunnan", "Zhaotong City", "Yongshan County", "Sichuan", "Leshan", and "Ya'an".
Step 102: input each candidate place word, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the pre-trained recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, where the place sentence is the sentence in which the place word appears.
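By way of a non-limiting illustration, locating the place sentence for a candidate place word might be sketched as follows. The helper names and the sentence-splitting rule are assumptions made for this sketch; the embodiment does not specify how sentences are delimited.

```python
import re

# Assumed splitting rule: break after CJK or Latin sentence-ending punctuation.
# A Latin full stop only ends a sentence when followed by whitespace, so
# decimals such as 4.1 stay intact.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])\s*|(?<=\.)\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences (hypothetical helper)."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


def place_sentences(text: str, place_word: str) -> list[str]:
    """Return every sentence of `text` that contains `place_word`."""
    return [s for s in split_sentences(text) if place_word in s]
```

Each (place word, place sentence) pair found this way, together with the title, would form one input to the recognition model.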
Optionally, before the candidate place word is input with the corresponding title and place sentence into the pre-trained recognition model so that the model identifies whether the candidate place word is the event occurrence place in the place sentence, the method of the embodiment further comprises: constructing a recognition model to be trained; obtaining training samples, each training sample including a title and body text; extracting the candidate place words from the title and body text of the training sample and annotating them to obtain annotation results, the annotation results indicating whether the candidate place word is a place word, whether it is the event occurrence place, and whether it is an event-related place; inputting the annotation results of the candidate place word, the place sentence corresponding to the candidate place word, and the title corresponding to the candidate place word into the recognition model to be trained; and training the recognition model to be trained until a preset training target is reached.
Specifically, the pre-trained recognition model may be obtained by training a deep-learning-based binary classification model, for example a fine-tuned model based on BERT. Training the deep-learning-based binary classification model mainly comprises two processes: obtaining training samples and annotating the samples. To obtain training samples, news from every field within a period of time, for example the past year, may be recalled from an event graph resource library. A preset number of events, for example 2000 events, is then randomly selected from the obtained training samples, and these 2000 events are arranged into the format of event ID, news link, title, and body text. All place words contained in the title and body text are then extracted for subsequent manual annotation.
Optionally, manually annotating a training sample comprises: determining whether the title and body text describe a target event; if so, determining whether the title and body text contain place words; and, if they do, labeling the place words by category. Optionally, annotators may also check whether any place words were missed during extraction and, if so, extract the missed place words manually.
Further, the annotated data is input in the format "title + place sentence, place word" into the deep-learning-based binary classification model for training, so that the binary classification model learns whether the annotated place word is the event occurrence place in the place sentence. During learning, the model outputs a score representing the probability that the place word is the event occurrence place in the place sentence; the larger the score, the greater that probability. Training ends when the training result reaches the training target. Optionally, the training target may be that the probability value output by the model reaches a probability threshold.
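By way of a non-limiting illustration, the "title + place sentence, place word" input format might be assembled as a labeled text pair as sketched below. The class and function names, the single-space separator, and the 0.5 decision threshold are assumptions of this sketch; the embodiment does not specify how the pair is encoded for the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingExample:
    text_a: str  # title followed by the place sentence
    text_b: str  # candidate place word
    label: int   # 1 = event occurrence place, 0 = otherwise


def build_example(title: str, place_sentence: str, place_word: str,
                  is_occurrence_place: bool) -> TrainingExample:
    """Assemble one example in the 'title + place sentence, place word' format."""
    # Assumption: title and place sentence are joined with a single space.
    return TrainingExample(f"{title} {place_sentence}", place_word,
                           int(is_occurrence_place))


def is_event_place(score: float, threshold: float = 0.5) -> bool:
    """Assumed decision rule: accept when the model's score exceeds the threshold."""
    return score > threshold
```

A pair-input classifier would then receive text_a and text_b as its two segments and be trained against label.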
Here, the title can be used to distinguish the event occurrence place from event-related places, and the place sentence is the sentence in which the extracted place word appears.
Once the recognition model has been obtained through the training steps above, it can be used to identify the event occurrence place. At use time, the place word, place sentence, and title are input in the same way, and the model identifies whether the place word is the event occurrence place in the place sentence.
This embodiment of the present invention extracts candidate place words from event information, the event information including a title and body text, and inputs each candidate place word, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears. Because the recognition model takes the title into account when identifying the event occurrence place, it can distinguish the event occurrence place from event-related places and thus identify the event occurrence place accurately.
Optionally, extracting the candidate place words from the event information, or from the title and body text of the training sample, uses at least one of the following processing methods:
First: extract the geographical terms in the title and body text as candidate place words. Specifically, determine whether the title and body text contain geographical terms and, if so, extract them as place words.
Second: segment the title and body text into words and perform part-of-speech analysis on the segmentation result to obtain candidate place words. Optionally, an existing word segmentation tool may be used to segment the title and body text into a number of words, which are then tagged with parts of speech. Specifically, part-of-speech tagging includes labeling the words as common nouns, modifiers, noun phrases, verb phrases, and so on. Finally, function words, numeral-classifier compounds, onomatopoeia, stop words, and the like are removed from these words.
Third: extract administrative-division place words from the title and body text according to an administrative division dictionary file, as candidate place words. Specifically, parse the administrative division dictionary file to obtain place words such as countries, provinces, states, cities, counties, and districts, then determine whether the title and body text contain any of the parsed place words and, if so, extract them as place words.
Fourth: perform regular-expression matching on the title and the body text using regular-expression templates to obtain candidate place words. Specifically, the sentences in the title and body text are matched against the templates in order to mine potential candidate place words. Examples of such templates are "held at (.*)" and "attended the [meeting | event | forum] located at (.*)".
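By way of a non-limiting illustration, the template-matching approach might be sketched as below. The two English patterns are loose analogues of the example templates and are assumptions of this sketch, as is the convention that a place name starts with a capital letter and ends at the next comma, semicolon, full stop, or end of text.

```python
import re

# Hypothetical English analogues of the example templates.
TEMPLATES = [
    re.compile(r"held (?:in|at) ([A-Z][\w' -]*?)(?=[,.;]|$)"),
    re.compile(
        r"attended the (?:meeting|event|forum) (?:located )?(?:in|at) "
        r"([A-Z][\w' -]*?)(?=[,.;]|$)"
    ),
]


def match_templates(text: str) -> list[str]:
    """Return the place captured by each template match, in template order."""
    places = []
    for template in TEMPLATES:
        for match in template.finditer(text):
            places.append(match.group(1).strip())
    return places
```

Templates of this kind can surface place words that a dictionary lookup would miss, at the cost of needing a pattern per phrasing.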
It should be noted that the title and body text in the four processing methods above refer to the title and body text of the event information or of the training sample.
Optionally, embodiments of the present invention may extract place words using one of the four methods above, or using two or three of them; the four methods may also be applied to the title and body text in sequence to extract candidate place words. The preferred approach is to apply all four methods to the title and body text in sequence, since this ensures that potential candidate place words in the title and body text are mined to the greatest possible extent.
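By way of a non-limiting illustration, applying several extraction methods in sequence and collecting their results without duplicates might be sketched as follows. The function names and the in-memory list standing in for the administrative division dictionary file are assumptions of this sketch.

```python
from typing import Callable, Iterable

# An extractor maps a piece of text to the place words it finds there.
Extractor = Callable[[str], Iterable[str]]


def extract_by_dictionary(text: str, names: Iterable[str]) -> list[str]:
    """Return the dictionary names that occur in `text`, in order of first appearance."""
    hits = [(text.find(name), name) for name in names if name in text]
    return [name for _, name in sorted(hits)]


def extract_candidates(title: str, body: str,
                       extractors: list[Extractor]) -> list[str]:
    """Run each extractor over the title and then the body, keeping first occurrences."""
    candidates: list[str] = []
    for extractor in extractors:
        for field in (title, body):
            for word in extractor(field):
                if word not in candidates:
                    candidates.append(word)
    return candidates
```

Because every method is wrapped as an extractor with the same shape, one, several, or all four methods can be passed in the list, which mirrors the optional combinations described above.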
Optionally, after the candidate place word is input with the corresponding title and place sentence into the pre-trained recognition model so that the recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, the method of the embodiment further comprises: processing the candidate place word corresponding to the identified event occurrence place into an address in a preset format. For example, the candidate place word corresponding to the event occurrence place identified in the above embodiment is mapped to administrative units such as province, city, and county/district, and processed into a place in the five-level format "country-province/state-city-county/district-address". If the administrative division one level above the city is a province or state, the final result is an address of the form country-province/state-city. If the level above the city is the country, the final result is an address of the form country-city.
Fig. 2 is a flowchart of the method for identifying event places provided by another embodiment of the present invention. On the basis of the above embodiment, processing the candidate place word corresponding to the identified event occurrence place into an address in a preset format comprises:
Step 201: segment the candidate place word into words;
Step 202: perform part-of-speech analysis on the segmentation result to obtain fine-grained place words;
The candidate place words extracted in the preceding embodiment may be coarse, for example in a form such as "Changsha in Hunan Province". An existing word segmentation tool may therefore be used to segment the extracted candidate place word further and obtain finer-grained place words; after further segmentation, the finer-grained place words "Hunan Province" and "Changsha" are obtained.
Optionally, performing part-of-speech analysis on the segmentation result comprises: labeling the words as common nouns, modifiers, noun phrases, verb phrases, and so on, and finally removing function words, numeral-classifier compounds, onomatopoeia, stop words, and the like from these words.
Step 203: where a fine-grained place word is an administrative-division place word, use the administrative division dictionary to process the fine-grained place word into an address in the preset format. Optionally, administrative-division place words include place words such as xx Province, xx City, xx County, xx District, and xx Town. For example, if a fine-grained place word is Changsha, the administrative division dictionary shows that the administrative-division place word one level above Changsha is Hunan Province, which yields an address in the preset format such as Hunan-Changsha.
Optionally, where a fine-grained place word is an administrative-division place word, using the administrative division dictionary to process the fine-grained place word into an address in the preset format comprises: where the fine-grained place word is an administrative-division place word, obtaining from the administrative division dictionary the place word of the administrative division one level above it, and repeating until the highest-level administrative-division place word is obtained; and processing the administrative-division place word into an address that contains each administrative division level in turn, from that word upward to the highest-level administrative-division place word.
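By way of a non-limiting illustration, the level-by-level lookup might be sketched as follows. The parent mapping is a toy stand-in for the administrative division dictionary, and the function name and hyphen-joined output are assumptions of this sketch.

```python
def to_address(place: str, parent_of: dict[str, str]) -> str:
    """Climb from `place` to the highest-level division and join the levels top-down."""
    chain = [place]
    # Stop at the top level, or if the dictionary would loop back on itself.
    while chain[-1] in parent_of and parent_of[chain[-1]] not in chain:
        chain.append(parent_of[chain[-1]])
    return "-".join(reversed(chain))


# Toy dictionary: each division maps to the division one level above it.
PARENT_OF = {"Changsha City": "Hunan Province", "Hunan Province": "China"}

print(to_address("Changsha City", PARENT_OF))  # China-Hunan Province-Changsha City
```

A place word absent from the dictionary is returned unchanged, which leaves room for the separate handling of non-administrative place words.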
Optionally, as shown in Fig. 2, after part-of-speech analysis is performed on the segmentation result to obtain fine-grained place words, the method of the embodiment further comprises:
Step 204: where a fine-grained place word is an organization-type place word, use a preset mapping between entities and places to process the fine-grained place word into an address in the preset format.
Optionally, where a fine-grained place word is an organization-type place word, using the preset mapping between entities and places to process the fine-grained place word into an address in the preset format comprises: where the fine-grained place word is an organization-type place word, successively obtaining, according to the preset mapping between entities and places, the place word one level above the organization-type place word, until the highest-level place word is reached; and processing the organization-type place word into an address that runs level by level from the organization-type place word upward to the highest-level place word.
Optionally, organization-type place words include entities such as xx Street, xx Community, xx Residential Area, xx School, and xx Building. For example, if a fine-grained place word is xx University, it can be processed, according to the preset mapping between entities and places, into the corresponding address in the preset format country-province/state-city-county/district-xx University. An entity may be a house, a shop, a mailbox, or a bus stop. Since entities with the same name may exist, one entity may ultimately yield multiple addresses, in which case the final result is an address list.
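By way of a non-limiting illustration, resolving an entity that shares its name across several places into an address list might be sketched as follows. The entity name, both mappings, and the function name are hypothetical toy data chosen for this sketch.

```python
def entity_addresses(entity: str,
                     entity_places: dict[str, list[str]],
                     parent_of: dict[str, str]) -> list[str]:
    """Return one address per place in which an entity of this name is known."""
    addresses = []
    for place in entity_places.get(entity, []):
        chain = [place]
        # Climb the administrative hierarchy, guarding against cycles.
        while chain[-1] in parent_of and parent_of[chain[-1]] not in chain:
            chain.append(parent_of[chain[-1]])
        addresses.append("-".join(list(reversed(chain)) + [entity]))
    return addresses


# Hypothetical toy mappings: two entities share the name "Central Library".
ENTITY_PLACES = {"Central Library": ["Changsha City", "Shijiazhuang City"]}
PARENT_OF = {
    "Changsha City": "Hunan Province",
    "Shijiazhuang City": "Hebei Province",
    "Hunan Province": "China",
    "Hebei Province": "China",
}
```

With these mappings, looking up "Central Library" yields two addresses, one under Changsha City and one under Shijiazhuang City.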
In addition, to make the recognition model more robust, the scores that the recognition model assigns to the extracted candidate place words are used to determine which candidate place words score above a score threshold, and all candidate place words whose scores exceed the score threshold are processed into addresses in the preset format.
For example, suppose the recognition model scores the identified place word "Xinhua District" at 0.9, "Shijiazhuang Xinhua District" at 0.8, and "Beijing" at 0.4, and the score threshold is set to 0.5. The place word "Beijing" is then filtered out, and "Xinhua District" and "Shijiazhuang Xinhua District" are processed into addresses in the preset format.
Further, taking Xinhua District as an example: segmentation and part-of-speech analysis first yield "Xinhua District", which is then looked up in the administrative division dictionary. Suppose two results are obtained, namely Xinhua District -> [Shijiazhuang, Cangzhou]. The processed addresses are then: "China-Hebei Province-Shijiazhuang City-Xinhua District-0.45, China-Hebei Province-Cangzhou City-Xinhua District-0.45". In this example the score of 0.9 appears to be divided evenly between the two results.
Similarly, Shijiazhuang Xinhua District is looked up in the administrative division dictionary, and the processed address is: "China-Hebei Province-Shijiazhuang City-Xinhua District-0.8".
The two results above are then merged to obtain "China-Hebei Province-Shijiazhuang City-Xinhua District-1.25, China-Hebei Province-Cangzhou City-Xinhua District-0.45", where 1.25 is the sum of 0.45 and 0.8. Merging proceeds in the hierarchical order of the administrative divisions, from the highest level to the lowest, for example in the order country, province, city, county.
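By way of a non-limiting illustration, the filtering, score splitting, and merging in the Xinhua District example might be sketched as follows. The even split of a score across ambiguous addresses is inferred from the worked numbers rather than stated as a rule, and the function name and lookup table are assumptions of this sketch.

```python
from collections import defaultdict


def rank_addresses(scores: dict[str, float],
                   lookup: dict[str, list[str]],
                   threshold: float) -> dict[str, float]:
    """Drop low-scoring place words, split each score across its addresses, and sum."""
    totals: dict[str, float] = defaultdict(float)
    for place, score in scores.items():
        if score > threshold:
            addresses = lookup.get(place, [])
            for address in addresses:
                # Assumption: an ambiguous place word shares its score evenly.
                totals[address] += score / len(addresses)
    return dict(totals)


SJZ = "China-Hebei Province-Shijiazhuang City-Xinhua District"
CZ = "China-Hebei Province-Cangzhou City-Xinhua District"
LOOKUP = {"Xinhua District": [SJZ, CZ], "Shijiazhuang Xinhua District": [SJZ]}
SCORES = {"Xinhua District": 0.9, "Shijiazhuang Xinhua District": 0.8, "Beijing": 0.4}

totals = rank_addresses(SCORES, LOOKUP, threshold=0.5)
```

Under these assumptions the Shijiazhuang address accumulates 0.45 + 0.8 and the Cangzhou address keeps 0.45, matching the merged result in the example, while "Beijing" never contributes because it falls below the threshold.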
Fig. 3 is a schematic structural diagram of the system for identifying event places provided by an embodiment of the present invention. The system can execute the processing flow provided by the embodiments of the method for identifying event places. As shown in Fig. 3, the system 30 for identifying event places comprises an extraction module 31 and an input and recognition module 32. The extraction module 31 is configured to extract candidate place words from event information, the event information including a title and body text. The input and recognition module 32 is configured to input each candidate place word, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears.
Optionally, the system 30 of the embodiment further comprises a construction module 33, an acquisition module 34, an input module 35, and a training module 36. The construction module 33 is configured to construct a recognition model to be trained. The acquisition module 34 is configured to obtain training samples, each training sample including a title and body text. The extraction module 31 is further configured to extract the candidate place words from the title and body text of the training sample and to annotate them to obtain annotation results, the annotation results indicating whether the candidate place word is a place word, whether it is the event occurrence place, and whether it is an event-related place. The input module 35 is configured to input the annotation results of the candidate place word, the place sentence corresponding to the candidate place word, and the title corresponding to the candidate place word into the recognition model to be trained. The training module 36 is configured to train the recognition model to be trained until a preset training target is reached.
Optionally, when the extraction module 31 extracts the candidate place words from the event information, or from the title and body text of the training sample, it performs at least one of the following: extracting the geographical terms in the title and the body text as the candidate place words; segmenting the title and the body text into words and performing part-of-speech analysis on the segmentation result to obtain the candidate place words; extracting administrative-division place words from the title and the body text according to an administrative division dictionary file, as candidate place words; and performing regular-expression matching on the title and the body text using regular-expression templates to obtain candidate place words.
Optionally, the system 30 of the embodiment further comprises a processing module 37, configured to process the candidate place word corresponding to the identified event occurrence place into an address in a preset format.
Optionally, when the processing module 37 processes the candidate place word corresponding to the identified event occurrence place into an address in a preset format, it is specifically configured to: segment the candidate place word into words; perform part-of-speech analysis on the segmentation result to obtain fine-grained place words; and, where a fine-grained place word is an administrative-division place word, use the administrative division dictionary to process the fine-grained place word into an address in the preset format.
Optionally, the processing module 37 is further configured, where a fine-grained place word is an organization-type place word, to use a preset mapping between entities and places to process the fine-grained place word into an address in the preset format.
Optionally, when the processing module 37, where a fine-grained place word is an administrative-division place word, uses the administrative division dictionary to process the fine-grained place word into an address in the preset format, it is specifically configured to: where the fine-grained place word is an administrative-division place word, obtain from the administrative division dictionary the place word of the administrative division one level above it, and repeat until the highest-level administrative-division place word is obtained; and process the administrative-division place word into an address that contains each administrative division level in turn, from that word upward to the highest-level administrative-division place word.
Optionally, when the processing module 37, where a fine-grained place word is an organization-type place word, uses the preset mapping between entities and places to process the fine-grained place word into an address in the preset format, it is specifically configured to: where the fine-grained place word is an organization-type place word, successively obtain, according to the preset mapping between entities and places, the place word one level above the organization-type place word, until the highest-level place word is reached; and process the organization-type place word into an address that runs level by level from the organization-type place word upward to the highest-level place word.
The system for identifying event places of the embodiment shown in Fig. 3 can be used to execute the technical solutions of the above method embodiments. Its implementation principle and technical effects are similar and are not repeated here.
The method, device, and computer-readable storage medium for identifying event places provided by embodiments of the present invention extract candidate place words from event information, the event information including a title and body text, and input each candidate place word, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears. Because the recognition model takes the title into account when identifying the event occurrence place, it can distinguish the event occurrence place from event-related places and thus identify the event occurrence place accurately.
Fig. 4 is the structural schematic diagram of the identification equipment of event provided in an embodiment of the present invention.The embodiment of the present invention mentions
The identification equipment of the event of confession can execute the process flow that the recognition methods embodiment of event provides, as shown in figure 4, thing
The identification equipment 40 on part ground includes: memory 41, processor 42, computer program and communication interface 43;Wherein, computer program
It is stored in memory 41, and is configured as the step of above method embodiment is executed by processor 42.
The identification equipment of the event of embodiment illustrated in fig. 4 can be used for executing the technical solution of above method embodiment,
The realization principle and technical effect are similar, and details are not described herein again.
In addition, an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the event place identification method described in the embodiments above.
The event place identification method, device and computer-readable storage medium provided by the embodiments of the present invention extract at least one place word from event information, the event information including a title and body text, and input the at least one place word, together with the corresponding title and place sentence, into a pre-trained identification model, so that the identification model identifies the event place among the at least one place word. The place sentence is the sentence in which the place word occurs. Because the identification model takes the title into account when identifying the event place, it can distinguish the event place from event-related places and can therefore identify the event place accurately.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods of the embodiments of the present invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional modules above is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or part of the functions described above. For the specific working process of the apparatus described above, reference may be made to the corresponding process in the method embodiments, which is not repeated here.
Finally, it should be noted that the embodiments above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (18)
1. An event place identification method, characterized by comprising:
extracting a candidate place word from event information, the event information including a title and body text;
inputting the candidate place word, together with the corresponding title and place sentence, into a pre-trained identification model, so that the pre-trained identification model identifies whether the candidate place word is the event place in the place sentence, the place sentence being the sentence in which the place word occurs.
2. The method according to claim 1, characterized in that, before the inputting of the candidate place word, together with the corresponding title and place sentence, into the pre-trained identification model so that the pre-trained identification model identifies whether the candidate place word is the event place in the place sentence, the method further comprises:
constructing an identification model to be trained;
obtaining a training sample, the training sample including a title and body text;
extracting candidate place words from the title and body text of the training sample, and annotating the candidate place words to obtain annotation results, the annotation results including whether the candidate place word is a place word, whether it is an event place, and whether it is an event-related place;
inputting the annotation results of the candidate place word, the place sentence corresponding to the candidate place word, and the title corresponding to the candidate place word into the identification model to be trained;
training the identification model to be trained until a preset training target is reached.
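The three annotation questions in claim 2 can be pictured as a small record attached to each training example. The sketch below is only one possible encoding: the class names, the `[SEP]` separator, and the consistency rules in `validate` (an event place or event-related place must be a place word, and a word is not both at once) are assumptions that the claims do not state.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Annotation:
    is_place_word: bool     # is the candidate a place word at all?
    is_event_place: bool    # is it the place where the event occurred?
    is_related_place: bool  # is it merely an event-related place?

    def validate(self) -> None:
        # Assumed consistency rules, not taken from the claims.
        if (self.is_event_place or self.is_related_place) and not self.is_place_word:
            raise ValueError("an event or event-related place must be a place word")
        if self.is_event_place and self.is_related_place:
            raise ValueError("a candidate cannot be both event place and related place")


@dataclass
class TrainingExample:
    candidate: str
    place_sentence: str
    title: str
    annotation: Annotation


def to_model_input(example: TrainingExample) -> Tuple[str, Tuple[int, int, int]]:
    # Hypothetical text encoding: title, place sentence and candidate joined by a separator.
    example.annotation.validate()
    text = f"{example.title} [SEP] {example.place_sentence} [SEP] {example.candidate}"
    a = example.annotation
    labels = (int(a.is_place_word), int(a.is_event_place), int(a.is_related_place))
    return text, labels
```

Keeping "event place" and "event-related place" as separate labels reflects the stated goal of telling the two apart; how the model consumes them is left open here.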
3. The method according to claim 2, characterized in that the extracting of the candidate place word from the event information, or the extracting of the candidate place words from the title and body text of the training sample, comprises at least one of the following:
extracting geographic terms from the title and the body text as the candidate place words;
performing word segmentation on the title and the body text, and performing part-of-speech analysis on the segmentation result to obtain the candidate place words;
extracting, according to an administrative division dictionary file, administrative-division-class place words from the title and the body text as the candidate place words;
performing regular-expression matching on the title and the body text using a regular-expression matching template to obtain the candidate place words.
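Two of the extraction options listed in claim 3, the administrative division dictionary lookup and the regular-expression template, could be sketched as follows. The lexicon entries, the street-name pattern and the function name `extract_candidates` are hypothetical, and English sample text stands in for the Chinese text the patent targets; segmentation with part-of-speech analysis and geographic-term extraction are not shown.

```python
import re
from typing import List

# Hypothetical sample of an administrative division dictionary.
ADMIN_LEXICON: List[str] = ["Springfield", "Shelbyville County"]

# Hypothetical template: capitalized words followed by a street-type suffix.
PLACE_TEMPLATE = re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Street|Road|Avenue|Square)\b")


def extract_candidates(title: str, body: str) -> List[str]:
    text = f"{title}\n{body}"
    hits: List[str] = []
    # Dictionary lookup: keep every lexicon entry that appears in the text.
    hits.extend(name for name in ADMIN_LEXICON if name in text)
    # Template matching: keep every span the regular expression matches.
    hits.extend(m.group(0) for m in PLACE_TEMPLATE.finditer(text))
    # Remove duplicates while preserving first-seen order.
    return list(dict.fromkeys(hits))
```

Because the claim requires only "at least one" of the listed processes, the union of several extractors, as here, is one reasonable reading rather than the only one.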
4. The method according to any one of claims 1-3, characterized in that, after the inputting of the candidate place word, together with the corresponding title and place sentence, into the pre-trained identification model so that the pre-trained identification model identifies whether the candidate place word is the event place in the place sentence, the method further comprises:
processing the candidate place word corresponding to the identified event place into an address in a preset format.
5. The method according to claim 4, characterized in that the processing of the candidate place word corresponding to the identified event place into an address in a preset format comprises:
performing word segmentation on the candidate place word;
performing part-of-speech analysis on the segmentation result to obtain a fine-grained place word;
in the case where the fine-grained place word is an administrative-division-class place word, processing the fine-grained place word into an address in the preset format using an administrative division dictionary.
6. The method according to claim 5, characterized in that, after the performing of part-of-speech analysis on the segmentation result to obtain the fine-grained place word, the method further comprises:
in the case where the fine-grained place word is an organization-class place word, processing the fine-grained place word into an address in the preset format using a preset mapping between entities and places.
7. The method according to claim 5 or 6, characterized in that the processing, in the case where the fine-grained place word is an administrative-division-class place word, of the fine-grained place word into an address in the preset format using the administrative division dictionary comprises:
in the case where the fine-grained place word is an administrative-division-class place word, obtaining, according to the administrative division dictionary, the place word of the next-higher administrative division corresponding to the administrative-division-class place word, until the place word of the highest administrative division is obtained;
processing the administrative-division-class place word into an address that includes the place words of each administrative division level, level by level upward, up to the place word of the highest administrative division.
8. The method according to claim 6, characterized in that the processing, in the case where the fine-grained place word is an organization-class place word, of the fine-grained place word into an address in the preset format using the preset mapping between entities and places comprises:
in the case where the fine-grained place word is an organization-class place word, successively obtaining, according to the preset mapping between entities and places, the next-higher place word corresponding to the organization-class place word, until the highest-level place word is reached;
processing the organization-class place word into an address that runs level by level from the organization-class place word up to the highest-level place word.
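The level-by-level expansion in claims 7 and 8 amounts to walking a parent table until no higher level exists. The following sketch assumes two hypothetical tables, `ADMIN_PARENT` for the administrative division dictionary and `ENTITY_PLACE` for the entity-to-place mapping, and assumes that an organization resolves to a division and then continues through the division table; the output order and separator in `format_address` are likewise illustrative, since the preset format is not defined in this section.

```python
from typing import Dict, List

# Hypothetical sample tables; a real system would load full dictionaries.
ADMIN_PARENT: Dict[str, str] = {"Haidian District": "Beijing", "Beijing": "China"}
ENTITY_PLACE: Dict[str, str] = {"Example University": "Haidian District"}


def expand_place(place: str) -> List[str]:
    """Return the chain from `place` up to the highest-level place word."""
    chain = [place]
    current = place
    while True:
        # An organization maps to a place; a division maps to its parent division.
        parent = ENTITY_PLACE.get(current) or ADMIN_PARENT.get(current)
        if parent is None or parent in chain:  # top level reached, or cycle guard
            break
        chain.append(parent)
        current = parent
    return chain


def format_address(chain: List[str], sep: str = " ") -> str:
    # Assumed preset format: highest level first, joined by a separator.
    return sep.join(reversed(chain))
```

For example, `format_address(expand_place("Haidian District"))` yields the division chain from the highest level down to the district under these sample tables.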
9. An event place identification system, characterized by comprising:
an extraction module, configured to extract a candidate place word from event information, the event information including a title and body text;
an input and identification module, configured to input the candidate place word, together with the corresponding title and place sentence, into a pre-trained identification model, so that the pre-trained identification model identifies whether the candidate place word is the event place in the place sentence, the place sentence being the sentence in which the place word occurs.
10. The system according to claim 9, characterized in that the system further comprises:
a construction module, configured to construct an identification model to be trained;
an obtaining module, configured to obtain a training sample, the training sample including a title and body text;
the extraction module being further configured to extract candidate place words from the title and body text of the training sample and to annotate the candidate place words to obtain annotation results, the annotation results including whether the candidate place word is a place word, whether it is an event place, and whether it is an event-related place;
an input module, configured to input the annotation results of the candidate place word, the place sentence corresponding to the candidate place word, and the title corresponding to the candidate place word into the identification model to be trained;
a training module, configured to train the identification model to be trained until a preset training target is reached.
11. The system according to claim 10, characterized in that, when extracting the candidate place word from the event information or extracting the candidate place words from the title and body text of the training sample, the extraction module performs at least one of the following:
extracting geographic terms from the title and the body text as the candidate place words;
performing word segmentation on the title and the body text, and performing part-of-speech analysis on the segmentation result to obtain the candidate place words;
extracting, according to an administrative division dictionary file, administrative-division-class place words from the title and the body text as the candidate place words;
performing regular-expression matching on the title and the body text using a regular-expression matching template to obtain the candidate place words.
12. The system according to any one of claims 9-11, characterized in that the system further comprises:
a processing module, configured to process the candidate place word corresponding to the identified event place into an address in a preset format.
13. The system according to claim 12, characterized in that, when processing the candidate place word corresponding to the identified event place into an address in the preset format, the processing module is specifically configured to:
perform word segmentation on the candidate place word;
perform part-of-speech analysis on the segmentation result to obtain a fine-grained place word;
in the case where the fine-grained place word is an administrative-division-class place word, process the fine-grained place word into an address in the preset format using an administrative division dictionary.
14. The system according to claim 13, characterized in that the processing module is further configured to, in the case where the fine-grained place word is an organization-class place word, process the fine-grained place word into an address in the preset format using a preset mapping between entities and places.
15. The system according to claim 13 or 14, characterized in that, when processing the fine-grained place word into an address in the preset format using the administrative division dictionary in the case where the fine-grained place word is an administrative-division-class place word, the processing module is specifically configured to:
in the case where the fine-grained place word is an administrative-division-class place word, obtain, according to the administrative division dictionary, the place word of the next-higher administrative division corresponding to the administrative-division-class place word, until the place word of the highest administrative division is obtained;
process the administrative-division-class place word into an address that includes the place words of each administrative division level, level by level upward, up to the place word of the highest administrative division.
16. The system according to claim 14, characterized in that, when processing the fine-grained place word into an address in the preset format using the preset mapping between entities and places in the case where the fine-grained place word is an organization-class place word, the processing module is specifically configured to:
in the case where the fine-grained place word is an organization-class place word, successively obtain, according to the preset mapping between entities and places, the next-higher place word corresponding to the organization-class place word, until the highest-level place word is reached;
process the organization-class place word into an address that runs level by level from the organization-class place word up to the highest-level place word.
17. An event place identification device, characterized by comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method according to any one of claims 1-8.
18. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910539293.3A CN110298039B (en) | 2019-06-20 | 2019-06-20 | Event place identification method, system, equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110298039A (en) | 2019-10-01 |
CN110298039B (en) | 2023-05-30 |
Family
ID=68028381
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910539293.3A Active CN110298039B (en) | 2019-06-20 | 2019-06-20 | Event place identification method, system, equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110298039B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5778402A (en) * | 1995-06-07 | 1998-07-07 | Microsoft Corporation | Method and system for auto-formatting a document using an event-based rule engine to format a document as the user types |
CN102298635A (en) * | 2011-09-13 | 2011-12-28 | 苏州大学 | Method and system for fusing event information |
CN103020286A (en) * | 2012-12-27 | 2013-04-03 | 上海交通大学 | Internet ranking list grasping system based on ranking website |
CN104572958A (en) * | 2014-12-29 | 2015-04-29 | 中国科学院计算机网络信息中心 | Event extraction based sensitive information monitoring method |
CN104731768A (en) * | 2015-03-05 | 2015-06-24 | 西安交通大学城市学院 | Incident location extraction method oriented to Chinese news texts |
US20160019465A1 (en) * | 2014-07-18 | 2016-01-21 | PlaceIQ, Inc. | Analyzing Mobile-Device Location Histories To Characterize Consumer Behavior |
CN105630884A (en) * | 2015-12-18 | 2016-06-01 | 中国科学院信息工程研究所 | Geographic position discovery method for microblog hot event |
CN106464706A (en) * | 2014-04-18 | 2017-02-22 | 意大利电信股份公司 | Method and system for identifying significant locations through data obtainable from telecommunication network |
CN108153860A (en) * | 2017-12-25 | 2018-06-12 | 中译语通科技(青岛)有限公司 | A kind of geolocation analysis method based on multilingual news |
CN108415902A (en) * | 2018-02-10 | 2018-08-17 | 合肥工业大学 | A kind of name entity link method based on search engine |
CN108563655A (en) * | 2017-12-28 | 2018-09-21 | 北京百度网讯科技有限公司 | Text based event recognition method and device |
CN109740150A (en) * | 2018-12-20 | 2019-05-10 | 出门问问信息科技有限公司 | Address resolution method, device, computer equipment and computer readable storage medium |
Non-Patent Citations (4)
Title |
---|
SHUN ABE et al.: "Predicting the Occurrence of Life Events from User's Tweet History", 2018 IEEE 12th International Conference on Semantic Computing (ICSC) * |
张松: "同一新闻事件识别研究", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 * |
李贞昊: "基于地理位置的新闻事件收集与分析技术的研究", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 * |
杨继文 et al.: "利用地名语义实现Web地震事件空间信息提取", 《测绘地理信息》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111090994A (en) * | 2019-11-12 | 2020-05-01 | 北京信息科技大学 | Chinese-internet-forum-text-oriented event place attribution province identification method |
CN111309861A (en) * | 2020-02-07 | 2020-06-19 | 中科鼎富(北京)科技发展有限公司 | Location extraction method, device, electronic equipment and computer readable storage medium |
CN112329469A (en) * | 2020-11-05 | 2021-02-05 | 新华智云科技有限公司 | Administrative region entity identification method and system |
CN112329469B (en) * | 2020-11-05 | 2023-12-19 | 新华智云科技有限公司 | Administrative region entity identification method and system |
CN113837472A (en) * | 2021-09-26 | 2021-12-24 | 杭州海康威视***技术有限公司 | Method and equipment for predicting event executive personnel |
CN113837472B (en) * | 2021-09-26 | 2024-03-12 | 杭州海康威视***技术有限公司 | Method and equipment for predicting event executives |
Also Published As
Publication number | Publication date |
---|---|
CN110298039B (en) | 2023-05-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||