CN110298039A - Method, system, device, and computer-readable storage medium for identifying an event occurrence place - Google Patents

Info

Publication number
CN110298039A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910539293.3A
Other languages
Chinese (zh)
Other versions
CN110298039B (en)
Inventor
韩翠云
陈玉光
刘远圳
潘禄
施茜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910539293.3A
Publication of CN110298039A
Application granted
Publication of CN110298039B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/258: Heading extraction; Automatic titling; Numbering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present invention provide a method, system, device, and computer-readable storage medium for identifying an event occurrence place. The method comprises: extracting candidate place words from event information, the event information comprising a title and body text; and inputting each candidate place word, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears. Embodiments of the present invention can accurately identify the event occurrence place.

Description

Method, system, device, and computer-readable storage medium for identifying an event occurrence place
Technical field
Embodiments of the present invention relate to the field of communication technology, and in particular to a method, system, device, and computer-readable storage medium for identifying an event occurrence place.
Background
An event graph is a network in which events are the nodes and the relationships between events are the edges. Each event node is made up of the attributes of the event, and place is one of the most important of these attributes. Identifying the place where an event occurred is therefore essential to building an event graph.
At present, some existing techniques can identify the occurrence place of certain events. However, when an event occurrence place and an event-related place appear together, these techniques cannot tell the two apart, so the event occurrence place is identified with low accuracy.
Summary of the invention
Embodiments of the present invention provide a method, system, device, and computer-readable storage medium for identifying an event occurrence place, so as to improve the accuracy with which the event occurrence place is identified.
In a first aspect, an embodiment of the present invention provides a method for identifying an event occurrence place, comprising: extracting candidate place words from event information, the event information comprising a title and body text; and inputting each candidate place word, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the pre-trained recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears.
In a second aspect, an embodiment of the present invention provides a system for identifying an event occurrence place, comprising: an extraction module, configured to extract candidate place words from event information, the event information comprising a title and body text; and an input and recognition module, configured to input each candidate place word, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the pre-trained recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears.
In a third aspect, an embodiment of the present invention provides a device for identifying an event occurrence place, comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the method of the first aspect.
In the method, system, device, and computer-readable storage medium provided by embodiments of the present invention, candidate place words are extracted from event information comprising a title and body text, and each candidate place word is input, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears. Because the recognition model takes the title into account when identifying the event occurrence place, it can distinguish an event occurrence place from an event-related place and thus identify the event occurrence place accurately.
Brief description of the drawings
Fig. 1 is a flowchart of a method for identifying an event occurrence place according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for identifying an event occurrence place according to another embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a system for identifying an event occurrence place according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a device for identifying an event occurrence place according to an embodiment of the present invention.
The above drawings show specific embodiments of the present disclosure, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the disclosed concept in any way, but to explain the concept of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the disclosure as detailed in the appended claims.
The method for identifying an event occurrence place provided by embodiments of the present invention can be applied to devices such as terminal devices, smart watches, and tablet computers.
The method for identifying an event occurrence place provided by embodiments of the present invention is intended to solve the above technical problem of the prior art.
The technical solution of the present invention, and how it solves the above technical problem, are described in detail below with specific embodiments. The specific embodiments below may be combined with one another, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention are described below with reference to the drawings.
Fig. 1 is a flowchart of a method for identifying an event occurrence place according to an embodiment of the present invention. To address the above technical problem of the prior art, this embodiment provides a method for identifying an event occurrence place, whose specific steps are as follows:
Step 101: extract candidate place words from event information, the event information comprising a title and body text.
In this embodiment, the event information may be a news item, which comprises a title and body text. Extracting candidate place words from the event information means extracting all place words from its title and body text. For example, the title of a news item is: "Magnitude 4.1 earthquake strikes Zhaotong, Yunnan; strongly felt in Leshan, Ya'an and other places in Sichuan". The body text is: "At 15:26 on June 5, a magnitude 4.1 earthquake occurred in Yongshan County, Zhaotong City, Yunnan (28.11 degrees north latitude, 103.63 degrees east longitude), at a focal depth of 8 km. After the earthquake, netizens in Sichuan reported that it was strongly felt in Leshan, Yibin, Ya'an and other places." The candidate place words extracted from the title and body text then include: "Yunnan", "Zhaotong City", "Yongshan County", "Sichuan", "Leshan", and "Ya'an".
Step 102: input each candidate place word, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the pre-trained recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, where the place sentence is the sentence in which the place word appears.
Optionally, before the candidate place word is input, together with the corresponding title and place sentence, into the pre-trained recognition model, the method of this embodiment further comprises: constructing a recognition model to be trained; obtaining training samples, each comprising a title and body text; extracting the candidate place words from the title and body text of each training sample and annotating them to obtain annotation results, the annotation results indicating whether the candidate place word is a place word, whether it is an event occurrence place, and whether it is an event-related place; inputting the annotation result of each candidate place word, its corresponding place sentence, and its corresponding title into the recognition model to be trained; and training the recognition model until a preset training criterion is met.
Specifically, the pre-trained recognition model may be obtained by training a deep-learning-based binary classification model, for example a fine-tuned BERT-based model. Training the binary classification model mainly involves two processes: obtaining training samples and annotating them. To obtain training samples, news items from various fields over a period of time, for example the past year, may be retrieved from an event graph resource library. A preset number of events, for example 2000, are then randomly selected from the retrieved samples and organized into the format of event ID, news link, title, and body text. All place words contained in the title and body text are then extracted for subsequent manual annotation.
Optionally, manually annotating a training sample comprises: determining whether the title and body text describe a target event; if so, determining whether the title and body text contain place words; and if they do, assigning a class label to each place word. Optionally, an annotator may also check whether any place words were missed during extraction and, if so, extract them manually.
Further, the annotated data is fed to the deep-learning-based binary classification model in the format "title + place sentence, place word", so that the model learns whether each annotated place word is the event occurrence place in its place sentence. During learning, the model outputs a score representing the probability that the place word is the event occurrence place in the place sentence; the higher the score, the higher that probability. Training ends when the result meets the training criterion. Optionally, the training criterion may be that the probability value output by the model reaches a probability threshold.
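The "title + place sentence, place word" input format described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the sentence-splitting rule, and the `text_a`/`text_b`/`label` field names are assumptions, and the classifier itself (for example a fine-tuned BERT-style model) is left out.

```python
import re
from typing import Dict, List

# Assumed sentence boundaries: CJK end marks anywhere, or Western end marks
# followed by whitespace (so a decimal such as "4.1" is not split).
_SENTENCE_BREAK = re.compile(r"(?<=[。！？])|(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split body text into sentences using the assumed boundary rule."""
    return [part.strip() for part in _SENTENCE_BREAK.split(text) if part.strip()]


def place_sentences(body: str, place_word: str) -> List[str]:
    """Return every sentence of the body that contains the place word."""
    return [s for s in split_sentences(body) if place_word in s]


def build_record(title: str, place_sentence: str, place_word: str,
                 is_occurrence_place: bool) -> Dict[str, object]:
    """Build one training record: first segment is title + place sentence,
    second segment is the place word, label is 1 for an occurrence place."""
    return {
        "text_a": f"{title} {place_sentence}",
        "text_b": place_word,
        "label": int(is_occurrence_place),
    }
```

At inference time the same two segments would be built without the label, and the model's output score compared against a threshold.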
Here, the title is used to distinguish an event occurrence place from an event-related place, and the place sentence is the sentence in which the extracted place word appears.
Once the recognition model has been obtained through the above training steps, it can be used to identify event occurrence places. In use, a place word, its place sentence, and the title are input in the same way, and the model identifies whether the place word is the event occurrence place in the place sentence.
In this embodiment, candidate place words are extracted from event information comprising a title and body text, and each candidate place word is input, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears. Because the recognition model takes the title into account when identifying the event occurrence place, it can distinguish an event occurrence place from an event-related place and thus identify the event occurrence place accurately.
Optionally, extracting candidate place words from the event information, or from the title and body text of a training sample, comprises at least one of the following processing methods:
First: extract the geographical terms in the title and body text as candidate place words. Specifically, it is determined whether the title and body text contain geographical terms; any that are found are extracted as place words.
Second: segment the title and body text into words and perform part-of-speech analysis on the result to obtain candidate place words. Optionally, an existing word segmentation tool may be used to split the title and body text into words, which are then tagged with parts of speech. Specifically, part-of-speech tagging labels the words as common nouns, modifiers, noun phrases, verb phrases, and so on. Finally, function words, numeral-classifier compounds, onomatopoeia, stop words, and the like are removed from the words.
Third: according to an administrative division dictionary file, extract the administrative-division place words in the title and body text as candidate place words. Specifically, the administrative division dictionary file is parsed to obtain place words such as countries, provinces, states, cities, counties, and districts. It is then determined whether the title and body text contain any of these place words; any that are found are extracted as place words.
Fourth: apply regular-expression matching templates to the title and body text to obtain candidate place words. Specifically, the sentences in the title and body text are matched against the templates to uncover potential candidate place words. Example templates include "held at (.*)" and "attended the [meeting | activity | forum] located at (.*)".
It should be noted that the title and body text in the above four processing methods are those of either the event information or a training sample.
Optionally, embodiments of the present invention may extract place words using one of the above four methods, or any two or three of them. The title and body text may also be processed by all four methods in turn, which is the preferred approach because it uncovers as many potential candidate place words in the title and body text as possible.
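Two of the four extraction methods, dictionary lookup and regular-expression templates, can be combined as in the sketch below. The gazetteer entries, the single English template, and the function name `extract_candidates` are illustrative assumptions; a real system would load the administrative division dictionary file and use templates written for the source language.

```python
import re
from typing import Iterable, List

# Hypothetical administrative-division gazetteer (normally parsed from a file).
GAZETTEER = ("Yunnan", "Zhaotong City", "Yongshan County", "Sichuan", "Leshan")

# Hypothetical template in the spirit of "held at (.*)"; the capture stops at
# the next punctuation mark instead of running to the end of the text.
TEMPLATES = (re.compile(r"held (?:at|in) ([^,.;]+)"),)


def extract_candidates(title: str, body: str,
                       gazetteer: Iterable[str] = GAZETTEER,
                       templates: Iterable[re.Pattern] = TEMPLATES) -> List[str]:
    """Collect candidate place words from title and body, without duplicates."""
    text = f"{title}\n{body}"
    found: List[str] = []
    # Dictionary pass: keep every gazetteer entry that occurs in the text.
    for word in gazetteer:
        if word in text and word not in found:
            found.append(word)
    # Template pass: keep whatever each regular expression captures.
    for pattern in templates:
        for match in pattern.finditer(text):
            candidate = match.group(1).strip()
            if candidate and candidate not in found:
                found.append(candidate)
    return found
```

Running the passes in sequence and de-duplicating reflects the preference stated above for applying several methods in turn so that fewer candidates are missed.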
Optionally, after the candidate place word is input, together with the corresponding title and place sentence, into the pre-trained recognition model and the model has identified whether the candidate place word is the event occurrence place in the place sentence, the method of this embodiment further comprises: processing the candidate place word corresponding to the identified event occurrence place into an address in a preset format. For example, the candidate place word corresponding to the event occurrence place identified in the above embodiment is mapped to administrative units such as province, city, and county/district, and processed into a five-level place of the form "country-province/state-city-county/district-address". If the administrative division one level above the city is a province or state, the final result is a country-province/state-city address. If the level above the city is the country, the final result is a country-city address.
Fig. 2 is a flowchart of a method for identifying an event occurrence place according to another embodiment of the present invention. On the basis of the above embodiment, processing the candidate place word corresponding to the identified event occurrence place into an address in a preset format comprises:
Step 201: segment the candidate place word into words;
Step 202: perform part-of-speech analysis on the segmentation result to obtain fine-grained place words;
The candidate place words extracted in the preceding embodiment may be rather coarse, for example a single word of the form "Changsha City, Hunan Province". An existing word segmentation tool can therefore be used to segment the extracted candidate place words further and obtain finer-grained place words; in this example, further segmentation yields the finer-grained place words "Hunan Province" and "Changsha City".
Optionally, performing part-of-speech analysis on the segmentation result comprises: labeling the words as common nouns, modifiers, noun phrases, verb phrases, and so on, and finally removing function words, numeral-classifier compounds, onomatopoeia, stop words, and the like.
Step 203: if a fine-grained place word is an administrative-division place word, use an administrative division dictionary to process it into an address in the preset format. Optionally, administrative-division place words include words such as XX Province, XX City, XX County, XX District, and XX Town. For example, if a fine-grained place word is Changsha City, the administrative division dictionary shows that the administrative division one level above Changsha City is Hunan Province, which yields the preset-format address of Changsha City, Hunan Province.
Optionally, using an administrative division dictionary to process a fine-grained place word that is an administrative-division place word into an address in the preset format comprises: obtaining from the administrative division dictionary the place word of the administrative division one level above the administrative-division place word, and repeating this until the highest-level administrative division place word is reached; and processing the administrative-division place word into an address that runs level by level, in order of administrative division rank, up to the highest-level administrative division place word.
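The level-by-level lookup in the administrative division dictionary can be sketched as a walk up a parent map. The `PARENT` table and the function name `to_address` are assumptions made for illustration; the hyphen-joined output follows the "country-province/state-city" style used in this description.

```python
from typing import Dict, List

# Hypothetical slice of an administrative division dictionary: child -> parent.
PARENT: Dict[str, str] = {
    "Changsha City": "Hunan Province",
    "Hunan Province": "China",
}


def to_address(place_word: str, parent: Dict[str, str] = PARENT) -> str:
    """Climb from a place word to the highest-level division and join the chain."""
    chain: List[str] = [place_word]
    seen = {place_word}
    # Follow parent links until the top level is reached.
    while chain[-1] in parent:
        upper = parent[chain[-1]]
        if upper in seen:  # guard against a malformed, cyclic dictionary
            break
        chain.append(upper)
        seen.add(upper)
    # The preset format lists the highest level first.
    return "-".join(reversed(chain))
```

A one-parent map like this cannot represent names shared by several divisions; a dictionary that returns several parents for one name would produce several addresses instead of one.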
Optionally, as shown in Fig. 2, after part-of-speech analysis is performed on the segmentation result to obtain fine-grained place words, the method of this embodiment further comprises:
Step 204: if a fine-grained place word is an organization-class place word, use preset mapping relations between entities and places to process it into an address in the preset format.
Optionally, using preset mapping relations between entities and places to process a fine-grained place word that is an organization-class place word into an address in the preset format comprises: obtaining, according to the preset mapping relations between entities and places, the place word one level above the organization-class place word, and repeating this until the highest-level place word is reached; and processing the organization-class place word into an address that runs level by level from the organization-class place word up to the highest-level place word.
Optionally, organization-class place words include entities such as XX Street, XX Community, XX Residential Compound, XX School, and XX Building. For example, if a fine-grained place word is XX University, it can be processed, according to the preset mapping relations between entities and places, into a detailed address in the preset format: country-province/state-city-county/district-XX University. An entity may be a house, a shop, a mailbox, or a bus station. Because entities with the same name may exist, one entity may yield several addresses, so the final result is an address list.
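Because same-named entities can map to several places, the lookup naturally returns a list. The sketch below assumes a hypothetical `ENTITY_LOCATIONS` table whose entries are invented purely for illustration, and a hypothetical helper `entity_addresses`.

```python
from typing import Dict, List, Tuple

# Hypothetical entity-to-place mapping; each entity name may have several
# locations, given here as (country, province, city, district) tuples.
ENTITY_LOCATIONS: Dict[str, List[Tuple[str, ...]]] = {
    "Experimental Primary School": [
        ("China", "Hunan Province", "Changsha City", "Yuelu District"),
        ("China", "Hebei Province", "Shijiazhuang City", "Xinhua District"),
    ],
}


def entity_addresses(entity: str,
                     mapping: Dict[str, List[Tuple[str, ...]]] = ENTITY_LOCATIONS
                     ) -> List[str]:
    """Return one preset-format address per known location of the entity."""
    return ["-".join(levels + (entity,)) for levels in mapping.get(entity, [])]
```

An unknown entity yields an empty list, which lets the caller decide whether to drop the place word or keep it unresolved.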
In addition, to make the recognition more robust, the scores that the recognition model assigns to the extracted candidate place words are used to select the candidate place words whose score exceeds a score threshold, and all of these candidate place words are processed into addresses in the preset format.
For example, the recognition model scores the identified place word "Xinhua District" at 0.9, "Shijiazhuang Xinhua District" at 0.8, and "Beijing" at 0.4. With the score threshold set to 0.5, the place word "Beijing" is filtered out, and "Xinhua District" and "Shijiazhuang Xinhua District" are processed into addresses in the preset format.
Taking "Xinhua District" as an example, segmentation and part-of-speech analysis first yield "Xinhua District", which is then looked up in the administrative division dictionary. Suppose the lookup returns two results, Xinhua District -> [Shijiazhuang, Cangzhou]. The processed addresses, each carrying half of the 0.9 score, are then: "China-Hebei Province-Shijiazhuang City-Xinhua District: 0.45; China-Hebei Province-Cangzhou City-Xinhua District: 0.45".
Similarly, looking up "Shijiazhuang Xinhua District" in the administrative division dictionary gives the processed address: "China-Hebei Province-Shijiazhuang City-Xinhua District: 0.8".
The two results are then merged to obtain "China-Hebei Province-Shijiazhuang City-Xinhua District: 1.25; China-Hebei Province-Cangzhou City-Xinhua District: 0.45". Merging follows the hierarchy of administrative divisions from the highest level down to the lowest, for example in the order country, province, city, county.
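The threshold filter and score merge in this example can be reproduced with the sketch below. The equal split of a score across ambiguous lookup results is inferred from the 0.9 to 0.45 + 0.45 figures above rather than stated as a rule, and the `RESOLVED` table and the function name `merge_addresses` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical lookup results: place word -> preset-format addresses.
RESOLVED: Dict[str, List[str]] = {
    "Xinhua District": [
        "China-Hebei Province-Shijiazhuang City-Xinhua District",
        "China-Hebei Province-Cangzhou City-Xinhua District",
    ],
    "Shijiazhuang Xinhua District": [
        "China-Hebei Province-Shijiazhuang City-Xinhua District",
    ],
    "Beijing": ["China-Beijing"],
}


def merge_addresses(scored_words: Iterable[Tuple[str, float]],
                    resolved: Dict[str, List[str]] = RESOLVED,
                    threshold: float = 0.5) -> Dict[str, float]:
    """Drop low-scoring place words, then sum scores per resolved address."""
    totals: Dict[str, float] = defaultdict(float)
    for word, score in scored_words:
        if score <= threshold:  # keep only scores above the threshold
            continue
        addresses = resolved.get(word, [])
        if not addresses:
            continue
        share = score / len(addresses)  # assumed: split evenly when ambiguous
        for address in addresses:
            totals[address] += share
    return dict(totals)
```

Summing per full address means that a more specific mention ("Shijiazhuang Xinhua District") reinforces the matching reading of an ambiguous one ("Xinhua District").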
Fig. 3 is a schematic structural diagram of a system for identifying an event occurrence place according to an embodiment of the present invention. The system provided by this embodiment can execute the processing flow provided by the method embodiments. As shown in Fig. 3, the system 30 for identifying an event occurrence place comprises an extraction module 31 and an input and recognition module 32. The extraction module 31 is configured to extract candidate place words from event information, the event information comprising a title and body text. The input and recognition module 32 is configured to input each candidate place word, together with the corresponding title and place sentence, into a pre-trained recognition model, so that the recognition model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears.
Optionally, the system 30 of this embodiment further comprises a construction module 33, an obtaining module 34, an input module 35, and a training module 36. The construction module 33 is configured to construct a recognition model to be trained. The obtaining module 34 is configured to obtain training samples, each comprising a title and body text. The extraction module 31 is further configured to extract the candidate place words from the title and body text of each training sample and to annotate them to obtain annotation results, the annotation results indicating whether the candidate place word is a place word, whether it is an event occurrence place, and whether it is an event-related place. The input module 35 is configured to input the annotation result of each candidate place word, its corresponding place sentence, and its corresponding title into the recognition model to be trained. The training module 36 is configured to train the recognition model until a preset training criterion is met.
Optionally, when the extraction module 31 extracts candidate place words from the event information, or from the title and body text of a training sample, it performs at least one of the following: extracting the geographical terms in the title and body text as candidate place words; segmenting the title and body text into words and performing part-of-speech analysis on the result to obtain candidate place words; extracting, according to an administrative division dictionary file, the administrative-division place words in the title and body text as candidate place words; and applying regular-expression matching templates to the title and body text to obtain candidate place words.
Optionally, the system 30 of this embodiment further comprises a processing module 37, configured to process the candidate place word corresponding to the identified event occurrence place into an address in a preset format.
Optionally, when processing the candidate place word corresponding to the identified event occurrence place into an address in the preset format, the processing module 37 is specifically configured to: segment the candidate place word into words; perform part-of-speech analysis on the segmentation result to obtain fine-grained place words; and, if a fine-grained place word is an administrative-division place word, use an administrative division dictionary to process it into an address in the preset format.
Optionally, the processing module 37 is further configured to, if a fine-grained place word is an organization-class place word, use preset mapping relations between entities and places to process it into an address in the preset format.
Optionally, when using the administrative division dictionary to process a fine-grained place word that is an administrative-division place word into an address in the preset format, the processing module 37 is specifically configured to: obtain from the administrative division dictionary the place word of the administrative division one level above the administrative-division place word, and repeat this until the highest-level administrative division place word is reached; and process the administrative-division place word into an address that runs level by level, in order of administrative division rank, up to the highest-level administrative division place word.
Optionally, when using the preset mapping relations between entities and places to process a fine-grained place word that is an organization-class place word into an address in the preset format, the processing module 37 is specifically configured to: obtain, according to the preset mapping relations between entities and places, the place word one level above the organization-class place word, and repeat this until the highest-level place word is reached; and process the organization-class place word into an address that runs level by level from the organization-class place word up to the highest-level place word. The system of the embodiment shown in Fig. 3 can be used to carry out the technical solutions of the above method embodiments; its implementation principle and technical effect are similar and are not repeated here.
Recognition methods, equipment and the computer readable storage medium of event provided in an embodiment of the present invention, pass through extraction Candidate locations word in event information, the event information includes title and text, by the candidate locations word and corresponding institute State title and the trained in advance identification model of place sentence input so that the identification model identify the candidate locations word whether be Venue location in the place sentence, the place sentence are the sentence where the place word.Since identification model is identifying When venue location, it is contemplated that title, so as to distinguish venue location and event relatively, to accurately identify event hair Radix Rehmanniae.
Fig. 4 is a schematic structural diagram of the device for identifying an event occurrence place provided by an embodiment of the present invention. The device can execute the processing flow provided by the embodiments of the method for identifying an event occurrence place. As shown in Fig. 4, the identification device 40 includes: a memory 41, a processor 42, a computer program, and a communication interface 43; the computer program is stored in the memory 41 and is configured to be executed by the processor 42 to perform the steps of the above method embodiments.
The identification device of the embodiment shown in Fig. 4 can be used to execute the technical solutions of the above method embodiments; its implementation principle and technical effect are similar and are not repeated here.
In addition, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program, the computer program being executed by a processor to implement the method for identifying an event occurrence place described in the above embodiments.
The method, device, and computer-readable storage medium for identifying an event occurrence place provided by the embodiments of the present invention extract at least one place word from event information, the event information including a title and body text, and input the at least one place word, together with the corresponding title and place sentence, into a pre-trained identification model, so that the identification model identifies the event occurrence place among the at least one place word, the place sentence being the sentence in which the place word appears. Because the identification model takes the title into account when identifying the event occurrence place, it can distinguish the event occurrence place from event-related places and thus identify the event occurrence place accurately.
In the several embodiments provided by the present invention, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional modules above is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. For the specific working process of the device described above, refer to the corresponding process in the foregoing method embodiments; details are not repeated here.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (18)

1. A method for identifying an event occurrence place, characterized by comprising:
extracting candidate place words from event information, the event information including a title and body text;
inputting each candidate place word, together with the corresponding title and place sentence, into a pre-trained identification model, so that the pre-trained identification model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears.
2. The method according to claim 1, characterized in that, before inputting the candidate place word, together with the corresponding title and place sentence, into the pre-trained identification model so that the pre-trained identification model identifies whether the candidate place word is the event occurrence place in the place sentence, the method further comprises:
constructing an identification model to be trained;
obtaining a training sample, the training sample including a title and body text;
extracting the candidate place words from the title and body text of the training sample and annotating the candidate place words to obtain annotation results, the annotation results including whether the candidate place word is a place word, whether it is the event occurrence place, and whether it is an event-related place;
inputting the annotation results of the candidate place word, the place sentence corresponding to the candidate place word, and the title corresponding to the candidate place word into the identification model to be trained;
training the identification model to be trained until a preset training indicator is reached.
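For illustration, one annotated training record of the kind described in claim 2 could be represented as below. This is a sketch under stated assumptions: the class and field names are hypothetical, and the consistency check (a candidate that is not a place word cannot be an occurrence place or a related place) is an assumption added here, not a requirement stated in the claims.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    is_place_word: bool        # is the candidate really a place word?
    is_occurrence_place: bool  # is it the place where the event occurred?
    is_related_place: bool     # is it merely an event-related place?

    def __post_init__(self):
        # Assumed consistency rule, not taken from the claims.
        if not self.is_place_word and (self.is_occurrence_place or self.is_related_place):
            raise ValueError("a non-place word cannot carry place labels")


@dataclass(frozen=True)
class TrainingExample:
    candidate: str
    title: str
    place_sentence: str
    annotation: Annotation


def make_example(candidate: str, title: str, sentences: List[str],
                 annotation: Annotation) -> TrainingExample:
    """Attach the first sentence containing the candidate as its place sentence."""
    for sentence in sentences:
        if candidate in sentence:
            return TrainingExample(candidate, title, sentence, annotation)
    raise ValueError(f"candidate {candidate!r} not found in any sentence")


sample = make_example(
    "Tianjin",
    "Fire in Haidian District",
    ["A fire broke out in Haidian District.", "Rescue teams from Tianjin arrived later."],
    Annotation(is_place_word=True, is_occurrence_place=False, is_related_place=True),
)
print(sample)
```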
3. The method according to claim 2, characterized in that extracting the candidate place words from the event information, or extracting the candidate place words from the title and body text of the training sample, comprises at least one of the following:
extracting geographic terms from the title and the body text as the candidate place words;
segmenting the title and the body text into words and performing part-of-speech analysis on the segmentation result to obtain the candidate place words;
extracting, according to an administrative division dictionary file, administrative-division-class place words from the title and the body text as the candidate place words;
performing regular-expression matching on the title and the body text using regular-expression matching templates to obtain the candidate place words.
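Two of the extraction options listed in claim 3, dictionary lookup and regular-expression templates, can be sketched as follows. The dictionary entries, the single English-language pattern, and the function name `extract_candidates` are assumptions chosen for readability; a real system for Chinese text would use its own dictionary file and templates.

```python
import re

# Hypothetical administrative division dictionary (illustrative entries only).
ADMIN_DIVISIONS = {"Beijing", "Haidian District", "Shenzhen"}

# Hypothetical matching template: a capitalized word followed by a division suffix.
PLACE_PATTERNS = [
    re.compile(r"\b[A-Z][a-z]+ (?:District|County|Province|City)\b"),
]


def extract_candidates(title, body):
    """Collect candidate place words from the title and body text, without duplicates."""
    text = f"{title}\n{body}"
    found = []
    # Dictionary lookup; longer names first so the most specific entry is listed first.
    for name in sorted(ADMIN_DIVISIONS, key=len, reverse=True):
        if name in text and name not in found:
            found.append(name)
    # Regular-expression templates.
    for pattern in PLACE_PATTERNS:
        for match in pattern.finditer(text):
            if match.group(0) not in found:
                found.append(match.group(0))
    return found


print(extract_candidates(
    "Flooding hits Chaoyang District",
    "Heavy rain fell across Beijing. Teams from Hebei Province helped.",
))
```

Combining several extractors in this way favors recall; deciding which candidate is the actual event occurrence place is left to the identification model.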
4. The method according to any one of claims 1-3, characterized in that, after inputting the candidate place word, together with the corresponding title and place sentence, into the pre-trained identification model so that the pre-trained identification model identifies whether the candidate place word is the event occurrence place in the place sentence, the method further comprises:
processing the candidate place word corresponding to the identified event occurrence place into an address in a preset format.
5. The method according to claim 4, characterized in that processing the candidate place word corresponding to the identified event occurrence place into an address in a preset format comprises:
segmenting the candidate place word into words;
performing part-of-speech analysis on the segmentation result to obtain a fine-grained place word;
in the case where the fine-grained place word is an administrative-division-class place word, processing the fine-grained place word into an address in the preset format using an administrative division dictionary.
6. The method according to claim 5, characterized in that, after performing part-of-speech analysis on the segmentation result to obtain the fine-grained place word, the method further comprises:
in the case where the fine-grained place word is an organization-class place word, processing the fine-grained place word into an address in the preset format using a preset mapping between entities and places.
7. The method according to claim 5 or 6, characterized in that, in the case where the fine-grained place word is an administrative-division-class place word, processing the fine-grained place word into an address in the preset format using the administrative division dictionary comprises:
in the case where the fine-grained place word is an administrative-division-class place word, obtaining, according to the administrative division dictionary, the upper-level administrative division place word corresponding to the administrative-division-class place word, until the highest-level administrative division place word is obtained;
processing the administrative-division-class place word into an address that includes each administrative division level, step by step upward, up to the highest-level administrative division place word.
8. The method according to claim 6, characterized in that, in the case where the fine-grained place word is an organization-class place word, processing the fine-grained place word into an address in the preset format using the preset mapping between entities and places comprises:
in the case where the fine-grained place word is an organization-class place word, obtaining in turn, according to the preset mapping between entities and places, the upper-level place word corresponding to the organization-class place word, until the highest-level place word is reached;
processing the organization-class place word into an address that runs from the organization-class place word step by step upward to the highest-level place word.
9. A system for identifying an event occurrence place, characterized by comprising:
an extraction module, configured to extract candidate place words from event information, the event information including a title and body text;
an input and identification module, configured to input each candidate place word, together with the corresponding title and place sentence, into a pre-trained identification model, so that the pre-trained identification model identifies whether the candidate place word is the event occurrence place in the place sentence, the place sentence being the sentence in which the place word appears.
10. The system according to claim 9, characterized in that the system further comprises:
a construction module, configured to construct an identification model to be trained;
an obtaining module, configured to obtain a training sample, the training sample including a title and body text;
the extraction module being further configured to extract the candidate place words from the title and body text of the training sample and to annotate the candidate place words to obtain annotation results, the annotation results including whether the candidate place word is a place word, whether it is the event occurrence place, and whether it is an event-related place;
an input module, configured to input the annotation results of the candidate place word, the place sentence corresponding to the candidate place word, and the title corresponding to the candidate place word into the identification model to be trained;
a training module, configured to train the identification model to be trained until a preset training indicator is reached.
11. The system according to claim 10, characterized in that, when the extraction module extracts the candidate place words from the event information or extracts the candidate place words from the title and body text of the training sample, it performs at least one of the following:
extracting geographic terms from the title and the body text as the candidate place words;
segmenting the title and the body text into words and performing part-of-speech analysis on the segmentation result to obtain the candidate place words;
extracting, according to an administrative division dictionary file, administrative-division-class place words from the title and the body text as the candidate place words;
performing regular-expression matching on the title and the body text using regular-expression matching templates to obtain the candidate place words.
12. The system according to any one of claims 9-11, characterized in that the system further comprises:
a processing module, configured to process the candidate place word corresponding to the identified event occurrence place into an address in a preset format.
13. The system according to claim 12, characterized in that, when the processing module processes the candidate place word corresponding to the identified event occurrence place into an address in a preset format, it is specifically configured to:
segment the candidate place word into words;
perform part-of-speech analysis on the segmentation result to obtain a fine-grained place word;
in the case where the fine-grained place word is an administrative-division-class place word, process the fine-grained place word into an address in the preset format using an administrative division dictionary.
14. The system according to claim 13, characterized in that the processing module is further configured to, in the case where the fine-grained place word is an organization-class place word, process the fine-grained place word into an address in the preset format using a preset mapping between entities and places.
15. The system according to claim 13 or 14, characterized in that, when the processing module, in the case where the fine-grained place word is an administrative-division-class place word, processes the fine-grained place word into an address in the preset format using the administrative division dictionary, it is specifically configured to:
in the case where the fine-grained place word is an administrative-division-class place word, obtain, according to the administrative division dictionary, the upper-level administrative division place word corresponding to the administrative-division-class place word, until the highest-level administrative division place word is obtained;
process the administrative-division-class place word into an address that includes each administrative division level, step by step upward, up to the highest-level administrative division place word.
16. The system according to claim 14, characterized in that, when the processing module, in the case where the fine-grained place word is an organization-class place word, processes the fine-grained place word into an address in the preset format using the preset mapping between entities and places, it is specifically configured to:
in the case where the fine-grained place word is an organization-class place word, obtain in turn, according to the preset mapping between entities and places, the upper-level place word corresponding to the organization-class place word, until the highest-level place word is reached;
process the organization-class place word into an address that runs from the organization-class place word step by step upward to the highest-level place word.
17. A device for identifying an event occurrence place, characterized by comprising:
a memory;
a processor; and
a computer program;
wherein the computer program is stored in the memory and is configured to be executed by the processor to implement the method according to any one of claims 1-8.
18. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to claim 1.
CN201910539293.3A 2019-06-20 2019-06-20 Event place identification method, system, equipment and computer readable storage medium Active CN110298039B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910539293.3A CN110298039B (en) 2019-06-20 2019-06-20 Event place identification method, system, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910539293.3A CN110298039B (en) 2019-06-20 2019-06-20 Event place identification method, system, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110298039A (en) 2019-10-01
CN110298039B (en) 2023-05-30

Family

ID=68028381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910539293.3A Active CN110298039B (en) 2019-06-20 2019-06-20 Event place identification method, system, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110298039B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5778402A (en) * 1995-06-07 1998-07-07 Microsoft Corporation Method and system for auto-formatting a document using an event-based rule engine to format a document as the user types
CN102298635A (en) * 2011-09-13 2011-12-28 苏州大学 Method and system for fusing event information
CN103020286A (en) * 2012-12-27 2013-04-03 上海交通大学 Internet ranking list grasping system based on ranking website
CN106464706A (en) * 2014-04-18 2017-02-22 意大利电信股份公司 Method and system for identifying significant locations through data obtainable from telecommunication network
US20160019465A1 (en) * 2014-07-18 2016-01-21 PlaceIQ, Inc. Analyzing Mobile-Device Location Histories To Characterize Consumer Behavior
CN104572958A (en) * 2014-12-29 2015-04-29 中国科学院计算机网络信息中心 Event extraction based sensitive information monitoring method
CN104731768A (en) * 2015-03-05 2015-06-24 西安交通大学城市学院 Incident location extraction method oriented to Chinese news texts
CN105630884A (en) * 2015-12-18 2016-06-01 中国科学院信息工程研究所 Geographic position discovery method for microblog hot event
CN108153860A (en) * 2017-12-25 2018-06-12 中译语通科技(青岛)有限公司 A kind of geolocation analysis method based on multilingual news
CN108563655A (en) * 2017-12-28 2018-09-21 北京百度网讯科技有限公司 Text based event recognition method and device
CN108415902A (en) * 2018-02-10 2018-08-17 合肥工业大学 A kind of name entity link method based on search engine
CN109740150A (en) * 2018-12-20 2019-05-10 出门问问信息科技有限公司 Address resolution method, device, computer equipment and computer readable storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SHUN ABE et al.: "Predicting the Occurrence of Life Events from User's Tweet History", 《2018 IEEE 12TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC)》 *
张松: ""同一新闻事件识别研究"", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 *
李贞昊: ""基于地理位置的新闻事件收集与分析技术的研究"", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》 *
杨继文等: "利用地名语义实现Web地震事件空间信息提取", 《测绘地理信息》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111090994A (en) * 2019-11-12 2020-05-01 北京信息科技大学 Chinese-internet-forum-text-oriented event place attribution province identification method
CN111309861A (en) * 2020-02-07 2020-06-19 中科鼎富(北京)科技发展有限公司 Location extraction method, device, electronic equipment and computer readable storage medium
CN112329469A (en) * 2020-11-05 2021-02-05 新华智云科技有限公司 Administrative region entity identification method and system
CN112329469B (en) * 2020-11-05 2023-12-19 新华智云科技有限公司 Administrative region entity identification method and system
CN113837472A (en) * 2021-09-26 2021-12-24 杭州海康威视***技术有限公司 Method and equipment for predicting event executive personnel
CN113837472B (en) * 2021-09-26 2024-03-12 杭州海康威视***技术有限公司 Method and equipment for predicting event executives

Also Published As

Publication number Publication date
CN110298039B (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN110298039A (en) Recognition methods, system, equipment and the computer readable storage medium of event
CN107766371B (en) Text information classification method and device
CN104408093B (en) A kind of media event key element abstracting method and device
CN109299271B (en) Training sample generation method, text data method, public opinion event classification method and related equipment
CN102262634B (en) Automatic questioning and answering method and system
CN106156365A (en) A kind of generation method and device of knowledge mapping
CN107343223A (en) The recognition methods of video segment and device
CN108763212A (en) A kind of address information extraction method and device
CN106886567A (en) Microblogging incident detection method and device based on semantic extension
CN104731768A (en) Incident location extraction method oriented to Chinese news texts
CN108304424B (en) Text keyword extraction method and text keyword extraction device
CN109740159B (en) Processing method and device for named entity recognition
WO2019227581A1 (en) Interest point recognition method, apparatus, terminal device, and storage medium
CN104899335A (en) Method for performing sentiment classification on network public sentiment of information
CN112527933A (en) Chinese address association method based on space position and text training
CN109101518A (en) Phonetic transcription text quality appraisal procedure, device, terminal and readable storage medium storing program for executing
CN111488468A (en) Geographic information knowledge point extraction method and device, storage medium and computer equipment
CN109299469A (en) A method of identifying complicated address in long text
CN112287082A (en) Data processing method, device, equipment and storage medium combining RPA and AI
CN112613321A (en) Method and system for extracting entity attribute information in text
CN116414823A (en) Address positioning method and device based on word segmentation model
CN112685550A (en) Intelligent question answering method, device, server and computer readable storage medium
CN109086306A (en) The extracting method of atomic event label based on mixed hidden Markov model
CN112363996B (en) Method, system and medium for establishing physical model of power grid knowledge graph
CN108241609B (en) Ranking sentence identification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant