CN109408819A - Core place name extraction method and device based on natural language processing technology - Google Patents


Info

Publication number
CN109408819A
Authority
CN
China
Prior art keywords
place name
score
city
core
unit
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811202492.7A
Other languages
Chinese (zh)
Other versions
CN109408819B (en)
Inventor
段春先
尹展鹏
胡锐
程方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WUDA GEOINFORMATICS CO Ltd
Original Assignee
WUDA GEOINFORMATICS CO Ltd
Application filed by WUDA GEOINFORMATICS CO Ltd
Priority to CN201811202492.7A (patent CN109408819B)
Publication of CN109408819A
Application granted
Publication of CN109408819B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)
  • Character Discrimination (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present invention relates to the technical field of geographic information and provides a core place name extraction method and device based on natural language processing technology. The device includes: a Chinese word segmentation dictionary production unit; a place name set acquisition unit; a frequency score calculation unit; an importance score calculation unit; a relationship score calculation unit; a component score calculation unit; a total score ranking unit; and a core place name judgment unit. The core place name extraction algorithm of the invention has simple steps, is easy to understand and implement, and can be applied quickly in production projects.

Description

Core place name extraction method and device based on natural language processing technology
Technical field
The invention belongs to the technical field of geographic information, and in particular relates to a core place name extraction method and device based on natural language processing technology.
Background art
Place name recognition is a form of named entity recognition in natural language processing and can extract multiple place names from a text. By their degree of relevance to the text content, place names can be divided into core place names, strongly related place names, and weakly related place names. A core place name is the place name most related to the text subject; a strongly related place name has some association with the text subject; a weakly related place name has a low degree of association with the text subject. In internet public opinion monitoring, when a computer performs regional analysis of internet public opinion information, weakly related place names interfere with the analysis and lower its precision, making it difficult to extract the core place names closely related to the subject of the text data. Existing natural language processing algorithms can generally only extract the place names in a text: they cannot express the subordination relationships between place names, cannot rank these place names by their relevance to the text subject, and therefore cannot obtain the place names closely related to the text subject.
The Chinese patent with application number CN201410381574.8, entitled "Intelligent place name recognition technology based on a statistical model", uses methods such as superior/subordinate place name recognition, context recognition with a place-name statistical model, and disambiguation of place names inside personal names, and achieves practical-level high accuracy in place name recognition. However, that scheme does not determine how relevant each place name is to the text subject.
In view of these shortcomings, the present invention provides a core place name extraction method and device based on natural language processing technology, which can extract, from the multiple place names in a text, those highly relevant to its subject.
Summary of the invention
In view of the above problems, the purpose of the present invention is to provide a core place name extraction method and device based on natural language processing technology, aiming to solve technical problems of the prior art such as weakly related place names interfering with the analysis, which lowers the precision of regional analysis and makes it difficult to extract the core place names closely related to the subject of the text data.
The present invention adopts the following technical scheme:
The core place name extraction method based on natural language processing technology includes the following steps:
Step S1: build a one-to-one data table of the names of all provinces, cities, and counties in China and their administrative division codes, save the subordination relationships between place names, and make the names of the provinces, cities, and counties into a Chinese word segmentation dictionary;
Step S2: according to the Chinese word segmentation dictionary, perform Chinese word segmentation on a specified piece of text data with a natural language processing tool to obtain a place name set that records the sentence component of each place name;
Step S3: count the number of occurrences of each identical place name in the place name set and calculate the frequency score of each place name;
Step S4: determine whether a place name appears in the title of the text data and calculate its place name importance score;
Step S5: score the subordination relationship of each place name in the place name set according to the province, city, and county subordination relationships saved in step S1, calculating the membership score;
Step S6: obtain the place name component score according to the sentence component of each place name in the place name set;
Step S7: add the four scores of steps S3 to S6 to obtain the topic relevance score of each place name, aggregate the place names by city, calculate for each city the sum of the topic relevance scores of the city and all districts and counties under its jurisdiction, and rank the sums from high to low;
Step S8: determine whether the highest topic relevance score sum reaches the minimum score value for a core place name; if not, the text data has no core place name; if so, select the city place name with the highest topic relevance score sum as the core place name.
Further, the frequency score in step S3 is calculated as follows:
where $S_{if}$ is the frequency score and $f_i$ is the number of occurrences of the i-th identical place name in the place name set.
Further, the place name importance score in step S4 is calculated as follows:
$S_{it} = S_t$, where $S_t = 1$ if the place name appears in the title of the text data, and $S_t = 0$ otherwise.
Further, the membership score in step S5 is calculated as follows:
$S_{ir} = S_r$, where $S_r = 0.5$ if the superior place name of a place name appears in the place name set (i.e. a subordination relationship is found), and $S_r = 0$ otherwise.
Further, in step S6 the sentence component of a place name in a sentence of the text data is subject, adverbial, attribute, or object, and the place name component score is calculated as follows:
$S_{ic} = S_{cz} + S_{ch} + S_{cd} + S_{cb}$, where $S_{cz}$, $S_{ch}$, $S_{cd}$, and $S_{cb}$ are the scores for the place name acting as subject, adverbial, attribute, and object respectively; the component score is counted once for every occurrence of the place name.
In another aspect, the core place name extraction device based on natural language processing technology includes the following units:
Chinese word segmentation dictionary production unit: for building a one-to-one data table of the names of all provinces, cities, and counties in China and their administrative division codes, saving the subordination relationships between place names, and making the names of the provinces, cities, and counties into a Chinese word segmentation dictionary;
Place name set acquisition unit: for performing Chinese word segmentation on a specified piece of text data with a natural language processing tool according to the Chinese word segmentation dictionary, to obtain a place name set that records sentence components;
Frequency score calculation unit: for counting the number of occurrences of each identical place name in the place name set and calculating the frequency score of each place name;
Importance score calculation unit: for determining whether a place name appears in the title of the text data and calculating its place name importance score;
Relationship score calculation unit: for scoring the subordination relationship of each place name in the place name set according to the province, city, and county subordination relationships saved in step S1, calculating the membership score;
Component score calculation unit: for obtaining the place name component score according to the sentence component of each place name in the place name set;
Total score ranking unit: for adding the frequency score, importance score, relationship score, and component score to obtain the topic relevance score of each place name, aggregating place names by city, calculating for each city the sum of the topic relevance scores of the city and all districts and counties under its jurisdiction, and ranking the sums from high to low;
Core place name judgment unit: for determining whether the highest topic relevance score sum reaches the minimum score value for a core place name; if not, the text data has no core place name; if so, the city place name with the highest topic relevance score sum is selected as the core place name.
The beneficial effects of the present invention are as follows. Based on how place names occur in text, the invention builds a scoring model that describes each place name by its frequency of occurrence, position, dependency relations, and subordination relationships. Using natural language processing technology, it establishes a core place name recognition algorithm that links the place names extracted from text information by their subordination relationships and assigns each place name a degree of relevance to the text subject, so that the core place name can be identified and extracted from the text information. This yields a text-oriented core place name extraction method. The core place name extraction algorithm of the invention has simple steps, is easy to understand and implement, and can be applied quickly in production projects.
Brief description of the drawings
Fig. 1 is a flowchart of the core place name extraction method based on natural language processing technology provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of the core place name extraction device based on natural language processing technology provided by an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and are not intended to limit it.
To illustrate the technical solutions of the invention, specific embodiments are described below.
Embodiment one:
Fig. 1 shows the flow of the core place name extraction method based on natural language processing technology provided by an embodiment of the present invention; for ease of description, only the parts related to the embodiment are shown.
This method extracts core place names at or above the county level from a specified piece of text data. A piece of text may mention multiple place names: some are closely related to the subject of the text, while others have little to do with it, and these weakly related place names interfere with the computer's understanding of the text. The purpose of this embodiment is to extract the core place names closely related to the subject of the text information.
The core place name extraction method based on natural language processing technology provided by the embodiment includes the following steps:
Step S1: build a one-to-one data table of the names of all provinces, cities, and counties in China and their administrative division codes, save the subordination relationships between place names, and make the names of the provinces, cities, and counties into a Chinese word segmentation dictionary.
In this step, the names and administrative division codes of provinces, cities, and counties follow the administrative divisions and codes currently published by the National Bureau of Statistics of China. The data table stores the subordination relationships between place names: for example, a province is the superior of a city and contains multiple cities, and a city is the superior of a county (or district) and has several counties (or districts) under it. This embodiment builds the Chinese word segmentation dictionary with the HanLP tool.
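A minimal sketch of the step S1 data table, under stated assumptions: the `Region` record, the sample rows, and the helper names are hypothetical; English names stand in for the Chinese names a real dictionary would hold; and the codes are illustrative values assumed to follow the six-digit GB/T 2260 layout (two digits each for province, city, and county). The sketch derives each place name's superior from its code and produces the word list that would be loaded into the segmenter as a custom dictionary. HanLP's own dictionary file format is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Region:
    name: str   # place name (a real table would store the Chinese name)
    code: str   # six-digit administrative division code (illustrative values)
    level: str  # "province", "city" or "county"

# Hypothetical excerpt of the national table.
REGIONS = [
    Region("Hubei Province", "420000", "province"),
    Region("Wuhan City", "420100", "city"),
    Region("Wuchang District", "420106", "county"),
    Region("Hongshan District", "420111", "county"),
]

def superior_code(region: Region) -> Optional[str]:
    """Code of the superior unit, assuming the 2+2+2 digit GB/T 2260 layout."""
    if region.level == "county":
        return region.code[:4] + "00"
    if region.level == "city":
        return region.code[:2] + "0000"
    return None  # provinces have no superior

def build_tables(regions: List[Region]) -> Tuple[Dict[str, str], List[str]]:
    """Return (place name -> superior place name, segmentation word list)."""
    by_code = {r.code: r for r in regions}
    superior: Dict[str, str] = {}
    for r in regions:
        parent = superior_code(r)
        if parent in by_code:  # None (province) is never a key
            superior[r.name] = by_code[parent].name
    word_list = sorted(r.name for r in regions)
    return superior, word_list

superior, word_list = build_tables(REGIONS)
```

Here `superior["Wuchang District"]` is `"Wuhan City"` and `superior["Wuhan City"]` is `"Hubei Province"`. Municipalities directly under the central government and county-level units administered directly by a province do not fit the simple 2+2+2 split and would likely need extra handling.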
Step S2: according to the Chinese word segmentation dictionary, perform Chinese word segmentation on a specified piece of text data with a natural language processing tool to obtain a place name set that records the sentence component of each place name.
In this step, the text is segmented with a natural language processing tool (this method uses HanLP). Natural language processing covers the theories and methods that enable effective communication between humans and computers in natural language; its main scope includes Chinese word segmentation, part-of-speech tagging, text classification, named entity recognition, dependency parsing, speech recognition, information retrieval, machine translation, and automatic summarization. The method of the invention uses the part-of-speech tagging, named entity recognition, and dependency parsing functions of the natural language processing tool. Let M denote the place name set obtained in this step. The title of the text data is processed by the same tool, which supports the scoring rules designed below.
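The shape of the place name set M can be sketched as follows. This is not HanLP's API: the `Token` record and the label mapping are hypothetical, and the dependency labels SBV, ADV, ATT, and VOB are assumed LTP-style relation names standing for subject, adverbial, attribute, and object. Real parser output would need its own mapping; for instance, a place name inside a prepositional phrase is usually attached to the preposition rather than labelled as an adverbial directly.

```python
from typing import List, NamedTuple, Set, Tuple

class Token(NamedTuple):
    word: str      # segmented word
    relation: str  # dependency relation of the word to its head (assumed labels)

# Hypothetical mapping from dependency labels to the four sentence components.
COMPONENT_OF = {"SBV": "subject", "ADV": "adverbial", "ATT": "attribute", "VOB": "object"}

def place_name_set(tokens: List[Token], gazetteer: Set[str]) -> List[Tuple[str, str]]:
    """Return M as (place name, component) pairs, one pair per occurrence."""
    return [
        (t.word, COMPONENT_OF.get(t.relation, "other"))
        for t in tokens
        if t.word in gazetteer  # gazetteer: the step S1 word list
    ]
```

Keeping one pair per occurrence lets the later steps count frequencies (step S3) and sum component weights (step S6) from the same list.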
Step S3: count the number of occurrences of each identical place name in the place name set and calculate the frequency score of each place name.
The frequency score in step S3 is calculated as follows:
where $S_{if}$ is the frequency score and $f_i$ is the number of occurrences of the i-th identical place name in the place name set.
After segmentation, a piece of text contains multiple different place names, and some of them occur more than once. Segmenting and counting with the natural language processing tool yields the distinct place names in set M and the number of occurrences of each. By the formula above, once a place name occurs more than 3 times, further occurrences make no obvious difference to its score: that many occurrences already show the place name is strongly related to the text, so the frequency score is bounded here.
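The frequency formula itself is not reproduced in this text. One function consistent with everything stated here is $S_{if} = 1 - e^{-f_i}$: it gives 0.63 for one occurrence and 0.86 for two, matching the worked example later in this description, and saturates after about three occurrences (0.95 for three, 0.98 for four). The sketch below uses this form as an assumption, not as the patent's confirmed formula; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable

def frequency_scores(place_names: Iterable[str]) -> Dict[str, float]:
    """S_if per distinct place name, using the assumed form 1 - exp(-f_i)."""
    counts = Counter(place_names)  # f_i for each distinct place name
    return {name: 1.0 - math.exp(-f) for name, f in counts.items()}

scores = frequency_scores(["Harbin", "Harbin", "Beijing"])
print({name: round(s, 2) for name, s in scores.items()})  # {'Harbin': 0.86, 'Beijing': 0.63}
```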
Step S4: determine whether a place name appears in the title of the text data and calculate its place name importance score.
The place name importance score in step S4 is calculated as follows:
$S_{it} = S_t$, where $S_t = 1.0$ if the place name appears in the title of the text data, and $S_t = 0$ otherwise.
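Step S4 reduces to a membership test. In the minimal sketch below (function name hypothetical), the title is assumed to have been segmented with the same dictionary as the body, so the check runs against title tokens rather than raw substrings; this avoids counting a short name that merely occurs inside a longer word.

```python
from typing import Iterable

def importance_score(place_name: str, title_tokens: Iterable[str]) -> float:
    """S_it = S_t: 1.0 if the place name occurs in the segmented title, else 0.0."""
    return 1.0 if place_name in set(title_tokens) else 0.0
```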
Step S5: score the subordination relationship of each place name in the place name set according to the province, city, and county subordination relationships saved in step S1, calculating the membership score.
Each place name (province, city, or county) has a unique administrative division code, and the coding rule of these codes reflects superior/subordinate relationships, so the method can determine from the codes whether two place names stand in a subordination relationship.
The membership score in step S5 is calculated as follows:
$S_{ir} = S_r$, where $S_r = 0.5$ if the superior place name of a place name appears in the place name set (i.e. a subordination relationship is found), and $S_r = 0$ otherwise.
In this embodiment, if Wuhan City and Wuchang District both appear in place name set M, then Wuchang District obtains the membership score, since Wuhan City is its superior place name. If only Wuhan City appears in the set, its membership score is 0. If Hubei Province and Wuhan City both appear in the set, Wuhan City obtains the membership score.
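The membership rule can be sketched directly on the administrative codes. The sketch assumes the six-digit GB/T 2260 layout, in which a county's superior city code replaces the last two digits with 00 and a city's superior province code replaces the last four digits with 0000; the codes in the usage are illustrative and the function names are hypothetical. Only the immediate superior is checked, which is one reading of the rule and matches the Hubei Province / Wuhan City / Wuchang District example above.

```python
from typing import Dict, Optional

def superior_code(code: str) -> Optional[str]:
    """Immediate superior code under the assumed 2+2+2 digit layout."""
    if code.endswith("0000"):
        return None                # province: no superior
    if code.endswith("00"):
        return code[:2] + "0000"   # prefecture-level city -> province
    return code[:4] + "00"         # county or district -> city

def membership_scores(codes_by_name: Dict[str, str]) -> Dict[str, float]:
    """S_ir = 0.5 if the place name's superior is also in the set M, else 0."""
    present = set(codes_by_name.values())
    return {
        name: 0.5 if superior_code(code) in present else 0.0
        for name, code in codes_by_name.items()
    }

print(membership_scores({"Wuhan City": "420100", "Wuchang District": "420106"}))
# {'Wuhan City': 0.0, 'Wuchang District': 0.5}
```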
Step S6: obtain the place name component score according to the sentence component of each place name in the place name set.
In step S2, Chinese word segmentation is performed on the text data with the natural language processing tool, and the resulting place name set records the sentence component of each place name in its sentence. A place name plays different roles in different sentences, and its importance differs accordingly. In step S6, the sentence component of a place name in a sentence of the text data is subject, adverbial, attribute, or object, and the place name component score is calculated as follows:
$S_{ic} = S_{cz} + S_{ch} + S_{cd} + S_{cb}$, where $S_{cz}$, $S_{ch}$, $S_{cd}$, and $S_{cb}$ are the scores for the place name acting as subject, adverbial, attribute, and object respectively; the component score is counted once for every occurrence of the place name.
In this embodiment, the component score constants for a place name appearing in the text data as subject, adverbial, attribute, and object are 0.5, 0.3, 0.3, and 0.1 respectively. For example, if a place name occurs twice in the text, once as subject and once as object, its place name component score is 0.5 + 0.1 = 0.6.
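With the constants of this embodiment, the component score is a weighted count over occurrences. In the sketch below (function name hypothetical), a role outside the four listed components is assumed to contribute 0, which the text does not specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Component score constants from this embodiment.
COMPONENT_WEIGHTS = {"subject": 0.5, "adverbial": 0.3, "attribute": 0.3, "object": 0.1}

def component_scores(occurrences: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """S_ic: add the weight of the component for every occurrence of a place name."""
    scores: Dict[str, float] = defaultdict(float)
    for name, component in occurrences:
        scores[name] += COMPONENT_WEIGHTS.get(component, 0.0)  # other roles: assumed 0
    return dict(scores)
```

For the example above, `component_scores([("X", "subject"), ("X", "object")])` gives `{"X": 0.6}` up to floating-point rounding.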
Step S7: add the four scores of steps S3 to S6 to obtain the topic relevance score of each place name, aggregate the place names by city, calculate for each city the sum of the topic relevance scores of the city and all districts and counties under its jurisdiction, and rank the sums from high to low.
In this step, the topic relevance score of a place name is $S_i = S_{if} + S_{it} + S_{ir} + S_{ic}$, where $S_i$ expresses the degree of relevance between the place name and the subject of the text data; the higher the value, the higher the relevance.
Place names are then aggregated by city: for each city, the scores of the city and all districts and counties under its jurisdiction are summed, and the sums are ranked from high to low. For example, if set M contains Wuhan City, Hongshan District, and Wuchang District, aggregating by city means counting Wuhan City together with its subordinate Hongshan District and Wuchang District and computing their total topic relevance score.
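The step S7 sum and city-level aggregation can be sketched as below. The `city_of` mapping (each district or county to its city, each city to itself) is assumed to come from the step S1 table; place names above city level, such as provinces, have no entry and are left out of the ranking, which is an assumption because the text does not say how they are treated. Function names and sample scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def topic_score(s_if: float, s_it: float, s_ir: float, s_ic: float) -> float:
    """S_i = S_if + S_it + S_ir + S_ic."""
    return s_if + s_it + s_ir + s_ic

def rank_cities(scores: Dict[str, float], city_of: Dict[str, str]) -> List[Tuple[str, float]]:
    """Sum S_i over each city and everything under it, highest sum first."""
    sums: Dict[str, float] = defaultdict(float)
    for name, s in scores.items():
        city = city_of.get(name)
        if city is not None:  # place names without a city (e.g. provinces) are skipped
            sums[city] += s
    return sorted(sums.items(), key=lambda kv: kv[1], reverse=True)

ranked = rank_cities(
    {"Wuhan City": 1.2, "Hongshan District": 0.4, "Wuchang District": 0.9, "Chengdu": 0.93},
    {"Wuhan City": "Wuhan City", "Hongshan District": "Wuhan City",
     "Wuchang District": "Wuhan City", "Chengdu": "Chengdu"},
)
```

Here Wuhan City ranks first with a sum of about 2.5, followed by Chengdu with 0.93.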
Step S8: determine whether the highest topic relevance score sum reaches the minimum score value for a core place name; if not, the text data has no core place name; if so, select the city place name with the highest topic relevance score sum as the core place name.
After aggregation by city, the invention takes the topic relevance score sum of each city, finds the city place name with the maximum sum, and determines whether that maximum exceeds the minimum score value T. In this step, the minimum score for a core place name is set to T = 1.6: if the maximum sum is greater than T, that city place name is the core place name of the text data; otherwise the text data has no core place name.
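The step S8 decision with T = 1.6 is a single comparison on the top-ranked city. The sketch follows this embodiment in using a strict "greater than" test; the function name is hypothetical, and the input is assumed to be a list of (city, score sum) pairs already sorted from high to low.

```python
from typing import List, Optional, Tuple

T = 1.6  # minimum score for a core place name in this embodiment

def core_place_name(ranked: List[Tuple[str, float]], threshold: float = T) -> Optional[str]:
    """Return the top city if its score sum exceeds the threshold, else None."""
    if not ranked:
        return None  # no place names at all
    city, best = ranked[0]
    return city if best > threshold else None
```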
An example is given below.
An internet public opinion monitoring system needs to classify internet texts by region automatically, for example into Beijing, Shanghai, Wuhan, Guangzhou, Shenzhen, and so on, so that monitoring staff can promptly find public opinion information relevant to their region.
Suppose the following public opinion information is received:
With traditional place name extraction, place names such as "Harbin", "Beijing", "Shanghai", "Guangzhou", "Wuhan", "Xi'an", and "Chengdu" would be extracted, and the public opinion monitoring system would classify this information under all of these regions at once.
With the method of the invention, the place name set first contains Harbin, Songbei District, Beijing, Shanghai, Guangzhou, Wuhan, Xi'an, and Chengdu. Harbin occurs 2 times, so its frequency score is 0.86; each other place name occurs once, with a frequency score of 0.63. Harbin appears in the title, so its importance score is 1, and the importance score of every other place name is 0. Harbin and Songbei District stand in a subordination relationship, so the membership score of Songbei District is 0.5, and that of every other place name is 0. Harbin occurs twice, both times as an adverbial, and every other place name occurs once, also as an adverbial, so Harbin's component score is 0.6 and every other place name's is 0.3.
In total, Harbin's topic relevance score is 2.46, Songbei District's is 1.43, and each other place name's is 0.93. After aggregation by city, the topic relevance score sum is 3.89 for Harbin and 0.93 for each of Beijing, Shanghai, Guangzhou, Wuhan, Xi'an, and Chengdu. Since 3.89 is greater than 1.6, the public opinion monitoring system classifies the region of this information as "Harbin": Harbin is the core place name, the other place names in the text are ignored, and the precision of regional classification of public opinion improves.
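The arithmetic of this example can be checked with a short script. The per-place facts (occurrence count, presence in the title, presence of the superior place name, and role of each occurrence) are taken from the description above rather than computed from raw text; the frequency score uses the assumed form $1 - e^{-f_i}$, rounded to two decimals as the example does. All names in the script are hypothetical.

```python
import math

WEIGHTS = {"subject": 0.5, "adverbial": 0.3, "attribute": 0.3, "object": 0.1}
T = 1.6

def place_score(count, in_title, has_superior, components):
    """S_i for one place name; the frequency formula is an assumption."""
    s_f = round(1.0 - math.exp(-count), 2)   # 0.63 for one occurrence, 0.86 for two
    s_t = 1.0 if in_title else 0.0
    s_r = 0.5 if has_superior else 0.0
    s_c = sum(WEIGHTS[c] for c in components)
    return s_f + s_t + s_r + s_c

# (occurrences, in title, superior present, component of each occurrence)
FACTS = {
    "Harbin": (2, True, False, ["adverbial", "adverbial"]),
    "Songbei District": (1, False, True, ["adverbial"]),
}
OTHERS = ["Beijing", "Shanghai", "Guangzhou", "Wuhan", "Xi'an", "Chengdu"]
for name in OTHERS:
    FACTS[name] = (1, False, False, ["adverbial"])

CITY_OF = {name: name for name in FACTS}
CITY_OF["Songbei District"] = "Harbin"  # Songbei District is a district of Harbin

scores = {name: place_score(*facts) for name, facts in FACTS.items()}
city_sums = {}
for name, s in scores.items():
    city_sums[CITY_OF[name]] = city_sums.get(CITY_OF[name], 0.0) + s
best_city, best = max(city_sums.items(), key=lambda kv: kv[1])
core = best_city if best > T else None
print(f"core place name: {core} ({best:.2f})")  # core place name: Harbin (3.89)
```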
In summary, whereas the prior art merely extracts place names from text data, the invention links the place names extracted from text information by their subordination relationships and assigns each a degree of relevance to the text subject, and the extracted place names help computers understand the text information. The core place name extraction algorithm of the invention has simple steps, is easy to understand and implement, and can be applied quickly in production projects.
Embodiment two:
Fig. 2 shows the structure of the core place name extraction device based on natural language processing technology provided by an embodiment of the present invention, which carries out the core place name extraction method based on natural language processing technology; for ease of description, only the parts related to the embodiment are shown.
The core place name extraction device based on natural language processing technology includes the following units:
Chinese word segmentation dictionary production unit: for building a one-to-one data table of the names of all provinces, cities, and counties in China and their administrative division codes, saving the subordination relationships between place names, and making the names of the provinces, cities, and counties into a Chinese word segmentation dictionary;
Place name set acquisition unit: for performing Chinese word segmentation on a specified piece of text data with a natural language processing tool according to the Chinese word segmentation dictionary, to obtain a place name set that records sentence components;
Frequency score calculation unit: for counting the number of occurrences of each identical place name in the place name set and calculating the frequency score of each place name;
Importance score calculation unit: for determining whether a place name appears in the title of the text data and calculating its place name importance score;
Relationship score calculation unit: for scoring the subordination relationship of each place name in the place name set according to the province, city, and county subordination relationships saved in step S1, calculating the membership score;
Component score calculation unit: for obtaining the place name component score according to the sentence component of each place name in the place name set;
Total score ranking unit: for adding the frequency score, importance score, relationship score, and component score to obtain the topic relevance score of each place name, aggregating place names by city, calculating for each city the sum of the topic relevance scores of the city and all districts and counties under its jurisdiction, and ranking the sums from high to low;
Core place name judgment unit: for determining whether the highest topic relevance score sum reaches the minimum score value for a core place name; if not, the text data has no core place name; if so, the city place name with the highest topic relevance score sum is selected as the core place name.
Each functional unit provided in this embodiment implements the corresponding one of steps S1 to S8 of Embodiment one; the implementation process is not repeated here.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (6)

1. a kind of core place name extracting method based on natural language processing technique, which is characterized in that the method includes as follows Step:
Step S1: national province, city, the title in county and its administrative division code are made into one-to-one tables of data, and saved It is subordinate to a grade relationship between place name, the title in national province, city, county is made as Chinese word segmentation dictionary;
Step S2: according to the Chinese word segmentation dictionary, using natural language processing tool in one section of specified text data progress Text participle, obtain include sentence element place name set;
Step S3: counting the number that all identical place names occur in place name set, calculates the frequency score that each place name occurs;
Step S4: judging whether place name appears in the title of text data, and the place name importance score is calculated;
Step S5: each place name in place name set is carried out being subordinate to grade according to the national province, city, county's membership that save in step S1 Relationship scoring, calculates membership score;
Step S6: according to ingredient of the place name each in place name set in sentence, place name component score is obtained;
Step S7: four scores of step S3-S6 being added, the topic correlativity score of place name is obtained, and as unit of city into The polymerization of row place name calculates city and city and has all district place name topic correlativity score summations under its command, and by score from high to low into Row sequence;
Step S8: judging whether topic correlativity score summation peak reaches the minimum score value of core place name, if not up to, Then coreless place name in this article notebook data, if reached, choosing the highest urban place name of topic correlativity score summation is core Heart name.
2. the core place name extracting method based on natural language processing technique as described in claim 1, which is characterized in that step S3 Described in frequency score calculation formula it is as follows:
Wherein SifFor frequency score, fiThe number occurred for i-th of place name in place name set.
3. the core place name extracting method based on natural language processing technique as described in claim 1, which is characterized in that step S4 Described in place name importance score calculation formula it is as follows:
Sit=St, wherein if place name appears in the title of text data, i.e. St=1, on the contrary St=0.
4. the core place name extracting method based on natural language processing technique as described in claim 1, which is characterized in that step S5 Middle membership score calculation formula is as follows:
Sir=Sr, wherein judging to obtain membership score S if higher level's place name of a place name appears in place name setr =0.5, on the contrary Sr=0.
5. the core place name extracting method based on natural language processing technique as described in claim 1, which is characterized in that step S6 In, sentence element of the place name in text data sentence is subject, the adverbial modifier, attribute or object, place name component score mode are as follows:
Sic=Scz+Sch+Scd+Scb, wherein Scz、Sch、Scd、ScbPlace name is respectively represented as subject, the adverbial modifier, attribute and object Score, the every appearance of place name is primary to calculate place name component score.
6. A core place name extraction device based on natural language processing technology, characterized in that the device comprises the following units:
Chinese word segmentation dictionary production unit: for making the names of the provinces, cities and counties of the whole country and their administrative division codes into a one-to-one data table, saving the subordination relationships between place names, and making the names of the provinces, cities and counties of the whole country into a Chinese word segmentation dictionary;
Place name set acquisition unit: for performing Chinese word segmentation on a specified piece of text data with a natural language processing tool according to the Chinese word segmentation dictionary, to obtain a place name set that includes the sentence constituent of each place name;
Frequency score calculation unit: for counting the number of times each identical place name occurs in the place name set and calculating the frequency score of each place name;
Importance score calculation unit: for judging whether a place name appears in the title of the text data and calculating the importance score of the place name;
Membership score calculation unit: for scoring the subordination relationship of each place name in the place name set according to the province, city and county subordination relationships saved in step S1, and calculating the membership score;
Component score calculation unit: for obtaining the place name component score according to the sentence constituent of each place name in the place name set;
Total score sorting unit: for adding the four scores (frequency score, importance score, membership score and component score) to obtain the topic relevance score of each place name, aggregating place names by city, calculating for each city the sum of the topic relevance scores of the city and all districts and counties under its jurisdiction, and sorting the sums from high to low;
Core place name judging unit: for judging whether the highest summed topic relevance score reaches the minimum score for a core place name; if it does not, the text data has no core place name; if it does, selecting the city-level place name with the highest summed topic relevance score as the core place name.
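The total score sorting unit can be sketched as below. It assumes the four per-place-name scores have already been computed, and that a mapping `city_of` (a hypothetical name) assigns each city, district or county name to the city it belongs to, with a city mapping to itself; place names without a city entry, such as province names, are left out of the aggregation. The ranked output is what the core place name judging unit compares against its threshold.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple

Scores = Tuple[float, float, float, float]  # frequency, importance, membership, component


def rank_cities(scores: Mapping[str, Scores],
                city_of: Mapping[str, str]) -> List[Tuple[str, float]]:
    """Sum the four scores into a topic relevance score per place name,
    aggregate those scores by city and return (city, total) pairs,
    highest total first."""
    totals: Dict[str, float] = defaultdict(float)
    for place, parts in scores.items():
        relevance = sum(parts)  # topic relevance = sum of the four scores
        city = city_of.get(place)
        if city is not None:  # e.g. province names are not aggregated here
            totals[city] += relevance
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


scores = {
    "Wuhan": (2, 1, 0, 1.0),
    "Hongshan District": (1, 0, 0.5, 0.5),
    "Yichang": (1, 0, 0, 0.75),
    "Hubei": (1, 0, 0, 0.5),  # province: no city entry, so ignored
}
city_of = {"Wuhan": "Wuhan", "Hongshan District": "Wuhan", "Yichang": "Yichang"}
print(rank_cities(scores, city_of))  # [('Wuhan', 6.0), ('Yichang', 1.75)]
```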
CN201811202492.7A 2018-10-16 2018-10-16 Core place name extraction method and device based on natural language processing technology Active CN109408819B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811202492.7A CN109408819B (en) 2018-10-16 2018-10-16 Core place name extraction method and device based on natural language processing technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811202492.7A CN109408819B (en) 2018-10-16 2018-10-16 Core place name extraction method and device based on natural language processing technology

Publications (2)

Publication Number Publication Date
CN109408819A 2019-03-01
CN109408819B 2023-05-16

Family

ID=65467264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811202492.7A Active CN109408819B (en) 2018-10-16 2018-10-16 Core place name extraction method and device based on natural language processing technology

Country Status (1)

Country Link
CN (1) CN109408819B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101196904A (en) * 2007-11-09 2008-06-11 清华大学 News keyword extraction method based on word frequency and n-grams
CN101661461A (en) * 2008-08-29 2010-03-03 阿里巴巴集团控股有限公司 Method and system for determining core geographic information in document
CN101876975A (en) * 2009-11-04 2010-11-03 中国科学院声学研究所 Chinese place name identification method
US20110093257A1 (en) * 2009-10-19 2011-04-21 Avraham Shpigel Information retrieval through indentification of prominent notions
JP2013257634A (en) * 2012-06-11 2013-12-26 Nippon Telegr & Teleph Corp <Ntt> Apparatus and method for extracting a pair of place name and word from document, and program
CN103577442A (en) * 2012-07-30 2014-02-12 腾讯科技(深圳)有限公司 Method and device for calculating map data importance
CN103853738A (en) * 2012-11-29 2014-06-11 中国科学院计算机网络信息中心 Method for identifying the region associated with webpage information
CN105824959A (en) * 2016-03-31 2016-08-03 首都信息发展股份有限公司 Public opinion monitoring method and system
CN107562717A (en) * 2017-07-24 2018-01-09 南京邮电大学 Text keyword extraction method combining Word2Vec and term co-occurrence
CN108268443A (en) * 2017-12-21 2018-07-10 北京百度网讯科技有限公司 Method and apparatus for determining topic point transfer and obtaining reply text

Non-Patent Citations (2)

Title
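Pan Zhenggao (潘正高): "Research on Chinese named entity recognition combining rules and statistics", Information Science (《情报科学》) *
Wei Yong (魏勇) et al.: "A Chinese place name recognition method based on composite features", Geomatics and Information Science of Wuhan University (《武汉大学学报(信息科学版)》) *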

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN111144121A (en) * 2019-12-27 2020-05-12 北大方正集团有限公司 Geographical name recognition method and device, electronic equipment and readable storage medium
CN111144121B (en) * 2019-12-27 2021-12-03 北大方正集团有限公司 Geographical name recognition method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN109408819B (en) 2023-05-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 430223 Wuda Science and Technology Park, Jiangxia Avenue, Miaoshan community, Donghu Development Zone, Wuhan City, Hubei Province

Applicant after: Geospace Information Technology Co., Ltd.

Address before: 430223 Wuda Science and Technology Park, Jiangxia Avenue, Miaoshan community, Donghu Development Zone, Wuhan City, Hubei Province

Applicant before: WUDA GEOINFORMATICS Co.,Ltd.

GR01 Patent grant