CN104102667A - POI (Point of Interest) information differentiation method and device - Google Patents


Info

Publication number: CN104102667A
Authority: CN (China)
Prior art keywords: poi, similarity, poi information, address, existing
Legal status: Pending (as listed by Google; an assumption, not a legal conclusion)
Application number: CN201310125396.8A
Other languages: Chinese (zh)
Inventor: 罗丽俊 (Luo Lijun)
Current assignee: Navinfo Co Ltd (Google notes that listed assignees may be inaccurate)
Original assignee: Navinfo Co Ltd
Events:
Application filed by Navinfo Co Ltd
Priority to CN201310125396.8A
Publication of CN104102667A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a POI (Point of Interest) information differentiation method and device. The method comprises the following steps: disassembling the POI information to be differentiated into a plurality of first feature words; combining the first feature words and querying a search engine to obtain a POI set; calculating a first similarity between each piece of POI information in the POI set and the POI information to be differentiated; and selecting, according to the first similarity, one or more pieces of POI information as the differentiation result for the POI information to be differentiated. The method and device disassemble the POI information to be differentiated into feature words, combine those words to search for a related POI set, calculate the similarity between each POI in the set and the POI information to be differentiated, and output the differentiation result according to that similarity. Because the disassembled feature words can be combined into more query conditions, more candidate results are found and the matching rate of the system increases.

Description

POI information differentiation method and device
Technical field
The present invention relates to the field of POI differentiation, and in particular to a POI information differentiation method and device.
Background art
At present, when an operator differentiates a third-party POI (Point of Interest) library, the main approach is to extract the subject words of the POI name and POI address and to format the phone number. The subject words of the name and address, together with the phone number, category, and coordinates, are then used to search the original library for related information, and POIs with high similarity in the query results are taken as matches. The similarity relies mainly on the similarity of the name subject and of the address subject, computed mainly with methods such as edit distance and the Jaccard similarity coefficient. With this existing method, each operator can differentiate only 100 to 200 POIs per day; as third-party POI information increases sharply, the traditional differentiation method seriously slows the production of geographic information data.
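The background names the Jaccard similarity coefficient as one of the prior-art measures for comparing name and address subjects. A minimal sketch follows, assuming the coefficient is taken over sets of character bigrams; the tokenization and the helper names `char_bigrams` and `jaccard` are illustrative assumptions, not part of the source.

```python
def char_bigrams(text: str) -> set[str]:
    """Return the set of overlapping two-character substrings of text."""
    # A one-character string has no bigrams; fall back to the character itself.
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard coefficient |A intersect B| / |A union B| over character bigrams."""
    sa, sb = char_bigrams(a), char_bigrams(b)
    union = sa | sb
    if not union:
        return 1.0  # two empty strings are treated as identical
    return len(sa & sb) / len(union)


print(jaccard("muyang hotel", "muyang hotels"))
```

Bigrams rather than single characters keep some word-order information; for Chinese names, single characters or word segments would be equally plausible choices.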
In the method for existing POI difference, be all generally by artificial setting up third party POI storehouse and original storehouse classification contrast relationship or artificially classification mark carried out in third party POI storehouse, solve the error that third party POI storehouse and original storehouse classification disunity bring, but this mode is very general, there is certain error, be unfavorable for dwindling matching range; POI coordinate is mainly from third party POI storehouse simultaneously, and still the coordinate of third party library has certain deviation conventionally, and does not comprise coordinate in most of third party POI storehouse, is unfavorable for equally dwindling matching range; In the computing method of similarity, the main main body similarity relying on after address and title fractionation, this method is inaccurate for the similarity of calculated address, because address is a minute geographical rank, in same district not, there will be main body duplication of name phenomenon, and the address rank weight after the inborn ability of different addresses should change; The scope of simultaneously only going to dwindle coupling by title main body, address main body, classification and coordinate can be shone into the omission of part matched data.
In short, existing POI differentiation systems have a low matching rate and are time-consuming, which makes subsequent operations more difficult.
Summary of the invention
The object of the invention is to provide a POI information differentiation method and device that improve the POI differentiation matching rate and reduce the time consumed.
To solve the above technical problems, the invention provides a POI information differentiation method comprising the following steps:
Disassembling the POI information to be differentiated into a plurality of first feature words;
Combining the plurality of first feature words and querying a search engine to obtain a POI set;
Calculating a first similarity between each piece of POI information in the POI set and the POI information to be differentiated;
Selecting, according to the first similarity, one or more pieces of POI information as the differentiation result for the POI information to be differentiated.
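The four steps above can be sketched end to end as follows. This is a toy skeleton under stated assumptions: the in-memory `search` function stands in for a real search engine, each non-empty field is treated as a single feature word, and `placeholder_similarity` (share of exactly equal fields) is a hypothetical stand-in for the weighted first similarity the patent describes.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class POI:
    name: str
    address: str = ""
    phone: str = ""


FIELDS = ("name", "address", "phone")


def disassemble(poi: POI) -> dict[str, str]:
    """Step 1 (simplified): treat each non-empty field as one first feature word."""
    return {f: getattr(poi, f) for f in FIELDS if getattr(poi, f)}


def search(library: list[POI], query: dict[str, str]) -> list[POI]:
    """Step 2 stand-in for a search engine: records containing every query value."""
    return [p for p in library if all(v in getattr(p, k) for k, v in query.items())]


def placeholder_similarity(a: POI, b: POI) -> float:
    """Step 3 placeholder: share of fields that are exactly equal."""
    return sum(getattr(a, f) == getattr(b, f) for f in FIELDS) / len(FIELDS)


def differentiate(target: POI, library: list[POI], top_k: int = 1) -> list[POI]:
    words = disassemble(target)
    poi_set: list[POI] = []
    # Step 2: query with feature-word pairs first, then with single words.
    for r in (2, 1):
        for keys in combinations(sorted(words), r):
            for hit in search(library, {k: words[k] for k in keys}):
                if hit not in poi_set:
                    poi_set.append(hit)
    # Steps 3-4: rank the POI set by similarity and keep the best top_k.
    poi_set.sort(key=lambda p: placeholder_similarity(target, p), reverse=True)
    return poi_set[:top_k]


library = [POI("Mu Yang Hotel", "No. 978 Triumph South Street", "0951-1"),
           POI("Bo Li Hotel", "Xingqing District", "0951-2")]
print(differentiate(POI("Mu Yang Hotel", "No. 978 Triumph South Street"), library))
```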
Preferably, calculating the first similarity between each piece of POI information in the POI set and the POI information to be differentiated further comprises:
Assigning a weight to each second feature word in the POI information;
Calculating a second similarity between each second feature word and the existing POI query library;
Summing the products of the weight assigned to each second feature word in the POI information and its corresponding second similarity to obtain an operation result;
Taking this operation result as the first similarity between the POI information and the POI information to be differentiated.
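The weighted summation in these steps can be sketched as below, assuming the weights sum to 1 (the text states this constraint for the four-field case of name, address, phone, and category). The field keys and example weights are hypothetical; fields without a computed second similarity contribute 0.

```python
def first_similarity(weights: dict[str, float], second_sims: dict[str, float]) -> float:
    """Sum of weight * second similarity over the second feature words.

    Fields missing from second_sims contribute 0. Weights must sum to 1.
    """
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total}")
    return sum(w * second_sims.get(field, 0.0) for field, w in weights.items())


weights = {"name": 0.4, "address": 0.3, "phone": 0.2, "kind": 0.1}  # assumed values
sims = {"name": 0.9, "address": 0.8, "phone": 1.0, "kind": 0.0}
print(first_similarity(weights, sims))
```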
Preferably, the second feature words of the POI information are one or more of name, address, phone, and category;
When the second feature word is a name, its second similarity with the existing POI query library is the result of matching the name against the existing POI query library;
When the second feature word is an address, its second similarity with the existing POI query library is obtained as follows: the address is divided by level into a plurality of sub-addresses, a weight is assigned to each sub-address, each sub-address is matched against the existing POI query library to obtain a sub-similarity, and the products of each sub-address weight and its corresponding sub-similarity are summed;
When the second feature word is a phone number, its second similarity is the result of matching the phone number against the existing POI query library;
When the second feature word is a category, its second similarity is the result of matching the category against the existing POI query library.
Preferably, the second similarity score_addr of the address is calculated with the following formula:
$$\text{score}_{addr} = \sum_{k=1}^{n} \alpha_k \cdot \text{level}_k$$
where n is the total number of levels into which the address is divided, level_k is the sub-similarity of the sub-address matched at level k, and α_k is the weight of the sub-address at the corresponding level, with $\sum_{k=1}^{n} \alpha_k = 1$.
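The formula above translates directly into code. A minimal sketch, assuming the per-level sub-similarities have already been computed and are passed in level order together with their weights; the function name `address_score` is hypothetical.

```python
def address_score(level_sims: list[float], level_weights: list[float]) -> float:
    """score_addr = sum_k alpha_k * level_k, requiring sum_k alpha_k = 1."""
    if len(level_sims) != len(level_weights):
        raise ValueError("need exactly one weight per address level")
    if abs(sum(level_weights) - 1.0) > 1e-9:
        raise ValueError("level weights must sum to 1")
    return sum(a * s for a, s in zip(level_weights, level_sims))


# Road, house number, landmark: road and landmark match, house number does not.
print(address_score([1.0, 0.0, 1.0], [0.5, 0.2, 0.3]))
```

The weights 0.5, 0.2, and 0.3 are the road, house number, and landmark weights given in the embodiment's example.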
Preferably, when the second feature word is an address and both the address and the record in the existing POI query library have coordinates, the distance between them is also calculated and a third similarity is obtained from that distance; the third similarity is compared with the similarity calculated from the address's sub-addresses, and one of the two is selected as the second similarity between the address and the existing POI query library.
Preferably, the third similarity is calculated with the following formula:
$$\text{score}_{addr\_2} = \text{dist} / \text{dist\_kind}$$
where dist is the distance between the address and the queried record in the existing POI query library, and dist_kind is the maximum distance predetermined for each category.
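The distance-based third similarity and the choice between it and the level-based score can be sketched as follows. Several parts are assumptions: the source only says the address is geocoded and a distance computed, so the haversine great-circle distance on a spherical Earth is a swapped-in choice; and because the formula as written (dist / dist_kind) grows with distance, this sketch uses the complement 1 - dist / dist_kind, clamped at 0, so that closer points score higher. Taking the maximum follows the embodiment, which picks the larger of score_addr and score_addr_2.

```python
import math
from typing import Optional

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points (spherical model)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def distance_score(dist: float, dist_kind: float) -> float:
    # ASSUMPTION: the text writes dist / dist_kind; this sketch uses the
    # complement, clamped at 0, so that closer points give a higher score.
    if dist_kind <= 0:
        raise ValueError("dist_kind must be positive")
    return max(0.0, 1.0 - dist / dist_kind)


def address_second_similarity(level_score: float, dist: Optional[float],
                              dist_kind: float) -> float:
    """Larger of the level-based and distance-based scores when both exist."""
    if dist is None:  # one side has no coordinates: use the level score only
        return level_score
    return max(level_score, distance_score(dist, dist_kind))


d = haversine_m(38.470, 106.270, 38.471, 106.270)  # hypothetical coordinates
print(address_second_similarity(0.6, d, dist_kind=500.0))
```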
Preferably, when the second feature words of the POI information are the combination of name, address, phone, and category, the first similarity score between this POI information and the POI information to be differentiated is:
$$\text{score} = \alpha \cdot \text{score}_{name} + \beta \cdot \text{score}_{addr} + \chi \cdot \text{score}_{phone} + \delta \cdot \text{score}_{kind}$$
where α, β, χ, and δ are assigned weights with α + β + χ + δ = 1; score_name is the name second similarity, score_addr the address second similarity, score_phone the phone second similarity, and score_kind the category second similarity.
The invention also provides a POI information differentiation device, comprising:
A feature word disassembling module, for disassembling the obtained POI information to be differentiated into a plurality of first feature words;
A POI set acquisition module, for combining the plurality of first feature words and querying a search engine to obtain the POI set corresponding to each feature word combination;
A similarity determination module, for calculating the first similarity between each piece of POI information in the POI set and the POI information to be differentiated;
An output module, for selecting, according to the first similarity, one or more pieces of POI information as the differentiation result for the POI information to be differentiated.
Preferably, the similarity determination module further comprises:
A weight allocation submodule, for assigning a weight to each second feature word in the POI information;
A similarity calculation submodule, for calculating the second similarity between each second feature word and the existing POI query library;
A summation submodule, for summing the products of the weight assigned to each second feature word in the POI information and its corresponding second similarity;
An operation result output submodule, for outputting this operation result as the first similarity.
Preferably, in the similarity calculation submodule:
When the second feature word is a name, its second similarity is the result of matching the name against the existing POI query library;
When the second feature word is an address, its second similarity is obtained by dividing the address by level into a plurality of sub-addresses, assigning a weight to each sub-address, matching each sub-address against the existing POI query library to obtain a sub-similarity, and summing the products of each sub-address weight and its corresponding sub-similarity;
When the second feature word is a phone number, its second similarity is the result of matching the phone number against the existing POI query library;
When the second feature word is a category, its second similarity is the result of matching the category against the existing POI query library.
The above technical solution has the following beneficial effects: the invention disassembles the obtained POI information into a plurality of feature words, combines these feature words to query a search engine for a POI set, and outputs the differentiation result according to the similarity between each piece of POI information in the POI set and the POI information to be differentiated. This differentiation method calculates POI similarity more accurately; at the same time, the split feature words can be combined into more query conditions, so more candidate results are found and the matching rate of the system improves.
Brief description of the drawings
Fig. 1 is a flowchart of the POI information differentiation method according to an embodiment of the invention;
Fig. 2 is a flowchart of name differentiation according to an embodiment of the invention;
Fig. 3 is a flowchart of determining the similarity between a feature word set and the POI information to be differentiated according to an embodiment of the invention;
Fig. 4 is an overall flowchart of POI information differentiation according to an embodiment of the invention;
Fig. 5 is a structural block diagram of the POI information differentiation device according to an embodiment of the invention;
Fig. 6 is a structural block diagram of the similarity determination module according to an embodiment of the invention.
Embodiment
To make the technical problems to be solved, the technical solutions, and the advantages of the invention clearer, they are described in detail below with reference to the accompanying drawings and specific embodiments.
As shown in Fig. 1, the flow of the POI information differentiation method according to an embodiment of the invention comprises:
Step S101: disassembling the POI information to be differentiated into a plurality of first feature words;
Step S102: combining the plurality of first feature words and querying a search engine to obtain a POI set;
Step S103: calculating the first similarity between each piece of POI information in the POI set and the POI information to be differentiated;
Step S104: selecting, according to the first similarity, one or more pieces of POI information as the differentiation result for the POI information to be differentiated.
The invention disassembles the obtained POI information into a plurality of feature words, combines these feature words to obtain a POI set from a search engine, and outputs the differentiation result according to the similarity between the POI information in the set and the POI information to be differentiated. This method calculates POI similarity more accurately, and the split feature words can be combined into more query conditions, so more candidate results are found and the matching rate of the system improves. After the selected feature word set is output, it is verified for accuracy against the existing POI query library to generate accurate POI data, and the electronic map database is updated according to that accurate POI data.
In step S101, disassembling the obtained POI information to be differentiated into a plurality of first feature words may comprise name differentiation and address differentiation. Fig. 2 shows the flow of name differentiation according to an embodiment of the invention: obtaining additional information, obtaining the alias, removing the prefix word, removing the suffix word, removing noise words, and obtaining the core word. A name can thus be split into additional information, alias, prefix word, suffix word, and core word. For example, after "Beijing Mu Yang Fashion Chain Hotel, International Trade Branch (formerly Beijing Bo Li Business Hotel)" is split, the prefix word is "Beijing", the suffix word is "chain hotel", the core word is "Mu Yang Fashion", the additional information is "International Trade Branch", and the alias is "Bo Li Business Hotel". Name splitting principle: prefix and suffix words are split mainly with corresponding dictionaries, while the alias, core word, and additional information are split mainly with corresponding rules.
An address can be differentiated into address levels such as province, city, district, town, road, community, landmark, and house number. For example, "Yinchuan City, Xingqing District, No. 978 Triumph South Street, Good Wood Garden Apartment, Floors 8-9" splits into: city level: Yinchuan City; district level: Xingqing District; road level: Triumph South Street; house number level: No. 978; landmark level: Good Wood Garden Apartment. Splitting principle: the province, city, district, town, road, community, and landmark levels use corresponding dictionaries; levels such as the house number use rules; and words not present in the dictionaries are assigned to a level by rules. Address matching: if the input POI has no coordinates, coordinates are obtained through an address matching service.
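The name-splitting example can be reproduced with a small dictionary-plus-rule sketch. Everything here is hypothetical: the toy `PREFIX_DICT` and `SUFFIX_DICT` lexicons, the "(formerly ...)" alias rule, and the convention that text after the suffix word is additional information are assumptions chosen to match the example, not the patent's actual dictionaries or rules. Noise-word removal is omitted.

```python
import re

# Hypothetical toy dictionaries; a real system would load large lexicons.
PREFIX_DICT = ("Beijing", "Shanghai", "Yinchuan")
SUFFIX_DICT = ("Chain Hotel", "Business Hotel", "Restaurant")
# Hypothetical rule: a bracketed "(formerly ...)" holds the alias.
ALIAS_RULE = re.compile(r"\(formerly\s+([^)]*)\)")


def strip_prefix(text: str) -> tuple[str, str]:
    """Split off a dictionary prefix word; return (prefix, remainder)."""
    for p in PREFIX_DICT:
        if text.startswith(p + " "):
            return p, text[len(p):].strip()
    return "", text


def split_name(name: str) -> dict[str, str]:
    parts = {"prefix": "", "core": "", "suffix": "", "info": "", "alias": ""}
    m = ALIAS_RULE.search(name)
    if m:
        # Rule: the alias is the former name without its prefix word.
        parts["alias"] = strip_prefix(m.group(1).strip())[1]
        name = (name[:m.start()] + name[m.end():]).strip()
    parts["prefix"], rest = strip_prefix(name)
    # Dictionary: the earliest (then longest) suffix word ends the core word;
    # anything after it is treated as additional information.
    hits = [(rest.find(s), -len(s), s) for s in SUFFIX_DICT if s in rest]
    if hits:
        pos, _, suffix = min(hits)
        parts["core"] = rest[:pos].strip()
        parts["suffix"] = suffix
        parts["info"] = rest[pos + len(suffix):].strip()
    else:
        parts["core"] = rest
    return parts


print(split_name("Beijing Mu Yang Fashion Chain Hotel International Trade Branch "
                 "(formerly Beijing Bo Li Business Hotel)"))
```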
In addition to differentiating the POI name to obtain feature words, the category corresponding to the feature words is also obtained, as follows:
Pre-processing: the χ² statistic is used to weight each feature word in each category, and weights below a certain threshold are removed. The χ² statistic is computed as:
$$\chi^2(w, C) = \frac{N \times (AD - BC)^2}{(A + C) \times (B + D) \times (A + B) \times (C + D)}$$
where w is the feature item and C is the category. ~w denotes feature items other than w, and ~C denotes categories other than C, so the relation between feature item w and category C falls into four cases: (w, C), (w, ~C), (~w, C), and (~w, ~C). A, B, C, and D denote the numbers of POIs in these four cases respectively, and the total number of POIs is N = A + B + C + D.
After the POI name is differentiated to obtain the first feature words, the category weights corresponding to the first feature words are obtained, and one or more categories with the highest weights are output as the possible categories of this POI.
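The χ² weighting and category selection above can be sketched as follows. The counting layout (per-category counts of POIs whose name contains the word, plus category sizes) and the function names are assumptions. One further assumption is labeled in the code: only positive associations (AD > BC) are kept, because χ² alone is also large when a word is conspicuously absent from a category.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """chi^2(w, C) = N (AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)), N = A+B+C+D."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def category_weights(word_counts: dict[str, int], cat_sizes: dict[str, int],
                     total: int, threshold: float = 0.0) -> list[tuple[str, float]]:
    """Return (category, chi-square weight) pairs for one feature word, best first.

    word_counts[c]: POIs in category c whose name contains the word.
    cat_sizes[c]:   all POIs in category c.   total: all POIs (N).
    """
    with_word = sum(word_counts.values())
    weights = []
    for cat, size in cat_sizes.items():
        a = word_counts.get(cat, 0)   # (w, C)
        b = with_word - a             # (w, ~C)
        c = size - a                  # (~w, C)
        d = total - a - b - c         # (~w, ~C)
        # ASSUMPTION: keep only positive associations (AD > BC).
        if a * d > b * c:
            w = chi_square(a, b, c, d)
            if w > threshold:         # pre-processing: drop low weights
                weights.append((cat, w))
    return sorted(weights, key=lambda t: t[1], reverse=True)


# Toy counts for the word "Hotel": 30 of 40 hotels and 2 of 60 restaurants.
print(category_weights({"hotel": 30, "restaurant": 2},
                       {"hotel": 40, "restaurant": 60}, total=100))
```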
In step S102, the plurality of first feature words are combined, and the POI set corresponding to each feature word combination is obtained by querying a search engine. In this query process, a POI search engine service is used to retrieve the related POI set by fields such as name, address, type, and coordinates. Specifically, the split name, split address, phone, category, coordinates, and so on are combined in effective ways, for example address with phone, or name with address; each resulting combination of first feature words is queried through a search engine to obtain a POI set. The search engine may be Baidu, Google, etc., or the existing POI query library. The POI set obtained in this step contains multiple pieces of POI information.
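The field combinations described above (address with phone, name with address, and so on) can be enumerated mechanically. A minimal sketch, assuming queries are built from all pairs of non-empty fields followed by single fields; the `build_queries` name, the field keys, and the default sizes are hypothetical.

```python
from itertools import combinations


def build_queries(features: dict[str, str],
                  sizes: tuple[int, ...] = (2, 1)) -> list[dict[str, str]]:
    """Enumerate field combinations of the given sizes as search queries.

    Empty fields are skipped, so a POI without a phone yields no phone query.
    """
    present = sorted(k for k, v in features.items() if v)
    queries = []
    for r in sizes:
        for keys in combinations(present, r):
            queries.append({k: features[k] for k in keys})
    return queries


features = {"name": "Mu Yang Fashion", "address": "No. 978 Triumph South Street",
            "phone": "", "kind": "hotel"}
for q in build_queries(features):
    print(q)
```

Querying with pairs before single fields is one plausible ordering: pairs are more selective, while single fields recover candidates whose other fields differ.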
In step S103, the first similarity between each piece of POI information in the POI set and the POI information to be differentiated is calculated. Fig. 3 shows this calculation according to an embodiment of the invention, comprising the following steps:
Step S1031: assigning a weight to each second feature word in the POI information;
Step S1032: calculating the second similarity between each second feature word and the existing POI query library;
Step S1033: summing the products of the weight assigned to each second feature word in the POI information and its corresponding second similarity;
Step S1034: taking this operation result as the first similarity between the POI information and the POI information to be differentiated.
In step S1031, a weight is assigned to each second feature word in the POI information, that is, to the name, address, phone, category, and so on. The weights may be set as required or set automatically according to the category.
In step S1032, the second similarity between each second feature word and the existing POI query library is calculated. When the feature word is a name, its second similarity is the result of matching the name against the existing POI query library;
When the second feature word is an address, its second similarity is obtained by dividing the address by level into a plurality of sub-addresses, assigning a weight to each sub-address, matching each sub-address against the existing POI query library to obtain a sub-similarity, and summing the products of each sub-address weight and its corresponding sub-similarity;
When the second feature word is a phone number, its second similarity is the result of matching the phone number against the existing POI query library;
When the second feature word is a category, its second similarity is the result of matching the category against the existing POI query library.
In step S1033, the products of the weight assigned to each second feature word in the POI information and its corresponding second similarity are summed. When the second feature words of the POI information are name, address, phone, and category, the first similarity score between this POI information and the POI information to be differentiated is:
$$\text{score} = \alpha \cdot \text{score}_{name} + \beta \cdot \text{score}_{addr} + \chi \cdot \text{score}_{phone} + \delta \cdot \text{score}_{kind}$$
where α, β, χ, and δ are assigned weights with α + β + χ + δ = 1; score_name is the name second similarity, score_addr the address second similarity, score_phone the phone second similarity, and score_kind the category second similarity.
In step S1034, the result of the summation in step S1033 is output as the first similarity.
In embodiments of the invention, when the second feature word is an address, the address is divided by level into a plurality of sub-addresses, and the weight of each sub-address is determined by its level. This weight changes dynamically with the levels a given address contains.
When the second feature word is a name, score_name is the name second similarity. Three similarities are computed: between the core word plus suffix word and the name in the existing POI library; between the alias and the name of the POI in the query library; and between the core word and the core word of the query-library POI name. The maximum of these is taken as the second similarity of the POI name, and if it is below a predetermined threshold, the record is removed. Here, second similarity = edit distance / length of the main word. Edit distance, also called Levenshtein distance, is the minimum number of edit operations required to transform one string into another; the permitted operations are replacing one character with another, inserting a character, and deleting a character.
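The edit distance and the three-way name comparison above can be sketched as follows. The Levenshtein dynamic program is the standard one. Two details are assumptions, labeled in the code: the text defines the ratio as edit distance / main word length, but since the maximum is kept and low values are discarded, this sketch uses 1 - distance / longer length so that equal strings score 1; and the threshold of 0.6 is a hypothetical value. The dictionary keys `core`, `suffix`, `alias`, and `name` are also illustrative.

```python
from typing import Optional


def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character substitutions, insertions, deletions."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        cur = [i]                   # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    # ASSUMPTION: 1 - distance / longer length instead of the literal ratio.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def name_similarity(target: dict[str, str], candidate: dict[str, str],
                    threshold: float = 0.6) -> Optional[float]:
    """Best of the three name comparisons; None means the record is removed."""
    core_suffix = " ".join(p for p in (target["core"], target["suffix"]) if p)
    scores = [normalized_similarity(core_suffix, candidate["name"]),
              normalized_similarity(target["core"], candidate["core"])]
    if target.get("alias"):
        scores.append(normalized_similarity(target["alias"], candidate["name"]))
    best = max(scores)
    return best if best >= threshold else None


print(levenshtein("kitten", "sitting"))
```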
In embodiments of the invention, when the second feature word is an address, score_addr is calculated as $\text{score}_{addr} = \sum_{k=1}^{n} \alpha_k \cdot \text{level}_k$, where n is the total number of levels into which the address is divided, level_k is the sub-similarity of the sub-address matched at each level, and α_k is the weight of the sub-address at the corresponding level; this weight changes dynamically with the levels a given address contains. For example, for the address "No. 978 Triumph South Street, Good Wood Garden Apartment", the road level "Triumph South Street" has weight 0.5, the house number level "No. 978" has weight 0.2, and the landmark level "Good Wood Garden Apartment" has weight 0.3; if the address is "No. 978 Triumph South Street", the road level "Triumph South Street" has weight 0.7 and the house number level "No. 978" has weight 0.3.
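The dynamic weights in this example can be sketched as profiles keyed by the set of levels an address contains. The two profiles reproduce the example; the equal-weight fallback for other level sets and the exact-match (0 or 1) sub-similarity per level are assumptions, as are the level keys.

```python
# Level-weight profiles keyed by the set of levels an address contains.
# The two entries reproduce the example in the text; others are unspecified.
WEIGHT_PROFILES = {
    frozenset({"road", "house_number", "landmark"}):
        {"road": 0.5, "house_number": 0.2, "landmark": 0.3},
    frozenset({"road", "house_number"}):
        {"road": 0.7, "house_number": 0.3},
}


def level_weights(levels) -> dict[str, float]:
    """Pick the weight profile for this set of levels."""
    levels = frozenset(levels)
    if levels in WEIGHT_PROFILES:
        return WEIGHT_PROFILES[levels]
    # ASSUMPTION: equal weights for level sets the text does not cover.
    return {lvl: 1 / len(levels) for lvl in levels} if levels else {}


def score_addr(target: dict[str, str], candidate: dict[str, str]) -> float:
    """Weighted sum of per-level sub-similarities (exact match: 1 or 0)."""
    weights = level_weights(target)
    return sum(w * (candidate.get(lvl) == target[lvl]) for lvl, w in weights.items())


target = {"road": "Triumph South Street", "house_number": "No. 978",
          "landmark": "Good Wood Garden Apartment"}
print(score_addr(target, {"road": "Triumph South Street", "house_number": "No. 978"}))
```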
When the second feature word is an address and both the address and the record in the existing POI query library have coordinates, the distance between them is also calculated and a third similarity is obtained from it; the third similarity is compared with the similarity calculated from the address's sub-addresses, and one of the two is selected as the second similarity between the address and the existing POI query library. The distance is obtained by converting the address into geographic coordinates with geocoding and then computing the distance between this address and the corresponding address in the POI query library.
Preferably, the third similarity is calculated with the following formula:
$$\text{score}_{addr\_2} = \text{dist} / \text{dist\_kind}$$
where dist is the distance between the address and the record in the existing POI query library, and dist_kind is the maximum distance predetermined for each category. When both the address and the existing POI query library record have coordinates, the larger of score_addr and score_addr_2 is chosen as the second similarity of the address.
In embodiments of the invention, when the second feature word is a phone number, score_phone is the phone similarity: 1 means the phone numbers are equal and 0 means they are not. When the second feature word is a category, score_kind is the category similarity: 1 means the categories are equal and 0 means they are not.
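The binary phone and category similarities are simple equality tests. In the sketch below, the digits-only phone normalization and the rule that two empty phone numbers do not count as equal are assumptions; the source says only that phone numbers are formatted, without giving the rules.

```python
import re


def normalize_phone(phone: str) -> str:
    # ASSUMPTION: keep digits only, so "0951-1234567" equals "(0951) 1234567".
    return re.sub(r"\D", "", phone)


def score_phone(a: str, b: str) -> int:
    """1 if the normalized phone numbers are equal and non-empty, else 0."""
    na, nb = normalize_phone(a), normalize_phone(b)
    return int(bool(na) and na == nb)


def score_kind(a: str, b: str) -> int:
    """1 if the categories are equal, else 0."""
    return int(a == b)


print(score_phone("0951-1234567", "(0951) 1234567"), score_kind("hotel", "bank"))
```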
As shown in Fig. 4, the overall flow of POI information differentiation according to an embodiment of the invention comprises the following steps:
Obtaining POI information: POI information provided by a third party is obtained, which may come from a service provider or from an individual;
Formatting: the obtained POI information is formatted; formatting is prior art.
Name splitting: the name is split into additional information, alias, prefix word, suffix word, core word, and so on; prior-art methods may be used.
Category acquisition: the category is obtained from the split name; prior-art methods may be used.
Address splitting: the address is split into levels such as province, city, district, town, road, community, landmark, and house number; prior-art methods may be used.
Coordinate acquisition: the address is converted into geographic coordinates by geocoding. Geocoding is a coding method based on spatial positioning technology; it converts geographic location information described as an address into geographic coordinates usable by a GIS (Geographic Information System).
Match query: the existing POI library is queried with the feature word sets to obtain the first similarity between each feature word set and the POI information to be differentiated.
Record output: the qualifying feature word sets are output according to the similarity.
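The address-splitting step can be sketched with a dictionary lookup, a house-number rule, and suffix rules for words missing from the dictionary, following the splitting principle given for step S101. The toy `LEVEL_DICT`, the regular expression, the suffix table, and the assumption that the English address is already comma-delimited are all hypothetical; Chinese addresses would instead need segmentation on level markers.

```python
import re

# Hypothetical dictionary of known place names and their address levels.
LEVEL_DICT = {
    "Yinchuan City": "city",
    "Xingqing District": "district",
    "Triumph South Street": "road",
}
# Rule for house numbers, plus suffix rules for words missing from the dictionary.
HOUSE_NUMBER_RULE = re.compile(r"^No\.\s*\d+")
SUFFIX_RULES = (("Province", "province"), ("City", "city"), ("District", "district"),
                ("Town", "town"), ("Street", "road"), ("Road", "road"),
                ("Community", "community"))


def split_address(address: str) -> dict[str, str]:
    levels: dict[str, str] = {}
    for token in (t.strip() for t in address.split(",")):
        if not token:
            continue
        if token in LEVEL_DICT:
            level = LEVEL_DICT[token]
        elif HOUSE_NUMBER_RULE.match(token):
            level = "house_number"
        else:
            # Unknown word: classify by suffix, defaulting to landmark.
            level = next((lvl for suf, lvl in SUFFIX_RULES if token.endswith(suf)),
                         "landmark")
        # Keep the first token per level; later ones (e.g. a floor) are dropped.
        levels.setdefault(level, token)
    return levels


print(split_address("Yinchuan City, Xingqing District, Triumph South Street, "
                    "No. 978, Good Wood Garden Apartment, Floors 8-9"))
```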
As shown in Fig. 5, the POI information differentiation device according to an embodiment of the invention comprises:
Feature word disassembling module 100, for disassembling the obtained POI information to be differentiated into a plurality of first feature words;
POI set acquisition module 200, for combining the plurality of first feature words and querying a search engine to obtain a POI set;
Similarity determination module 300, for calculating the first similarity between each piece of POI information in the POI set and the POI information to be differentiated;
Output module 400, for selecting, according to the first similarity, one or more pieces of POI information as the differentiation result for the POI information to be differentiated.
As shown in Fig. 6, the similarity determination module according to an embodiment of the invention comprises:
Weight allocation submodule 301, for assigning a weight to each second feature word in the POI information;
Similarity calculation submodule 302, for calculating the second similarity between each second feature word and the existing POI query library;
Summation submodule 303, for summing the products of the weight assigned to each second feature word in the POI information and its corresponding second similarity;
Operation result output submodule 304, for outputting this operation result as the first similarity between the POI information and the POI information to be differentiated.
In the similarity calculation submodule of the invention, when the second feature word is a name, its second similarity is the result of matching the name against the existing POI query library;
When the second feature word is an address, its second similarity is obtained by dividing the address by level into a plurality of sub-addresses, assigning a weight to each sub-address, matching each sub-address against the existing POI query library to obtain a sub-similarity, and summing the products of each sub-address weight and its corresponding sub-similarity;
When the second feature word is a phone number, its second similarity is the result of matching the phone number against the existing POI query library;
When the second feature word is a category, its second similarity is the result of matching the category against the existing POI query library.
In embodiments of the invention, when the second feature word is an address, score_addr is calculated as $\text{score}_{addr} = \sum_{k=1}^{n} \alpha_k \cdot \text{level}_k$, where n is the total number of levels into which the address is divided, level_k is the sub-similarity of the sub-address matched at each level, and α_k is the weight of the sub-address at the corresponding level; this weight changes dynamically with the levels a given address contains.
With the above technical solution, the acquired POI information is decomposed into a plurality of feature words; the feature words are combined and queried through a search engine to obtain a POI set; and the differentiation result is output according to the similarity between each POI in the set and the POI information to be differentiated. This differentiation method can calculate POI similarity more accurately. At the same time, the split feature words can be combined into more query conditions, so that as many candidate results as possible are retrieved, which improves the matching rate of the system.
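To illustrate how splitting into feature words yields more query conditions, the sketch below enumerates every combination of the feature words, most specific first. Treating each non-empty field as one feature word, the `min_size` cut-off, and the omission of the actual search-engine call are assumptions; the patent does not specify how the combined words are submitted or how the results are merged into the POI set.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Query = Tuple[Tuple[str, str], ...]  # a query condition: (field, value) pairs


def split_feature_words(poi: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical split: each non-empty field becomes one feature word."""
    return {field: value.strip() for field, value in poi.items()
            if value and value.strip()}


def query_combinations(words: Dict[str, str], min_size: int = 1) -> List[Query]:
    """All combinations of at least `min_size` feature words, largest first."""
    items = sorted(words.items())
    queries: List[Query] = []
    for size in range(len(items), min_size - 1, -1):
        queries.extend(combinations(items, size))
    return queries
```

Three feature words yield seven query conditions with `min_size=1` and four with `min_size=2`; each condition would then be sent to the search engine and the returned POIs pooled into the candidate set.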
The above is a preferred embodiment of the present invention. It should be noted that those skilled in the art may make several improvements and modifications without departing from the principle of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A POI information differentiation method, characterized by comprising the following steps:
Decomposing the POI information to be differentiated into a plurality of first feature words;
Combining the plurality of first feature words and querying a search engine to obtain a POI set;
Calculating a first similarity between each piece of POI information in the POI set and the POI information to be differentiated;
Selecting, according to the first similarity, one or more pieces of POI information as the differentiation result for the POI information to be differentiated.
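Claim 1 leaves open how the one or more results are chosen from the first similarities. One plausible reading, sketched below in Python, is to rank the candidates by first similarity and keep at most `top_k` of them that reach a `threshold`; both parameters and their default values are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[str, float]  # (POI identifier, first similarity)


def select_results(scored: List[Candidate], threshold: float = 0.8,
                   top_k: int = 1) -> List[Candidate]:
    """Keep at most top_k candidates whose first similarity reaches the threshold."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [item for item in ranked if item[1] >= threshold][:top_k]
```

An empty result could be treated as "no matching POI found", although the patent does not state how that case is handled.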
2. The POI information differentiation method according to claim 1, characterized in that calculating the first similarity between each piece of POI information in the POI set and the POI information to be differentiated further comprises:
Assigning a weight to each second feature word in the POI information;
Calculating the second similarity between each second feature word and the existing POI query library;
Summing the products of the weight assigned to each second feature word in the POI information and its corresponding second similarity to obtain an operation result;
Using the operation result as the first similarity between the POI information and the POI information to be differentiated.
3. The POI information differentiation method according to claim 2, characterized in that the second feature words of the POI information are one or more of name, address, phone number, and category;
When the second feature word is a name, the second similarity between the name and the existing POI query library is the result of matching the name against the existing POI query library;
When the second feature word is an address, the second similarity between the address and the existing POI query library is obtained by dividing the address by level into a plurality of subaddresses, assigning a weight to each subaddress, matching each subaddress against the existing POI query library to obtain a sub-similarity, and summing the products of each subaddress weight and its corresponding sub-similarity;
When the second feature word is a phone number, the second similarity between the phone number and the existing POI query library is the result of matching the phone number against the existing POI query library;
When the second feature word is a category, the second similarity between the category and the existing POI query library is the result of matching the category against the existing POI query library.
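Claim 3 specifies only a "matching result" for the name, phone number, and category. As an illustrative assumption, the sketch below uses the standard-library `difflib.SequenceMatcher.ratio()` for names, digit-only equality for phone numbers, and case-insensitive equality for categories; the matching rules actually used by the applicant are not disclosed.

```python
import difflib
import re


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (assumed matching rule)."""
    return difflib.SequenceMatcher(None, a.strip(), b.strip()).ratio()


def phone_similarity(a: str, b: str) -> float:
    """1.0 if both numbers contain the same digits, ignoring separators (assumption)."""
    digits_a = re.sub(r"\D", "", a)
    digits_b = re.sub(r"\D", "", b)
    return 1.0 if digits_a and digits_a == digits_b else 0.0


def category_similarity(a: str, b: str) -> float:
    """1.0 for a non-empty, case-insensitive exact category match (assumption)."""
    left, right = a.strip().lower(), b.strip().lower()
    return 1.0 if left and left == right else 0.0
```

Graded similarity for names but binary matches for phone numbers and categories reflects the idea that names vary in spelling while the other two fields are usually either right or wrong.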
4. The POI information differentiation method according to claim 3, characterized in that the second similarity $\mathrm{score}_{addr}$ of the address is calculated by the following formula:
$$\mathrm{score}_{addr} = \sum_{k=1}^{n} \alpha_k \cdot \mathrm{level}_k$$
where $n$ is the total number of levels into which the address is divided; $\mathrm{level}_k$ is the sub-similarity obtained by matching the subaddress at level $k$; $\alpha_k$ is the weight of the subaddress at the corresponding level, and
5. The POI information differentiation method according to claim 3, characterized in that, when the second feature word is an address and both the address and the existing POI query library have coordinates, the distance between the address and the existing POI query library is also calculated and a third similarity is obtained from this distance; the third similarity is compared with the similarity calculated from the subaddresses into which the address is divided, and one of the two is selected as the second similarity between the address and the existing POI query library.
6. The POI information differentiation method according to claim 5, characterized in that the third similarity is calculated by the following formula:
$\mathrm{score}_{addr\_2} = \mathrm{dist} / \mathrm{dist\_kind}$, where dist is the distance between the address and the POI queried from the existing POI query library, and dist_kind is the maximum distance predetermined for each category.
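Claims 5 and 6 can be sketched as follows, assuming (latitude, longitude) coordinates in degrees and a great-circle (haversine) distance on a spherical Earth; the patent does not say how the distance is measured. The per-category maximum distances in the hypothetical `DIST_KIND` table are placeholders. Because the claims do not state which of the two similarities is kept, the selection rule is passed in as `choose`, defaulting to `max` as a further assumption. The ratio is implemented literally as dist / dist_kind; note that, as written, it increases with distance.

```python
import math
from typing import Callable, Dict, Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation

# Hypothetical per-category maximum distances in metres (dist_kind).
DIST_KIND: Dict[str, float] = {"restaurant": 500.0, "airport": 5_000.0}

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_m(a: Coord, b: Coord) -> float:
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def third_similarity(dist: float, kind: str) -> float:
    """score_addr_2 = dist / dist_kind, taken literally from claim 6."""
    return dist / DIST_KIND[kind]


def address_second_similarity(level_score: float, kind: str,
                              coord_a: Optional[Coord], coord_b: Optional[Coord],
                              choose: Callable[[float, float], float] = max) -> float:
    """Claim 5: use the distance-based score only when both sides have coordinates."""
    if coord_a is None or coord_b is None:
        return level_score
    dist_score = third_similarity(haversine_m(coord_a, coord_b), kind)
    return choose(level_score, dist_score)
```

Keeping the selection rule as a parameter leaves the level-based score from claim 4 untouched when coordinates are missing and makes the unspecified comparison step easy to swap out.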
7. The POI information differentiation method according to any one of claims 3 to 6, characterized in that, when the second feature words of the POI information are the combination of name, address, phone number, and category, the first similarity score between the POI information and the POI information to be differentiated is:
$$\mathrm{score} = \alpha \cdot \mathrm{score}_{name} + \beta \cdot \mathrm{score}_{addr} + \chi \cdot \mathrm{score}_{phone} + \delta \cdot \mathrm{score}_{kind}$$
where $\alpha$, $\beta$, $\chi$, and $\delta$ are the assigned weights and $\alpha + \beta + \chi + \delta = 1$; $\mathrm{score}_{name}$ is the second similarity of the name, $\mathrm{score}_{addr}$ the second similarity of the address, $\mathrm{score}_{phone}$ the second similarity of the phone number, and $\mathrm{score}_{kind}$ the second similarity of the category.
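The weighted combination of claim 7 is a plain weighted sum; the Python sketch below also checks the constraint α + β + χ + δ = 1. It accepts any subset of the four fields, which covers the "one or more" case of claim 2 as long as the weights of the fields used sum to 1. The field keys and the weight values in `WEIGHTS` are illustrative and not taken from the patent.

```python
import math
from typing import Dict


def first_similarity(second: Dict[str, float], weights: Dict[str, float]) -> float:
    """score = sum of weight * second similarity over the feature words present."""
    if set(second) != set(weights):
        raise ValueError("every feature word needs exactly one weight")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[field] * second[field] for field in second)


# Illustrative values for alpha, beta, chi, delta (hypothetical).
WEIGHTS = {"name": 0.4, "addr": 0.3, "phone": 0.2, "kind": 0.1}
```

For second similarities of 1.0 (name), 0.5 (address), 0.0 (phone), and 1.0 (category), these weights give a first similarity of 0.65, up to floating-point rounding.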
8. A POI information differentiation device, characterized by comprising:
A feature word decomposition module, configured to decompose the acquired POI information to be differentiated into a plurality of first feature words;
A POI set acquisition module, configured to combine the plurality of first feature words and query a search engine to obtain a POI set;
A similarity determination module, configured to calculate the first similarity between each piece of POI information in the POI set and the POI information to be differentiated;
An output module, configured to select, according to the first similarity, one or more pieces of POI information as the differentiation result for the POI information to be differentiated.
9. The POI information differentiation device according to claim 8, characterized in that the similarity determination module further comprises:
A weight assignment submodule, configured to assign a weight to each second feature word in the POI information;
A similarity calculation submodule, configured to calculate the second similarity between each second feature word and the existing POI query library;
A summation submodule, configured to sum the products of the weight assigned to each second feature word of the POI information and its corresponding second similarity;
An operation result output submodule, configured to output the operation result as the first similarity.
10. The POI information differentiation device according to claim 9, characterized in that the similarity calculation submodule further operates as follows:
When the second feature word is a name, the second similarity between the name and the existing POI query library is the result of matching the name against the existing POI query library;
When the second feature word is an address, the second similarity between the address and the existing POI query library is obtained by dividing the address by level into a plurality of subaddresses, assigning a weight to each subaddress, matching each subaddress against the existing POI query library to obtain a sub-similarity, and summing the products of each subaddress weight and its corresponding sub-similarity;
When the second feature word is a phone number, the second similarity between the phone number and the existing POI query library is the result of matching the phone number against the existing POI query library;
When the second feature word is a category, the second similarity between the category and the existing POI query library is the result of matching the category against the existing POI query library.
CN201310125396.8A 2013-04-11 2013-04-11 POI (Point of Interest) information differentiation method and device Pending CN104102667A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310125396.8A CN104102667A (en) 2013-04-11 2013-04-11 POI (Point of Interest) information differentiation method and device

Publications (1)

Publication Number Publication Date
CN104102667A true CN104102667A (en) 2014-10-15

Family

ID=51670826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310125396.8A Pending CN104102667A (en) 2013-04-11 2013-04-11 POI (Point of Interest) information differentiation method and device

Country Status (1)

Country Link
CN (1) CN104102667A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105045880A (en) * 2015-07-22 2015-11-11 福州大学 Fuzzy matching method for interest points of different data sources
CN105718470A (en) * 2014-12-03 2016-06-29 高德软件有限公司 POI (Point of Interest) data processing method and device
CN105740252A (en) * 2014-12-09 2016-07-06 北京四维图新科技股份有限公司 Processing method and processing device of point of interest POI data
CN106294458A (en) * 2015-05-29 2017-01-04 北京四维图新科技股份有限公司 A kind of map point of interest update method and device
CN107850674A (en) * 2015-05-20 2018-03-27 诺基亚技术有限公司 The method and apparatus for obtaining differential positional information
CN108628811A (en) * 2018-04-10 2018-10-09 北京京东尚科信息技术有限公司 The matching process and device of address text
CN109284449A (en) * 2018-10-23 2019-01-29 厦门大学 The recommended method and device of point of interest
CN110674419A (en) * 2019-01-25 2020-01-10 北京嘀嘀无限科技发展有限公司 Geographic information retrieval method and device, electronic equipment and readable storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101388023A (en) * 2008-09-12 2009-03-18 北京搜狗科技发展有限公司 Electronic map interest point data redundant detecting method and system
JP2011003151A (en) * 2009-06-22 2011-01-06 Kddi Corp Similarity calculation device, recommended poi determination device, poi recommendation system, similarity calculation method and program
JP2011154004A (en) * 2010-01-28 2011-08-11 Kddi Corp Poi attribute determination device, poi recommendation server, and poi recommendation system
CN102789467A (en) * 2011-05-20 2012-11-21 腾讯科技(深圳)有限公司 Data fusion method, data fusion device and data processing system
CN102880647A (en) * 2012-08-24 2013-01-16 北京百度网讯科技有限公司 Method and device for acquiring another name of organization

Cited By (12)

Publication number Priority date Publication date Assignee Title
CN105718470A (en) * 2014-12-03 2016-06-29 高德软件有限公司 POI (Point of Interest) data processing method and device
CN105718470B (en) * 2014-12-03 2019-08-20 高德软件有限公司 A kind of POI data treating method and apparatus
CN105740252A (en) * 2014-12-09 2016-07-06 北京四维图新科技股份有限公司 Processing method and processing device of point of interest POI data
CN107850674A (en) * 2015-05-20 2018-03-27 诺基亚技术有限公司 The method and apparatus for obtaining differential positional information
CN106294458A (en) * 2015-05-29 2017-01-04 北京四维图新科技股份有限公司 A kind of map point of interest update method and device
CN105045880A (en) * 2015-07-22 2015-11-11 福州大学 Fuzzy matching method for interest points of different data sources
CN105045880B (en) * 2015-07-22 2018-09-18 福州大学 A kind of Method of Fuzzy Matching of the point of interest of different data sources
CN108628811A (en) * 2018-04-10 2018-10-09 北京京东尚科信息技术有限公司 The matching process and device of address text
CN108628811B (en) * 2018-04-10 2022-04-12 北京京东尚科信息技术有限公司 Address text matching method and device
CN109284449A (en) * 2018-10-23 2019-01-29 厦门大学 The recommended method and device of point of interest
CN109284449B (en) * 2018-10-23 2020-06-16 厦门大学 Interest point recommendation method and device
CN110674419A (en) * 2019-01-25 2020-01-10 北京嘀嘀无限科技发展有限公司 Geographic information retrieval method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141015
