CN104102667A - POI (Point of Interest) information differentiation method and device - Google Patents
- Publication number
- CN104102667A (application number CN201310125396.8A)
- Authority
- CN
- China
- Prior art keywords
- poi
- similarity
- poi information
- address
- existing
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/29—Geographical information databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Remote Sensing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a POI (Point of Interest) information differentiation method and device. The method comprises the following steps: splitting the POI information to be differentiated into a plurality of first feature words; combining the first feature words and querying a search engine to obtain a POI set; calculating a first similarity between each piece of POI information in the POI set and the POI information to be differentiated; and selecting, according to the first similarity, one or more pieces of POI information as the differentiation result for the POI information to be differentiated. The method and device split the POI information to be differentiated into feature words, combine those words to search for a related POI set, calculate the similarity between each POI in the set and the POI information to be differentiated, and output the differentiation result according to that similarity. Because the split feature words can be combined into more search conditions, more candidate results are retrieved and the system's matching rate is increased.
Description
Technical field
The present invention relates to the field of POI differentiation, and in particular to a POI information differentiation method and device.
Background art
At present, when operators differentiate a third-party POI (Point of Interest) database, they mainly extract the main body words of the POI name and POI address and format the phone number. The main body words of the name and address, together with the phone, category and coordinates, are then used to search the original database for related records, and POIs with high similarity among the query results are taken as matches. The similarity relies mainly on the similarity between name main bodies and between address main bodies, computed mostly with methods such as edit distance and the Jaccard similarity coefficient. With this existing differentiation method, each person can differentiate only 100 to 200 POI records per day; as third-party POI information grows rapidly, the traditional method seriously hampers the production of geographic information data.
In existing POI differentiation methods, the inconsistency between the categories of the third-party POI database and those of the original database is generally handled by manually establishing a category correspondence between the two databases or by manually labeling categories in the third-party POI database. Such handling is coarse and carries some error, which makes it hard to narrow the matching range. In addition, POI coordinates come mainly from the third-party POI database, yet third-party coordinates usually deviate somewhat and most third-party POI databases contain no coordinates at all, which likewise makes it hard to narrow the matching range. For similarity computation, existing methods rely mainly on the similarity of the main bodies obtained after splitting the address and the name. This is inaccurate for addresses: an address is divided into geographic levels, identical main bodies can recur in different districts, and the weight of each address level after splitting should vary from address to address. Moreover, narrowing the matching range only by name main body, address main body, category and coordinates causes some matching records to be missed.
In short, existing POI differentiation systems have a low matching rate and are time-consuming, which makes subsequent work more difficult.
Summary of the invention
The object of the present invention is to provide a POI information differentiation method and device that improve the POI differentiation matching rate and reduce the time consumed.
To solve the above technical problem, the present invention provides a POI information differentiation method, comprising the following steps:
Splitting the POI information to be differentiated into a plurality of first feature words;
Combining the plurality of first feature words and querying a search engine with the combinations to obtain a POI set;
Calculating a first similarity between each piece of POI information in the POI set and the POI information to be differentiated;
Selecting, according to the first similarity, one or more pieces of POI information as the differentiation result for the POI information to be differentiated.
Preferably, calculating the first similarity between each piece of POI information in the POI set and the POI information to be differentiated further comprises:
Assigning a weight to each second feature word in the POI information;
Calculating a second similarity between each second feature word and the existing POI query database;
Summing the products of the weight assigned to each second feature word in the POI information and its corresponding second similarity to obtain an operation result;
Taking this operation result as the first similarity between the POI information and the POI information to be differentiated.
Preferably, the second feature words of the POI information are one or more of name, address, phone and category;
When the second feature word is the name, the second similarity between the name and the existing POI query database is the result of matching the name against the existing POI query database;
When the second feature word is the address, the second similarity between the address and the existing POI query database is obtained by dividing the address by level into a plurality of sub-addresses, assigning a weight to each sub-address, matching each sub-address against the existing POI query database to obtain a sub-similarity, and summing the products of each sub-address's weight and its matched sub-similarity;
When the second feature word is the phone, the second similarity between the phone and the existing POI query database is the result of matching the phone against the existing POI query database;
When the second feature word is the category, the second similarity between the category and the existing POI query database is the result of matching the category against the existing POI query database.
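Preferably, the second similarity $\mathrm{score}_{addr}$ of the address is calculated with the following formula:
$$\mathrm{score}_{addr} = \sum_{k=1}^{n} \alpha_k \cdot \mathrm{Level}_k$$
where $n$ is the total number of levels into which the address is divided; $\mathrm{Level}_k$ is the sub-similarity of the matched sub-address at level $k$; $\alpha_k$ is the weight of the sub-address at the corresponding level, and $\sum_{k=1}^{n} \alpha_k = 1$.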
Preferably, when the second feature word is the address and both the address and the existing POI query database have coordinates, the distance between the address and the matching record in the existing POI query database is also calculated, and a third similarity is obtained from that distance. The third similarity is compared with the similarity calculated from the sub-addresses into which the address was divided, and one of the two is selected as the second similarity between the address and the existing POI query database.
Preferably, the third similarity is calculated with the following formula:
$$\mathrm{score}_{addr\_2} = \frac{\mathrm{dist}}{\mathrm{dist\_kind}}$$
where dist is the distance between the address and the matching record in the existing POI query database, and dist_kind is the maximum distance predetermined for each category.
Preferably, when the second feature words in the POI information are the combination of name, address, phone and category, the first similarity score between the POI information and the POI information to be differentiated is:
$$\mathrm{score} = \alpha \cdot \mathrm{score}_{name} + \beta \cdot \mathrm{score}_{addr} + \chi \cdot \mathrm{score}_{phone} + \delta \cdot \mathrm{score}_{kind}$$
where $\alpha$, $\beta$, $\chi$ and $\delta$ are the assigned weights and $\alpha + \beta + \chi + \delta = 1$; $\mathrm{score}_{name}$ is the second similarity of the name, $\mathrm{score}_{addr}$ the second similarity of the address, $\mathrm{score}_{phone}$ the second similarity of the phone, and $\mathrm{score}_{kind}$ the second similarity of the category.
The present invention also provides a POI information differentiation device, comprising:
A feature word splitting module, configured to split the obtained POI information to be differentiated into a plurality of first feature words;
A POI set acquisition module, configured to combine the plurality of first feature words and query a search engine to obtain the POI set corresponding to each feature word combination;
A similarity determination module, configured to calculate the first similarity between each piece of POI information in the POI set and the POI information to be differentiated;
An output module, configured to select, according to the first similarity, one or more pieces of POI information as the differentiation result for the POI information to be differentiated.
Preferably, the similarity determination module further comprises:
A weight allocation submodule, configured to assign a weight to each second feature word in the POI information;
A similarity calculation submodule, configured to calculate the second similarity between each second feature word and the existing POI query database;
A summation submodule, configured to sum the products of the weight assigned to each second feature word in the POI information and its corresponding second similarity;
An operation result output submodule, configured to output this operation result as the first similarity.
Preferably, in the similarity calculation submodule:
When the second feature word is the name, the second similarity between the name and the existing POI query database is the result of matching the name against the existing POI query database;
When the second feature word is the address, the second similarity between the address and the existing POI query database is obtained by dividing the address by level into a plurality of sub-addresses, assigning a weight to each sub-address, matching each sub-address against the existing POI query database to obtain a sub-similarity, and summing the products of each sub-address's weight and its matched sub-similarity;
When the second feature word is the phone, the second similarity between the phone and the existing POI query database is the result of matching the phone against the existing POI query database;
When the second feature word is the category, the second similarity between the category and the existing POI query database is the result of matching the category against the existing POI query database.
The above technical solution has the following beneficial effects: the present invention splits the obtained POI information into a plurality of feature words, combines these feature words to query a search engine for a POI set, and outputs the differentiation result by calculating the similarity between the POI information in the POI set and the POI information to be differentiated. This differentiation method computes POI similarity more accurately; at the same time, the split feature words can be combined into more query conditions, so more candidate results are retrieved and the system's matching rate is improved.
Brief description of the drawings
Fig. 1 is a flowchart of the POI information differentiation method according to an embodiment of the present invention;
Fig. 2 is a flowchart of name differentiation according to an embodiment of the present invention;
Fig. 3 is a flowchart of determining the similarity between a feature word set and the POI information to be differentiated according to an embodiment of the present invention;
Fig. 4 is an overall flowchart of POI information differentiation according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of the POI information differentiation device according to an embodiment of the present invention;
Fig. 6 is a structural block diagram of the similarity determination module according to an embodiment of the present invention.
Embodiment
To make the technical problem to be solved, the technical solution and the advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
Fig. 1 shows the flowchart of the POI information differentiation method according to an embodiment of the present invention, comprising:
Step S101: splitting the POI information to be differentiated into a plurality of first feature words;
Step S102: combining the plurality of first feature words and querying a search engine to obtain a POI set;
Step S103: calculating the first similarity between each piece of POI information in the POI set and the POI information to be differentiated;
Step S104: selecting, according to the first similarity, one or more pieces of POI information as the differentiation result for the POI information to be differentiated.
The present invention splits the obtained POI information into a plurality of feature words, combines them to obtain a POI set through the search engine, and outputs the differentiation result according to the similarity between the POI information in the POI set and the POI information to be differentiated. This method computes POI similarity more accurately; at the same time, the split feature words can be combined into more query conditions, so more candidate results are retrieved and the system's matching rate is improved. After the selected feature word set is output, its accuracy is verified against the existing POI query database to generate accurate POI data, and the electronic map database is updated with the accurate POI data.
In step S101, splitting the obtained POI information to be differentiated into a plurality of first feature words may consist of name differentiation and address differentiation. Fig. 2 shows the flowchart of name differentiation according to an embodiment of the present invention, which comprises obtaining the additional information, obtaining the alias, removing the prefix word, removing the suffix word, removing noise words, and obtaining the trunk word. A name can thus be split into additional information, an alias, a prefix word, a suffix word and a trunk word. For example, splitting "Beijing Mu Yang Fashion Chain Hotel, International Trade branch (formerly Beijing Boli Business Hotel)" gives the prefix word "Beijing", the suffix word "Chain Hotel", the trunk word "Mu Yang Fashion", the additional information "International Trade branch", and the alias "Boli Business Hotel". Name splitting principle: the prefix word and suffix word are split mainly with the corresponding dictionaries, while the alias, trunk word and additional information are split mainly with the corresponding rules.
An address can be differentiated into address levels such as province, city, district, town, road, community, landmark and house number. For example, splitting "Floors 8-9, Jiamu Garden Apartment, No. 978 Shengli South Street, Xingqing District, Yinchuan City" gives: city level: Yinchuan City; district level: Xingqing District; road level: Shengli South Street; house number level: No. 978; landmark level: Jiamu Garden Apartment. Splitting principle: the province, city, district, town, road, community and landmark levels use the corresponding dictionaries; the house number level uses rules; words that do not exist in the dictionaries are assigned a level by rules.
Address matching: if the input POI has no coordinates, coordinates are obtained through an address matching service.
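The dictionary-plus-rule splitting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the dictionaries `PREFIXES`, `SUFFIXES` and `ADDRESS_LEVELS`, the house-number regular expression, and the rules for the alias (the parenthesised part, with a leading "formerly" and any dictionary prefix dropped) and for the additional information (the text after the last comma) are all hypothetical. The landmark level and noise-word removal are omitted, and real Chinese POI names would be processed character by character rather than as space-separated English words.

```python
import re

# Hypothetical toy dictionaries; a production system would load curated lists.
PREFIXES = ["Beijing"]
SUFFIXES = ["Chain Hotel", "Business Hotel", "Hotel"]  # longest first
ADDRESS_LEVELS = {
    "city": ["Yinchuan City"],
    "district": ["Xingqing District"],
    "road": ["Shengli South Street"],
}
HOUSE_NUMBER = re.compile(r"No\.\s*\d+")  # rule used for the house-number level


def strip_prefix(text: str) -> tuple[str, str]:
    """Remove a dictionary prefix word; return (prefix, remainder)."""
    for p in PREFIXES:
        if text.startswith(p + " "):
            return p, text[len(p) + 1:]
    return "", text


def split_name(name: str) -> dict:
    parts = {"alias": "", "additional": ""}
    # Assumed rule: the parenthesised part holds the alias, e.g. "(formerly ...)".
    m = re.search(r"\((?:formerly\s+)?([^)]*)\)", name)
    if m:
        parts["alias"] = strip_prefix(m.group(1).strip())[1]
        name = (name[:m.start()] + name[m.end():]).strip()
    # Assumed rule: text after the last comma is additional information.
    if "," in name:
        name, extra = name.rsplit(",", 1)
        parts["additional"] = extra.strip()
    prefix, rest = strip_prefix(name.strip())  # dictionary: prefix word
    suffix = next((s for s in SUFFIXES if rest.endswith(s)), "")  # dictionary: suffix
    trunk = rest[: len(rest) - len(suffix)].strip()  # what remains is the trunk word
    parts.update(prefix=prefix, suffix=suffix, trunk=trunk)
    return parts


def split_address(address: str) -> dict:
    levels = {}
    for level, words in ADDRESS_LEVELS.items():  # dictionary-based levels
        for word in words:
            if word in address:
                levels[level] = word
    m = HOUSE_NUMBER.search(address)  # rule-based level
    if m:
        levels["house_number"] = m.group(0)
    return levels
```

Run on the example name above, `split_name` returns the prefix "Beijing", suffix "Chain Hotel", trunk "Mu Yang Fashion", additional information "International Trade branch" and alias "Boli Business Hotel".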
Besides splitting the name of the POI information into feature words, the category corresponding to the feature words is also obtained, as follows:
Preprocessing: the χ² statistic is used to measure the weight of each feature word for each category, and weights below a certain threshold are removed. The χ² statistic is calculated as:
$$\chi^2(w, C) = \frac{N\,(AD - BC)^2}{(A+C)\,(B+D)\,(A+B)\,(C+D)}$$
where w is the feature item and C is the category; ~w denotes the feature items other than w, and ~C denotes the categories other than C. The relationship between feature item w and category C then falls into four cases: (w, C), (w, ~C), (~w, C) and (~w, ~C). A, B, C and D denote the numbers of POIs in these four cases respectively (in the formula, the letters A to D stand for these counts), and the total number of POIs is N = A + B + C + D.
After the POI name is differentiated to obtain the first feature words, the category weights corresponding to those first feature words are obtained, and the one or more categories with the highest weights are selected and output as the possible categories of the POI.
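The χ² weighting and category selection can be sketched as below. This is an illustrative sketch assuming a set of already-categorised POIs is available as training data (the `samples` argument, a hypothetical input): it counts A, B, C and D for each (feature word, category) pair, applies the χ² formula, drops weights at or below a threshold, and ranks categories for a new POI by the summed weights of its feature words. Summing weights across feature words is an assumption; the text only says that the categories with higher weights are chosen.

```python
from collections import Counter, defaultdict


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """chi^2 = N (AD - BC)^2 / ((A+C)(B+D)(A+B)(C+D)), with N = A+B+C+D."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def category_weights(samples, threshold: float = 0.0) -> dict:
    """samples: list of (feature_words, category) pairs from labelled POIs."""
    n = len(samples)
    word_cat, word_total, cat_total = Counter(), Counter(), Counter()
    for words, cat in samples:
        cat_total[cat] += 1
        for w in set(words):
            word_total[w] += 1
            word_cat[(w, cat)] += 1
    weights = defaultdict(dict)
    for (w, cat), a in word_cat.items():
        b = word_total[w] - a   # w present, other category
        c = cat_total[cat] - a  # w absent, this category
        d = n - a - b - c       # neither w nor this category
        score = chi_square(a, b, c, d)
        if score > threshold:   # discard weak weights
            weights[w][cat] = score
    return weights


def guess_categories(words, weights, top_k: int = 1) -> list:
    """Rank categories by the summed chi^2 weights of the POI's feature words."""
    totals = Counter()
    for w in words:
        for cat, s in weights.get(w, {}).items():
            totals[cat] += s
    return [cat for cat, _ in totals.most_common(top_k)]
```

A design note: χ² is symmetric, so a word that is strongly *negatively* associated with a category can also score highly; filtering on `a * d > b * c` is one common way to keep only positive associations if that matters for the data at hand.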
In step S102, the plurality of first feature words are combined, and the search engine is queried to obtain the POI set corresponding to each feature word combination. The query process uses a POI search engine service and retrieves the related POI set by fields such as name, address, type and coordinates. Specifically, the split name, the split address, the phone, the category, the coordinates and so on are combined in useful ways; for example, the address may be combined with the phone, or the name with the address. Various combinations are used, and each resulting combination of first feature words is submitted to the search engine to obtain the POI set. The search engine may be Baidu, Google, etc., or the existing POI query database. The POI set obtained through the search engine in this step contains multiple pieces of POI information.
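One way to realise the field combinations described above is to enumerate every subset of at least two non-empty fields, run each as a query, and take the union of the hits as the POI set. This is a sketch only: the field names, the `search` callable (a stand-in for Baidu, Google or the existing POI query database) and the `id` key used to de-duplicate hits are hypothetical, and a real system would probably restrict itself to the most useful combinations rather than all of them.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, List


def build_queries(fields: Dict[str, str], min_size: int = 2) -> List[Dict[str, str]]:
    """Every combination of at least `min_size` non-empty fields, as query dicts."""
    present = [k for k, v in fields.items() if v]
    return [
        {k: fields[k] for k in combo}
        for size in range(min_size, len(present) + 1)
        for combo in combinations(present, size)
    ]


def collect_poi_set(
    fields: Dict[str, str],
    search: Callable[[Dict[str, str]], Iterable[dict]],
    min_size: int = 2,
) -> List[dict]:
    """Run every query and merge the hits into one POI set, de-duplicated by id."""
    poi_set: Dict[object, dict] = {}
    for query in build_queries(fields, min_size):
        for rec in search(query):
            poi_set.setdefault(rec["id"], rec)
    return list(poi_set.values())
```

With name, address, phone and category all present, this yields 6 + 4 + 1 = 11 queries; each extra field roughly doubles the count, which is why limiting the combinations is worth considering.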
In step S103, the first similarity between each piece of POI information in the POI set and the POI information to be differentiated is calculated. Fig. 3 shows the flow of this calculation according to an embodiment of the present invention, comprising the following steps:
Step S1031: assigning a weight to each second feature word in the POI information;
Step S1032: calculating the second similarity between each second feature word and the existing POI query database;
Step S1033: summing the products of the weight assigned to each second feature word in the POI information and its corresponding second similarity;
Step S1034: taking this operation result as the first similarity between the POI information and the POI information to be differentiated.
In step S1031, a weight is assigned to each second feature word in the POI information, that is, to feature words such as name, address, phone and category. The weights can be set as required, or set automatically according to the category.
In step S1032, the second similarity between each second feature word and the existing POI query database is calculated. When the feature word is the name, the second similarity between the name and the existing POI query database is the result of matching the name against the existing POI query database;
When the second feature word is the address, the second similarity between the address and the existing POI query database is obtained by dividing the address by level into a plurality of sub-addresses, assigning a weight to each sub-address, matching each sub-address against the existing POI query database to obtain a sub-similarity, and summing the products of each sub-address's weight and its matched sub-similarity;
When the second feature word is the phone, the second similarity between the phone and the existing POI query database is the result of matching the phone against the existing POI query database;
When the second feature word is the category, the second similarity between the category and the existing POI query database is the result of matching the category against the existing POI query database.
In step S1033, the products of the weight assigned to each second feature word in the POI information and its corresponding second similarity are summed. When the second feature words in the POI information are name, address, phone and category, the first similarity score between a POI in the POI set and the POI information to be differentiated is:
$$\mathrm{score} = \alpha \cdot \mathrm{score}_{name} + \beta \cdot \mathrm{score}_{addr} + \chi \cdot \mathrm{score}_{phone} + \delta \cdot \mathrm{score}_{kind}$$
where $\alpha$, $\beta$, $\chi$ and $\delta$ are the assigned weights and $\alpha + \beta + \chi + \delta = 1$; $\mathrm{score}_{name}$ is the second similarity of the name, $\mathrm{score}_{addr}$ the second similarity of the address, $\mathrm{score}_{phone}$ the second similarity of the phone, and $\mathrm{score}_{kind}$ the second similarity of the category.
In step S1034, the result of the summation in step S1033 is output as the first similarity.
In the embodiments of the present invention, when the second feature word is the address, the address is divided by level into a plurality of sub-addresses, and the weight of each sub-address is determined by its level. These weights change dynamically with the levels present in each address.
When the second feature word is the name, the name's second similarity $\mathrm{score}_{name}$ is obtained mainly from three comparisons: the second similarity between the trunk word plus suffix word and the name in the existing POI database; the second similarity between the alias and the POI name in the query database; and the second similarity between the trunk word and the trunk word of the POI name in the query database. The maximum of these similarities is taken as the second similarity of the POI name, and if this value is less than a predetermined threshold, the record is removed. Here, second similarity = edit distance / trunk word length. Edit distance, also called Levenshtein distance, is the minimum number of edit operations required to change one string into the other; the permitted edit operations are replacing one character with another, inserting a character, and deleting a character.
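A sketch of the name similarity follows. It implements the Levenshtein distance with the standard dynamic-programming recurrence and takes the maximum over the three comparisons listed above. One assumption needs flagging: the text defines the score as edit distance divided by trunk word length, which grows as names differ, yet the record is removed when the maximum falls below a threshold, implying that higher means more similar. The sketch therefore uses 1 − distance / trunk length, clipped at 0. The dictionary keys and the 0.6 threshold are hypothetical.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character replacements, insertions and deletions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # replace (free if equal)
        prev = cur
    return prev[-1]


def pair_similarity(a: str, b: str, trunk_len: int) -> float:
    # ASSUMPTION: 1 - distance / trunk length, so a higher value means more similar.
    if not a or not b or trunk_len == 0:
        return 0.0
    return max(0.0, 1.0 - levenshtein(a, b) / trunk_len)


def name_score(src: dict, cand: dict, threshold: float = 0.6):
    """src has hypothetical keys 'trunk', 'suffix', 'alias'; cand has 'name', 'trunk'.
    Returns the best similarity, or None when the record should be removed."""
    n = len(src["trunk"])
    best = max(
        pair_similarity(src["trunk"] + src["suffix"], cand["name"], n),
        pair_similarity(src["alias"], cand["name"], n),
        pair_similarity(src["trunk"], cand["trunk"], n),
    )
    return best if best >= threshold else None
```

For Chinese names, the strings are compared character by character, which the function already does; no word segmentation is needed for the distance itself.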
In the embodiments of the present invention, when the second feature word is the address, $\mathrm{score}_{addr}$ is calculated as
$$\mathrm{score}_{addr} = \sum_{k=1}^{n} \alpha_k \cdot \mathrm{Level}_k$$
where $n$ is the total number of levels into which the address is divided; $\mathrm{Level}_k$ is the sub-similarity of the matched sub-address at level $k$; $\alpha_k$ is the weight of the sub-address at the corresponding level, and $\sum_{k=1}^{n} \alpha_k = 1$. These weights change dynamically with the levels present in each address. For example, for the address "Jiamu Garden Apartment, No. 978 Shengli South Street", the road level "Shengli South Street" has weight 0.5, the house number level "No. 978" has weight 0.2, and the landmark level "Jiamu Garden Apartment" has weight 0.3; if the address is "No. 978 Shengli South Street", the road level "Shengli South Street" has weight 0.7 and the house number level "No. 978" has weight 0.3.
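The level-weighted address score and the dynamic weights in the example above can be sketched as follows. The weight table reproduces the two examples in the text; the fallback to equal weights for other level combinations, the exact-match placeholder used for $\mathrm{Level}_k$, and the level names are assumptions made for illustration.

```python
import math

# Level weights keyed by the set of levels the address contains (values from the example).
WEIGHT_TABLE = {
    frozenset({"road", "house_number", "landmark"}):
        {"road": 0.5, "house_number": 0.2, "landmark": 0.3},
    frozenset({"road", "house_number"}): {"road": 0.7, "house_number": 0.3},
}


def exact_match(a: str, b: str) -> float:
    """Placeholder sub-similarity Level_k: 1.0 if the sub-addresses are identical."""
    return 1.0 if a and a == b else 0.0


def address_score(src: dict, cand: dict, sub_similarity=exact_match) -> float:
    """score_addr = sum_k alpha_k * Level_k over the levels present in `src`."""
    if not src:
        return 0.0
    weights = WEIGHT_TABLE.get(frozenset(src))
    if weights is None:  # ASSUMPTION: equal weights for combinations not in the table
        weights = {level: 1.0 / len(src) for level in src}
    assert math.isclose(sum(weights.values()), 1.0)  # the alpha_k must sum to 1
    return sum(alpha * sub_similarity(src[level], cand.get(level, ""))
               for level, alpha in weights.items())
```

With the three-level address as the source and a candidate that matches only the road and house number, the score is 0.5 + 0.2 = 0.7; the same candidate scored as the source against itself uses the two-level weights and reaches 1.0.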
When the second feature word is the address and both the address and the existing POI query database have coordinates, the distance between the address and the corresponding record in the existing POI query database is also calculated and a third similarity is obtained from that distance. The third similarity is compared with the similarity calculated from the sub-addresses of the address, and one of them is selected as the second similarity between the address and the existing POI query database. The distance is obtained by converting the address into geographic coordinates with geocoding and then calculating the distance between the address and the corresponding address in the POI query database.
Preferably, the third similarity is calculated with the following formula:
$$\mathrm{score}_{addr\_2} = \frac{\mathrm{dist}}{\mathrm{dist\_kind}}$$
where dist is the distance between the address and the record in the existing POI query database, and dist_kind is the maximum distance predetermined for each category. When both the address and the existing POI query database have coordinates, the larger of $\mathrm{score}_{addr}$ and $\mathrm{score}_{addr\_2}$ is taken as the second similarity of the address.
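The coordinate-based branch can be sketched as below. Assumptions: coordinates are latitude/longitude pairs in degrees already produced by geocoding, distance is the great-circle (haversine) distance on a spherical Earth, and the per-category maxima in `DIST_KIND_M` are hypothetical values. The text writes the third similarity as dist/dist_kind but then takes the maximum of it and $\mathrm{score}_{addr}$, which only makes sense if a larger value means a closer match; the sketch therefore uses 1 − dist/dist_kind clipped to [0, 1]. Substitute the literal ratio if that is what is intended.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
DIST_KIND_M = {"hotel": 500.0, "park": 2000.0}  # hypothetical per-category maxima
DEFAULT_DIST_M = 1000.0                          # hypothetical fallback


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def distance_score(src_xy, cand_xy, kind: str) -> float:
    dist = haversine_m(*src_xy, *cand_xy)
    ratio = dist / DIST_KIND_M.get(kind, DEFAULT_DIST_M)
    return max(0.0, 1.0 - ratio)  # ASSUMPTION: inverted so that nearer = higher


def address_second_similarity(score_addr: float, src_xy=None, cand_xy=None, kind=""):
    """Use the larger of the level-based and distance-based scores when both sides have coordinates."""
    if src_xy is None or cand_xy is None:
        return score_addr
    return max(score_addr, distance_score(src_xy, cand_xy, kind))
```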
In the embodiments of the present invention, when the second feature word is the phone, $\mathrm{score}_{phone}$ is the phone similarity: 1 means the phones are equal and 0 means they are not. When the second feature word is the category, $\mathrm{score}_{kind}$ is the category similarity: 1 means the categories are equal and 0 means they are not.
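The weighted combination of the four second similarities, together with the 0/1 phone and category scores and the selection of step S104, can be sketched as follows. The values in `WEIGHTS` are hypothetical (the text only requires α + β + χ + δ = 1), treating an empty phone or category as a non-match is an assumption, and the `min_score` cut-off is an illustrative addition.

```python
import math

# Hypothetical alpha, beta, chi, delta; they only need to sum to 1.
WEIGHTS = {"name": 0.4, "addr": 0.3, "phone": 0.2, "kind": 0.1}


def exact(a: str, b: str) -> float:
    """score_phone / score_kind: 1 if equal, else 0 (empty values count as no match)."""
    return 1.0 if a and a == b else 0.0


def first_similarity(scores: dict, weights: dict = WEIGHTS) -> float:
    """score = alpha*name + beta*addr + chi*phone + delta*kind."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[k] * scores[k] for k in weights)


def pick_matches(candidates, top_k: int = 1, min_score: float = 0.0):
    """candidates: list of (record, per-feature score dict); returns the best (score, record) pairs."""
    ranked = sorted(((first_similarity(s), rec) for rec, s in candidates),
                    key=lambda pair: pair[0], reverse=True)
    return [(score, rec) for score, rec in ranked[:top_k] if score >= min_score]
```

Sorting on an explicit key avoids comparing the records themselves when two candidates tie on score.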
Fig. 4 shows the overall flowchart of POI information differentiation according to an embodiment of the present invention, comprising the following steps:
Obtaining POI information: obtain POI information provided by a third party, which may be a service provider or an individual;
Formatting: format the obtained POI information; the formatting process is prior art.
Name splitting: split the name into additional information, alias, prefix word, suffix word, trunk word, etc.; name splitting can use prior-art techniques.
Category acquisition: obtain the category from the split name; category acquisition can use prior-art techniques.
Address splitting: split the address into address levels such as province, city, district, town, road, community, landmark and house number; address splitting can use prior-art techniques.
Coordinate acquisition: convert the address into geographic coordinates using geocoding. Geocoding is a coding method based on spatial positioning technology that converts geographic location information described as an address into geographic coordinates usable by a GIS (Geographic Information System).
Match query: query the existing POI database with the feature word sets to obtain the first similarity between each feature word set and the POI information to be differentiated.
Record output: output the feature word sets that meet the conditions according to the similarity.
Fig. 5 shows the structural block diagram of the POI information differentiation device according to an embodiment of the present invention, comprising:
Feature word splitting module 100, configured to split the obtained POI information to be differentiated into a plurality of first feature words;
POI set acquisition module 200, configured to combine the plurality of first feature words and query a search engine to obtain a POI set;
Similarity determination module 300, configured to calculate the first similarity between each piece of POI information in the POI set and the POI information to be differentiated;
Output module 400, configured to select, according to the first similarity, one or more pieces of POI information as the differentiation result for the POI information to be differentiated.
Fig. 6 shows the structural block diagram of the similarity determination module according to an embodiment of the present invention, comprising:
Weight allocation submodule 301, configured to assign a weight to each second feature word in the POI information;
Similarity calculation submodule 302, configured to calculate the second similarity between each second feature word and the existing POI query database;
Summation submodule 303, configured to sum the products of the weight assigned to each second feature word in the POI information and its corresponding second similarity;
Operation result output submodule 304, configured to output this operation result as the first similarity between the POI information and the POI information to be differentiated.
In the similarity calculation submodule of the present invention: when the second feature word is the name, the second similarity between the name and the existing POI query database is the result of matching the name against the existing POI query database;
When the second feature word is the address, the second similarity between the address and the existing POI query database is obtained by dividing the address by level into a plurality of sub-addresses, assigning a weight to each sub-address, matching each sub-address against the existing POI query database to obtain a sub-similarity, and summing the products of each sub-address's weight and its matched sub-similarity;
When the second feature word is the phone, the second similarity between the phone and the existing POI query database is the result of matching the phone against the existing POI query database;
When the second feature word is the category, the second similarity between the category and the existing POI query database is the result of matching the category against the existing POI query database.
In the embodiments of the present invention, when the second feature word is the address, $\mathrm{score}_{addr}$ is calculated as
$$\mathrm{score}_{addr} = \sum_{k=1}^{n} \alpha_k \cdot \mathrm{Level}_k$$
where $n$ is the total number of levels into which the address is divided; $\mathrm{Level}_k$ is the sub-similarity of the matched sub-address at level $k$; $\alpha_k$ is the weight of the sub-address at the corresponding level, and $\sum_{k=1}^{n} \alpha_k = 1$. These weights change dynamically with the levels present in each address.
The above technical solution splits the obtained POI information into a plurality of feature words, combines them to obtain a POI set through the search engine, and outputs the differentiation result according to the similarity between the POI information in the POI set and the POI information to be differentiated. This differentiation method computes POI similarity more accurately; at the same time, the split feature words can be combined into more query conditions, so more candidate results are retrieved and the system's matching rate is improved.
The above are preferred embodiments of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A POI information differentiation method, characterized by comprising the following steps:
Splitting the POI information to be differentiated into a plurality of first feature words;
Combining the plurality of first feature words and querying a search engine to obtain a POI set;
Calculating a first similarity between each piece of POI information in the POI set and the POI information to be differentiated;
Selecting, according to the first similarity, one or more pieces of POI information as the differentiation result of the POI information to be differentiated.
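The four steps of claim 1 can be sketched as one function with the splitter, search engine, and similarity passed in as callables. This is a hypothetical skeleton: the claim does not define the selection rule, so `top_k` and `threshold` are assumptions standing in for "one or more pieces of POI information".

```python
from typing import Callable, Dict, List, Tuple

POI = Dict[str, str]  # e.g. {"name": ..., "address": ..., "phone": ..., "kind": ...}


def differentiate(
    target: POI,
    split: Callable[[POI], List[str]],
    search: Callable[[List[str]], List[POI]],
    similarity: Callable[[POI, POI], float],
    top_k: int = 1,
    threshold: float = 0.0,
) -> List[Tuple[POI, float]]:
    words = split(target)          # step 1: first feature words
    candidates = search(words)     # step 2: POI set returned by the search engine
    scored = [(poi, similarity(poi, target)) for poi in candidates]  # step 3
    scored.sort(key=lambda pair: pair[1], reverse=True)
    # step 4: keep up to top_k candidates whose first similarity reaches the threshold
    return [pair for pair in scored[:top_k] if pair[1] >= threshold]
```

Injecting the search and similarity functions keeps the skeleton independent of any particular search engine or scoring scheme.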
2. The POI information differentiation method according to claim 1, characterized in that calculating the first similarity between each piece of POI information in the POI set and the POI information to be differentiated further comprises:
Assigning a weight to each second feature word in the POI information;
Calculating the second similarity between each second feature word and the existing POI query database;
Summing the products of the weight assigned to each second feature word in the POI information and its corresponding second similarity to obtain an operation result;
Taking this operation result as the first similarity between the POI information and the POI information to be differentiated.
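The weighted sum of claim 2 works for any subset of second feature words. A minimal sketch; the function name and the dictionary keys are hypothetical:

```python
from typing import Dict


def first_similarity(second_sims: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of weight * second similarity over the POI's second feature words."""
    missing = set(second_sims) - set(weights)
    if missing:
        raise KeyError(f"no weight assigned for: {sorted(missing)}")
    return sum(weights[field] * sim for field, sim in second_sims.items())
```

Raising on a missing weight avoids silently scoring a feature word as zero.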
3. The POI information differentiation method according to claim 2, characterized in that the second feature words of the POI information are one or more of name, address, phone number, and category;
When the second feature word is a name, its second similarity against the existing POI query database is the result of matching the name against that database;
When the second feature word is an address, its second similarity against the existing POI query database is obtained as follows: the address is divided by level into a plurality of sub-addresses, a weight is assigned to each sub-address, each sub-address is matched against the existing POI query database to obtain a sub-similarity, and the products of each sub-address weight and its corresponding sub-similarity are summed; the resulting sum is the second similarity;
When the second feature word is a phone number, its second similarity against the existing POI query database is the result of matching the phone number against that database;
When the second feature word is a category, its second similarity against the existing POI query database is the result of matching the category against that database.
4. The POI information differentiation method according to claim 3, characterized in that the second similarity score_addr of the address is calculated with the following formula:

$$\mathrm{score}_{\mathrm{addr}} = \sum_{k=1}^{n} \alpha_k \cdot \mathrm{Level}_k$$

where $n$ is the total number of levels into which the address is divided; $\mathrm{Level}_k$ is the sub-similarity of the sub-address matched at level $k$; $\alpha_k$ is the weight of the sub-address at the corresponding level, and $\sum_{k=1}^{n} \alpha_k = 1$.
5. The POI information differentiation method according to claim 3, characterized in that, when the second feature word is an address and both the address and the existing POI query database have coordinates, the distance between the address and the existing POI query database entry is also calculated and a third similarity is obtained from the calculated distance; this third similarity is compared with the similarity calculated from the sub-addresses into which the address is divided, and one of them is selected as the second similarity between the address and the existing POI query database.
6. The POI information differentiation method according to claim 5, characterized in that the third similarity is calculated with the following formula:

$$\mathrm{score}_{\mathrm{addr\_2}} = \frac{\mathit{dist}}{\mathit{dist\_kind}}$$

where dist is the distance between the address and the queried entry of the existing POI query database, and dist_kind is a predetermined maximum distance for each category.
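A sketch of the claim 6 computation, under stated assumptions: the claim does not say how the distance is measured, so the great-circle (haversine) distance on a spherical Earth is used here; the `dist_kind` values are hypothetical per-category maxima; and the ratio is reproduced literally as the claim gives it. The claim also does not state which of the two address similarities is selected, so the sketch stops at computing score_addr_2.

```python
import math
from typing import Dict

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def third_similarity(dist_m: float, kind: str, dist_kind: Dict[str, float]) -> float:
    """score_addr_2 = dist / dist_kind, taken literally from claim 6."""
    return dist_m / dist_kind[kind]
```

Keeping a separate maximum per category lets large venues (a park, an airport) tolerate larger coordinate offsets than a small shop.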
7. The POI information differentiation method according to any one of claims 3 to 6, characterized in that, when the second feature words in the POI information are the combination of name, address, phone number, and category, the first similarity score between this POI information and the POI information to be differentiated is:

$$\mathrm{score} = \alpha \cdot \mathrm{score}_{\mathrm{name}} + \beta \cdot \mathrm{score}_{\mathrm{addr}} + \chi \cdot \mathrm{score}_{\mathrm{phone}} + \delta \cdot \mathrm{score}_{\mathrm{kind}}$$

where α, β, χ, δ are the assigned weights and α + β + χ + δ = 1; score_name is the second similarity of the name, score_addr the second similarity of the address, score_phone the second similarity of the phone number, and score_kind the second similarity of the category.
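As a purely illustrative worked example of the claim 7 formula, take hypothetical weights α = 0.4, β = 0.3, χ = 0.2, δ = 0.1 (which sum to 1) and hypothetical second similarities score_name = 0.9, score_addr = 0.8, score_phone = 1, score_kind = 0:

$$\mathrm{score} = 0.4 \cdot 0.9 + 0.3 \cdot 0.8 + 0.2 \cdot 1 + 0.1 \cdot 0 = 0.36 + 0.24 + 0.20 + 0 = 0.80$$

Because the weights sum to 1 and each second similarity lies in [0, 1], the first similarity also stays in [0, 1], so a single threshold can be applied across candidates.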
8. A POI information differentiation device, characterized by comprising:
A feature word splitting module, configured to split the acquired POI information to be differentiated into a plurality of first feature words;
A POI set acquisition module, configured to combine the plurality of first feature words and query a search engine to obtain a POI set;
A similarity determination module, configured to calculate the first similarity between each piece of POI information in the POI set and the POI information to be differentiated;
An output module, configured to select, according to the first similarity, one or more pieces of POI information as the differentiation result of the POI information to be differentiated.
9. The POI information differentiation device according to claim 8, characterized in that the similarity determination module further comprises:
A weight allocation submodule, configured to assign a weight to each second feature word in the POI information;
A similarity calculation submodule, configured to calculate the second similarity between each second feature word and the existing POI query database;
A summation submodule, configured to sum the products of the weight assigned to each second feature word in the POI information and its corresponding second similarity;
An operation result output submodule, configured to output this operation result as the first similarity.
10. The POI information differentiation device according to claim 9, characterized in that the similarity calculation submodule further operates as follows:
When the second feature word is a name, its second similarity against the existing POI query database is the result of matching the name against that database;
When the second feature word is an address, its second similarity against the existing POI query database is obtained as follows: the address is divided by level into a plurality of sub-addresses, a weight is assigned to each sub-address, each sub-address is matched against the existing POI query database to obtain a sub-similarity, and the products of each sub-address weight and its corresponding sub-similarity are summed; the resulting sum is the second similarity;
When the second feature word is a phone number, its second similarity against the existing POI query database is the result of matching the phone number against that database;
When the second feature word is a category, its second similarity against the existing POI query database is the result of matching the category against that database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310125396.8A CN104102667A (en) | 2013-04-11 | 2013-04-11 | POI (Point of Interest) information differentiation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310125396.8A CN104102667A (en) | 2013-04-11 | 2013-04-11 | POI (Point of Interest) information differentiation method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104102667A true CN104102667A (en) | 2014-10-15 |
Family
ID=51670826
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310125396.8A Pending CN104102667A (en) | 2013-04-11 | 2013-04-11 | POI (Point of Interest) information differentiation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104102667A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105045880A (en) * | 2015-07-22 | 2015-11-11 | 福州大学 | Fuzzy matching method for interest points of different data sources |
CN105718470A (en) * | 2014-12-03 | 2016-06-29 | 高德软件有限公司 | POI (Point of Interest) data processing method and device |
CN105740252A (en) * | 2014-12-09 | 2016-07-06 | 北京四维图新科技股份有限公司 | Processing method and processing device of point of interest POI data |
CN106294458A (en) * | 2015-05-29 | 2017-01-04 | 北京四维图新科技股份有限公司 | A kind of map point of interest update method and device |
CN107850674A (en) * | 2015-05-20 | 2018-03-27 | 诺基亚技术有限公司 | The method and apparatus for obtaining differential positional information |
CN108628811A (en) * | 2018-04-10 | 2018-10-09 | 北京京东尚科信息技术有限公司 | The matching process and device of address text |
CN109284449A (en) * | 2018-10-23 | 2019-01-29 | 厦门大学 | The recommended method and device of point of interest |
CN110674419A (en) * | 2019-01-25 | 2020-01-10 | 北京嘀嘀无限科技发展有限公司 | Geographic information retrieval method and device, electronic equipment and readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101388023A (en) * | 2008-09-12 | 2009-03-18 | 北京搜狗科技发展有限公司 | Electronic map interest point data redundant detecting method and system |
JP2011003151A (en) * | 2009-06-22 | 2011-01-06 | Kddi Corp | Similarity calculation device, recommended poi determination device, poi recommendation system, similarity calculation method and program |
JP2011154004A (en) * | 2010-01-28 | 2011-08-11 | Kddi Corp | Poi attribute determination device, poi recommendation server, and poi recommendation system |
CN102789467A (en) * | 2011-05-20 | 2012-11-21 | 腾讯科技(深圳)有限公司 | Data fusion method, data fusion device and data processing system |
CN102880647A (en) * | 2012-08-24 | 2013-01-16 | 北京百度网讯科技有限公司 | Method and device for acquiring another name of organization |
2013
- 2013-04-11: CN201310125396.8A filed in CN, published as CN104102667A (active, pending)
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105718470A (en) * | 2014-12-03 | 2016-06-29 | 高德软件有限公司 | POI (Point of Interest) data processing method and device |
CN105718470B (en) * | 2014-12-03 | 2019-08-20 | 高德软件有限公司 | A kind of POI data treating method and apparatus |
CN105740252A (en) * | 2014-12-09 | 2016-07-06 | 北京四维图新科技股份有限公司 | Processing method and processing device of point of interest POI data |
CN107850674A (en) * | 2015-05-20 | 2018-03-27 | 诺基亚技术有限公司 | The method and apparatus for obtaining differential positional information |
CN106294458A (en) * | 2015-05-29 | 2017-01-04 | 北京四维图新科技股份有限公司 | A kind of map point of interest update method and device |
CN105045880A (en) * | 2015-07-22 | 2015-11-11 | 福州大学 | Fuzzy matching method for interest points of different data sources |
CN105045880B (en) * | 2015-07-22 | 2018-09-18 | 福州大学 | A kind of Method of Fuzzy Matching of the point of interest of different data sources |
CN108628811A (en) * | 2018-04-10 | 2018-10-09 | 北京京东尚科信息技术有限公司 | The matching process and device of address text |
CN108628811B (en) * | 2018-04-10 | 2022-04-12 | 北京京东尚科信息技术有限公司 | Address text matching method and device |
CN109284449A (en) * | 2018-10-23 | 2019-01-29 | 厦门大学 | The recommended method and device of point of interest |
CN109284449B (en) * | 2018-10-23 | 2020-06-16 | 厦门大学 | Interest point recommendation method and device |
CN110674419A (en) * | 2019-01-25 | 2020-01-10 | 北京嘀嘀无限科技发展有限公司 | Geographic information retrieval method and device, electronic equipment and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104102667A (en) | POI (Point of Interest) information differentiation method and device | |
Wang et al. | Protecting personal trajectories of social media users through differential privacy | |
Zheng et al. | Reference-based framework for spatio-temporal trajectory compression and query processing | |
US8478704B2 (en) | Decomposable ranking for efficient precomputing that selects preliminary ranking features comprising static ranking features and dynamic atom-isolated components | |
CN103605752A (en) | Address matching method based on semantic recognition | |
CN101477542A (en) | Sampling analysis method, system and equipment | |
CN105468677A (en) | Log clustering method based on graph structure | |
CN107291847A (en) | A kind of large-scale data Distributed Cluster processing method based on MapReduce | |
CN106503223B (en) | online house source searching method and device combining position and keyword information | |
CN103744934A (en) | Distributed index method based on LSH (Locality Sensitive Hashing) | |
CN102867066B (en) | Data Transform Device and data summarization method | |
CN104317801A (en) | Data cleaning system and method for aiming at big data | |
CN103116610A (en) | Vector space big data storage method based on HBase | |
CN111307164B (en) | Low-sampling-rate track map matching method | |
CN106227726A (en) | A kind of path extraction method based on track of vehicle data | |
CN110069500B (en) | Dynamic mixed indexing method for non-relational database | |
CN103761251A (en) | Storing and finding method for large-data-volume client information | |
CN107145523A (en) | Large-scale Heterogeneous Knowledge storehouse alignment schemes based on Iterative matching | |
CN101299218B (en) | Method and device for searching three-dimensional model | |
CN105183795A (en) | Content based remote sensing image change detection information retrieval method | |
Yan et al. | Context-aware query recommendation by learning high-order relation in query logs | |
CN106126681B (en) | A kind of increment type stream data clustering method and system | |
CN103886072A (en) | Retrieved result clustering system in coal mine search engine | |
Wang et al. | Improved KNN algorithm based on preprocessing of center in smart cities | |
Wang et al. | User oriented trajectory similarity search |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20141015 |