WO2020052338A1 - Address identifier and longitude and latitude thereof mining - Google Patents

Address identifier and longitude and latitude thereof mining

Info

Publication number
WO2020052338A1
Authority
WO
WIPO (PCT)
Prior art keywords
latitude
longitude
address
address identifier
longitude information
Prior art date
Application number
PCT/CN2019/095106
Other languages
French (fr)
Chinese (zh)
Inventor
朱静雅
朱青祥
李�诚
Original Assignee
北京三快在线科技有限公司 (Beijing Sankuai Online Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京三快在线科技有限公司
Publication of WO2020052338A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Definitions

  • This application relates to navigation and positioning technology, and in particular to address identifiers and the mining of their latitude and longitude.
  • Address identifiers, such as road names and house numbers, are generally used to distinguish physical locations. Obtaining the location information of address identifiers allows positioning capability to be refined to the level of individual address identifiers.
  • Businesses such as online maps or electronic maps generally obtain address identifiers and their latitude and longitude manually, regularly sending staff out onto the streets to collect them.
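  • A method for mining an address identifier and its latitude and longitude is provided, including:
  • obtaining raw data used to mine the address identifier and its latitude and longitude, the raw data including point of interest (POI) data and/or user-generated content (UGC) behavior data;
  • obtaining the address identifier in the raw data and the latitude and longitude information corresponding to the address identifier; and, for an address identifier corresponding to multiple pieces of latitude and longitude information, determining the final latitude and longitude information corresponding to the address identifier by a clustering algorithm.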
  • A device for mining an address identifier and its latitude and longitude is also provided, including:
  • a raw data acquisition module configured to obtain raw data used to mine the address identifier and its latitude and longitude, the raw data including point of interest data and/or user-generated content behavior data;
  • a data mining module configured to obtain the address identifier in the raw data and the latitude and longitude information corresponding to the address identifier; and
  • a final latitude and longitude information confirmation module configured to determine, for an address identifier corresponding to multiple pieces of latitude and longitude information, the final latitude and longitude information corresponding to the address identifier by using a clustering algorithm.
  • An electronic device is also provided, including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the foregoing method for mining an address identifier and its latitude and longitude.
  • A readable storage medium is also provided; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device performs the foregoing method for mining an address identifier and its latitude and longitude.
  • With the above solution, raw data used to mine address identifiers and their latitude and longitude is obtained; the address identifiers in the raw data and their corresponding latitude and longitude information are extracted; and, for an address identifier corresponding to multiple pieces of latitude and longitude information, the final latitude and longitude information is determined by a clustering algorithm.
  • FIG. 1 shows a flowchart of a method for mining an address identifier and its latitude and longitude according to an embodiment of the present application.
  • FIG. 2 shows a flowchart of a method for mining an address identifier and its latitude and longitude according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of density clustering according to an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a device for mining an address identifier and its latitude and longitude according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a device for mining an address identifier and its latitude and longitude according to an embodiment of the present application.
  • FIG. 6 schematically illustrates a block diagram of an electronic device for performing the method according to the present application.
  • FIG. 7 schematically illustrates a storage unit for holding or carrying program code that implements the method according to the present application.
  • Referring to FIG. 1, a flowchart of the steps of a method for mining an address identifier and its latitude and longitude according to an embodiment of the present application is shown.
  • Step 110: Obtain raw data used to mine the address identifier and its latitude and longitude.
  • POI stands for Point of Interest, and UGC stands for User Generated Content.
  • A POI can also be called a "Point of Information", that is, an information point.
  • POI data may include, but is not limited to, a name, address, longitude, latitude, and category.
  • On an electronic map, a POI is generally indicated by a bubble icon; attractions, government agencies, companies, shopping malls, restaurants, and the like on the map are all POIs.
  • UGC originated in the Internet field: users display their original content on an Internet platform or provide it to other users.
  • UGC is not a specific business but a new way of using the Internet, in which users shift from mainly downloading content to both downloading and uploading it.
  • The UGC behavior data in the embodiments of the present application may include, but is not limited to, any data corresponding to user-generated content, such as status updates, posted logs, published photos, published reviews, error-report behavior data, and addition behavior data.
  • The error-report behavior data may include, but is not limited to, reports that an address is wrong and/or that the latitude and longitude information corresponding to an address is wrong.
  • The addition behavior data may include, but is not limited to, a newly added address and/or newly added latitude and longitude information for an address.
  • The raw data used to mine address identifiers and their latitude and longitude can be obtained by any available method, and its specific content can be preset according to requirements; neither is limited in the embodiments of the present application.
  • The raw data may include, but is not limited to, point of interest data and/or user-generated content behavior data.
  • For example, the Meituan review platform has a large number of users, so it holds many addresses and latitudes and longitudes in its POI data, as well as addresses and latitudes and longitudes produced by users' additions and error reports. Addresses and their latitudes and longitudes can therefore be mined from the platform's POI data and UGC behavior data, in which case these data serve as the raw data.
  • The specific source and acquisition channel of the raw data can be set according to requirements, which is not limited in the embodiments of the present application.
  • Step 120: Obtain the address identifier in the raw data and the latitude and longitude information corresponding to the address identifier.
  • The obtained raw data can include data such as addresses, longitudes, and latitudes.
  • An address can generally be specified down to the province, city, district/county, township, village, street, house number, and so on. Therefore, in the embodiments of the present application, after the raw data is obtained, the address identifiers in it and their corresponding latitude and longitude information may be extracted by any available method, which is not limited in the embodiments of the present application.
  • The address identifier may include a road name and a house number and, of course, may also include one or more of a country name, province name, city name, district name, county name, township name, and village name.
  • The information included in the address identifier can be preset according to requirements, which is not limited in the embodiments of the present application.
  • The latitude and longitude information may include the longitude and/or latitude corresponding to the address identifier.
  • Step 130: For an address identifier corresponding to multiple pieces of latitude and longitude information, determine the final latitude and longitude information corresponding to the address identifier through a clustering algorithm.
  • The same address identifier may correspond to multiple pieces of latitude and longitude information that are not exactly the same, so its accurate latitude and longitude cannot be read off directly. Therefore, in the embodiments of the present application, for an address identifier corresponding to multiple pieces of latitude and longitude information, the final latitude and longitude information may be determined through any kind of clustering algorithm.
  • The clustering method may be set in advance according to requirements, which is not limited in the embodiments of the present application. For example, any of the following may be used:
  • K-Means clustering
  • mean-shift clustering
  • density-based clustering
  • expectation-maximization (EM) clustering using a Gaussian mixture model
  • hierarchical clustering
  • graph community detection clustering
  • For an address identifier that corresponds to only one piece of latitude and longitude information, that information may be used directly as the final latitude and longitude information of the address identifier.
  • In this embodiment, raw data used to mine address identifiers and their latitude and longitude is obtained; the address identifiers in the raw data and their corresponding latitude and longitude information are obtained; and, for an address identifier corresponding to multiple pieces of latitude and longitude information, the final latitude and longitude information is determined by a clustering algorithm. This reduces the labor cost of obtaining address identifiers and their latitude and longitude and improves timeliness.
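  • The three steps of this embodiment can be sketched as a small Python function. This is a minimal illustration, not the applicant's implementation: it assumes coordinates have already been grouped per address identifier as (longitude, latitude) tuples, and `final_coordinates` and `naive_centroid` are hypothetical names. The clustering step is passed in as a function; `naive_centroid` is only a placeholder that averages every point, whereas the second embodiment below uses density clustering.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

def final_coordinates(
    groups: Dict[str, List[Point]],
    cluster_fn: Callable[[List[Point]], Point],
) -> Dict[str, Point]:
    """Map each address identifier to a single final (longitude, latitude)."""
    result: Dict[str, Point] = {}
    for identifier, points in groups.items():
        if len(points) == 1:
            result[identifier] = points[0]  # one candidate: use it directly
        else:
            result[identifier] = cluster_fn(points)  # several: let clustering decide
    return result

def naive_centroid(points: List[Point]) -> Point:
    """Placeholder for a real clustering step: average of all points."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))
```

  Keeping the clustering step pluggable mirrors the text's point that any clustering algorithm may be chosen in advance.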
  • Referring to FIG. 2, a flowchart of a method for mining an address identifier and its latitude and longitude according to an embodiment of the present application is shown.
  • Step 210: Obtain raw data used to mine the address identifier and its latitude and longitude.
  • Step 220: Obtain the address latitude and longitude data in the raw data.
  • The address latitude and longitude data includes address data, latitude and longitude data, and the correspondence between addresses and latitudes and longitudes.
  • The raw data generally includes data such as addresses and latitudes and longitudes. A latitude and longitude usually gives the position of some place, and the same place can also be described by an address, so there is a correspondence between the two. Therefore, in the embodiments of the present application, the address latitude and longitude data in the raw data can be obtained.
  • Step 230: Structure the address data based on the geographic knowledge base, and take the road name and house number obtained after structuring as the address identifier corresponding to the address data.
  • The geographic knowledge base includes a library of geographic information entities and the relationships between those entities.
  • The geographic information entity library may include different country names, province names, city names, county names, township names, road names, and so on: for example, cities such as Shanghai, Beijing, and Chengdu; districts and counties such as Changning District and Huayin County; and roads and streets such as Anhua Road, Chang'an Street, and Zhuque Street.
  • The relationships between geographic information entities may include inclusion or subordination relationships between entities at different levels. For example, "Shanghai" includes "Changning District", "Changning District" includes "Anhua Road", and "Changning District" is subordinate to "Shanghai".
  • The acquired address data may be structured based on the geographic knowledge base to obtain structured address data that matches the geographic information entities in the knowledge base. For example, structuring the address data "Debi Yiyuan, No. 492 Anhua Road, Changning District" yields "Changning District", "Anhua Road", "492", and "Debi Yiyuan", whose types are "district", "road", "house number", and "landmark", respectively.
  • Structuring in this way accurately extracts the road name and house number from the address data, and they are then taken as the address identifier corresponding to that address data.
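  • A toy version of the knowledge-base structuring in Step 230 might look like the following. It assumes a tiny hypothetical knowledge base `GEO_KB`, a greedy longest-match over Chinese address text, and a regular expression for house numbers of the form "492号"; the input string is an assumed Chinese form of the example address above. `structure_address` and `address_identifier` are hypothetical names, and a production system would rely on a full knowledge base and a proper address parser.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy knowledge base: entity name -> entity type.
GEO_KB: Dict[str, str] = {
    "上海": "city",
    "长宁区": "district",
    "安化路": "road",
}
HOUSE_NUMBER = re.compile(r"(\d+)号")  # e.g. "492号" -> "492"

def structure_address(address: str) -> List[Tuple[str, str]]:
    """Split an address into (text, type) parts using greedy longest match."""
    names = sorted(GEO_KB, key=len, reverse=True)  # try longer entity names first
    parts: List[Tuple[str, str]] = []
    i = 0
    while i < len(address):
        m = HOUSE_NUMBER.match(address, i)
        if m:
            parts.append((m.group(1), "house number"))
            i = m.end()
            continue
        name = next((n for n in names if address.startswith(n, i)), None)
        if name is None:
            # Nothing in the knowledge base matches: treat the rest as a landmark.
            parts.append((address[i:], "landmark"))
            break
        parts.append((name, GEO_KB[name]))
        i += len(name)
    return parts

def address_identifier(parts: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Road name + house number, or None if either is missing."""
    road = next((t for t, kind in parts if kind == "road"), None)
    number = next((t for t, kind in parts if kind == "house number"), None)
    return (road, number) if road and number else None
```

  Anything after the last recognized entity is labeled a landmark, which is how "德必易园" (Debi Yiyuan) receives that type.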
  • Step 240: Based on the correspondence between addresses and latitudes and longitudes, use the latitude and longitude data corresponding to the address data as the latitude and longitude information corresponding to the address identifier.
  • That is, the latitude and longitude data corresponding to a piece of address data becomes the latitude and longitude information of that data's address identifier.
  • For example, if the latitude and longitude data corresponding to the address data "Debi Yiyuan, No. 492 Anhua Road, Changning District" includes (longitude 1, latitude 1) and (longitude 2, latitude 2), then the latitude and longitude information corresponding to its address identifier "492 Anhua Road" is (longitude 1, latitude 1) and (longitude 2, latitude 2).
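  • Step 240 amounts to attaching every coordinate of an address to that address's identifier, so that different addresses sharing an identifier pool their coordinates. A minimal sketch, assuming records already reduced to (identifier, coordinates) pairs with `None` where no identifier could be extracted; `group_by_identifier` is a hypothetical name, and the identifiers and coordinates in the test are placeholders, not data from the document.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

def group_by_identifier(
    records: Iterable[Tuple[Optional[str], List[Point]]],
) -> Dict[str, List[Point]]:
    """Pool the coordinates of all addresses that share an address identifier."""
    groups: Dict[str, List[Point]] = defaultdict(list)
    for identifier, points in records:
        if identifier is None:
            continue  # no road name + house number could be extracted
        groups[identifier].extend(points)
    return dict(groups)
```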
  • Step 250: For an address identifier corresponding to multiple pieces of latitude and longitude information, use a density clustering algorithm to cluster those pieces according to a first distance threshold and a first sample threshold, obtaining at least one cluster.
  • A density-based clustering algorithm, also called a density clustering algorithm, assumes that each cluster in the sample space consists of a group of dense sample points and that these groups are separated by low-density regions (noise); its goal is to filter out the low-density regions and find the dense groups of sample points.
  • Latitude and longitude information can be represented as coordinate points, and the accuracy of the latitude and longitude information for an address identifier can be judged from the number of coordinate points in a cluster. Therefore, considering both accuracy and operability, the embodiments of the present application preferably use a density clustering algorithm to determine the final latitude and longitude information of an address identifier corresponding to multiple pieces of latitude and longitude information.
  • The density clustering algorithm can be any density clustering algorithm, such as DBSCAN (Density-Based Spatial Clustering of Applications with Noise), OPTICS (Ordering Points To Identify the Clustering Structure), or DENCLUE (DENsity-based CLUstEring).
  • A density clustering algorithm can be used to cluster the multiple pieces of latitude and longitude information corresponding to the same address identifier into at least one cluster. The largest cluster is then selected, and the average of the latitude and longitude information in it is used as the final latitude and longitude information of the address identifier, which is more accurate.
  • A density clustering algorithm generally assumes that categories are determined by how closely samples are distributed: samples of the same category are closely connected, so for any sample of a category, other samples of that category lie nearby. Grouping closely connected samples into one category yields a cluster, and dividing all closely connected samples into different categories yields the final clustering result.
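  • The DBSCAN density clustering algorithm describes how closely a sample set is distributed using neighborhoods, characterized by the parameter pair $(\varepsilon, \mathrm{MinPts})$.
  • $\varepsilon$ is the neighborhood distance threshold of a sample, that is, the first distance threshold.
  • $\mathrm{MinPts}$ is the threshold on the number of samples within distance $\varepsilon$ of a sample, that is, the first sample threshold.
  • Let the sample set be $D = (x_1, x_2, \ldots, x_m)$. The $\varepsilon$-neighborhood of $x_j \in D$ is the set of samples in $D$ whose distance from $x_j$ is at most $\varepsilon$. If the $\varepsilon$-neighborhood of $x_j$ contains at least $\mathrm{MinPts}$ samples, $x_j$ is a core object. If $x_j$ is a core object and $x_i$ lies in its $\varepsilon$-neighborhood, $x_i$ is directly density-reachable from $x_j$; if there is a chain of samples $p_1 = x_j, p_2, \ldots, p_T = x_i$ in which each $p_{t+1}$ is directly density-reachable from $p_t$, then $x_i$ is density-reachable from $x_j$.
  • Density-connected: for $x_i$ and $x_j$, if there is a core object sample $x_k$ such that both $x_i$ and $x_j$ are density-reachable from $x_k$, then $x_i$ and $x_j$ are said to be density-connected. The density-connected relation is symmetric.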
  • In this embodiment, the DBSCAN density clustering algorithm is used to cluster the multiple pieces of latitude and longitude information according to the first distance threshold and the first sample threshold, obtaining at least one cluster.
  • The specific values of the first distance threshold and the first sample threshold can be preset according to requirements, which is not limited in the embodiments of the present application.
  • The latitude and longitude points associated with each address identifier can be plotted on the map, and the distances between them calculated.
  • As shown in FIG. 3, suppose one address identifier corresponds to multiple pieces of latitude and longitude information; each sample point in FIG. 3 corresponds to one piece.
  • The multiple latitude and longitude points corresponding to one address identifier may be clustered into at least one cluster.
  • In FIG. 3, two clusters are obtained, one on the left and one on the right, each formed by the hyperspheres of core objects linked by a continuous chain of directed line segments.
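  • Step 250 and the DBSCAN definitions above can be illustrated with a from-scratch sketch. It assumes great-circle (haversine) distance in meters, which the document does not specify; `eps_m` plays the role of the first distance threshold ε and `min_pts` the first sample threshold MinPts, counted including the point itself. The function names are hypothetical, and the O(n²) neighborhood search is only acceptable because each address identifier usually has few points.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees
NOISE = -1
EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius

def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in meters."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def dbscan(points: List[Point], eps_m: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighborhood(i: int) -> List[int]:
        # eps-neighborhood of point i; it contains i itself
        return [j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighborhood(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # not a core object; may become a border point later
            continue
        labels[i] = cluster_id  # i is a core object and starts a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            j_seeds = neighborhood(j)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # j is also a core object: keep expanding
        cluster_id += 1
    return labels
```

  scikit-learn's `sklearn.cluster.DBSCAN` with `metric="haversine"` is a common alternative; it expects [latitude, longitude] in radians, so `eps` must then also be given in radians (meters divided by the Earth's radius).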
  • Step 260: Select the largest cluster from the at least one cluster.
  • The largest cluster may be selected from the clusters obtained by density clustering in order to determine the final latitude and longitude information of the corresponding address identifier.
  • Any available method may be used to select the largest cluster, which is not limited in the embodiments of the present application.
  • For example, the cluster containing the most core objects may be taken as the largest cluster, or the cluster containing the most sample points, that is, the most latitude and longitude information, may be taken as the largest cluster.
  • Step 260 may further include:
  • Sub-step 261: Take the cluster containing the largest amount of latitude and longitude information as the largest cluster.
  • That is, among the clusters corresponding to an address identifier, the one containing the most latitude and longitude information may be used as the largest cluster of that address identifier.
  • In FIG. 3, for example, each sample point corresponds to one piece of latitude and longitude information. The cluster on the left contains the most latitude and longitude information, so it can be taken as the largest cluster of the corresponding address identifier.
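  • Sub-step 261 reduces to counting labels. A minimal sketch, assuming cluster labels in the form a DBSCAN-style routine produces (non-negative cluster ids, -1 for noise); `largest_cluster` is a hypothetical helper. Ties go to whichever cluster id appears first, a choice the document leaves open.

```python
from collections import Counter
from typing import List, Optional

NOISE = -1

def largest_cluster(labels: List[int]) -> Optional[int]:
    """Return the id of the cluster holding the most points, ignoring noise."""
    counts = Counter(label for label in labels if label != NOISE)
    if not counts:
        return None  # every point was noise
    # most_common orders equal counts by first appearance
    return counts.most_common(1)[0][0]
```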
  • Step 270: Average the latitude and longitude information in the largest cluster, and use the averaged latitude and longitude as the final latitude and longitude information corresponding to the address identifier.
  • Each sample point in a cluster corresponds to one sample, and a sample in the embodiments of the present application is a piece of latitude and longitude information, so each sample point in a cluster corresponds to one piece of latitude and longitude information.
  • The latitude and longitude information of all sample points in the largest cluster can therefore be averaged to obtain the final latitude and longitude information of the address identifier.
  • Specifically, the longitudes of the sample points in the largest cluster can be averaged and their latitudes averaged separately; the averaged longitude and latitude are then combined to form the final latitude and longitude information of the corresponding address identifier.
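  • Step 270's separate averaging of longitudes and latitudes is sketched below; `average_point` is a hypothetical name. A plain arithmetic mean is adequate for the small, tight clusters expected here, but it would misbehave for a cluster straddling the ±180° meridian, which this sketch does not handle.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

def average_point(points: List[Point]) -> Point:
    """Average longitudes and latitudes separately, then recombine them."""
    if not points:
        raise ValueError("cannot average an empty cluster")
    lon = sum(p[0] for p in points) / len(points)
    lat = sum(p[1] for p in points) / len(points)
    return (lon, lat)
```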
  • Step 280: Obtain a first quantity, namely the amount of latitude and longitude information contained in the largest cluster corresponding to the address identifier, and a first distance, namely the distance between the final latitude and longitude information corresponding to the address identifier and the road corresponding to the address identifier.
  • The final latitude and longitude information alone does not indicate how well it matches the corresponding address identifier.
  • Therefore, the confidence of each address identifier and its final latitude and longitude information may be further determined.
  • Specifically, this confidence can be determined from the first quantity and the first distance.
  • To do so, both the first quantity and the first distance need to be obtained.
  • They may be obtained by any available method. For example, each sample point in the largest cluster corresponds to one piece of latitude and longitude information, so counting the sample points in the largest cluster yields the first quantity.
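  • Step 280 needs the road geometry, which the document does not describe. The sketch below assumes the road is available as a polyline of (longitude, latitude) vertices and measures point-to-segment distance after a local equirectangular projection, an approximation that is accurate over the few hundred meters relevant here. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees
EARTH_RADIUS_M = 6_371_008.8

def _to_local_xy(p: Point, origin: Point) -> Tuple[float, float]:
    """Equirectangular projection to meters around `origin` (fine for short spans)."""
    x = math.radians(p[0] - origin[0]) * EARTH_RADIUS_M * math.cos(math.radians(origin[1]))
    y = math.radians(p[1] - origin[1]) * EARTH_RADIUS_M
    return x, y

def distance_to_road_m(point: Point, road: List[Point]) -> float:
    """Shortest distance from `point` to a road polyline, in meters."""
    if not road:
        raise ValueError("road has no vertices")
    if len(road) == 1:
        return math.hypot(*_to_local_xy(road[0], point))
    best = math.inf
    for a, b in zip(road, road[1:]):
        # `point` is the projection origin, so it sits at (0, 0).
        ax, ay = _to_local_xy(a, point)
        bx, by = _to_local_xy(b, point)
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        # Parameter of the origin's projection onto segment AB, clamped to [0, 1].
        t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, -(ax * dx + ay * dy) / seg_len2))
        best = min(best, math.hypot(ax + t * dx, ay + t * dy))
    return best

def first_quantity_and_distance(
    cluster_points: List[Point], final_point: Point, road: List[Point]
) -> Tuple[int, float]:
    """First quantity = points in the largest cluster; first distance = final point to road."""
    return len(cluster_points), distance_to_road_m(final_point, road)
```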
  • Step 290: Determine the confidence of the address identifier and the final latitude and longitude information according to the first quantity and the first distance.
  • The confidence characterizes the accuracy of the address identifier and its final latitude and longitude information.
  • The correspondence between the first quantity, the first distance, and the confidence can be preset according to requirements, which is not limited in the embodiments of the present application.
  • For example, the confidence can be set to increase with the first quantity and decrease with the first distance; the greater the confidence, the more accurate the address identifier and its final latitude and longitude information.
  • Step 290 may further include:
  • Sub-step 291: Determine an initial score of the address identifier and the final latitude and longitude information according to the first distance, a preset basic score, and a preset distance threshold.
  • The preset basic score and the preset distance threshold can be set in advance according to requirements, which is not limited in the embodiments of the present application.
  • The mapping from the first distance, the preset basic score, and the preset distance threshold to the initial score may also be set in advance according to requirements or experience, which is not limited in the embodiments of the present application.
  • Sub-step 292: Determine a penalty loss score of the address identifier and the final latitude and longitude information according to the first quantity.
  • That is, the penalty loss score of the address identifier and its final latitude and longitude information may be determined from the amount of latitude and longitude information contained in the largest cluster corresponding to the address identifier.
  • The correspondence between the first quantity and the penalty loss score can be set in advance according to requirements or experience, which is not limited in the embodiments of the present application; for example, the penalty loss score may decrease linearly as the first quantity increases.
  • Sub-step 293: Determine the confidence of the address identifier and the final latitude and longitude information based on the preset basic score, the initial score, and the penalty loss score.
  • The correspondence between the confidence and the preset basic score, the initial score, and the penalty loss score can be set in advance according to requirements or experience, which is not limited in the embodiments of the present application.
  • Specifically, the confidence of the address identifier and the final latitude and longitude information can be determined by the formula $C = \mathrm{baseScore} + \left(1 - \frac{D}{\mathrm{threshold}}\right)^{2} \times (100 - \mathrm{baseScore}) - \mathrm{cntLossScore}$, where $C$ is the confidence of the address identifier and the final latitude and longitude information, baseScore is the preset basic score, $D$ is the first distance, threshold is the preset distance threshold, and cntLossScore is the penalty loss score determined from the first quantity. When the first distance is greater than the preset distance threshold, the value of $\left(1 - \frac{D}{\mathrm{threshold}}\right)$ is taken as 0.
  • The confidence may be expressed on a 100-point scale; the larger the value, the more accurate the address identifier and its final latitude and longitude information.
  • The specific values of baseScore, threshold, and cntLossScore can be preset according to requirements, which is not limited in the embodiments of the present application.
  • If the first distance corresponding to an address identifier is greater than the preset distance threshold, the final latitude and longitude of the address identifier is far from the road it belongs to, indicating a large deviation.
  • In that case, the value of $\left(1 - \frac{D}{\mathrm{threshold}}\right)$ in the above formula can be set directly to 0, which lowers the confidence of the address identifier and its final latitude and longitude information.
  • The initial score is $\mathrm{InitialScore} = \left(1 - \frac{D}{\mathrm{threshold}}\right)^{2} \times (100 - \mathrm{baseScore})$.
  • The above uses the square of $\left(1 - \frac{D}{\mathrm{threshold}}\right)$ to calculate the initial score. In practical applications, its N-th power may be used instead, where N can be set in advance according to requirements or experience, which is not limited in the embodiments of the present application.
  • For example, the preset basic score is 35 and the preset distance threshold is 500 meters; the penalty loss score is 20 when the first quantity is at most 3, 10 when it is greater than 3 and at most 6, 5 when it is greater than 6 and at most 10, and 0 when it is greater than 10.
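  • The confidence formula with the example parameters above can be written as follows. `confidence` and `penalty_loss_score` are hypothetical names; the defaults (35, 500 meters, squared term) and the penalty table come from the example values in the text, and `power` stands for the N mentioned as an alternative exponent.

```python
def penalty_loss_score(first_quantity: int) -> float:
    """cntLossScore from the number of points in the largest cluster (example table)."""
    if first_quantity <= 3:
        return 20
    if first_quantity <= 6:
        return 10
    if first_quantity <= 10:
        return 5
    return 0

def confidence(
    first_distance_m: float,
    first_quantity: int,
    base_score: float = 35,
    threshold_m: float = 500,
    power: int = 2,
) -> float:
    """C = baseScore + (1 - D/threshold)^N * (100 - baseScore) - cntLossScore."""
    ratio = max(0.0, 1 - first_distance_m / threshold_m)  # taken as 0 when D > threshold
    initial_score = ratio ** power * (100 - base_score)
    return base_score + initial_score - penalty_loss_score(first_quantity)
```

  Clamping `1 - D/threshold` at 0 matters: with an even exponent, a negative value would otherwise be squared into a large positive initial score, rewarding points that lie far from the road.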
  • the method further includes:
  • Step 2110: When a positioning service request is received, respond to the positioning service request according to its accuracy requirement and the confidence of the address identifier and the final latitude and longitude information.
  • Each address identifier and its final latitude and longitude information can be applied according to its confidence. For example, when a positioning service request is received, the response may take into account both the accuracy requirement of the request and the confidence of the address identifier and its final latitude and longitude information: if the request has a high accuracy requirement, only address identifiers with high confidence and their final latitude and longitude information are used to respond; if the accuracy requirement is low, address identifiers with lower confidence may also be used.
  • The correspondence between the accuracy requirement of a positioning service request and the confidence can be set in advance according to requirements, which is not limited in the embodiments of the present application.
  • For example, the accuracy requirement can directly require the confidence to be above a preset score, or different accuracy requirement levels can be defined, each corresponding to a different range of confidence values.
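  • One way to realize tiered accuracy requirements for Step 2110 is a lookup from level to minimum confidence. The level names and cut-off values in `MIN_CONFIDENCE` are hypothetical, since the document leaves them to be preset; `respond` is likewise a hypothetical name and returns the mined coordinates only when their confidence meets the request's level.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

# Hypothetical accuracy levels and the minimum confidence each one requires.
MIN_CONFIDENCE: Dict[str, float] = {"high": 80.0, "medium": 60.0, "low": 0.0}

def respond(
    identifier: str,
    accuracy: str,
    mined: Dict[str, Tuple[Point, float]],
) -> Optional[Point]:
    """Return the final coordinates for `identifier` if confident enough, else None."""
    entry = mined.get(identifier)
    if entry is None:
        return None  # identifier was never mined
    point, conf = entry
    return point if conf >= MIN_CONFIDENCE[accuracy] else None
```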
  • Step 2120: Adjust the address of the merchant corresponding to the final latitude and longitude information according to the confidence of the address identifier and the final latitude and longitude information.
  • Merchant addresses can also be calibrated based on the confidence. Consider a merchant located at a given latitude and longitude. Whether to change that merchant's address to the address identifier associated with this latitude and longitude is decided by the confidence of the address identifier for which it is the final latitude and longitude information.
  • A confidence threshold can be set in advance as needed.
  • If the confidence reaches the threshold, the address identifier part of the corresponding merchant's address can be changed to the address identifier corresponding to the final latitude and longitude information. Alternatively, the merchant can be prompted to make this change and can decide whether to accept it. Otherwise, the address is left unchanged and the merchant is not prompted.
  • Step 2130: Update the geographic knowledge base according to the adjusted merchant address and the relationship between the merchant address and the road.
  • The geographic knowledge base may be updated according to the adjusted merchant address and the relationship between the merchant address and the road.
  • Specifically, the correspondence recorded in the geographic knowledge base between the merchant and its pre-adjustment address can be cancelled or deleted. The correspondence between the merchant and its adjusted address, together with the relationship between the adjusted address and the road, can then be recorded in the geographic knowledge base.
  • In summary, the method proceeds in three steps:
    1. Raw data used to mine address identifiers and their latitude and longitude is obtained. The raw data includes point of interest data and/or user-generated content behavior data.
    2. The address identifiers in the raw data, and the latitude and longitude information corresponding to each, are obtained.
    3. For an address identifier corresponding to multiple pieces of latitude and longitude information, the final latitude and longitude information is determined by a clustering algorithm.
    This reduces the labor cost of obtaining address identifiers and their latitude and longitude, and it improves timeliness.
  • Address latitude and longitude data may also be obtained from the raw data. This data includes address data, latitude and longitude data, and the correspondence between addresses and latitude and longitude. The address data is structured based on a geographic knowledge base, which includes a library of geographic information entities and the relationships between those entities. The resulting road name and house number are used as the address identifier. Based on the correspondence between addresses and latitude and longitude, the latitude and longitude data corresponding to the address data is then used as the latitude and longitude information of the address identifier. This further improves the efficiency of mining address identifiers and their latitude and longitude.
  • The multiple pieces of latitude and longitude information may be clustered with a density clustering algorithm according to the first distance threshold and the first sample threshold, yielding at least one cluster. The largest cluster is then selected, the latitude and longitude information in it is averaged, and the averaged value is used as the final latitude and longitude information corresponding to the address identifier.
  • The cluster containing the most pieces of latitude and longitude information is taken as the largest cluster, which improves the accuracy of the final latitude and longitude information.
  • Two further quantities may also be obtained:
    • the first quantity: the number of pieces of latitude and longitude information in the largest cluster corresponding to the address identifier;
    • the first distance: the distance between the final latitude and longitude information of the address identifier and the road corresponding to the address identifier.
    An initial score of the address identifier and the final latitude and longitude information is determined according to the first distance, the preset basic score, and the preset distance threshold. A penalty loss score is determined according to the first quantity. The confidence of the address identifier and the final latitude and longitude information is then determined based on the preset basic score, the initial score, and the penalty loss score. This makes it possible to assess the accuracy of each address identifier and its final latitude and longitude information.
  • The confidence is then applied as follows:
    • When a positioning service request is received, it is answered according to its accuracy requirement and the confidence of the address identifier and the final latitude and longitude information.
    • The address of the merchant corresponding to the final latitude and longitude information is adjusted according to that confidence.
    • The geographic knowledge base is updated based on the adjusted merchant address and the relationship between the merchant address and the road.
    Applying the mined address identifiers and final latitude and longitude information according to their confidence improves the timeliness and accuracy of the geographic knowledge base.
  • Referring to FIG. 4, a schematic structural diagram of a device for mining an address identifier and its latitude and longitude according to an embodiment of the present application is shown. The device includes:
  • a raw data obtaining module 310 configured to obtain raw data used to mine an address identifier and its latitude and longitude;
  • a data mining module 320 configured to obtain an address identifier in the original data and longitude and latitude information corresponding to the address identifier;
  • the final latitude and longitude information confirmation module 330 is configured to determine the final latitude and longitude information corresponding to the address identifier by using a density clustering algorithm for address identifiers corresponding to multiple latitude and longitude information.
  • the original data includes point of interest data and / or user original content behavior data.
  • With this device, raw data used to mine address identifiers and their latitude and longitude is obtained. The address identifiers in the raw data, and the latitude and longitude information corresponding to each, are then obtained. For an address identifier corresponding to multiple pieces of latitude and longitude information, the final latitude and longitude information is determined by a density clustering algorithm. This reduces the labor cost of obtaining address identifiers and their latitude and longitude, and it improves timeliness.
  • A device for mining an address identifier and its latitude and longitude provided by another embodiment of the present application is described in detail below with reference to FIG. 5.
  • FIG. 5 shows a schematic structural diagram of a device for mining an address identifier and its latitude and longitude according to an embodiment of the present application. The device includes:
  • a raw data obtaining module 410 configured to obtain raw data used to mine an address identifier and its latitude and longitude;
  • a data mining module 420 is configured to obtain an address identifier in the original data and longitude and latitude information corresponding to the address identifier.
  • the data mining module 420 may further include:
  • a data mining submodule 421, configured to obtain address latitude and longitude data in the original data;
  • the address latitude and longitude data includes address data, latitude and longitude data, and a correspondence relationship between addresses and latitude and longitude;
  • a structured processing sub-module 422 configured to structure the address data based on the geographic knowledge base, and take the road name and house number obtained after the structured processing as the address identifier corresponding to the address data;
  • the geographic knowledge base includes a library of geographic information entities and the relationships between each of said geographic information entities;
  • the latitude and longitude information confirmation submodule 423 is configured to use the latitude and longitude data corresponding to the address data as the latitude and longitude information corresponding to the address identifier based on the correspondence between the address and the latitude and longitude.
  • the device further includes a final latitude and longitude information confirmation module 430, configured to determine a final latitude and longitude information corresponding to the address identifier by using a clustering algorithm for address identifiers corresponding to the plurality of latitude and longitude information.
  • the final latitude and longitude information confirmation module 430 may further include:
  • the density clustering sub-module 431 is configured to handle an address identifier corresponding to multiple pieces of latitude and longitude information. It clusters that latitude and longitude information with a density clustering algorithm, according to the first distance threshold and the first sample threshold, to obtain at least one cluster;
  • the maximum cluster determination sub-module 432 is configured to select the largest cluster from the at least one cluster.
  • The maximum cluster determination sub-module 432 is further configured to take the cluster containing the most pieces of latitude and longitude information as the largest cluster.
  • the final latitude and longitude information confirmation module 430 may further include a final latitude and longitude information acquisition sub-module 433. This sub-module is configured to average the latitude and longitude information in the largest cluster and use the averaged value as the final latitude and longitude information corresponding to the address identifier;
  • the confidence data obtaining module 440 is configured to obtain the first quantity of latitude and longitude information contained in the largest cluster corresponding to the address identifier. It also obtains the first distance between the final latitude and longitude information corresponding to the address identifier and the road corresponding to the address identifier; and
  • the confidence determination module 450 is configured to determine a confidence between the address identifier and the final latitude and longitude information according to the first quantity and the first distance.
  • the confidence determination module 450 may further include:
  • An initial score determination submodule configured to determine an initial score of the address identifier and the final latitude and longitude information according to the first distance, a preset basic score, and a preset distance threshold;
  • a loss score determination sub-module configured to determine a penalty loss score of the address identifier and the final latitude and longitude information according to the first quantity; and
  • the confidence determination sub-module is configured to determine the confidence of the address identifier and the final latitude and longitude information based on the preset basic score, the initial score, and the penalty loss score.
  • the device may further include:
  • a positioning service response module is configured to respond to a positioning service request when one is received. The response follows the accuracy requirement of the request and the confidence of the address identifier and the final latitude and longitude information.
  • a merchant address adjustment module is configured to adjust an address of a merchant corresponding to the final latitude and longitude information according to the confidence of the address identifier and the final latitude and longitude information.
  • the geographic knowledge base updating module is configured to update the geographic knowledge base according to the adjusted business address and the relationship between the business address and the road.
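  • An embodiment of the present application further provides an electronic device. The device includes a processor, a memory, and a computer program stored on the memory and executable on the processor. When the processor executes the computer program, it implements the foregoing method for mining an address identifier and its latitude and longitude.
  • An embodiment of the present application further provides a readable storage medium. When the instructions in the storage medium are executed by a processor of an electronic device, the electronic device can perform the foregoing method for mining an address identifier and its latitude and longitude.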
  • Because the device embodiment is basically similar to the method embodiment, its description is relatively brief; for related details, refer to the description of the method embodiment.
  • The modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment.
  • The modules, units, or components of an embodiment may be combined into one module, unit, or component. They may also be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some such features and/or processes or units are mutually exclusive, the following may be combined in any way: all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed.
  • the various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof.
  • A microprocessor or a digital signal processor (DSP) may be used to implement some or all functions of some or all components of the device for mining an address identifier and its latitude and longitude according to the embodiments of the present application.
  • the application may also be implemented as a device or device program (e.g., a computer program and a computer program product) for performing a part or all of the methods described herein.
  • Such a program that implements the present application may be stored on a computer-readable medium or may have the form of one or more signals. Such signals can be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
  • FIG. 6 illustrates an electronic device that can implement the method according to the present application. It conventionally includes a processor 610 and a computer program product or computer-readable medium in the form of a memory 620.
  • the electronic device may be a PC, a mobile terminal, a personal digital assistant, a tablet computer, or the like.
  • the electronic device is a computing processing device.
  • the memory 620 may be an electronic memory such as a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), an EPROM, a hard disk, or a ROM.
  • the memory 620 has a storage space 6201 for program code for performing any of the method steps in the above method.
  • the storage space 6201 for program code may include program code 6202 for implementing various steps in the above method, respectively.
  • The computer program product or computer-readable storage medium stores the program code of a computer program. When this program code is executed by the processor 610, it implements the method for mining an address identifier and its latitude and longitude described in Embodiments 1 and 2 of the present application. The program code may be read from, or written into, one or more computer program products. These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards, or floppy disks. Such a computer program product or computer-readable storage medium is typically a portable or fixed storage unit as described with reference to FIG. 7.
  • the storage unit is configured to hold or carry program code that implements the method according to the present application, and the storage unit may have a storage segment, a storage space, and the like arranged similar to the memory 620 in the electronic device of FIG. 6.
  • The program code may, for example, be compressed in a suitable form.
  • The storage unit includes computer-readable code 6202', that is, code that can be read by a processor such as the processor 610. When run on the electronic device, this code causes the electronic device to perform the steps of the method described above.
  • Reference herein to "one embodiment" means that a particular feature, structure, or characteristic described in connection with that embodiment is included in at least one embodiment of the present disclosure. Note also that instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.
  • any reference signs placed between parentheses shall not be construed as limiting the claim.
  • the word “comprising” does not exclude the presence of elements or steps not listed in a claim.
  • the word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements.
  • The application can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware.
  • the use of the words first, second, and third does not imply any order. These words can be interpreted as names.
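The tiered penalty loss score and the confidence-based uses in Steps 2110 and 2120 can be sketched as follows. This is a minimal Python sketch, not the application's implementation. Only the penalty tiers come from the text (base score 35, distance threshold 500 meters). The accuracy levels, their minimum confidences, the merchant-adjustment threshold, and all function names are hypothetical. The initial-score and confidence formulas are left out because they are not stated here.

```python
from typing import Dict, List, Tuple

def penalty_loss_score(first_quantity: int) -> int:
    """Penalty tiers from the text, keyed on the number of points in the largest cluster."""
    if first_quantity <= 3:
        return 20
    if first_quantity <= 6:
        return 10
    if first_quantity <= 10:
        return 5
    return 0

# Hypothetical accuracy levels mapped to minimum confidence values (Step 2110).
MIN_CONFIDENCE_BY_LEVEL: Dict[str, float] = {"high": 30.0, "medium": 20.0, "low": 0.0}

# Each mined entry: (address identifier, (latitude, longitude), confidence).
Mined = Tuple[str, Tuple[float, float], float]

def candidates_for_request(mined: List[Mined], accuracy_level: str) -> List[Mined]:
    """Keep only entries whose confidence meets the request's accuracy level."""
    floor = MIN_CONFIDENCE_BY_LEVEL[accuracy_level]
    return [entry for entry in mined if entry[2] >= floor]

def merchant_address_action(confidence: float, threshold: float, auto_apply: bool) -> str:
    """Step 2120: change the address, prompt the merchant, or leave it unchanged."""
    if confidence < threshold:
        return "keep"
    return "change" if auto_apply else "prompt"
```

With these hypothetical levels, a "high" request sees only entries scoring at least 30, while a "low" request sees everything. This matches the text's "narrower for strict requests, broader otherwise".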

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for mining an address identifier and the longitude and latitude thereof, relating to the technical field of navigation and positioning. The method comprises: obtaining original data for mining an address identifier and the longitude and latitude thereof (110); obtaining the address identifier in the original data, and longitude and latitude information corresponding to the address identifier (120); and for the address identifier corresponding to multiple pieces of longitude and latitude information, determining the final longitude and latitude information corresponding to the address identifier by means of a clustering algorithm (130).

Description

Address identifier and latitude and longitude mining
This application claims priority to Chinese patent application No. 201811064084.X, filed with the Chinese Patent Office on September 12, 2018, and entitled "Method and device for mining an address identifier and its latitude and longitude", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to navigation and positioning technology, and in particular to mining address identifiers and their latitude and longitude.
Background
With the development of mobile Internet technology and the popularity of smartphone applications, requirements on the speed, accuracy, and scene adaptability of navigation and positioning keep rising. Most companies' businesses also involve a large number of positioning needs. In practice, physical locations are generally labeled and distinguished by address identifiers such as road names and house numbers. Obtaining the location information of these address identifiers helps raise positioning capability to address-identifier granularity. In the existing technology, providers such as online map or electronic map services generally collect address identifiers and their latitude and longitude manually, periodically sending staff to record them street by street.
Summary
According to a first aspect of the present application, a method for mining an address identifier and its latitude and longitude is provided, including:
obtaining raw data used to mine address identifiers and their latitude and longitude, where the raw data includes point of interest data and/or user-generated content behavior data;
obtaining an address identifier in the raw data and the latitude and longitude information corresponding to the address identifier; and
for an address identifier corresponding to multiple pieces of latitude and longitude information, determining the final latitude and longitude information corresponding to the address identifier by a clustering algorithm.
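The embodiments refine the clustering step into three parts: density clustering with a first distance threshold and a first sample threshold, selection of the largest cluster, and averaging of that cluster. The Python sketch below illustrates those parts and is not the application's code. It uses a plain O(n²) DBSCAN with haversine distances in meters. The defaults `eps_m=50.0` (first distance threshold) and `min_samples=3` (first sample threshold), and all function names, are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

LatLng = Tuple[float, float]  # (latitude, longitude) in degrees

def haversine_m(a: LatLng, b: LatLng) -> float:
    """Great-circle distance in meters between two (lat, lng) points."""
    earth_radius_m = 6_371_000.0  # mean Earth radius
    lat1, lng1 = math.radians(a[0]), math.radians(a[1])
    lat2, lng2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(h))

def dbscan(points: List[LatLng], eps_m: float, min_samples: int) -> List[int]:
    """Plain O(n^2) DBSCAN: one cluster label per point, -1 for noise."""
    def neighbors(i: int) -> List[int]:
        # Includes i itself, so min_samples counts the point being tested.
        return [j for j, q in enumerate(points) if haversine_m(points[i], q) <= eps_m]

    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise joins as a border point
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return [label if label is not None else -1 for label in labels]

def final_lat_lng(points: List[LatLng], eps_m: float = 50.0,
                  min_samples: int = 3) -> Optional[Tuple[LatLng, int]]:
    """Average the largest cluster; also return its size (the 'first quantity')."""
    labels = dbscan(points, eps_m, min_samples)
    sizes: Dict[int, int] = {}
    for label in labels:
        if label >= 0:
            sizes[label] = sizes.get(label, 0) + 1
    if not sizes:
        return None  # every point was noise: no final latitude and longitude
    largest = max(sizes, key=lambda k: sizes[k])
    members = [p for p, label in zip(points, labels) if label == largest]
    lat = sum(p[0] for p in members) / len(members)
    lng = sum(p[1] for p in members) / len(members)
    return (lat, lng), len(members)
```

Averaging raw degrees is adequate for a tight cluster of nearby points. It would need a different approach for points straddling the ±180° meridian.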
According to a second aspect of the present application, a device for mining an address identifier and its latitude and longitude is provided, including:
a raw data acquisition module, configured to obtain raw data used to mine address identifiers and their latitude and longitude, where the raw data includes point of interest data and/or user-generated content behavior data;
a data mining module, configured to obtain an address identifier in the raw data and the latitude and longitude information corresponding to the address identifier; and
a final latitude and longitude information confirmation module, configured to determine the final latitude and longitude information for an address identifier that corresponds to multiple pieces of latitude and longitude information, using a clustering algorithm.
According to a third aspect of the present application, an electronic device is provided. It includes a processor, a memory, and a computer program stored on the memory and executable on the processor. When the processor executes the computer program, it implements the foregoing method for mining an address identifier and its latitude and longitude.
According to a fourth aspect of the present application, a readable storage medium is provided. When instructions in the storage medium are executed by a processor of an electronic device, the electronic device can perform the foregoing method for mining an address identifier and its latitude and longitude.
The method for mining an address identifier and its latitude and longitude disclosed in the embodiments of the present application works as follows. Raw data used to mine address identifiers and their latitude and longitude is obtained. The address identifiers in the raw data, and the latitude and longitude information corresponding to each, are obtained. For an address identifier corresponding to multiple pieces of latitude and longitude information, its final latitude and longitude information is determined by a clustering algorithm. This reduces the labor cost of mining address identifiers and their latitude and longitude. It also improves the timeliness and accuracy of the mined address identifiers and their final latitude and longitude information.
The above description is only an overview of the technical solution of this application. Specific embodiments of this application are set out below. They are intended to make the technical means of this application clearer, so that it can be implemented according to the description, and to make the above and other objects, features, and advantages of this application more apparent.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings only illustrate the preferred embodiments and are not to be considered as limiting the present application. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
FIG. 1 shows a flowchart of the steps of a method for mining an address identifier and its latitude and longitude according to an embodiment of the present application;
FIG. 2 shows a flowchart of the steps of a method for mining an address identifier and its latitude and longitude according to an embodiment of the present application;
FIG. 3 shows a schematic diagram of density clustering according to an embodiment of the present application;
FIG. 4 shows a schematic structural diagram of a device for mining an address identifier and its latitude and longitude according to an embodiment of the present application;
FIG. 5 shows a schematic structural diagram of a device for mining an address identifier and its latitude and longitude according to an embodiment of the present application;
FIG. 6 schematically shows a block diagram of an electronic device for performing the method according to the present application; and
FIG. 7 schematically shows a storage unit for holding or carrying program code that implements the method according to the present application.
Specific Embodiments
Exemplary embodiments of the present application are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present application, it should be understood that the application can be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the application will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Embodiment 1
A method for mining an address identifier and its latitude and longitude provided by an embodiment of the present application is described in detail.
Referring to FIG. 1, a flowchart of the steps of a method for mining an address identifier and its latitude and longitude in an embodiment of the present application is shown.
Step 110: Obtain raw data from which address identifiers and their latitudes and longitudes are to be mined.
In practice, platforms where users or merchants can enter information or post reviews, such as food-delivery platforms, online review websites, and online shopping platforms, can record users' point of interest (POI) data and/or user generated content (UGC) behavior data. POI data is also sometimes called "Point of Information" data. POI data may include, but is not limited to, a name, an address, a longitude, a latitude, and a category. On an electronic map, POIs are generally shown as bubble icons; scenic spots, government agencies, companies, shopping malls, restaurants, and so on are all POIs. The concept of UGC originated on the Internet: users display their original content through an Internet platform or provide it to other users. UGC is not a specific service but a new way of using the Internet, shifting from mainly downloading to downloading and uploading alike. As Internet use has developed, user interaction has become more prominent, and users are both consumers and creators of online content.
The UGC behavior data in the embodiments of the present application may include, but is not limited to, any data related to user-generated content, such as status changes, posted logs, published photos, published reviews, error-report behavior data, and addition behavior data. Error-report behavior data may specifically include, but is not limited to, reports of an incorrect address and/or of incorrect latitude and longitude information for an address; addition behavior data may likewise include, but is not limited to, adding a new address and/or adding latitude and longitude information for an address.
Therefore, in the embodiments of the present application, the raw data used to mine address identifiers and their latitudes and longitudes may be obtained in any available way, and the specific content of the raw data may be preset as required; the embodiments of the present application impose no limitation on this.
Optionally, in the embodiments of the present application, the raw data may include, but is not limited to, point of interest data and/or user generated content behavior data.
For example, take the Meituan-Dianping platform: it has a large number of users, so it holds many addresses and latitudes/longitudes contained in POI data, as well as addresses and latitudes/longitudes produced by users' additions and error reports. Addresses and their latitudes and longitudes can therefore be mined from the POI data and UGC behavior data of the platform's many users, with the platform's POI data and/or UGC behavior data serving as the raw data. Of course, the specific source of the raw data and the channel for obtaining it can be set as required, which is not limited in the embodiments of the present application.
Step 120: Obtain the address identifiers in the raw data and the latitude and longitude information corresponding to each address identifier.
As noted above, the raw data may include addresses, longitudes, latitudes, and similar data, and an address can generally be specified down to the province, city, district/county, township, village, street, house number, and so on. Therefore, after the raw data is obtained, the address identifiers in it and the latitude and longitude information corresponding to each address identifier can be extracted. Any available method may be used for this extraction, which is not limited in the embodiments of the present application.
An address identifier may include a road name and a house number, and may additionally include one or more of a country name, province name, city name, district name, county name, township name, and village name. The information included in an address identifier can be preset as required, which is not limited in the embodiments of the present application. The latitude and longitude information may include the longitude and/or latitude corresponding to the address identifier.
Step 130: For an address identifier corresponding to multiple pieces of latitude and longitude information, determine the final latitude and longitude information of the address identifier through a clustering algorithm.
In practice, different users, or the same user at different times, may set latitude and longitude information for the same address identifier. The same address identifier may therefore correspond to multiple pieces of latitude and longitude information that are not all identical, in which case its accurate latitude and longitude cannot be determined directly. Therefore, in the embodiments of the present application, for an address identifier corresponding to multiple pieces of latitude and longitude information, the final latitude and longitude information can be determined through a clustering algorithm. Any clustering algorithm may be used, and the clustering method can be preset as required, which is not limited in the embodiments of the present application. For example, any one of K-means clustering, mean-shift clustering, density-based clustering, expectation-maximization (EM) clustering with Gaussian mixture models, agglomerative hierarchical clustering, graph community detection clustering, and so on may be used.
Of course, for an address identifier corresponding to only one piece of latitude and longitude information, that information can be used directly as the final latitude and longitude information of the address identifier.
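A minimal sketch of the dispatch in step 130, assuming records arrive as (address identifier, (latitude, longitude)) pairs. The function names and the `naive_centroid` stand-in are hypothetical; a plain mean is not the clustering the text describes and only marks where a clustering routine would plug in.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def resolve_final_coordinates(
    records: List[Tuple[str, LatLon]],
    cluster_fn: Callable[[List[LatLon]], LatLon],
) -> Dict[str, LatLon]:
    """Map every address identifier to one final (latitude, longitude)."""
    grouped: Dict[str, List[LatLon]] = defaultdict(list)
    for address_id, coord in records:
        grouped[address_id].append(coord)

    final: Dict[str, LatLon] = {}
    for address_id, coords in grouped.items():
        if len(coords) == 1:
            # Only one report: use it directly as the final coordinate.
            final[address_id] = coords[0]
        else:
            # Several reports: let the clustering routine decide.
            final[address_id] = cluster_fn(coords)
    return final


def naive_centroid(coords: List[LatLon]) -> LatLon:
    """Hypothetical stand-in for a real clustering step (plain mean)."""
    n = len(coords)
    return (sum(c[0] for c in coords) / n, sum(c[1] for c in coords) / n)


# Made-up identifiers and coordinates, for illustration only.
sample = [("Road A No. 1", (1.0, 2.0)), ("Road B No. 7", (3.0, 4.0)), ("Road B No. 7", (5.0, 6.0))]
print(resolve_final_coordinates(sample, naive_centroid))
```

Passing the clustering step in as a function keeps the single-report shortcut separate from whichever clustering method is chosen.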
In this embodiment, raw data for mining address identifiers and their latitudes and longitudes is obtained; the address identifiers in the raw data and their corresponding latitude and longitude information are extracted; and for each address identifier corresponding to multiple pieces of latitude and longitude information, the final latitude and longitude information is determined by a clustering algorithm. This reduces the labor cost of obtaining address identifiers and their latitudes and longitudes and improves timeliness.
Embodiment 2
A method for mining an address identifier and its latitude and longitude provided by an embodiment of the present application is described in detail.
Referring to FIG. 2, a flowchart of the steps of a method for mining an address identifier and its latitude and longitude in an embodiment of the present application is shown.
Step 210: Obtain raw data from which address identifiers and their latitudes and longitudes are to be mined.
Step 220: Obtain address-latitude/longitude data from the raw data; this data includes address data, latitude and longitude data, and the correspondence between addresses and latitudes/longitudes.
As noted above, in practice the raw data generally includes addresses, latitudes and longitudes, and similar data. Latitude and longitude data generally describes a particular place, and the same place can also be described by an address, so there is a correspondence between the address data and the latitude and longitude data of the same place. Therefore, in the embodiments of the present application, the address-latitude/longitude data in the raw data can be obtained, comprising the address data, the latitude and longitude data, and the correspondence between them.
Step 230: Structure the address data based on a geographic knowledge base, and take the road name and house number obtained from the structuring as the address identifier corresponding to the address data. The geographic knowledge base includes a library of geographic information entities and the relationships between these entities.
The geographic information entity library may include country names, province names, city names, county/district names, township names, road names, and so on: for example, cities such as Shanghai, Beijing, and Chengdu; districts and counties such as Changning District and Huayin County; and roads such as Anhua Road, Chang'an Street, and Zhuque Street. The relationships between geographic information entities may include containment or subordination relationships between entities at different levels, for example "Shanghai" contains "Changning District", "Changning District" contains "Anhua Road", and "Changning District" is subordinate to "Shanghai".
In the embodiments of the present application, the acquired address data can be structured based on the geographic knowledge base to obtain structured address data that matches the geographic information entities in the knowledge base. For example, the address "长宁区安化路492号德必易园" (Debi Yiyuan, No. 492 Anhua Road, Changning District) is structured into "Changning District", "Anhua Road", "No. 492", and "Debi Yiyuan", whose types are "district/county", "road", "house number", and "landmark", respectively.
From the structured address data, the road name and house number can be extracted accurately and taken as the address identifier of the corresponding address data. For the address above, taking the road name and house number from the structured result yields the address identifier "No. 492 Anhua Road".
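A toy sketch of the structuring in step 230, assuming a tiny hard-coded gazetteer (hypothetical) in place of the geographic knowledge base; the entity names come from the example above, and containment relationships are omitted. It does greedy longest-match against known entities, matches the house number with the regular expression `\d+号`, and treats any leftover text as a landmark. Real address parsing is considerably harder than this.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy stand-in for the geographic knowledge base: entity -> type.
# A real knowledge base would also store containment relations
# (e.g. 上海市 contains 长宁区, 长宁区 contains 安化路).
GAZETTEER: Dict[str, str] = {
    "上海市": "city",
    "长宁区": "district",
    "安化路": "road",
    "长安街": "road",
}
HOUSE_NUMBER = re.compile(r"\d+号")  # e.g. "492号" (No. 492)


def structure_address(address: str) -> List[Tuple[str, str]]:
    """Split a raw address into (text, type) components."""
    entities = sorted(GAZETTEER, key=len, reverse=True)  # longest match first
    parts: List[Tuple[str, str]] = []
    i = 0
    while i < len(address):
        entity = next((e for e in entities if address.startswith(e, i)), None)
        if entity is not None:
            parts.append((entity, GAZETTEER[entity]))
            i += len(entity)
            continue
        number = HOUSE_NUMBER.match(address, i)
        if number is not None:
            parts.append((number.group(), "house_number"))
            i = number.end()
            continue
        # Simplification: any unrecognised remainder is one landmark.
        parts.append((address[i:], "landmark"))
        break
    return parts


def address_identifier(address: str) -> Optional[str]:
    """Road name + house number, or None if either is missing."""
    parts = structure_address(address)
    road = next((text for text, kind in parts if kind == "road"), None)
    number = next((text for text, kind in parts if kind == "house_number"), None)
    if road is None or number is None:
        return None
    return road + number


print(structure_address("长宁区安化路492号德必易园"))
print(address_identifier("长宁区安化路492号德必易园"))  # 安化路492号
```

Returning `None` when the road or house number is missing lets a caller drop addresses that cannot yield an identifier instead of producing a partial one.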
Step 240: Based on the correspondence between addresses and latitudes/longitudes, use the latitude and longitude data corresponding to the address data as the latitude and longitude information of the address identifier.
After the address identifier of a piece of address data is determined, the latitude and longitude data corresponding to that address data can be taken, based on the address-to-latitude/longitude correspondence, as the latitude and longitude information of the address identifier.
For example, suppose that, according to the correspondence, the latitude and longitude data for the address "Debi Yiyuan, No. 492 Anhua Road, Changning District" are (longitude 1, latitude 1) and (longitude 2, latitude 2). The latitude and longitude information of its address identifier "No. 492 Anhua Road" is then (longitude 1, latitude 1) and (longitude 2, latitude 2).
Step 250: For an address identifier corresponding to multiple pieces of latitude and longitude information, cluster that information with a density clustering algorithm according to a first distance threshold and a first sample threshold to obtain at least one cluster.
In practice, hierarchical and partitioning clustering algorithms tend to find only convex clusters. Density-based clustering algorithms (density clustering algorithms) were developed to overcome this shortcoming and find clusters of arbitrary shape. Such algorithms regard each cluster as a group of dense sample points in the sample space, separated from other groups by low-density regions (noise); their goal is to filter out the low-density regions and find the dense groups of points. In the embodiments of the present application, each piece of latitude and longitude information can be represented as a coordinate point, and the number of coordinate points contained in a cluster indicates how reliable the corresponding latitude and longitude information of the address identifier is. Therefore, given the accuracy and practicality of density clustering, the final latitude and longitude information of an address identifier corresponding to multiple pieces of latitude and longitude information is preferably determined with a density clustering algorithm.
The density clustering algorithm may be any one of DBSCAN (Density-Based Spatial Clustering of Applications with Noise), OPTICS (Ordering Points To Identify the Clustering Structure), DENCLUE, and so on.
For example, a density clustering algorithm can cluster the multiple pieces of latitude and longitude information of the same address identifier into at least one cluster; the largest cluster is then selected, and the average of the latitude and longitude information in it is used as the final, more accurate latitude and longitude information of the address identifier.
Density-based clustering algorithms generally assume that categories can be determined by how tightly the samples are distributed. Samples of the same category are closely connected; that is, for any sample in a category, there are other samples of the same category nearby. Grouping closely connected samples together yields one cluster, and dividing all groups of closely connected samples into separate categories yields the final clustering result.
Taking DBSCAN as an example, it describes how tightly a sample set is packed in terms of neighborhoods, with the parameter pair (ε, MinPts) describing the tightness of the sample distribution within a neighborhood. ε is the neighborhood distance threshold of a sample, i.e., the first distance threshold; MinPts is the threshold on the number of samples within distance ε of a sample, i.e., the first sample threshold.
Assuming the sample set is $D = (x_1, x_2, \ldots, x_m)$, DBSCAN's notions of density are defined as follows:
1) ε-neighborhood: For $x_j \in D$, its ε-neighborhood contains the samples in $D$ whose distance from $x_j$ is not greater than ε, i.e., $N_\epsilon(x_j) = \{x_i \in D \mid \mathrm{distance}(x_i, x_j) \le \epsilon\}$; the number of samples in this subset is denoted $|N_\epsilon(x_j)|$.
2) Core object: For any sample $x_j \in D$, if its ε-neighborhood $N_\epsilon(x_j)$ contains at least MinPts samples, i.e., $|N_\epsilon(x_j)| \ge \mathrm{MinPts}$, then $x_j$ is a core object.
3) Directly density-reachable: If $x_i$ lies in the ε-neighborhood of $x_j$ and $x_j$ is a core object, then $x_i$ is said to be directly density-reachable from $x_j$. The converse does not necessarily hold: $x_j$ is not directly density-reachable from $x_i$ unless $x_i$ is also a core object.
4) Density-reachable: For $x_i$ and $x_j$, if there exists a sample sequence $p_1, p_2, \ldots, p_T$ with $p_1 = x_i$, $p_T = x_j$, and $p_{t+1}$ directly density-reachable from $p_t$ for $t = 1, \ldots, T-1$, then $x_j$ is said to be density-reachable from $x_i$. Density reachability is therefore transitive. The intermediate samples $p_1, p_2, \ldots, p_{T-1}$ in the sequence are all core objects, because only core objects can make other samples directly density-reachable. Density reachability is not symmetric, which follows from the asymmetry of direct density reachability.
5) Density-connected: For $x_i$ and $x_j$, if there exists a core object $x_k$ such that both $x_i$ and $x_j$ are density-reachable from $x_k$, then $x_i$ and $x_j$ are said to be density-connected. Density connectivity is symmetric.
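The five definitions can be checked directly on a small planar sample set. This sketch uses Euclidean distance (`math.dist`) for readability; for latitude/longitude points one would substitute a geodesic distance. The function names are illustrative, not taken from the source.

```python
import math
from collections import deque
from typing import List, Set, Tuple

Point = Tuple[float, float]


def eps_neighborhood(D: List[Point], j: int, eps: float) -> Set[int]:
    """Indices i with distance(x_i, x_j) <= eps (x_j itself included)."""
    return {i for i, x in enumerate(D) if math.dist(x, D[j]) <= eps}


def is_core(D: List[Point], j: int, eps: float, min_pts: int) -> bool:
    """x_j is a core object if |N_eps(x_j)| >= MinPts."""
    return len(eps_neighborhood(D, j, eps)) >= min_pts


def directly_reachable(D: List[Point], i: int, j: int, eps: float, min_pts: int) -> bool:
    """x_i is directly density-reachable from x_j."""
    return i in eps_neighborhood(D, j, eps) and is_core(D, j, eps, min_pts)


def density_reachable(D: List[Point], src: int, dst: int, eps: float, min_pts: int) -> bool:
    """x_dst is density-reachable from x_src via a chain of direct reachability."""
    visited = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if not is_core(D, u, eps, min_pts):
            continue  # only core objects can extend the chain
        for v in eps_neighborhood(D, u, eps):
            if v == dst:
                return True
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return False


def density_connected(D: List[Point], i: int, j: int, eps: float, min_pts: int) -> bool:
    """Some core object x_k makes both x_i and x_j density-reachable."""
    return any(
        is_core(D, k, eps, min_pts)
        and density_reachable(D, k, i, eps, min_pts)
        and density_reachable(D, k, j, eps, min_pts)
        for k in range(len(D))
    )


D = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0)]
print(directly_reachable(D, 0, 1, 0.6, 3), directly_reachable(D, 1, 0, 0.6, 3))  # True False
```

With ε = 0.6 and MinPts = 3 only the middle point is a core object, so the first point is directly density-reachable from it but not the other way round, illustrating the asymmetry in definition 3; the two end points are still density-connected through the middle point.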
FIG. 3 illustrates these definitions. Assume MinPts = 5 in the figure. The sample at the start of each arrowed line segment is a core object, because its ε-neighborhood contains at least 5 samples; the other samples are non-core objects. All samples directly density-reachable from a core object lie within the hypersphere of radius ε centered on that core object; samples outside it are not directly density-reachable from it. The core objects connected by arrowed line segments form density-reachable sample sequences, and all samples within the ε-neighborhoods of these sequences are density-connected to one another.
For an address identifier corresponding to multiple pieces of latitude and longitude information, DBSCAN is applied according to the first distance threshold and the first sample threshold to cluster that information into at least one cluster. Here the samples in the sample set $D = (x_1, x_2, \ldots, x_m)$ are the pieces of latitude and longitude information of one address identifier. The specific values of the first distance threshold and the first sample threshold can be preset as required, which is not limited in the embodiments of the present application. Every piece of latitude and longitude information associated with an address identifier can be displayed on the map, and the distance between any two pieces can be calculated. As shown in FIG. 3, each sample point corresponds to one piece of latitude and longitude information of an address identifier; after DBSCAN clustering, these points may form at least one cluster. In FIG. 3, two clusters are obtained, one on the left and one on the right, each formed by the hyperspheres of the core objects linked by a chain of directed line segments.
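A from-scratch sketch of DBSCAN applied to the latitude/longitude points of one address identifier. Assumptions not in the source: distances use the haversine formula on a spherical Earth of radius 6,371 km, the first distance threshold ε is expressed in metres (`eps_m`), and noise is labelled -1. The O(n²) neighbour search is fine for the handful of reports per address identifier but would need a spatial index at larger scale.

```python
import math
from collections import deque
from typing import List, Optional, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_M = 6_371_000.0  # assumption: spherical Earth
NOISE = -1


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in metres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(h)))


def dbscan(points: List[LatLon], eps_m: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    eps_m plays the role of the first distance threshold and min_pts the
    first sample threshold; a point counts toward its own neighbourhood.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n
    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may be re-labelled later as a border point
            continue
        # i is an unvisited core object: grow a new cluster from it.
        labels[i] = cluster_id
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # core object: keep expanding
        cluster_id += 1
    return [label if label is not None else NOISE for label in labels]


# Made-up reports for one address identifier: two tight groups plus an outlier.
reports = [
    (31.20000, 121.40000), (31.20005, 121.40000), (31.20000, 121.40005),
    (31.20005, 121.40005), (31.20002, 121.40002),
    (31.20100, 121.40100), (31.20103, 121.40100), (31.20100, 121.40103),
    (31.30000, 121.50000),
]
print(dbscan(reports, eps_m=30.0, min_pts=3))  # [0, 0, 0, 0, 0, 1, 1, 1, -1]
```

The two groups are roughly 100 m apart, well beyond ε = 30 m, so they form separate clusters, while the distant report has no neighbours and is left as noise.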
Step 260: Select the largest cluster from the at least one cluster.
The largest cluster holds the most mutually consistent reports, so its latitude and longitude information is the most reliable for the address identifier. Therefore, in some embodiments of the present application, the largest cluster is selected from the at least one cluster obtained by density clustering and used to determine the final latitude and longitude information of the address identifier. Any available method may be used to select the largest cluster, which is not limited in the embodiments of the present application.
For example, the cluster containing the most core objects may be taken as the largest cluster, or the cluster containing the most sample points, i.e., the most pieces of latitude and longitude information, and so on.
Optionally, in the embodiments of the present application, step 260 may further include:
Sub-step 261: Take the cluster containing the largest number of pieces of latitude and longitude information as the largest cluster.
Preferably, in the embodiments of the present application, among the at least one cluster of an address identifier, the cluster containing the most pieces of latitude and longitude information is taken as the largest cluster of that address identifier.
For example, FIG. 3 shows the clustering result for the latitude and longitude information of one address identifier, with each sample point corresponding to one piece of latitude and longitude information. The cluster on the left contains the most points, so it is taken as the largest cluster of that address identifier.
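A sketch of sub-step 261, assuming DBSCAN-style integer labels with -1 for noise (as a from-scratch DBSCAN would typically produce). Noise points are excluded because they belong to no cluster; breaking ties in favour of the smallest cluster id is an arbitrary, hypothetical rule, since the source does not address ties.

```python
from collections import Counter
from typing import List, Optional

NOISE = -1


def largest_cluster(labels: List[int]) -> Optional[int]:
    """Id of the cluster with the most members, or None if all points are noise.

    Ties go to the smallest cluster id (hypothetical rule).
    """
    counts = Counter(label for label in labels if label != NOISE)
    if not counts:
        return None
    # max() keeps the first maximum it meets, i.e. the smallest tied id.
    return max(sorted(counts), key=lambda cid: counts[cid])


print(largest_cluster([0, 0, 0, 1, 1, -1, -1, -1, -1]))  # 0
```

In the usage line the noise points outnumber every cluster, which is exactly why they must be filtered out before counting.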
Step 270: Average the latitude and longitude information in the largest cluster, and use the averaged latitude and longitude values as the final latitude and longitude information of the address identifier.
Once the largest cluster of an address identifier is determined, the latitude and longitude information in it can be averaged, and the averaged values are used as the final latitude and longitude information of the address identifier. As mentioned above, each sample point in a cluster is one sample, and in the embodiments of the present application each sample is one piece of latitude and longitude information. Specifically, the longitude values of all sample points in the largest cluster are averaged, the latitude values are averaged separately, and the averaged longitude and latitude together form the final latitude and longitude information of the address identifier.
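A sketch of step 270: the latitudes and longitudes of the largest cluster's members are averaged separately, as described above. The function name is hypothetical. A plain arithmetic mean is a reasonable approximation because members of one cluster lie close together; this sketch does not handle clusters that straddle the ±180° meridian, where a naive mean of longitudes breaks down.

```python
from typing import List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def cluster_centroid(points: List[LatLon], labels: List[int], cluster_id: int) -> LatLon:
    """Average latitudes and longitudes of one cluster's members separately."""
    members = [p for p, label in zip(points, labels) if label == cluster_id]
    if not members:
        raise ValueError(f"cluster {cluster_id} has no members")
    mean_lat = sum(lat for lat, _ in members) / len(members)
    mean_lon = sum(lon for _, lon in members) / len(members)
    return (mean_lat, mean_lon)


# Made-up points: two members of cluster 0 and one of cluster 1.
print(cluster_centroid([(31.0, 121.0), (31.2, 121.2), (40.0, 116.0)], [0, 0, 1], 0))
```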
Step 280: Obtain a first quantity, namely the number of pieces of latitude and longitude information in the largest cluster of the address identifier, and a first distance, namely the distance between the final latitude and longitude of the address identifier and the road corresponding to the address identifier.
In practice, the above steps can produce final latitude and longitude information for a large batch of address identifiers, but how well each final latitude and longitude matches its address identifier is not yet known. When a positioning service or similar application uses address identifiers and their final latitudes and longitudes and has high precision requirements, the returned results might not meet those requirements. To avoid this, the embodiments of the present application further determine a confidence for each address identifier and its final latitude and longitude information. Specifically, the confidence is determined from the first quantity (the number of pieces of latitude and longitude information in the largest cluster) and the first distance (between the final latitude and longitude and the road corresponding to the address identifier).
To this end, the first quantity and the first distance of the address identifier are obtained first.
Any available method may be used to obtain the first quantity and the first distance, which is not limited in the embodiments of the present application. For example, since each sample point in the largest cluster corresponds to one piece of latitude and longitude information, the first quantity can be obtained by counting the sample points in the largest cluster; and the first distance can be taken as the perpendicular distance on the electronic map from the coordinate point of the final latitude and longitude to the road corresponding to the address identifier.
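A sketch of the first distance in step 280, assuming the road is available as a polyline of (latitude, longitude) vertices; the source does not say how road geometry is stored, and the function names are hypothetical. Points are projected onto a local equirectangular plane in metres, which is accurate enough over the short distances involved, and the perpendicular foot is clamped to each segment, so the result is the shortest distance to the road (an endpoint distance when the foot falls outside a segment). The first quantity needs no special code: it is simply the member count of the largest cluster.

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_M = 6_371_000.0  # assumption: spherical Earth


def _to_local_xy(p: LatLon, origin: LatLon) -> Tuple[float, float]:
    """Equirectangular projection around origin, in metres (short range only)."""
    lat0 = math.radians(origin[0])
    x = math.radians(p[1] - origin[1]) * math.cos(lat0) * EARTH_RADIUS_M
    y = math.radians(p[0] - origin[0]) * EARTH_RADIUS_M
    return x, y


def distance_to_road_m(point: LatLon, road: List[LatLon]) -> float:
    """Shortest distance from point to a road polyline (>= 2 vertices), in metres."""
    if len(road) < 2:
        raise ValueError("road polyline needs at least two vertices")
    px, py = 0.0, 0.0  # the point itself is the projection origin
    best = math.inf
    for a, b in zip(road, road[1:]):
        ax, ay = _to_local_xy(a, point)
        bx, by = _to_local_xy(b, point)
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        # Parameter of the perpendicular foot, clamped onto the segment.
        t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
        fx, fy = ax + t * dx, ay + t * dy
        best = min(best, math.hypot(px - fx, py - fy))
    return best


# Made-up east-west road segment and a point about 100 m north of it.
print(distance_to_road_m((31.2009, 121.40), [(31.2, 121.39), (31.2, 121.41)]))
```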
Step 290: Determine the confidence of the address identifier and the final latitude and longitude information according to the first quantity and the first distance.
Once the first quantity and the first distance have been obtained, the confidence of the address identifier and its final latitude and longitude information can be determined from them. The confidence characterizes how accurate the address identifier and its final latitude and longitude information are. The relationship between the first quantity, the first distance, and the confidence may be preset as needed and is not limited in the embodiments of the present application. For example, the confidence may be set proportional to the first quantity and inversely proportional to the first distance. In that case, a larger confidence value indicates a more accurate address identifier and final latitude and longitude. Alternatively, the confidence may be set inversely proportional to the first quantity and proportional to the first distance. In that case, a smaller confidence value indicates higher accuracy. Other relationships are also possible.
Optionally, in the embodiments of the present application, step 290 may further include:
Sub-step 291: Determine an initial score of the address identifier and the final latitude and longitude information according to the first distance, a preset base score, and a preset distance threshold.
The preset base score and the preset distance threshold may both be set in advance as needed, which is not limited in the embodiments of the present application. Likewise, the mapping from the first distance, the preset base score, and the preset distance threshold to the initial score may be preset according to requirements or experience, which is also not limited in the embodiments of the present application.
Sub-step 292: Determine a penalty loss score of the address identifier and the final latitude and longitude information according to the first quantity.
In practice, the more latitude and longitude records the largest cluster of an address identifier contains, the more accurate the final latitude and longitude derived from that cluster is, and the fewer records it contains, the less accurate the result. Therefore, in some embodiments of the present application, a penalty loss score may additionally be determined from the first quantity. This penalty score applies to the address identifier and its final latitude and longitude, and it makes the resulting confidence more accurate. The correspondence between the first quantity and the penalty loss score may be preset according to requirements or experience and is not limited in the embodiments of the present application. For example, the penalty loss score may be set to decrease linearly as the first quantity increases.
Sub-step 293: Determine the confidence of the address identifier and the final latitude and longitude information based on the preset base score, the initial score, and the penalty loss score.
After the penalty loss score has been determined, the confidence of the address identifier and its final latitude and longitude information may be determined from the preset base score, the initial score, and the penalty loss score. The correspondence between the confidence and these three scores may be preset according to requirements or experience and is not limited in the embodiments of the present application.
Optionally, in the embodiments of the present application, the confidence of the address identifier and the final latitude and longitude information may be determined according to the formula
C = baseScore + (1 - D/threshold)^2 × (100 - baseScore) - cntLossScore,
where C is the confidence of the address identifier and the final latitude and longitude information, baseScore is the preset base score, D is the first distance, threshold is the preset distance threshold, and cntLossScore is the penalty loss score derived from the first quantity. When the first distance is greater than the preset distance threshold, (1 - D/threshold) takes the value 0.
In the embodiments of the present application, the confidence may be expressed on a 100-point scale, where a larger value indicates a more accurate address identifier and final latitude and longitude. Based on experience, the confidence may be computed with the formula C = baseScore + (1 - D/threshold)^2 × (100 - baseScore) - cntLossScore. The specific values of baseScore, threshold, and cntLossScore may be preset as needed and are not limited in the embodiments of the present application. If the first distance of an address identifier exceeds the preset distance threshold, its final latitude and longitude lies far from the road it belongs to and the deviation is large. In that case, (1 - D/threshold) in the formula is simply set to 0, which lowers the confidence of that address identifier and its final latitude and longitude. Without this rule, squaring the negative value of (1 - D/threshold) would raise the score again as D grows.
From the formula above, the initial score relates to the first distance, the preset base score, and the preset distance threshold as follows:
InitialScore = (1 - D/threshold)^2 × (100 - baseScore), where InitialScore is the initial score. Here the initial score is computed from the square of (1 - D/threshold). In practice, the N-th power of (1 - D/threshold) may be used instead, where N may be preset according to requirements or experience and is not limited in the embodiments of the present application.
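As a hedged sketch, the confidence formula can be written as a small Python function. It includes the rule that (1 - D/threshold) becomes 0 when D exceeds the threshold, and it also covers the optional N-th power variant. The parameter names are illustrative. The defaults of 35 points and 500 m are the example production values given for this embodiment.

```python
def confidence(d, cnt_loss_score, base_score=35.0, threshold=500.0, n=2):
    """C = baseScore + (1 - D/threshold)^n * (100 - baseScore) - cntLossScore.

    d: first distance in metres; cnt_loss_score: penalty from the first quantity;
    n: exponent (2 in the formula above, any N for the N-th power variant).
    """
    # (1 - D/threshold) is set to 0 once D exceeds the threshold, so a distant
    # point earns no initial score and a negative value is never raised to a power.
    ratio = max(0.0, 1.0 - d / threshold)
    initial_score = ratio ** n * (100.0 - base_score)
    return base_score + initial_score - cnt_loss_score
```

With these defaults, a point lying exactly on its road with no penalty scores 100. A point beyond 500 m keeps only baseScore minus the penalty.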
Optionally, some embodiments of the present application use the following settings, based on experience in production. These settings map the confidence well onto a 100-point range and reflect the reliability of the address identifier and its final latitude and longitude. The preset base score is set to 35, and the preset distance threshold is set to 500 meters. The penalty loss score is 20 when the first quantity is at most 3, 10 when it is greater than 3 and at most 6, 5 when it is greater than 6 and at most 10, and 0 when it is greater than 10.
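The tiered penalty above can be expressed as a simple lookup. The sketch below is illustrative, and the name count_loss_score is hypothetical.

```python
def count_loss_score(first_quantity):
    """Penalty loss score from the size of the largest cluster (example tiers)."""
    if first_quantity <= 3:
        return 20
    if first_quantity <= 6:
        return 10
    if first_quantity <= 10:
        return 5
    return 0
```

As a worked example under these values, suppose an address identifier's largest cluster holds 8 records and its final coordinate lies 100 m from its road. The penalty is then 5, and the initial score is (1 - 100/500)^2 × (100 - 35) = 0.64 × 65 = 41.6. The confidence is therefore C = 35 + 41.6 - 5 = 71.6. With these settings, C ranges from 15 (a first distance of at least 500 m and at most 3 records) to 100 (a distance of 0 and more than 10 records).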
Optionally, the embodiments of the present application further include:
Step 2110: When a positioning service request is received, respond to it according to the accuracy requirement of the request and the confidence of the address identifier and the final latitude and longitude information.
In the embodiments of the present application, once the confidence of each address identifier and its final latitude and longitude has been determined, the address identifiers and their final latitude and longitude can be applied according to that confidence. In a positioning service, for example, the response to a positioning service request may take into account both the accuracy requirement of the request and the confidence of the address identifier and its final latitude and longitude. If the request demands high accuracy, the response uses only address identifiers with very high confidence, together with their final latitude and longitude. If the accuracy requirement is low, a looser confidence criterion can be applied.
The correspondence between the accuracy requirement of a positioning service request and the confidence may be preset as needed and is not limited in the embodiments of the present application. For example, an accuracy requirement may be defined directly as requiring a confidence above a preset score. Alternatively, several accuracy levels may be defined, each corresponding to a different range of confidence values. Other correspondences are also possible.
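Below is a minimal sketch of the second option. It assumes three hypothetical accuracy levels with hypothetical minimum confidences of 80, 60, and 0 on the 100-point scale. Neither the level names nor the cut-offs come from the source, and the record type is likewise illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical accuracy levels -> minimum confidence (100-point scale).
MIN_CONFIDENCE = {"high": 80.0, "medium": 60.0, "low": 0.0}


@dataclass
class MinedAddress:
    address_id: str      # road name + house number
    lat: float
    lon: float
    confidence: float    # C from the scoring step


def respond(index: Dict[str, MinedAddress], address_id: str,
            accuracy_level: str) -> Optional[MinedAddress]:
    """Answer a positioning request only if confidence meets the level's floor."""
    rec = index.get(address_id)
    if rec is None or rec.confidence < MIN_CONFIDENCE[accuracy_level]:
        return None  # no sufficiently reliable answer at this accuracy level
    return rec
```

For instance, an address identifier with confidence 71.6 is returned for a "medium" request but not for a "high" one.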
Step 2120: Adjust the address of the merchant corresponding to the final latitude and longitude information according to the confidence of the address identifier and the final latitude and longitude information.
In addition, in the embodiments of the present application, merchant addresses can be calibrated using the confidence. Consider a merchant located at a given latitude and longitude, where that coordinate is the final latitude and longitude of some address identifier. Whether to change the merchant's address to that address identifier may be decided by that identifier's confidence. Specifically, a confidence threshold may be preset as needed. If the confidence of the address identifier and its final latitude and longitude is greater than or equal to the threshold, the address-identifier part of the merchant's address may be changed to that address identifier. Alternatively, the merchant may be prompted to make this change and may decide whether to accept it. Otherwise, the address is not changed and no prompt is issued.
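The decision rule can be sketched as follows. The threshold value of 80 and the auto_update switch are hypothetical parameters for illustration. The switch chooses between changing the address directly and prompting the merchant.

```python
def adjust_merchant_address(current_id, mined_id, confidence,
                            threshold=80.0, auto_update=True):
    """Return (action, address_id) for a merchant's address-identifier part.

    action is "update" (change directly), "prompt" (ask the merchant to
    confirm) or "keep" (no change, no prompt).
    """
    if confidence < threshold or mined_id == current_id:
        # Below the threshold, or nothing would change: leave the address as is.
        return "keep", current_id
    if auto_update:
        return "update", mined_id
    return "prompt", mined_id
```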
Step 2130: Update the geographic knowledge base according to the adjusted merchant address and the relationship between the merchant address and the road.
In the embodiments of the present application, a merchant's address may have been adjusted based on the confidence. The geographic knowledge base may then be updated according to the adjusted merchant address and the relationship between that address and its road, so that later lookups of the merchant's address are not wrong. Specifically, the correspondence recorded in the geographic knowledge base between the merchant and its pre-adjustment address can be cancelled or deleted. The correspondence between the merchant and its adjusted address, together with the relationship between the adjusted address and its road, is then recorded in the geographic knowledge base.
For example, suppose the geographic knowledge base records the address of merchant A as "492 Anhua Road, Changning District" (长宁区安化路492号), so merchant A belongs to Anhua Road. Suppose further that the confidence-based adjustment changes merchant A's address to "500 Anzhen Road, Changning District" (长宁区安贞路500号). The correspondence between merchant A and "492 Anhua Road, Changning District" can then be cancelled or deleted from the geographic knowledge base, and merchant A's address is updated to "500 Anzhen Road, Changning District".
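Below is a minimal in-memory sketch of this update. It assumes a hypothetical knowledge base made of two dictionaries, one mapping merchant to address and one mapping address to road. A real geographic knowledge base would have its own storage and schema.

```python
def update_knowledge_base(kb, merchant, new_address, new_road):
    """Replace a merchant's address and record the new address-road relation.

    kb is a hypothetical dict with two maps:
      kb["merchant_address"]: merchant -> address
      kb["address_road"]:     address  -> road
    Returns the merchant's previous address (or None).
    """
    old_address = kb["merchant_address"].get(merchant)
    # Assigning the new address cancels the old merchant-address correspondence.
    kb["merchant_address"][merchant] = new_address
    # Old address-road entries are kept: other merchants may share that address.
    kb["address_road"][new_address] = new_road
    return old_address
```

Applied to the example above, merchant A moves from Anhua Road to Anzhen Road, and the function returns the old address.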
In some embodiments of the present application, raw data for mining address identifiers and their latitude and longitude is obtained. This raw data includes point-of-interest (POI) data and/or user-generated content behavior data. The address identifiers in the raw data, and the latitude and longitude information corresponding to them, are then extracted. For each address identifier that corresponds to multiple pieces of latitude and longitude information, the final latitude and longitude information is determined with a clustering algorithm. This reduces the labor cost of obtaining address identifiers and their latitude and longitude, and it improves timeliness.
Secondly, in other embodiments of the present application, address latitude-longitude data in the raw data may also be obtained. This data includes address data, latitude and longitude data, and the correspondence between addresses and latitude and longitude. The address data is structured based on a geographic knowledge base, which includes a library of geographic information entities and the relationships among those entities. The road name and house number obtained from the structuring are taken as the address identifier corresponding to the address data. Based on the correspondence between addresses and latitude and longitude, the latitude and longitude data corresponding to the address data is then used as the latitude and longitude information of the address identifier. This further improves the efficiency of mining address identifiers and their latitude and longitude.
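The structuring step can be illustrated with a deliberately simplified Python sketch. It assumes the knowledge base supplies a set of known administrative-area names that prefix the address, and it strips those names first. A regular expression then extracts a road name ending in 路, 街, 大道 or 巷, followed by a house number ending in 号. The pattern and the function name are hypothetical. They are far simpler than structuring against a full geographic entity library.

```python
import re

# Road name ending in a common road suffix, followed by a house number ending in 号.
ROAD_AND_NUMBER = re.compile(r"(?P<road>[\u4e00-\u9fff]+?(?:大道|路|街|巷))(?P<number>\d+)号")


def to_address_id(address, known_areas):
    """Return (road_name, house_number), or None if the pattern is not found.

    known_areas: administrative-area names (e.g. city, district) assumed to
    come from the knowledge base; they are stripped from the front of the
    address so that they are not absorbed into the road name.
    """
    rest = address
    changed = True
    while changed:
        changed = False
        for area in known_areas:
            if area and rest.startswith(area):
                rest = rest[len(area):]
                changed = True
    match = ROAD_AND_NUMBER.match(rest)
    if match is None:
        return None
    return match.group("road"), match.group("number")
```

For example, "上海市长宁区安化路492号" with the known areas {"上海市", "长宁区"} yields the address identifier ("安化路", "492").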
Thirdly, in still other embodiments of the present application, consider an address identifier that corresponds to multiple pieces of latitude and longitude information. A density clustering algorithm may cluster those pieces of information according to a first distance threshold and a first sample threshold, producing at least one cluster. The largest cluster is then selected, namely the cluster containing the most latitude and longitude records. The latitude and longitude values in the largest cluster are averaged, and the averaged value is used as the final latitude and longitude information of the address identifier. This improves the accuracy of the resulting final latitude and longitude information.
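To make the clustering step concrete, here is a self-contained sketch of DBSCAN, a standard density clustering algorithm. It operates on (latitude, longitude) pairs and uses great-circle (haversine) distance. The first distance threshold plays the role of DBSCAN's eps radius, and the first sample threshold plays the role of its minimum-samples parameter. The default values (50 m and 3 samples) and the function names are assumptions. The source does not say which density clustering algorithm or distance measure it uses.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def dbscan(points, eps_m, min_samples):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 for noise."""
    n = len(points)
    # Neighbourhoods include the point itself.
    neighbours = [[j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:  # core point: keep expanding
                queue.extend(neighbours[j])
    return labels


def final_latlon(points, eps_m=50.0, min_samples=3):
    """Average of the largest cluster, plus its size (the first quantity)."""
    labels = dbscan(points, eps_m, min_samples)
    sizes = {}
    for lab in labels:
        if lab != -1:
            sizes[lab] = sizes.get(lab, 0) + 1
    if not sizes:
        return None, 0  # every point was noise
    largest = max(sizes, key=sizes.get)
    members = [p for p, lab in zip(points, labels) if lab == largest]
    lat = sum(p[0] for p in members) / len(members)
    lon = sum(p[1] for p in members) / len(members)
    return (lat, lon), len(members)
```

Averaging raw latitude and longitude values, as the text describes, is reasonable for a tight cluster away from the ±180° meridian. The returned cluster size can feed directly into the first quantity used by the confidence score.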
Moreover, in some embodiments of the present application, the first quantity and the first distance may be obtained. The first quantity is the number of latitude and longitude records in the largest cluster of the address identifier. The first distance is the distance between the address identifier's final latitude and longitude and its road. The confidence of the address identifier and its final latitude and longitude is then determined from these two values. Specifically, an initial score is determined from the first distance, a preset base score, and a preset distance threshold. A penalty loss score is determined from the first quantity. The confidence is then determined from the preset base score, the initial score, and the penalty loss score. This allows the accuracy of each address identifier and its final latitude and longitude information to be assessed further.
In addition, in some embodiments of the present application, a positioning service request is answered according to its accuracy requirement and the confidence of the address identifier and its final latitude and longitude. The address of the merchant corresponding to the final latitude and longitude is adjusted according to that confidence. The geographic knowledge base is then updated according to the adjusted merchant address and the relationship between the merchant address and the road. The mined address identifiers and final latitude and longitude are thus applied according to their confidence, and the timeliness and accuracy of the geographic knowledge base are improved.
For simplicity of description, the method embodiments are presented as a series of combined actions. However, those skilled in the art will appreciate that the embodiments of the present application are not limited by the order of actions described, because according to these embodiments some steps may be performed in other orders or simultaneously. Those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
Embodiment 3
This embodiment describes in detail an apparatus for mining address identifiers and their latitude and longitude, as provided by the embodiments of the present application.
Referring to FIG. 4, a schematic structural diagram of an apparatus for mining address identifiers and their latitude and longitude according to an embodiment of the present application is shown. The apparatus specifically includes:
a raw data obtaining module 310, configured to obtain raw data for mining address identifiers and their latitude and longitude;
a data mining module 320, configured to obtain the address identifiers in the raw data and the latitude and longitude information corresponding to each address identifier; and
a final latitude and longitude confirmation module 330, configured to use a density clustering algorithm to determine the final latitude and longitude information for each address identifier that corresponds to multiple pieces of latitude and longitude information.
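Purely as an illustration of how the three modules fit together, the sketch below wires them into one hypothetical class. Several details are assumptions: the record fields (address_id, latlon), the injected resolve callable standing in for density clustering, and keeping the single coordinate of an identifier that has only one.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

LatLon = Tuple[float, float]


class AddressMiner:
    """Hypothetical skeleton mirroring modules 310, 320 and 330."""

    def __init__(self, resolve: Callable[[List[LatLon]], Optional[LatLon]]):
        # resolve picks one final lat/lon from several candidates
        # (for example, the average of the largest density cluster).
        self.resolve = resolve

    def obtain_raw_data(self, source: Iterable[dict]) -> List[dict]:
        # Module 310: collect raw records (POI / UGC data in the source).
        return list(source)

    def mine(self, raw: List[dict]) -> Dict[str, List[LatLon]]:
        # Module 320: group lat/lon records by address identifier.
        by_id = defaultdict(list)
        for rec in raw:
            if rec.get("address_id") and rec.get("latlon"):
                by_id[rec["address_id"]].append(tuple(rec["latlon"]))
        return dict(by_id)

    def finalize(self, mined: Dict[str, List[LatLon]]) -> Dict[str, Optional[LatLon]]:
        # Module 330: resolve identifiers with several coordinates;
        # a single coordinate is kept as-is (assumption).
        return {addr_id: coords[0] if len(coords) == 1 else self.resolve(coords)
                for addr_id, coords in mined.items()}
```

Injecting the resolver keeps the grouping logic independent of the clustering parameters, so the first distance threshold and first sample threshold can be tuned without touching the other modules.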
Optionally, in the embodiments of the present application, the raw data includes point-of-interest data and/or user-generated content behavior data.
In the embodiments of the present application, raw data for mining address identifiers and their latitude and longitude is obtained. The address identifiers in the raw data, and the latitude and longitude information corresponding to them, are extracted. For each address identifier corresponding to multiple pieces of latitude and longitude information, the final latitude and longitude information is determined by a density clustering algorithm. This reduces the labor cost of obtaining address identifiers and their latitude and longitude, and it improves timeliness.
Embodiment 4
An apparatus for mining address identifiers and their latitude and longitude, provided by another embodiment of the present application, is described in detail below with reference to FIG. 5.
Referring to FIG. 5, a schematic structural diagram of an apparatus for mining address identifiers and their latitude and longitude according to an embodiment of the present application is shown. The apparatus specifically includes:
a raw data obtaining module 410, configured to obtain raw data for mining address identifiers and their latitude and longitude; and
a data mining module 420, configured to obtain the address identifiers in the raw data and the latitude and longitude information corresponding to each address identifier.
The data mining module 420 may further include:
a data mining submodule 421, configured to obtain address latitude-longitude data from the raw data, the address latitude-longitude data including address data, latitude and longitude data, and the correspondence between addresses and latitude and longitude;
a structuring submodule 422, configured to structure the address data based on a geographic knowledge base and to take the road name and house number obtained from the structuring as the address identifier corresponding to the address data, the geographic knowledge base including a library of geographic information entities and the relationships among those entities; and
a latitude and longitude confirmation submodule 423, configured to use the latitude and longitude data corresponding to the address data as the latitude and longitude information of the address identifier, based on the correspondence between addresses and latitude and longitude.
Referring to FIG. 5, the apparatus further includes a final latitude and longitude confirmation module 430. This module is configured to use a clustering algorithm to determine the final latitude and longitude information for each address identifier that corresponds to multiple pieces of latitude and longitude information.
The final latitude and longitude confirmation module 430 may further include:
a density clustering submodule 431, configured to take an address identifier corresponding to multiple pieces of latitude and longitude information and cluster those pieces of information with a density clustering algorithm according to a first distance threshold and a first sample threshold, to obtain at least one cluster; and
a largest cluster determination submodule 432, configured to select the largest cluster from the at least one cluster.
Optionally, in the embodiments of the present application, the largest cluster determination submodule 432 is further configured to take, as the largest cluster, the cluster containing the most latitude and longitude records.
The final latitude and longitude confirmation module 430 may further include a final latitude and longitude obtaining submodule 433, configured to average the latitude and longitude information in the largest cluster and to use the averaged latitude and longitude as the final latitude and longitude information of the address identifier;
a confidence data obtaining module 440, configured to obtain the first quantity of latitude and longitude information contained in the largest cluster corresponding to the address identifier, and the first distance between the final latitude and longitude information of the address identifier and the road corresponding to the address identifier; and
a confidence determination module 450, configured to determine the confidence of the address identifier and the final latitude and longitude information according to the first quantity and the first distance.
Optionally, in the embodiments of the present application, the confidence determination module 450 may further include:
an initial score determination submodule, configured to determine an initial score of the address identifier and the final latitude and longitude information according to the first distance, a preset base score, and a preset distance threshold;
a loss score determination submodule, configured to determine a penalty loss score of the address identifier and the final latitude and longitude information according to the first quantity; and
a confidence determination submodule, configured to determine the confidence of the address identifier and the final latitude and longitude information based on the preset base score, the initial score, and the penalty loss score.
Optionally, in the embodiments of the present application, the apparatus may further include:
a positioning service response module, configured to respond, when a positioning service request is received, according to the accuracy requirement of the request and the confidence of the address identifier and the final latitude and longitude information;
a merchant address adjustment module, configured to adjust the address of the merchant corresponding to the final latitude and longitude information according to the confidence of the address identifier and the final latitude and longitude information; and
a geographic knowledge base updating module, configured to update the geographic knowledge base according to the adjusted merchant address and the relationship between the merchant address and the road.
An embodiment of the present application further provides an electronic device, including:
a processor, a memory, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method for mining address identifiers and their latitude and longitude described above.
An embodiment of the present application further provides a readable storage medium. When instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method for mining address identifiers and their latitude and longitude described above.
In the embodiments of the present application, the method first obtains raw data for mining address identifiers and their latitude and longitude. The raw data includes point-of-interest (POI) data and/or user-generated content (UGC) behavior data. The method then extracts the address identifiers from the raw data, together with the latitude and longitude information corresponding to each identifier. For an address identifier that corresponds to multiple pieces of latitude and longitude information, a clustering algorithm determines the final latitude and longitude information. This reduces the labor cost of obtaining address identifiers and their coordinates and improves timeliness.
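The extraction step above ends by splitting address identifiers into those with one coordinate and those with several. The following minimal sketch shows that grouping. It assumes a hypothetical record shape `(source, address_identifier, lat, lon)` already produced from the POI and UGC data. It also assumes that an identifier observed only once keeps its single coordinate, since the text only specifies what happens when there are several. The sample addresses and coordinates are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Coord = Tuple[float, float]
# Hypothetical record shape: (source, address_identifier, latitude, longitude),
# where source is e.g. "poi" or "ugc".
Record = Tuple[str, str, float, float]


def group_by_address(records: List[Record]) -> Dict[str, List[Coord]]:
    """Collect every (lat, lon) observed for each address identifier."""
    groups: Dict[str, List[Coord]] = defaultdict(list)
    for _source, address_id, lat, lon in records:
        groups[address_id].append((lat, lon))
    return dict(groups)


def split_single_and_multi(
    groups: Dict[str, List[Coord]],
) -> Tuple[Dict[str, Coord], Dict[str, List[Coord]]]:
    """Assumption: one observation is kept as-is; several go on to clustering."""
    final = {a: pts[0] for a, pts in groups.items() if len(pts) == 1}
    to_cluster = {a: pts for a, pts in groups.items() if len(pts) > 1}
    return final, to_cluster


# Made-up sample records.
records = [
    ("poi", "望京东路6号", 39.9900, 116.4800),
    ("ugc", "望京东路6号", 39.9901, 116.4801),
    ("poi", "阜通东大街1号", 39.9950, 116.4750),
]
final, to_cluster = split_single_and_multi(group_by_address(records))
print(final)       # {'阜通东大街1号': (39.995, 116.475)}
print(to_cluster)  # {'望京东路6号': [(39.99, 116.48), (39.9901, 116.4801)]}
```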
Secondly, in other embodiments of the present application, address latitude-longitude data may be obtained from the raw data. This data includes address data, latitude and longitude data, and the correspondence between addresses and coordinates. The address data is then structured using a geographic knowledge base, which contains a library of geographic information entities and the relationships between those entities. The road name and house number obtained from structuring are taken as the address identifier for that address data. Finally, based on the correspondence between addresses and coordinates, the latitude and longitude data for the address data is used as the latitude and longitude information of the address identifier. This further improves the efficiency of mining address identifiers and their latitude and longitude.
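The structuring step can be illustrated with a deliberately simple sketch. A hypothetical set of known road names stands in for the geographic knowledge base. The pattern assumes the house number (e.g. "6号" or "1-3号") immediately follows the road name. A real knowledge base with entity relations, such as road to district, would be far richer. The names `KNOWN_ROADS`, `structure_address`, and `to_identifier_records` are illustrative only.

```python
import re
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical stand-in for the geographic knowledge base: a set of known road entities.
KNOWN_ROADS: Set[str] = {"望京东路", "望京街", "阜通东大街"}

# Assumed house-number pattern, e.g. "6号" or "1-3号", directly after the road name.
HOUSE_NUMBER = re.compile(r"(\d+(?:-\d+)?)号")


def structure_address(address: str, roads: Set[str]) -> Optional[Tuple[str, str]]:
    """Return (road_name, house_number), or None if the address cannot be structured."""
    candidates = [r for r in roads if r in address]
    if not candidates:
        return None
    # Prefer the longest (most specific) road match; break ties deterministically.
    road = max(candidates, key=lambda r: (len(r), r))
    tail = address[address.index(road) + len(road):]
    match = HOUSE_NUMBER.match(tail)  # the house number must start right after the road
    if match is None:
        return None
    return road, match.group(1)


def to_identifier_records(
    rows: Iterable[Tuple[str, float, float]], roads: Set[str]
) -> List[Tuple[str, float, float]]:
    """Turn (address, lat, lon) rows into (road + house number, lat, lon) records."""
    out = []
    for address, lat, lon in rows:
        parsed = structure_address(address, roads)
        if parsed is not None:
            road, number = parsed
            out.append((f"{road}{number}号", lat, lon))
    return out


rows = [("北京市朝阳区望京东路6号院", 39.99, 116.48), ("北京市朝阳区某小区", 39.98, 116.47)]
print(to_identifier_records(rows, KNOWN_ROADS))  # [('望京东路6号', 39.99, 116.48)]
```

Addresses that cannot be reduced to a road name plus house number are simply dropped here. That is one possible policy, not necessarily the one used in the application.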
Thirdly, in still other embodiments of the present application, a density-based clustering algorithm may be applied to any address identifier that corresponds to multiple pieces of latitude and longitude information. The algorithm clusters those coordinates according to a first distance threshold and a first sample threshold, yielding at least one cluster. The largest cluster, i.e. the cluster containing the most latitude and longitude entries, is then selected. The coordinates in the largest cluster are averaged, and the averaged value is taken as the final latitude and longitude information of the address identifier. This improves the accuracy of the final latitude and longitude information.
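The clustering step can be sketched in plain Python. Below is a minimal, unoptimised DBSCAN, one common density-based algorithm; the text does not name a specific one. The first distance threshold becomes `eps_m`, measured here in metres with a haversine distance, and the first sample threshold becomes `min_samples`. The metre unit, the spherical-Earth distance, the threshold values, and all function names are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees
NOISE, UNVISITED = -1, -2


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in metres on a spherical Earth (radius 6,371 km)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def dbscan(points: List[Point], eps_m: float, min_samples: int) -> List[int]:
    """Plain O(n^2) DBSCAN; returns one cluster label per point, NOISE (-1) for noise."""
    n = len(points)
    # Each point counts as its own neighbour, as in the usual DBSCAN definition.
    neighbours = [[j for j in range(n) if haversine_m(points[i], points[j]) <= eps_m]
                  for i in range(n)]
    labels = [UNVISITED] * n
    cluster = -1
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE  # may later be absorbed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
                continue
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:  # core point: keep expanding
                queue.extend(neighbours[j])
    return labels


def final_coordinate(points: List[Point], eps_m: float,
                     min_samples: int) -> Optional[Tuple[float, float, int]]:
    """Average the largest cluster and also return its size (the 'first quantity')."""
    labels = dbscan(points, eps_m, min_samples)
    sizes = Counter(label for label in labels if label >= 0)
    if not sizes:
        return None  # every observation was noise
    largest, count = sizes.most_common(1)[0]
    members = [p for p, label in zip(points, labels) if label == largest]
    return (sum(p[0] for p in members) / count,
            sum(p[1] for p in members) / count,
            count)


points = [(39.9900, 116.4800), (39.9901, 116.4801), (39.9902, 116.4800),
          (39.9901, 116.4799), (40.0500, 116.3000)]  # made-up data; last point is an outlier
print(final_coordinate(points, eps_m=50.0, min_samples=3))
```

Taking the arithmetic mean of degrees is adequate for tight clusters away from the ±180° meridian. For larger inputs, `sklearn.cluster.DBSCAN` with `metric="haversine"` on radian coordinates (and `eps` divided by the Earth's radius) is a common alternative.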
Furthermore, some embodiments of the present application may obtain two additional quantities. The first quantity is the number of latitude and longitude entries contained in the largest cluster for the address identifier. The first distance is the distance between the final latitude and longitude information of the address identifier and the road corresponding to that identifier. The confidence of the pairing between the address identifier and the final latitude and longitude information is then determined from the first quantity and the first distance, as follows:
- an initial score is determined from the first distance, a preset base score, and a preset distance threshold;
- a penalty loss score is determined from the first quantity;
- the confidence is determined from the preset base score, the initial score, and the penalty loss score.

In this way the accuracy of each address identifier and its final latitude and longitude information can be further assessed.
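The text names the inputs to the confidence score but not how they combine. The sketch below is therefore one hypothetical way to do it, not the application's formula. The initial score equals the base score when the final coordinate lies within the distance threshold of its road, and decays in proportion to the excess distance otherwise. The penalty grows as the largest cluster falls short of an assumed "full support" size. The result is normalised to [0, 1]. The parameter names and defaults (`base_score=100`, `distance_threshold_m=30`, `full_support=5`) are all assumptions.

```python
def confidence(first_count: int, first_distance_m: float,
               base_score: float = 100.0, distance_threshold_m: float = 30.0,
               full_support: int = 5) -> float:
    """Hypothetical confidence of an (address identifier, final coordinate) pair."""
    # Initial score: full base score within the distance threshold of the road,
    # otherwise scaled down in proportion to how far beyond the threshold it lies.
    if first_distance_m <= distance_threshold_m:
        initial = base_score
    else:
        initial = base_score * distance_threshold_m / first_distance_m
    # Penalty loss: the fewer coordinates support the largest cluster, the larger the loss.
    shortfall = max(0, full_support - first_count)
    penalty = base_score * shortfall / full_support
    # Combine against the base score and clamp to [0, 1].
    return max(0.0, min(1.0, (initial - penalty) / base_score))


print(confidence(first_count=3, first_distance_m=10.0))  # 0.6
```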
In addition, some embodiments of the present application apply the mined results in three ways:
- when a positioning service request is received, the request is answered according to its accuracy requirement and the confidence of the pairing between the address identifier and the final latitude and longitude information;
- the address of the merchant corresponding to the final latitude and longitude information is adjusted according to that confidence;
- the geographic knowledge base is updated according to the adjusted merchant address and the relationship between the merchant address and the road.

The mined address identifiers and final coordinates are thus applied in a confidence-aware manner, which improves the timeliness and accuracy of the geographic knowledge base.
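A minimal sketch of how a confidence value might gate the two applications above. The accuracy levels `"high"`, `"medium"`, and `"low"`, their minimum confidences, and the 0.8 threshold for overwriting a merchant coordinate are hypothetical choices, not values from the application. Returning `None` stands in for "fall back to a coarser answer".

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float]


@dataclass
class MinedAddress:
    lat: float
    lon: float
    confidence: float  # assumed to lie in [0, 1]


# Hypothetical mapping from a request's accuracy requirement to the minimum confidence it accepts.
MIN_CONFIDENCE: Dict[str, float] = {"high": 0.9, "medium": 0.6, "low": 0.3}


def respond(address_id: str, accuracy: str,
            mined: Dict[str, MinedAddress]) -> Optional[Coord]:
    """Answer at house-number level only if the mined pair is trusted enough."""
    rec = mined.get(address_id)
    if rec is None or rec.confidence < MIN_CONFIDENCE[accuracy]:
        return None  # the caller would fall back to a coarser answer, e.g. road level
    return rec.lat, rec.lon


def adjust_merchant(stored: Coord, rec: MinedAddress, min_conf: float = 0.8) -> Coord:
    """Overwrite a merchant's stored coordinate only with a high-confidence mined one."""
    return (rec.lat, rec.lon) if rec.confidence >= min_conf else stored


mined = {"望京东路6号": MinedAddress(39.9901, 116.48, 0.75)}  # made-up example
print(respond("望京东路6号", "medium", mined))  # (39.9901, 116.48)
print(respond("望京东路6号", "high", mined))    # None
```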
Because the apparatus embodiments are substantially similar to the method embodiments, they are described only briefly; for related details, refer to the corresponding parts of the method embodiments.
The algorithms and displays provided here are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such systems is apparent from the above description. Moreover, the present application is not directed to any particular programming language. The content of the application described herein may be implemented in a variety of programming languages, and the above description of a specific language is given to disclose the best mode of the application.
Numerous specific details are set forth in the description provided here. However, it is understood that the embodiments of the present application may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, the above description of exemplary embodiments sometimes groups various features of the application together in a single embodiment, figure, or description thereof. This is done to streamline the disclosure and to aid understanding of one or more of the inventive aspects. It should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into it, with each claim standing on its own as a separate embodiment of the application.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may also be divided into multiple sub-modules, sub-units, or sub-components. Any features disclosed in this specification (including the accompanying claims, abstract, and drawings), and any processes or units of any method or device so disclosed, may be combined in any combination, except where at least some of those features, processes, or units are mutually exclusive. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, some embodiments described herein include certain features found in other embodiments but not others. Those skilled in the art will understand that combinations of features of different embodiments are nonetheless within the scope of the application and form further embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the present application may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. In practice, a microprocessor or a digital signal processor (DSP) may be used to implement some or all functions of some or all components of the device for mining address identifiers and their latitude and longitude according to the embodiments. The application may also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing part or all of the methods described herein. Such a program may be stored on a computer-readable medium or take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, FIG. 6 shows an electronic device that can implement the method according to the present application. The electronic device conventionally includes a processor 610 and a computer program product or computer-readable storage medium in the form of a memory 620. It is a computing and processing device and may be a PC, a mobile terminal, a personal digital assistant, a tablet computer, or the like. The memory 620 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM. The memory 620 has storage space 6201 for program code that performs any of the method steps described above; for example, storage space 6201 may include program code 6202 for implementing each of the various steps of the method. When executed by the processor 610, this program code implements the method for mining address identifiers and their latitude and longitude described in Embodiments 1 and 2 of the present application. The program code may be read from, or written to, one or more computer program products, which include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks. Such a computer program product or computer-readable storage medium is typically a portable or fixed storage unit as described with reference to FIG. 7. The storage unit holds or carries program code implementing the method according to the present application and may have storage segments, storage space, and the like arranged similarly to the memory 620 of the electronic device in FIG. 6. The program code may, for example, be compressed in a suitable form. Typically, the storage unit includes computer-readable code 6202', i.e., code readable by a processor such as 610. When run on the electronic device, this code causes the device to perform the steps of the method described above.
References herein to "one embodiment," "an embodiment," or "one or more embodiments" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Note also that occurrences of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.
It should be noted that the above embodiments illustrate rather than limit the present application, and those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.

Claims (19)

  1. A method for mining address identifiers and their latitude and longitude, comprising:
    obtaining raw data for mining address identifiers and their latitude and longitude;
    obtaining an address identifier in the raw data and latitude and longitude information corresponding to the address identifier; and
    for an address identifier corresponding to multiple pieces of latitude and longitude information, determining final latitude and longitude information corresponding to the address identifier by means of a clustering algorithm.
  2. The method according to claim 1, wherein the step of obtaining an address identifier in the raw data and latitude and longitude information corresponding to the address identifier comprises:
    obtaining address latitude-longitude data from the raw data, the address latitude-longitude data comprising address data, latitude and longitude data, and a correspondence between addresses and latitudes and longitudes;
    structuring the address data based on a geographic knowledge base, and taking the road name and house number obtained from the structuring as the address identifier corresponding to the address data, the geographic knowledge base comprising a library of geographic information entities and relationships between the geographic information entities; and
    based on the correspondence between addresses and latitudes and longitudes, taking the latitude and longitude data corresponding to the address data as the latitude and longitude information corresponding to the address identifier.
  3. The method according to claim 1, wherein the step of determining, for an address identifier corresponding to multiple pieces of latitude and longitude information, final latitude and longitude information corresponding to the address identifier by means of a clustering algorithm comprises:
    for the address identifier corresponding to multiple pieces of latitude and longitude information, clustering the multiple pieces of latitude and longitude information with a density clustering algorithm according to a first distance threshold and a first sample threshold, to obtain at least one cluster;
    selecting a largest cluster from the at least one cluster; and
    averaging the latitude and longitude information in the largest cluster, and taking the averaged latitude and longitude value as the final latitude and longitude information corresponding to the address identifier.
  4. The method according to claim 3, wherein the step of selecting a largest cluster from the at least one cluster comprises:
    taking, as the largest cluster, the cluster that contains the greatest number of pieces of latitude and longitude information.
  5. The method according to claim 3, further comprising, after the step of determining, for the address identifier corresponding to multiple pieces of latitude and longitude information, the final latitude and longitude information corresponding to the address identifier based on the density clustering algorithm:
    obtaining a first quantity of latitude and longitude information contained in the largest cluster corresponding to the address identifier, and a first distance between the final latitude and longitude information corresponding to the address identifier and a road corresponding to the address identifier; and
    determining a confidence of the pairing between the address identifier and the final latitude and longitude information according to the first quantity and the first distance.
  6. The method according to claim 5, wherein the step of determining the confidence of the pairing between the address identifier and the final latitude and longitude information according to the first quantity and the first distance comprises:
    determining an initial score of the pairing between the address identifier and the final latitude and longitude information according to the first distance, a preset base score, and a preset distance threshold;
    determining a penalty loss score of the pairing between the address identifier and the final latitude and longitude information according to the first quantity; and
    determining the confidence of the pairing between the address identifier and the final latitude and longitude information based on the preset base score, the initial score, and the penalty loss score.
  7. The method according to claim 5, further comprising, after the step of determining the confidence of the pairing between the address identifier and the final latitude and longitude information according to the first quantity and the first distance:
    upon receiving a positioning service request, responding to the positioning service request according to an accuracy requirement of the positioning service request and the confidence of the pairing between the address identifier and the final latitude and longitude information;
    adjusting an address of a merchant corresponding to the final latitude and longitude information according to the confidence of the pairing between the address identifier and the final latitude and longitude information; and
    updating the geographic knowledge base according to the adjusted merchant address and a relationship between the merchant address and a road.
  8. The method according to any one of claims 1 to 7, wherein the raw data comprises point-of-interest data and/or user-generated content behavior data.
  9. An apparatus for mining address identifiers and their latitude and longitude, comprising:
    a raw data acquisition module, configured to obtain raw data for mining address identifiers and their latitude and longitude;
    a data mining module, configured to obtain an address identifier in the raw data and latitude and longitude information corresponding to the address identifier; and
    a final latitude and longitude information confirmation module, configured to determine, for an address identifier corresponding to multiple pieces of latitude and longitude information, final latitude and longitude information corresponding to the address identifier by means of a clustering algorithm.
  10. The apparatus according to claim 9, wherein the data mining module comprises:
    a data mining sub-module, configured to obtain address latitude-longitude data from the raw data, the address latitude-longitude data comprising address data, latitude and longitude data, and a correspondence between addresses and latitudes and longitudes;
    a structuring sub-module, configured to structure the address data based on a geographic knowledge base and take the road name and house number obtained from the structuring as the address identifier corresponding to the address data, the geographic knowledge base comprising a library of geographic information entities and relationships between the geographic information entities; and
    a latitude and longitude information confirmation sub-module, configured to take, based on the correspondence between addresses and latitudes and longitudes, the latitude and longitude data corresponding to the address data as the latitude and longitude information corresponding to the address identifier.
  11. The apparatus according to claim 9, wherein the final latitude and longitude information confirmation module comprises:
    a density clustering sub-module, configured to cluster, for an address identifier corresponding to multiple pieces of latitude and longitude information, the multiple pieces of latitude and longitude information with a density clustering algorithm according to a first distance threshold and a first sample threshold, to obtain at least one cluster;
    a largest cluster determination sub-module, configured to select a largest cluster from the at least one cluster; and
    a final latitude and longitude information acquisition sub-module, configured to average the latitude and longitude information in the largest cluster and take the averaged latitude and longitude value as the final latitude and longitude information corresponding to the address identifier.
  12. The apparatus according to claim 11, wherein the largest cluster determination sub-module is further configured to take, as the largest cluster, the cluster that contains the greatest number of pieces of latitude and longitude information.
  13. The apparatus according to claim 11, further comprising:
    a confidence data acquisition module, configured to obtain a first quantity of latitude and longitude information contained in the largest cluster corresponding to the address identifier, and a first distance between the final latitude and longitude information corresponding to the address identifier and a road corresponding to the address identifier; and
    a confidence determination module, configured to determine a confidence of the pairing between the address identifier and the final latitude and longitude information according to the first quantity and the first distance.
  14. The apparatus according to claim 13, wherein the confidence determination module comprises:
    an initial score determination sub-module, configured to determine an initial score of the pairing between the address identifier and the final latitude and longitude information according to the first distance, a preset base score, and a preset distance threshold;
    a loss score determination sub-module, configured to determine a penalty loss score of the pairing between the address identifier and the final latitude and longitude information according to the first quantity; and
    a confidence determination sub-module, configured to determine the confidence of the pairing between the address identifier and the final latitude and longitude information based on the preset base score, the initial score, and the penalty loss score.
  15. The apparatus according to claim 13, further comprising:
    a positioning service response module, configured to respond, upon receiving a positioning service request, to the positioning service request according to an accuracy requirement of the positioning service request and the confidence of the pairing between the address identifier and the final latitude and longitude information;
    a merchant address adjustment module, configured to adjust an address of a merchant corresponding to the final latitude and longitude information according to the confidence of the pairing between the address identifier and the final latitude and longitude information; and
    a geographic knowledge base update module, configured to update the geographic knowledge base according to the adjusted merchant address and a relationship between the merchant address and a road.
  16. The apparatus according to any one of claims 9 to 15, wherein the raw data comprises point-of-interest data and/or user-generated content behavior data.
  17. An electronic device, comprising:
    a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for mining address identifiers and their latitude and longitude according to any one of claims 1 to 8.
  18. A computer program, comprising computer-readable code which, when run on an electronic device, causes the electronic device to perform the method for mining address identifiers and their latitude and longitude according to any one of claims 1 to 8.
  19. A readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method for mining address identifiers and their latitude and longitude according to any one of claims 1 to 8.
PCT/CN2019/095106 2018-09-12 2019-07-08 Address identifier and longitude and latitude thereof mining WO2020052338A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811064084.X 2018-09-12
CN201811064084.XA CN109376761B (en) 2018-09-12 2018-09-12 Address identification and longitude and latitude mining method and device thereof

Publications (1)

Publication Number Publication Date
WO2020052338A1 (en) 2020-03-19

Family

ID=65404481

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/095106 WO2020052338A1 (en) 2018-09-12 2019-07-08 Address identifier and longitude and latitude thereof mining

Country Status (2)

Country Link
CN (1) CN109376761B (en)
WO (1) WO2020052338A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111563630A (en) * 2020-05-11 2020-08-21 圆通速递有限公司 Logistics service network node layout method and system based on address longitude and latitude clustering
CN112380906A (en) * 2020-10-19 2021-02-19 上汽通用五菱汽车股份有限公司 Method for determining user address based on driving data
CN113570107A (en) * 2021-06-08 2021-10-29 众能联合数字技术有限公司 Project address positioning method for project rental scene
CN113627184A (en) * 2020-05-08 2021-11-09 北京京东振世信息技术有限公司 Data processing method and device
CN114170455A (en) * 2021-11-18 2022-03-11 北京锐安科技有限公司 Object gathering method and device, electronic equipment and storage medium
CN116095601A (en) * 2022-05-30 2023-05-09 荣耀终端有限公司 Base station cell feature library updating method and related device

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
CN109376761B (en) * 2018-09-12 2021-01-22 北京三快在线科技有限公司 Address identification and longitude and latitude mining method and device thereof
CN110348679A (en) * 2019-06-03 2019-10-18 菜鸟智能物流控股有限公司 Logistics processing method and device, electronic equipment and storage medium
CN110648043A (en) * 2019-07-26 2020-01-03 深圳壹账通智能科技有限公司 Analysis method and device based on address information, electronic equipment and storage medium
CN112308280A (en) * 2019-08-02 2021-02-02 菜鸟智能物流控股有限公司 Logistics scheduling management method and device, electronic equipment and storage medium
CN112184102A (en) * 2020-09-14 2021-01-05 深圳市睿搏科技集团有限公司 Method for automatically distributing logistics in end process of cross-border e-commerce orders
CN113568951A (en) * 2021-07-30 2021-10-29 拉扎斯网络科技(上海)有限公司 Data mining and processing method and device, storage medium and electronic equipment
CN113704640B (en) * 2021-08-09 2023-04-07 北京三快在线科技有限公司 Method and device for predicting user resident address, electronic equipment and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101441088A (en) * 2007-11-23 2009-05-27 吴玉先 Positioning method and positioning apparatus
CN104050196A (en) * 2013-03-15 2014-09-17 阿里巴巴集团控股有限公司 Point of interest (POI) data redundancy detection method and device
US20160092456A1 (en) * 2014-09-25 2016-03-31 United States Postal Service Methods and systems for creating and using a location identification grid
CN107547633A (en) * 2017-07-27 2018-01-05 腾讯科技(深圳)有限公司 Method, device and storage medium for processing user resident points
CN108271120A (en) * 2017-12-22 2018-07-10 阿里巴巴集团控股有限公司 Method, apparatus and device for determining a target area and target users
CN109376761A (en) * 2018-09-12 2019-02-22 北京三快在线科技有限公司 Method and device for mining an address identifier and its latitude and longitude

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US8401771B2 (en) * 2008-07-22 2013-03-19 Microsoft Corporation Discovering points of interest from users map annotations
US8774830B2 (en) * 2011-06-24 2014-07-08 Zos Communications, Llc Training pattern recognition systems for determining user device locations
CN104077308B (en) * 2013-03-28 2018-02-13 阿里巴巴集团控股有限公司 Logistics service range determining method and device
CN104793224B (en) * 2014-01-21 2017-06-20 ***通信集团设计院有限公司 GPS positioning error correction method and device
CN104572955B (en) * 2014-12-29 2016-08-24 北京奇虎科技有限公司 System and method for determining POI names based on clustering
CN106534392B (en) * 2015-09-10 2019-12-06 阿里巴巴集团控股有限公司 Positioning information acquisition method, positioning method and device
CN107622061A (en) * 2016-07-13 2018-01-23 阿里巴巴集团控股有限公司 Method, apparatus and system for determining address uniqueness

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN113627184A (en) * 2020-05-08 2021-11-09 北京京东振世信息技术有限公司 Data processing method and device
CN113627184B (en) * 2020-05-08 2023-09-26 北京京东振世信息技术有限公司 Data processing method and device
CN111563630A (en) * 2020-05-11 2020-08-21 圆通速递有限公司 Logistics service network node layout method and system based on address longitude and latitude clustering
CN112380906A (en) * 2020-10-19 2021-02-19 上汽通用五菱汽车股份有限公司 Method for determining user address based on driving data
CN112380906B (en) * 2020-10-19 2024-05-31 上汽通用五菱汽车股份有限公司 Method for determining user address based on driving data
CN113570107A (en) * 2021-06-08 2021-10-29 众能联合数字技术有限公司 Project address positioning method for project rental scene
CN114170455A (en) * 2021-11-18 2022-03-11 北京锐安科技有限公司 Object gathering method and device, electronic equipment and storage medium
CN116095601A (en) * 2022-05-30 2023-05-09 荣耀终端有限公司 Base station cell feature library updating method and related device
CN116095601B (en) * 2022-05-30 2023-10-20 荣耀终端有限公司 Base station cell feature library updating method and related device

Also Published As

Publication number Publication date
CN109376761A (en) 2019-02-22
CN109376761B (en) 2021-01-22

Similar Documents

Publication Publication Date Title
WO2020052338A1 (en) Address identifier and longitude and latitude thereof mining
US11553302B2 (en) Labeling a significant location based on contextual data
US11698261B2 (en) Method, apparatus, computer device and storage medium for determining POI alias
US9646318B2 (en) Updating point of interest data using georeferenced transaction data
CA2974452C (en) Systems and methods for providing information for an on-demand service
US11788858B2 (en) Labeling a significant location based on contextual data
US8996523B1 (en) Forming quality street addresses from multiple providers
CN109478184B (en) Identifying, processing, and displaying clusters of data points
US11861516B2 (en) Methods and system for associating locations with annotations
CN110110244B (en) Interest point recommendation method integrating multi-source information
CN109387215B (en) Route recommendation method and device
US10970318B2 (en) Active change detection for geospatial entities using trend analysis
JP2016540957A (en) Confirm delivery location using wireless fingerprinting
CN112861972B (en) Site selection method and device for exhibition area, computer equipment and medium
CN109492066B (en) Method, device, equipment and storage medium for determining branch names of points of interest
CN110309433B (en) Data processing method and device and server
CN110633726A (en) Room source identification method and device, storage medium and electronic equipment
US9449110B2 (en) Geotiles for finding relevant results from a geographically distributed set
US20190005574A1 (en) System and method for matching a service provider to a service requestor
US8914357B1 (en) Mapping keywords to geographic features
CN111737374B (en) Position coordinate determination method, device, electronic equipment and storage medium
CN110619086B (en) Method and apparatus for processing information
CN116522953A (en) Position description method and device based on semantic understanding and electronic equipment

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 19860254; country of ref document: EP; kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 19860254; country of ref document: EP; kind code of ref document: A1