CN111460054B - Address data processing method and device, equipment and storage medium

Info

Publication number
CN111460054B
Authority
CN
China
Prior art keywords
address
processed
reference address
fragments
index library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910053685.9A
Other languages
Chinese (zh)
Other versions
CN111460054A (en
Inventor
郑华飞
刘楚
谢朋峻
李林琳
司罗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201910053685.9A
Publication of CN111460054A
Application granted
Publication of CN111460054B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29: Geographical information databases
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, an apparatus, a device and a storage medium for processing address data. The method comprises: acquiring a plurality of address fragments of an address to be processed; performing matching retrieval in an address index library according to the address fragments to obtain a reference address; and obtaining the standard address of the address to be processed according to the reference address. Embodiments of the invention enable standardized processing of multi-police address data, that is, address data from multiple police branches.

Description

Address data processing method and device, equipment and storage medium
Technical Field
The present invention relates to the field of computers, and in particular, to a method and apparatus for processing address data, a device, and a storage medium.
Background
Police information systems typically hold address data from several police branches (multi-police address data), such as report addresses, addresses collected for the floating population, and traffic accident addresses. At present, each kind of police address has its own characteristics in data format and data storage.
First, the address data of different police branches are stored in separate subsystem databases and therefore have different data formats. For example, a "report address" is relatively terse and emphasizes position relative to a point of interest (POI), such as "inside Zhang San Pharmacy on Kehua North Road" or "four-net gate of Litsea of Korea of bronze gong bay"; as another example, a "traffic accident address" emphasizes road features, such as "the intersection of Shangchizheng Street and Wenweng Road". Second, information is often missing and expressions are often non-standard when police addresses are entered.
Therefore, multi-police addresses currently take many forms and are stored independently, and most subsystems are isolated from one another, which makes multi-police address data inconvenient to use and maintain.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, a device and a storage medium for processing address data, which are used for solving the technical problem of non-standard addresses of multiple types.
In a first aspect, the present invention provides an address data processing method, the method comprising:
acquiring a plurality of address fragments of an address to be processed;
matching and searching in an address index library according to the address fragments to obtain a reference address;
and obtaining the standard address of the address to be processed according to the reference address.
In a second aspect, the present invention also provides an address data processing apparatus, the apparatus comprising:
the fragment acquisition module is used for acquiring a plurality of address fragments of the address to be processed;
the address retrieval module is used for carrying out matching retrieval in an address index library according to the plurality of address fragments to obtain a reference address;
and the standardized processing module is used for obtaining the standard address of the address to be processed according to the reference address.
In a third aspect, the present invention also provides a computing device comprising:
a memory for storing a program;
and the processor is used for running the program stored in the memory so as to execute the address data processing method.
In a fourth aspect, the present invention also provides a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the address data processing method.
The embodiment of the invention performs standardized processing on the multi-police address data so as to facilitate the use of the multi-police address data and the unified management and maintenance of the multi-police address data, thereby further promoting the construction of standard addresses.
Drawings
FIG. 1 shows a flow diagram of a method of address data processing according to one embodiment of the invention;
FIG. 2 shows a flow diagram of a method of address data processing according to another embodiment of the invention;
FIG. 3 is a block diagram showing the structure of an address data processing apparatus according to an embodiment of the present invention;
FIG. 4 illustrates a block diagram of an exemplary hardware architecture of a computing device capable of implementing methods according to embodiments of the invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and examples. It should be understood that the detailed description is intended to illustrate the invention, but not to limit the invention. Terms such as first, second, etc. herein are used solely to distinguish one entity (or action) from another entity (or action) without necessarily implying any relationship or order between such entities (or actions); in addition, terms herein such as up, down, left, right, front, back, etc. denote a direction or orientation, but merely denote a relative direction or orientation, not an absolute direction or orientation. Without additional limitations, elements defined by the term "comprising" do not exclude the presence of other elements in a process, method, article, or apparatus that comprises the element.
First, technical terms that may be related to embodiments of the present invention will be briefly explained.
Multi-police address data: the public security system holds address data from multiple police branches, such as report addresses, traffic addresses, and floating population collection addresses.
Point of interest (POI): in a geographic information system, a POI may be a house, a shop, a mailbox, a bus stop, etc.
Address normalization: standardizing an address fragment. For example, for the address fragment "Yishu Huaxiang", the standardized result is "Yishu Huaxiang, No. 33 Ximianqiao Street, Jiangxijie Subdistrict, Wuhou District, Chengdu City, Sichuan Province".
Address retrieval library: an address index database that is queried with the input address fragments and recalls the other address element information through index fields.
Optional specific processes of embodiments of the present invention are described below by way of specific examples. It should be noted that the scheme of the present invention does not depend on a specific algorithm. In practical applications, any known or unknown hardware, software, algorithm, program, or combination thereof may be selected to implement the scheme, and any implementation that adopts the essential idea of the scheme falls within the protection scope of the present invention.
FIG. 1 shows a flow diagram of a method of address data processing according to one embodiment of the invention. As shown in fig. 1, the method includes:
S101, acquiring a plurality of address fragments of an address to be processed.
It should be noted that the plurality of address fragments are a plurality of portions constituting the address to be processed.
S102, performing matching search in an address index library according to the plurality of address fragments to obtain a reference address.
The address index library comprises addresses from one or more of the following sources: addresses in an electronic map, online shopping delivery addresses, and standard addresses in the public security system.
The address elements of the addresses in the address index library include any combination of the following: province (prov), city (city), district (district), county or town (town), street name (road), street number (roadNo), community (comm), point of interest (POI), and house number (houseNo).
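As an illustration of the address elements listed above, an index-library entry could be modeled as a record with nine optional fields. This is only a sketch: the class name AddressRecord, the Python field names, and the sample values are assumptions chosen for readability and are not taken from the patent.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass(frozen=True)
class AddressRecord:
    """Hypothetical index-library entry; every address element is optional."""
    prov: Optional[str] = None       # province
    city: Optional[str] = None
    district: Optional[str] = None
    town: Optional[str] = None       # county or town
    road: Optional[str] = None       # street name
    road_no: Optional[str] = None    # street number
    community: Optional[str] = None
    poi: Optional[str] = None        # point of interest
    house_no: Optional[str] = None   # house number

    def present_elements(self) -> Dict[str, str]:
        """Return only the elements that are actually filled in."""
        return {f.name: getattr(self, f.name)
                for f in fields(self) if getattr(self, f.name)}


example = AddressRecord(
    prov="Sichuan Province",
    city="Chengdu City",
    district="Wuhou District",
    road="Kehua North Road",
    road_no="No. 10",
    poi="Zhang San Pharmacy",
)
```

Keeping every field optional reflects the statement that an address may carry any combination of the elements.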
S103, obtaining the standard address of the address to be processed according to the reference address.
For example, if the address to be processed is "Yishu Huaxiang", its standard address is "Yishu Huaxiang, No. 33 Ximianqiao Street, Jiangxijie Subdistrict, Wuhou District, Chengdu City, Sichuan Province"; if the address to be processed is "inside Zhang San Pharmacy on Kehua North Road", its standard address is "inside Zhang San Pharmacy, No. 10 Kehua North Road, Tiaosanta Subdistrict, Wuhou District, Chengdu City, Sichuan Province"; if the address to be processed is "the intersection of Shangchizheng Street and Wenweng Road", its standard address is "the intersection of Shangchizheng Street and Wenweng Road, Qingyang District, Chengdu City, Sichuan Province".
In this embodiment of the invention, multi-police address data is varied in form and stored independently, and the subsystems are isolated from one another. Standardizing the addresses therefore facilitates the use, unified management, and maintenance of multi-police address data, strengthens the complementation and mapping among multi-police address data, and promotes the construction of standard addresses for the public security system.
In one embodiment of the present invention, S101 includes:
Named entity recognition (NER) is performed on the address to be processed, and the recognized entities are used as the address fragments.
For example, if the address to be processed is "inside Zhang San Pharmacy on Kehua North Road", its entities are "Kehua North Road", "Zhang San Pharmacy", and "inside".
In one embodiment of the present invention, S102 includes:
Taking the plurality of address fragments as search parameters, matching retrieval is performed in the address index library to obtain a reference address.
The reference address is recalled through an inverted index, and the reference address contains one or more address fragments of the address to be processed.
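A minimal sketch of inverted-index recall follows, assuming each indexed address is a dictionary of element values and that an address is recalled when it contains at least one of the query fragments. The function names, the sample data, and the "at least one fragment" rule are illustrative assumptions; the patent does not prescribe a particular search engine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_inverted_index(addresses: Dict[str, Dict[str, str]]) -> Dict[str, Set[str]]:
    """Map each address-element value to the ids of the addresses containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for addr_id, elements in addresses.items():
        for value in elements.values():
            index[value].add(addr_id)
    return index


def retrieve(index: Dict[str, Set[str]], fragments: Iterable[str]) -> Set[str]:
    """Recall every address that contains one or more of the fragments."""
    recalled: Set[str] = set()
    for fragment in fragments:
        recalled |= index.get(fragment, set())
    return recalled


# Hypothetical index-library content.
addresses = {
    "A": {"road": "Kehua North Road", "poi": "Zhang San Pharmacy"},
    "B": {"road": "Kehua Road", "poi": "Zhang San Pharmacy"},
    "C": {"road": "Wenweng Road"},
}
index = build_inverted_index(addresses)
hits = retrieve(index, ["Kehua North Road", "Zhang San Pharmacy", "inside"])
```

Both A and B are recalled here because they share the POI fragment; choosing between them is the job of the scoring described later.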
In one embodiment of the present invention, S101 includes:
Word segmentation is performed on the address to be processed, and the word segmentation result is used as the address fragments.
As an example, after named entity recognition is performed on the address to be processed, reference addresses are recalled through the inverted index; if no reference address is recalled, word segmentation is performed on the entities of the address to be processed, and word-level matching retrieval is performed in the address index library according to the word segmentation result to obtain the reference address.
For example, the address to be processed is "Chengdu Yishu Huaxiang", and its entities are "Chengdu" and "Yishu Huaxiang"; if no address containing these two entities is found in the address index library, retrieval must fall back to finer-grained words, that is, the entities are segmented into "Chengdu", "Yishu", and "Huaxiang"; the address index library is then searched with "Chengdu", "Yishu", and "Huaxiang" to obtain the reference address.
As another example, the addresses to be processed are directly segmented without named entity recognition.
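The entity-first, segmentation-second fallback can be sketched as below. The recognizer, segmenter, and search function are passed in as callables because the patent does not name concrete tools; the stubs used here (a fixed entity list, whitespace splitting, and an "all fragments must match" search) are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List, Set


def make_search(index: Dict[str, Set[str]]) -> Callable[[List[str]], Set[str]]:
    """Build a search that recalls addresses matching every fragment (assumed rule)."""
    def search(fragments: List[str]) -> Set[str]:
        postings = [index.get(fragment, set()) for fragment in fragments]
        return set.intersection(*postings) if postings else set()
    return search


def find_reference_addresses(
    address: str,
    recognize_entities: Callable[[str], List[str]],
    segment: Callable[[str], List[str]],
    search: Callable[[List[str]], Set[str]],
) -> Set[str]:
    """Try entity-level recall first; fall back to finer-grained words."""
    entities = recognize_entities(address)
    hits = search(entities)
    if hits:
        return hits
    words = [word for entity in entities for word in segment(entity)]
    return search(words)


# Hypothetical word-level index and stand-in NLP components.
toy_index = {"Chengdu": {"X", "Y"}, "Yishu": {"X"}, "Huaxiang": {"X"}}
result = find_reference_addresses(
    "Chengdu Yishu Huaxiang",
    recognize_entities=lambda text: ["Chengdu", "Yishu Huaxiang"],
    segment=lambda entity: entity.split(),
    search=make_search(toy_index),
)
```

In this toy run the entity "Yishu Huaxiang" is not indexed as a whole, so the entity-level search returns nothing and the word-level search recalls address X.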
In one embodiment of the present invention, S102 includes:
A plurality of phrases is obtained according to the word segmentation result, wherein each phrase comprises at least one word in the word segmentation result; at least one address matching each phrase is retrieved from the address index library, and the addresses matching each phrase form an address set; and a reference address is determined according to the plurality of address sets corresponding to the plurality of phrases.
As one example, in determining the reference address, the intersection of the plurality of address sets corresponding to the plurality of phrases is taken as the reference address.
For example, the word segmentation results are "Chengdu", "Yishu", and "Huaxiang"; "Chengdu" and "Yishu" are combined into a first phrase, and "Chengdu" and "Huaxiang" are combined into a second phrase; addresses A, B, and C are retrieved according to the first phrase, and addresses B, C, and D are retrieved according to the second phrase; the intersection of {A, B, C} and {B, C, D}, that is, the two addresses B and C, is taken as the reference addresses.
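The A/B/C/D example can be reproduced with plain set intersection. In this sketch the per-phrase lookup is a hard-coded dictionary standing in for the index library, and the function name is hypothetical.

```python
from typing import Callable, List, Set, Tuple

Phrase = Tuple[str, ...]


def intersect_phrase_hits(phrases: List[Phrase],
                          lookup: Callable[[Phrase], Set[str]]) -> Set[str]:
    """Retrieve one address set per phrase and keep only the common addresses."""
    address_sets = [lookup(phrase) for phrase in phrases]
    if not address_sets:
        return set()
    return set.intersection(*address_sets)


# Stand-in for the index library: addresses matched by each phrase.
phrase_hits = {
    ("Chengdu", "Yishu"): {"A", "B", "C"},
    ("Chengdu", "Huaxiang"): {"B", "C", "D"},
}
reference = intersect_phrase_hits(
    list(phrase_hits), lambda phrase: phrase_hits.get(phrase, set())
)
```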
In one embodiment of the invention, the number of reference addresses retrieved may be one or more.
If the number of the retrieved reference addresses is one, the address to be processed is processed according to the one retrieved reference address to obtain a standard address of the address to be processed.
If the number of the retrieved reference addresses is a plurality, the method further comprises:
determining at least one evaluation score for each reference address based on the content of each reference address; a target reference address is selected from the plurality of reference addresses based on at least one evaluation score for each reference address. In the embodiment of the invention, if a plurality of reference addresses are searched, the optimal reference address is obtained by evaluating and scoring each reference address, and the address standardization is carried out according to the reference address, so that the address standardization result is more accurate.
If each reference address has only one evaluation score, the reference address with the largest evaluation score is selected as the target reference address.
If each reference address has multiple evaluation scores, a weighted calculation is performed on the multiple evaluation scores to obtain a total evaluation score for the reference address, and the reference address with the largest total evaluation score is selected as the target reference address for address standardization.
The evaluation score of the reference address comprises one or more of the following: the content fullness of the reference address, the similarity (matching degree) between the reference address and the address to be processed, and the confidence of the reference address.
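A sketch of the weighted selection follows. The score names, the weights, and the candidate values are all assumptions; the patent states only that the scores are weighted and that the largest total wins.

```python
from typing import Dict


def select_target(candidates: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> str:
    """Return the id of the reference address with the largest weighted total."""
    def total(scores: Dict[str, float]) -> float:
        return sum(weights[name] * value for name, value in scores.items())
    return max(candidates, key=lambda addr_id: total(candidates[addr_id]))


# Hypothetical per-address evaluation scores and weights.
candidates = {
    "A": {"fullness": 7, "match": 2, "confidence": 1},
    "B": {"fullness": 5, "match": 2, "confidence": 3},
}
weights = {"fullness": 1.0, "match": 2.0, "confidence": 1.5}
target = select_target(candidates, weights)
```

With these numbers A totals 12.5 and B totals 13.5, so B is selected even though A is the fuller address; different weights would change the outcome.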
Regarding the content fullness, determining the content fullness of each reference address according to the address element included in the reference address, and taking the content fullness as an evaluation score of the reference address; wherein, the more address elements the reference address comprises, the greater the content fullness of the reference address.
For example, if the reference address includes all seven address elements (province, city, district or county, street name, street number, point of interest, and house number), its content fullness is the full score of 7 points; 1 point is deducted from the full 7 points for each address element that the reference address lacks.
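The seven-point fullness rule can be written as a short function. The element keys are assumed names for the seven elements listed in the example.

```python
from typing import Dict, Optional

# Assumed keys for: province, city, district (county), street name,
# street number, point of interest, house number.
SCORED_ELEMENTS = ("prov", "city", "district", "road", "road_no", "poi", "house_no")


def content_fullness(address: Dict[str, Optional[str]]) -> int:
    """Start from the full score and deduct one point per missing element."""
    missing = sum(1 for name in SCORED_ELEMENTS if not address.get(name))
    return len(SCORED_ELEMENTS) - missing


full = {name: "x" for name in SCORED_ELEMENTS}
partial = {"prov": "Sichuan Province", "city": "Chengdu City",
           "district": "Wuhou District", "road": "Kehua North Road",
           "poi": "Zhang San Pharmacy"}
```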
Regarding the matching degree, the similarity between the reference address and the address to be processed is determined according to the address element information included in each of them, and the similarity is taken as an evaluation score of the reference address; the more address element information the reference address and the address to be processed have in common, the higher the matching degree between them.
For example, the address to be processed is "inside Zhang San Pharmacy on Kehua North Road", the first reference address is "Zhang San Pharmacy on Kehua North Road", and the second reference address is "Zhang San Pharmacy on Kehua Road". The address to be processed has the same street name and the same POI as the first reference address, whereas it has the same POI but a different street name from the second reference address; therefore, the first reference address has the higher matching degree with the address to be processed.
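One simple way to realize the matching degree is to count the address elements whose values are identical in both addresses. This counting rule is an assumption; the patent only requires that more shared element information yields a higher score.

```python
from typing import Dict


def matching_degree(pending: Dict[str, str], reference: Dict[str, str]) -> int:
    """Count the elements of the pending address that the reference repeats exactly."""
    return sum(1 for name, value in pending.items()
               if value and reference.get(name) == value)


pending = {"road": "Kehua North Road", "poi": "Zhang San Pharmacy"}
first_reference = {"road": "Kehua North Road", "poi": "Zhang San Pharmacy"}
second_reference = {"road": "Kehua Road", "poi": "Zhang San Pharmacy"}
```

Under this rule the first reference address scores 2 and the second scores 1, which agrees with the ordering in the example.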
Regarding the confidence level, determining the confidence level of the reference address according to the source of the reference address; wherein the higher the confidence of a reference address, the more reliable the reference address.
It should be noted that different addresses in the address index library come from different sources: some are derived from the electronic map system, some from the online shopping delivery address library, and others from the public security system. Address data in the public security system is collected by grassroots police officers and is the most reliable; address data collected by crowdsourcing is the least reliable.
For example, the confidence of an address from an electronic map system is 1, the confidence of an address from an online shopping and delivery address library is 2, and the confidence of an address from a public security system is 3.
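The source-based confidence values in the example amount to a lookup table. The source keys below are assumed labels for the three sources named in the text.

```python
# Confidence values from the example; the dictionary keys are assumed labels.
SOURCE_CONFIDENCE = {
    "electronic_map": 1,
    "online_shopping": 2,
    "public_security": 3,
}


def confidence(source: str) -> int:
    """Look up the confidence assigned to an address source."""
    return SOURCE_CONFIDENCE[source]


# Hypothetical (address id, source) pairs.
tagged = [("A", "electronic_map"), ("B", "public_security"), ("C", "online_shopping")]
most_reliable = max(tagged, key=lambda item: confidence(item[1]))
```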
In one embodiment of the present invention, after selecting the target reference address, S103 includes:
According to the target reference address, the missing address element information in the address to be processed is completed; and/or the address element information of the address to be processed is adjusted according to the target reference address so that it is consistent with the corresponding address element information in the target reference address.
In the completion step, elements missing from the address to be processed, such as the province, city, district, and street, are filled in.
In the adjustment step, the content, format, and position of the address elements are adjusted. For example, the format of "Yishu" in the address to be processed is adjusted to "Yishu-Huaxiang", and the content of "Wangtwo" in the address to be processed is adjusted to "Wangtwo Hotel".
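Completion and adjustment can be sketched as two dictionary passes over the address elements. The function names, the element keys, and the choice to leave elements that the target lacks (such as a position word) untouched are assumptions for illustration.

```python
from typing import Dict


def complete_missing(pending: Dict[str, str], target: Dict[str, str]) -> Dict[str, str]:
    """Fill in elements the pending address lacks, using the target reference address."""
    completed = dict(pending)
    for name, value in target.items():
        if value and not completed.get(name):
            completed[name] = value
    return completed


def align_existing(pending: Dict[str, str], target: Dict[str, str]) -> Dict[str, str]:
    """Overwrite elements the pending address already has with the target's version."""
    aligned = dict(pending)
    for name in pending:
        if target.get(name):
            aligned[name] = target[name]
    return aligned


pending = {"poi": "Wangtwo", "assist": "inside"}
target = {"prov": "Sichuan Province", "city": "Chengdu City", "poi": "Wangtwo Hotel"}
standardized = align_existing(complete_missing(pending, target), target)
```

The two passes are kept separate because the text allows either step to be applied on its own ("and/or").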
Fig. 2 shows a flow chart of an address data processing method according to another embodiment of the invention. The method comprises the following steps:
S201, constructing an address index library. The address index library fuses internet addresses with the existing standard address data in the public security system. The internet addresses include Taobao delivery addresses and Gaode POI addresses. A rich address retrieval library is constructed using multi-source heterogeneous address indexing technology, wherein the address fields in the address retrieval library comprise one or more of the following: "prov" (province), "city", "district", "town", "community", "road" (street name), "roadNo", "POI", and "houseNo" (house number). For example, the addresses in the address index library include the fields shown in Table 1.
TABLE 1
[Table 1 is provided as an image in the original publication (Figure BDA0001951709300000081) and is not reproduced here.]
S202, processing multi-police address fragments, including:
S2021, performing named entity recognition on the address to be processed. Named entity recognition (NER) technology is used to recognize the entities in the addresses to be processed from different police branches. For example, the recognition result for "inside Zhang San Pharmacy on Kehua North Road" is "road=Kehua North Road; poi=Zhang San Pharmacy; assist=inside", where assist marks the relative position.
S2022, segmenting the address entities. Multi-granularity word segmentation technology is used to segment the recognized entities; for example, the word segmentation result for "Zhang San Pharmacy" is "Zhang San" and "Pharmacy".
S203, retrieving the reference address in the address index library, comprising:
S2031, the entities recognized in S2021 are submitted to the address retrieval library, and addresses are recalled through the inverted index. Taking "inside Zhang San Pharmacy on Kehua North Road" as an example, "road=Kehua North Road poi=Zhang San Pharmacy assist=inside" is used as the query parameter of the search engine to perform full-text entity matching retrieval.
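To turn a tagged recognition result into query parameters, the key=value notation can be parsed into a dictionary. This sketch assumes the semicolon-separated form in which the S2021 recognition result is written; the function name is hypothetical, and a real search engine would define its own query syntax.

```python
from typing import Dict


def parse_tagged_entities(tagged: str) -> Dict[str, str]:
    """Parse 'key=value; key=value' entity tags into query parameters."""
    params: Dict[str, str] = {}
    for part in tagged.split(";"):
        if "=" not in part:
            continue  # skip empty or malformed pieces
        key, value = part.split("=", 1)
        params[key.strip()] = value.strip()
    return params


query = parse_tagged_entities(
    "road=Kehua North Road; poi=Zhang San Pharmacy; assist=inside"
)
```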
S2032, if S2031 recalls no result, a finer-grained word segmentation result must be used for combined recall. For example, "poi=Yishu Huaxiang" is segmented into "Yishu" and "Huaxiang", and index recall is performed with each segmentation result to obtain the "poi=Yishu Huaxiang" result.
In addition, to improve recall accuracy, the intersection of the results recalled for the multi-granularity word segmentation results is taken. For example, "Yishu Huaxiang" is divided into the two words "Yishu" and "Huaxiang". Searching the index library with "Yishu" yields three addresses; searching the index library with "Huaxiang" yields two addresses; the intersection of the three addresses and the two addresses is then taken.
S204, ranking the reference addresses, including:
Based on the reference addresses recalled in S203, the evaluation scores of each reference address are calculated from three aspects and summed with weights, and the reference address with the largest weighted sum, that is, the optimal reference address, is selected.
The evaluation score of the reference address is calculated from the following three aspects:
1) Content fullness, which reflects how complete the content of the reference address is, for example whether it contains all nine levels of address elements such as province, city, district, and street; the more complete the information, the higher the score.
2) Content matching degree: the better the reference address matches the address to be processed, the higher the score. For example, for the address to be processed "inside Zhang San Pharmacy on Kehua North Road", the score of "road=Kehua North Road poi=Zhang San Pharmacy" is greater than that of "road=Kehua Road poi=Zhang San Pharmacy".
3) Data source priority: different data sources have different priorities. Standard address data in the public security system is collected by grassroots police officers and has the highest priority, followed by Taobao delivery address data; Gaode POI data is collected by crowdsourcing and has the lowest confidence.
S205, generating a standard address of the address to be processed according to the reference address, including:
S2051, based on the optimal reference address, the address element information missing from the address to be processed, such as the province, city, district, and street, is supplemented.
S2052, the address elements already present in the address to be processed are normalized against the optimal reference address; for example, the format of "poi=Yishu Huaxiang" is updated to match the reference address, and "poi=Wangbi" is replaced by "poi=Wangbi Hotel".
S2053, after S2051 and S2052, the standardized form of the address to be processed is obtained, and a standardized address expression is generated based on the partial order relation of the address elements and the completed, standardized address. For example, for "the intersection of Shangchizheng Street and Wenweng Road", the optimal reference address is "prov=Sichuan Province city=Chengdu City district=Qingyang District road=Shangchizheng Street subroad=Wenweng Road", the address after normalization is "road=Shangchizheng Street assist=and subroad=Wenweng Road assist=intersection", and the finally generated standard address expression is: "the intersection of Shangchizheng Street and Wenweng Road, Qingyang District, Chengdu City, Sichuan Province".
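Generating the final expression from the partial order of address elements can be sketched as sorting the present elements by a fixed hierarchy and joining them. The ordering tuple follows the field order given for the address retrieval library; the key names, the largest-to-smallest output order, and the comma separator are assumptions.

```python
from typing import Dict

# Assumed hierarchy, from the largest administrative unit to the smallest.
ELEMENT_ORDER = ("prov", "city", "district", "town", "community",
                 "road", "road_no", "poi", "house_no")


def standard_expression(elements: Dict[str, str]) -> str:
    """Emit the present address elements in hierarchical order."""
    ordered = [elements[name] for name in ELEMENT_ORDER if elements.get(name)]
    return ", ".join(ordered)


expression = standard_expression({
    "road": "Kehua North Road",
    "city": "Chengdu City",
    "prov": "Sichuan Province",
    "poi": "Zhang San Pharmacy",
    "district": "Wuhou District",
    "road_no": "No. 10",
})
```

The input dictionary is deliberately unordered to show that the output order comes from the hierarchy, not from the order in which elements were recognized.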
In this embodiment of the invention, multi-police address data is varied in form and stored independently, and the subsystems are isolated from one another. Standardizing the addresses therefore facilitates the use, unified management, and maintenance of multi-police address data, strengthens the complementation and mapping among multi-police address data, and promotes the construction of standard addresses for the public security system. "One standard, three reals": the full name is the "one standard, three reals" basic information collection project, that is, the collection of information on standard addresses, the real population, real houses, and real units.
Corresponding to the address data processing method of the embodiment of the invention, the invention also provides an address data processing device, equipment and a computer storage medium.
Wherein, referring to fig. 3, the apparatus 300 comprises:
The fragment acquisition module 301 is configured to acquire a plurality of address fragments of an address to be processed.
The address retrieval module 302 is configured to perform matching retrieval in the address index library according to the plurality of address fragments to obtain a reference address.
The standardized processing module 303 is configured to obtain a standard address of the address to be processed according to the reference address.
In one embodiment of the present invention, the fragment acquisition module 301 includes:
the entity identification module is used for carrying out named entity identification on the address to be processed;
and the fragment determining module is used for taking the identified entity as an address fragment.
In one embodiment of the invention, address retrieval module 302 includes:
and the entity retrieval module is used for carrying out matching retrieval in the address index library by taking the plurality of address fragments as retrieval parameters to obtain a reference address.
In one embodiment of the invention, the reference address is recalled using an inverted index.
In one embodiment of the present invention, the fragment acquisition module 301 includes:
the word segmentation module is used for segmenting the address to be processed, and the word segmentation result is used as an address fragment.
In one embodiment of the invention, address retrieval module 302 includes:
and the phrase determining module is used for obtaining a plurality of phrases according to the word segmentation result, wherein each phrase comprises at least one word in the word segmentation result.
And the address set determining module is used for searching at least one address matched with each phrase in the address index library, and forming an address set with at least one address matched with each phrase.
And the standard address determining module is used for determining a reference address according to a plurality of address sets corresponding to a plurality of phrases.
In one embodiment of the invention, the standard address determination module comprises:
and the intersection taking module is used for taking intersections of a plurality of address sets as reference addresses.
In one embodiment of the present invention, the number of reference addresses is a plurality, and the apparatus further comprises:
and the evaluation scoring module is used for determining at least one evaluation score of each reference address according to the content of each reference address.
And the address selection module is used for selecting a target reference address from a plurality of reference addresses according to at least one evaluation score of each reference address.
In one embodiment of the invention, the evaluation scoring module comprises:
the filling degree determining module is used for determining the content filling degree of each reference address according to the address elements included in the reference address and taking the content filling degree as the evaluation score of the reference address; the more the number of address elements included in the reference address, the greater the content fullness of the reference address.
In one embodiment of the invention, the evaluation scoring module comprises:
and the matching degree determining module is used for determining the similarity between the reference address and the address to be processed according to the address element information included in each reference address, and taking the similarity as the evaluation score of the reference address.
In one embodiment of the invention, the evaluation scoring module comprises:
the confidence determining module is used for determining the confidence of each reference address according to the source of the reference address and taking the confidence as the evaluation score of the reference address; wherein the higher the confidence of a reference address, the more reliable the reference address.
The address in the address index library is derived from one or more of an electronic map system, an online shopping and delivery address library and a public security system.
In one embodiment of the invention, the standardized processing module 303 comprises:
an information complementing module, used for completing the missing address element information in the address to be processed according to the target reference address; and
a parameter adjustment module, used for adjusting the address element information of the address to be processed according to the target reference address, so that it is consistent with the corresponding address element information in the reference address.
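The two normalization operations, completing missing elements and adjusting inconsistent ones, can be illustrated with a short sketch. The function name, the dictionary representation, and the returned bookkeeping lists are assumptions made for the example.

```python
def normalize(pending, target):
    """Build a standard address from a pending address and a target reference address.

    Returns (standard, completed, adjusted), where `completed` lists elements that
    were missing from the pending address and `adjusted` lists elements whose
    values were changed to agree with the target reference address.
    """
    standard = dict(pending)
    completed, adjusted = [], []
    for name, ref_value in target.items():
        if not ref_value:
            continue  # the reference has nothing to offer for this element
        current = standard.get(name)
        if not current:
            completed.append(name)   # completion of a missing element
            standard[name] = ref_value
        elif current != ref_value:
            adjusted.append(name)    # adjustment of an inconsistent element
            standard[name] = ref_value
    return standard, completed, adjusted

pending = {"city": "Hangzhou", "road": "Wenyi W. Rd", "roadNo": "969"}
target = {
    "province": "Zhejiang", "city": "Hangzhou",
    "road": "Wenyi West Road", "roadNo": "969",
}
standard, completed, adjusted = normalize(pending, target)
print(standard, completed, adjusted)
```

In this example the province is filled in from the reference address and the abbreviated road name is replaced by the reference spelling.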
In one embodiment of the invention, the address elements of the addresses in the address index library comprise various combinations of: province (pro), city, district, county or town, street name (road), street number (roadNo), point of interest (POI), and house number (houseNo).
Wherein the computing device comprises:
a memory for storing a program;
a processor for executing the program stored in the memory to perform the address data processing method of any one of the above.
Wherein the computer readable storage medium has stored thereon computer program instructions which, when executed by a processor, implement the address data processing method of any of the above.
It should be noted that the above-described embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product comprising one or more computer program instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer program instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)).
FIG. 4 illustrates a block diagram of an exemplary hardware architecture of a computing device capable of implementing methods according to embodiments of the invention. Wherein computing device 400 includes an input device 401, an input interface 402, a processor 403, a memory 404, an output interface 405, and an output device 406.
Wherein input interface 402, processor 403, memory 404, and output interface 405 are interconnected via bus 410, and input device 401 and output device 406 are connected to bus 410 via input interface 402 and output interface 405, respectively, and further to other components of computing device 400.
Specifically, the input device 401 receives input information from the outside, and transmits the input information to the processor 403 through the input interface 402; processor 403 processes the input information based on computer executable instructions stored in memory 404 to generate output information, temporarily or permanently stores the output information in memory 404, and then communicates the output information to output device 406 via output interface 405; output device 406 outputs the output information to the outside of computing device 400 for use by a user.
The computing device 400 may perform the steps of the methods of the present invention described above.
Processor 403 may be one or more central processing units (CPUs). In the case where the processor 403 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
Memory 404 may be, but is not limited to, one or more of random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), a hard disk, and the like. Memory 404 is used to store program code. It will be appreciated that the functions of any or all of the modules provided by the embodiments of the present invention may be implemented by the processor 403.
The parts of this specification are described in a progressive manner: identical or similar parts of the embodiments may be understood by reference to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.

Claims (15)

1. A method of address data processing, the method comprising:
acquiring a plurality of address fragments of an address to be processed;
matching and searching in an address index library according to the address fragments to obtain a reference address;
obtaining a standard address of the address to be processed according to the reference address;
wherein performing the matching search in the address index library according to the plurality of address fragments to obtain the reference address comprises:
retrieving at least one address matching each of the plurality of phrases in the address index library, the at least one address matching each of the plurality of phrases forming an address set, wherein each of the plurality of phrases comprises at least one address fragment;
and acquiring intersections of a plurality of address sets corresponding to the plurality of phrases to obtain a reference address.
2. The method of claim 1, wherein obtaining the plurality of address fragments of the address to be processed comprises:
performing named entity recognition on the address to be processed, and taking the recognized entities as the address fragments.
3. The method of claim 2, wherein performing the matching search in the address index library according to the plurality of address fragments to obtain the reference address comprises:
taking the address fragments as search parameters, and performing the matching search in the address index library to obtain the reference address.
4. A method according to claim 3, wherein the reference address is recalled using an inverted index.
5. The method of claim 1, wherein obtaining the plurality of address fragments of the address to be processed comprises:
performing word segmentation on the address to be processed, and taking the word segmentation results as the address fragments.
6. The method of claim 1, wherein there are a plurality of reference addresses, the method further comprising:
determining at least one evaluation score of each reference address according to the content of each reference address;
selecting a target reference address from the plurality of reference addresses according to the at least one evaluation score of each reference address.
7. The method of claim 6, wherein,
determining the content fullness of the reference address as an evaluation score of the reference address according to address elements included in the reference address;
the more address elements the reference address includes, the greater the content fullness of the reference address.
8. The method of claim 6, wherein,
determining the similarity between the reference address and the address to be processed as an evaluation score of the reference address according to address elements respectively included in the reference address and the address to be processed.
9. The method of claim 1, wherein the address in the address index library is derived from one or more of an electronic map system, an online shopping shipping address library, and a public security system.
10. The method of claim 6, wherein,
determining the confidence of the reference address according to the source of the reference address as an evaluation score of the reference address;
wherein, the higher the confidence of the reference address, the more reliable the reference address.
11. The method of claim 6, wherein the obtaining the standard address of the address to be processed according to the reference address comprises:
completing, according to the target reference address, the missing address element information in the address to be processed;
and/or
adjusting, according to the target reference address, the address element information of the address to be processed so that it is consistent with the corresponding address element information in the reference address.
12. The method of claim 1, wherein the address elements of the addresses in the address index library comprise a plurality of combinations of: province (pro), city, district, county or town, street name (road), street number (roadNo), community (comm), point of interest (POI), and house number (houseNo).
13. An address data processing apparatus, the apparatus comprising:
the fragment acquisition module is used for acquiring a plurality of address fragments of the address to be processed;
the address retrieval module is used for carrying out matching retrieval in an address index library according to the plurality of address fragments to obtain a reference address;
the standardized processing module is used for obtaining a standard address of the address to be processed according to the reference address;
the address retrieval module is specifically configured to retrieve at least one address matched with each phrase in the plurality of phrases in the address index library, where the at least one address matched with each phrase forms an address set, and each phrase includes at least one address fragment; and acquiring intersections of a plurality of address sets corresponding to the plurality of phrases to obtain a reference address.
14. A computing apparatus, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory to perform the method of any one of claims 1 to 12.
15. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any of claims 1 to 12.
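The retrieval step recited in claims 1 and 4 (one address set per phrase, then the intersection of those sets, recalled through an inverted index) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the fragments are assumed to have been extracted already (for example by named entity recognition or word segmentation), the function names and integer address IDs are hypothetical, and the rule that an address matches a multi-fragment phrase only if it contains every fragment of that phrase is an assumption.

```python
from collections import defaultdict

def build_inverted_index(addresses):
    """Map each address fragment to the set of IDs of addresses that contain it."""
    index = defaultdict(set)
    for address_id, fragments in addresses.items():
        for fragment in fragments:
            index[fragment].add(address_id)
    return index

def retrieve_reference_addresses(index, phrases):
    """Return IDs of addresses that match every phrase.

    `phrases` is a list of phrases; each phrase is a non-empty list of fragments.
    Assumption: an address matches a phrase if it contains all of its fragments.
    """
    address_sets = []
    for phrase in phrases:
        # One address set per phrase.
        matches = set.intersection(*(index.get(f, set()) for f in phrase))
        address_sets.append(matches)
    # The reference addresses are the intersection of all per-phrase sets.
    return set.intersection(*address_sets) if address_sets else set()

library = {
    1: ["Hangzhou", "Wenyi West Road", "969"],
    2: ["Hangzhou", "Wenyi West Road", "588"],
    3: ["Hangzhou", "Wensan Road", "969"],
}
index = build_inverted_index(library)
phrases = [["Hangzhou"], ["Wenyi West Road", "969"]]
print(retrieve_reference_addresses(index, phrases))
```

Grouping fragments into phrases narrows each address set before the final intersection: here the phrase pairing the road name with the street number excludes address 3, which shares the number but not the road.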
CN201910053685.9A 2019-01-21 2019-01-21 Address data processing method and device, equipment and storage medium Active CN111460054B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910053685.9A CN111460054B (en) 2019-01-21 2019-01-21 Address data processing method and device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111460054A (en) 2020-07-28
CN111460054B (en) 2023-06-30

Family

ID=71684071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910053685.9A Active CN111460054B (en) 2019-01-21 2019-01-21 Address data processing method and device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111460054B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112835899B (en) * 2021-01-29 2024-07-02 上海寻梦信息技术有限公司 Address library indexing method, address matching method and related equipment
CN113535880B (en) * 2021-09-16 2022-02-25 阿里巴巴达摩院(杭州)科技有限公司 Geographic information determination method and device, electronic equipment and computer storage medium
CN114003812A (en) * 2021-10-29 2022-02-01 深圳壹账通智能科技有限公司 Address matching method, system, device and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6314099B1 (en) * 1997-11-28 2001-11-06 Mitsubishi Electric System Lsi Design Corporation Address match determining device, communication control system, and address match determining method
US8996523B1 (en) * 2011-05-24 2015-03-31 Google Inc. Forming quality street addresses from multiple providers
CN104679801A (en) * 2013-12-03 2015-06-03 高德软件有限公司 Point of interest searching method and point of interest searching device
CN105786800A (en) * 2016-03-23 2016-07-20 苏州数字地图信息科技股份有限公司 Police standard address acquiring method and system
CN106709065A (en) * 2017-01-19 2017-05-24 国家电网公司 Standardization processing method and standardized processing device for address information
CN107656913A (en) * 2017-09-30 2018-02-02 百度在线网络技术(北京)有限公司 Map point of interest address extraction method, apparatus, server and storage medium
CN108228825A (en) * 2018-01-02 2018-06-29 北京市燃气集团有限责任公司 A kind of station address data cleaning method based on participle
CN109101474A (en) * 2017-06-20 2018-12-28 菜鸟智能物流控股有限公司 Address aggregation method, package aggregation method and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
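Wang Yadan (王雅丹). Design and Implementation of a GIS-Based Dynamic Logistics Network ***. China Excellent Master's Theses Full-text Database. 2018, entire document. *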

Also Published As

Publication number Publication date
CN111460054A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
CN108363698B (en) Method and device for identifying relation of interest points
CN111460054B (en) Address data processing method and device, equipment and storage medium
WO2016155386A1 (en) Method and device for determining whether webpage comprises point of interest (poi) data
CN112069276B (en) Address coding method, address coding device, computer equipment and computer readable storage medium
CN108628811B (en) Address text matching method and device
KR101945749B1 (en) Method of searching a data base, navigation device and method of generating an index structure
CN110990520B (en) Address coding method and device, electronic equipment and storage medium
RU2598165C1 (en) Non-deterministic disambiguation and comparison of data of location of commercial enterprise
Srivastava et al. A geocoding framework powered by delivery data
US11741167B2 (en) Merging point-of-interest datasets for mapping systems
CN115470307A (en) Address matching method and device
CN110688434A (en) Method, device, equipment and medium for processing interest points
CN111931077A (en) Data processing method and device, electronic equipment and storage medium
WO2016107352A1 (en) System and method for determining poi name and for determining validity of poi information
CN117494711A (en) Semantic-based electricity utilization address similarity matching method
CN111831929B (en) Method and device for acquiring POI information
CN115935086A (en) Address information identification method, information push method and information display method
CN113468881B (en) Address standardization method and device
CN114513550A (en) Method and device for processing geographical position information and electronic equipment
CN114816518A (en) Simhash-based open source component screening and identifying method and system in source code
CN111460325B (en) POI searching method, device and equipment
CN113868373A (en) Word cloud generation method and device, electronic equipment and storage medium
CN107967300B (en) Method, device and equipment for retrieving organization name and storage medium
US10204139B2 (en) Systems and methods for processing geographic data
CN117349540A (en) Address searching method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant