CN113836357B - Address database data processing method and control system based on text similarity calculation - Google Patents

Publication number
CN113836357B
Authority
CN
China
Prior art keywords
address
mapping
level
data
addresses
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111184456.4A
Other languages
Chinese (zh)
Other versions
CN113836357A (en
Inventor
王晓林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shangyue Network Technology Co ltd
Original Assignee
Beijing Shangyue Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shangyue Network Technology Co ltd filed Critical Beijing Shangyue Network Technology Co ltd
Priority to CN202111184456.4A priority Critical patent/CN113836357B/en
Publication of CN113836357A publication Critical patent/CN113836357A/en
Application granted granted Critical
Publication of CN113836357B publication Critical patent/CN113836357B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84 Mapping; Conversion
    • G06F16/86 Mapping to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Development Economics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to an address base data processing method and a control system based on text similarity calculation. First address base data and second address base data are obtained, and a mapping data table ordered according to address hierarchy is established from them. All mapping addresses at the same level as the source address in the second address database data are obtained from the mapping data table, and the set of these mapping addresses is taken as an address mapping set. Similarity is calculated between the source address in the second address library data and each address in the address mapping set. The similarity results are sorted, and the address with the maximum similarity is taken as the target address of the source address. Through similarity calculation, address information data at different address levels is obtained from the mapping address table and the addresses undergo one layer of conversion, so that the two address libraries are matched precisely and quickly, the converted addresses can be applied directly on an e-commerce platform, and e-commerce docking is accelerated.

Description

Address database data processing method and control system based on text similarity calculation
Technical Field
The present disclosure relates to the field of communications technologies, and in particular, to an address database data processing method and a control system based on text similarity calculation.
Background
E-commerce is currently a popular way of trading goods. By storing address information, placing an order and paying on an e-commerce platform, a user can simply wait for the seller to deliver the goods. E-commerce therefore brings a convenient shopping experience to users' daily lives.
User address data is essential in e-commerce logistics and related processes. The applicant has found that, when e-commerce platforms are first connected to each other, the addresses of the buyer and the seller cannot be used directly: when the purchasing mall of enterprise A needs to be connected to the purchasing mall of enterprise B, the addresses of enterprise A cannot be used directly in the purchasing mall of enterprise B, and a layer of conversion is needed.
Therefore, a method is needed for converting the address data of the two parties so that the converted address information can be matched directly and used on the e-commerce platform.
Disclosure of Invention
In view of the above, the present disclosure provides an address library data processing method and a control system based on text similarity calculation, which perform one layer of address conversion to match two address libraries precisely and quickly, so that the converted address can be applied directly on an e-commerce platform and the information exchange and logistics docking of the e-commerce platform are accelerated.
According to an aspect of the present disclosure, there is provided an address database data processing method based on text similarity calculation, including the following steps:
S1, acquiring first address base data and second address base data, and establishing a mapping data table ordered according to address hierarchy according to the first address base data;
S2, according to the mapping data table, obtaining all mapping addresses of the same level as the source address in the second address database data from the mapping data table, and using the set of all mapping addresses as an address mapping set;
S3, carrying out similarity calculation on the source address in the second address library data and each address in the address mapping set;
S4, sorting according to the similarity calculation result, and taking the address with the maximum similarity as the target address of the source address.
In one possible implementation manner, preferably, the obtaining, from the mapping data table, all mapping addresses that are at the same level as the source address in the second address library data, and using a set of all mapping addresses as an address mapping set includes:
inquiring whether the source address has parent address data in the mapping data table:
if yes, the addressing is finished;
otherwise, it is determined whether the parent code is equal to zero.
In one possible implementation manner, it is preferable that the method further includes:
if the parent code is judged to be equal to zero, all the child addresses of the source address in the mapping data table are obtained;
acquiring all sub-address names mapped in the first address database according to all sub-addresses;
and matching all the sub-address names with the user address names, and recording the matching result into an ordered list.
In one possible implementation manner, it is preferable that the method further includes:
acquiring the sub-address name of the maximum matching score in the ordered list and the position information corresponding to the sub-address name;
judging whether the maximum matching score is zero or not;
if the maximum matching score is not zero, obtaining a target address according to the position information of the sub-address name of the maximum matching score;
and saving the target address to a database.
In one possible implementation manner, it is preferable that the method further includes:
acquiring the sub-address name of the maximum matching score in the ordered list and the position information corresponding to the sub-address name;
judging whether the maximum matching score is zero or not;
if the match score maximum is zero, the addressing ends.
In a possible implementation manner, preferably, the method further includes:
if the parent code is judged not to be equal to zero, acquiring next parent address data existing in the mapping data table;
judging whether the parent-level address data exists:
if yes, all sub-addresses of the source address in the mapping data table are obtained; acquiring all sub-address names mapped in the first address database data according to all sub-addresses; matching all the sub-address names with the user address names, and recording matching results into an ordered list;
if not, execution ends.
In one possible implementation manner, it is preferable that the method further includes:
acquiring the sub-address name of the maximum matching score in the ordered list and the position information corresponding to the sub-address name;
judging whether the maximum matching score is zero or not;
if the maximum matching score is not zero, obtaining a target address according to the position information of the sub-address name of the maximum matching score;
and saving the target address to a database.
In one possible implementation manner, it is preferable that the method further includes: acquiring the sub-address name of the maximum matching score in the ordered list and the position information corresponding to the sub-address name;
judging whether the maximum matching score is zero or not;
if the match score maximum is zero, the addressing ends.
In one possible implementation, preferably, the parent-level address data includes provincial-level address data, city-level address data, and county-level address data in the mapping data table.
According to another aspect of the present disclosure, there is provided a control system, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to implement the address library data processing method based on text similarity calculation when executing the executable instructions.
The invention has the technical effects that:
First address base data and second address base data are obtained, and a mapping data table ordered according to address hierarchy is established according to the first address base data. According to the mapping data table, all mapping addresses at the same level as the source address in the second address database data are obtained from the mapping data table, and the set of all mapping addresses is taken as an address mapping set. Similarity is calculated between the source address in the second address library data and each address in the address mapping set. The similarity calculation results are sorted, and the address with the maximum similarity is taken as the target address of the source address. Through similarity calculation, accurate address information data at different address levels can be obtained from the mapping address table, and by putting the addresses through one layer of conversion the two address libraries can be matched precisely and quickly, so that the converted address can be applied directly on the e-commerce platform and the information exchange and logistics docking of the e-commerce platform are accelerated.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a schematic diagram illustrating an implementation flow of the address database data processing method based on text similarity calculation according to the present invention;
FIG. 2 is a general flow diagram illustrating the matching of hierarchical addresses according to address hierarchy in accordance with the present invention;
FIG. 3 is a flow chart illustrating the implementation of provincial addressing for the present invention;
FIG. 4 is a flow chart illustrating the implementation of addressing according to parent codes for the present invention.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Example 1
As shown in fig. 1, a method for processing address database data based on text similarity calculation is provided, which includes the following steps:
S1, acquiring first address base data and second address base data, and establishing a mapping data table ordered according to address hierarchy according to the first address base data;
In this method, the address database data of a first user and a second user needs to be converted, where the first user may be a buyer or a merchant, and the second user may be a merchant or an operator on an e-commerce platform.
As shown in FIG. 2, in this embodiment the first address database data is the full address data of enterprise A, and the second address database data is the full address data of enterprise B. Full data means all provinces, cities and counties covered by an enterprise's addresses. For example, the full address data acquired for enterprise A is the regional address data of enterprise A, including province, city and county address information.
First, the data is prepared: the full address data of enterprise A and the full address data of enterprise B are acquired and used as the first address base data and the second address base data respectively;
the method converts the addresses of enterprise B into addresses that enterprise A can use directly on the e-commerce platform. Address query and mapping matching are therefore carried out at the provincial, city, county and town levels, and the converted address finally matched to an address of enterprise A is used as the e-commerce application address of enterprise A corresponding to enterprise B, so that an address mapping is established for e-commerce logistics;
therefore, an address mapping table is established first. Following the address hierarchy order "province, city, county, town", the address mapping data table is established according to the first address database data and contains the relationship between the address codes of enterprise A and the address codes of enterprise B. The address mapping data table may be built as a table or in another form, and addresses at every level may be mapped one-to-one. Building a mapping table between two sets of addresses is a conventional technique and is not described in detail here. The address database of enterprise A may contain address data for different addresses and may be set up as needed.
In addition, the address mapping table needs to establish a mapping matching relationship between the addresses of A and B, which is embodied in the table as follows: each row of data is a pair "address code A - address code B", and the address codes are set by the user when the program is configured. When the address mapping table is queried later, the mapping matching relationship makes it possible to index the associated address data, that is, to find in the address library of enterprise A the address information corresponding to enterprise B, and to collect the data in the address mapping table as the converted address data;
the mapping matching relationship may be set by the user in a user-defined manner according to market demands and is not limited here. After the mapping matching relationship is set, it is configured and stored in a database or a memory.
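As an illustration of the table described above, the following minimal Python sketch stores one "address code A - address code B" pair per row, orders the rows by address level, and builds an index from B codes to A codes. The names `MappingRow`, `LEVEL_ORDER` and `build_mapping_table`, and all code values, are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass

# Assumed ordering of the address hierarchy: province, city, county, town.
LEVEL_ORDER = {"province": 0, "city": 1, "county": 2, "town": 3}


@dataclass(frozen=True)
class MappingRow:
    code_a: str  # address code in enterprise A's library (made-up value)
    code_b: str  # address code in enterprise B's library (made-up value)
    level: str   # one of the keys of LEVEL_ORDER


def build_mapping_table(rows):
    """Return the rows sorted by address level and a B-code -> A-code index."""
    ordered = sorted(rows, key=lambda r: (LEVEL_ORDER[r.level], r.code_b))
    b_to_a = {r.code_b: r.code_a for r in ordered}
    return ordered, b_to_a


rows = [
    MappingRow("A-3301", "B-0571", "city"),
    MappingRow("A-33", "B-ZJ", "province"),
]
table, b_to_a = build_mapping_table(rows)
print(table[0].level)    # province
print(b_to_a["B-0571"])  # A-3301
```

A dictionary index is only one possible choice; a database table with the same two code columns would serve the same purpose.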
S2, according to the mapping data table, obtaining all mapping addresses of the same level as the source address in the second address database data from the mapping data table, and using the set of all mapping addresses as an address mapping set;
when mapping address data is matched and obtained, the associated address information can be obtained from the address mapping table level by level, following the address hierarchy order "province, city, county, town" and the matching relationship in the database. Each retrieval works on addresses of the same level: for example, for the provincial level of an enterprise B address, the provincial-level mapping results in the address mapping table are obtained. Mapping proceeds one by one until all address information that satisfies the relationship is obtained.
Acquiring from the mapping data table all mapping addresses at the same level as the source address in the second address database data specifically means matching all same-level address data according to the type of the source address. For example, when the source address is the provincial-level part of an address in the second address database data, such as Zhejiang Province, all provincial-level mapping addresses corresponding to the source address are acquired from the mapping data table according to the mapping relationship. There may be several such mapping addresses, and the set of these provincial-level mapping addresses is used as the address mapping set. Address mapping sets for source addresses at the city, county and town levels are then acquired in turn. A best-matching address must be obtained for each address set. For example, a city-level address mapping set may contain city-level addresses from several provinces or several city-level addresses under one province, and the most similar address is selected by similarity calculation as the best-matching address at that level, which gives the best address for mapping matching.
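Step S2 can be sketched as a simple filter over the mapping table: every entry whose level equals the level of the source address goes into the address mapping set. The dictionary layout, the field names and the short codes below are assumptions made for illustration only.

```python
def same_level_candidates(mapping_table, source_level):
    """Collect every mapped address whose level equals the source address's level."""
    return [row for row in mapping_table if row["level"] == source_level]


# Hypothetical mapping table entries (codes are made up).
mapping_table = [
    {"code": "33", "name": "Zhejiang Province", "level": "province"},
    {"code": "32", "name": "Jiangsu Province", "level": "province"},
    {"code": "3301", "name": "Hangzhou City", "level": "city"},
]

address_mapping_set = same_level_candidates(mapping_table, "province")
print([row["name"] for row in address_mapping_set])
# ['Zhejiang Province', 'Jiangsu Province']
```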
In this embodiment, taking the provincial level as an example, a provincial address of enterprise B, such as Jiangsu Province, is looked up in the address mapping table and compared with all provinces in the table that satisfy the mapping matching relationship. The province with the largest similarity value is used as the replacement for the provincial address of enterprise B, so that the provincial address information of enterprise B is converted into the mapped replacement provincial address. City, county and town addresses can be obtained one by one in the same way.
At each administrative level, after the addresses of that level have been matched horizontally, the mapping addresses of the next level down, that is, the child level, are matched vertically under the matched address. As shown in FIG. 2, after one provincial-level mapping match, the city-level, county-level and town-level addresses under that mapped province are matched. The next province is then processed in the same way: all city-level, county-level and town-level addresses under it are mapped and matched to obtain the matching city-level, county-level and town-level addresses. This repeats until all provinces have been matched, so that after provincial-level mapping matching, all city-level, county-level and town-level addresses under every qualifying provincial-level address are obtained.
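The horizontal-then-vertical matching described above can be sketched as a loop over the levels in which each level only considers the children of the address matched at the level above. This is a minimal sketch under stated assumptions: `difflib.SequenceMatcher` from the Python standard library stands in for the similarity measure, the value "0" is assumed to mark the top of the hierarchy, and the record layout and codes are hypothetical.

```python
from difflib import SequenceMatcher

LEVELS = ["province", "city", "county", "town"]


def similarity(a, b):
    """Stand-in character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def match_down(source_names, library):
    """Match level by level; each level only sees children of the previous match."""
    parent_code = "0"  # assumed convention for "no parent"
    matched = []
    for level, name in zip(LEVELS, source_names):
        children = [r for r in library
                    if r["level"] == level and r["parent_code"] == parent_code]
        if not children:
            break
        best = max(children, key=lambda r: similarity(name, r["name"]))
        matched.append(best["name"])
        parent_code = best["code"]
    return matched


# Hypothetical target library (codes are made up).
library = [
    {"code": "32", "parent_code": "0", "level": "province", "name": "Jiangsu Province"},
    {"code": "33", "parent_code": "0", "level": "province", "name": "Zhejiang Province"},
    {"code": "3201", "parent_code": "32", "level": "city", "name": "Nanjing City"},
    {"code": "3205", "parent_code": "32", "level": "city", "name": "Suzhou City"},
    {"code": "3301", "parent_code": "33", "level": "city", "name": "Hangzhou City"},
]

print(match_down(["Jiangsu", "Suzhou"], library))
# ['Jiangsu Province', 'Suzhou City']
```

Restricting each level to the children of the previous match keeps the candidate sets small, which is one way to read the "province first, then everything below it" order of FIG. 2.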
S3, carrying out similarity calculation on the source address in the second address library data and each address in the address mapping set;
the similarity calculation specifically includes:
calculating, by text similarity calculation, the similarity value between the address at each address level and each address in the address mapping set. The source address in the second address library data is the address to be compared against the addresses in the address mapping table, such as a provincial address of enterprise B.
In this embodiment, the similarity value is calculated using the text similarity function of the text2vec package for Python;
for example, similarity is calculated between the provincial-level information in an enterprise B address and all provinces in the address mapping table that satisfy the matching relationship. This is done for each province in the enterprise B addresses, and after the calculation the address with the maximum similarity is taken as the province matched to that province in address library B.
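The scoring step can be sketched with a pluggable scorer. The embodiment names the text2vec package; the sketch below does not use it and instead substitutes `difflib.SequenceMatcher` from the standard library so that it runs without extra dependencies. The function names are hypothetical, and a text2vec-based scorer could be passed in through the `scorer` parameter.

```python
from difflib import SequenceMatcher


def char_similarity(a, b):
    """Character-level ratio in [0, 1]; a stand-in scorer, not text2vec."""
    return SequenceMatcher(None, a, b).ratio()


def score_candidates(source_name, candidates, scorer=char_similarity):
    """Score the source address against every address in the mapping set."""
    return {name: scorer(source_name, name) for name in candidates}


scores = score_candidates(
    "Jiangsu",
    ["Jiangsu Province", "Zhejiang Province", "Anhui Province"],
)
best = max(scores, key=scores.get)
print(best)  # Jiangsu Province
```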
In the present application, mapping matching of city-level, county-level and town-level addresses is performed over all addresses of that level under the matched province, for example all city-level addresses under the province, all county-level addresses under the province, and so on.
And S4, sorting according to the similarity calculation results, and taking the address with the maximum similarity as the target address of the source address.
Specifically, the enterprise B addresses are grouped by address level for matching, and similarity calculation and matching are then performed level by level. As shown in FIG. 3, the provincial level is matched first; when matching by similarity calculation succeeds, the mapped provincial-level information is obtained and address matching proceeds to the city level below. In the same way, after city-level matching is finished, county-level addresses are matched and finally town-level addresses. The address with the highest similarity at each level is finally obtained as the target address of the source address.
With this calculation, a similarity value is obtained for each candidate and entered into a sorted list in the database. The similarity results are thereby sorted, and the address with the maximum similarity in the sorted order is used as the target address of the source address.
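Step S4 amounts to sorting the scored candidates and taking the first entry. In the minimal sketch below, the score values are invented for illustration and `rank_by_similarity` is a hypothetical name.

```python
def rank_by_similarity(scores):
    """Return (address, score) pairs sorted from most to least similar."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Illustrative scores only; real values would come from the similarity step.
scores = {
    "Zhejiang Province": 0.33,
    "Jiangsu Province": 0.61,
    "Anhui Province": 0.10,
}

ranked = rank_by_similarity(scores)
target_address, best_score = ranked[0]
print(target_address)  # Jiangsu Province
```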
As shown in FIG. 4, in one possible implementation manner, preferably, the obtaining, from the mapping data table, all mapping addresses that are in the same level as the source address in the second address library data, and using a set of all mapping addresses as an address mapping set includes: inquiring whether the source address has parent address data in the mapping data table: if yes, the addressing is finished; otherwise, it is determined whether the parent code is equal to zero.
When addressing in the mapping table, addresses are searched one by one by administrative level, so addressing first checks whether a parent code, that is, a first-level code, is present. If a parent code is present, a corresponding address exists. For example, if a parent code is found during addressing, a first-level address exists and the provincial address is determined from the parent code; after that, the address at the next level below the parent, that is, the child code located under the parent code, which is the city address (second-level code), is searched. Addressing is repeated in this way to obtain the remaining addresses.
Each address is stored with a parent code, for example: name Haidian District, code 110108, parent_code 110100. The record may be held in memory or obtained by querying a database, and when the source address is queried, the corresponding parent code can be read from the stored address record. Querying and checking the type of a code is a routine programming technique and is not described in detail here.
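A stored record of this kind can be sketched as a small data class with a `parent_code` field, and the hierarchy can be walked upward by following that field. Only the pair code 110108 / parent_code 110100 comes from the text; the other two records, the use of "0" as the top-level parent code, and the names `AddressRecord` and `parent_chain` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class AddressRecord:
    name: str
    code: str
    parent_code: str


records = {
    # 110108 is Haidian District in the GB/T 2260 administrative code list.
    "110108": AddressRecord("Haidian District", "110108", "110100"),
    # The two records below are illustrative additions, not from the patent.
    "110100": AddressRecord("Beijing municipal districts", "110100", "110000"),
    "110000": AddressRecord("Beijing", "110000", "0"),
}


def parent_chain(code, records):
    """Follow parent_code links upward until the code is not in the table."""
    chain = []
    while code in records:
        chain.append(records[code].name)
        code = records[code].parent_code
    return chain


print(parent_chain("110108", records))
# ['Haidian District', 'Beijing municipal districts', 'Beijing']
```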
When the mapping data table is found to contain the parent address data, a corresponding mapping address exists; addressing succeeds and ends. As shown in FIG. 4, when an enterprise B address is mapped and addressed and a match is found, the corresponding mapping address is obtained directly and used as the converted address data of enterprise B, which enterprise A uses as the e-commerce application address;
when the mapping data table is found to contain no parent address data, the corresponding parent address has not been matched. In that case it must be determined whether the parent code is equal to zero, so that addressing can continue.
The storage and arrangement of the parent codes are edited and set by the user, and whether the first-level address is provincial, city or county level can be edited according to the mapping rule. This embodiment performs mapping addressing according to parent codes, so converted address data can be obtained directly when the mapping matches a corresponding address. When an address cannot be mapped and matched directly, similar data can still be matched through level-by-level addressing, which improves addressing precision.
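One possible reading of the decision flow in FIG. 4 is sketched below. It is an interpretation rather than the patent's definitive logic: a directly mapped source address ends addressing, a parent code of zero leads to matching against top-level candidates, a mapped parent leads to matching against that parent's children, and anything else stops. The function name, the return values and all codes are hypothetical.

```python
def next_step(source, b_to_a):
    """Decide the next addressing action for one source address.

    source: dict with 'code' and 'parent_code' (B-side codes).
    b_to_a: mapping table index from B-side codes to A-side codes.
    Returns (action, scope), where scope is the A-side code whose
    children should be scored, or the mapped code when done.
    """
    code, parent = source["code"], source["parent_code"]
    if code in b_to_a:
        return ("done", b_to_a[code])             # direct mapping exists
    if parent == "0":
        return ("match_children", "0")            # top level: score all first-level candidates
    if parent in b_to_a:
        return ("match_children", b_to_a[parent])  # score children of the mapped parent
    return ("stop", None)                          # no parent data: addressing ends


b_to_a = {"B-32": "A-32"}
print(next_step({"code": "B-32", "parent_code": "0"}, b_to_a))       # ('done', 'A-32')
print(next_step({"code": "B-99", "parent_code": "0"}, b_to_a))       # ('match_children', '0')
print(next_step({"code": "B-3205", "parent_code": "B-32"}, b_to_a))  # ('match_children', 'A-32')
print(next_step({"code": "B-7701", "parent_code": "B-77"}, b_to_a))  # ('stop', None)
```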
In one possible implementation manner, it is preferable that the method further includes:
if the parent code is judged to be equal to zero, all the child addresses of the source address in the mapping data table are obtained. The editing and level settings of the parent code can be set by the user. When addressing finds that the parent code at a position is equal to zero, the parent code at that position is unmatched, and addressing must proceed through the mapping table entry by entry until a parent code not equal to zero is found. Therefore, when the parent code is equal to zero, addresses are acquired from the mapping table: all the child addresses of the source address in the mapping data table are acquired according to the mapping of the parent address, matching is calculated one by one, and all matching results are sorted by matching score for evaluation; when a further address remains, the matching score is calculated and sorted again. As shown in FIG. 4, a provincial mapping address can be obtained from the province name of the enterprise B address and the parent code, that is, the provincial code, and further addressing yields all city-level enterprise A child addresses under that province in the address mapping table. All sub-address names mapped in the first address database data are acquired according to all sub-addresses, that is, the sub-address names of the corresponding enterprise A addresses. All the sub-address names are then matched with the user address name, and the matching results are recorded in an ordered list.
The names of all the addressed enterprise A sub-addresses are matched with the address name of the merchant company; the matching is calculated by the text similarity calculation described above, and the matching results are output in order. Under the parent-code addressing logic, all relevant address data at each level can be obtained. When the parent code is zero, all address data below the provincial level can be mapped and matched according to the addressing flow, after which the addressing flow for the next province is executed. This achieves complete addressing and matching in both the vertical and horizontal directions and improves address matching precision.
In one possible implementation, the method preferably further includes: acquiring the sub-address name with the maximum matching score in the ordered list and the position information corresponding to that sub-address name. The sub-address name with the maximum matching score can be looked up in the ordered list and its position information obtained, so that the address information can be retrieved from that position.
During matching calculation, it must be judged whether the maximum matching score is zero. If the maximum matching score is not zero, the matching values are sorted, the address similarity values are calculated and ranked accordingly, and the target address is obtained from the position information of the sub-address name with the maximum matching score. The target address is the mapping address with the maximum similarity: the best-matching enterprise A address is found at the position with the maximum score and is used as the mapping address for enterprise B. The editing script for the best-matching enterprise A address is then saved to the database after data persistence processing.
Sorting by matching score quickly yields the ranking and the maximum score, that is, the position of the best-matching address, which improves addressing efficiency; when the maximum value is zero, addressing ends without a match.
In one possible implementation manner, it is preferable that the method further includes: acquiring a sub-address name with a maximum matching score in the ordered list and position information corresponding to the sub-address name;
judging whether the maximum matching score is zero or not;
if the match score maximum is zero, the addressing ends.
If the maximum matching score calculated in the sorting is zero, the mapped addresses do not meet the similarity requirement; execution ends at this point and addressing fails.
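The zero-score check and the save step can be sketched as follows. This is a minimal illustration under stated assumptions: the ordered list is taken to hold `(score, position)` pairs sorted best first, a plain dictionary stands in for the database, and the names `resolve_target`, `candidates`, and the sample codes are all hypothetical.

```python
def resolve_target(ordered_list, candidates, database, source_code):
    """Pick the best-scoring candidate and save it, or end without a match.

    ordered_list: (score, position) pairs, best first (assumed layout).
    candidates:   enterprise A addresses, indexed by position.
    database:     dict used here as a stand-in for real persistence.
    """
    if not ordered_list:
        return None
    best_score, best_position = ordered_list[0]
    if best_score == 0:
        # Maximum score is zero: no candidate is similar enough, addressing ends.
        return None
    target = candidates[best_position]
    database[source_code] = target  # stand-in for the persistence step
    return target


candidates = ["A-130100", "A-130200"]
db = {}
hit = resolve_target([(0.76, 1), (0.24, 0)], candidates, db, "B-0315")
miss = resolve_target([(0.0, 0), (0.0, 1)], candidates, db, "B-9999")
print(hit, miss, db)
```

Only the head of the ordered list needs to be inspected, because a zero maximum implies that every other score is zero as well.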
In one possible implementation, preferably, the parent-level address data includes provincial-level address data, city-level address data, and county-level address data in the mapping data table.
In this embodiment, since addressing proceeds sequentially from the provincial level, the level set by the parent code may be province, city, or county.
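Because addressing starts at the provincial level and moves downward, the descent can be pictured as one match per level. The sketch below is an illustration only: the nested dictionary `A_TREE`, the function names, and the sample place names are hypothetical, and `difflib.SequenceMatcher` is used as a stand-in for the text similarity calculation.

```python
from difflib import SequenceMatcher

# Hypothetical enterprise A address tree: name -> sub-addresses.
A_TREE = {
    "Hebei Province": {
        "Tangshan City": {"Lunan District": {}, "Lubei District": {}},
        "Baoding City": {"Lianchi District": {}},
    },
    "Henan Province": {"Zhengzhou City": {"Jinshui District": {}}},
}


def best_match(name, candidates):
    """Return the most similar candidate, or None when the top score is zero."""
    scored = sorted(
        ((SequenceMatcher(None, name, c).ratio(), c) for c in candidates),
        reverse=True,
    )
    if not scored or scored[0][0] == 0:
        return None
    return scored[0][1]


def convert(b_address, tree):
    """Match an enterprise B address level by level against the A tree."""
    result = []
    level = tree
    for part in b_address:
        match = best_match(part, list(level))
        if match is None:
            break  # addressing ends at the first level without a match
        result.append(match)
        level = level[match]  # descend to the sub-addresses of the match
    return result


print(convert(["Hebei", "Tangshan", "Lunan"], A_TREE))
```

Restricting each level's candidates to the sub-addresses of the previous match keeps the comparison set small and avoids confusing same-named districts in different provinces.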
As shown in fig. 4, once the matching score has been recorded in the ordered list, the score for one mapping address has been fully recorded. It must then be determined whether there are multiple mapping addresses; if so, the matching score of the next mapping address is also recorded in the ordered list.
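The loop over multiple mapping addresses can be sketched as repeated insertion into one list that stays sorted. This is an assumption-laden illustration: the mapping address codes and names are made up, `difflib.SequenceMatcher` stands in for the similarity calculation, and scores are stored negated so that `bisect.insort` keeps the best match at the front.

```python
import bisect
from difflib import SequenceMatcher


def record_scores(ordered_list, source_name, child_names):
    """Insert one mapping address's scores into the shared ordered list."""
    for name in child_names:
        score = SequenceMatcher(None, source_name, name).ratio()
        # Negated score: ascending order in the list means best match first.
        bisect.insort(ordered_list, (-score, name))


# Hypothetical case: one source address with two mapping addresses.
mapping_addresses = {
    "A-130000": ["Tangshan City", "Baoding City"],
    "A-130001": ["Tangshan"],
}

ordered = []
for code, children in mapping_addresses.items():
    record_scores(ordered, "Tangshan", children)

best_score, best_name = -ordered[0][0], ordered[0][1]
print(best_score, best_name)
```

Because the list stays sorted after every insertion, the current best candidate is always at index 0, no matter how many mapping addresses have been processed so far.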
It should be noted that although the addressing pattern and the text similarity calculation are described as examples in a provincial mapping manner, those skilled in the art will understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set the mapping matching relationship according to personal preference and/or practical application scene, as long as the mapping matching relationship can be addressed one by one according to the address hierarchy provided by the technology.
Therefore, through similarity calculation, accurate address information data of different address levels are obtained from the mapping address table, and two address libraries can be matched in a refined and quick mode by converting the addresses by one layer, so that the converted addresses can be directly applied to the e-commerce platform, and the information exchange and logistics docking speed of the e-commerce platform are accelerated.
Example 2
Based on the method for processing address database data based on text similarity calculation provided in embodiment 1 above, and as shown in fig. 4, in one possible implementation the method preferably further includes:
if the parent code is judged not to be equal to zero, acquiring next parent address data existing in the mapping data table;
when addressing finds that the parent code is not equal to zero, a parent address is present; the corresponding mapping address, namely the corresponding enterprise A address, is obtained, and addressing continues in sequence to judge whether other addresses exist; if no other matching address is found, addressing ends;
if yes, judging whether the parent-level address data exists:
if it exists, all sub-addresses of the source address in the mapping data table are obtained, namely the sub-addresses of all enterprise A addresses; the sub-address names of all enterprise A addresses mapped in the first address database data are acquired according to these sub-addresses; all the sub-address names are matched with the user address name, and the matching results are recorded into an ordered list; that is, the names of all addressed enterprise A sub-addresses are matched against the name of the merchant company address using the text similarity calculation mode, and the matching results are output in order.
If not, execution ends.
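The branch on the parent code can be sketched as a single lookup function. The table layout below is an assumption made for illustration: each row carries its own code, a parent code (with `"0"` marking a top-level entry), and the mapped enterprise A name; the class name, field names, and sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRow:
    code: str         # address code of this entry
    parent_code: str  # "0" marks a top-level entry in this sketch
    a_name: str       # mapped name in the first (enterprise A) address library


def child_names(table, source_code):
    """Return the mapped sub-address names of a source address.

    An empty list means execution ends without candidates.
    """
    row = table.get(source_code)
    if row is None:
        return []
    # Non-zero parent code: the parent record must exist in the table.
    if row.parent_code != "0" and row.parent_code not in table:
        return []
    return [r.a_name for r in table.values() if r.parent_code == source_code]


TABLE = {r.code: r for r in [
    MappingRow("13", "0", "Hebei Province"),
    MappingRow("1302", "13", "Tangshan City"),
    MappingRow("1306", "13", "Baoding City"),
    MappingRow("130203", "1302", "Lubei District"),
    MappingRow("990101", "99", "Orphan District"),  # parent "99" is missing
    MappingRow("99010101", "990101", "Orphan Town"),
]}

print(child_names(TABLE, "13"))
print(child_names(TABLE, "1302"))
print(child_names(TABLE, "990101"))
```

The names returned here would then be scored against the user address name and recorded in the ordered list, as in embodiment 1.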
In one possible implementation manner, it is preferable that the method further includes:
acquiring the sub-address name of the maximum matching score in the ordered list and the position information corresponding to the sub-address name;
judging whether the maximum matching score is zero or not;
if the maximum matching score is not zero, obtaining a target address according to the position information of the sub-address name of the maximum matching score;
and saving the target address to a database.
In one possible implementation manner, it is preferable that the method further includes:
acquiring the sub-address name of the maximum matching score in the ordered list and the position information corresponding to the sub-address name;
judging whether the maximum matching score is zero or not;
if the match score maximum is zero, the addressing ends.
Obtaining the position information of the best-matching sub-address name by sorting the matching scores is described in embodiment 1 and is not repeated in this embodiment.
Example 3
Still further, according to another aspect of the present disclosure, there is also provided a control system.
The control system of the disclosed embodiments includes a processor and a memory for storing processor-executable instructions. The processor is configured to execute the executable instructions to implement the method for processing address database data based on text similarity calculation described in either of the foregoing embodiments 1 and 2.
Here, it should be noted that the number of processors may be one or more. Meanwhile, in the control system of the embodiment of the present disclosure, an input device and an output device may be further included. The processor, the memory, the input device, and the output device may be connected by a bus, or may be connected by other means, and are not limited specifically herein.
The memory, as a computer-readable storage medium, may be used to store software programs, computer-executable programs, and various modules, such as the programs or modules corresponding to the address database data processing method based on text similarity calculation provided by the embodiments of the present disclosure. The processor executes the various functional applications and data processing of the control system by running the software programs or modules stored in the memory.
The input device may be used to receive input numbers or signals, where a signal may be a key signal generated in connection with user settings and function control of the device/terminal/server. The output device may include a display device such as a display screen.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A method for processing address database data based on text similarity calculation is characterized by comprising the following steps:
S1, acquiring first address database data and second address database data, and establishing, according to the first address database data, a mapping data table ordered by address hierarchy; the address mapping data table comprises the address codes of enterprise A and their address matching relations with enterprise B, embodied in the table as follows: each row of data is an A address code-B address code pair; the mapping matching relationship makes it convenient to index the associated address data when the address mapping table is queried, namely, to find the address information of the corresponding enterprise B from the address library of enterprise A according to the mapping matching relationship, and the data in the address mapping table is collected as the converted address data;
S2, according to the mapping data table, obtaining from the mapping data table all mapping addresses at the same level as the source address in the second address database data, and taking the set of all these mapping addresses as an address mapping set: all same-level address data is matched according to the type of the source address; when the source address is the provincial address of an address in the second address database data, all provincial mapping addresses corresponding to the source address are acquired from the mapping data table according to the mapping relation, and the set of these provincial mapping addresses is taken as the address mapping set; address mapping sets for source addresses at the city level, county level and town level are acquired in turn;
when the ground-level addresses of each level are matched, after the ground-level addresses of each layer are matched transversely, the mapping addresses of the next level, namely the sub-level, are matched longitudinally for that ground-level address; after one "provincial level" mapping match is executed, that is, after one provincial address has been mapped and matched together with the city-level, county-level and town-level addresses under that province, the next province must also be executed, and all city-level, county-level and town-level addresses under the next province are mapped and matched to obtain the matched city-level, county-level and town-level addresses; the next province is executed until all provinces have been matched, and, after provincial address mapping matching, all city-level, county-level and town-level addresses under the qualifying provincial addresses are further acquired according to the method;
S3, carrying out similarity calculation on the source address in the second address database data and each address in the address mapping set; similarity calculation is carried out between the province-level information in the enterprise B address and all provinces meeting the matching relation in the address mapping table, and after calculation the address with the maximum similarity is taken as the province matched with the province in address library B; similarly, mapping matching of the city-level, county-level and town-level addresses is performed in the same way for all level addresses under the province;
and S4, sorting according to the similarity calculation result, and taking the address with the maximum similarity as the target address of the source address.
2. The method as claimed in claim 1, wherein the obtaining from the mapping data table of all mapping addresses at the same level as the source address in the second address database data, and the taking of the set of all these mapping addresses as an address mapping set, includes:
inquiring whether the source address has parent address data in the mapping data table:
if yes, the addressing is finished;
otherwise, it is determined whether the parent code is equal to zero.
3. The address library data processing method based on text similarity calculation according to claim 2, further comprising:
if the parent code is judged to be equal to zero, all the child addresses of the source address in the mapping data table are obtained;
acquiring all sub-address names mapped in the first address database data according to all sub-addresses;
and matching all the sub-address names with the user address names, and recording the matching result into an ordered list.
4. The address library data processing method based on text similarity calculation according to claim 3, further comprising:
acquiring a sub-address name with a maximum matching score in the ordered list and position information corresponding to the sub-address name;
judging whether the maximum matching score is zero or not;
if the maximum matching score is not zero, obtaining a target address according to the position information of the sub-address name of the maximum matching score;
and saving the target address to a database.
5. The address library data processing method based on text similarity calculation according to claim 4, further comprising:
acquiring the sub-address name of the maximum matching score in the ordered list and the position information corresponding to the sub-address name;
judging whether the maximum matching score is zero or not;
if the match score maximum is zero, the addressing ends.
6. The address library data processing method based on text similarity calculation according to claim 2, further comprising:
if the parent code is judged not to be equal to zero, acquiring next parent address data existing in the mapping data table;
judging whether the parent-level address data exists:
if yes, all sub-addresses of the source address in the mapping data table are obtained; acquiring all sub-address names mapped in the first address database data according to all sub-addresses; matching all the sub-address names with the user address names, and recording matching results into an ordered list;
if not, execution ends.
7. The address library data processing method based on text similarity calculation according to claim 6, further comprising:
acquiring a sub-address name with a maximum matching score in the ordered list and position information corresponding to the sub-address name;
judging whether the maximum matching score is zero or not;
if the maximum matching score is not zero, obtaining a target address according to the position information of the sub-address name of the maximum matching score;
and saving the target address to a database.
8. The address library data processing method based on text similarity calculation according to claim 7, further comprising:
acquiring the sub-address name of the maximum matching score in the ordered list and the position information corresponding to the sub-address name;
judging whether the maximum matching score is zero or not;
if the match score maximum is zero, the addressing ends.
9. The method according to claim 2, wherein the parent address data includes provincial address data, city address data and county address data in the mapping data table.
10. A control system, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to execute the executable instructions to implement the address library data processing method based on text similarity calculation according to any one of claims 1 to 9.
CN202111184456.4A 2021-10-12 2021-10-12 Address database data processing method and control system based on text similarity calculation Active CN113836357B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111184456.4A CN113836357B (en) 2021-10-12 2021-10-12 Address database data processing method and control system based on text similarity calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111184456.4A CN113836357B (en) 2021-10-12 2021-10-12 Address database data processing method and control system based on text similarity calculation

Publications (2)

Publication Number Publication Date
CN113836357A CN113836357A (en) 2021-12-24
CN113836357B true CN113836357B (en) 2022-09-16

Family

ID=78968595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111184456.4A Active CN113836357B (en) 2021-10-12 2021-10-12 Address database data processing method and control system based on text similarity calculation

Country Status (1)

Country Link
CN (1) CN113836357B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101122629B1 (en) * 2011-11-18 2012-03-09 김춘기 Method for creation of xml document using data converting of database
CN105224622A (en) * 2015-09-22 2016-01-06 中国搜索信息科技股份有限公司 The place name address extraction of Internet and standardized method
CN106096024A (en) * 2016-06-24 2016-11-09 北京京东尚科信息技术有限公司 The appraisal procedure of address similarity and apparatus for evaluating
CN112347222A (en) * 2020-10-22 2021-02-09 中科曙光南京研究院有限公司 Method and system for converting non-standard address into standard address based on knowledge base reasoning
CN112988755A (en) * 2021-04-14 2021-06-18 北京商越网络科技有限公司 Automatic value selection method, device and equipment for associated data

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678708B (en) * 2013-12-30 2017-01-18 小米科技有限责任公司 Method and device for recognizing preset addresses
CN107239442A (en) * 2017-05-09 2017-10-10 北京京东金融科技控股有限公司 A kind of method and apparatus of calculating address similarity
CN110147418B (en) * 2019-04-18 2022-04-29 厦门市美亚柏科信息股份有限公司 Method and system for judging whether address is standardized or not and address is standardized
CN111966766A (en) * 2020-02-18 2020-11-20 上海寻梦信息技术有限公司 Address information detection method, system, electronic device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on technical methods for fusing and updating multi-source place-name and address data; Ma Chunlin (马春林); 《经纬天地》; 20200428 (No. 02); pp. 23-26 *

Also Published As

Publication number Publication date
CN113836357A (en) 2021-12-24

Similar Documents

Publication Publication Date Title
CN106126630B (en) A kind of collection of business object, searching method and device
CN105027115B (en) Inquiry to document and index
CN108846133B (en) Block chain storage structure based on B-M tree, B-M tree establishment algorithm and search algorithm
CN102841904B (en) A kind of searching method and equipment
US10657541B2 (en) Determining and using brand information in electronic commerce
CN104281664B (en) Distributed figure computing system data segmentation method and system
CN111522989B (en) Method, computing device, and computer storage medium for image retrieval
CN111639253B (en) Data weight judging method, device, equipment and storage medium
US9342812B2 (en) Taxonomy based database partitioning
CN112347377A (en) IP address field searching method, service scheduling method, device and electronic equipment
CN111680489A (en) Target text matching method and device, storage medium and electronic equipment
CN112435087A (en) Part commodity searching method, device, equipment and storage medium
CN113836357B (en) Address database data processing method and control system based on text similarity calculation
CN110490748A (en) Item recommendation method and device based on order
US8463799B2 (en) System and method for consolidating search engine results
JPWO2007004521A1 (en) Marker specifying device and marker specifying method
CN108874873A (en) Data query method, apparatus, storage medium and processor
CN105159921A (en) Method and apparatus for de-duplicating point-of-interest (POI) data in map
JP4787597B2 (en) Similar product data search device and search method
CN110221778A (en) Processing method, system, storage medium and the electronic equipment of hotel's data
CN112232903B (en) Business object display method and device
CN110188274B (en) Search error correction method and device
CN112035432B (en) Data replacement migration method and device and computer equipment
US11734242B1 (en) Architecture for resolution of inconsistent item identifiers in a global catalog
CN111506756B (en) Method and system for searching similar pictures, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant