CN112835922A - Address division classification method, system, device and storage medium - Google Patents


Info

Publication number
CN112835922A
Authority
CN
China
Legal status
Granted
Application number
CN202110126046.8A
Other languages
Chinese (zh)
Other versions
CN112835922B (en)
Inventor
刘成亮
周筠
Current Assignee
Shanghai Xunmeng Information Technology Co Ltd
Original Assignee
Shanghai Xunmeng Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Xunmeng Information Technology Co Ltd
Priority to CN202110126046.8A
Priority claimed from CN202110126046.8A
Publication of CN112835922A
Application granted
Publication of CN112835922B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
              • G06F16/24 Querying
                • G06F16/242 Query formulation
                • G06F16/245 Query processing
                  • G06F16/2457 Query processing with adaptation to user needs
                    • G06F16/24575 Query processing with adaptation to user needs using context
              • G06F16/29 Geographical information databases
        • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q10/00 Administration; Management
            • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
              • G06Q10/083 Shipping
                • G06Q10/0838 Historical data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Computational Linguistics (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Mathematical Physics (AREA)
  • Remote Sensing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an address division classification method, system, device and storage medium. The method comprises: acquiring address data input by a user; determining, according to a preset information extraction rule, the address elements to be extracted from the address data, and extracting an input address text from the address data; extracting an address feature vector from the input address text and inputting it into a division prediction model, the division prediction model being configured to classify divisions based on the address feature vector; and acquiring the division classification result output by the division prediction model. By converting division classification into text classification, the method and system predict administrative divisions accurately, so that missing information in a delivery address can be filled in or partially incorrect information corrected.

Description

Address division classification method, system, device and storage medium
Technical Field
The present invention relates to the field of logistics data processing and, in particular, to a method, system, device and storage medium for classifying the administrative divisions of addresses.
Background
In logistics scenarios, the delivery addresses that users fill in exhibit the following problems: the administrative divisions do not match the detailed address; the address contains two sets of administrative divisions, i.e., the user selects one set from a drop-down control and writes another set in the detailed address, and the two sets may be inconsistent; and the administrative division is missing, for example when the user selects "other districts" at the county level.
Errors in, or absence of, the administrative divisions of a user's delivery address affect the calculation of the logistics first-segment sorting code, so the parcel cannot be routed normally, the user does not receive it in time, and ultimately the user's shopping experience suffers. Other logistics scenarios also depend on the administrative division information of user addresses: for example, express companies price shipping according to the administrative division of the delivery address, and e-commerce and express companies use administrative divisions to define the delivery coverage of their warehouses. In these scenarios, a wrong or missing administrative division in the delivery address prevents the corresponding service from operating normally.
Disclosure of Invention
The present invention aims to provide an address division classification method, system, device and storage medium that convert division classification into a text classification problem and predict administrative divisions accurately, so that missing information in a delivery address can be filled in or partially incorrect information corrected.
An embodiment of the invention provides an address division classification method, comprising the following steps:
acquiring address data input by a user;
determining, according to a preset information extraction rule, the address elements to be extracted from the address data, and extracting an input address text from the address data;
extracting an address feature vector from the input address text and inputting it into a division prediction model, the division prediction model being configured to classify divisions based on the address feature vector;
and acquiring the division classification result output by the division prediction model.
In some embodiments, acquiring the address data input by the user comprises:
providing the user with an address input page that includes a division selection part and a detailed address input part;
acquiring the division information selected by the user in the division selection part, and acquiring the detailed address information entered by the user in the detailed address input part.
In some embodiments, the information extraction rule includes an extraction rule for each address element in the address data;
determining, according to the preset information extraction rule, the address elements to be extracted from the address data and extracting the input address text comprises the following steps:
splitting the address data into a plurality of address elements and determining the category of each address element;
determining whether each address element needs to be extracted according to a preset association relationship between the category of the address element and the division classification result;
and extracting the address elements determined to be extracted from the address data to obtain the input address text.
In some embodiments, the trained division prediction model comprises a plurality of division prediction models, each corresponding to a division type of a first level, and the division classification result is a division type of a second level, the first level being higher than the second level.
In some embodiments, the first-level division type is a prefecture-level administrative division, and the second-level division type is a county-level administrative division.
In some embodiments, inputting into the division prediction model comprises the following steps:
determining the first-level division type in the address data;
selecting the corresponding division prediction model according to the determined first-level division type;
and inputting the address data into the corresponding division prediction model.
In some embodiments, the method further comprises training the division prediction model through the following steps:
for each first-level division type, acquiring the corresponding sample address texts;
adding a second-level classification label to each sample address text and then adding the labeled texts to the training set of the corresponding first-level division type;
and training the corresponding division prediction model with the training set.
In some embodiments, training the corresponding division prediction model comprises the following steps:
in each round of training, calculating an evaluation index from the division classification results output by the division prediction model and the corresponding classification labels;
and optimizing the training of the division prediction model based on the evaluation index.
In some embodiments, the division prediction model is a support vector machine model, and the evaluation indices include precision and hit rate.
In some embodiments, acquiring the corresponding sample address texts for each first-level division type comprises:
determining the second-level division types included in each first-level division type;
and collecting a plurality of sample address texts corresponding to each first-level division type, the sample address texts including sample address texts corresponding to each of its second-level division types.
In some embodiments, after collecting the plurality of sample address texts corresponding to each first-level division type, the method further comprises the following steps:
modifying at least part of the text of the collected sample address texts according to a preset address change rule to obtain changed address information;
and adding a second-level classification label to the changed address information and then adding it to the corresponding training set.
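The patent does not spell out the "preset address change rule". The Python sketch below assumes two hypothetical rules (abbreviating "Road" to "Rd." and dropping the house number) purely to show the mechanics: each changed copy keeps the county-level label of the sample it came from.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical address change rules; the patent leaves the actual rules unspecified.
CHANGE_RULES: List[Callable[[str], str]] = [
    lambda s: s.replace(" Road", " Rd."),              # abbreviate the road suffix
    lambda s: re.sub(r"\s*No\. \d+(?:-\d+)?", "", s),  # drop the house number
]


def augment(samples: List[Tuple[str, str]],
            rules: List[Callable[[str], str]] = CHANGE_RULES) -> List[Tuple[str, str]]:
    """Return the original (text, county_label) samples plus changed copies with the same label."""
    augmented = list(samples)
    for text, label in samples:
        for rule in rules:
            changed = rule(text)
            if changed != text:  # skip rules that do not apply to this text
                augmented.append((changed, label))
    return augmented


result = augment([("Loushanguan Road No. 523 Jinhongqiao International Center", "District A")])
for text, label in result:
    print(label, "|", text)
```

"District A" is a placeholder label, not real data. Rules that leave a text unchanged are skipped so the training set does not fill up with exact duplicates.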
In some embodiments, acquiring the corresponding sample address texts for each first-level division type comprises:
for each first-level division type, collecting sample address data, extracting the address elements in the sample address data, and determining the category of each address element;
determining the association relationship between the category of each address element and the second-level classification label;
and determining, according to the association relationship, the address elements to be extracted from the sample address data, and extracting the sample address text from the sample address data.
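The patent does not say how the association relationship is measured from samples. One plausible measure, assumed here and not taken from the source, is label purity: for each distinct element value, count how often its most frequent county-level label occurs, then divide the total by the number of occurrences of that category. A score of 1.0 means every value of the category always maps to a single label.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Element = Tuple[str, str]  # (category, text)


def category_association(samples: List[Tuple[List[Element], str]]) -> Dict[str, float]:
    """Label-purity score per element category (an assumed association measure)."""
    # counts[category][element text][county label] -> occurrences
    counts: Dict[str, Dict[str, Counter]] = defaultdict(lambda: defaultdict(Counter))
    for elements, label in samples:
        for category, text in elements:
            counts[category][text][label] += 1
    scores: Dict[str, float] = {}
    for category, by_text in counts.items():
        total = sum(sum(c.values()) for c in by_text.values())
        majority = sum(max(c.values()) for c in by_text.values())
        scores[category] = majority / total
    return scores


samples = [
    ([("provincial_division", "Shanghai"), ("road", "Road X")], "District A"),
    ([("provincial_division", "Shanghai"), ("road", "Road Y")], "District B"),
    ([("provincial_division", "Shanghai"), ("road", "Road X")], "District A"),
    ([("provincial_division", "Shanghai"), ("road", "Road Y")], "District B"),
]
scores = category_association(samples)
print(scores)  # {'provincial_division': 0.5, 'road': 1.0}
```

With these placeholder samples the road category scores higher than the provincial division, which matches the intuition in the description. Categories with many rare values score high trivially, so a practical version would likely require a minimum count per value.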
In some embodiments, after the division classification result output by the division prediction model is obtained as the second-level division type corresponding to the address data, the method further comprises the following steps:
determining whether the address data input by the user contains division information at the same level as the division classification result;
if not, adding the division classification result to the address data.
In some embodiments, after determining whether the address data input by the user contains division information at the same level as the division classification result, the method further comprises the following steps:
if so, comparing whether the division information at that level in the address data is consistent with the division classification result;
and if they are inconsistent, replacing the division information at that level in the address data with the division classification result.
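The fill-or-replace logic above can be sketched in Python as follows. The function name `apply_division_result` and the returned action strings are hypothetical; the divisions are assumed to be held in a dictionary keyed by level, with `None` meaning "not provided".

```python
from typing import Dict, Optional, Tuple


def apply_division_result(
    divisions: Dict[str, Optional[str]], level: str, predicted: str
) -> Tuple[Dict[str, Optional[str]], str]:
    """Fill in or correct the division at `level` using the model's result."""
    corrected = dict(divisions)  # leave the caller's data untouched
    current = corrected.get(level)
    if not current:
        corrected[level] = predicted  # no division at this level: add the result
        return corrected, "filled"
    if current != predicted:
        corrected[level] = predicted  # inconsistent: replace with the result
        return corrected, "replaced"
    return corrected, "unchanged"


print(apply_division_result({"prefecture": "Qingdao", "county": None}, "county", "District A"))
# ({'prefecture': 'Qingdao', 'county': 'District A'}, 'filled')
```

Returning the action alongside the corrected data makes it easy to log how often addresses were completed versus overwritten.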
An embodiment of the present invention further provides an address division classification system for implementing the above address division classification method, the system comprising:
a data acquisition module, configured to acquire the address data input by a user;
a text extraction module, configured to determine, according to a preset information extraction rule, the address elements to be extracted from the address data and to extract an input address text from the address data;
a model input module, configured to extract an address feature vector from the input address text and input it into a division prediction model, the division prediction model being configured to classify divisions based on the address feature vector;
and a division classification module, configured to acquire the division classification result output by the division prediction model.
An embodiment of the present invention further provides an address division classification device, comprising:
a processor;
a memory storing instructions executable by the processor;
wherein the processor is configured to perform the steps of the address division classification method by executing the executable instructions.
An embodiment of the present invention further provides a computer-readable storage medium for storing a program which, when executed by a processor, implements the steps of the address division classification method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
The address division classification method, system, device and storage medium of the invention have the following beneficial effects:
the method converts division classification into a text classification problem: it extracts an input address text from the address data, extracts an address feature vector from that text, and inputs the vector into the trained division prediction model. Administrative divisions are thereby predicted accurately and quickly, so that missing information in a delivery address can be filled in or partially incorrect information corrected. This can effectively improve the fulfillment service quality of logistics companies and e-commerce platforms and improve the user experience.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings.
FIG. 1 is a flowchart of an address division classification method according to an embodiment of the present invention;
FIG. 2 is a flowchart of extracting the input address text according to an embodiment of the present invention;
FIG. 3 is a flowchart of inputting features into the division prediction model according to an embodiment of the present invention;
FIG. 4 is a flowchart of training the division prediction model according to an embodiment of the present invention;
FIG. 5 is a flowchart of correcting the address data according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an address division classification system according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an address division classification device according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
As shown in FIG. 1, an embodiment of the present invention provides an address division classification method, comprising the following steps:
S100: acquiring address data input by a user;
S200: determining, according to a preset information extraction rule, the address elements to be extracted from the address data, and extracting an input address text from the address data; that is, the input address text is obtained by combining the address elements to be extracted;
S300: extracting an address feature vector from the input address text and inputting it into a division prediction model, the division prediction model being configured to classify divisions based on the address feature vector;
S400: acquiring the division classification result output by the division prediction model.
In this address division classification method, step S100 first acquires the address data input by the user; step S200 extracts an input address text from the address data, which converts division classification into a text classification problem; step S300 extracts the address feature vector of that text and inputs it into the trained division prediction model; and step S400 obtains the division classification result. Administrative divisions are thus predicted accurately and quickly, and missing information in the delivery address is filled in or partially incorrect information corrected.
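The four steps can be wired together as in the following Python sketch. It is only an illustration under stated assumptions: the names `AddressData` and `classify_division` are hypothetical, and the text extractor, embedding function and predictor are passed in as plain callables because the patent does not fix their implementations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class AddressData:
    """Hypothetical container for S100: selected divisions plus the detailed address."""
    divisions: Dict[str, Optional[str]]  # e.g. {"province": "Shanghai", "county": None}
    detail: str


def classify_division(
    address: AddressData,
    extract_text: Callable[[AddressData], str],  # S200: information extraction rule
    embed: Callable[[str], List[float]],         # S300: text -> address feature vector
    predict: Callable[[List[float]], str],       # S300: trained division prediction model
) -> str:
    """Run S200 to S400 on one address and return the division classification result."""
    input_text = extract_text(address)
    feature_vector = embed(input_text)
    return predict(feature_vector)  # S400: the model output is the result


# Toy stand-ins, purely illustrative; "District A" is a placeholder label.
demo = AddressData({"province": "Shanghai", "county": None},
                   "Loushanguan Road No. 523-533 Jinhongqiao International Center")
result = classify_division(
    demo,
    extract_text=lambda a: a.detail,
    embed=lambda text: [1.0 if "Loushanguan Road" in text else 0.0],
    predict=lambda vec: "District A" if vec[0] > 0 else "unknown",
)
print(result)  # District A
```

Injecting the three stages as callables keeps the sketch neutral about which extractor, embedding model or classifier is used; the later steps can be swapped in independently.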
The address division classification method can be applied to a server of a logistics platform, which performs division classification on address data after acquiring it. In an alternative embodiment, the method is applied to a server of an e-commerce platform, which performs division classification after acquiring address data and provides the completed or corrected address data to a logistics platform. In yet another alternative embodiment, the method is applied to a standalone server, which acquires address data from a logistics or e-commerce platform, performs division classification, and provides the completed or corrected address data back to that platform. In other alternative embodiments, the method may also be applied to other types of devices, such as laptops, desktop computers and user terminals.
In this embodiment, step S100 (acquiring the address data input by the user) comprises:
providing the user with an address input page that includes a division selection part and a detailed address input part. The division selection part may provide drop-down boxes from which the user selects the corresponding division information, and the detailed address input part may provide an input box in which the user types free text. Here the detailed address refers to the part of the address other than the division information, such as the road name, building name and house number;
acquiring the division information selected by the user in the division selection part, and acquiring the detailed address information entered by the user in the detailed address input part.
This is only one way of obtaining address data. In other alternative embodiments of step S100, the address input page may provide only a single address input box in which the user enters all of the address information; alternatively, a photographed image of the user's business card or another text image may be collected and character recognition performed on it to obtain the address information. All of these fall within the protection scope of the present invention.
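A minimal Python sketch of turning the submitted address input page into address data. The form field names (`province`, `prefecture`, `county`, `detail`) are hypothetical, and treating an empty value or the "other districts" placeholder mentioned in the background as a missing division is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical form field names for the division selection part.
LEVELS = ("province", "prefecture", "county")
# Placeholder selections treated as "division not provided" (assumption).
MISSING_MARKERS = {"", "other districts"}


@dataclass
class AddressData:
    divisions: Dict[str, Optional[str]] = field(default_factory=dict)
    detail: str = ""


def read_address_form(form: Dict[str, str]) -> AddressData:
    """Collect the drop-down selections and the free-text detailed address."""
    divisions: Dict[str, Optional[str]] = {}
    for level in LEVELS:
        value = form.get(level, "").strip()
        # Missing or placeholder selections become None so later steps can fill them.
        divisions[level] = None if value.lower() in MISSING_MARKERS else value
    return AddressData(divisions=divisions, detail=form.get("detail", "").strip())


data = read_address_form({
    "province": "Shanghai",
    "county": "Other districts",
    "detail": " Loushanguan Road No. 523-533 ",
})
print(data.divisions)  # {'province': 'Shanghai', 'prefecture': None, 'county': None}
print(data.detail)     # Loushanguan Road No. 523-533
```

Normalizing placeholders to `None` at input time lets the later fill-or-replace step treat "never selected" and "other districts" the same way.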
The address division classification method can be applied to classifying administrative divisions at various levels. Administrative divisions are generally organized into three levels: first-level provincial divisions, second-level prefecture-level divisions and third-level county-level divisions. Apart from these three levels of division, the remaining content of the address data is the detailed address. The three levels are as follows:
First-level provincial divisions include provinces, autonomous regions, municipalities directly under the central government and special administrative regions.
Second-level prefecture-level divisions include prefecture-level cities, prefectures, autonomous prefectures and leagues.
Third-level county-level divisions include districts, counties, autonomous counties, banners, autonomous banners, forestry districts and special districts.
This embodiment is described taking as an example the use of the division prediction model to predict the third-level county-level division. It is to be understood that the present invention is not limited thereto; in other embodiments it may be applied to predicting divisions of other levels.
In this embodiment, the information extraction rule includes the categories of address elements to be extracted. Step S200 (determining, according to the preset information extraction rule, the address elements to be extracted from the address data and extracting the input address text) therefore amounts to extracting from the address data the text of the address-element categories predefined by the information extraction rule and combining these texts into the input address text.
Further, the information extraction rule includes an extraction rule for each address element in the address data. As shown in FIG. 2, step S200 comprises the following steps:
S210: splitting the address data into a plurality of address elements and determining the category of each address element;
for example, "Shanghai Loushanguan Road No. 523-533 Jinhongqiao International Center" is split into the address elements "Shanghai", "Loushanguan Road", "No. 523-533" and "Jinhongqiao International Center", whose categories are provincial division, road, house number and office building, respectively;
S220: determining whether each address element needs to be extracted according to the preset association relationship between the category of each address element and the division classification result;
here, the association relationship indicates how strongly the category of an address element influences the division classification result: the stronger the association, the greater the influence. The preset association relationship between each address-element category and the division classification result can be defined in advance through analysis of the geographic characteristics of addresses;
the information extraction rule may further specify that address elements with a strong association are selected. In application, the association relationship may include an association degree value between each address-element category and the division classification result, and the information extraction rule takes the address elements whose association degree value exceeds a preset threshold as the elements to be extracted. In another embodiment, the association between address-element categories and the third-level county-level division may simply be graded into three types: strong association, weak association and no association. When extracting address elements, strongly associated elements are extracted first, and weakly associated elements may be considered for extraction when the strongly associated elements carry insufficient information;
for example, if the preset association degree value between the provincial division and the third-level county-level division is low, provincial address elements are not extracted; if the association degree values between categories such as roads, house numbers and office buildings and the third-level county-level division are high, address elements of these categories are extracted;
as another example, chain shopping malls may be spread across several areas, so they have little influence on the classification of the third-level county-level division (i.e., a small association degree value), and address elements of the chain-mall category can be treated as elements not to be extracted. Landmark buildings that identify a particular area, by contrast, strongly influence the classification of the third-level county-level division (i.e., a large association degree value), so address elements of the landmark-building category can be treated as elements to be extracted;
S230: extracting the address elements determined to be extracted from the address data to obtain the input address text. In this way, guided by the geographic characteristics of addresses, the input address text is built from the parts of the address data that most strongly influence the third-level county-level division.
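Steps S210 to S230 can be sketched in Python as follows. This is a toy under explicit assumptions: the regular-expression rules only cover the romanized example above (a real system would need a proper Chinese address parser, which the patent does not describe), the association degree values in `ASSOCIATION` are invented for illustration, and treating "fewer than `min_elements` strong elements" as "insufficient information" is a hypothetical criterion.

```python
import re
from typing import Dict, List, Tuple

# S210 (assumption): toy rules for the romanized example; alternatives are tried in order.
ELEMENT_PATTERN = re.compile(
    r"(?P<provincial_division>Shanghai)"
    r"|(?P<road>[A-Z][a-z]+ Road)"
    r"|(?P<house_number>No\. \d+(?:-\d+)?)"
    r"|(?P<office_building>[A-Z][A-Za-z ]*? (?:Center|Tower|Plaza))"
)

# S220 (assumption): illustrative association degree values; the patent gives none.
ASSOCIATION: Dict[str, float] = {
    "provincial_division": 0.1,
    "road": 0.9,
    "house_number": 0.7,
    "office_building": 0.8,
    "landmark_building": 0.9,
    "chain_mall": 0.2,
}


def split_elements(address: str) -> List[Tuple[str, str]]:
    """S210: split an address into (category, text) elements."""
    return [(m.lastgroup, m.group()) for m in ELEMENT_PATTERN.finditer(address)]


def extract_input_text(elements: List[Tuple[str, str]], strong: float = 0.5,
                       weak: float = 0.3, min_elements: int = 2) -> str:
    """S220/S230: keep strongly associated elements, falling back to weak ones."""
    kept = [text for cat, text in elements if ASSOCIATION.get(cat, 0.0) > strong]
    if len(kept) < min_elements:  # hypothetical "insufficient information" test
        kept = [text for cat, text in elements if ASSOCIATION.get(cat, 0.0) > weak]
    return " ".join(kept)  # combine the kept elements in their original order


elements = split_elements("Shanghai Loushanguan Road No. 523-533 Jinhongqiao International Center")
print(elements)
print(extract_input_text(elements))  # Loushanguan Road No. 523-533 Jinhongqiao International Center
```

The provincial element is dropped because its assumed association degree is below both thresholds, mirroring the example in the description.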
In this embodiment, the trained division prediction model comprises a plurality of division prediction models. In one embodiment, a single overall division prediction model is used instead, i.e., the third-level county-level divisions under all second-level divisions are predicted by one model. In another embodiment, the division prediction model comprises a plurality of models corresponding to second-level divisions, with a one-to-one or one-to-many relationship between models and second-level divisions. In still another embodiment, the division prediction model comprises a plurality of models corresponding to first-level divisions, with a one-to-one or one-to-many relationship between models and first-level divisions. In other alternative embodiments, when the division prediction model is used to predict second-level divisions, it may likewise comprise a plurality of models corresponding to first-level divisions.
The following describes an implementation of the address division classification method taking as an example a one-to-one correspondence between division prediction models and second-level prefecture-level divisions: for example, Qingdao has its own division prediction model that predicts the third-level county-level division of address data within Qingdao, and Shenzhen and Hangzhou each have their own model as well. It is to be understood that the invention is not limited thereto.
In this embodiment, in step S300 the address feature vector of the input address text may be extracted by mapping the text to word vectors, for example with word2vec or a similar model. As shown in FIG. 3, inputting into the division prediction model in step S300 comprises the following steps:
S310: determining the first-level division type in the address data, where the first-level division type corresponds to the second-level (prefecture-level) administrative division;
S320: selecting the corresponding division prediction model according to the determined first-level division type, i.e., selecting the model corresponding to the prefecture-level division;
S330: inputting the address data into the corresponding division prediction model.
In this embodiment, the division prediction model is split into a plurality of models, so the number of output types of each model (i.e., third-level division types) is not too large. Moreover, the second-level division information in address data is generally more accurate. Therefore, when classifying the third-level division, the second-level division is first determined from the address data and the data is then input into the corresponding division prediction model. This eliminates interference from data belonging to other second-level divisions, greatly reduces the amount of data to be analyzed, and improves both the accuracy and the speed of model prediction. In addition, each division prediction model is more targeted during training, since only sample address data within its own second-level division needs to be collected.
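Step S300 as a whole can be sketched in Python as follows. The description suggests word2vec or a similar model for the word vectors but does not say how they are pooled into one address feature vector; this sketch assumes a simple mean of the known tokens' vectors. The embedding table, the per-city model dictionary and the function names are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def average_word_vectors(tokens: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Address feature vector as the mean of the known tokens' word vectors (assumption)."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim  # no known tokens: fall back to a zero vector
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def route_and_predict(
    prefecture: str,
    tokens: Sequence[str],
    models: Dict[str, Callable[[Vector], str]],
    embeddings: Dict[str, Vector],
    dim: int,
) -> str:
    """S310 to S330: pick the model for the prefecture-level division, then predict."""
    model = models.get(prefecture)
    if model is None:
        raise KeyError(f"no division prediction model for {prefecture!r}")
    return model(average_word_vectors(tokens, embeddings, dim))


# Toy 2-dimensional embeddings and a toy Qingdao model with placeholder labels.
embeddings = {"road": [1.0, 0.0], "center": [0.0, 1.0]}
models = {"Qingdao": lambda v: "District A" if v[0] >= v[1] else "District B"}
print(average_word_vectors(["road", "center", "unknown"], embeddings, 2))  # [0.5, 0.5]
print(route_and_predict("Qingdao", ["road"], models, embeddings, 2))      # District A
```

Raising an error for an unknown prefecture is one design choice; a deployment might instead fall back to a shared overall model, which the description lists as another embodiment.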
As shown in fig. 4, in this embodiment the address division classification method further includes training the division prediction models through the following steps:
S510: for each first-level division type, obtaining corresponding sample address texts, that is, obtaining sample address texts for each prefecture-level division;
S520: adding a second-level classification label to each sample address text, that is, labelling it with its county-level division, and then adding it to the training set of the corresponding first-level division type;
S530: training the corresponding division prediction model with that training set.
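Steps S510 to S530 can be illustrated with the following minimal sketch. It groups labelled samples into one training set per first-level division type and trains one model per set. The embodiment uses an SVM over word2vec-style features. To stay dependency-free, this sketch substitutes a hypothetical nearest-centroid classifier over character bigrams (`CentroidModel`, `char_bigrams`). It therefore shows the per-city structure of training, not the model described in the text.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str, str]  # (city, county label, sample address text)


def char_bigrams(text: str) -> Counter:
    """Bag of character bigrams: a simple stand-in for word2vec features."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidModel:
    """Nearest-centroid classifier used here in place of the SVM (hypothetical)."""

    def __init__(self) -> None:
        self.centroids: Dict[str, Counter] = {}

    def fit(self, data: List[Tuple[str, str]]) -> "CentroidModel":
        for text, label in data:
            # summed bigram counts point in the same direction as the class mean
            self.centroids.setdefault(label, Counter()).update(char_bigrams(text))
        return self

    def predict(self, text: str) -> str:
        feats = char_bigrams(text)
        return max(self.centroids, key=lambda label: cosine(feats, self.centroids[label]))


def train_per_city(samples: List[Sample]) -> Dict[str, CentroidModel]:
    # S510/S520: one training set per first-level division, each text labelled with its county
    training_sets: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for city, county, text in samples:
        training_sets[city].append((text, county))
    # S530: train a separate division prediction model on each training set
    return {city: CentroidModel().fit(data) for city, data in training_sets.items()}
```

Swapping in a real SVM would change only the model class. The grouping by city and the per-city training loop stay the same.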
In this embodiment, training the corresponding division prediction model in step S530 includes training it iteratively, specifically through the following steps:
in each training round, calculating an evaluation index from the division classification results output by the division prediction model and the corresponding classification labels;
and optimizing the division prediction model based on the evaluation index.
In this embodiment, the division prediction model is a support vector machine (SVM) model, and the evaluation indexes include a precision rate and a hit rate. The precision rate for a given category equals the number of samples correctly predicted as that category divided by the total number of samples predicted as that category (the correct predictions plus the samples misjudged as that category). The hit rate is the accuracy: the number of correctly predicted samples divided by the total number of samples.
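The two evaluation indexes defined above can be written out as follows. This sketch assumes labels are plain county names, and the function names are hypothetical. Returning 0.0 when no sample is predicted as the category is a convention chosen here, not one stated in the text.

```python
from typing import Sequence


def precision(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """Samples correctly predicted as `label` / all samples predicted as `label`."""
    predicted_as_label = [t for t, p in zip(y_true, y_pred) if p == label]
    if not predicted_as_label:
        return 0.0  # convention chosen here when nothing was predicted as `label`
    return sum(t == label for t in predicted_as_label) / len(predicted_as_label)


def hit_rate(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Hit rate (accuracy): correctly predicted samples / total samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Consider `y_true = ["Shinan", "Shinan", "Laoshan", "Licang"]` and `y_pred = ["Shinan", "Shinan", "Shinan", "Licang"]`. The precision for Shinan is 2/3, because one Laoshan address was misjudged as Shinan. The hit rate is 3/4.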
In other alternative embodiments, the division prediction model may be another type of classification model, such as a deep-learning convolutional neural network or a decision tree. The evaluation index may likewise be another type of index, such as a loss function. All of these fall within the protection scope of the present invention.
In this embodiment, obtaining the corresponding sample address texts for each first-level division type in step S510 includes the following steps:
for each first-level division type, determining each second-level division type it contains; for example, for the prefecture-level division Qingdao, determining the names of all counties and districts under it, each name corresponding to one county-level division;
and acquiring a plurality of sample address texts for each first-level division type, including sample address texts for each of its second-level division types; that is, at least three sample address texts are provided for every county-level division so that the model can learn to classify county-level divisions.
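One way to check that the samples collected in step S510 cover every county-level division is sketched below. The function name and data layout are hypothetical. The default minimum of three samples per county follows the figure given above.

```python
from collections import Counter
from typing import Dict, List, Tuple


def under_sampled(
    counties_by_city: Dict[str, List[str]],
    samples: List[Tuple[str, str, str]],  # (city, county, sample address text)
    min_per_county: int = 3,
) -> List[Tuple[str, str, int]]:
    """Return (city, county, count) for every county with too few sample texts."""
    counts = Counter((city, county) for city, county, _ in samples)
    gaps = []
    for city, counties in counties_by_city.items():
        for county in counties:
            n = counts[(city, county)]
            if n < min_per_county:
                gaps.append((city, county, n))
    return gaps
```

Any county this function returns would need more sample address texts before its city's model is trained.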
In this embodiment, obtaining the corresponding sample address texts for each first-level division type in step S510 may also include the following steps:
for each first-level division type, collecting sample address data, extracting the address elements from the sample address data, and determining the category of each address element;
determining the association relationship between each address element category and the second-level classification label;
Specifically, this association relationship is the degree to which the address element category influences the county-level classification. It may be expressed as an association degree value between the category and the county-level division. Alternatively, it may be expressed more simply as one of several grades: strong association, weak association, or no association.
and determining, according to the association relationship, which address elements need to be extracted from the sample address data, and extracting a sample address text from the sample address data.
For example, when the association relationship is an association degree value, address elements whose value exceeds a preset association degree threshold are extracted. When the association relationship is graded as strong, weak, or no association, strongly associated address elements are extracted first. If the strongly associated elements are insufficient, weakly associated ones may also be extracted. The address elements selected when extracting sample address texts are the same as those selected when extracting input address texts after the trained division prediction model is put into practical use.
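The two forms of association relationship described above can be sketched as follows. Everything concrete in this sketch is hypothetical, because the patent gives no values: the category names, the association degree values, the grades, the threshold, and the rule that fewer than `min_strong` strongly associated elements counts as "insufficient". As the text requires, the same selection would be reused for input address texts at prediction time.

```python
from typing import Dict, List, Tuple

Element = Tuple[str, str]  # (category, text), e.g. ("road", "Miaoling Road")

# Hypothetical association degree values between element categories and the
# county-level division; the patent does not specify actual values.
ASSOCIATION_DEGREE: Dict[str, float] = {
    "district": 1.0, "town": 0.9, "road": 0.8, "community": 0.7,
    "building": 0.4, "house_number": 0.1, "room": 0.0,
}
# Hypothetical graded version of the same relationship (missing = no association).
ASSOCIATION_GRADE: Dict[str, str] = {
    "district": "strong", "town": "strong", "road": "strong",
    "community": "weak", "building": "weak",
}


def select_by_degree(elements: List[Element], threshold: float = 0.6) -> List[str]:
    """Keep elements whose category's association degree exceeds the threshold."""
    return [text for cat, text in elements if ASSOCIATION_DEGREE.get(cat, 0.0) > threshold]


def select_by_grade(elements: List[Element], min_strong: int = 2) -> List[str]:
    """Take strongly associated elements; add weak ones only if too few are strong."""
    strong = [text for cat, text in elements if ASSOCIATION_GRADE.get(cat) == "strong"]
    if len(strong) >= min_strong:
        return strong
    # keep the original element order when falling back to weak associations
    return [text for cat, text in elements if ASSOCIATION_GRADE.get(cat) in ("strong", "weak")]
```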
In this embodiment, after the plurality of sample address texts for each first-level division type are collected in step S510, sample enhancement may also be performed to improve the training effect of the division prediction model. Specifically, sample enhancement includes the following steps:
changing at least part of the text in the collected sample address texts based on a preset address change rule to obtain changed address information; for example, the house number in a sample address text is changed in a way that keeps the changed text within the same county-level division, and the changed text serves as an extended sample of the original;
and adding a second-level classification label to the changed address information and then adding it to the corresponding training set.
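The house-number example of a preset address change rule could look like the sketch below. The `No. <n>` pattern, the number range, and the seeded random generator are assumptions made for illustration. In practice the rule would also need to check that the new house number actually lies within the same county-level division. This sketch does not perform that check.

```python
import random
import re
from typing import List, Optional, Tuple

# Hypothetical address change rule: rewrite the house number ("No. <n>").
HOUSE_NUMBER = re.compile(r"No\. ?(\d+)")


def augment(
    text: str,
    label: str,
    n_variants: int = 3,
    max_number: int = 200,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Return labelled copies of `text` with a different house number."""
    rng = rng if rng is not None else random.Random(0)
    match = HOUSE_NUMBER.search(text)
    if match is None:
        return []  # the rule does not apply to this sample
    original = int(match.group(1))
    variants = []
    for _ in range(n_variants):
        new_number = rng.randint(1, max_number)
        if new_number == original:
            continue  # skip draws that would reproduce the original sample
        new_text = text[: match.start(1)] + str(new_number) + text[match.end(1):]
        # the changed text keeps the original county-level label
        variants.append((new_text, label))
    return variants
```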
As shown in fig. 5, in this embodiment, the division classification result output by the division prediction model is obtained as the second-level division type for the address data. After that, the method further includes correcting the address data, specifically through the following steps:
S610: judging whether the address data input by the user contains division information at the same level as the division classification result;
S620: if it does not, adding the division classification result to the address data; this covers cases where the user omitted the county-level division when entering the address, or selected "other regions" as the county-level division; here the division classification result completes the address data;
S630: if it does, comparing the same-level division information in the address data with the division classification result;
S640: if they are consistent, the county-level division information in the original address data is taken to be accurate, and the address data is not corrected;
S650: if they are inconsistent, replacing the same-level division information in the address data with the division classification result; for example, if the county-level division in the address data is district A and the division classification result is district B, district A is replaced with district B; here the division classification result corrects an error in the address data.
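Steps S610 to S650 reduce to a small decision function, sketched below. The sketch assumes that the county-level division is stored as a plain string. It also assumes that empty input or the "other regions" option means the division is missing. The function name and the returned action labels are hypothetical.

```python
from typing import Optional, Tuple

# Inputs treated as "no county-level division given" (S620); the
# "other regions" option is taken from the text, the rest is assumed.
MISSING = {None, "", "other regions"}


def correct_county(user_county: Optional[str], predicted: str) -> Tuple[str, str]:
    """Apply S610-S650 to one address; return (county to store, action taken)."""
    # S610/S620: no same-level division in the user's input -> complete it
    if user_county in MISSING:
        return predicted, "completed"
    # S630/S640: same-level division present and consistent -> keep it
    if user_county == predicted:
        return user_county, "unchanged"
    # S650: inconsistent -> replace it with the division classification result
    return predicted, "corrected"
```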
As shown in fig. 6, an embodiment of the present invention further provides an address division classification system for implementing the above address division classification method. The system includes:
a data acquisition module M100, configured to acquire address data input by a user;
a text extraction module M200, configured to determine, according to a preset information extraction rule, the address elements to be extracted from the address data, and to extract an input address text from the address data;
a model input module M300, configured to extract an address feature vector of the input address text and input it into a division prediction model, the division prediction model being configured to classify divisions based on the address feature vector;
and a division classification module M400, configured to obtain the division classification result output by the division prediction model.
The address division classification system first acquires the address data input by the user through the data acquisition module M100. The text extraction module M200 then extracts the input address text from the address data, which turns division classification into a text classification problem. The model input module M300 extracts the address feature vector of the text and inputs it into a trained division prediction model. The division classification module M400 then obtains the division classification result. In this way, administrative divisions are predicted and classified accurately and quickly, so that missing information in a delivery address can be filled in or incorrect information corrected.
The address division classification system can be deployed on a server of a logistics platform, which performs division classification after obtaining address data. In another alternative embodiment, the system may be deployed on a server of an e-commerce platform. That platform performs division classification after acquiring address data and provides the completed or corrected address data to the logistics platform. In yet another alternative embodiment, the system may be deployed on a standalone server. That server acquires address data from the logistics platform or e-commerce platform, performs division classification, and returns the completed or corrected address data to the platform. In other alternative embodiments, the system may also run on other types of devices, such as notebook computers, desktop computers, and user terminals.
In this embodiment, the data acquisition module M100 acquires the address data input by the user through the following steps. First, it provides the user with an address input page that includes a division selection part and a detailed address input part. The division selection part may provide drop-down boxes for the user to select division information. The detailed address input part may provide an input box for the user to type text. The detailed address is the part of the address other than the division information, such as the road name, building name, and house number. The module then acquires the division information selected by the user in the division selection part, and the detailed address information entered by the user in the detailed address input part.
In this embodiment, the text extraction module M200 determines the address elements to be extracted according to a preset information extraction rule and extracts the input address text. This includes the following: splitting the address data into a plurality of address elements and determining the category of each; determining whether each address element needs to be extracted, according to a preset association relationship between its category and the division classification result; and extracting the selected address elements from the address data to obtain the input address text. Guided by the geographic characteristics of addresses, this extracts from the address data the text that most influences the county-level division.
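The splitting and extraction performed by the text extraction module M200 might look like the following sketch for Chinese addresses. The suffix table, the regular-expression splitting, and the set of extracted categories are hypothetical illustrations of a preset information extraction rule. Real addresses need a more robust parser.

```python
import re
from typing import List, Tuple

# Hypothetical suffix table for Chinese addresses; "街道" (sub-district) is
# listed before "街" (street) so the regex alternation tries it first.
SUFFIX_CATEGORY: List[Tuple[str, str]] = [
    ("省", "province"), ("市", "city"), ("街道", "town"), ("镇", "town"),
    ("区", "district"), ("县", "district"), ("路", "road"), ("街", "road"),
    ("号", "house_number"), ("栋", "building"), ("幢", "building"), ("室", "room"),
]
# Shortest run of at least one character that ends in one of the suffixes.
ELEMENT = re.compile(".+?(?:" + "|".join(re.escape(s) for s, _ in SUFFIX_CATEGORY) + ")")
# Hypothetical extraction rule: categories assumed to be strongly associated.
EXTRACT = {"district", "town", "road"}


def split_elements(address: str) -> List[Tuple[str, str]]:
    """Split an address into (category, text) elements by their suffix."""
    elements = []
    for token in ELEMENT.findall(address):
        category = next(cat for suffix, cat in SUFFIX_CATEGORY if token.endswith(suffix))
        elements.append((category, token))
    return elements


def input_address_text(address: str) -> str:
    """Keep only the elements whose category is in the extraction rule."""
    return "".join(text for cat, text in split_elements(address) if cat in EXTRACT)
```

For "山东省青岛市市南区香港中路8号2栋301室", the split yields province, city, district, road, house number, building, and room elements. Only "市南区香港中路" is kept as the input address text.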
In this embodiment, the division prediction models correspond one-to-one to the prefecture-level divisions. Each division prediction model predicts the county-level division of address data within its prefecture-level division, although the invention is not limited to this. The model input module M300 may extract the address feature vector of the input address text by mapping text to word vectors, for example with word2vec or a similar model. The model input module M300 inputs the address feature vector into the division prediction model through the following steps: determining the first-level division type in the address data, which here is the prefecture-level division; selecting the corresponding division prediction model according to the determined first-level division type, that is, the model for that prefecture-level division; and inputting the address data into the selected division prediction model.
In this embodiment, the address division classification system further includes a model training module configured to train the division prediction models. Specifically, the model training module is configured to: obtain, for each first-level division type, the corresponding sample address texts, that is, sample address texts for each prefecture-level division; add a second-level classification label to each sample address text and then add it to the training set of the corresponding first-level division type; and train the corresponding division prediction model with that training set.
The model training module may train the division prediction models as described in steps S510 to S530, which is not repeated here.
Further, in this embodiment, the address division classification system may also include an address correction module configured to correct the address data. Specifically, correcting the address data includes the following steps. The module judges whether the address data input by the user contains division information at the same level as the division classification result. If it does not, the module adds the division classification result to the address data, so that the result completes the address data. If it does, the module compares the same-level division information in the address data with the division classification result. If they are consistent, the county-level division information in the original address data is treated as accurate and the address data is not corrected. If they are inconsistent, the module replaces the same-level division information in the address data with the division classification result.
An embodiment of the invention further provides an address division classification device, including a processor and a memory storing instructions executable by the processor. The processor is configured to perform the steps of the address division classification method by executing the executable instructions.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or program product. Thus, various aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects. All of these may generally be referred to herein as a "circuit," "module," or "platform."
An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 7. The electronic device 600 shown in fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in fig. 7, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one storage unit 620, a bus 630 that connects the various system components (including the storage unit 620 and the processing unit 610), a display unit 640, and the like.
The storage unit stores program code that can be executed by the processing unit 610. The program code causes the processing unit 610 to perform the steps of the various exemplary embodiments of the present invention described in the address division classification method section above. For example, the processing unit 610 may perform the steps shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set of (at least one) program modules 6205. Such program modules 6205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 600 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In the address division classification device, the program stored in the memory implements the steps of the address division classification method when executed by the processor. The device can therefore also achieve the technical effects of the address division classification method.
An embodiment of the present invention further provides a computer-readable storage medium for storing a program, and the program implements the steps of the address division classification method when executed by a processor. In some possible embodiments, aspects of the present invention may also be implemented in the form of a program product including program code. When the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the various exemplary embodiments of the present invention described in the address division classification method section above.
Referring to fig. 8, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be executed on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages. These include object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, it may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN). Alternatively, it may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
The program stored in the computer storage medium implements the steps of the address division classification method when executed by a processor. The computer storage medium can therefore also achieve the technical effects of the address division classification method.
The foregoing describes the invention in further detail with reference to specific preferred embodiments, but the invention should not be regarded as limited to these specific details. Those skilled in the art to which the invention pertains can make several simple deductions or substitutions without departing from the concept of the invention. All of these shall be regarded as falling within the protection scope of the invention.

Claims (17)

1. An address division classification method, the method comprising:
acquiring address data input by a user;
determining, according to a preset information extraction rule, the address elements to be extracted from the address data, and extracting an input address text from the address data;
extracting an address feature vector of the input address text and inputting it into a division prediction model, wherein the division prediction model is configured to classify divisions based on the address feature vector;
and acquiring a division classification result output by the division prediction model.
2. The address division classification method according to claim 1, wherein acquiring the address data input by the user comprises:
providing the user with an address input page, the address input page including a division selection part and a detailed address input part;
acquiring the division information selected by the user in the division selection part; and acquiring the detailed address information input by the user in the detailed address input part.
3. The address division classification method according to claim 1, wherein the information extraction rule includes an extraction rule for each address element in the address data;
determining, according to the preset information extraction rule, the address elements to be extracted from the address data and extracting the input address text from the address data comprises the following steps:
splitting the address data into a plurality of address elements, and determining the category of each address element;
determining whether each address element needs to be extracted according to a preset association relationship between the category of the address element and the division classification result;
and extracting the address elements determined to be extracted from the address data to obtain the input address text.
4. The address division classification method according to claim 1, wherein there are a plurality of trained division prediction models, each corresponding to a first-level division type; the division classification result includes a second-level division type; and the first level is higher than the second level.
5. The address division classification method according to claim 4, wherein the first-level division type is a prefecture-level administrative division, and the second-level division type is a county-level administrative division.
6. The address division classification method according to claim 4, wherein inputting the address feature vector into the division prediction model comprises the following steps:
determining the first-level division type in the address data;
selecting the corresponding division prediction model according to the determined first-level division type;
and inputting the address data into the corresponding division prediction model.
7. The address division classification method according to claim 4, further comprising training the division prediction models through the following steps:
for each first-level division type, obtaining corresponding sample address texts;
adding a second-level classification label to each sample address text and then adding the labelled texts to the training set of the corresponding first-level division type;
and training the corresponding division prediction model with the training set.
8. The address division classification method according to claim 7, wherein training the corresponding division prediction model comprises:
in each training round, calculating an evaluation index according to the division classification results output by the division prediction model and the corresponding classification labels;
and optimizing the division prediction model based on the evaluation index.
9. The address division classification method according to claim 8, wherein the division prediction model is a support vector machine model, and the evaluation index includes a precision rate and a hit rate.
10. The address division classification method according to claim 7, wherein obtaining the corresponding sample address texts for each first-level division type comprises:
determining the second-level division types included in each first-level division type;
and acquiring a plurality of sample address texts corresponding to each first-level division type, wherein the sample address texts include sample address texts corresponding to each of the second-level division types.
11. The address division classification method according to claim 10, further comprising the following steps after the plurality of sample address texts corresponding to each first-level division type are collected:
changing at least part of the text in the collected sample address texts based on a preset address change rule to obtain changed address information;
and adding a second-level classification label to the changed address information and then adding it to the corresponding training set.
12. The address division classification method according to claim 7, wherein obtaining the corresponding sample address texts for each first-level division type comprises:
for each first-level division type, collecting sample address data, extracting the address elements from the sample address data, and determining the category of each address element;
determining the association relationship between the category of each address element and the second-level classification label;
and determining, according to the association relationship, the address elements to be extracted from the sample address data, and extracting a sample address text from the sample address data.
13. The address division classification method according to claim 1, further comprising the following steps after the division classification result output by the division prediction model is obtained as the second-level division type corresponding to the address data:
judging whether the address data input by the user contains division information at the same level as the division classification result;
if not, adding the division classification result to the address data.
14. The address division classification method according to claim 13, further comprising the following steps after judging whether the address data input by the user contains division information at the same level as the division classification result:
if so, comparing whether the same-level division information in the address data is consistent with the division classification result;
and if it is inconsistent with the division classification result, replacing the same-level division information in the address data with the division classification result.
15. An address division classification system applied to the address division classification method according to any one of claims 1 to 14, the system comprising:
a data acquisition module, configured to acquire address data input by a user;
a text extraction module, configured to determine, according to a preset information extraction rule, the address elements to be extracted from the address data, and to extract an input address text from the address data;
a model input module, configured to extract an address feature vector of the input address text and input it into a division prediction model, the division prediction model being configured to classify divisions based on the address feature vector;
and a division classification module, configured to obtain the division classification result output by the division prediction model.
16. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory storing a computer program which, when executed by the processor, performs the steps of the address division classification method according to any one of claims 1 to 14.
17. A computer storage medium storing a computer program which, when executed by a processor, performs the steps of the address division classification method according to any one of claims 1 to 14.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110126046.8A CN112835922B (en) 2021-01-29 Address division classification method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110126046.8A CN112835922B (en) 2021-01-29 Address division classification method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112835922A 2021-05-25
CN112835922B 2024-07-02

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012168892A (en) * 2011-02-16 2012-09-06 Shigenori Tanaka Grouping device and element extraction device
CN104537062A (en) * 2014-12-29 2015-04-22 北京牡丹电子集团有限责任公司数字电视技术中心 Address information extracting method and system
CN105224622A (en) * 2015-09-22 2016-01-06 中国搜索信息科技股份有限公司 Internet place-name and address extraction and standardization method
CN108628811A (en) * 2018-04-10 2018-10-09 北京京东尚科信息技术有限公司 Address text matching method and device
CN111325022A (en) * 2018-11-28 2020-06-23 北京京东尚科信息技术有限公司 Method and device for identifying hierarchical address
CN111523433A (en) * 2020-04-17 2020-08-11 上海中通吉网络技术有限公司 Express mail terminal address standardization processing method, device and equipment
CN111625732A (en) * 2020-05-25 2020-09-04 鼎富智能科技有限公司 Address matching method and device
WO2020233332A1 (en) * 2019-05-20 2020-11-26 深圳壹账通智能科技有限公司 Text structured information extraction method, server and storage medium
CN112184350A (en) * 2019-07-04 2021-01-05 ***通信集团江西有限公司 User order processing method and device, storage medium and server

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272053A (en) * 2023-11-22 2023-12-22 杭州中房信息科技有限公司 Method for generating address data set with few samples, address matching method, medium and equipment
CN117272053B (en) * 2023-11-22 2024-02-23 杭州中房信息科技有限公司 Method for generating address data set with few samples, address matching method, medium and equipment

Similar Documents

Publication Publication Date Title
CN109978619B (en) Method, system, equipment and medium for screening air ticket pricing strategy
CN113064964A (en) Text classification method, model training method, device, equipment and storage medium
CN109598566A (en) Order placement prediction method, device, computer equipment and computer-readable storage medium
CN112990294B (en) Training method and device of behavior discrimination model, electronic equipment and storage medium
CN110688536A (en) Label prediction method, device, equipment and storage medium
CN111966730A (en) Risk prediction method and device based on permanent premises and electronic equipment
CN111967808B (en) Method, device, electronic equipment and storage medium for determining commodity circulation object receiving mode
CN111582589A (en) Car rental insurance prediction method, device, equipment and storage medium
CN111046669A (en) Interest point matching method and device and computer system
CN113010785B (en) User recommendation method and device
CN111754261B (en) Method and device for evaluating taxi willingness and terminal equipment
CN110598989B (en) Goods source quality evaluation method, device, equipment and storage medium
CN112835922B (en) Address division classification method, system, equipment and storage medium
CN112835922A (en) Address division classification method, system, device and storage medium
CN111179129A (en) Courseware quality evaluation method and device, server and storage medium
CN113487208B (en) Risk assessment method and risk assessment device
CN114579963A (en) User behavior analysis method, system, device and medium based on data mining
CN114897099A (en) User classification method and device based on passenger group deviation smooth optimization and electronic equipment
CN110717101B (en) User classification method and device based on application behaviors and electronic equipment
CN112785234A (en) Goods recommendation method, device, equipment and storage medium
CN113516398A (en) Risk equipment identification method and device based on hierarchical sampling and electronic equipment
CN114117200A (en) Resource display method and system for improving user conversion rate and electronic equipment
CN113570205A (en) API risk equipment identification method and device based on single classification and electronic equipment
CN113569929A (en) Internet service providing method and device based on small sample expansion and electronic equipment
CN112488199A (en) Logistics distribution mode prediction method, system, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant