CN111753515A - Address information extraction and matching method for realizing entity positioning - Google Patents

Address information extraction and matching method for realizing entity positioning

Info

Publication number
CN111753515A
Authority
CN
China
Prior art keywords
text
address
label
level
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010590590.3A
Other languages
Chinese (zh)
Other versions
CN111753515B (en)
Inventor
曾伟英
霍智杰
霍凯亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Kejie Communication Information Technology Co ltd
Original Assignee
Guangdong Kejie Communication Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Kejie Communication Information Technology Co ltd filed Critical Guangdong Kejie Communication Information Technology Co ltd
Priority to CN202010590590.3A priority Critical patent/CN111753515B/en
Priority claimed from CN202010590590.3A external-priority patent/CN111753515B/en
Publication of CN111753515A publication Critical patent/CN111753515A/en
Application granted granted Critical
Publication of CN111753515B publication Critical patent/CN111753515B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An address information extraction and matching method for realizing entity positioning comprises: constructing a first conditional random field, including determining the states of the first conditional random field according to administrative level keywords; performing state jumps on the address text according to the states of the first conditional random field; segmenting the address text into a plurality of sub-texts according to the state jumps; adding text labels at the corresponding administrative levels to the segmented sub-texts; and constructing and storing a label library from the text labels; constructing a second conditional random field, including adding new text labels to the second address text according to the text labels in the label library; and performing fuzzy matching on the address text, including acquiring the weight values of the text labels in the label library and, according to the weight values, matching the address text in the label library whose text labels are most similar to the input address text. The method solves the problem that two different sources of raw data cannot be directly associated because their address information was entered with different writing habits.

Description

Address information extraction and matching method for realizing entity positioning
Technical Field
The invention relates to the technical field of text matching, in particular to an address information extraction and matching method for realizing entity positioning.
Background
Geographic information is currently the most common public information resource; it is closely tied to people's daily lives and is also a basic resource for government administration. A text address is a geographic location described in words, such as "Beiyuan Road, Chaoyang District, Beijing". However, data mining on data that contains address text often runs into the problem that most address information in the raw data is recorded irregularly, which creates a bottleneck for association analysis over massive address texts.
Disclosure of Invention
In view of the deficiencies described in the background, the invention aims to provide an address information extraction and matching method for realizing entity positioning. The method rapidly extracts labels from massive address texts in a way that matches conventional understanding, makes it easy to associate data that must be joined by address, and solves the problem that two different sources of raw data cannot be directly associated because their address information was entered with different writing habits.
In order to achieve the purpose, the invention adopts the following technical scheme:
an address information extraction and matching method for realizing entity positioning, applicable to a first address text containing administrative level keywords and a second address text not containing administrative level keywords, specifically comprises the following steps:
constructing a first conditional random field applicable to the first address text, comprising:
determining the states of the first conditional random field according to the administrative level keywords;
performing state jumps on the address text according to the states of the first conditional random field;
segmenting the address text into a plurality of sub-texts according to the state jumps;
for each successfully segmented address text, adding text labels at the corresponding administrative levels to the segmented sub-texts;
constructing and storing a label library according to the text labels;
constructing a second conditional random field applicable to the second address text, comprising:
adding new text labels to the second address text according to the text labels in the label library;
performing fuzzy matching on the address text, comprising:
acquiring the weight values of the text labels in the label library;
and matching, according to the weight values, the address text in the label library whose text labels are most similar to the input address text.
Preferably, the first address texts are graded according to administrative level keywords of the first address texts, and one address text of each grade corresponds to one state of the first conditional random field;
the address texts at the same level are arranged side by side, and the address texts at the lower level are arranged behind the address texts at the higher level.
Preferably, in the state jump process, the high-level state corresponding to the high-level address text jumps to the low-level state corresponding to the low-level address text, and the jump is irreversible;
when the high-level state jumps to the low-level state, all the low-level states of the column in which the low-level state is located are passed;
the states of a single lowest level may jump to each other.
Preferably, in each jump, the address text is segmented using the administrative level keyword corresponding to the level state, and the segmented address text enters the next lower-level state to be segmented again;
the path with the most jumps is selected and determined as the optimal segmentation path; jumps that cross a level state without segmenting the address text are not counted.
Preferably, in the successfully divided address text, the sub-text and the administrative level vocabulary corresponding to each level state are used as text labels.
Preferably, a dictionary is established, the text labels are added to the dictionary according to preset rules, and the dictionary is stored as a two-dimensional data table.
Preferably, the second address text is split character by character, each split character is combined with the character that follows it, and the combination is matched against the label library to determine whether a text label for the combination exists; if so, the combination is retained; if not, the combination is not retained;
after a combination is retained, it is combined with the next character to form a new combination, which is matched against the label library to determine whether a text label for the new combination exists; if so, the new combination is retained and combined with the next character in turn; if not, the new combination is not retained;
and so on, until none of the split characters can be combined any further.
Preferably, after the input address text is segmented, weight statistics is performed on all text labels, each text label corresponds to a weight value, and the weight value is in direct proportion to the importance of the text label.
Preferably, the similarity between each text label in the input address text and the text label in the label library is calculated, the similarity and the weight value are weighted and averaged, and the text label in the label library with the maximum value is most similar to the input address text.
Advantageous effects:
the invention rapidly extracts labels from massive address texts in a way that matches conventional understanding, makes it easy to associate data that must be joined by address, and solves the problem that two different sources of raw data cannot be directly associated because their address information was entered with different writing habits.
Drawings
FIG. 1 is a block diagram of a model of one embodiment of the present invention;
FIG. 2 is a flow chart of one embodiment of the present invention;
FIG. 3 is a conditional random field state transition diagram of one embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings.
In the description of the present invention, it is to be understood that the terms "upper", "lower", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are only for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.
The invention relates to an address information extraction and matching method for realizing entity positioning, applicable to a first address text containing administrative level keywords and a second address text not containing administrative level keywords. As shown in FIGS. 1 and 2, the method comprises the following specific steps:
constructing a first conditional random field applicable to the first address text, comprising:
determining the states of the first conditional random field according to the administrative level keywords;
performing state jumps on the address text according to the states of the first conditional random field;
segmenting the address text into a plurality of sub-texts according to the state jumps;
for each successfully segmented address text, adding text labels at the corresponding administrative levels to the segmented sub-texts;
constructing and storing a label library according to the text labels;
constructing a second conditional random field applicable to the second address text, comprising:
adding new text labels to the second address text according to the text labels in the label library;
performing fuzzy matching on the address text, comprising:
acquiring the weight values of the text labels in the label library;
and matching, according to the weight values, the address text in the label library whose text labels are most similar to the input address text.
The method is mainly divided into two parts: the first constructs conditional random fields for segmenting text and extracting information; the second constructs the label library from the extracted text. The conditional random fields are further divided into a first conditional random field and a second conditional random field, which segment two different types of text: a first address text containing administrative level keywords, for example "Huanhu Primary School, Lvjing 3rd Road, Chancheng District, Foshan City, Guangdong Province", and a second address text not containing administrative level keywords, for example "Guangdong Foshan Lvjing 3rd Road Huanhu Primary School". The first conditional random field is constructed first and used to segment a large volume of texts; the label library is built from the results, and the second conditional random field is then constructed on the basis of the label library. When a new input text arrives, the two conditional random fields segment it and extract its labels, fuzzy matching based on the Levenshtein distance algorithm is performed on the text, and an approximate text is returned, that is, a text a person would recognize as similar. This solves a problem common in everyday data mining: addresses entered in different data sources are written inconsistently, so address texts that semantically denote the same address cannot be associated because of writing habits.
Preferably, the first address texts are graded according to administrative level keywords of the first address texts, and one address text of each grade corresponds to one state of the first conditional random field;
the address texts at the same level are arranged side by side, and the address texts at the lower level are arranged behind the address texts at the higher level.
As shown in FIG. 3, the states of the first conditional random field are determined as follows. In China, every address can be divided according to administrative levels, such as autonomous regions, special administrative regions, provinces, cities, districts, counties, towns and subdistrict offices, and classified by the corresponding administrative level keywords. For example, "district" and "county" are at the same level and are therefore arranged side by side, while "subdistrict office" and "town" are at the same level as each other but lower than "district" and "county", and are therefore arranged side by side after them. The other levels are arranged likewise, and together they form the states of the first conditional random field. In actual use, states corresponding to administrative levels can be added or removed according to actual requirements, and every first address text to be extracted passes through the first conditional random field.
Preferably, in the state jump process, the high-level state corresponding to the high-level address text jumps to the low-level state corresponding to the low-level address text, and the jump is irreversible;
when the high-level state jumps to the low-level state, all the low-level states of the column in which the low-level state is located are passed;
the states of a single lowest level may jump to each other.
State jumps of the first conditional random field: by its nature, a state jump must go from front to back, that is, from a higher-level address text to a lower-level one, and a high-to-low jump must pass through every possible state. For example, the column containing "district" can only jump to the column containing "subdistrict office", that is, "district" must pass through "subdistrict office" and "town" and nothing else; "subdistrict office" can only pass through "residents' committee" and "villagers' committee" when it jumps; and a loop exists only at the lowest-level state, "number".
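The jump rules above can be sketched as a small table of level columns plus a check on whether a jump is allowed. This is a minimal illustration, not the patented implementation: the names `LEVEL_COLUMNS`, `column_of` and `can_jump` are hypothetical, and the English level names and the exact set of columns are assumptions standing in for the Chinese administrative keywords.

```python
# Columns of sibling states, ordered from highest to lowest level.
# The concrete columns are an assumption for illustration only.
LEVEL_COLUMNS = [
    ["province", "autonomous region", "special administrative region"],
    ["city"],
    ["district", "county"],
    ["subdistrict office", "town"],
    ["residents' committee", "villagers' committee"],
    ["number"],
]


def column_of(state):
    """Return the index of the column that contains the given state."""
    for index, column in enumerate(LEVEL_COLUMNS):
        if state in column:
            return index
    raise ValueError(f"unknown state: {state}")


def can_jump(src, dst):
    """Check a jump against the rules: forward only, one column at a time,
    with a loop permitted only in the lowest-level column."""
    s, d = column_of(src), column_of(dst)
    if s == d:
        return s == len(LEVEL_COLUMNS) - 1
    return d == s + 1
```

For instance, `can_jump("district", "town")` is allowed, while `can_jump("town", "district")` is rejected because jumps are irreversible.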
Preferably, in each jump, the address text is segmented using the administrative level keyword corresponding to the level state, and the segmented address text enters the next lower-level state to be segmented again;
the path with the most jumps is selected and determined as the optimal segmentation path; jumps that cross a level state without segmenting the address text are not counted.
In this method, each administrative level is treated as a state, and jumps can only go from a higher level to a lower level, never the reverse. Each jump segments the address text using the administrative level keyword of the corresponding state, and the resulting sub-text enters a new state. If a segmentation fails, for example because the text goes directly from "city" to "subdistrict office" without any "district" information, the level state that was crossed is not counted. After segmentation by the first conditional random field, the path with the most jumps is selected as the optimal segmentation path.
For example, the address text "Huanhu Primary School, Lvjing 3rd Road, Foshan City, Guangdong Province" passes not only through the state jump path "province", "city", "district", "road", but also through paths such as "autonomous region", "city", "county", and so on. At the "autonomous region" state, however, the text cannot be segmented by the "autonomous region" keyword, so that state is passed through directly and the text enters the "city" state. The path through "autonomous region" therefore necessarily yields fewer successful segmentations than the path through "province", so it cannot be the optimal path.
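The path selection described above can be sketched as follows. This is a simplified, rule-based illustration rather than a trained statistical conditional random field: it enumerates one keyword per level column, splits the text in order, skips keywords that fail, and keeps the path with the most successful splits. The `COLUMNS` table, the function names and the Chinese example strings are assumptions made for the sketch.

```python
from itertools import product

# Illustrative keyword columns, highest level first (assumed for this sketch).
COLUMNS = [
    ["省", "自治区"],  # province, autonomous region
    ["市"],            # city
    ["区", "县"],      # district, county
    ["路", "街"],      # road, street
]


def segment_along(text, path):
    """Split text with each keyword on the path, in order.

    A keyword that is missing (or has nothing before it) is passed
    through without counting as a successful split.
    """
    labels, rest = [], text
    for keyword in path:
        idx = rest.find(keyword)
        if idx <= 0:
            continue
        labels.append((rest[:idx], keyword))
        rest = rest[idx + len(keyword):]
    return labels, rest


def best_segmentation(text):
    """Try every path and keep the one with the most successful splits."""
    best = ([], text)
    for path in product(*COLUMNS):
        candidate = segment_along(text, path)
        if len(candidate[0]) > len(best[0]):
            best = candidate
    return best


labels, rest = best_segmentation("广东省佛山市禅城区绿景三路环湖小学")
print(labels, rest)
```

With this table, the path through "自治区" produces one fewer split than the path through "省" for the sample address, which is why it is not selected.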
Preferably, in the successfully divided address text, the sub-text and the administrative level vocabulary corresponding to each level state are used as text labels.
Preferably, a dictionary is established, the text labels are added to the dictionary according to preset rules, and the dictionary is stored as a two-dimensional data table.
For a successfully segmented address text, text labels can be added to the segmented sub-texts at the corresponding administrative levels. For example, for the address text "Huanhu Primary School, Lvjing 3rd Road, Chancheng District, Foshan City, Guangdong Province", the "province" level yields "Guangdong" and the "city" level yields "Foshan".
Building the label library: an empty dictionary (a data structure) is created, and the extracted label texts are added as key-value pairs. For example, for "Huanhu Primary School, Lvjing 3rd Road, Chancheng District, Foshan City, Guangdong Province", the segmented sub-text "Guangdong" corresponds to {"text": "Guangdong", "level": "province", "count": "+1"} (where +1 means the count is incremented from its previous value), and so on for the remaining labels.
Expanding the label library: when the label text "Guangdong" appears again, the count in its key-value pair is incremented by one; if a label text has not appeared before, it is added as a new key-value pair.
Storing the label library: the updated dictionary is periodically stored as a two-dimensional data table, which gives efficient access to its internal data and prepares for the subsequent construction of the second conditional random field.
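The build, expand and store steps can be sketched with a plain dictionary that is flattened into rows. The function names (`add_label`, `to_table`, `dump_csv`), the use of a `(text, level)` tuple as the key, and CSV as the two-dimensional storage format are assumptions for illustration; the source only specifies key-value pairs and a two-dimensional data table.

```python
import csv
import io


def add_label(library, text, level):
    """Add a (text, level) label, or increment its count if already present."""
    key = (text, level)
    if key in library:
        library[key]["count"] += 1
    else:
        library[key] = {"text": text, "level": level, "count": 1}


def to_table(library):
    """Flatten the dictionary into the rows of a two-dimensional table."""
    return [[e["text"], e["level"], e["count"]] for e in library.values()]


def dump_csv(library):
    """Serialize the table as CSV text (a file could be written instead)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "level", "count"])
    writer.writerows(to_table(library))
    return buffer.getvalue()


library = {}
for text, level in [("Guangdong", "province"), ("Foshan", "city"),
                    ("Guangdong", "province")]:
    add_label(library, text, level)
print(dump_csv(library))
```

Keying on both text and level keeps labels with the same text at different levels apart; whether the original method does this is not stated.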
Preferably, the second address text is split character by character, each split character is combined with the character that follows it, and the combination is matched against the label library to determine whether a text label for the combination exists; if so, the combination is retained; if not, the combination is not retained;
after a combination is retained, it is combined with the next character to form a new combination, which is matched against the label library to determine whether a text label for the new combination exists; if so, the new combination is retained and combined with the next character in turn; if not, the new combination is not retained;
and so on, until none of the split characters can be combined any further.
The second conditional random field handles segmentation of address texts that contain no administrative level keywords, such as "Foshan City Nanhai" written simply as "Foshan Nanhai". The states of the second conditional random field therefore depend on the labels extracted by the first conditional random field. Using the label library, such an address text is split character by character; each character is then joined with the characters that follow it to form a new sub-text, which enters a new state; the probability of each state jump is counted from the label library; and the most probable path is taken as the optimal segmentation. Finally, the administrative label corresponding to each state is found along the optimal path.
Splitting character by character: for example, "Guangdong Foshan Lvjing 3rd Road Huanhu Primary School" is split into "Guang", "dong", "Fo", and so on.
Initializing the combined address text: the separated characters are recombined. "Guang" and "dong" are merged first; if "Guangdong" exists in the label library, the combination is kept and merging continues to form "Guangdong Fo". "Guangdong Fo" clearly does not exist in the label library, so that combination is skipped, and merging continues in the same way until the whole address text has been traversed. The retained combinations form the states of the second conditional random field.
Selecting the best division: among all candidate combinations, the one that maximizes the product of the root of its occurrence frequency in the label library and its string length is retained. For example, if "Guangdong Foshan" occurs 10,000 times and "Guangdong" occurs 1,000,000 times, the best division is still "Guangdong".
Word formation and division iteration: once "Guangdong" is the best combination, any subsequent text containing the characters "Guangdong" is divided with "Guangdong" as the default best granularity. For "Guangdong Foshan Lvjing 3rd Road Huanhu Primary School", after "Guangdong" is split off, best-division selection continues on the characters that follow it until no further combination is possible; the remaining characters form new combinations, which are added to the label library.
Labeling the new divisions: the text is divided to give "Guangdong", whose corresponding label is "province", so "Guangdong Province" becomes one of the labels of the text "Guangdong Foshan Lvjing 3rd Road Huanhu Primary School", and the rest follow by analogy.
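The steps above can be sketched as a scan that, at each position, collects the combinations found in the label library, scores them, and keeps the best one. Several details are assumptions: the "root" is taken to be the square root, "Guangdong Foshan" is taken to be the four-character string 广东佛山, and the label counts other than the two quoted in the text are invented for illustration. `LABELS`, `score` and `segment` are hypothetical names.

```python
import math

# Hypothetical label library: text -> (administrative level, count).
LABELS = {
    "广东": ("省", 1_000_000),   # Guangdong, count taken from the text
    "广东佛山": (None, 10_000),  # Guangdong Foshan, count taken from the text
    "佛山": ("市", 500_000),     # assumed count
    "禅城": ("区", 200_000),     # assumed count
}


def score(label):
    """Square root of the occurrence count times the string length."""
    return math.sqrt(LABELS[label][1]) * len(label)


def segment(text, max_len=6):
    """Scan left to right, keeping the best-scoring known combination.

    Characters that form no known combination are collected into an
    unlabeled leftover segment (level None), which a caller could add
    to the label library as a new combination.
    """
    labels, unknown, i = [], "", 0
    while i < len(text):
        stop = min(len(text), i + max_len)
        candidates = [text[i:j] for j in range(i + 2, stop + 1)
                      if text[i:j] in LABELS]
        if candidates:
            if unknown:
                labels.append((unknown, None))
                unknown = ""
            best = max(candidates, key=score)
            labels.append((best, LABELS[best][0]))
            i += len(best)
        else:
            unknown += text[i]
            i += 1
    if unknown:
        labels.append((unknown, None))
    return labels


print(segment("广东佛山禅城绿景三路"))
```

Under these assumptions the scores are 1000 × 2 = 2000 for "广东" and 100 × 4 = 400 for "广东佛山", so "广东" is retained, in line with the example in the text.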
Preferably, after the input address text is segmented, weight statistics is performed on all text labels, each text label corresponds to a weight value, and the weight value is in direct proportion to the importance of the text label.
Preferably, the similarity between each text label in the input address text and the text label in the label library is calculated, the similarity and the weight value are weighted and averaged, and the text label in the label library with the maximum value is most similar to the input address text.
As mentioned above, the text is fuzzy-matched based on the Levenshtein distance algorithm and an approximate text is returned, that is, a text a person would recognize as similar. The method uses the Levenshtein distance for fuzzy matching of addresses. First, weight statistics are computed for the extracted text labels in the label library; the weighting scheme used here is TF-IDF, which is prior art and is not described further. For example, after the conditional random fields and the TF-IDF calculation, for the address text "No. 18 Guilan Road, Nanhai, Foshan", the weight of "Foshan" is 0.15, the weight of "Nanhai" is ...
When a new address text is received, it is first segmented: the conditional random fields produce the labels at every administrative level, and TF-IDF then computes the weight of each of those text labels. Finally, the Levenshtein distance is computed between the text labels of the input address text and those of the candidate address texts in the label library, the resulting values are averaged using the weights, and the address text with the largest value is the one most similar to the input address.
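The matching step can be sketched with a hand-written Levenshtein distance, a normalized similarity, and a weighted average over levels. The normalization (one minus distance over the longer length), the per-level dictionaries, and all weight values other than 0.15 are assumptions for illustration; in the method the weights would come from TF-IDF, which is not computed here.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Map the edit distance to a similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def weighted_score(query, candidate, weights):
    """Weighted average of per-level similarities (weights: level -> weight)."""
    total = sum(weights.values())
    return sum(w * similarity(query.get(level, ""), candidate.get(level, ""))
               for level, w in weights.items()) / total


def best_match(query, candidates, weights):
    """Return the candidate with the highest weighted score."""
    return max(candidates, key=lambda c: weighted_score(query, c, weights))


# Hypothetical weights and labels, for illustration only.
weights = {"city": 0.15, "district": 0.25, "detail": 0.6}
query = {"city": "Foshan", "district": "Nanhai", "detail": "Garden 5-607"}
c1 = {"city": "Foshan", "district": "Nanhai", "detail": "Garden 3-205"}
c2 = {"city": "Foshan", "district": "Chancheng", "detail": "Tower 9-101"}
print(best_match(query, [c1, c2], weights))
```

Comparing level by level means a mismatch in a heavily weighted label lowers the score more than a mismatch in a lightly weighted one, which is the behavior the text describes.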
Example: the address "Block 5, Room 607, Guilan Road, Nanhai District, Foshan City" may not be recorded in the extracted label library, so "Block 5, Room 607" for that address cannot be found in the label library by the first and second conditional random fields (although labels such as "Foshan City", "Nanhai District" and "Guilan Road" can be in the library, obtained by extracting the text labels of other addresses in the same region), whereas the address "Block 3, Room 205, Guilan Road, Nanhai District, Foshan City" may have been extracted. After the input address is segmented and weighted, the label "Biguiyuan Block 5 Room 607" has the highest weight, meaning it has the greatest influence on the address. Levenshtein distance fuzzy matching is then performed state by state according to level, producing a string similarity for each state, and finally the weighted average of all the similarities with their TF-IDF weights gives the result. For the two addresses "Block 3, Room 205, Guilan Road, Nanhai District, Foshan City" and "Block 5, Room 607, Guilan Road, Nanhai District, Foshan City", which is more similar is determined by comparing the products of the string similarities of "Biguiyuan Block 3 Room 205" and "Yi Cloud Block 5 Room 607" with the weight of "Biguiyuan Block 5 Room 607".
By extracting labels segmented by level and performing local similarity matching, the method effectively avoids errors caused by writing habits that differ while the meaning is the same, and also avoids excessive fuzzy matching. (Matching on the arrangement of the whole character string can rate two strings as extremely similar when they are actually two different addresses, such as "Block 4, Room 201, Longguang Tianhu, Lake Road, Chancheng District" and "Block 4, Room 201, Longguang Tianhu Huafu, Lvjing Road, Chancheng District".) Note: the addresses are all fictional and are used only for ease of explanation.
The technical principle of the present invention is described above in connection with specific embodiments. The description is made for the purpose of illustrating the principles of the invention and should not be construed in any way as limiting the scope of the invention. Based on the explanations herein, those skilled in the art will be able to conceive of other embodiments of the present invention without inventive effort, which would fall within the scope of the present invention.

Claims (9)

1. An address information extraction and matching method for realizing entity positioning, characterized in that: the method is applicable to a first address text containing administrative level keywords and a second address text not containing administrative level keywords, and specifically comprises the following steps:
constructing a first conditional random field applicable to the first address text, comprising:
determining the states of the first conditional random field according to the administrative level keywords;
performing state jumps on the address text according to the states of the first conditional random field;
segmenting the address text into a plurality of sub-texts according to the state jumps;
for each successfully segmented address text, adding text labels at the corresponding administrative levels to the segmented sub-texts;
constructing and storing a label library according to the text labels;
constructing a second conditional random field applicable to the second address text, comprising:
adding new text labels to the second address text according to the text labels in the label library;
performing fuzzy matching on the address text, comprising:
acquiring the weight values of the text labels in the label library;
and matching, according to the weight values, the address text in the label library whose text labels are most similar to the input address text.
2. The method for extracting and matching address information for implementing entity location as claimed in claim 1, wherein:
grading the first address text according to administrative level keywords of the first address text, wherein one address text of each level corresponds to one state of the first conditional random field;
the address texts at the same level are arranged side by side, and the address texts at the lower level are arranged behind the address texts at the higher level.
3. The method for extracting and matching address information for implementing entity location as claimed in claim 1, wherein:
in the process of state jump, the high-level state corresponding to the high-level address text jumps to the low-level state corresponding to the low-level address text, and the jump is irreversible;
when the high-level state jumps to the low-level state, all the low-level states of the column in which the low-level state is located are passed;
the states of a single lowest level may jump to each other.
4. The method for extracting and matching address information for implementing entity location as claimed in claim 1, wherein:
in each jump, segmenting the address text using the administrative level keyword corresponding to the level state, the segmented address text entering the next lower-level state to be segmented again;
selecting the path with the most jumps and determining it as the optimal segmentation path; jumps that cross a level state without segmenting the address text are not counted.
5. The method for extracting and matching address information for implementing entity location as claimed in claim 1, wherein:
in the successfully divided address texts, the sub-texts and the administrative-level vocabulary corresponding to each level state are used as text labels.
6. The method for extracting and matching address information for implementing entity location as claimed in claim 1, wherein:
establishing a dictionary, adding the text labels to the dictionary according to a preset rule, and storing the dictionary in a two-dimensional data table.
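One possible way to hold such a dictionary in a two-dimensional table is sketched below with Python's standard `sqlite3` module. The table name `text_label`, its two columns, and the "preset rule" (here simply: ignore duplicate labels) are all assumptions; the patent does not specify them.

```python
import sqlite3
from typing import Iterable, Tuple


def build_label_table(labels: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Store (label, level) pairs in an in-memory table, skipping duplicates."""
    conn = sqlite3.connect(":memory:")
    # Two columns: the text label itself and its administrative level.
    conn.execute(
        "CREATE TABLE text_label (label TEXT PRIMARY KEY, level TEXT NOT NULL)"
    )
    # Assumed preset rule: a label already in the dictionary is not added again.
    conn.executemany(
        "INSERT OR IGNORE INTO text_label (label, level) VALUES (?, ?)", labels
    )
    conn.commit()
    return conn
```

A caller could then test membership with a query such as `SELECT 1 FROM text_label WHERE label = ?`.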
7. The method for extracting and matching address information for implementing entity location as claimed in claim 1, wherein:
splitting the second address text character by character, combining each split character with the next split character, matching the combination against the label library, and judging whether a text label for the combination exists; if so, the combination is retained; if not, the combination is not retained;
after a combination is retained, combining it with the next character to form a new combination, matching the new combination against the label library, and judging whether a text label for the new combination exists; if so, the new combination is retained and is further combined with the next character; if not, the new combination is not retained;
and so on, until none of the split characters can be combined any further.
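The character-by-character combination can be sketched as follows. The sketch assumes strict membership in the label library, so an intermediate combination such as "佛山" must itself be a label before "佛山市" can be reached; it also assumes that characters which never form a retained combination are dropped. The function name `segment_by_combination` is hypothetical.

```python
from typing import List, Set


def segment_by_combination(text: str, label_library: Set[str]) -> List[str]:
    """Grow combinations one character at a time while they remain known labels."""
    retained: List[str] = []
    i = 0
    while i < len(text):
        current = text[i]
        j = i + 1
        # Extend the combination as long as the extended text is a known label.
        while j < len(text) and current + text[j] in label_library:
            current += text[j]
            j += 1
        if len(current) > 1:
            # A combination was retained; continue after its last character.
            retained.append(current)
            i = j
        else:
            # This character could not be combined; move on to the next one.
            i += 1
    return retained
```

With a library containing "佛山", "佛山市", "南海" and "南海区", the input "在佛山市南海区" would produce the two retained combinations "佛山市" and "南海区".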
8. The method for extracting and matching address information for implementing entity location as claimed in claim 1, wherein:
after the input address text is segmented, weight statistics are computed for all text labels; each text label corresponds to a weight value, and the weight value is proportional to the importance of the text label.
9. The method for extracting and matching address information for implementing entity location as claimed in claim 8, wherein:
calculating the similarity between each text label in the input address text and the text labels in the label library, and calculating the weighted average of the similarities using the weight values, wherein the text label in the label library with the maximum value is the most similar to the input address text.
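The weighted-average scoring can be sketched as below. Several details are assumptions rather than the patent's method: labels are keyed by a level name, the per-label similarity is the ratio from Python's `difflib.SequenceMatcher`, and every level in the input has an entry in `weights`. The names `weighted_score` and `best_match` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict


def similarity(a: str, b: str) -> float:
    """Assumed similarity measure: difflib ratio in the range [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def weighted_score(
    input_labels: Dict[str, str],
    candidate_labels: Dict[str, str],
    weights: Dict[str, float],
) -> float:
    """Weighted average of per-level similarities between input and candidate."""
    total_weight = sum(weights[level] for level in input_labels)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(
        weights[level] * similarity(label, candidate_labels.get(level, ""))
        for level, label in input_labels.items()
    )
    return weighted_sum / total_weight


def best_match(
    input_labels: Dict[str, str],
    library: Dict[str, Dict[str, str]],
    weights: Dict[str, float],
) -> str:
    """Return the key of the library entry with the highest weighted score."""
    return max(
        library, key=lambda key: weighted_score(input_labels, library[key], weights)
    )
```

Giving the road level a larger weight than the city level, for instance, lets a near-miss on the road name still outrank a candidate that only shares the city.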
CN202010590590.3A 2020-06-24 Address information extraction and matching method for realizing entity positioning Active CN111753515B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010590590.3A CN111753515B (en) 2020-06-24 Address information extraction and matching method for realizing entity positioning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010590590.3A CN111753515B (en) 2020-06-24 Address information extraction and matching method for realizing entity positioning

Publications (2)

Publication Number Publication Date
CN111753515A (en) 2020-10-09
CN111753515B (en) 2024-07-02

Family

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581252A (en) * 2020-12-03 2021-03-30 信用生活(广州)智能科技有限公司 Address fuzzy matching method and system fusing multidimensional similarity and rule set
CN112835899A (en) * 2021-01-29 2021-05-25 上海寻梦信息技术有限公司 Address library indexing method, address matching method and related equipment
CN113656531A (en) * 2021-08-12 2021-11-16 南方电网数字电网研究院有限公司 Processing method and device for power grid address structuralization

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719128A (en) * 2009-12-31 2010-06-02 浙江工业大学 Fuzzy matching-based Chinese geo-code determination method
CN104537062A (en) * 2014-12-29 2015-04-22 北京牡丹电子集团有限责任公司数字电视技术中心 Address information extracting method and system
CN105005577A (en) * 2015-05-08 2015-10-28 裴克铭管理咨询(上海)有限公司 Address matching method
CN106709065A (en) * 2017-01-19 2017-05-24 国家电网公司 Standardization processing method and standardized processing device for address information
CN108628811A (en) * 2018-04-10 2018-10-09 北京京东尚科信息技术有限公司 The matching process and device of address text
CN109033225A (en) * 2018-06-29 2018-12-18 福州大学 Chinese address identifying system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
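汪洋 (Wang Yang): "基于Trie树和有限状态自动机的中文地址解析模型" [Chinese address parsing model based on Trie tree and finite state automaton], 《计算机与现代化》 [Computer and Modernization], no. 7, pages 60-67 *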

Similar Documents

Publication Publication Date Title
US20230041991A1 (en) Density-based dynamic geohash
CN109145281B (en) Speech recognition method, apparatus and storage medium
US8700661B2 (en) Full text search using R-trees
CN103186524A (en) Address name identification method and device
CN108304493B (en) Hypernym mining method and device based on knowledge graph
US20210239486A1 (en) Method and apparatus for predicting destination, electronic device and storage medium
CN104182517A (en) Data processing method and data processing device
CN104346438A (en) Data management service system based on large data
CN109446207A (en) A kind of normal address database update method and address matching method
AU2019290018B2 (en) Computer implemented system and method for geographic subject extraction for short text
CN108831442A (en) Point of interest recognition methods, device, terminal device and storage medium
CN111625732B (en) Address matching method and device
CN105069071A (en) Geographical position information extraction method for microblog data
CN107025254A (en) A kind of course line destination searching method and device
CN112015908A (en) Knowledge graph construction method and system, and query method and system
Clarke Phonetic change in Newfoundland English
US20130132411A1 (en) Full Text Search Based on Interwoven String Tokens
Wing Text-based document geolocation and its application to the digital humanities
CN111753515A (en) Address information extraction and matching method for realizing entity positioning
CN108920705A (en) A kind of coding method of knowledge point identification and device
CN111753515B (en) Address information extraction and matching method for realizing entity positioning
CN115062150B (en) Text classification method and device, electronic equipment and storage medium
CN116644740A (en) Dictionary automatic extraction method and system based on single text term solidification degree
CN114792091A (en) Chinese address element analysis method and equipment based on vocabulary enhancement and storage medium
Wang et al. Efficient identification of local keyword patterns in microblogging platforms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant