CN115630648A - Address element analysis method and system for man-machine conversation and computer readable medium - Google Patents

Publication number
CN115630648A
Authority
CN
China
Prior art keywords
address
user
information
address information
elements
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211364279.2A
Other languages
Chinese (zh)
Inventor
李辰刚
杜振东
王清琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Yunwen Network Technology Co ltd
Original Assignee
Nanjing Yunwen Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Yunwen Network Technology Co ltd
Priority to CN202211364279.2A
Publication of CN115630648A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a human-computer conversation-oriented address element analysis method, a system and a computer readable medium, wherein the method comprises the following steps: acquiring known address information from the background information and acquiring a reply text of a user; analyzing an address from a reply text of a user to obtain a user address element; combining the user address elements obtained by analysis with the known address information to obtain address elements corresponding to the address; judging whether a preset condition is met: if the preset conditions are met, ending the questioning link; if the preset condition is not met, a question which needs to be asked to the user is constructed according to the unsatisfied condition, the address is analyzed and combined from the reply text of the question by the user according to the steps, and whether the preset condition is met or not is further judged until the preset condition is met; and finally, updating the background address information by using the current address information. The method can accurately identify the role of the address element, accurately combine the address information of the context and the background with the address information in the user utterance, and finish the address element analysis facing the man-machine interaction.

Description

Address element analysis method and system for man-machine conversation and computer readable medium
Technical Field
The invention belongs to the technical field of man-machine conversation systems in the field of artificial intelligence, relates to natural language processing, and particularly relates to a man-machine conversation-oriented address element analysis method, a man-machine conversation-oriented address element analysis system and a computer readable medium.
Background
In a man-machine conversation system, the conversation system (chat robot) often needs to ask the user for an address where the user is, will arrive, or has been, in order to provide a corresponding service. This function is widely used in logistics distribution, take-out meal delivery, taxi reservation, epidemiological investigation and similar fields. Although a conversation system can obtain address information of the user through technologies such as location sharing and IP address positioning, asking through conversation remains necessary, because the location the user wants to convey often differs from the user's current location.
The existing flow for implementing such a dialog is generally as follows: first, the dialogue system acquires rough background location information of the user; then, the dialogue system asks the user for the address and analyzes the address replied by the user in combination with the background information; finally, the dialogue system judges whether the address replied by the user meets the requirement: if so, the dialogue ends, and if not, the dialogue system asks a follow-up question according to the problem found.
Existing address element analysis methods can segment the address text, identify each address element and obtain a fairly accurate result. However, these methods mainly target explicit address text fields in semi-structured data, for example where selection boxes are used to obtain the first several levels of administrative divisions and the remaining parts are entered manually, and they do not consider the wrong characters produced by input modes such as voice recognition, keyboard input and handwriting input. In addition, because of the dialog context, the address information provided by the user in a dialog is often incomplete or outdated: for example, the user may still give the address of an old administrative division after the division has been changed, and incomplete information has to be completed from the context held by the man-machine dialog system. Existing address element analysis methods therefore generally have difficulty automatically correcting old or wrong information in an address.
Meanwhile, existing address element analysis methods also suffer from poor accuracy. For example, Chinese patent application CN113449528A, "an address element extraction method, an address element extraction device, a computer device, and a storage medium", proposes a method for analyzing address elements in which an address element extraction model identifies the address elements, the model being based on a knowledge-enhanced semantic representation model and a conditional random field model. In practical applications, however, this method cannot accurately distinguish the types of address elements. For example, in "shanxi lu 67 world trade center a seat 1202 cloud trade", the model would identify both "cloud trade" and "world trade center" as "places of interest" (POI), but since "cloud trade" is a place subordinate to "world trade center", it should be labeled as a "sub-place of interest". The reason is that various superordinate and subordinate relations exist between preceding and following address elements, and the traditional method relies only on a sequence labeling model, whose modeling distance is short and which has difficulty handling global label relations, so that the role identification of elements is inaccurate.
Disclosure of Invention
The invention aims to provide a human-computer conversation-oriented address element analysis method, which comprises the following steps:
step 1: acquiring known address information from the background information and acquiring a reply text of a user;
step 2: analyzing an address from a reply text of a user to obtain a user address element;
step 3: combining the user address elements obtained by analysis with the known address information to obtain address elements corresponding to the address;
step 4: judging whether a preset condition is met:
if the preset conditions are met, ending the questioning link;
if the preset condition is not met, a question which needs to be asked to the user is constructed according to the unsatisfied condition, the address is analyzed and combined from the reply text of the question by the user according to the steps, and whether the preset condition is met or not is further judged until the preset condition is met;
step 5: updating the background address information with the current address information.
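The five steps above can be sketched as a simple loop. This is only an illustrative sketch under stated assumptions, not the claimed implementation: the dictionary representation of an address, the function names `resolve_address`, `toy_parse` and `toy_first_unmet`, the toy level table, and the reduction of step 3 to a plain dictionary overlay are all hypothetical stand-ins for the components described in the later steps.

```python
from typing import Callable, Dict, List, Optional, Tuple

Address = Dict[str, str]  # assumed representation: address level -> element text


def resolve_address(
    background: Address,
    replies: List[str],
    parse: Callable[[str], Address],
    first_unmet: Callable[[Address], Optional[str]],
) -> Tuple[Address, List[str]]:
    """Run steps 1-5 over a scripted list of user replies."""
    current = dict(background)               # step 1: known address information
    questions: List[str] = []
    for reply in replies:                    # step 1: reply text of the user
        elements = parse(reply)              # step 2: parse user address elements
        current = {**current, **elements}    # step 3: combine (simplified overlay)
        missing = first_unmet(current)       # step 4: check the preset condition
        if missing is None:
            break                            # condition met: stop questioning
        questions.append(f"Please provide the {missing}.")
    background.clear()
    background.update(current)               # step 5: update background address info
    return current, questions


# Hypothetical stand-ins for the parser and the preset condition.
LEVEL_OF = {"Jiangsu": "province", "Nanjing": "city", "Gulou": "district"}


def toy_parse(reply: str) -> Address:
    return {LEVEL_OF[w]: w for w in reply.split() if w in LEVEL_OF}


def toy_first_unmet(address: Address) -> Optional[str]:
    for level in ("province", "city", "district"):
        if level not in address:
            return level
    return None


background = {"province": "Jiangsu"}
address, asked = resolve_address(
    background, ["Nanjing", "Gulou"], toy_parse, toy_first_unmet
)
```

In this toy run the first reply leaves the district unknown, so one follow-up question is generated before the second reply completes the address.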
As an optional implementation manner, in step 2, resolving an address from a reply text of a user to obtain a user address element, including the following steps:
step 2.1: acquiring an address text to be processed from a reply text of a user;
step 2.2: segmenting an address text to be processed, and labeling a preliminary address role to obtain an address element list;
step 2.3: and combining each address element, removing invalid address elements, and adjusting the final address role of each address element to obtain an address element list.
As an optional implementation manner, in step 2.1, the address text to be processed is obtained from the reply text of the user, and the address text is obtained by using an entity recognition model based on sequence labeling, which includes the following steps:
firstly, performing word segmentation and part-of-speech tagging on a user reply text;
then, converting the word segmentation and part-of-speech tagging information of the text into the characteristics of the characters;
then, sending the characters and character features into a conditional random field or a conditional random field model based on a neural network to obtain a label of each character;
and finally, analyzing the label sequence through a state machine to obtain the address text to be processed.
As an optional implementation manner, the entity recognition model based on sequence labeling adopts a pre-trained model, and during model training, the characters of the address text in the labeled training text data are labeled according to the "B, I, E, S, O" scheme:
B represents the beginning character of the address element;
I represents a middle character of the address;
E represents the ending character of the address;
S represents an address with only one character;
O represents a character not belonging to any address.
Thus, the address text is converted into a sequence of character labels according to the "B, I, E, S, O" scheme;
in this labeling scheme, an address that has only one character is labeled S; in all other cases, the labels of an address start with B and end with E, with only I between B and E, so that the entity recognition model learns this label pattern during training and captures the dependency relationship between labels.
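To illustrate the labeling scheme, the following minimal sketch converts a text with known address spans into a "B, I, E, S, O" label sequence. The function name `bieso_labels` and the representation of spans as (start, end) character offsets with an exclusive end are assumptions made for the example only.

```python
from typing import List, Tuple


def bieso_labels(text: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Label each character of `text`; `spans` are (start, end) address
    offsets with `end` exclusive (an assumed annotation format)."""
    labels = ["O"] * len(text)           # default: not part of any address
    for start, end in spans:
        if end - start == 1:
            labels[start] = "S"          # single-character address
        else:
            labels[start] = "B"          # beginning character
            labels[end - 1] = "E"        # ending character
            for i in range(start + 1, end - 1):
                labels[i] = "I"          # middle characters
    return labels


# "我在南京市" ("I am in Nanjing City"): characters 2..4 form the address.
example = bieso_labels("我在南京市", [(2, 5)])
```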
As an optional implementation manner, in step 2.3, combining each address element, removing invalid address elements, and further adjusting the roles of the address elements by using role transformation rules and global constraints to obtain an address element list;
the role conversion rule is a plurality of predefined rule templates, and when one part of the address elements meets the rule templates, the roles of the address elements are adjusted according to the corresponding processing modes in the rule templates;
the global constraint is a constraint rule for deleting and transforming the global role of the address elements;
and after the role conversion rules and the global constraints have been executed, the roles of the address elements are combined and adjusted in sequence according to the address level, from high to low.
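The rule templates themselves are not listed here, so the following is only a hypothetical example of what one role conversion rule could look like, modeled on the "world trade center" / "cloud trade" case from the Background: a place of interest that appears after an earlier place of interest in the same address is relabeled as a sub-place of interest. The role names (`ROAD`, `POI`, `SUB_POI`, and so on) and the function name are assumptions.

```python
from typing import List, Tuple

Element = Tuple[str, str]  # (element text, role); role names are assumed


def demote_nested_poi(elements: List[Element]) -> List[Element]:
    """Hypothetical rule: any POI following an earlier POI becomes SUB_POI."""
    seen_poi = False
    adjusted: List[Element] = []
    for text, role in elements:
        if role == "POI":
            if seen_poi:
                role = "SUB_POI"     # subordinate to the earlier POI
            seen_poi = True
        adjusted.append((text, role))
    return adjusted


preliminary = [
    ("shanxi lu", "ROAD"),
    ("67", "ROAD_NO"),
    ("world trade center", "POI"),
    ("a seat", "BUILDING"),
    ("1202", "ROOM"),
    ("cloud trade", "POI"),
]
final_roles = demote_nested_poi(preliminary)
```

A purely sequential labeler sees both elements as ordinary places of interest; a rule of this kind looks at the whole element list, which is the kind of global relation the method aims to capture.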
As an optional embodiment, in step 3, combining the analyzed user address element with the known address information to obtain an address element corresponding to the address, includes:
step 3.1: linking the address elements with addresses in a specific address element entity knowledge base to obtain a user address element list;
step 3.2: combining the known address information with the current user address element in the user address element list to obtain a complete address;
step 3.3: filtering and merging the adjacent address elements in sequence;
step 3.4: and for the address elements after filtering and merging, converting the corresponding words into standard names.
As an optional implementation manner, in step 3.2, according to the difference of the known address information, the processing is performed in two cases:
(1) If the known address information is a single administrative division, completion is judged according to the following logic:
step 3.2.1.1: if the highest level of the current address information is higher than or equal to the lowest level of the known address information and corresponds to a single address element entity, directly covering the known address information;
step 3.2.1.2: if the highest level of the current address information is higher than or equal to the lowest level of the known address information and corresponds to a plurality of address element entities, performing merging and screening operation on the highest level address element entity of the current address information by using a first-level administrative division of the known address information, which is higher than the current address information, so as to obtain a merged and screened entity list; if the entity list of the combined screening is not empty, the result is used as a corresponding entity list, otherwise, the result is kept unchanged;
step 3.2.1.3: if the highest level of the current address information is smaller than the lowest level of the known address information, executing a combined screening operation by using entities corresponding to the known address information and the highest level of the current address information to obtain a combined screening entity list; and if the entity list of the combined screening is not empty, taking the result as a corresponding entity list, otherwise, keeping the result unchanged.
(2) If the known address information is the address obtained by analysis, the current address element is used to cover the address element of the existing known address information according to the level, and finally whether the reserved known address element conflicts with the user address element is checked, which comprises the following processes:
step 3.2.2.1: if the highest level of the current address element is smaller than the lowest level of the address element entity knowledge base, the address elements are considered to be compatible;
step 3.2.2.2: if the highest level of the current address element is less than or equal to the lowest level of the known address information, executing 'merging and screening' operation on the highest level of the 'known address information' and the 'current address element', and if the merging result is not null, enabling the two to be compatible;
step 3.2.2.3: if the highest level of the known address information is less than the lowest level of the current address element, executing a merging and screening operation on the highest level of the current address element and the known address information, and if the merging result is not empty, enabling the current address element and the known address information to be compatible;
step 3.2.2.4: if not, merging fails, recording a merging failure mark, and only using the current address information.
As an optional embodiment, the merging and screening operation includes:
a) Sequentially selecting each address element entity GPE_a corresponding to the high-level address element;
b) Sequentially selecting each address element entity GPE_b corresponding to the low-level address element;
c) If GPE_b is a lower-level address element entity of GPE_a, adding GPE_b to the address element entity merging list;
d) After the loop is finished, taking the address element entity merging list as the administrative division after the two address elements are merged.
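The merging and screening operation amounts to a nested loop over candidate entities. The sketch below assumes that the knowledge base can be reduced to a child-to-parent table (`parent_of`) and that "lower-level entity" means any descendant; both are assumptions, as are the entity identifiers used in the example.

```python
from typing import Dict, List


def is_below(entity: str, ancestor: str, parent_of: Dict[str, str]) -> bool:
    """True if `ancestor` lies on the parent chain of `entity`."""
    node = parent_of.get(entity)
    while node is not None:
        if node == ancestor:
            return True
        node = parent_of.get(node)
    return False


def merge_and_screen(
    high_entities: List[str], low_entities: List[str], parent_of: Dict[str, str]
) -> List[str]:
    merged: List[str] = []
    for gpe_a in high_entities:                       # a) high-level candidates
        for gpe_b in low_entities:                    # b) low-level candidates
            if is_below(gpe_b, gpe_a, parent_of) and gpe_b not in merged:
                merged.append(gpe_b)                  # c) keep subordinate entity
    return merged                                     # d) merged entity list


# Toy knowledge base: two districts share the name "Gulou".
PARENT_OF = {
    "Nanjing/Gulou": "Nanjing",
    "Fuzhou/Gulou": "Fuzhou",
    "Nanjing": "Jiangsu",
    "Fuzhou": "Fujian",
}
candidates = merge_and_screen(
    ["Nanjing"], ["Nanjing/Gulou", "Fuzhou/Gulou"], PARENT_OF
)
```

Here the known city screens out the district of the same name in another city, which is how a higher-level element can disambiguate a lower-level one.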
As an optional implementation manner, in the step 4, a question to be asked to the user is constructed according to the unsatisfied condition, which includes:
step 5.1: if the address element information is not acquired, directly feeding back the address information which is not acquired;
step 5.2: if the information of the required level is not acquired, asking questions according to the address level which is not acquired;
step 5.3: if the address is discontinuous, selecting a higher address level in the longest discontinuous segment for questioning;
step 5.4: and if the address has administrative division ambiguity, generating a question according to the number of the ambiguity.
According to a second aspect of the present invention, there is also provided a computer system, comprising:
one or more processors;
a memory storing instructions operable, when executed by the one or more processors, to implement the processes of the human-machine dialog oriented address element resolution method described above.
According to a third aspect of the present invention, a computer-readable medium storing a computer program is also provided, the computer program including instructions executable by one or more computers, the instructions implementing the process of the aforementioned human-computer conversation oriented address element resolution method when executed by the one or more computers.
Compared with the prior art, the address element analysis method facing to the man-machine conversation has the remarkable advantages that:
1. the human-computer conversation oriented address element analysis method can accurately identify the role of the address element, complete the human-computer conversation interaction oriented address element analysis, accurately combine the context and background address information with the address information in the user words, judge whether the address meets the requirements and appropriately ask back the content needing to be supplemented, and finally obtain the expected address information;
2. the address element analysis method facing the man-machine conversation better solves the interference of wrong words, old information and the like in the address input by the user by introducing the address element knowledge base, identifies and restores the correct address and improves the accuracy of address identification.
It should be understood that all combinations of the foregoing concepts and additional concepts described in greater detail below can be considered as part of the inventive subject matter of this disclosure unless such concepts are mutually inconsistent. Additionally, all combinations of claimed subject matter are considered a part of the presently disclosed subject matter.
The foregoing and other aspects, embodiments and features of the present teachings can be more fully understood from the following description taken in conjunction with the accompanying drawings. Additional aspects of the present invention, such as features and/or advantages of exemplary embodiments, will be apparent from the description which follows, or may be learned by practice of the specific embodiments according to the teachings of the present invention.
Drawings
The drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures may be represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. Embodiments of various aspects of the present invention will now be described, by way of example, with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart illustrating a method for address element resolution for human-machine conversation according to an exemplary embodiment of the present invention.
Fig. 2 is a diagram illustrating a parsed result of the user session text "Nanjing".
Fig. 3 is a schematic diagram of reading approximate matching in an exemplary embodiment of the invention.
Detailed Description
In order to better understand the technical content of the present invention, specific embodiments are described below with reference to the accompanying drawings.
In this disclosure, aspects of the present invention are described with reference to the accompanying drawings, in which a number of illustrative embodiments are shown. Embodiments of the present disclosure are not necessarily intended to include all aspects of the invention. It should be appreciated that the various concepts and embodiments described above, as well as those described in greater detail below, may be implemented in any of numerous ways, as the disclosed concepts and embodiments are not limited to any one implementation. In addition, some aspects of the present disclosure may be used alone, or in any suitable combination with other aspects of the present disclosure.
With reference to the embodiment shown in Fig. 1, the address element resolution method for man-machine conversation is implemented by the following steps:
step 1: acquiring known address information from the background information and acquiring a reply text of a user;
step 2: analyzing an address from a reply text of a user to obtain a user address element;
step 3: combining the user address elements obtained by analysis with the known address information to obtain address elements corresponding to the address;
step 4: judging whether a preset condition is met:
if the preset conditions are met, ending the questioning link;
if the preset condition is not met, a question which needs to be asked to the user is constructed according to the unsatisfied condition, the address is analyzed and combined from the reply text of the question by the user according to the steps, and whether the preset condition is met or not is further judged until the preset condition is met;
step 5: updating the background address information with the current address information.
As an optional implementation manner, in step 1, the known address information is the address information that serves as background information in the current session, and its sources include:
(1) Background address information preset in a conversation scene;
(2) Basic address information of the user;
(3) Address information that the user has mentioned in the dialog.
As an optional implementation manner, in step 2, the address is parsed from the reply text of the user to obtain the user address element, which includes the following steps:
step 2.1: acquiring an address text to be processed from a reply text of a user;
step 2.2: segmenting an address text to be processed, and labeling a preliminary address role to obtain an address element list;
step 2.3: and combining each address element, removing invalid address elements, and adjusting the final address role of each address element to obtain an address element list.
As an optional implementation manner, in step 2.1, the address text to be processed is obtained from the reply text of the user, and the address text is obtained by using an entity recognition model based on sequence tagging, which includes the following steps:
firstly, performing word segmentation and part-of-speech tagging on a user reply text;
then, converting the word segmentation and part-of-speech tagging information of the text into the characteristics of the characters;
then, sending the characters and the character features into a conditional random field or a conditional random field model based on a neural network to obtain a label of each character;
and finally, analyzing the label sequence through a state machine to obtain the address text to be processed.
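The last sub-step, reading the label sequence with a state machine, can be sketched as follows. This is a minimal illustration that assumes the "B, I, E, S, O" labels described in this document and silently drops malformed sequences (for example an E without a preceding B); the function name is hypothetical and the tagger that produces the labels is not shown.

```python
from typing import List, Optional


def decode_addresses(text: str, labels: List[str]) -> List[str]:
    """Extract address texts from per-character B/I/E/S/O labels."""
    found: List[str] = []
    start: Optional[int] = None          # state: index of an open "B", if any
    for i, label in enumerate(labels):
        if label == "S":
            found.append(text[i])        # single-character address
            start = None
        elif label == "B":
            start = i                    # open a new address
        elif label == "E":
            if start is not None:
                found.append(text[start:i + 1])   # close the open address
            start = None
        elif label == "O":
            start = None                 # outside any address: reset
        # "I" leaves the state unchanged
    return found


# "我在南京市鼓楼区" ("I am in Gulou District, Nanjing City")
address_texts = decode_addresses(
    "我在南京市鼓楼区", ["O", "O", "B", "I", "I", "I", "I", "E"]
)
```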
As an optional implementation manner, the entity recognition model based on sequence labeling adopts a pre-trained model, and during model training, the characters of the address text in the labeled training text data are labeled according to the "B, I, E, S, O" scheme:
B represents the beginning character of the address element;
I represents a middle character of the address;
E represents the ending character of the address;
S represents an address with only one character;
O represents a character not belonging to any address.
Thus, the address text is converted into a sequence of character labels according to the "B, I, E, S, O" scheme;
in this labeling scheme, an address that has only one character is labeled S; in all other cases, the labels of an address start with B and end with E, with only I between B and E, so that the entity recognition model learns this label pattern during training and captures the dependency relationship between labels.
As an optional implementation manner, in step 2.3, combining each address element, removing invalid address elements, and further adjusting the roles of the address elements by using role transformation rules and global constraints to obtain an address element list;
the role conversion rule is a plurality of predefined rule templates, and when one part of the address elements meets the rule templates, the roles of the address elements are adjusted according to the corresponding processing modes in the rule templates;
the global constraint refers to a constraint rule for deleting and transforming the global role of the address element;
and after the role conversion rules and the global constraints have been executed, the roles of the address elements are combined and adjusted in sequence according to the address level, from high to low.
As an optional embodiment, in step 3, combining the analyzed user address element with the known address information to obtain an address element corresponding to the address, includes:
step 3.1: linking the address elements with addresses in a specific address element entity knowledge base to obtain a user address element list;
step 3.2: combining the known address information with the current user address element in the user address element list to obtain a complete address;
step 3.3: filtering and merging the adjacent address elements in sequence;
step 3.4: and for the address elements after filtering and merging, converting the corresponding words into standard names.
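As a rough illustration of step 3.4 only, conversion to standard names can be pictured as a lookup in an alias table derived from the knowledge base. The table contents and the function name below are hypothetical; the structure of the actual knowledge base is not described in this section.

```python
from typing import Dict, List

# Hypothetical alias table: surface form -> standard name.
STANDARD_NAME: Dict[str, str] = {
    "Jiangsu": "Jiangsu Province",
    "Nanjing": "Nanjing City",
    "Gulou": "Gulou District",
}


def standardize(elements: List[str]) -> List[str]:
    """Replace each element by its standard name; unknown forms are kept."""
    return [STANDARD_NAME.get(text, text) for text in elements]


standard = standardize(["Jiangsu", "Nanjing", "Gulou District"])
```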
As an optional implementation manner, in step 3.2, according to the difference of the known address information, the processing is performed in two cases:
(1) If the known address information is a single administrative division, completion is determined according to the following logic:
step 3.2.1.1: if the highest level of the current address information is higher than or equal to the lowest level of the known address information and corresponds to a single address element entity, directly covering the known address information;
step 3.2.1.2: if the highest level of the current address information is higher than or equal to the lowest level of the known address information and corresponds to a plurality of address element entities, performing merging and screening operation on the highest level address element entity of the current address information by using a first-level administrative division of the known address information, which is higher than the current address information, so as to obtain a merged and screened entity list; if the entity list of the combined screening is not empty, the result is used as a corresponding entity list, otherwise, the result is kept unchanged;
step 3.2.1.3: if the highest level of the current address information is smaller than the lowest level of the known address information, executing a combined screening operation by using entities corresponding to the known address information and the highest level of the current address information to obtain a combined screening entity list; and if the entity list of the combined screening is not empty, taking the result as a corresponding entity list, otherwise, keeping the result unchanged.
(2) If the known address information is the address obtained by analysis, the current address element is used to cover the address element of the existing known address information according to the level, and finally whether the reserved known address element conflicts with the user address element is checked, which comprises the following processes:
step 3.2.2.1: if the highest level of the current address element is smaller than the lowest level of the address element entity knowledge base, the address elements are considered to be compatible;
step 3.2.2.2: if the highest level of the current address element is less than or equal to the lowest level of the known address information, executing 'merging and screening' operation on the highest level of the 'known address information' and the 'current address element', and if the merging result is not null, enabling the two to be compatible;
step 3.2.2.3: if the highest level of the known address information is less than the lowest level of the current address element, executing 'merging and screening' operation on the highest level of the 'current address element' and the 'known address information', and if the merging result is not null, enabling the two to be compatible;
step 3.2.2.4: if not, merging fails, recording a merging failure mark, and only using the current address information.
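Case (2) can be pictured, in simplified form, as overlaying the current elements on the known ones level by level and then checking the retained known elements for conflicts. The sketch below collapses steps 3.2.2.1 to 3.2.2.3 into a single compatibility test (one entity must lie on the parent chain of the other), which is a deliberate simplification; the level names, the toy parent table and the function names are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

LEVELS = ["province", "city", "district"]      # assumed order, high to low
PARENT = {"Nanjing": "Jiangsu", "Suzhou": "Jiangsu", "Gulou": "Nanjing"}


def ancestors(entity: str) -> List[str]:
    chain: List[str] = []
    while entity in PARENT:
        entity = PARENT[entity]
        chain.append(entity)
    return chain


def on_same_chain(a: str, b: str) -> bool:
    """Simplified compatibility test standing in for merging and screening."""
    return a in ancestors(b) or b in ancestors(a)


def overlay_by_level(
    known: Dict[str, str],
    current: Dict[str, str],
    compatible: Callable[[str, str], bool] = on_same_chain,
) -> Tuple[Dict[str, str], bool]:
    """Return (address, merge_failed)."""
    # Current elements cover known elements of the same level.
    retained = {lvl: v for lvl, v in known.items() if lvl not in current}
    if not all(compatible(k, c) for k in retained.values() for c in current.values()):
        return dict(current), True             # failure: use current info only
    merged = {**retained, **current}
    return {lvl: merged[lvl] for lvl in LEVELS if lvl in merged}, False


ok_address, ok_failed = overlay_by_level(
    {"province": "Jiangsu", "city": "Nanjing"}, {"district": "Gulou"}
)
bad_address, bad_failed = overlay_by_level(
    {"province": "Jiangsu", "city": "Nanjing", "district": "Gulou"},
    {"city": "Suzhou"},
)
```

In the second call the retained district no longer fits under the new city, so the merge is flagged as failed and only the current information is kept, mirroring step 3.2.2.4.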
As an optional embodiment, the merging and screening operation includes:
a) Sequentially selecting each address element entity GPE_a corresponding to the high-level address element;
b) Sequentially selecting each address element entity GPE_b corresponding to the low-level address element;
c) If GPE_b is a lower-level address element entity of GPE_a, adding GPE_b to the address element entity merging list;
d) After the loop is finished, taking the address element entity merging list as the administrative division after the two address elements are merged.
As an optional embodiment, the aforementioned preset condition includes at least one of the following conditions:
(1) An address exists in the user dialog text;
(2) The user addresses are continuous;
(3) Requiring the user address to reach a particular address level;
(4) And if the required address level has an administrative division, the administrative division of the required address level is required to be uniquely determined.
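The four preset conditions can be checked mechanically once an address is represented by level. The following sketch assumes a fixed list of levels, a mapping from level to element text, and a mapping from level to the candidate knowledge-base entities; these representations, the condition labels and the reading of condition (3) as "the required level is present" are assumptions made for illustration.

```python
from typing import Dict, List

LEVELS = ["province", "city", "district", "street"]   # assumed, high to low


def unmet_conditions(
    address: Dict[str, str],
    candidates: Dict[str, List[str]],
    required_level: str,
) -> List[str]:
    """Return labels of the preset conditions that are not satisfied."""
    present = [i for i, lvl in enumerate(LEVELS) if lvl in address]
    if not present:
        return ["no_address"]                          # condition (1)
    unmet: List[str] = []
    if present[-1] - present[0] + 1 != len(present):
        unmet.append("discontinuous")                  # condition (2)
    if required_level not in address:
        unmet.append("level_missing")                  # condition (3)
    elif len(candidates.get(required_level, [])) > 1:
        unmet.append("ambiguous")                      # condition (4)
    return unmet


problems = unmet_conditions(
    {"province": "Jiangsu", "district": "Gulou"},
    {"district": ["Nanjing/Gulou", "Xuzhou/Gulou"]},
    "district",
)
```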
As an optional implementation, the construction of questions to be asked to the user according to the unsatisfied condition includes:
step 5.1: if no address element information is acquired, directly reporting that no address information was acquired;
step 5.2: if the information of the required level is not acquired, questioning is carried out according to the address level which is not acquired;
step 5.3: if the address is discontinuous, selecting a higher address level in the longest discontinuous segment for questioning;
step 5.4: and if the address has administrative division ambiguity, generating a question according to the number of the ambiguity.
The steps of the method according to the foregoing embodiment of the present invention will be further illustrated and described with reference to fig. 1 and 2.
In the example shown in fig. 1, the address element resolution method for human-computer conversation according to the present embodiment includes the following steps:
1. Acquire the known address information from the background information and acquire the reply text of the user.
2. Parse the address from the reply text of the user to obtain each address element.
This address parsing step includes four sub-steps: first, acquiring the address text to be processed from the reply text of the user; second, segmenting the address text to be processed and labeling preliminary address roles to obtain an address element list; third, identifying unrecognized address elements by using rules; and finally, combining the address elements, removing invalid address elements, and adjusting the final address role of each address element to obtain a list of address elements.
3. Combine the address elements obtained by parsing with the known address information to obtain the address elements corresponding to the address.
The address element combination step comprises four sub-steps: first, linking the address elements with addresses in a specific address element entity knowledge base to obtain an address element list; then combining the known address information and the address elements to obtain a complete address; then filtering and merging adjacent address elements in sequence; and finally converting the words corresponding to the address elements into standard names.
4. Judge whether a preset condition is met; if the condition is met, end the questioning.
5. If not, construct the required question according to the condition that is not met.
6. Update the background address information with the current address information.
Address element resolution is the process of splitting an address mentioned in a user session into a plurality of address elements according to roles and associating the address elements with administrative divisions established in an entity knowledge base.
Different levels of addresses exist according to the coverage range from large to small.
In some embodiments, different expressions exist at some subordinate address levels, and one expression at a level is an address role of an address element.
The address elements are composed of address roles, words or phrases corresponding to the roles, and corresponding address knowledge (if any) in an address element knowledge base.
A full address is composed of a number of different levels of address elements.
The address element role definition, as in table 1 below, is a way of address role division used in the examples of the present invention. It should be understood that the larger the value of the address level, the lower the address level.
TABLE 1 Address element role definition
[Table 1 is provided as images in the original publication: Figure BDA0003923251760000091 and Figure BDA0003923251760000101.]
Taking the complete address "XX province XX city XX district XX road XX number XX industrial park XX building X floor XX supermarket" as an example, one possible result of address element parsing is "province: 'XX province', city: 'XX city', district: 'XX district', road: 'XX road', road_no: 'XX number', poi: 'XX industrial park', house_no: 'XX building', floor: 'X floor', name: 'XX supermarket'", in which the three roles province, city, and district are associated with specific administrative divisions in the knowledge base.
Address element parsing for man-machine conversation must additionally combine background address information and judge whether the address elements meet the conditions.
Take the following dialog as an example:
System: "May I ask what your address is? Please be specific down to the residential community."
User: "It should be No. 18 Ninghai Road, Gulou District."
System: "Which city is the 'Gulou District' you mentioned in?"
User: "Nanjing, I think."
System: "OK, the address is No. 18 Ninghai Road, Gulou District, Nanjing City, Jiangsu Province."
In this dialog, the dialog system judges that the address given by the user is ambiguous and asks a follow-up question, then disambiguates the address according to the user's answer and confirms that the answer is valid. The above dialog involves executing the method twice.
To achieve the above object, the address element analysis method proposed in this embodiment includes the following 5 steps.
Step 1: Acquire the known address information from the background information and acquire the reply text of the user.
The known address information is address information as background information in the current session, and in the embodiment of the present invention, the following three forms are included:
The first is background address information preset in the dialog scenario. For example, for a local service provided by Shanghai City, the current address may be assumed to be located in Shanghai City, or assumed to be located in China. The preset background address should be a known administrative division entity;
The second is the basic address information of the user. For example, the IP address of the user may be converted, based on an IP address database, into a rough administrative division entity such as the city where the user is located; with the user's authorization, geographic location information such as the wireless network, operator base station, or satellite positioning may likewise be converted into a known administrative division entity;
The third is address information mentioned by the user in the conversation; these addresses are parsed and stored in structured form in the conversation context.
It should be understood that this third form of address information exists only when the user's previous answer failed to meet the requirements.
When there is no address information mentioned in the dialog, the background address information preset by the dialog scene or the basic address information of the user is used as the known address information.
The administrative division entity (GPE) is a node in the hierarchical administrative division. Each administrative division entity includes at least a division name, the level it belongs to, and its hierarchical relationship information, together with auxiliary information such as aliases, former names, and former affiliations of the administrative division. Additional information such as location and outline may also be included. All the administrative division entities together form the administrative division knowledge base.
The administrative division knowledge base must specify the address levels it uses, so as to meet the requirements of the region of use.
Taking China as an example, administrative divisions can be divided into five levels: country; province (autonomous region, municipality directly under the central government, special administrative region); city (prefecture-level city, prefecture, autonomous prefecture, league); district or county (municipal district, county-level city, county, banner); and township or street (township, town, subdistrict, ethnic township, sumu). Each administrative division entity in the knowledge base must belong to a definite level and be traceable upward to the highest level.
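As an illustration of the entity description above, a GPE node and a knowledge base could be sketched as follows. This is a minimal sketch, not the data model of the invention: the class names `GPE` and `KnowledgeBase`, the field names, and the level numbers 0 to 3 are assumptions made for the example (the only convention taken from the text is that a larger level value means a lower level).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GPE:
    """One node of the hierarchical administrative division (hypothetical layout)."""
    name: str                        # division name
    level: int                       # larger value = lower level (numbers assumed)
    parent: Optional["GPE"] = None   # hierarchical relationship information
    aliases: List[str] = field(default_factory=list)        # auxiliary information
    former_names: List[str] = field(default_factory=list)   # auxiliary information

    def path(self) -> List[str]:
        """Names from the highest level down to this entity."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


class KnowledgeBase:
    """All entities together, indexed by division name."""

    def __init__(self) -> None:
        self.by_name: Dict[str, List[GPE]] = {}

    def add(self, gpe: GPE) -> GPE:
        self.by_name.setdefault(gpe.name, []).append(gpe)
        return gpe


kb = KnowledgeBase()
china = kb.add(GPE("China", 0))
jiangsu = kb.add(GPE("Jiangsu Province", 1, china))
nanjing = kb.add(GPE("Nanjing City", 2, jiangsu))
gulou = kb.add(GPE("Gulou District", 3, nanjing))
print(gulou.path())
```

Storing a parent reference on each node is one way to satisfy the requirement that every entity can be traced upward to the highest level.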
The reply text of the user is a text character string input by a keyboard of the user, a text character string input by handwriting or a text character string input by a voice recognition mode.
The information in the user reply should contain effective address information, and the man-machine conversation system can judge whether the information contains the effective address information or not by the method of the invention.
Step 2: Parse the address from the reply text of the user to obtain each user address element.
This step parses an address of a form such as "XX district XX road XX number XX industrial park XX building XX supermarket", splits out the words or phrases belonging to each address element, and determines their address roles.
The address element contains an address role and a word or phrase representing that type of address.
As an optional implementation manner, in an embodiment of the present invention, a process of parsing a reply text to obtain a user address element includes:
step 2.1: acquiring an address text to be processed from a reply text of a user;
step 2.2: segmenting an address text to be processed, and labeling a preliminary address role to obtain an address element list;
step 2.3: and combining each address element, removing invalid address elements, and adjusting the final address role of each address element to obtain an address element list.
We further describe below with reference to specific examples.
Step 2.1: Acquire the address text to be processed from the reply text of the user.
The user's reply text may contain the address together with other non-address characters, so the address text needs to be extracted from it. For example, the address string in "It should be No. 18 Ninghai Road, Gulou District" is "No. 18 Ninghai Road, Gulou District".
In the embodiment of the invention, the address to be processed is obtained from the reply text of the user by adopting an entity recognition model based on sequence labeling, wherein the model is a pre-trained entity recognition model.
An alternative solution for identifying addresses using a sequence tagging model includes:
firstly, performing word segmentation and part-of-speech tagging on a user reply text;
then, converting the word segmentation and part-of-speech tagging information of the text into the characteristics of the characters;
then, sending the characters and character features into a conditional random field or a conditional random field model based on a neural network to obtain a label of each character;
and finally, analyzing the label sequence through a state machine to obtain the address text to be processed.
In the sequence labeling model, the characters of the address can be labeled in the modes of "B, I, E, S, O".
Where B denotes the beginning character of the address element, I denotes the middle character of the address, E denotes the ending character of the address, S denotes an address with only one character, and O denotes a character not belonging to any address.
By this labeling mode, addresses are converted into a sequence of character tags.
For example, "It should be No. 18 Ninghai Road, Gulou District" (in the original Chinese, 应该是鼓楼区宁海路18号) may be labeled as shown in Table 2.
TABLE 2 Address extraction labeling example
Character: 应 该 是 鼓 楼 区 宁 海 路 1 8 号
Label:     O O O B I I I I I I I E
Thus, except for a single-character address, which is labeled S, the labels of an address must begin with B and end with E, with only I allowed between B and E. The possible label patterns are therefore S, BE, BIE, BIIE, BIIIE, and so on.
The sequence labeling model learns these patterns, that is, the dependencies between the labels. By using these dependencies when predicting the labels for the user's reply text, the model avoids invalid sequences such as an E label followed by an I label.
The entity recognition model may be trained using artificially labeled text data having the same format as the labeled data of table 2.
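The final sub-step of step 2.1, analyzing the label sequence through a state machine, could look roughly like the sketch below. It assumes the labels have already been predicted; the function name `extract_addresses` and the choice to discard ill-formed runs are assumptions of this example, not details given in the text.

```python
from typing import List, Optional


def extract_addresses(chars: List[str], tags: List[str]) -> List[str]:
    """Collect address strings from per-character B/I/E/S/O labels."""
    spans: List[str] = []
    current: Optional[List[str]] = None   # None means "outside an address"
    for ch, tag in zip(chars, tags):
        if tag == "S":                     # single-character address
            spans.append(ch)
            current = None
        elif tag == "B":                   # start a new address
            current = [ch]
        elif tag == "I" and current is not None:
            current.append(ch)
        elif tag == "E" and current is not None:
            current.append(ch)
            spans.append("".join(current))
            current = None
        else:                              # "O" or an ill-formed label: reset
            current = None
    return spans


chars = list("应该是鼓楼区宁海路18号")
tags = "O O O B I I I I I I I E".split()
print(extract_addresses(chars, tags))
```

Because a run is only emitted when an E label closes it, a sequence that starts with B but never reaches E contributes nothing to the result.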
Step 2.2: Segment the address text to be processed and label preliminary address roles to obtain an address element list.
The input of the step is a to-be-processed address text, the output of the step is an address element list, and each address element is obtained by segmenting the to-be-processed address text and comprises a preliminary role.
In the example of the present invention, an address element parsing model is used to segment and label preliminary address elements. The address element analysis model may adopt a pre-training model.
The tags identified by the model include the following fields: country, province, city, county, development area, town street, community, village group, road name, road house number, interest site, building name, unit number, floor number, room number.
As an alternative example, the address element analysis model may use a sequence annotation model, or may predict the start position and the end position of the segment corresponding to each role.
In this example, the sequence annotation model is still taken as an example, and is called as an address element resolution sequence annotation model.
When a sequence annotation model is used, the implementation steps are similar to step 2.1. And labeling a corresponding label for each character or word segment in the address text to be processed.
Take the address "Jiangsu, Nanjing, Gulou District, Shanxi Road, No. 67, World Trade Center, Tower A, 1202" followed by a shop name as an example. Labeled character by character, the label sequence is "B-province, E-province, B-city, E-city, B-district, I-district, E-district, B-road, I-road, E-road, B-road_no, I-road_no, E-road_no, B-poi, I-poi, E-poi, B-house_no, E-house_no, B-room_no, I-room_no, I-room_no, E-room_no, B-name, I-name, E-name". Alternatively, the text is first split into the segment sequence "Jiang/su/Nan/jing/Gu/lou/District/Shan/xi/Road/67/No./World/Trade/Center/A/Tower/1202/" followed by the segments of the shop name, and the corresponding labels are "B-province, E-province, B-city, E-city, B-district, I-district, E-district, B-road, I-road, E-road, B-road_no, E-road_no, B-poi, I-poi, E-poi, B-house_no, E-house_no, S-room_no, B-name, I-name, E-name".
In each label, the part before the hyphen "-" is one of "B, I, E, S, O", with the same meaning as in step 2.1; the part after the hyphen is the label of the corresponding role and is called the label suffix in this step.
After the address element parsing sequence labeling model predicts the label sequence for the text to be processed, the label of each character or segment is obtained through a decoding step.
A run of characters that satisfies the label rule (S alone, or beginning with B and ending with E with only I in between) and shares the same label suffix then forms the string of one address element, and the label suffix gives the role of that address element.
Similar to the model training of 2.1, the address element parsing sequence labeling model can be obtained by training with artificially labeled text data having the same format as the labeled data in this step.
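The grouping of labeled segments into address elements described in step 2.2 can be sketched as follows. The function name `decode_elements`, the tuple layout `(role, text)`, and the policy of dropping runs that break the label rule are assumptions made for this illustration.

```python
from typing import List, Optional, Tuple


def decode_elements(segments: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group segments into (role, text) address elements using prefix-suffix labels."""
    elements: List[Tuple[str, str]] = []
    buf: List[str] = []
    role: Optional[str] = None
    for seg, label in zip(segments, labels):
        prefix, _, suffix = label.partition("-")   # "B-city" -> ("B", "-", "city")
        if prefix == "S":
            elements.append((suffix, seg))
            buf, role = [], None
        elif prefix == "B":
            buf, role = [seg], suffix
        elif prefix in ("I", "E") and role == suffix:
            buf.append(seg)
            if prefix == "E":                      # run is complete
                elements.append((suffix, "".join(buf)))
                buf, role = [], None
        else:                                      # "O" or a rule violation: reset
            buf, role = [], None
    return elements


segments = ["江", "苏", "南", "京", "A", "座", "1202"]
labels = ["B-province", "E-province", "B-city", "E-city",
          "B-house_no", "E-house_no", "S-room_no"]
print(decode_elements(segments, labels))
```

The check `role == suffix` enforces that all labels in one run share the same label suffix, so a run such as "B-city, E-road" is discarded rather than merged.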
Step 2.3: Combine the address elements, remove invalid address elements, and adjust the final address role of each address element to obtain a list of address elements.
This step further adjusts the roles of the address elements by using role conversion rules and global constraints.
A role conversion rule is one of a series of predefined rule templates. Once part of the address matches the "matching rule" of a template, it is processed as specified in the template's "processing" entry. The following table lists some examples of rule templates:
Matching rule                                                  Processing
[road 1/road][and][road 2/road][intersection]                  'sub_road': 'road 2'
[point of interest 1/poi][point of interest 2/poi]             'sub_poi': 'point of interest 2'
[point of interest 1/poi][near][point of interest 2/poi]       'sub_poi': 'point of interest 2'
For example, under the rule template "[road 1/road][and][road 2/road][intersection]", if consecutive words and the roles of their address elements match the template, as in "Boai Road/and/Taiping South Road/intersection", the role of "road 2" is changed to "sub_road".
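One possible way to encode such rule templates is as a pattern over (text, role) tokens plus the position to re-label. The sketch below is only illustrative: the `RULES` structure, the use of `None` as the role of plain connecting words, and the English connecting words themselves are assumptions.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, Optional[str]]   # (text, role); role is None for plain words

# Each rule: pattern of (kind, value), index of the token to re-label, new role.
RULES = [
    ([("role", "road"), ("word", "and"), ("role", "road"), ("word", "intersection")], 2, "sub_road"),
    ([("role", "poi"), ("role", "poi")], 1, "sub_poi"),
    ([("role", "poi"), ("word", "near"), ("role", "poi")], 2, "sub_poi"),
]


def _matches(token: Token, kind: str, value: str) -> bool:
    text, role = token
    if kind == "role":
        return role == value
    return role is None and text == value


def apply_rules(tokens: List[Token]) -> List[Token]:
    """Re-label tokens wherever a rule pattern matches a consecutive window."""
    out = list(tokens)
    for pattern, target, new_role in RULES:
        for start in range(len(out) - len(pattern) + 1):
            window = out[start:start + len(pattern)]
            if all(_matches(t, k, v) for t, (k, v) in zip(window, pattern)):
                text, _ = out[start + target]
                out[start + target] = (text, new_role)
    return out


tokens = [("road 1", "road"), ("and", None), ("road 2", "road"), ("intersection", None)]
print(apply_rules(tokens))
```

Keeping the rules as data rather than code makes it straightforward to add further templates without touching the matching loop.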
The global constraint is a constraint rule for deleting and transforming the global role of the address elements;
and after the role conversion rule and the global constraint are finally executed and converted, sequentially combining and adjusting the roles of the address elements according to the address level from high to low.
In an alternative example, the rules of the global constraint include the following:
1. For address elements at administrative division levels that are not constrained by a template, only the last one at each level is retained.
For example, in an address in which "Zhejiang Province" and "Hangzhou City" each appear twice before "West Lake District" because of repeated address matching or duplication, only one occurrence each of "Zhejiang Province" and "Hangzhou City" is retained after processing. This step ensures that the resulting address information contains no invalid information;
2. Only the first "point of interest" element keeps the role "point of interest"; the other such elements are adjusted to "sub point of interest". If a "point of interest" element follows a "floor number", "unit number", or "room number" element, its role becomes "shop name".
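The two global constraints can be sketched as a pass over the ordered element list. This is one reading of the rules, not a definitive implementation: the role names `town` and `unit_no` are hypothetical, and the order of precedence (a point of interest after an indoor element becomes a shop name before the first-occurrence rule is applied) is an assumption.

```python
from typing import List, Tuple

Element = Tuple[str, str]   # (role, text)
ADMIN_ROLES = {"country", "province", "city", "district", "town"}   # "town" assumed
INDOOR_ROLES = {"floor", "unit_no", "room_no"}                      # "unit_no" assumed


def apply_global_constraints(elements: List[Element]) -> List[Element]:
    # Rule 1: keep only the last element of each administrative role.
    last_index = {role: i for i, (role, _) in enumerate(elements) if role in ADMIN_ROLES}
    kept = [e for i, e in enumerate(elements)
            if e[0] not in ADMIN_ROLES or last_index[e[0]] == i]

    # Rule 2: re-label points of interest.
    result: List[Element] = []
    seen_poi = seen_indoor = False
    for role, text in kept:
        if role in INDOOR_ROLES:
            seen_indoor = True
        elif role == "poi":
            if seen_indoor:
                role = "name"        # shop name
            elif seen_poi:
                role = "sub_poi"
            else:
                seen_poi = True
        result.append((role, text))
    return result


duplicated = [("province", "Zhejiang Province"), ("city", "Hangzhou City"),
              ("province", "Zhejiang Province"), ("city", "Hangzhou City"),
              ("district", "West Lake District")]
print(apply_global_constraints(duplicated))
```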
Step 3: Combine the address element list obtained by parsing with the known address information to obtain the address element list.
In the step, the information of the address element list is complemented by using the known address information, and the address element is associated to the address element entity knowledge base, so that the address is matched with the knowledge base as much as possible, and a basis is provided for judging whether the address information meets the requirement.
As an alternative embodiment, the combination of address elements specifically comprises the following steps:
step 3.1: linking the address elements with addresses in a specific address element entity knowledge base to obtain a user address element list;
step 3.2: combining the known address information with the current user address element in the user address element list to obtain a complete address;
step 3.3: filtering and merging the adjacent address elements in sequence;
step 3.4: and for the address elements after filtering and merging, converting the corresponding words into standard names.
In the following, we will further explain with specific examples.
Step 3.1: Link the address elements with addresses in a specific address element entity knowledge base to obtain an address element list.
This step further normalizes the address data using the address element entity knowledge base.
Step 3.1.1: For each address element in the address element list whose level is not lower than the lowest level of the address element entity knowledge base, query the knowledge base to obtain the address element entity list corresponding to that address element.
The address levels covered by the address element entity knowledge base include at least those of the administrative division knowledge base; for example, data such as roads may be added on top of the administrative division knowledge base. When the administrative division knowledge base is used as the address element entity knowledge base, one word may correspond to multiple address element entities (hereinafter "entities"). For example, the word "Gulou District" may correspond to "China, Jiangsu Province, Nanjing City, Gulou District", "China, Jiangsu Province, Xuzhou City, Gulou District", "China, Henan Province, Kaifeng City, Gulou District", and "China, Fujian Province, Fuzhou City, Gulou District", which together constitute an address element entity list (hereinafter "entity list").
Step 3.1.2: Match address element entities by aliases.
If an entity contains auxiliary information such as aliases, former names, and former affiliations of the administrative division, an index from these words to the entity can be built from the aliases and former names. Querying this index yields the entity list corresponding to the address element.
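Steps 3.1.1 and 3.1.2 amount to looking a word up in an index keyed by standard names, aliases, and former names. The sketch below uses a small toy data set; the `ENTITIES` layout, the function names, and the specific alias entries are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Path = Tuple[str, ...]   # an entity written as its path from the highest level down

# (entity path, aliases and former names); toy data for illustration only.
ENTITIES: List[Tuple[Path, List[str]]] = [
    (("China", "Jiangsu Province", "Nanjing City"), ["Nanjing"]),
    (("China", "Jiangsu Province", "Nanjing City", "Gulou District"), []),
    (("China", "Jiangsu Province", "Xuzhou City", "Gulou District"), []),
    (("China", "Shandong Province", "Jinan City", "Laiwu District"), ["Laicheng District"]),
]


def build_index(entities: List[Tuple[Path, List[str]]]) -> Dict[str, List[Path]]:
    """Index every entity under its standard name and each alias or former name."""
    index: Dict[str, List[Path]] = defaultdict(list)
    for path, other_names in entities:
        for key in [path[-1], *other_names]:
            index[key].append(path)
    return index


def link(word: str, index: Dict[str, List[Path]]) -> List[Path]:
    """Return the entity list for one address element word (empty if unknown)."""
    return list(index.get(word, []))


INDEX = build_index(ENTITIES)
print(link("Gulou District", INDEX))
```

A word that maps to more than one path, such as "Gulou District" here, is exactly the ambiguous case that the later merging and screening step is meant to resolve.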
Step 3.1.3: Match the address elements that are still unmatched by using an approximate matching index.
Approximate matching uses pronunciation similarity or glyph similarity: for each word in the address elements, it tries to match a pronunciation index or a glyph index to find similar words. All address element entities that can be matched are collected into an entity list.
As one embodiment of pronunciation-based approximate matching, the address element text is first converted into a pinyin string, and each address element in the address index is built into a state machine, as shown in Fig. 3, with toneless pinyin syllables as states; the pinyin string of the address element is then matched against the states of the state machine in sequence, and if the matching succeeds, the address element is recognized.
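The pronunciation-based matching can be illustrated with a trie whose transitions are toneless pinyin syllables. This is a simplified stand-in for the state machine of Fig. 3, not a reproduction of it: the tiny `PINYIN` table is a toy assumption (a real system would use a full character-to-pinyin dictionary), and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Toy character-to-toneless-pinyin table; assumed data for illustration only.
PINYIN = {"南": "nan", "京": "jing", "景": "jing", "鼓": "gu", "古": "gu", "楼": "lou"}


def to_pinyin(text: str) -> Optional[List[str]]:
    """Map each character to toneless pinyin; None if any character is unknown."""
    syllables = [PINYIN.get(ch) for ch in text]
    return None if None in syllables else syllables


def build_machine(names: List[str]) -> Dict:
    """Build a trie over pinyin syllables; the key '$' lists names accepted at a state."""
    root: Dict = {}
    for name in names:
        syllables = to_pinyin(name)
        if not syllables:          # skip names that cannot be converted
            continue
        state = root
        for syllable in syllables:
            state = state.setdefault(syllable, {})
        state.setdefault("$", []).append(name)
    return root


def match(text: str, machine: Dict) -> List[str]:
    """Walk the machine with the text's pinyin string; return the names it reaches."""
    syllables = to_pinyin(text)
    if not syllables:
        return []
    state = machine
    for syllable in syllables:
        if syllable not in state:
            return []
        state = state[syllable]
    return state.get("$", [])


machine = build_machine(["南京", "鼓楼"])
print(match("南景", machine))   # same pronunciation as 南京, different character
```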
After steps 3.1.1 to 3.1.3, the corresponding address element list is finally obtained, in which each address element whose level is not lower than the lowest level of the address element entity knowledge base corresponds to an address element entity list.
Step 201 of Fig. 2 shows the parsed result of the user dialog text "Nanjing".
Step 3.2: Combine the known address information and the current user address elements to obtain the complete address.
This part is divided into two cases according to the known address information.
(1) If the known address information is a single administrative division, completion is performed according to the following logic:
step 3.2.1.1: if the highest level of the current address information is higher than or equal to the lowest level of the known address information and corresponds to a single address element entity, directly covering the known address information;
step 3.2.1.2: if the highest level of the current address information is higher than or equal to the lowest level of the known address information and corresponds to a plurality of address element entities, performing the merging and screening operation between the administrative division of the known address information that is one level higher than the current address information and the highest-level address element entities of the current address information, to obtain a merged and screened entity list; if this entity list is not empty, taking it as the corresponding entity list; otherwise, keeping the state unchanged;
step 3.2.1.3: if the highest level of the current address information is smaller than the lowest level of the known address information, trying to execute a combined screening operation by using entities corresponding to the known address information and the highest level of the current address information to obtain a combined screened entity list; and if the entity list of the combined screening is not empty, taking the result as a corresponding entity list. Otherwise, keeping the state unchanged;
(2) If the known address information is an address obtained through analysis, the current address element is used for covering the address element of the existing known address information according to the level, and finally whether the reserved known address element conflicts with the user address element is checked, wherein the method comprises the following steps:
step 3.2.2.1: if the highest level of the current address element is smaller than the lowest level of the address element entity knowledge base, the address elements are considered to be compatible;
step 3.2.2.2: if the highest level of the current address element is less than or equal to the lowest level of the known address information, executing 'merging and screening' operation on the highest level of the 'known address information' and the 'current address element', and if the merging result is not null, enabling the two to be compatible;
step 3.2.2.3: if the highest level of the known address information is less than the lowest level of the current address element, executing a merging and screening operation on the highest level of the current address element and the known address information, and if the merging result is not empty, enabling the current address element and the known address information to be compatible;
step 3.2.2.4: if not, merging fails; recording the merging failure mark and only using the current address information.
The above steps involve the "merging and screening" operation. An exemplary merging and screening process includes:
a) Sequentially selecting the address element entity GPE_a corresponding to the high-level address element;
b) Sequentially selecting the address element entity GPE_b corresponding to the low-level address element;
c) If GPE_b is a lower address element entity of GPE_a, adding GPE_b to the address element entity merging list;
d) And after circulation is finished, taking the address element entity merging list as an administrative division after two address elements are merged.
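Steps a) to d) describe a nested loop over the two candidate lists. A minimal sketch follows, assuming each entity is represented by its path from the highest level down; the names `merge_screen` and `is_lower_entity` and the de-duplication of results are assumptions of this example.

```python
from typing import List, Tuple

Path = Tuple[str, ...]   # an entity written as its path from the highest level down


def is_lower_entity(low: Path, high: Path) -> bool:
    """True if `low` lies strictly below `high` in the hierarchy."""
    return len(low) > len(high) and low[:len(high)] == high


def merge_screen(high_entities: List[Path], low_entities: List[Path]) -> List[Path]:
    merged: List[Path] = []
    for gpe_a in high_entities:          # a) each higher-level candidate
        for gpe_b in low_entities:       # b) each lower-level candidate
            if is_lower_entity(gpe_b, gpe_a) and gpe_b not in merged:   # c)
                merged.append(gpe_b)
    return merged                        # d) the merged entity list


nanjing = [("China", "Jiangsu Province", "Nanjing City")]
gulou = [
    ("China", "Jiangsu Province", "Nanjing City", "Gulou District"),
    ("China", "Jiangsu Province", "Xuzhou City", "Gulou District"),
    ("China", "Henan Province", "Kaifeng City", "Gulou District"),
    ("China", "Fujian Province", "Fuzhou City", "Gulou District"),
]
print(merge_screen(nanjing, gulou))
```

An empty result signals that the two address elements are incompatible, which is the condition the surrounding steps test for.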
The above logic is illustrated by the following examples.
When the known address information is a single administrative division, there are three cases:
For example, the known address information is "China, Jiangsu Province, Nanjing City" and the address text to be processed is "Fuzhou City, Gulou District, xx Road, xx Commercial Building". According to step 3.2.1.1, "China, Fujian Province, Fuzhou City" is at the same level as "China, Jiangsu Province, Nanjing City", so the known address information is ignored.
For example, the known address information is "Jiangsu Province, Nanjing City, Jiangning District" and the address text to be processed is "Qixia District xx Road xx Commercial Building". The highest level of the current address information is "district", and "Qixia District" corresponds to more than one entity. According to step 3.2.1.2, the entity of the known address information at the level above "district", namely "China, Jiangsu Province, Nanjing City", is used to perform the "merging and screening" operation on the entity list corresponding to "Qixia District". The result is a merged entity list containing only "China, Jiangsu Province, Nanjing City, Qixia District". Since the list is not empty, the merged entity list serves as the entity list corresponding to "Qixia District".
For example, the known address information is "China, Jiangsu Province, Nanjing City" and the address text to be processed is "Qixia District xx Street xx Road xx Community". Since there is more than one "Qixia District" in the address element entity knowledge base, the "merging and screening" operation is performed according to step 3.2.1.3, and the resulting entity "China, Jiangsu Province, Nanjing City, Qixia District" is taken as the address element entity corresponding to the address element "district: 'Qixia District'". As a further example, if the address text to be processed is "Wujin District xx Street xx Community" and there is only one "Wujin District" in the address element entity knowledge base, the "merging and screening" operation performed according to step 3.2.1.3 yields an empty result, and the original address element entity "Jiangsu Province, Changzhou City, Wujin District" is still used.
The other case is that the known address information comes from the user's earlier text. For example, in an "information supplement" scenario, the known address information is "district: 'Gulou District', road: 'Ninghai Road', road_no: 'No. 18'" and the user address element is "city: 'Nanjing'", whose address element entity list is "Jiangsu Province, Nanjing City". Its lowest level is 2, which under rule 3.2.2 is higher than the highest level, 3, of the known address information, so the corresponding slot of the known address information is filled with "Nanjing" and its address element entity information. The result is "city: 'Nanjing', district: 'Gulou District', road: 'Ninghai Road', road_no: 'No. 18'", as shown in step 202 of Fig. 2.
Step 3.3: Filter and merge adjacent address elements in sequence.
Higher-level address elements are less ambiguous and can be used to filter the administrative divisions of lower-level address elements. This step therefore disambiguates the address element entities while integrating the entity data corresponding to the address elements.
In this step, the "filtering and merging" operation is identical to the "merging and screening" operation in step 3.2.1.
With reference to step 203 of Fig. 2, take "city: 'Nanjing', district: 'Gulou District', road: 'Ninghai Road', road_no: 'No. 18'" as an example. In the entity list of the address element "Gulou District", only "China, Jiangsu Province, Nanjing City, Gulou District" is a lower-level address element entity of "China, Jiangsu Province, Nanjing City". After this step, therefore, the entity list corresponding to "Gulou District" contains only the one address element entity "China, Jiangsu Province, Nanjing City, Gulou District".
Step 3.4: Convert the words corresponding to the address elements into standard names.
In this step, the matched and unique address element entity is used to convert misspelled words or aliases in the address into the standard name in the address element entity, and the missing upper levels are completed according to the upper-level information of the address element entity.
For example, the user address elements are "city: 'Laiwu', district: 'Laicheng District'". In January 2019, Shandong Province adjusted its administrative divisions: the prefecture-level Laiwu City was abolished and the area under its jurisdiction was placed under Jinan City. By synonym matching, "Laicheng District" corresponds to "China, Shandong Province, Jinan City, Laiwu District". Accordingly, on output, "city: 'Laiwu'" and "district: 'Laicheng District'" in the original address are converted and the upper levels are filled in, giving "country: 'China', province: 'Shandong Province', city: 'Jinan City', district: 'Laiwu District'".
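The conversion and upper-level completion of step 3.4 can be sketched as overwriting the administrative roles with the names on the matched entity's path. The mapping of path depth to role in `LEVEL_ROLES` and the function name `standardize` are assumptions of this example.

```python
from typing import Dict, Tuple

# Assumed role for each depth of an entity path, from the highest level down.
LEVEL_ROLES = ("country", "province", "city", "district")


def standardize(elements: Dict[str, str], entity_path: Tuple[str, ...]) -> Dict[str, str]:
    """Replace administrative roles with standard names and fill in upper levels."""
    standard = dict(zip(LEVEL_ROLES, entity_path))
    # Keep every element whose role is not supplied by the matched entity.
    rest = {role: text for role, text in elements.items() if role not in standard}
    return {**standard, **rest}


user_elements = {"city": "Laiwu", "district": "Laicheng District", "road": "XX Road"}
matched = ("China", "Shandong Province", "Jinan City", "Laiwu District")
print(standardize(user_elements, matched))
```

The step only applies when the matched entity is unique; with several candidates remaining there is no single path to copy the standard names from.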
Step 4: Judge whether the acquired address element list meets the preset conditions.
As alternative examples, the preset conditions include four cases of 4.1 to 4.4.
4.1: the address is present in the dialog text of the user.
If the address element list is empty, the address resolution is considered to be failed, otherwise, the address resolution is successful.
4.2: the addresses of the users are more continuous.
After reasoning, the full address should correspond to an administrative division, under which the continuous blank address level does not exceed a threshold.
In the exemplary role hierarchy, when the role of address level 6 is "road", the consecutive blank address levels should be less than 3 levels. At other times, the consecutive blank address levels should be less than level 2.
For example: the role of address level 6 in "('city', 'Nanjing City', 'district', 'Drum zone', 'road', 'Ninghai road', 'road _ no', 'No. 18'," is "road", and levels 4 and 5 are absent from the maximum level to the minimum level, and the condition is satisfied.
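The continuity condition can be checked by finding the longest run of missing levels between the highest and lowest levels present. In the sketch below, the level numbers 2, 3, and 6 follow the examples in the text, while level 7 for the road number and the function names are assumptions.

```python
from typing import Dict, Iterable


def longest_blank_run(levels_present: Iterable[int]) -> int:
    """Longest run of missing levels between the highest and lowest present level."""
    levels = sorted(levels_present)
    return max((b - a - 1 for a, b in zip(levels, levels[1:])), default=0)


def is_continuous(roles_by_level: Dict[int, str]) -> bool:
    """Condition 4.2: the limit is 3 when level 6 holds a road, otherwise 2."""
    limit = 3 if roles_by_level.get(6) == "road" else 2
    return longest_blank_run(roles_by_level) < limit


example = {2: "city", 3: "district", 6: "road", 7: "road_no"}
print(longest_blank_run(example), is_continuous(example))
```

In the example, levels 4 and 5 are missing, giving a run of 2, which is below the limit of 3 that applies because level 6 is a road.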
4.3: The address is required to reach a specific address level.
The lowest address level in the address element list must not be higher than the address level preset in the question, and the address level the question asks about must not be empty.
For example, when the question asks for the district or county, the address 'province': 'Jiangsu Province', 'city': 'Nanjing City' has an empty address level 3 and does not meet the requirement.
4.4: If the required address level falls within the administrative divisions, the administrative division at the required level must be uniquely determined.
For example, when a specific address is requested, if the address elements at that level and the levels above it still correspond to multiple administrative division entries in the knowledge base, the address is considered ambiguous and does not meet the requirement.
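Conditions 4.3 and 4.4 can be sketched as two small predicates. The representation is an assumption of this sketch: the address maps level numbers to values, a larger number means a lower (finer) level, and levels 0 to 3 (`ADMIN_LEVELS`) are taken to be the administrative divisions. Function names are hypothetical.

```python
ADMIN_LEVELS = {0, 1, 2, 3}  # assumption: country, province, city, district


def reaches_level(address, required_level):
    """Condition 4.3: the address goes at least as deep as the required
    level, and the required level itself is filled in."""
    if not address:
        return False
    deepest = max(address)  # largest level number = lowest address level
    return deepest >= required_level and address.get(required_level) is not None


def division_is_unique(required_level, candidate_divisions):
    """Condition 4.4: when the required level is an administrative division,
    exactly one knowledge-base candidate may remain."""
    if required_level not in ADMIN_LEVELS:
        return True
    return len(candidate_divisions) == 1


partial = {1: "Jiangsu Province", 2: "Nanjing City"}
asks_district = reaches_level(partial, 3)
```

Here `asks_district` is false because level 3 is empty, which matches the Jiangsu example in the text.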
Step 5: Construct the follow-up question to ask the user according to which preset condition is not met.
The unmet conditions mainly fall into four cases: no address element information acquired, information at the required level not acquired, a large discontinuity in the address, and ambiguity in the administrative division.
Step 5.1: If no address element information is acquired, directly report that no address information was obtained;
Step 5.2: If the information at the required level is not acquired, ask a question about the address level that is missing;
In practice, a question may be prepared for each address level. For example: "Could you tell me the exact house number, residential community, park name, or landmark?"
Step 5.3: If the address has a large discontinuity, select the higher address level in the longest discontinuous segment and ask about it;
Step 5.4: If the address has administrative division ambiguity, generate a question according to the number of ambiguous candidates.
If there are two candidate administrative divisions, the question is generated at the highest level at which the two divisions differ.
If there are more than two candidate administrative divisions, a question in single-choice, multiple-choice, or question-and-answer form may be generated, where generating a question in question-and-answer form comprises the following process:
5.4.1: Traverse the address levels of each entity in the ambiguous address element entity list from the highest level to the lowest;
5.4.2: Obtain the address element of each entity at the current level, and count the number of distinct ones;
5.4.3: If the number of distinct elements equals the number of entities in the ambiguous address element entity list, select this level as the question level; otherwise, move to the next level and return to 5.4.1.
Taking 'district': 'Gulou District' as an example again, there are 4 'Gulou District' entries in the ambiguous address element entity list. Following 5.4.1, traversal starts from the 'country' level, where the only distinct entity is 'China', so traversal continues to the next level. At the 'province' level there are 3 distinct entities, namely 'China, Fujian Province', 'China, Jiangsu Province', and 'China, Henan Province', which is fewer than the number of entities in the ambiguous address element entity list, so traversal continues to the next level. At the 'prefecture-level city' level there are 4 distinct entities, so the question level is the 'prefecture-level city' level. Finally, the reply is: "Which city does the 'Gulou District' you mentioned belong to?"
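Steps 5.4.1 to 5.4.3 can be sketched as a scan for the first level at which every candidate becomes distinct. This is a minimal sketch: the function names and the dictionary layout are hypothetical, and the four candidate paths are filled in from the provinces named in the example (the specific cities are an assumption of this sketch, not stated in the text).

```python
LEVELS = ["country", "province", "city", "district"]


def pick_question_level(candidates):
    """Return the highest level at which all candidates differ, or None."""
    for depth, level in enumerate(LEVELS, start=1):
        # Compare full paths down to this level, e.g. ('China', 'Fujian Province').
        prefixes = {tuple(c[name] for name in LEVELS[:depth]) for c in candidates}
        if len(prefixes) == len(candidates):
            return level
    return None


def build_question(candidates, ambiguous_name):
    level = pick_question_level(candidates)
    if level is None:
        return None
    return f"Which {level} does the '{ambiguous_name}' you mentioned belong to?"


gulou = [
    {"country": "China", "province": "Fujian Province", "city": "Fuzhou City", "district": "Gulou District"},
    {"country": "China", "province": "Jiangsu Province", "city": "Nanjing City", "district": "Gulou District"},
    {"country": "China", "province": "Jiangsu Province", "city": "Xuzhou City", "district": "Gulou District"},
    {"country": "China", "province": "Henan Province", "city": "Kaifeng City", "district": "Gulou District"},
]
question = build_question(gulou, "Gulou District")
```

With these candidates the country level yields one distinct path and the province level three, so the scan stops at the city level, as in the worked example.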
Step 6: Update the background address information with the current address information.
At this point, the address element resolution process of this embodiment is complete.
With reference to fig. 1 and the address element resolution method of the foregoing embodiment, according to an embodiment of the present invention, there is further provided a computer system, including: one or more processors, and a memory. The memory stores executable instructions which, when executed by the one or more processors, implement the process of the human-computer conversation oriented address element resolution method of the foregoing embodiment.
With reference to fig. 1 and the address element resolution method of the foregoing embodiment, according to an embodiment of the present invention, a computer readable medium storing a computer program is further provided, where the computer program includes instructions that can be executed by one or more computers, and when the instructions are executed by the one or more computers, the instructions implement the processes of the address element resolution method for human-computer interaction of the foregoing embodiment.
Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention should be determined by the appended claims.

Claims (13)

1. A human-computer conversation-oriented address element analysis method is characterized by comprising the following steps:
step 1: acquiring known address information from the background information and acquiring a reply text of a user;
step 2: parsing an address from the reply text of the user to obtain user address elements;
step 3: combining the parsed user address elements with the known address information to obtain the address elements corresponding to the address;
step 4: judging whether a preset condition is met:
if the preset condition is met, ending the questioning stage;
if the preset condition is not met, constructing a question to ask the user according to the unmet condition, parsing and combining the address from the user's reply to the question according to the above steps, and judging again whether the preset condition is met, until the preset condition is met;
step 5: updating the background address information with the current address information.
2. The method for parsing address elements for human-computer interaction according to claim 1, wherein in step 1, the known address information is used as the address information of the background information in the current interaction, and the source of the known address information comprises:
(1) Background address information preset in a conversation scene;
(2) Basic address information of a user;
(3) Address information that the user has mentioned in the dialog.
3. The method for parsing address elements for human-computer interaction according to claim 1, wherein in step 2, the address is parsed from the reply text of the user to obtain the user address elements, and the method comprises the following steps:
step 2.1: acquiring an address text to be processed from a reply text of a user;
step 2.2: segmenting an address text to be processed, and labeling a preliminary address role to obtain an address element list;
step 2.3: and combining each address element, removing invalid address elements, and adjusting the final address role of each address element to obtain an address element list.
4. The method for parsing address elements for human-computer conversation according to claim 3, wherein in step 2.1, the address text to be processed is obtained from the reply text of the user, and the address text is obtained by using an entity recognition model based on sequence tagging, comprising the following steps:
firstly, performing word segmentation and part-of-speech tagging on a user reply text;
then, converting the word segmentation and part-of-speech tagging information of the text into the characteristics of the characters;
then, sending the characters and the character features into a conditional random field or a conditional random field model based on a neural network to obtain a label of each character;
and finally, analyzing the label sequence through a state machine to obtain the address text to be processed.
5. The method for parsing address elements for human-computer conversation according to claim 4, wherein the entity recognition model based on sequence labeling adopts a pre-trained model, and during model training, the characters of the address text in the annotated training text data are labeled according to the "B, I, E, S, O" scheme:
B represents the beginning character of an address element;
I represents a middle character of an address;
E represents the ending character of an address;
S represents an address consisting of only one character;
O represents a character not belonging to any address;
thus, the address text is converted into a sequence of character labels according to the B, I, E, S, O scheme;
under this labeling scheme, except that an address consisting of only one character is labeled S, the labels of every address begin with B and end with E, with only I between B and E, so that the entity recognition model learns this label pattern during training and captures the dependencies between labels.
6. The method for analyzing address elements oriented to human-computer interaction of claim 3, wherein in step 2.3, combining each address element, removing invalid address elements, and further adjusting the roles of the address elements by using role transformation rules and global constraints to obtain an address element list;
the role conversion rule is a plurality of predefined rule templates, and when one part of the address elements meets the rule templates, the roles of the address elements are adjusted according to the corresponding processing modes in the rule templates;
the global constraint refers to a constraint rule for deleting and transforming the global role of the address element;
and after the role conversion rule and the global constraint are finally executed and completed, sequentially combining and adjusting the roles of the address elements according to the address level from high to low.
7. The method for analyzing address elements for human-computer interaction according to claim 1, wherein the step 3 of combining the user address elements obtained by the analysis with the known address information to obtain the address elements corresponding to the addresses comprises:
step 3.1: linking the address elements with addresses in a specific address element entity knowledge base to obtain a user address element list;
step 3.2: combining the known address information with the current user address element in the user address element list to obtain a complete address;
step 3.3: filtering and merging the adjacent address elements in sequence;
step 3.4: and for the address elements after filtering and merging, converting the corresponding words into standard names.
8. The address element resolution method for human-computer interaction according to claim 7, wherein in step 3.2, the processing is performed in two cases according to the difference of the known address information:
(1) If the address information is known to be a single administrative division, completion is determined according to the following logic:
step 3.2.1.1: if the highest level of the current address information is higher than or equal to the lowest level of the known address information and corresponds to a single address element entity, directly covering the known address information;
step 3.2.1.2: if the highest level of the current address information is higher than or equal to the lowest level of the known address information and corresponds to a plurality of address element entities, performing merging and screening operation on the highest level address element entity of the current address information by using a first-level administrative division of the known address information, which is higher than the current address information, so as to obtain a merged and screened entity list; if the entity list of the combined screening is not empty, the result is used as a corresponding entity list, otherwise, the result is kept unchanged;
step 3.2.1.3: if the highest level of the current address information is smaller than the lowest level of the known address information, executing a merging and screening operation by using entities corresponding to the known address information and the highest level of the current address information to obtain a merging and screening entity list; and if the entity list of the combined screening is not empty, taking the result as a corresponding entity list, otherwise, keeping the result unchanged.
(2) If the known address information is the address obtained by analysis, the current address element is used to cover the address element of the existing known address information according to the level, and finally whether the reserved known address element conflicts with the user address element is checked, which comprises the following processes:
step 3.2.2.1: if the highest level of the current address element is smaller than the lowest level of the address element entity knowledge base, the address elements are considered to be compatible;
step 3.2.2.2: if the highest level of the current address element is less than or equal to the lowest level of the known address information, executing 'merging and screening' operation on the highest level of the 'known address information' and the 'current address element', and if the merging result is not null, enabling the two to be compatible;
step 3.2.2.3: if the highest level of the known address information is less than the lowest level of the current address element, executing 'merging and screening' operation on the highest level of the 'current address element' and the 'known address information', and if the merging result is not null, enabling the two to be compatible;
step 3.2.2.4: if not, merging fails, recording a merging failure mark, and only using the current address information.
9. The method for address element resolution towards human-computer interaction of claim 8, wherein the performing a merge filtering operation comprises:
a) sequentially selecting an address element entity GPE_a corresponding to the high-level address element;
b) sequentially selecting an address element entity GPE_b corresponding to the low-level address element;
c) if GPE_b is a lower-level address element entity of GPE_a, adding GPE_b to the address element entity merging list;
d) after the loop ends, taking the address element entity merging list as the administrative division obtained by merging the two address elements.
10. A method for address element resolution towards human-computer interaction according to any of claims 1-9, wherein the predetermined condition comprises at least one of the following conditions:
(1) An address exists in the user dialog text;
(2) The user addresses are continuous;
(3) Requiring the user address to reach a specific address level;
(4) And if the required address level has an administrative division, the administrative division of the required address level is required to be uniquely determined.
11. The method for parsing address elements for human-computer interaction according to claim 1, wherein constructing the question to ask the user according to the unmet condition comprises:
step 5.1: if the address element information is not acquired, directly feeding back the address information which is not acquired;
step 5.2: if the information of the required level is not acquired, questioning is carried out according to the address level which is not acquired;
step 5.3: if the address is discontinuous, selecting a higher address level in the longest discontinuous segment to ask a question;
step 5.4: and if the address has administrative zoning ambiguity, generating a question according to the number of the ambiguity.
12. A computer system, comprising:
one or more processors;
a memory storing instructions that are operable, when executed by one or more processors, to perform the process of the human-machine dialog-oriented address element resolution method of any of claims 1 to 11.
13. A computer-readable medium storing a computer program, wherein the computer program comprises instructions executable by one or more computers, and when the instructions are executed by the one or more computers, the instructions implement the process of the human-computer conversation oriented address element resolution method according to any one of claims 1 to 9.
CN202211364279.2A 2022-11-02 2022-11-02 Address element analysis method and system for man-machine conversation and computer readable medium Pending CN115630648A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211364279.2A CN115630648A (en) 2022-11-02 2022-11-02 Address element analysis method and system for man-machine conversation and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211364279.2A CN115630648A (en) 2022-11-02 2022-11-02 Address element analysis method and system for man-machine conversation and computer readable medium

Publications (1)

Publication Number Publication Date
CN115630648A true CN115630648A (en) 2023-01-20

Family

ID=84908828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211364279.2A Pending CN115630648A (en) 2022-11-02 2022-11-02 Address element analysis method and system for man-machine conversation and computer readable medium

Country Status (1)

Country Link
CN (1) CN115630648A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116701552A (en) * 2023-04-07 2023-09-05 北京百度网讯科技有限公司 Case administration organization determination method and device and electronic equipment
CN116701552B (en) * 2023-04-07 2023-12-22 北京百度网讯科技有限公司 Case administration organization determination method and device and electronic equipment
CN116955855A (en) * 2023-09-14 2023-10-27 南京擎天科技有限公司 Low-cost cross-region address resolution model construction method and system
CN116955855B (en) * 2023-09-14 2023-11-24 南京擎天科技有限公司 Low-cost cross-region address resolution model construction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination