CN111062211A - Information extraction method and device, electronic equipment and storage medium - Google Patents

Information extraction method and device, electronic equipment and storage medium

Info

Publication number
CN111062211A
Authority
CN
China
Prior art keywords
class
word segmentation
noun
dictionary
participle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911379232.1A
Other languages
Chinese (zh)
Inventor
宋维林
杨庆友
黄林
黎华清
叶小辉
杜敏聪
陈燕芬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN201911379232.1A priority Critical patent/CN111062211A/en
Publication of CN111062211A publication Critical patent/CN111062211A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides an information extraction method, an information extraction device, electronic equipment and a storage medium. The information extraction method provided by the embodiment of the invention comprises: first performing word segmentation processing on a text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set; then determining a first probability value corresponding to pointwise mutual information (PMI) between a first action class participle in the action class word segmentation set and a first noun class participle in the noun class word segmentation set; and, if the first probability value is larger than a preset probability threshold, generating first intention information according to the first action class participle and the first noun class participle. By calculating the probability value corresponding to the PMI for any combination of an action class participle and a noun class participle, the method selects the more strongly correlated combinations as the intention information of the user, so that new intentions of the user are found.

Description

Information extraction method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to an information extraction method and apparatus, an electronic device, and a storage medium.
Background
With the popularization of artificial intelligence, interactive robots are increasingly widely used in the customer service industry, but in daily operation customers' satisfaction with the answering ability and problem-solving ability of such robots remains low.
The customer service robot and the customer generate a large amount of unstructured text data during their interaction, and this data contains the customer's real feedback and requirements. At present, during daily maintenance of the robot, a large amount of manpower must be invested in analysis to extract useful information, so as to find new intentions of users and achieve service coverage by the customer service robot.
The existing approach of extracting useful information through manual analysis is therefore inefficient, and when faced with massive data, manual analysis cannot meet actual business requirements.
Disclosure of Invention
The invention provides an information extraction method, an information extraction device, electronic equipment and a storage medium, which are used for quickly finding new intentions of customers, thereby providing great convenience for customer requirements and hotspot analysis.
In a first aspect, an embodiment of the present invention provides an information extraction method, including:
performing word segmentation processing on a text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, wherein a word segmentation dictionary used by the preset word segmentation tool comprises a scene dictionary, the scene dictionary comprises an action class dictionary and a noun class dictionary, and the scene dictionary is determined according to a service scene type corresponding to the text to be processed;
determining a first probability value corresponding to pointwise mutual information (PMI) between a first action class participle and a first noun class participle, wherein the first action class participle belongs to the action class participle set, and the first noun class participle belongs to the noun class participle set;
and if the first probability value is larger than a preset probability threshold value, generating first intention information according to the first action class participle and the first noun class participle.
In a possible design, before performing word segmentation processing on a text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, the method includes:
acquiring a to-be-processed conversation, wherein the to-be-processed conversation is a conversation text between a customer service robot and a customer;
and extracting the client text in the dialog to be processed to generate the text to be processed.
In one possible design, the noun class dictionary includes a business class dictionary and an activity class dictionary.
In a possible design, the performing word segmentation processing on the text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set includes:
performing word segmentation processing on the text to be processed according to the preset word segmentation tool to obtain an initial action class word segmentation set and an initial noun class word segmentation set;
performing synonym clustering on the action class participles in the initial action class participle set according to a preset synonym dictionary to generate an action class participle set;
and carrying out synonym clustering on the noun class participles in the initial noun class participle set according to a preset synonym dictionary to generate the noun class participle set.
In a possible design, after performing word segmentation processing on the text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, the method further includes:
performing word frequency sequencing on the action class participles in the action class participle set to determine that the action class participles sequenced before a first position form a high-frequency action class participle set, wherein the first action class participle belongs to the high-frequency action class participle set;
and performing word frequency sequencing on the noun class participles in the noun class participle set to determine that the noun class participles sequenced before the second position form a high-frequency noun class participle set, wherein the first noun class participle belongs to the high-frequency noun class participle set.
In a second aspect, the present invention further provides an information extracting apparatus, including:
the text word segmentation module is used for performing word segmentation processing on a text to be processed according to a preset word segmentation tool so as to obtain an action class word segmentation set and a noun class word segmentation set, wherein a word segmentation dictionary used by the preset word segmentation tool comprises a scene dictionary, the scene dictionary comprises an action class dictionary and a noun class dictionary, and the scene dictionary is a dictionary determined according to a service scene type corresponding to the text to be processed;
a probability determination module, configured to determine a first probability value corresponding to pointwise mutual information (PMI) between a first action class participle and a first noun class participle, where the first action class participle belongs to the action class participle set, and the first noun class participle belongs to the noun class participle set;
and the information generation module is used for generating first intention information according to the first action class participle and the first noun class participle if the first probability value is greater than a preset probability threshold value.
In a possible design, the information extracting apparatus further includes:
a conversation acquisition module, configured to acquire a to-be-processed conversation, wherein the to-be-processed conversation is a conversation text between a customer service robot and a client;
and the text extraction module is used for extracting the client text in the dialog to be processed to generate the text to be processed.
In one possible design, the noun class dictionary includes a business class dictionary and an activity class dictionary.
In one possible design, the text segmentation module is specifically configured to:
performing word segmentation processing on the text to be processed according to the preset word segmentation tool to obtain an initial action class word segmentation set and an initial noun class word segmentation set;
performing synonym clustering on the action class participles in the initial action class participle set according to a preset synonym dictionary to generate an action class participle set;
and carrying out synonym clustering on the noun class participles in the initial noun class participle set according to a preset synonym dictionary to generate the noun class participle set.
In a possible design, the information extracting apparatus further includes a word frequency ordering module, which is specifically configured to:
performing word frequency sequencing on the action class participles in the action class participle set to determine that the action class participles sequenced before a first position form a high-frequency action class participle set, wherein the first action class participle belongs to the high-frequency action class participle set;
and performing word frequency sequencing on the noun class participles in the noun class participle set to determine that the noun class participles sequenced before the second position form a high-frequency noun class participle set, wherein the first noun class participle belongs to the high-frequency noun class participle set.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any one of the possible information extraction methods of the first aspect via execution of the executable instructions.
In a fourth aspect, an embodiment of the present invention further provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements any one of the possible information extraction methods in the first aspect.
The embodiment of the invention provides an information extraction method, an information extraction device, electronic equipment and a storage medium. A scene dictionary for a specific business scene is used as the word segmentation dictionary of the preset word segmentation tool, and the tool performs word segmentation on the text to be processed to obtain an action class word segmentation set and a noun class word segmentation set; using the scene dictionary makes the word segmentation of text generated in that business scene more accurate. The probability value corresponding to the pointwise mutual information (PMI) between any combination of an action class participle and a noun class participle is then calculated, and the combinations of action class participle and noun class participle with higher correlation are selected as the intention information of the user. New intentions of the client can thus be found, which provides great convenience for client requirement and hotspot analysis, reduces the cost of manual analysis and daily operation, and can greatly improve the answering ability and problem-solving ability of the robot.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic diagram of an application scenario of an information extraction method according to an example embodiment of the present invention;
Fig. 2 is a flow diagram illustrating an information extraction method according to an example embodiment of the present invention;
Fig. 3 is a flow diagram illustrating an information extraction method according to another example embodiment of the present invention;
Fig. 4 is a flow diagram illustrating a manner in which a scene dictionary is determined according to an example embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an information extraction apparatus according to an example embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an information extraction apparatus according to another example embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device according to an example embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
With the popularization of artificial intelligence, interactive robots are increasingly widely used in the customer service industry, but in daily operation customers' satisfaction with the answering ability and problem-solving ability of such robots remains low. The customer service robot and the customer generate a large amount of unstructured text data during their interaction, and this data contains the customer's real feedback and requirements. At present, during daily maintenance of the robot, a large amount of manpower must be invested in analysis to extract useful information, so as to find new intentions of users and achieve service coverage by the customer service robot. The existing approach of extracting useful information through manual analysis is therefore inefficient, and when faced with massive data, manual analysis cannot meet actual business requirements.
In view of the above problems, embodiments of the present invention provide an information extraction method. A scene dictionary for a specific business scene is used as the word segmentation dictionary of the preset word segmentation tool, and the tool performs word segmentation on the text to be processed to obtain an action class word segmentation set and a noun class word segmentation set; using the scene dictionary makes the word segmentation of text generated in that business scene more accurate. The probability value corresponding to the pointwise mutual information (PMI) between any combination of an action class participle and a noun class participle is then calculated, and the combinations of action class participle and noun class participle with higher correlation are selected as the intention information of the user. New intentions of the client can thus be found, which provides great convenience for client requirement and hotspot analysis, reduces the cost of manual analysis and daily operation, and can greatly improve the answering ability and problem-solving ability of the robot.
Fig. 1 is a schematic diagram of an application scenario of an information extraction method according to an example embodiment of the present invention. As shown in fig. 1, the information extraction method provided by this embodiment may be applied to a robot client dialog scenario, especially a robot client dialog scenario for a network operator service scenario, for mining a user intention from a dialog between a robot and a client. Specifically, text information or voice information input by the client 100 may be uploaded to the server 200, and the customer service robot 300 performs a dialogue interaction with the client 100 through the server 200 to generate a to-be-processed dialogue, so as to extract intention information from the to-be-processed dialogue to dig out the intention of the user.
Fig. 2 is a flowchart illustrating an information extraction method according to an example embodiment of the present invention. As shown in fig. 2, the information extraction method provided in this embodiment includes:
Step 101, performing word segmentation processing on a text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set.
Specifically, word segmentation processing can be performed on the text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, wherein a word segmentation dictionary used by the preset word segmentation tool comprises a scene dictionary, the scene dictionary comprises an action class dictionary and a noun class dictionary, and the scene dictionary is a dictionary determined according to the service scene type corresponding to the text to be processed.
It should be noted that the preset word segmentation tool may be, for example, any one of the Jieba, SnowNLP, pkuseg, THULAC and HanLP word segmentation tools. When an existing word segmentation tool performs word segmentation on a text to be processed, it generally uses the general dictionary shipped with the tool. For the robot-client dialogue scenario of this embodiment, especially in a network operator service scenario, word segmentation with a general dictionary is usually inaccurate. Because business class entries, activity class entries and action class entries are not distinguished, the analysis result is difficult for a machine to judge and must be analyzed and checked manually item by item, which wastes manpower and material resources, raises operating cost and lowers efficiency.
The scene dictionary may be a specialized, industry-specific dictionary formed by accumulating a sufficient number of business nouns, activity vocabulary and verb vocabulary.
Fig. 4 is a flowchart illustrating a scene dictionary determination method according to an example embodiment of the present invention. As shown in Fig. 4, the scene dictionary may include an action class dictionary and a noun class dictionary, and the noun class dictionary may include a business class dictionary and an activity class dictionary. The business class dictionary may include package type entries, data traffic type entries, value-added service type entries, and the like. The activity class dictionary may include official account activity type entries, holiday promotion type entries, maintenance activity type entries, and the like. When a user selects a scene (for example, a package activation scene, a data traffic complaint scene, a recharge scene, a promotional activity consultation scene, a points redemption scene, and the like), the corresponding entries and the action class dictionary are matched to form a scene dictionary (for example, scene dictionary 1 or scene dictionary 2) that matches the characteristics of the scene.
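The assembly of a scene dictionary from entry groups can be sketched roughly as follows. This is only an illustration under stated assumptions: the scene names, entry groups, the mapping between them, and the function `build_scene_dictionary` are all hypothetical and are not taken from the patent.

```python
# Hypothetical entry groups; every word and scene name below is an
# illustrative assumption, not content from the patent.
ACTION_DICT = {"activate", "change", "cancel", "query"}
BUSINESS_ENTRIES = {
    "package": {"monthly plan", "family plan"},
    "traffic": {"data package", "roaming data"},
    "value_added": {"caller ringtone"},
}
ACTIVITY_ENTRIES = {
    "holiday_promotion": {"new year recharge bonus"},
    "official_account": {"follow-to-win draw"},
}
# Which entry groups each selectable scene pulls in (assumed mapping).
SCENE_GROUPS = {
    "package_activation": {"business": ["package", "traffic"], "activity": []},
    "promotion_consultation": {
        "business": [],
        "activity": ["holiday_promotion", "official_account"],
    },
}


def build_scene_dictionary(scene):
    """Assemble the action class and noun class dictionaries for one scene."""
    groups = SCENE_GROUPS[scene]
    business = set().union(*(BUSINESS_ENTRIES[g] for g in groups["business"]))
    activity = set().union(*(ACTIVITY_ENTRIES[g] for g in groups["activity"]))
    return {
        "action": set(ACTION_DICT),
        "noun": {"business": business, "activity": activity},
    }


scene_dictionary_1 = build_scene_dictionary("package_activation")
print(sorted(scene_dictionary_1["noun"]["business"]))
```

Keeping the business and activity entries separate inside the noun class dictionary mirrors the two-level structure described for Fig. 4.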
In addition, the word segmentation dictionary used by the preset word segmentation tool can also comprise a scene dictionary and a general dictionary, so that the coverage range of the dictionary is further expanded, and the word segmentation accuracy of the text to be processed is further improved.
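To make the role of the scene dictionary concrete, the following minimal sketch segments a client sentence and splits the result into an action class set and a noun class set. It is an assumption-laden stand-in: instead of one of the tools named above, it uses a simple forward maximum matching segmenter so that the example is self-contained, and the dictionary entries (for instance 开通 "activate" and 流量包 "data package") are hypothetical.

```python
# Hypothetical scene dictionary; all entries are illustrative assumptions.
ACTION_DICT = {"开通", "变更", "取消", "查询"}
BUSINESS_DICT = {"流量包", "套餐", "话费"}
ACTIVITY_DICT = {"充值优惠", "积分兑换"}
NOUN_DICT = BUSINESS_DICT | ACTIVITY_DICT


def segment(text, dictionary):
    """Forward maximum matching: take the longest dictionary entry at each position.

    Characters that start no dictionary entry are emitted as single-character tokens.
    """
    max_len = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def split_by_class(text):
    """Segment the text, then keep only action class and noun class words."""
    tokens = segment(text, ACTION_DICT | NOUN_DICT)
    actions = [t for t in tokens if t in ACTION_DICT]
    nouns = [t for t in tokens if t in NOUN_DICT]
    return actions, nouns


actions, nouns = split_by_class("我想开通流量包")
print(actions, nouns)
```

With a production segmenter, the same effect would typically be obtained by loading the scene dictionary as a custom user dictionary; how that is done depends on the chosen tool.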
Step 102, determining a first probability value corresponding to pointwise mutual information (PMI) between the first action class participle and the first noun class participle.
In this step, a first probability value corresponding to the PMI between a first action class participle and a first noun class participle may be determined, where the first action class participle belongs to the action class participle set, and the first noun class participle belongs to the noun class participle set. The first action class participle may be any action class participle in the action class participle set, and similarly, the first noun class participle may be any noun class participle in the noun class participle set. It can be seen that this step determines a first probability value corresponding to the PMI for every pairwise combination of a participle from the action class participle set with a participle from the noun class participle set.
It should be understood that pointwise mutual information (PMI) is an index that measures the correlation between two things, for example two words. It derives from mutual information, which measures the correlation between two random variables, that is, the amount of information one random variable contains about the other. The specific calculation process is not limited in this embodiment.
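For reference, the standard definition of PMI for an action class word $a$ and a noun class word $n$ is given below. The text does not say how the PMI is mapped to a "probability value"; the normalized variant (NPMI) shown second is only one possible choice, stated here as an assumption, that bounds the score to $[-1, 1]$.

```latex
\mathrm{PMI}(a, n) = \log \frac{p(a, n)}{p(a)\, p(n)},
\qquad
\mathrm{NPMI}(a, n) = \frac{\mathrm{PMI}(a, n)}{-\log p(a, n)}
```

Here $p(a, n)$ is the probability that $a$ and $n$ occur together (for example, in the same client sentence), and $p(a)$ and $p(n)$ are their individual occurrence probabilities. A positive PMI means the two words co-occur more often than they would if independent.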
Step 103, if the first probability value is larger than a preset probability threshold, generating first intention information according to the first action class participle and the first noun class participle.
Analysis of daily operations shows that clients often express their intention in the form "action + noun" (for example, "action + business noun" or "action + activity noun"). Therefore, when an action class participle from the action class participle set and a noun class participle from the noun class participle set, both obtained from the text to be processed, are highly correlated, their combination can be determined as intention information of the client in the text to be processed.
Specifically, a first probability value corresponding to the pointwise mutual information (PMI) between the first action class participle and the first noun class participle is determined. If the first probability value is greater than a preset probability threshold, the first action class participle and the first noun class participle are highly correlated, and the first intention information can be generated according to the first action class participle and the first noun class participle.
For example, suppose the probability value corresponding to the PMI of "activate + data package" is 0.9, the probability value corresponding to the PMI of "change + data package" is 0.2, and the preset probability threshold is set to 0.8. Then "activate + data package" can be determined as intention information extracted from the text to be processed and treated as a new intention of the client, so that the corresponding service can be provided to the client through the customer service robot or a human agent.
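The thresholding step can be sketched as follows. Several points are assumptions rather than the patent's method: co-occurrence is counted per client sentence, the score is normalized PMI (so that it lies in [-1, 1] and can be compared with a threshold such as 0.8), and the sample data with English labels such as "activate" and "data package" is invented for illustration.

```python
import math
from collections import Counter
from itertools import product


def npmi_scores(sentences):
    """Return {(action, noun): normalized PMI} from sentence-level co-occurrence.

    sentences: list of (action_words, noun_words) pairs, one pair per sentence.
    """
    total = len(sentences)
    action_count, noun_count, pair_count = Counter(), Counter(), Counter()
    for actions, nouns in sentences:
        action_count.update(set(actions))
        noun_count.update(set(nouns))
        pair_count.update(product(set(actions), set(nouns)))

    scores = {}
    for (action, noun), joint in pair_count.items():
        p_joint = joint / total
        p_action = action_count[action] / total
        p_noun = noun_count[noun] / total
        pmi = math.log(p_joint / (p_action * p_noun))
        # Dividing by -log p(a, n) bounds the score to [-1, 1].
        scores[(action, noun)] = pmi / -math.log(p_joint) if p_joint < 1 else 1.0
    return scores


def extract_intents(sentences, threshold=0.8):
    """Keep the action + noun pairs whose score exceeds the threshold."""
    scores = npmi_scores(sentences)
    return sorted(pair for pair, score in scores.items() if score > threshold)


# Invented sample: 20 client sentences, already segmented by class.
sample = (
    [(["activate"], ["data package"])] * 4
    + [(["change"], ["data package"])] * 1
    + [(["change"], ["plan"])] * 5
    + [(["query"], ["balance"])] * 10
)
print(extract_intents(sample))
```

In this sample, "change" and "data package" co-occur less often than independence would predict, so that pair scores below zero and is filtered out, while the three frequently paired combinations pass the threshold.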
In this embodiment, a scene dictionary for a specific business scene is used as the word segmentation dictionary of the preset word segmentation tool, and the tool performs word segmentation on the text to be processed to obtain the action class word segmentation set and the noun class word segmentation set; using the scene dictionary makes the word segmentation of text generated in that business scene more accurate. The probability value corresponding to the pointwise mutual information (PMI) between any combination of an action class participle and a noun class participle is then calculated, and the more strongly correlated combinations are selected as the intention information of the user. New intentions of the user are thereby found, which provides great convenience for requirement and hotspot analysis, reduces the cost of manual analysis and daily operation, and can greatly improve the problem-solving ability of the robot.
Fig. 3 is a flowchart illustrating an information extraction method according to another example embodiment of the present invention. As shown in fig. 3, the information extraction method provided in this embodiment includes:
Step 201, obtaining a dialog to be processed.
Step 202, extracting the client text in the dialog to be processed to generate the text to be processed.
In these steps, a dialog to be processed is obtained, where the dialog to be processed is a dialog text between the customer service robot and the customer, and the customer text in the dialog to be processed is then extracted to generate the text to be processed.
The dialog to be processed is a dialog text between the customer service robot and the customer, for example:
Customer: Hello, please help me look up XXX.
Customer service robot: Your XXX is XXX.
Customer: OK, when does the XXX activity start?
Customer service robot: The XXX activity starts at XXXX.
The above dialog text can be split into:
The customer part:
Customer: Hello, please help me look up XXX.
Customer: OK, when does the XXX activity start?
The customer service robot part:
Customer service robot: Your XXX is XXX.
Customer service robot: The XXX activity starts at XXXX.
Since the information extraction method provided by this embodiment aims to mine intention information from sentences expressed by the client, the customer service robot part of the dialog can be discarded to reduce the amount of calculation and exclude noise data, thereby improving the accuracy of intention recognition; only the client text in the dialog to be processed is extracted as the text to be processed. The extracted text to be processed is, for example:
Customer: Hello, please help me look up XXX.
Customer: OK, when does the XXX activity start?
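Extracting the client text can be sketched as a filter on the speaker label. The sketch assumes each dialog line has the form "speaker: utterance" as in the example above; the function name `extract_client_text` and the label "Customer" are illustrative assumptions.

```python
def extract_client_text(dialog_lines, client_label="Customer"):
    """Keep only the client's utterances, with the speaker label stripped."""
    client_text = []
    for line in dialog_lines:
        speaker, sep, utterance = line.partition(":")
        # Lines without a "speaker:" prefix, or from another speaker, are dropped.
        if sep and speaker.strip().lower() == client_label.lower():
            client_text.append(utterance.strip())
    return client_text


dialog = [
    "Customer: Hello, please help me look up XXX.",
    "Customer service robot: Your XXX is XXX.",
    "Customer: OK, when does the XXX activity start?",
    "Customer service robot: The XXX activity starts at XXXX.",
]
text_to_process = extract_client_text(dialog)
print(text_to_process)
```

Matching the whole speaker label, rather than a prefix, keeps "Customer service robot" lines from being mistaken for "Customer" lines.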
Step 203, performing word segmentation processing on the text to be processed according to a preset word segmentation tool to obtain an initial action class word segmentation set and an initial noun class word segmentation set.
Specifically, word segmentation processing may be performed on the text to be processed according to a preset word segmentation tool to obtain an initial action class word segmentation set and an initial noun class word segmentation set, where a word segmentation dictionary used by the preset word segmentation tool includes a scene dictionary, the scene dictionary includes an action class dictionary and a noun class dictionary, and the scene dictionary is a dictionary determined according to a service scene type corresponding to the text to be processed.
It should be noted that the preset word segmentation tool may be, for example, any one of the Jieba, SnowNLP, pkuseg, THULAC and HanLP word segmentation tools. When an existing word segmentation tool performs word segmentation on a text to be processed, it generally uses the general dictionary shipped with the tool. For the robot-client dialogue scenario of this embodiment, especially in a network operator service scenario, word segmentation with a general dictionary is usually inaccurate. Because business class entries, activity class entries and action class entries are not distinguished, the analysis result is difficult for a machine to judge and must be analyzed and checked manually item by item, which wastes manpower and material resources, raises operating cost and lowers efficiency.
The scene dictionary may be a specialized, industry-specific dictionary formed by accumulating a sufficient number of business nouns, activity vocabulary and verb vocabulary.
Fig. 4 is a flowchart illustrating a scene dictionary determination method according to an example embodiment of the present invention. As shown in Fig. 4, the scene dictionary may include an action class dictionary and a noun class dictionary, and the noun class dictionary may include a business class dictionary and an activity class dictionary. The business class dictionary may include package type entries, data traffic type entries, value-added service type entries, and the like. The activity class dictionary may include official account activity type entries, holiday promotion type entries, maintenance activity type entries, and the like. When a user selects a scene (for example, a package activation scene, a data traffic complaint scene, a recharge scene, a promotional activity consultation scene, a points redemption scene, and the like), the corresponding entries and the action class dictionary are matched to form a scene dictionary (for example, scene dictionary 1 or scene dictionary 2) that matches the characteristics of the scene.
Step 204, clustering synonyms to generate an action class participle set and a noun class participle set.
Because customers often do not use standard terms when typing, but instead use various colloquial expressions and frequently make typos, the customer's different expressions and misspellings can be clustered into a single standard word in order to improve the accuracy of intention extraction and allow the user's actual needs to be analyzed.
Specifically, the action class participles in the initial action class participle set may be subjected to synonym clustering according to a preset synonym dictionary to generate an action class participle set, and the noun class participles in the initial noun class participle set may be subjected to synonym clustering according to the preset synonym dictionary to generate a noun class participle set.
For example, synonym clustering may be performed on the words listed in the following table, all of which are aggregated into the standard word "change":
Root-changing | Is changed off | Variations in | Handover
Modified by | Change to | Is modified into | Change
Become into | Rotating shaft | Regulating | Become
Is replaced by | Changes are made to | Changing of | Adjustment of
Replacement of | Turn back to | Instead, it is changed into | Exchange of
Improvement of | Change to | Updating | Changes are made to
Conversion | Transformation of | Is converted into | Become into
Mutual rotation | Variations in | Replacement of | Change of
Is turned into | Transformation of | Root-modifying | Transformation of
Improvement of | Changeable pipe | Conversion | Modifying
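A minimal sketch of synonym clustering with a preset synonym dictionary follows. The dictionary contents (including the misspelling "chagne") and the function names are hypothetical; the patent does not specify how the synonym dictionary is stored.

```python
# Hypothetical preset synonym dictionary: standard word -> known variants,
# including colloquial forms and common misspellings.
SYNONYM_CLUSTERS = {
    "change": ["modify", "switch", "replace", "convert", "chagne"],
    "package": ["plan", "bundle"],
}


def build_lookup(clusters):
    """Invert the clusters into a variant -> standard word lookup table."""
    return {
        variant: standard
        for standard, variants in clusters.items()
        for variant in variants
    }


def cluster_synonyms(tokens, lookup):
    """Replace every known variant by its standard word; keep other tokens."""
    return [lookup.get(token, token) for token in tokens]


lookup = build_lookup(SYNONYM_CLUSTERS)
clustered = cluster_synonyms(["modify", "bundle", "chagne", "balance"], lookup)
```

Inverting the clusters once keeps each token lookup constant-time, which matters when the same dictionary is applied to a large volume of dialog text.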
Step 205, performing word frequency ordering on the participles in the action class participle set and the noun class participle set.
In addition, because a client usually repeats the relevant words many times when expressing an intention, after the action class participle set and the noun class participle set are determined, the participles and the aggregated entries may be sorted by word frequency from high to low in order to reduce the amount of calculation and improve the accuracy of intention extraction. The sorting is performed separately for business class entries, activity class entries, action class entries, and general dictionary words, so that association analysis can be carried out to find the intentions that users express with high frequency.
Specifically, the action class participles in the action class participle set may be sorted by word frequency, and the action class participles ranked before a first position (for example, the top 5, top 10, or top 100) constitute the high-frequency action class participle set. Similarly, the noun class participles in the noun class participle set are sorted by word frequency, and the noun class participles ranked before a second position (for example, the top 5, top 10, or top 100) constitute the high-frequency noun class participle set.
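The word frequency ordering of step 205 can be sketched with the standard library `collections.Counter`. The function name `high_frequency_set` and the sample tokens are hypothetical; the cut-off `top_n` stands in for the first or second position mentioned above.

```python
from collections import Counter


def high_frequency_set(tokens, top_n):
    """Return the top_n most frequent tokens, ordered from high to low."""
    return [word for word, _count in Counter(tokens).most_common(top_n)]


# Hypothetical action class participles after synonym clustering.
action_tokens = ["change", "open", "change", "cancel", "open", "change"]
high_freq_actions = high_frequency_set(action_tokens, top_n=2)
```

Restricting later steps to these high-frequency sets reduces the number of action and noun pairings that have to be scored.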
Step 206, determining a first probability value corresponding to point mutual information between the first action class participle and the first noun class participle.
In this step, a first probability value corresponding to the point mutual information (that is, pointwise mutual information, PMI) between the first action class participle and the first noun class participle may be determined, where the first action class participle belongs to the high-frequency action class participle set, and the first noun class participle belongs to the high-frequency noun class participle set. The first action class participle may be any action class participle in the high-frequency action class participle set, and similarly, the first noun class participle may be any noun class participle in the high-frequency noun class participle set. Therefore, this step determines a first probability value corresponding to the point mutual information for every pairing of an action class participle from the high-frequency action class participle set with a noun class participle from the high-frequency noun class participle set.
Step 207, if the first probability value is greater than the preset probability threshold, generating first intention information according to the first action class participle and the first noun class participle.
It should be noted that, for the specific implementation of step 207 in this embodiment, reference may be made to the description of step 103 in the embodiment shown in fig. 2; details are not repeated here.
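Steps 206 and 207 can be sketched as follows, assuming the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))) with probabilities estimated from co-occurrence within client utterances. How the patent maps PMI to its "probability value" is described with step 103 and is not reproduced in this section, so the sketch simply thresholds the raw PMI score; the function names, sample utterances, and threshold are hypothetical.

```python
import math
from itertools import product


def pmi(action, noun, utterances):
    """Pointwise mutual information of two words over a list of token sets.

    Probabilities are estimated as the fraction of utterances containing
    the word (or both words). Returns -inf when the pair never co-occurs.
    """
    n = len(utterances)
    count_a = sum(1 for u in utterances if action in u)
    count_n = sum(1 for u in utterances if noun in u)
    count_an = sum(1 for u in utterances if action in u and noun in u)
    if count_an == 0:
        return float("-inf")
    return math.log2((count_an / n) / ((count_a / n) * (count_n / n)))


def extract_intents(actions, nouns, utterances, threshold):
    """Keep every (action, noun) pair whose PMI exceeds the threshold."""
    return [
        (a, b)
        for a, b in product(actions, nouns)
        if pmi(a, b, utterances) > threshold
    ]


# Hypothetical client utterances, already segmented and clustered.
utterances = [
    {"change", "package"},
    {"change", "package"},
    {"query", "balance"},
    {"query", "package"},
]
intents = extract_intents(
    ["change", "query"], ["package", "balance"], utterances, threshold=0.0
)
```

Here "change package" and "query balance" pass the threshold, while "query package" is rejected because the two words co-occur less often than their individual frequencies would predict.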
Fig. 5 is a schematic structural diagram of an information extraction apparatus according to an example embodiment of the present invention. As shown in fig. 5, the information extraction apparatus 300 according to the present embodiment includes:
the text word segmentation module 301 is configured to perform word segmentation on a to-be-processed text according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, where a word segmentation dictionary used by the preset word segmentation tool includes a scene dictionary, the scene dictionary includes an action class dictionary and a noun class dictionary, and the scene dictionary is a dictionary determined according to a service scene type corresponding to the to-be-processed text;
a probability determining module 302, configured to determine a first probability value corresponding to point mutual information between a first action class participle and a first noun class participle, where the first action class participle belongs to the action class participle set, and the first noun class participle belongs to the noun class participle set;
an information generating module 303, configured to generate first intention information according to the first action class participle and the first noun class participle if the first probability value is greater than a preset probability threshold.
On the basis of the embodiment shown in fig. 5, fig. 6 is a schematic structural diagram of an information extraction apparatus according to another exemplary embodiment of the present invention. As shown in fig. 6, the information extraction apparatus 300 according to the present embodiment further includes:
a dialog acquisition module 304, configured to acquire a to-be-processed dialog, where the to-be-processed dialog is a dialog text between a customer service robot and a client;
a text extracting module 305, configured to extract a client text in the to-be-processed dialog to generate the to-be-processed text.
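The behavior of modules 304 and 305 can be sketched as filtering a dialog by speaker. The dialog representation (a list of dictionaries with `speaker` and `text` keys), the role labels, and the function name are assumptions made for illustration; the patent does not define a data format.

```python
def extract_client_text(dialog, client_role="client"):
    """Return only the client's utterances from a robot-client dialog."""
    return [turn["text"] for turn in dialog if turn["speaker"] == client_role]


# Hypothetical to-be-processed dialog.
dialog = [
    {"speaker": "robot", "text": "Hello, how can I help you?"},
    {"speaker": "client", "text": "I want to change my package"},
    {"speaker": "robot", "text": "Which package would you like?"},
    {"speaker": "client", "text": "The one with more data"},
]
text_to_process = extract_client_text(dialog)
```

Dropping the robot's turns keeps scripted customer service wording from inflating the word frequencies used in later steps.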
In one possible design, the noun class dictionary includes a business class dictionary and an activity class dictionary.
In one possible design, the text segmentation module 301 is specifically configured to:
performing word segmentation processing on the text to be processed according to the preset word segmentation tool to obtain an initial action class word segmentation set and an initial noun class word segmentation set;
performing synonym clustering on the action class participles in the initial action class participle set according to a preset synonym dictionary to generate an action class participle set;
and carrying out synonym clustering on the noun class participles in the initial noun class participle set according to a preset synonym dictionary to generate the noun class participle set.
In one possible design, the information extraction apparatus 300 further includes a word frequency ordering module 306, which is specifically configured to:
performing word frequency sequencing on the action class participles in the action class participle set to determine that the action class participles sequenced before a first position form a high-frequency action class participle set, wherein the first action class participle belongs to the high-frequency action class participle set;
and performing word frequency sequencing on the noun class participles in the noun class participle set to determine that the noun class participles sequenced before the second position form a high-frequency noun class participle set, wherein the first noun class participle belongs to the high-frequency noun class participle set.
Each functional unit in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
It should be noted that the information extraction device provided in the embodiments shown in fig. 5 to 6 can be used to execute the information extraction method provided in the embodiments shown in fig. 2 to 3, and the specific implementation manner and the technical effect are similar, and are not described herein again.
Fig. 7 is a schematic structural diagram of an electronic device shown in accordance with an example embodiment of the present invention. As shown in fig. 7, the present embodiment provides an electronic device 400, including:
a processor 401; and
a memory 402 for storing executable instructions of the processor, where the memory may also be a flash memory;
wherein the processor 401 is configured to perform the steps of the above-described method via execution of the executable instructions. Reference may be made in particular to the description relating to the preceding method embodiment.
Alternatively, the memory 402 may be separate or integrated with the processor 401.
When the memory 402 is a device independent from the processor 401, the electronic device 400 may further include:
a bus 403 for connecting the processor 401 and the memory 402.
The present embodiment also provides a readable storage medium, in which a computer program is stored, and when at least one processor of the electronic device executes the computer program, the electronic device executes the methods provided by the above various embodiments.
The present embodiment also provides a program product comprising a computer program stored in a readable storage medium. The computer program can be read from a readable storage medium by at least one processor of the electronic device, and the execution of the computer program by the at least one processor causes the electronic device to implement the methods provided by the various embodiments described above.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An information extraction method, comprising:
performing word segmentation processing on a text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, wherein a word segmentation dictionary used by the preset word segmentation tool comprises a scene dictionary, the scene dictionary comprises an action class dictionary and a noun class dictionary, and the scene dictionary is determined according to a service scene type corresponding to the text to be processed;
determining a first probability value corresponding to point mutual information between a first action class participle and a first noun class participle, wherein the first action class participle belongs to the action class participle set, and the first noun class participle belongs to the noun class participle set;
and if the first probability value is larger than a preset probability threshold value, generating first intention information according to the first action class participle and the first noun class participle.
2. The information extraction method according to claim 1, wherein before performing the word segmentation processing on the text to be processed according to the preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, the method comprises:
acquiring a to-be-processed dialog, wherein the to-be-processed dialog is a dialog text between a customer service robot and a client;
and extracting the client text in the to-be-processed dialog to generate the text to be processed.
3. The information extraction method according to claim 1 or 2, wherein the noun class dictionary includes a business class dictionary and an activity class dictionary.
4. The information extraction method according to claim 1 or 2, wherein the performing word segmentation processing on the text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set comprises:
performing word segmentation processing on the text to be processed according to the preset word segmentation tool to obtain an initial action class word segmentation set and an initial noun class word segmentation set;
performing synonym clustering on the action class participles in the initial action class participle set according to a preset synonym dictionary to generate an action class participle set;
and carrying out synonym clustering on the noun class participles in the initial noun class participle set according to a preset synonym dictionary to generate the noun class participle set.
5. The information extraction method according to claim 4, wherein after performing word segmentation processing on the text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, the method further comprises:
performing word frequency sequencing on the action class participles in the action class participle set to determine that the action class participles sequenced before a first position form a high-frequency action class participle set, wherein the first action class participle belongs to the high-frequency action class participle set;
and performing word frequency sequencing on the noun class participles in the noun class participle set to determine that the noun class participles sequenced before the second position form a high-frequency noun class participle set, wherein the first noun class participle belongs to the high-frequency noun class participle set.
6. An information extraction apparatus characterized by comprising:
the text word segmentation module is used for performing word segmentation processing on a text to be processed according to a preset word segmentation tool so as to obtain an action class word segmentation set and a noun class word segmentation set, wherein a word segmentation dictionary used by the preset word segmentation tool comprises a scene dictionary, the scene dictionary comprises an action class dictionary and a noun class dictionary, and the scene dictionary is a dictionary determined according to a service scene type corresponding to the text to be processed;
a probability determination module, configured to determine a first probability value corresponding to point mutual information between a first action class participle and a first noun class participle, where the first action class participle belongs to the action class participle set, and the first noun class participle belongs to the noun class participle set;
and the information generation module is used for generating first intention information according to the first action class participle and the first noun class participle if the first probability value is greater than a preset probability threshold value.
7. The information extraction apparatus according to claim 6, characterized by further comprising:
a dialog acquisition module, configured to acquire a to-be-processed dialog, wherein the to-be-processed dialog is a dialog text between a customer service robot and a client;
and a text extraction module, configured to extract the client text in the to-be-processed dialog to generate the text to be processed.
8. The information extraction apparatus according to claim 6 or 7, wherein the noun class dictionary includes a business class dictionary and an activity class dictionary.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the information extraction method of any one of claims 1 to 5 via execution of the executable instructions.
10. A storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the information extraction method of any one of claims 1 to 5.
CN201911379232.1A 2019-12-27 2019-12-27 Information extraction method and device, electronic equipment and storage medium Pending CN111062211A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911379232.1A CN111062211A (en) 2019-12-27 2019-12-27 Information extraction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111062211A (en) 2020-04-24

Family

ID=70304325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911379232.1A Pending CN111062211A (en) 2019-12-27 2019-12-27 Information extraction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111062211A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929870A (en) * 2011-08-05 2013-02-13 北京百度网讯科技有限公司 Method for establishing word segmentation model, word segmentation method and devices using methods
CN102663139A (en) * 2012-05-07 2012-09-12 苏州大学 Method and system for constructing emotional dictionary
CN105119966A (en) * 2015-07-15 2015-12-02 中国联合网络通信集团有限公司 Official account management method and device
CN108121722A (en) * 2016-11-28 2018-06-05 渡鸦科技(北京)有限责任公司 The construction method and device of knowledge base
CN106599278A (en) * 2016-12-23 2017-04-26 北京奇虎科技有限公司 Identification method and method of application search intention
CN110309252A (en) * 2018-02-28 2019-10-08 阿里巴巴集团控股有限公司 A kind of natural language processing method and device
CN108874921A (en) * 2018-05-30 2018-11-23 广州杰赛科技股份有限公司 Extract method, apparatus, terminal device and the storage medium of text feature word
CN109949830A (en) * 2019-03-12 2019-06-28 中国联合网络通信集团有限公司 User's intension recognizing method and equipment
CN110046227A (en) * 2019-04-17 2019-07-23 腾讯科技(深圳)有限公司 Configuration method, exchange method, device, equipment and the storage medium of conversational system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163082A (en) * 2020-10-16 2021-01-01 泰康保险集团股份有限公司 Intention identification method and device, electronic equipment and storage medium
CN112163082B (en) * 2020-10-16 2023-09-12 泰康保险集团股份有限公司 Intention recognition method and device, electronic equipment and storage medium
CN114124860A (en) * 2021-11-26 2022-03-01 中国联合网络通信集团有限公司 Session management method, device, equipment and storage medium
CN114860912A (en) * 2022-05-20 2022-08-05 马上消费金融股份有限公司 Data processing method and device, electronic equipment and storage medium
CN114860912B (en) * 2022-05-20 2023-08-29 马上消费金融股份有限公司 Data processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200424