CN111062211A - Information extraction method and device, electronic equipment and storage medium

Info

Publication number: CN111062211A
Application number: CN201911379232.1A
Authority: CN
Inventors: 宋维林; 杨庆友; 黄林; 黎华清; 叶小辉; 杜敏聪; 陈燕芬
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2020-04-24

Abstract

The invention provides an information extraction method, an information extraction device, electronic equipment and a storage medium. The information extraction method provided by the embodiment of the invention comprises: performing word segmentation on a text to be processed with a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set; determining a first probability value corresponding to the pointwise mutual information (PMI) between a first action class participle in the action class word segmentation set and a first noun class participle in the noun class word segmentation set; and, if the first probability value is greater than a preset probability threshold, generating first intention information from the first action class participle and the first noun class participle. By calculating the probability value corresponding to the pointwise mutual information for any combination of an action class participle and a noun class participle, the method selects highly correlated action-noun combinations as the intention information of the user, so that new user intentions are discovered.

Description

Information extraction method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to an information extraction method and apparatus, an electronic device, and a storage medium.

Background

With the spread of artificial intelligence, interactive robots are used more and more widely in the customer service industry, but in daily operation customer satisfaction with the robots' ability to answer questions and solve problems remains low.

The customer service robot and the customer generate a large amount of unstructured text data during their interaction, and this data contains the customer's real feedback and requirements. At present, during daily maintenance of the robot, a large amount of manpower must be invested in analysis to extract useful information, so as to discover new user intentions and achieve service coverage by the customer service robot.

The existing approach of extracting useful information through manual analysis is therefore inefficient, and when faced with massive data, manual analysis cannot meet actual business requirements.

Disclosure of Invention

The invention provides an information extraction method, an information extraction device, electronic equipment and a storage medium, which are used for quickly discovering new customer intentions and thereby greatly facilitate the analysis of customer requirements and hotspots.

In a first aspect, an embodiment of the present invention provides an information extraction method, including:

performing word segmentation processing on a text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, wherein a word segmentation dictionary used by the preset word segmentation tool comprises a scene dictionary, the scene dictionary comprises an action class dictionary and a noun class dictionary, and the scene dictionary is determined according to a service scene type corresponding to the text to be processed;

determining a first probability value corresponding to the pointwise mutual information between a first action class participle and a first noun class participle, wherein the first action class participle belongs to the action class participle set, and the first noun class participle belongs to the noun class participle set;

and if the first probability value is larger than a preset probability threshold value, generating first intention information according to the first action class participle and the first noun class participle.

In a possible design, before performing word segmentation processing on a text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, the method includes:

acquiring a to-be-processed conversation, wherein the to-be-processed conversation is a conversation text between a customer service robot and a customer;

and extracting the client text in the dialog to be processed to generate the text to be processed.

In one possible design, the noun class dictionary includes a business class dictionary and an activity class dictionary.

In a possible design, the performing word segmentation processing on the text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set includes:

performing word segmentation processing on the text to be processed according to the preset word segmentation tool to obtain an initial action class word segmentation set and an initial noun class word segmentation set;

performing synonym clustering on the action class participles in the initial action class participle set according to a preset synonym dictionary to generate an action class participle set;

and carrying out synonym clustering on the noun class participles in the initial noun class participle set according to a preset synonym dictionary to generate the noun class participle set.

In a possible design, after performing word segmentation processing on the text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, the method further includes:

performing word frequency sequencing on the action class participles in the action class participle set to determine that the action class participles sequenced before a first position form a high-frequency action class participle set, wherein the first action class participle belongs to the high-frequency action class participle set;

and performing word frequency sequencing on the noun class participles in the noun class participle set to determine that the noun class participles sequenced before the second position form a high-frequency noun class participle set, wherein the first noun class participle belongs to the high-frequency noun class participle set.

In a second aspect, the present invention further provides an information extracting apparatus, including:

the text word segmentation module is used for performing word segmentation processing on a text to be processed according to a preset word segmentation tool so as to obtain an action class word segmentation set and a noun class word segmentation set, wherein a word segmentation dictionary used by the preset word segmentation tool comprises a scene dictionary, the scene dictionary comprises an action class dictionary and a noun class dictionary, and the scene dictionary is a dictionary determined according to a service scene type corresponding to the text to be processed;

a probability determination module, configured to determine a first probability value corresponding to the pointwise mutual information between a first action class participle and a first noun class participle, where the first action class participle belongs to the action class participle set, and the first noun class participle belongs to the noun class participle set;

and the information generation module is used for generating first intention information according to the first action class participle and the first noun class participle if the first probability value is greater than a preset probability threshold value.

In a possible design, the information extracting apparatus further includes:

a conversation acquisition module, configured to acquire a to-be-processed conversation, wherein the to-be-processed conversation is a conversation text between a customer service robot and a customer;

and the text extraction module is used for extracting the client text in the dialog to be processed to generate the text to be processed.

In one possible design, the text segmentation module is specifically configured to:

In a possible design, the information extracting apparatus further includes: the word frequency ordering module is specifically used for:

In a third aspect, an embodiment of the present invention further provides an electronic device, including:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform any one of the possible information extraction methods of the first aspect via execution of the executable instructions.

In a fourth aspect, an embodiment of the present invention further provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements any one of the possible information extraction methods in the first aspect.

The embodiments of the invention provide an information extraction method, an information extraction device, electronic equipment and a storage medium. A scene dictionary for a specific business scene is used as the word segmentation dictionary of the preset word segmentation tool, and the tool segments the text to be processed to obtain an action class word segmentation set and a noun class word segmentation set; using the scene dictionary makes the segmentation of text generated in that specific service scene more accurate. The probability value corresponding to the pointwise mutual information is then calculated for any combination of an action class participle and a noun class participle, and the highly correlated action-noun combinations are selected as the intention information of the user. New customer intentions can thus be discovered, which greatly facilitates the analysis of customer requirements and hotspots, reduces the cost of manual analysis and daily operation, and can greatly improve the answering and problem-solving ability of the robot.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a schematic diagram of an application scenario of an information extraction method according to an example embodiment of the present invention;

Fig. 2 is a flow diagram illustrating an information extraction method according to an example embodiment of the present invention;

Fig. 3 is a flow diagram illustrating an information extraction method according to another example embodiment of the present invention;

Fig. 4 is a flow diagram illustrating a manner in which a scene dictionary is determined according to an example embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an information extraction apparatus according to an example embodiment of the present invention;

Fig. 6 is a schematic structural diagram of an information extraction apparatus according to another example embodiment of the present invention;

Fig. 7 is a schematic structural diagram of an electronic device according to an example embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

With the spread of artificial intelligence, interactive robots are used more and more widely in the customer service industry, but in daily operation customer satisfaction with the robots' ability to answer questions and solve problems remains low. The customer service robot and the customer generate a large amount of unstructured text data during their interaction, and this data contains the customer's real feedback and requirements. At present, during daily maintenance of the robot, a large amount of manpower must be invested in analysis to extract useful information, so as to discover new user intentions and achieve service coverage by the customer service robot. The existing approach of extracting useful information through manual analysis is therefore inefficient, and when faced with massive data, manual analysis cannot meet actual business requirements.

In view of the above problems, embodiments of the present invention provide an information extraction method. A scene dictionary for a specific business scene is used as the word segmentation dictionary of the preset word segmentation tool, and the tool segments the text to be processed to obtain an action class word segmentation set and a noun class word segmentation set; using the scene dictionary makes the segmentation of text generated in that specific service scene more accurate. The probability value corresponding to the pointwise mutual information is then calculated for any combination of an action class participle and a noun class participle, and the highly correlated action-noun combinations are selected as the intention information of the user. New customer intentions can thus be discovered, which greatly facilitates the analysis of customer requirements and hotspots, reduces the cost of manual analysis and daily operation, and can greatly improve the answering and problem-solving ability of the robot.

Fig. 1 is a schematic diagram of an application scenario of an information extraction method according to an example embodiment of the present invention. As shown in fig. 1, the information extraction method provided by this embodiment may be applied to a robot client dialog scenario, especially a robot client dialog scenario for a network operator service scenario, for mining a user intention from a dialog between a robot and a client. Specifically, text information or voice information input by the client 100 may be uploaded to the server 200, and the customer service robot 300 performs a dialogue interaction with the client 100 through the server 200 to generate a to-be-processed dialogue, so as to extract intention information from the to-be-processed dialogue to dig out the intention of the user.

Fig. 2 is a flowchart illustrating an information extraction method according to an example embodiment of the present invention. As shown in fig. 2, the information extraction method provided in this embodiment includes:

Step 101, performing word segmentation processing on a text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set.

Specifically, word segmentation processing can be performed on the text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, wherein a word segmentation dictionary used by the preset word segmentation tool comprises a scene dictionary, the scene dictionary comprises an action class dictionary and a noun class dictionary, and the scene dictionary is a dictionary determined according to the service scene type corresponding to the text to be processed.

It should be noted that the preset word segmentation tool may be, for example, any one of the Jieba, SnowNLP, pkuseg, THULAC and HanLP word segmentation tools. When an existing word segmentation tool is used to segment a text to be processed, the general dictionary shipped with the tool is usually used. For the robot-customer dialogue scenario of this embodiment, and especially for that of a network operator's services, segmentation with a general dictionary is usually inaccurate. Moreover, because business class entries, activity class entries and action class entries are not distinguished, the analysis result is difficult for a machine to judge and must be analyzed and checked manually item by item, which wastes manpower and material resources, raises operating cost and lowers efficiency.

The scene dictionary can be a specialized, industry-specific dictionary formed by accumulating a sufficient number of business nouns, activity terms and action (verb) terms.

Fig. 4 is a flowchart illustrating a scene dictionary determination method according to an example embodiment of the present invention. As shown in Fig. 4, the scene dictionary may include an action class dictionary and a noun class dictionary, and the noun class dictionary may include a business class dictionary and an activity class dictionary. The business class dictionary may include package type entries, traffic type entries, value-added service type entries, and the like. The activity class dictionary may include official account activity type entries, holiday promotion type entries, maintenance activity type entries, and the like. When a user selects a scene (for example, a package opening scene, a data traffic complaint scene, a recharging scene, a promotional activity consultation scene, a points redemption scene, and the like), the corresponding entries and the action class dictionary are matched to form a scene dictionary (for example, scene dictionary 1 or scene dictionary 2) that fits the characteristics of the scene.
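The composition of a scene dictionary described above can be sketched as follows. This is only a minimal illustration: the entry lists, the scene names and the mapping from a scene to entry types are hypothetical placeholders, not data from this document.

```python
# Hypothetical entry lists; real dictionaries would be accumulated from operations data.
BUSINESS_ENTRIES = {
    "package": ["monthly package", "family package"],
    "traffic": ["data package", "data top-up"],
    "value_added": ["caller ring tone"],
}
ACTIVITY_ENTRIES = {
    "official_account": ["follow reward"],
    "holiday_promotion": ["new year discount"],
    "maintenance": ["loyalty bonus"],
}
ACTION_ENTRIES = ["open", "change", "cancel", "recharge", "redeem"]

# Assumed mapping from a selected scene to the entry types it draws on.
SCENE_PROFILES = {
    "package_opening": {"business": ["package", "traffic"], "activity": []},
    "promotion_consultation": {
        "business": [],
        "activity": ["holiday_promotion", "official_account"],
    },
}


def build_scene_dictionary(scene):
    """Assemble the action and noun dictionaries for one selected scene."""
    profile = SCENE_PROFILES[scene]
    business = [w for t in profile["business"] for w in BUSINESS_ENTRIES[t]]
    activity = [w for t in profile["activity"] for w in ACTIVITY_ENTRIES[t]]
    return {
        "action": set(ACTION_ENTRIES),
        "noun": {"business": set(business), "activity": set(activity)},
    }


scene_dictionary = build_scene_dictionary("package_opening")
```

Keeping business and activity entries in separate sets preserves the distinction between "action + business noun" and "action + activity noun" combinations for later analysis.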

In addition, the word segmentation dictionary used by the preset word segmentation tool can also comprise a scene dictionary and a general dictionary, so that the coverage range of the dictionary is further expanded, and the word segmentation accuracy of the text to be processed is further improved.
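To make Step 101 concrete, the sketch below segments a sentence with a dictionary and sorts the resulting tokens into action class and noun class lists. It uses forward maximum matching, a simple dictionary-based technique chosen here only for illustration; it stands in for a real word segmentation tool and is not the tool used by the embodiment. The function name and the sample words are assumptions.

```python
def segment(text, action_dict, noun_dict, general_dict=frozenset()):
    """Forward maximum matching over the union of the supplied dictionaries."""
    vocab = set(action_dict) | set(noun_dict) | set(general_dict)
    max_len = max((len(w) for w in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocab:
                tokens.append(candidate)
                i += size
                break
    actions = [t for t in tokens if t in action_dict]
    nouns = [t for t in tokens if t in noun_dict]
    return tokens, actions, nouns


# "我想开通流量包" means "I want to open a data package".
tokens, actions, nouns = segment(
    "我想开通流量包",
    action_dict={"开通", "更改"},   # open, change
    noun_dict={"流量包", "套餐"},   # data package, plan
)
```

Passing a non-empty `general_dict` corresponds to the variant in which the scene dictionary is combined with a general dictionary.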

Step 102, determining a first probability value corresponding to the pointwise mutual information between the first action class participle and the first noun class participle.

In this step, a first probability value corresponding to the pointwise mutual information between a first action class participle and a first noun class participle may be determined, where the first action class participle belongs to the action class participle set, and the first noun class participle belongs to the noun class participle set. The first action class participle may be any action class participle in the action class participle set, and similarly, the first noun class participle may be any noun class participle in the noun class participle set. This step therefore determines a first probability value corresponding to the pointwise mutual information for any pairwise combination of a participle from the action class participle set with a participle from the noun class participle set.

It should be understood that pointwise mutual information (PMI) measures the correlation between two things, for example between two words: it quantifies how much information the occurrence of one carries about the occurrence of the other. The specific calculation process is not limited in this embodiment.
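Since the calculation is left open, the following is only one possible sketch. It uses the standard definition PMI(a, n) = log(p(a, n) / (p(a) p(n))), with probabilities estimated as the fraction of sentences containing a word. Raw PMI is not confined to the range 0 to 1, whereas the example values used in this description (0.9, 0.2, threshold 0.8) look like bounded scores, so the sketch also returns normalized PMI (PMI divided by -log p(a, n), which lies in [-1, 1]). Using the normalized score as the "probability value" is an assumption, not something this document states.

```python
import math


def pmi_scores(sentences, action, noun):
    """Return (pmi, npmi) for an action/noun pair.

    `sentences` is a list of token lists; each probability is estimated as
    the fraction of sentences that contain the word (or both words).
    """
    total = len(sentences)
    n_action = sum(1 for s in sentences if action in s)
    n_noun = sum(1 for s in sentences if noun in s)
    n_both = sum(1 for s in sentences if action in s and noun in s)
    if total == 0 or n_both == 0:
        return float("-inf"), -1.0  # the two words never co-occur
    p_a, p_n, p_an = n_action / total, n_noun / total, n_both / total
    pmi = math.log(p_an / (p_a * p_n))
    if p_an == 1.0:
        # Degenerate case: NPMI is undefined (0/0), reported here as 0.0.
        return pmi, 0.0
    return pmi, pmi / -math.log(p_an)


corpus = [
    ["open", "data package"],
    ["open", "data package"],
    ["change", "plan"],
    ["query", "balance"],
]
pmi, npmi = pmi_scores(corpus, "open", "data package")
```

In this toy corpus "open" and "data package" always appear together, so the normalized score reaches its maximum, while a pair that never co-occurs receives the minimum.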

Step 103, if the first probability value is greater than a preset probability threshold, generating first intention information according to the first action class participle and the first noun class participle.

Daily operation analysis shows that customers often express their intention in the form "action + noun" (e.g., "action + business noun" or "action + activity noun"). Therefore, when a combination of an action class participle from the action class participle set and a noun class participle from the noun class participle set, both obtained from the text to be processed, has a high correlation, the combination can be determined as intention information of the customer in the text to be processed.

Specifically, a first probability value corresponding to the pointwise mutual information between the first action class participle and the first noun class participle is determined. If the first probability value is greater than a preset probability threshold, the first action class participle and the first noun class participle are highly correlated, and the first intention information can be generated from them.

For example, suppose the probability value corresponding to the pointwise mutual information of "open + data package" is 0.9, that of "change + data package" is 0.2, and the preset probability threshold is set to 0.8. Then "open + data package" is determined as intention information for the text to be processed and can be treated as a new intention of the customer, so that the corresponding service can subsequently be provided to the customer by the customer service robot or a human agent.
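The threshold comparison in this example can be sketched as below, using the figures given above (0.9, 0.2 and a threshold of 0.8). The function name, the English word labels and the "action + noun" output format are illustrative assumptions.

```python
def select_intents(pair_scores, threshold=0.8):
    """Keep action/noun pairs whose score exceeds the threshold, best first."""
    ranked = sorted(pair_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [
        f"{action} + {noun}"
        for (action, noun), score in ranked
        if score > threshold
    ]


scores = {
    ("open", "data package"): 0.9,
    ("change", "data package"): 0.2,
}
intents = select_intents(scores)
```

The comparison is strict (`>`), matching the wording "greater than a preset probability threshold".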

In this embodiment, a scene dictionary for a specific service scene is used as the word segmentation dictionary of the preset word segmentation tool, and the tool segments the text to be processed to obtain the action class word segmentation set and the noun class word segmentation set; using the scene dictionary makes the segmentation of text generated in that specific service scene more accurate. The probability value corresponding to the pointwise mutual information is then calculated for any combination of an action class participle and a noun class participle, and the highly correlated action-noun combinations are selected as the intention information of the user. New user intentions are thus discovered, which greatly facilitates the analysis of user requirements and hotspots, reduces the cost of manual analysis and daily operation, and can greatly improve the problem-solving ability of the robot.

Fig. 3 is a flowchart illustrating an information extraction method according to another example embodiment of the present invention. As shown in fig. 3, the information extraction method provided in this embodiment includes:

Step 201, obtaining a dialog to be processed.

Step 202, extracting the client text in the dialog to be processed to generate the text to be processed.

In the step, a to-be-processed dialog is obtained, wherein the to-be-processed dialog is a dialog text between the customer service robot and the customer, and then the customer text in the to-be-processed dialog is extracted to generate the to-be-processed text.

The pending conversation is a conversation text between the customer service robot and the customer, and the pending conversation may be as follows:

Customer: Hello, please help me look up XXX.

Customer service robot: Your XXX is XXX.

Customer: OK, when does the XXX activity start?

Customer service robot: XXX activity start time is XXXX.

For the above dialog text, it can be split into:

The customer part:

Customer: Hello, please help me look up XXX.

Customer: OK, when does the XXX activity start?

The customer service robot part:

Customer service robot: Your XXX is XXX.

Customer service robot: XXX activity start time is XXXX.

Since the information extraction method provided by this embodiment aims to mine intention information from sentences expressed by the customer, the customer service robot's part of the dialogue can be discarded, and only the customer text in the to-be-processed dialogue is extracted as the text to be processed. This reduces the amount of calculation and excludes noise data, which improves the accuracy of intention recognition. The extracted text to be processed is, for example:

Customer: Hello, please help me look up XXX.

Customer: OK, when does the XXX activity start?
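Steps 201 and 202 can be sketched as a simple filter over the dialogue lines. The sketch assumes each line has the form "Speaker: utterance" and that the customer is labeled "Customer"; both the format and the function name are assumptions made for illustration.

```python
def extract_customer_text(dialog_lines, customer_label="Customer"):
    """Return only the utterances spoken by the customer."""
    utterances = []
    for line in dialog_lines:
        speaker, sep, utterance = line.partition(":")
        # Lines without a speaker prefix, or spoken by the robot, are skipped.
        if sep and speaker.strip().lower() == customer_label.lower():
            utterances.append(utterance.strip())
    return utterances


dialog = [
    "Customer: Hello, please help me look up XXX.",
    "Customer service robot: Your XXX is XXX.",
    "Customer: OK, when does the XXX activity start?",
    "Customer service robot: XXX activity start time is XXXX.",
]
customer_text = extract_customer_text(dialog)
```

The speaker label is compared as a whole, so "Customer service robot" is not mistaken for "Customer".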

Step 203, performing word segmentation processing on the text to be processed according to a preset word segmentation tool to obtain an initial action class word segmentation set and an initial noun class word segmentation set.

Specifically, word segmentation processing may be performed on the text to be processed according to a preset word segmentation tool to obtain an initial action class word segmentation set and an initial noun class word segmentation set, where a word segmentation dictionary used by the preset word segmentation tool includes a scene dictionary, the scene dictionary includes an action class dictionary and a noun class dictionary, and the scene dictionary is a dictionary determined according to a service scene type corresponding to the text to be processed.

Step 204, clustering synonyms to generate an action class participle set and a noun class participle set.

When typing, customers often do not use standard words but a variety of colloquial expressions, and they frequently make typing errors. To improve the accuracy of intention extraction, the different expressions and misspellings used by customers can be clustered into one standard word, so that the actual requirements of the user can be analyzed.

Specifically, the action class participles in the initial action class participle set may be subjected to synonym clustering according to a preset synonym dictionary to generate an action class participle set, and the noun class participles in the initial noun class participle set may be subjected to synonym clustering according to the preset synonym dictionary to generate a noun class participle set.

For example, synonym clustering may be performed on the words in the following table, which are all aggregated into "change". The specific word list is as follows:

root-changing	Is changed off	Variations in	Handover	Modified by	Change to	Is modified into	Change
Become into	Rotating shaft	Regulating	Become	Is replaced by	Changes are made to	Changing of	Adjustment of
Replacement of	Turn back to	Instead, it is changed into	Exchange of	Improvement of	Change to	Updating	Changes are made to
Conversion	Transformation of	Is converted into	Become into	Mutual rotation	Variations in	Replacement of	Change of
Is turned into	Transformation of	Root-modifying	Transformation of	Improvement of	Changeable pipe	Conversion	Modifying
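The clustering in Step 204 amounts to mapping every variant to its standard word. A minimal sketch follows; the synonym dictionary shown is a small hypothetical sample, not the preset synonym dictionary of the embodiment.

```python
# Hypothetical synonym dictionary: each variant (including a typo) maps to a standard word.
SYNONYMS = {
    "modify": "change",
    "switch": "change",
    "replace": "change",
    "chagne": "change",  # misspelled variant
}


def cluster_synonyms(tokens, synonyms=SYNONYMS):
    """Replace each token with its standard form; unknown tokens pass through."""
    return [synonyms.get(token, token) for token in tokens]


clustered = cluster_synonyms(["modify", "open", "switch", "chagne"])
```

Applying the mapping before counting means that all variants contribute to the frequency of the single standard word.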

Step 205, performing word frequency ordering on the participles in the action class participle set and the noun class participle set.

In addition, since a customer usually repeats the relevant words several times when expressing an intention, after the action class participle set and the noun class participle set are determined, the participles and the clustered entries may be sorted by word frequency from high to low in order to reduce the amount of calculation and improve the accuracy of intention extraction. The sorting is performed separately for business class entries, activity class entries, action class entries and general dictionary words, so that association analysis can be carried out and the high-frequency intentions expressed by users can be found.

Specifically, the action class participles in the action class participle set may be sorted by word frequency, and the action class participles ranked before a first position (e.g., the top 5, top 10 or top 100) constitute the high-frequency action class participle set. Similarly, the noun class participles in the noun class participle set are sorted by word frequency, and the noun class participles ranked before a second position (e.g., the top 5, top 10 or top 100) constitute the high-frequency noun class participle set.
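The word frequency ordering can be sketched with the standard library `collections.Counter`. The function name and the sample tokens are illustrative; `top_n` plays the role of the first or second position.

```python
from collections import Counter


def high_frequency_set(tokens, top_n):
    """Return the top_n most frequent tokens, highest frequency first."""
    return [word for word, _ in Counter(tokens).most_common(top_n)]


actions = ["change", "open", "change", "cancel", "open", "change"]
top_actions = high_frequency_set(actions, 2)
```

The same function can be applied to the noun class participles with its own cut-off position.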

Step 206, determining a first probability value corresponding to the pointwise mutual information between the first action class participle and the first noun class participle.

In this step, a first probability value corresponding to the pointwise mutual information between the first action class participle and the first noun class participle may be determined, where the first action class participle belongs to the high-frequency action class participle set, and the first noun class participle belongs to the high-frequency noun class participle set. The first action class participle may be any action class participle in the high-frequency action class participle set, and similarly, the first noun class participle may be any noun class participle in the high-frequency noun class participle set. This step therefore determines a first probability value corresponding to the pointwise mutual information for any pairwise combination of an action class participle from the high-frequency action class participle set with a noun class participle from the high-frequency noun class participle set.
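Enumerating the pairwise combinations is a Cartesian product of the two high-frequency sets, which also shows why the frequency cut-off reduces computation: the number of pairs to score is the product of the two set sizes. The sketch below is illustrative only; the function name and sample words are assumptions.

```python
from itertools import product


def candidate_pairs(high_freq_actions, high_freq_nouns):
    """List every (action, noun) combination to be scored."""
    return list(product(high_freq_actions, high_freq_nouns))


pairs = candidate_pairs(["change", "open"], ["data package", "plan"])
```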

Step 207, if the first probability value is greater than the preset probability threshold, generating first intention information according to the first action class participle and the first noun class participle.

It is worth noting that, for the specific implementation of step 207 in this embodiment, reference may be made to the description of step 103 in the embodiment shown in fig. 2, which is not repeated here.
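The threshold comparison of step 207 can be sketched as below. How the intention information is actually assembled is described under step 103 and is not reproduced here, so joining the action and noun participles with a space is an assumption, as are the function name, the sample scores and the threshold value.

```python
def generate_intentions(pair_scores, threshold):
    """Keep pairs scoring above the threshold and join each into a phrase."""
    kept = [(pair, score) for pair, score in pair_scores.items() if score > threshold]
    # Strongest associations first, so the most likely intentions lead the list.
    kept.sort(key=lambda item: item[1], reverse=True)
    return [f"{action} {noun}" for (action, noun), _ in kept]


# Hypothetical scores for three action/noun combinations.
pair_scores = {
    ("handle", "broadband"): 0.42,
    ("cancel", "package"): 1.0,
    ("handle", "package"): -0.58,
}
intentions = generate_intentions(pair_scores, threshold=0.3)
```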

Fig. 5 is a schematic structural diagram of an information extraction apparatus according to an example embodiment of the present invention. As shown in fig. 5, the information extraction apparatus 300 according to the present embodiment includes:

the text word segmentation module 301 is configured to perform word segmentation on a to-be-processed text according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, where a word segmentation dictionary used by the preset word segmentation tool includes a scene dictionary, the scene dictionary includes an action class dictionary and a noun class dictionary, and the scene dictionary is a dictionary determined according to a service scene type corresponding to the to-be-processed text;

a probability determining module 302, configured to determine a first probability value corresponding to point mutual information between a first action class participle and a first noun class participle, where the first action class participle belongs to the action class participle set, and the first noun class participle belongs to the noun class participle set;

an information generating module 303, configured to generate first intention information according to the first action class participle and the first noun class participle if the first probability value is greater than a preset probability threshold.
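One way to picture how modules 301 to 303 cooperate is the sketch below, in which the segmentation and scoring steps are passed in as plain functions. The class name, the toy dictionaries and the toy score table are hypothetical stand-ins, not the preset word segmentation tool or the scoring of the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class InformationExtractionApparatus:
    segment: Callable[[str], Tuple[List[str], List[str]]]  # role of module 301
    score: Callable[[str, str], float]                     # role of module 302
    threshold: float                                       # used by module 303

    def extract(self, text: str) -> List[str]:
        """Segment the text, score each pair, keep pairs above the threshold."""
        actions, nouns = self.segment(text)
        intentions = []
        for action in actions:
            for noun in nouns:
                if self.score(action, noun) > self.threshold:
                    intentions.append(f"{action} {noun}")
        return intentions


# Toy stand-ins for the scene dictionary and the scoring step.
ACTION_DICT = {"handle", "cancel"}
NOUN_DICT = {"broadband", "package"}
TOY_SCORES = {("handle", "broadband"): 0.9}


def toy_segment(text):
    tokens = text.split()
    actions = [t for t in tokens if t in ACTION_DICT]
    nouns = [t for t in tokens if t in NOUN_DICT]
    return actions, nouns


def toy_score(action, noun):
    return TOY_SCORES.get((action, noun), 0.0)


apparatus = InformationExtractionApparatus(toy_segment, toy_score, threshold=0.5)
result = apparatus.extract("I want to handle broadband")
```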

On the basis of the embodiment shown in fig. 5, fig. 6 is a schematic structural diagram of an information extraction apparatus according to another exemplary embodiment of the present invention. As shown in fig. 6, the information extraction apparatus 300 according to the present embodiment further includes:

a dialog acquisition module 304, configured to acquire a to-be-processed dialog, where the to-be-processed dialog is a dialog text between a customer service robot and a client;

a text extracting module 305, configured to extract a client text in the to-be-processed dialog to generate the to-be-processed text.
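The work of modules 304 and 305 can be illustrated as follows. The dialog is assumed to be a list of (role, text) turns with the role labels "robot" and "client". Both the representation and the labels are assumptions made for this sketch.

```python
def extract_client_text(dialog, client_role="client"):
    """Keep only the client's turns and join them into the text to be processed."""
    client_turns = [text for role, text in dialog if role == client_role]
    return "\n".join(client_turns)


# Hypothetical dialog between a customer service robot and a client.
dialog = [
    ("robot", "Hello, how can I help you?"),
    ("client", "I want to handle broadband"),
    ("robot", "Which package would you like?"),
    ("client", "The 100M package"),
]
text_to_process = extract_client_text(dialog)
```

Dropping the robot's turns keeps scripted prompts from inflating word frequencies in the later steps.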

In one possible design, the text segmentation module 301 is specifically configured to:

In one possible design, the information extraction apparatus 300 further includes a word frequency ordering module 306, which is specifically configured to:

Each functional unit in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

It should be noted that the information extraction device provided in the embodiments shown in fig. 5 to 6 can be used to execute the information extraction method provided in the embodiments shown in fig. 2 to 3, and the specific implementation manner and the technical effect are similar, and are not described herein again.

Fig. 7 is a schematic structural diagram of an electronic device shown in accordance with an example embodiment of the present invention. As shown in fig. 7, the present embodiment provides an electronic device 400, including:

a processor 401; and

a memory 402 for storing executable instructions of the processor, where the memory may also be a flash memory;

wherein the processor 401 is configured to perform the steps of the above-described method via execution of the executable instructions. Reference may be made in particular to the description relating to the preceding method embodiment.

Optionally, the memory 402 may be separate from the processor 401 or integrated with it.

When the memory 402 is a device independent from the processor 401, the electronic device 400 may further include:

a bus 403 for connecting the processor 401 and the memory 402.

The present embodiment also provides a readable storage medium, in which a computer program is stored, and when at least one processor of the electronic device executes the computer program, the electronic device executes the methods provided by the above various embodiments.

The present embodiment also provides a program product comprising a computer program stored in a readable storage medium. The computer program can be read from a readable storage medium by at least one processor of the electronic device, and the execution of the computer program by the at least one processor causes the electronic device to implement the methods provided by the various embodiments described above.

Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, and magnetic or optical disks.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. An information extraction method, comprising:

2. The information extraction method according to claim 1, wherein before performing the word segmentation processing on the text to be processed according to the preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, the method comprises:

3. The information extraction method according to claim 1 or 2, wherein the noun class dictionary includes a business class dictionary and an activity class dictionary.

4. The information extraction method according to claim 1 or 2, wherein the performing word segmentation processing on the text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set comprises:

5. The information extraction method according to claim 4, wherein after performing word segmentation processing on the text to be processed according to a preset word segmentation tool to obtain an action class word segmentation set and a noun class word segmentation set, the method further comprises:

6. An information extraction apparatus characterized by comprising:

7. The information extraction apparatus according to claim 6, characterized by further comprising:

8. The information extraction apparatus according to claim 6 or 7, wherein the noun class dictionary includes a business class dictionary and an activity class dictionary.

9. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the information extraction method of any one of claims 1 to 5 via execution of the executable instructions.

10. A storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the information extraction method of any one of claims 1 to 5.