CN111985206A - Corpus understanding method and equipment - Google Patents

Corpus understanding method and equipment

Info

Publication number
CN111985206A
Authority
CN
China
Prior art keywords
corpus
understanding
intention
condition
information
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010691228.5A
Other languages
Chinese (zh)
Inventor
孙佳
宋鸣
张东海
陈红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Application filed by Lenovo Beijing Ltd
Priority to CN202010691228.5A
Publication of CN111985206A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a corpus understanding method and equipment. The method comprises: obtaining corpus samples that represent user intentions; performing stem extraction on the corpus samples through a syntactic analyzer to obtain stem samples, which represent the stem information in the corpus samples; training a classification model on the stem samples to obtain a first understanding model; and training a semantic understanding model on the corpus samples to obtain a second understanding model. When a specified corpus meets a first condition, the first understanding model performs intention understanding on it to obtain stem intention information; when the specified corpus meets a second condition, the second understanding model performs intention understanding on it to obtain corpus intention information. The first condition and the second condition are different.

Description

Corpus understanding method and equipment
Technical Field
The invention relates to the technical field of corpus processing, in particular to a corpus understanding method and device.
Background
Semantic understanding generally refers to parsing a corpus into structured, machine-readable intent and slot information, so that a machine can better understand it and meet the user's needs. Existing corpus understanding models generally judge a user's intention from the corpus the user inputs. However, input habits differ from person to person: when users habitually type short sentences that omit the non-stem components of a sentence, existing semantic understanding models cannot obtain enough information, and understanding errors easily occur.
Disclosure of Invention
The embodiments of the invention provide a corpus understanding method and equipment that improve the accuracy of corpus understanding.
In one aspect, an embodiment of the present invention provides a corpus understanding method, including: obtaining corpus samples that represent user intentions; performing stem extraction on the corpus samples through a syntactic analyzer to obtain stem samples, which represent the stem information in the corpus samples; training a classification model on the stem samples to obtain a first understanding model; and training a semantic understanding model on the corpus samples to obtain a second understanding model. When a specified corpus meets a first condition, the first understanding model performs intention understanding on it to obtain stem intention information; when the specified corpus meets a second condition, the second understanding model performs intention understanding on it to obtain corpus intention information. The first condition and the second condition are different.
In one embodiment, the first condition is a specific length range; correspondingly, the method further comprises: obtaining a specified corpus; and, when the specified corpus is judged to fall within the specific length range, performing intention understanding on the specified corpus through the first understanding model to obtain first stem intention information, which is one kind of stem intention information.
In one embodiment, the method further comprises: when the specified corpus is judged not to fall within the specific length range, performing stem extraction on the specified corpus through the syntactic analyzer to obtain a stem text; and performing intention understanding on the stem text through the first understanding model to obtain second stem intention information, which is also one kind of stem intention information.
In one embodiment, the second condition is a non-specific length range; correspondingly, the method further comprises: when the specified corpus is judged to fall within the non-specific length range, performing intention understanding on the specified corpus through the second understanding model to obtain first corpus intention information, which is one kind of corpus intention information.
In one embodiment, the second condition is that the specified corpus has been processed by the first understanding model; correspondingly, the method further comprises: after the specified corpus is judged to have been processed by the first understanding model, processing the specified corpus through the second understanding model to obtain second corpus intention information, which is one kind of corpus intention information.
In one embodiment, the method further comprises: integrating the corpus intention information and the stem intention information to obtain first specified intention information corresponding to the specified corpus.
In one embodiment, the method further comprises: training an intention understanding model on stem intention information samples corresponding to the stem samples and corpus intention information samples corresponding to the corpus samples to obtain a third understanding model; the third understanding model predicts from the corpus intention information and the stem intention information to obtain first specified intention information corresponding to the specified corpus.
In another aspect, an embodiment of the present invention provides a corpus understanding device, including: an obtaining module, configured to obtain corpus samples that represent user intentions; an extraction module, configured to perform stem extraction on the corpus samples through a syntactic analyzer to obtain stem samples, which represent the stem information in the corpus samples; and a training module, configured to train a classification model on the stem samples to obtain a first understanding model, and to train a semantic understanding model on the corpus samples to obtain a second understanding model. When a specified corpus meets a first condition, the first understanding model performs intention understanding on it to obtain stem intention information; when the specified corpus meets a second condition, the second understanding model performs intention understanding on it to obtain corpus intention information. The first condition and the second condition are different.
In one embodiment, the first condition is a specific length range; correspondingly, the obtaining module is further configured to obtain the specified corpus, and the device further comprises an understanding module, configured to perform intention understanding on the specified corpus through the first understanding model when the specified corpus is judged to fall within the specific length range, to obtain first stem intention information, which is one kind of stem intention information.
In one embodiment, the extraction module is further configured to perform stem extraction on the specified corpus through the syntactic analyzer when the specified corpus is judged not to fall within the specific length range, to obtain a stem text; and the understanding module is configured to perform intention understanding on the stem text through the first understanding model to obtain second stem intention information, which is one kind of stem intention information.
In one embodiment, the second condition is a non-specific length range; correspondingly, the understanding module is further configured to perform intention understanding on the specified corpus through the second understanding model when the specified corpus is judged to fall within the non-specific length range, to obtain first corpus intention information, which is one kind of corpus intention information.
In one embodiment, the second condition is that the specified corpus has been processed by the first understanding model; correspondingly, the device further comprises a processing module, configured to process the specified corpus through the second understanding model after the specified corpus is judged to have been processed by the first understanding model, to obtain second corpus intention information, which is one kind of corpus intention information.
In one embodiment, the device further comprises an integration module, configured to integrate the corpus intention information and the stem intention information to obtain first specified intention information corresponding to the specified corpus.
In one embodiment, the training module is further configured to train an intention understanding model on stem intention information samples corresponding to the stem samples and corpus intention information samples corresponding to the corpus samples, to obtain a third understanding model; the third understanding model predicts from the corpus intention information and the stem intention information to obtain first specified intention information corresponding to the specified corpus.
With this corpus understanding method and device, a first understanding model and a second understanding model are trained, and the model suited to a specified corpus is selected from the two according to which condition the specified corpus meets. Performing intention understanding with the selected model improves how well the specified corpus is understood and yields more accurate intention information.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 is a schematic flow chart illustrating an implementation of a corpus understanding method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating an implementation flow of intent understanding of a corpus understanding method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating an implementation flow of intent understanding of a corpus understanding method according to another embodiment of the present invention;
Fig. 4 is a schematic diagram of the modules of a corpus understanding device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart illustrating an implementation of a corpus understanding method according to an embodiment of the present invention.
Referring to Fig. 1, in one aspect, an embodiment of the present invention provides a corpus understanding method, including: operation 101, obtaining corpus samples that represent user intentions; operation 102, performing stem extraction on the corpus samples through a syntactic analyzer to obtain stem samples, which represent the stem information in the corpus samples; operation 103, training a classification model on the stem samples to obtain a first understanding model; and operation 104, training a semantic understanding model on the corpus samples to obtain a second understanding model. When a specified corpus meets a first condition, the first understanding model performs intention understanding on it to obtain stem intention information; when the specified corpus meets a second condition, the second understanding model performs intention understanding on it to obtain corpus intention information. The first condition and the second condition are different.
With the corpus understanding method provided by this embodiment, a first understanding model and a second understanding model are trained, and the model suited to the specified corpus is selected from the two according to which condition the specified corpus meets. Using the selected model for intention understanding improves how well the specified corpus is understood and yields more accurate intention information. The scheme is suitable for intelligent customer service: by selecting the appropriate model, more accurate intention information can be obtained however the user phrases a request.
In operation 101, the corpus samples may come from a database or be written manually or by a machine. A corpus sample may contain only stem components, only non-stem components, or both, and the method can use any of these types as corpus samples. The corpus samples represent user intentions, specifically a user's questions, instructions and similar content. When the method is applied to a particular field, such as express delivery, smart home or intelligent customer service, the corpus samples can be tailored to that field.
In operation 102, the syntactic analyzer identifies the stem components of a corpus sample so that the stem can be extracted to obtain a stem sample. The syntactic analyzer is mainly used for stem extraction on corpus samples, in particular on corpus samples that contain non-stem components. The method may extract the stems of all corpus samples through the syntactic analyzer to obtain stem samples. Alternatively, the method may first judge whether each corpus sample contains non-stem components and extract stems, through the syntactic analyzer, only from the samples judged to contain them. The length of a corpus sample can serve as the basis for this judgment: when the length of a corpus sample exceeds a set length threshold, the corpus sample is determined to contain non-stem components. A corpus sample containing non-stem components may include stem information as well as auxiliary words, modal particles, adjectives and the like, whereas a stem sample contains only stem information. The stem information includes at least a predicate and an object, and may also include a subject. For example, when the corpus sample is "I need to buy some milk", the stem sample obtained after analysis by the syntactic analyzer is "I buy milk" or "buy milk".
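The patent does not say which syntactic analyzer is used or exactly how stem components are chosen. The following minimal sketch assumes the analyzer has already produced a dependency parse with spaCy/Universal-Dependencies-style labels (`nsubj`, `dobj`, `ROOT`). The `Token` class, `extract_stem` and the hand-written parse are hypothetical. The sketch keeps the object, the verb that governs it and, optionally, the subject, which reproduces the milk example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    text: str
    dep: str   # dependency label (assumed spaCy/UD style: "nsubj", "dobj", "ROOT", ...)
    head: int  # index of the head token; the root token points to itself


def extract_stem(tokens: List[Token], include_subject: bool = True) -> str:
    """Keep only subject, predicate and object; drop auxiliaries, determiners, etc."""
    obj = next((i for i, t in enumerate(tokens) if t.dep in ("dobj", "obj")), None)
    if obj is not None:
        pred = tokens[obj].head  # the verb that governs the object
    else:
        pred = next(i for i, t in enumerate(tokens) if t.dep == "ROOT")

    # Climb from the predicate towards the root until a subject is found.
    subj = None
    node = pred
    while include_subject:
        subj = next((i for i, t in enumerate(tokens)
                     if t.dep == "nsubj" and t.head == node), None)
        if subj is not None or tokens[node].head == node:
            break
        node = tokens[node].head

    keep = sorted(i for i in (subj, pred, obj) if i is not None)
    return " ".join(tokens[i].text for i in keep)


# Hand-written parse of "I need to buy some milk" (hypothetical analyzer output).
parse = [
    Token("I", "nsubj", 1),
    Token("need", "ROOT", 1),
    Token("to", "aux", 3),
    Token("buy", "xcomp", 1),
    Token("some", "det", 5),
    Token("milk", "dobj", 3),
]
print(extract_stem(parse))                         # I buy milk
print(extract_stem(parse, include_subject=False))  # buy milk
```

Climbing from `buy` to `need` is what recovers the subject "I", because in this parse the subject attaches to the main verb rather than to the verb that takes the object.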
In operation 103, the classification model is trained on the stem samples so that it can recognize the intention information corresponding to the stem samples, yielding the first understanding model. The first understanding model can therefore accurately analyze a corpus that contains only stem information and obtain the corresponding stem intention information. Compared with an understanding model trained on the corpus samples, the first understanding model can determine the corresponding stem intention information even when the input carries little information. The classification model can be any classification model suitable for the semantic understanding field.
In operation 104, the semantic understanding model is trained on all the corpus samples so that it can recognize the intention information corresponding to the corpus samples, yielding the second understanding model. The second understanding model can therefore accurately analyze the various types of corpus samples and obtain the corresponding corpus intention information. Alternatively, before training the semantic understanding model, the method may classify the corpus samples through the syntactic analyzer, screen out the corpus samples that contain only stem information, and train the semantic understanding model only on the corpus samples that contain non-stem components. The semantic understanding model then learns the corpus intention information of such samples, which makes the first and second understanding models more complementary. The semantic understanding model may be any kind of model, such as a DSSM, LSTM-DSSM, RNN or LSTM model. In addition, the first understanding model and the second understanding model may be of the same kind or of different kinds.
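Operations 103 and 104 can be illustrated with one toy classifier trained twice: on the stem samples (first understanding model) and on the full corpus samples (second understanding model). This sketch uses a from-scratch multinomial Naive Bayes model with add-one smoothing purely as a stand-in. It is not the patent's classification model or a DSSM/LSTM model, and the class name and training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesIntentModel:
    """Toy multinomial Naive Bayes intent classifier with add-one (Laplace) smoothing."""

    def fit(self, texts, intents):
        self.word_counts = defaultdict(Counter)  # intent -> word frequencies
        self.intent_counts = Counter(intents)    # intent -> number of samples
        for text, intent in zip(texts, intents):
            self.word_counts[intent].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        words = [w for w in text.lower().split() if w in self.vocab]  # ignore unseen words
        total = sum(self.intent_counts.values())
        best, best_score = None, -math.inf
        for intent, n in self.intent_counts.items():
            counts = self.word_counts[intent]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total) + sum(math.log((counts[w] + 1) / denom) for w in words)
            if score > best_score:
                best, best_score = intent, score
        return best


# Hypothetical training data: corpus samples, their stems and their intents.
corpus_samples = [
    "I need to buy some milk please",
    "could you check where my parcel is now",
    "please order two bottles of water for me",
    "can you tell me where my package is",
]
stem_samples = ["buy milk", "check parcel", "order water", "track package"]
intents = ["shopping", "track_parcel", "shopping", "track_parcel"]

first_model = NaiveBayesIntentModel().fit(stem_samples, intents)     # operation 103
second_model = NaiveBayesIntentModel().fit(corpus_samples, intents)  # operation 104
print(first_model.predict("buy milk"))             # shopping
print(second_model.predict("where is my parcel"))  # track_parcel
```

Using one model family for both models is consistent with the text, which allows the two models to be of the same kind or of different kinds.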
The specified corpus is the corpus whose intention is to be understood by the method. It may come from a user, a smart home appliance, or a terminal used by the user. When the specified corpus is obtained, the method first analyzes it to judge whether it meets the first condition and the second condition. The first condition concerns whether the specified corpus contains only stem components, so when the specified corpus meets the first condition, the first understanding model can perform intention understanding on it to obtain stem intention information. The second condition may concern whether the specified corpus contains non-stem components, or may be any other condition, as long as it differs from the first condition. When the second condition is met, the second understanding model can perform intention understanding on the specified corpus to obtain corpus intention information. In one case, the first condition and the second condition are mutually exclusive: when the first condition is met, the second condition is not. Then, when the first condition is met, the stem intention information output by the first understanding model is the intention information corresponding to the specified corpus, and when the second condition is met, the corpus intention information output by the second understanding model is the intention information corresponding to the specified corpus. In another case, the first condition and the second condition are independent: when the first condition is met, the second condition may or may not be met.
In this case, when the specified corpus meets both conditions, the stem intention information and the corpus intention information may be further processed to determine the intention information corresponding to the specified corpus.
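The two relationships between the conditions described above, mutually exclusive or independent, can be sketched by treating each condition as a predicate and running every model whose predicate holds. The function and the stand-in predicates and models below are hypothetical. When both results come back, a further processing step is still needed to pick between them or merge them.

```python
from typing import Callable, Dict

Model = Callable[[str], str]
Condition = Callable[[str], bool]


def run_conditional_models(corpus: str,
                           first_condition: Condition, first_model: Model,
                           second_condition: Condition, second_model: Model) -> Dict[str, str]:
    """Run each model whose condition the corpus meets; return the outputs by kind."""
    results = {}
    if first_condition(corpus):
        results["stem_intent"] = first_model(corpus)
    if second_condition(corpus):
        results["corpus_intent"] = second_model(corpus)
    return results


# Hypothetical stand-ins for the two conditions and the two trained models.
is_short = lambda c: len(c.split()) <= 4
is_long = lambda c: not is_short(c)   # mutually exclusive with is_short
any_text = lambda c: True             # independent of is_short
stem_model = lambda c: "shopping" if "buy" in c.split() else "unknown"
corpus_model = lambda c: "shopping" if "milk" in c.split() else "chat"

print(run_conditional_models("buy milk", is_short, stem_model, is_long, corpus_model))
# {'stem_intent': 'shopping'}
print(run_conditional_models("buy milk", is_short, stem_model, any_text, corpus_model))
# {'stem_intent': 'shopping', 'corpus_intent': 'shopping'}
```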
Fig. 2 is a schematic diagram illustrating an implementation flow of intent understanding of a corpus understanding method according to an embodiment of the present invention.
Referring to Fig. 2, in this embodiment of the present invention, the first condition is a specific length range; correspondingly, the method further comprises: operation 201, obtaining a specified corpus; and operation 202, when the specified corpus is judged to fall within the specific length range, performing intention understanding on it through the first understanding model to obtain first stem intention information, which is one kind of stem intention information.
In general, a corpus containing only the stem structure is a short sentence, while a corpus that also contains non-stem components is a longer sentence. The method therefore uses sentence length as the criterion for judging whether a corpus is a stem; that is, the first condition is set as a specific length range. The specific length range can be determined from statistics on the sentence lengths of the stem samples: for example, if 80% of the stem samples are no longer than 10 words, the specific length range can be set to 10 words or fewer.
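The following minimal sketch derives the specific length range from stem-sample statistics. It assumes a nearest-rank percentile and counts length in whitespace-separated words; for Chinese text, characters would be the more natural unit. Function names and samples are hypothetical. With `coverage=0.8`, the bound is the smallest length that covers at least 80% of the stem samples.

```python
import math
from typing import List


def specific_length_upper_bound(stem_samples: List[str], coverage: float = 0.8) -> int:
    """Smallest length L such that at least `coverage` of the stem samples have length <= L
    (nearest-rank percentile). Length is counted in whitespace-separated words."""
    lengths = sorted(len(s.split()) for s in stem_samples)
    rank = max(1, math.ceil(coverage * len(lengths)))
    return lengths[rank - 1]


def in_specific_length_range(text: str, upper_bound: int) -> bool:
    """First condition: the corpus is no longer than the upper bound."""
    return len(text.split()) <= upper_bound


# Hypothetical stem samples with word counts 1, 2, 2, 3, 3, 3, 4, 4, 4, 9.
samples = [" ".join(["w"] * n) for n in (1, 2, 2, 3, 3, 3, 4, 4, 4, 9)]
bound = specific_length_upper_bound(samples)
print(bound)                                             # 4
print(in_specific_length_range("buy some milk", bound))  # True
print(in_specific_length_range("I would really like you to buy some milk", bound))  # False
```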
The specified corpus may be obtained as speech or as text; when the device collects speech, it converts the speech into text to obtain the specified corpus. The sentence length of the specified corpus is then determined. If it falls within the specific length range, the specified corpus is regarded as a corpus with only a stem structure, and the first understanding model performs intention understanding on it to obtain first stem intention information that reflects the intention more accurately. Here, first stem intention information refers to the intention information that the first understanding model predicts for the specified corpus when the first condition is a specific length range; it is one kind of stem intention information.
Fig. 3 is a schematic diagram illustrating an implementation process of intent understanding of a corpus understanding method according to another embodiment of the present invention.
Referring to Fig. 3, in one embodiment, the method further comprises: operation 301, when the specified corpus is judged not to fall within the specific length range, performing stem extraction on it through the syntactic analyzer to obtain a stem text; and operation 302, performing intention understanding on the stem text through the first understanding model to obtain second stem intention information, which is one kind of stem intention information.
In the method, when the specified corpus does not fall within the specific length range, the syntactic analyzer extracts its stem to obtain the corresponding stem text, and the first understanding model then performs intention understanding on the stem text to obtain second stem intention information. Note that "first" and "second" stem intention information only distinguish the predictions the first understanding model produces in different scenarios. Both are stem intention information, and they may be the same or different. In addition, because the text obtained after stem extraction by the syntactic analyzer is clearly a stem text, the method may skip checking whether it falls within the specific length range and perform intention understanding on it directly through the first understanding model.
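The branching in Figs. 2 and 3 can be sketched as a single function. Corpora within the specific length range go straight to the first understanding model, and longer ones are reduced to a stem text first. The model and the stem extractor are passed in as callables. The keyword-based stand-ins in the usage part are hypothetical and are not the patent's trained model or syntactic analyzer.

```python
from typing import Callable, Tuple

Model = Callable[[str], str]


def understand_stem_intent(corpus: str, first_model: Model,
                           extract_stem: Callable[[str], str],
                           max_len: int) -> Tuple[str, str]:
    """Return ("first", intent) if the corpus is used as-is, ("second", intent) after stem extraction."""
    if len(corpus.split()) <= max_len:        # operation 202: within the specific length range
        return "first", first_model(corpus)
    stem_text = extract_stem(corpus)          # operation 301: stem extraction by the analyzer
    return "second", first_model(stem_text)   # operation 302: no length re-check on the stem text


# Hypothetical stand-ins: a keyword "model" and a word-filter "syntactic analyzer".
def toy_first_model(text: str) -> str:
    return "shopping" if "buy" in text.split() else "unknown"


def toy_extract_stem(text: str) -> str:
    return " ".join(w for w in text.split() if w.lower() in {"i", "buy", "milk"})


print(understand_stem_intent("buy milk", toy_first_model, toy_extract_stem, 4))
# ('first', 'shopping')
print(understand_stem_intent("I really need you to go and buy some milk",
                             toy_first_model, toy_extract_stem, 4))
# ('second', 'shopping')
```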
In one embodiment, the second condition is a non-specific length range; correspondingly, the method further comprises: when the specified corpus falls within the non-specific length range, performing intention understanding on it through the second understanding model to obtain first corpus intention information, which is one kind of corpus intention information.
Note that the non-specific length in the second condition may be understood either as any length, or as every length outside the specific length range corresponding to the first understanding model. When it is understood as any length, every specified corpus obtained undergoes intention understanding through the second understanding model to obtain first corpus intention information. This is then integrated with the stem intention information obtained in the other embodiments to yield the intention information corresponding to the specified corpus.
When the second condition is understood as every length outside the specific length range, the second understanding model predicts the specified corpus to obtain first corpus intention information whenever the specified corpus is judged not to meet the first condition. The method may then directly take the first corpus intention information as the intention information corresponding to the specified corpus. Alternatively, the method may also extract the stem of the specified corpus through the syntactic analyzer, perform intention understanding on the extracted stem text through the first understanding model to obtain second stem intention information, and analyze and integrate the first corpus intention information and the second stem intention information to obtain the intention information corresponding to the specified corpus.
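Under the second reading, in which the second condition means every length outside the specific length range, the flow can be sketched as follows. The optional stem-extraction and integration steps correspond to the alternative in the last sentence above. All names are hypothetical, as is the integration rule in which a recognized stem intent overrides the corpus intent.

```python
from typing import Callable, Optional

Model = Callable[[str], str]


def understand_by_length(corpus: str, first_model: Model, second_model: Model,
                         max_len: int,
                         extract_stem: Optional[Callable[[str], str]] = None,
                         integrate: Optional[Callable[[str, str], str]] = None) -> str:
    """Second condition read as 'every length outside the specific length range'."""
    if len(corpus.split()) <= max_len:
        return first_model(corpus)                   # first condition: stem intention
    corpus_intent = second_model(corpus)             # first corpus intention information
    if extract_stem is None or integrate is None:
        return corpus_intent                         # used directly as the final intent
    stem_intent = first_model(extract_stem(corpus))  # second stem intention information
    return integrate(corpus_intent, stem_intent)


# Hypothetical stand-ins for the trained models, the analyzer and the integration rule.
first = lambda t: "shopping" if "buy" in t.split() else "unknown"
second = lambda t: "chat"  # pretend the second model misreads this request
extract = lambda t: " ".join(w for w in t.split() if w in {"buy", "milk"})
prefer_stem = lambda c, s: s if s != "unknown" else c

long_request = "could you please buy some milk for me"
print(understand_by_length("buy milk", first, second, 4))                          # shopping
print(understand_by_length(long_request, first, second, 4))                        # chat
print(understand_by_length(long_request, first, second, 4, extract, prefer_stem))  # shopping
```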
In one embodiment, the second condition is processed by the first understanding model; correspondingly, the method further comprises the following steps: firstly, after judging that the result of the specified corpus is processed by the first understanding model, processing the specified corpus by the second understanding model to obtain second corpus intention information; then, the second corpus intent information is one of the corpus intent information.
In another case, the second condition may be that the specified corpus has been processed by the first understanding model. In this case, every specified corpus is processed in turn by the first understanding model and then by the second understanding model to obtain the stem intention information and the second corpus intention information, and the two are integrated to obtain the intention information corresponding to the specified corpus.
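For the case in which the second condition is that the first understanding model has already processed the corpus, every specified corpus passes through both models in turn. A minimal sketch, assuming hypothetical model callables, an 8-character specific length range, and that (as in the express delivery scenario) only corpora longer than that range are reduced to their stem before reaching the first model:

```python
from typing import Callable, Tuple

# Hypothetical stand-in: a "model" maps a text to an intent label.
Model = Callable[[str], str]


def sequential_understanding(
    text: str,
    stem_model: Model,
    corpus_model: Model,
    extract_stem: Callable[[str], str],
    integrate: Callable[[str, str], str],
    max_len: int = 8,  # assumed specific length range
) -> Tuple[str, str, str]:
    """Run the first model, then the second, then integrate both results."""
    # First understanding model: short corpora go in as-is, longer ones are
    # reduced to their stem text first.
    stem_input = text if len(text) <= max_len else extract_stem(text)
    stem_intent = stem_model(stem_input)
    # The second condition ("processed by the first model") now holds, so the
    # second understanding model always runs on the full corpus.
    corpus_intent = corpus_model(text)
    return stem_intent, corpus_intent, integrate(stem_intent, corpus_intent)
```

Unlike a purely length-based split, short corpora here also reach the second model, so both kinds of intention information are always available for integration.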
In yet another case, the second condition may be that the specified corpus has not been processed by the first understanding model. That is, when the specified corpus does not satisfy the first condition, it is processed by the second understanding model, and the corpus intention information obtained from the second understanding model is taken as the intention information.
In an embodiment, the method further comprises: integrating the corpus intention information and the stem intention information to obtain first specified intention information corresponding to the specified corpus.
Specifically, after intention understanding is completed through the first understanding model and the second understanding model, the corpus intention information and the stem intention information may be integrated to obtain the first specified intention information. As an alternative to integration, a confidence judgment may be performed on the corpus intention information and the stem intention information, and whichever has the higher confidence is taken as the intention information.
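The confidence-based alternative amounts to keeping whichever result is more confident. The sketch below assumes, hypothetically, that each model returns a label together with a confidence score in [0, 1]; the description does not say how confidence is computed or how ties are broken, so the tie rule here is an arbitrary choice.

```python
from typing import NamedTuple


class IntentResult(NamedTuple):
    """Hypothetical model output: an intent label plus a confidence score."""

    label: str
    confidence: float  # assumed to lie in [0, 1]


def select_by_confidence(corpus_intent: IntentResult, stem_intent: IntentResult) -> IntentResult:
    """Keep the more confident of the two results.

    Ties go to the corpus intention information (an assumption).
    """
    if corpus_intent.confidence >= stem_intent.confidence:
        return corpus_intent
    return stem_intent
```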
In an embodiment, the method further comprises: training an intention understanding model according to stem intention information samples corresponding to the stem samples and corpus intention information samples corresponding to the corpus samples, to obtain a third understanding model; the third understanding model is used to predict, from the corpus intention information and the stem intention information, the first specified intention information corresponding to the specified corpus.
Specifically, the corpus intention information and the stem intention information may be integrated by the third understanding model. Because the third understanding model is trained on the stem intention information samples corresponding to the stem samples and the corpus intention information samples corresponding to the corpus samples, it integrates the two kinds of intention information and outputs the first specified intention information corresponding to the specified corpus.
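The third understanding model learns a mapping from a (stem intention, corpus intention) pair to the final intention. The description does not name a model type, so the sketch below swaps in a deliberately simple count-based vote table as a stand-in. The class name PairVoteFusion, the labelled training triples, and the fallback to the corpus intention for unseen pairs are all assumptions.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Tuple


class PairVoteFusion:
    """Toy stand-in for the third understanding model (hypothetical)."""

    def __init__(self) -> None:
        # (stem intent, corpus intent) -> counts of the labelled final intent
        self._table: DefaultDict[Tuple[str, str], Counter] = defaultdict(Counter)

    def fit(self, samples: Iterable[Tuple[str, str, str]]) -> "PairVoteFusion":
        """Each sample is (stem intent, corpus intent, target intent)."""
        for stem_intent, corpus_intent, target in samples:
            self._table[(stem_intent, corpus_intent)][target] += 1
        return self

    def predict(self, stem_intent: str, corpus_intent: str) -> str:
        """Return the most frequent target seen for this pair of intents."""
        counts = self._table.get((stem_intent, corpus_intent))
        if not counts:
            # Unseen pair: fall back to the corpus intention (an assumption).
            return corpus_intent
        return counts.most_common(1)[0][0]
```

A trained classifier over the two intents (or over their probability vectors) would normally replace the vote table; the table only makes the inputs and output of the third model concrete.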
To aid understanding of the above embodiments, several specific implementation scenarios are described below. In one scenario, the method is applied to the intelligent customer service of a store and is trained on corpus samples related to shopping in the store. First, the stems of all corpora are extracted through a syntactic analyzer to obtain stem samples, and a classification model is trained on the stem samples to obtain the first understanding model. A semantic understanding model is then trained on the corpus samples to obtain the second understanding model. The first condition is that the sentence length of the specified corpus does not exceed 8 characters, and the second condition is that the sentence length of the specified corpus exceeds 8 characters.
In operation, when the collected voice information of the user is "Do you have milk in your store? I want to buy some milk.", the intelligent customer service determines that its length exceeds 8 characters, inputs it into the second understanding model for intention understanding, and obtains the intention understanding information corresponding to "buy milk". When the collected voice information of the user is "buy milk", the intelligent customer service determines that its length does not exceed 8 characters, inputs it into the first understanding model for intention understanding, and obtains the intention understanding information corresponding to "buy milk".
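The store scenario can be traced end to end with toy components. Everything in this sketch is a hypothetical stand-in except the 8-character threshold: extract_stem imitates the syntactic analyzer by dropping an assumed list of filler words, KeywordClassifier is a bag-of-words word-count scorer used in place of both the classification model and the semantic understanding model, and the training sentences and labels are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical filler words; a real syntactic analyzer would use a parse tree.
FILLER = {"do", "you", "have", "in", "your", "store", "i", "want", "to", "some"}


def tokens(text: str) -> List[str]:
    """Lower-case whitespace tokens with surrounding punctuation removed."""
    return [w.strip("?.,!").lower() for w in text.split() if w.strip("?.,!")]


def extract_stem(text: str) -> str:
    """Toy stem extraction: keep only the non-filler words."""
    return " ".join(w for w in tokens(text) if w not in FILLER)


class KeywordClassifier:
    """Toy bag-of-words classifier standing in for a trained model."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def fit(self, samples: List[Tuple[str, str]]) -> "KeywordClassifier":
        for text, label in samples:
            self.counts[label].update(tokens(text))
        return self

    def predict(self, text: str) -> str:
        words = tokens(text)
        scores = {lab: sum(c[w] for w in words) for lab, c in self.counts.items()}
        return max(scores, key=scores.get)


# Invented shopping corpus samples: (corpus, intent label).
corpus_samples = [
    ("Do you have milk in your store? I want to buy some milk.", "buy_milk"),
    ("I want to buy some bread.", "buy_bread"),
    ("Where is the checkout?", "ask_checkout"),
]
stem_samples = [(extract_stem(t), y) for t, y in corpus_samples]
first_model = KeywordClassifier().fit(stem_samples)     # trained on stems
second_model = KeywordClassifier().fit(corpus_samples)  # trained on full corpora


def understand(text: str) -> str:
    # First condition: at most 8 characters -> first model; otherwise second.
    return first_model.predict(text) if len(text) <= 8 else second_model.predict(text)
```

With these invented samples, understand("buy milk") takes the first-model path because the string is exactly 8 characters long, while the full question exceeds the threshold and goes to the second model; both resolve to buy_milk.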
In another scenario, the method is applied to intelligent customer service in the express delivery field and is trained on corpus samples related to express delivery. First, the stems of all corpora are extracted through a syntactic analyzer to obtain stem samples, and a classification model is trained on the stem samples to obtain the first understanding model. A semantic understanding model is then trained on the corpus samples to obtain the second understanding model. Next, an intention understanding model is trained on the stem intention information samples corresponding to the stem samples and the corpus intention information samples corresponding to the corpus samples to obtain the third understanding model. The first condition is that the sentence length of the specified corpus does not exceed 8 characters, and the second condition is that the specified corpus has been processed by the first understanding model.
In operation, when the collected voice information of the user is "Hello, I want to send out an express delivery.", the intelligent customer service determines that its length exceeds 8 characters, inputs it into the syntactic analyzer for stem extraction to obtain the stem text "send express", and inputs the stem text "send express" into the first understanding model for intention understanding, obtaining the stem intention understanding information corresponding to "send express". The voice information "Hello, I want to send out an express delivery." is then input into the second understanding model for intention understanding to obtain the corpus intention understanding information. Finally, the stem intention understanding information and the corpus intention understanding information are input into the third understanding model for prediction, yielding the intention understanding information corresponding to the voice information.
When the collected voice information of the user is "send express", the intelligent customer service determines that its length does not exceed 8 characters and inputs "send express" directly into the first understanding model for intention understanding, obtaining the stem intention understanding information corresponding to "send express". The voice information "send express" is then input into the second understanding model for intention understanding to obtain the corpus intention understanding information. Finally, the stem intention understanding information and the corpus intention understanding information are input into the third understanding model for prediction, yielding the intention understanding information corresponding to the voice information.
In another scenario, the method is applied to an intelligent housekeeper in the smart home field and is trained on corpus samples related to the smart home. First, stems are extracted through a syntactic analyzer to obtain stem samples, and a classification model is trained on the stem samples to obtain the first understanding model. A semantic understanding model is then trained on the corpus samples together with the stem intention information samples corresponding to the stem samples to obtain the second understanding model. The first condition is that the sentence length of the specified corpus does not exceed 8 characters, and the second condition is that the specified corpus has been processed by the first understanding model.
In operation, when the collected voice information of the user is "Help me turn on the kitchen light, the one near the sink.", the intelligent housekeeper determines that its length exceeds 8 characters, inputs it into the syntactic analyzer for stem extraction to obtain the stem text "turn on kitchen light", and inputs this stem text into the first understanding model for intention understanding, obtaining the stem intention understanding information corresponding to "turn on kitchen light". The voice information "Help me turn on the kitchen light, the one near the sink." is then input, together with the stem intention understanding information corresponding to "turn on kitchen light", into the second understanding model for intention understanding, yielding the corpus intention understanding information corresponding to the voice information.
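In this scenario the second understanding model receives the stem intention alongside the raw utterance. One simple way to sketch such conditioning, assumed here rather than taken from the description, is to append the stem intent as an extra pseudo-token before scoring. The class ConditionedKeywordModel, the feature encoding, and the training triples and labels below are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class ConditionedKeywordModel:
    """Toy second model scoring corpus words plus the stem intent (hypothetical)."""

    def __init__(self) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    @staticmethod
    def features(text: str, stem_intent: str) -> List[str]:
        words = [w.strip("?.,!").lower() for w in text.split()]
        # Assumed encoding: the stem intent becomes one extra pseudo-token.
        return [w for w in words if w] + ["__stem__=" + stem_intent]

    def fit(self, samples: List[Tuple[str, str, str]]) -> "ConditionedKeywordModel":
        """Each sample is (corpus, stem intent, target intent)."""
        for text, stem_intent, label in samples:
            self.counts[label].update(self.features(text, stem_intent))
        return self

    def predict(self, text: str, stem_intent: str) -> str:
        feats = self.features(text, stem_intent)
        scores = {lab: sum(c[f] for f in feats) for lab, c in self.counts.items()}
        return max(scores, key=scores.get)


# Invented smart-home training triples: (corpus, stem intent, target intent).
model = ConditionedKeywordModel().fit([
    ("Turn on the kitchen light near the sink.", "turn_on_kitchen_light", "light_on:kitchen_sink"),
    ("Turn on the kitchen light.", "turn_on_kitchen_light", "light_on:kitchen_main"),
    ("Turn off the kitchen light.", "turn_off_kitchen_light", "light_off:kitchen_main"),
])
```

With these invented samples, the utterance from the scenario paired with the stem intent turn_on_kitchen_light scores highest for light_on:kitchen_sink, and the same words paired with a different stem intent can tip the score toward another label, which is the effect the extra input is meant to have.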
Fig. 4 is a schematic diagram of an implementation module of a corpus understanding device according to an embodiment of the present invention.
Referring to Fig. 4, another aspect of the embodiments of the present invention provides a corpus understanding device, including: an obtaining module 401, configured to obtain corpus samples representing user intention; an extracting module 402, configured to perform stem extraction on the corpus samples through a syntactic analyzer to obtain stem samples, where the stem samples represent the stem information in the corpus samples; and a training module 403, configured to train a classification model according to the stem samples to obtain a first understanding model, and to train a semantic understanding model according to the corpus samples to obtain a second understanding model. The first understanding model is used to perform intention understanding on a specified corpus to obtain stem intention information when the specified corpus satisfies a first condition; the second understanding model is used to perform intention understanding on the specified corpus to obtain corpus intention information when the specified corpus satisfies a second condition; the first condition and the second condition are different.
In one embodiment, the first condition is a specific length range; correspondingly, the obtaining module 401 is further configured to obtain the specified corpus, and the device further comprises an understanding module 404, configured to perform intention understanding on the specified corpus through the first understanding model to obtain first stem intention information when the specified corpus is determined to fall within the specific length range; the first stem intention information is a form of the stem intention information.
In one embodiment, the extracting module 402 is further configured to perform stem extraction on the specified corpus through the syntactic analyzer to obtain a stem text when the specified corpus is determined not to fall within the specific length range; the understanding module 404 is further configured to perform intention understanding on the stem text through the first understanding model to obtain second stem intention information; the second stem intention information is a form of the stem intention information.
In one embodiment, the second condition is a non-specific length range; correspondingly, the understanding module 404 is further configured to perform intention understanding on the specified corpus through the second understanding model to obtain first corpus intention information when the specified corpus is determined to fall within the non-specific length range; the first corpus intention information is a form of the corpus intention information.
In one embodiment, the second condition is that the specified corpus has been processed by the first understanding model; correspondingly, the device further comprises a processing module 405, configured to process the specified corpus through the second understanding model to obtain second corpus intention information after it is determined that the specified corpus has been processed by the first understanding model; the second corpus intention information is a form of the corpus intention information.
In one embodiment, the device further comprises an integration module 406, configured to integrate the corpus intention information and the stem intention information to obtain first specified intention information corresponding to the specified corpus.
In one embodiment, the training module 403 is further configured to train an intention understanding model according to the stem intention information samples corresponding to the stem samples and the corpus intention information samples corresponding to the corpus samples to obtain a third understanding model; the third understanding model is used to predict, from the corpus intention information and the stem intention information, the first specified intention information corresponding to the specified corpus.
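The module layout of Fig. 4 can be mirrored as a small class whose methods correspond to modules 401 to 406. This is only a structural sketch under assumptions: the trainer, stem extractor and integration function are injected hypothetical callables, the 8-character threshold is borrowed from the scenarios, and the third understanding model is omitted for brevity, with an integrate callable standing in for module 406.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Model = Callable[[str], str]
Trainer = Callable[[List[Tuple[str, str]]], Model]


@dataclass
class CorpusUnderstandingDevice:
    """Hypothetical wiring of modules 401-406 from Fig. 4."""

    extract_stem: Callable[[str], str]    # extracting module 402 (parser stand-in)
    train: Trainer                        # training module 403
    integrate: Callable[[str, str], str]  # integration module 406
    max_len: int = 8                      # assumed specific length range
    first_model: Optional[Model] = None
    second_model: Optional[Model] = None

    def obtain_and_train(self, corpus_samples: List[Tuple[str, str]]) -> None:
        """Obtaining module 401 hands samples to modules 402 and 403."""
        stem_samples = [(self.extract_stem(t), y) for t, y in corpus_samples]
        self.first_model = self.train(stem_samples)
        self.second_model = self.train(corpus_samples)

    def understand(self, text: str) -> str:
        """Understanding module 404, then processing module 405, then 406."""
        if self.first_model is None or self.second_model is None:
            raise RuntimeError("call obtain_and_train() first")
        # Module 404: short corpus as-is, long corpus via its stem text.
        stem_input = text if len(text) <= self.max_len else self.extract_stem(text)
        stem_intent = self.first_model(stem_input)
        # Module 405: the second model runs once the first has processed it.
        corpus_intent = self.second_model(text)
        return self.integrate(stem_intent, corpus_intent)
```

Injecting the trainer keeps the structure independent of any particular classification or semantic model; any callable that turns labelled samples into a text-to-label function fits.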
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, where no contradiction arises, those skilled in the art may join and combine the different embodiments or examples described in this specification, as well as the features of those embodiments or examples.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A corpus understanding method, characterized in that the method comprises:
obtaining a corpus sample for representing user intention;
performing stem extraction on the corpus samples through a syntactic analyzer to obtain stem samples, wherein the stem samples are used for representing stem information in the corpus samples;
training a classification model according to the stem samples to obtain a first understanding model;
training a semantic understanding model according to the corpus samples to obtain a second understanding model;
the first understanding model is used for performing intention understanding on a specified corpus to obtain stem intention information when the specified corpus satisfies a first condition;
the second understanding model is used for performing intention understanding on the specified corpus to obtain corpus intention information when the specified corpus satisfies a second condition;
the first condition and the second condition are different.
2. The method of claim 1, wherein the first condition is a specific length range;
correspondingly, the method further comprises the following steps:
obtaining a specified corpus;
when the specified corpus is determined to fall within the specific length range, performing intention understanding on the specified corpus through the first understanding model to obtain first stem intention information;
the first stem intention information is a form of the stem intention information.
3. The method of claim 2, further comprising:
when the specified corpus is determined not to fall within the specific length range, performing stem extraction on the specified corpus through the syntactic analyzer to obtain a stem text;
performing intention understanding on the stem text through the first understanding model to obtain second stem intention information;
the second stem intention information is a form of the stem intention information.
4. The method of claim 1 or 2, wherein the second condition is a non-specific length range;
correspondingly, the method further comprises the following steps:
when the specified corpus is determined to fall within the non-specific length range, performing intention understanding on the specified corpus through the second understanding model to obtain first corpus intention information;
the first corpus intention information is a form of the corpus intention information.
5. The method according to claim 1 or 3, wherein the second condition is that the specified corpus has been processed by the first understanding model;
correspondingly, the method further comprises the following steps:
after it is determined that the specified corpus has been processed by the first understanding model, processing the specified corpus through the second understanding model to obtain second corpus intention information;
the second corpus intention information is a form of the corpus intention information.
6. The method of claim 1, further comprising:
integrating the corpus intention information and the stem intention information to obtain first specified intention information corresponding to the specified corpus.
7. The method of claim 1, further comprising:
training an intention understanding model according to stem intention information samples corresponding to the stem samples and corpus intention information samples corresponding to the corpus samples to obtain a third understanding model;
the third understanding model is used for predicting, from the corpus intention information and the stem intention information, first specified intention information corresponding to the specified corpus.
8. A corpus understanding device, characterized in that said device comprises:
the obtaining module is used for obtaining a corpus sample for representing the intention of a user;
the extraction module is used for carrying out stem extraction on the corpus samples through a syntactic analyzer to obtain stem samples, and the stem samples are used for representing stem information in the corpus samples;
the training module is used for training a classification model according to the stem samples to obtain a first understanding model;
the training module is used for training a semantic understanding model according to the corpus samples to obtain a second understanding model;
the first understanding model is used for performing intention understanding on a specified corpus to obtain stem intention information when the specified corpus satisfies a first condition;
the second understanding model is used for performing intention understanding on the specified corpus to obtain corpus intention information when the specified corpus satisfies a second condition;
the first condition and the second condition are different.
9. The apparatus of claim 8, wherein the first condition is a specific length range; correspondingly,
the obtaining module is further configured to obtain the specified corpus;
the apparatus further comprises:
the understanding module is used for performing intention understanding on the specified corpus through the first understanding model to obtain first stem intention information when the specified corpus is determined to fall within the specific length range;
the first stem intention information is a form of the stem intention information.
10. The apparatus of claim 9,
the extraction module is further configured to perform stem extraction on the specified corpus through the syntactic analyzer to obtain a stem text when the specified corpus is determined not to fall within the specific length range;
the understanding module is used for performing intention understanding on the stem text through the first understanding model to obtain second stem intention information;
the second stem intention information is a form of the stem intention information.
CN202010691228.5A 2020-07-17 2020-07-17 Corpus understanding method and equipment Pending CN111985206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010691228.5A CN111985206A (en) 2020-07-17 2020-07-17 Corpus understanding method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010691228.5A CN111985206A (en) 2020-07-17 2020-07-17 Corpus understanding method and equipment

Publications (1)

Publication Number Publication Date
CN111985206A (en) 2020-11-24

Family

ID=73438665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010691228.5A Pending CN111985206A (en) 2020-07-17 2020-07-17 Corpus understanding method and equipment

Country Status (1)

Country Link
CN (1) CN111985206A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101320374A (en) * 2008-07-10 2008-12-10 昆明理工大学 Field question classification method combining syntax structural relationship and field characteristic
CN109241261A (en) * 2018-08-30 2019-01-18 武汉斗鱼网络科技有限公司 User's intension recognizing method, device, mobile terminal and storage medium
CN110162633A (en) * 2019-05-21 2019-08-23 深圳市珍爱云信息技术有限公司 Voice data is intended to determine method, apparatus, computer equipment and storage medium
CN111078846A (en) * 2019-11-25 2020-04-28 青牛智胜(深圳)科技有限公司 Multi-turn dialog system construction method and system based on business scene
CN111400438A (en) * 2020-02-21 2020-07-10 镁佳(北京)科技有限公司 Method and device for identifying multiple intentions of user, storage medium and vehicle
CN111414457A (en) * 2020-03-20 2020-07-14 深圳前海微众银行股份有限公司 Intelligent question-answering method, device, equipment and storage medium based on federal learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination