CN107507613B - Scene-oriented Chinese instruction identification method, device, equipment and storage medium


Info

Publication number
CN107507613B
CN107507613B, CN107507613A, application CN201710620448.7A
Authority
CN
China
Prior art keywords
prediction
test sample
sample
preset
scene
Prior art date
Legal status
Active
Application number
CN201710620448.7A
Other languages
Chinese (zh)
Other versions
CN107507613A (en)
Inventor
闫永刚
沈亮
Current Assignee
Hefei Midea Intelligent Technologies Co Ltd
Original Assignee
Hefei Midea Intelligent Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Hefei Midea Intelligent Technologies Co Ltd
Priority to CN201710620448.7A
Publication of CN107507613A
Application granted
Publication of CN107507613B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/08 - Speech classification or search
    • G10L 15/18 - Speech classification or search using natural language modelling
    • G10L 15/183 - Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L 15/19 - Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L 15/197 - Probabilistic grammars, e.g. word n-grams
    • G10L 2015/223 - Execution procedure of a spoken command
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G06F 40/30 - Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a scene-oriented Chinese instruction identification method, device, equipment and storage medium. The method comprises: correcting the prediction weight of each prediction model according to a first preset formula and a sample set comprising misclassified samples, wherein the misclassified samples are test samples whose prediction class identifications do not match their actual class identifications. In this technical scheme, the prediction weight of each prediction model is trained and corrected using a sample set that includes misclassified samples, which effectively improves the accuracy of Chinese instruction recognition; scene prediction additionally saves background computing resources and raises the intelligence level of Chinese instruction recognition.

Description

Scene-oriented Chinese instruction identification method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of man-machine intelligent interaction, in particular to a scene-oriented Chinese instruction identification method, a scene-oriented Chinese instruction identification device, computer equipment and a computer-readable storage medium.
Background
A modern intelligent question-answering system generally comprises several technical links such as speech recognition, text parsing, syntactic analysis, semantic analysis, topic recognition and response parsing. Within syntactic analysis, scene-oriented Chinese instruction recognition (mainly question sentence-pattern recognition) serves as the entry validation step of the whole intelligent question-answering system.
In the related art, scene-oriented Chinese instruction identification in syntactic analysis is mainly realized by two approaches, query-word rule pattern matching and transformational-generative syntactic analysis, which have the following technical defects:
(1) Query-word rule pattern matching is very cumbersome and it is difficult to exhaust all query tables; the understanding of Chinese instructions remains shallow and recognition accuracy is low.
(2) Transformational-generative syntactic analysis requires a corresponding word library and syntactic patterns to be established in advance, demands excessive manual intervention, and has a low degree of intelligence.
Disclosure of Invention
The present invention is directed to solving at least one of the problems of the prior art or the related art.
Therefore, the invention aims to provide a scene-oriented Chinese instruction identification method.
Another object of the present invention is to provide a scene-oriented Chinese instruction recognition apparatus.
It is a further object of the present invention to provide a computer device.
It is yet another object of the present invention to provide a computer-readable storage medium.
In order to achieve the above object, a technical solution of a first aspect of the present invention provides a scene-oriented Chinese instruction recognition method, including: correcting the prediction weight of each prediction model according to a first preset formula and a sample set comprising misclassified samples, wherein the misclassified samples are test samples whose prediction class identifications do not match their actual class identifications.
In this technical scheme, the prediction weight of each prediction model is corrected according to the first preset formula and a sample set comprising misclassified samples, that is, test samples whose prediction class identification does not match the actual class identification are used to correct the prediction weights. The prediction models can thus be trained effectively and their prediction accuracy improved, which effectively improves the accuracy of Chinese instruction identification. When the prediction class identification of a test sample does not match its actual class identification, the test sample is marked as a misclassified sample and its sampling probability is raised, so that misclassified samples are preferentially extracted both into the sample set used to correct the prediction weights and as new test samples. This reduces manual intervention to a certain extent and raises the intelligence level of prediction-model training and, in turn, of Chinese instruction recognition.
In addition, the sample set including the misclassified samples may contain all of the misclassified samples, or a part of the misclassified samples together with a part of the correctly predicted samples, provided it is large enough to serve the purpose of correcting the prediction weight of each prediction model.
In the foregoing technical solution, preferably, correcting the prediction weight of each prediction model according to the sample set comprising misclassified samples and the first preset formula specifically includes: cross-validating each prediction model on the sample set comprising misclassified samples to determine the prediction accuracy of each prediction model; and correcting the prediction weight of each prediction model according to the first preset formula and the prediction accuracy, wherein the first preset formula is:
ω_i = p_i / Σ_i p_i
where ω_i is the prediction weight of the i-th prediction model, p_i is the prediction accuracy of the i-th prediction model, and Σ_i p_i is the sum of the prediction accuracies of all prediction models.
In this technical scheme, the prediction accuracy of each prediction model is determined by cross-validating each prediction model on the sample set comprising misclassified samples. Specifically, 10-fold cross-validation can be adopted: the sample set is divided into 10 parts, 9 parts are used as training data and 1 part as test data, each test yields a corresponding accuracy, and the average accuracy of the 10 results is taken as the prediction accuracy of the prediction model. Generally, the 10-fold cross-validation is repeated several times, for example 10 times, and the results are averaged, which makes the determined prediction accuracy more reliable.
The corrected prediction weight of each prediction model is then calculated from the first preset formula and the prediction accuracy, which improves the accuracy of the determined prediction weights and, in turn, the accuracy of Chinese instruction identification.
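As an illustration only (not part of the patent text), this weight-correction step can be sketched in Python with scikit-learn-style models; the function name, the repeated 10-fold cross-validation setup and the estimator interface are assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score

def corrected_weights(models, X, y, folds=10, repeats=10):
    """Estimate each model's prediction accuracy by repeated 10-fold
    cross-validation on the sample set containing misclassified samples,
    then normalise the accuracies into prediction weights according to
    the first preset formula: w_i = p_i / sum_i(p_i)."""
    accuracies = []
    for model in models:
        scores = []
        for _ in range(repeats):
            cv = KFold(n_splits=folds, shuffle=True)
            scores.append(cross_val_score(model, X, y, cv=cv).mean())
        accuracies.append(np.mean(scores))
    accuracies = np.asarray(accuracies)
    return accuracies / accuracies.sum()   # corrected prediction weights
```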
In any one of the above technical solutions, preferably, before correcting the prediction weight of each prediction model according to the sample set comprising misclassified samples and the first preset formula, the method further includes: determining the prediction class identification of a test sample according to the prediction weight of each prediction model and a second preset formula; if the actual class identification of the test sample does not match the prediction class identification, determining the test sample as a misclassified sample; and raising the sampling probability of the misclassified sample, so that a sample set comprising misclassified samples can be extracted and misclassified samples can be extracted as new test samples, wherein the second preset formula is:
pred = Max(ω_i · n_j)
where ω_i is the prediction weight of the i-th prediction model, n_j is the number of occurrences of the j-th class identification among the outputs of all prediction models, and pred is the class identification corresponding to Max(ω_i · n_j), i.e. the prediction class identification.
In this technical scheme, the prediction class identification of a test sample is determined according to the prediction weight of each prediction model and the second preset formula, and a test sample whose prediction class identification does not match its actual class identification is marked as a misclassified sample, which tests the prediction models and facilitates their next round of training. By raising the sampling probability of misclassified samples, they can be preferentially extracted into the sample set used to correct the prediction weights and as new test samples, which reduces manual intervention to a certain extent, raises the intelligence level of prediction-model training, and further improves the accuracy of Chinese instruction recognition.
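A minimal sketch of one plausible reading of the second preset formula follows (illustrative only; the helper name and the assumption that each fitted model exposes a predict method are not from the patent): each model casts a vote, n_j counts how often class j occurs among the votes, and the winning class is the one attached to the largest product of a model's weight and its class's vote count.

```python
from collections import Counter

def ensemble_predict(models, weights, text):
    """Apply pred = Max(w_i * n_j) to one (scene-normalised) test sample."""
    votes = [model.predict([text])[0] for model in models]   # class id from each model
    counts = Counter(votes)                                   # n_j for each class j
    scored = [(w * counts[c], c) for w, c in zip(weights, votes)]
    return max(scored, key=lambda pair: pair[0])[1]           # prediction class identification
```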
In any of the above technical solutions, preferably, before determining the prediction class identification of the test sample according to the prediction weight of each prediction model and the second preset formula, the method further includes: determining whether the test sample includes vocabulary matching a preset scene vocabulary library; if the test sample does not include vocabulary matching the preset scene vocabulary library, sending a prompt signal and not determining the prediction class identification of the test sample; and if the test sample includes vocabulary matching the preset scene vocabulary library, replacing the corresponding vocabulary in the test sample with the matched vocabulary of the preset scene vocabulary library and determining the prediction class identification of the test sample.
In this technical scheme, whether the test sample includes vocabulary matching the preset scene vocabulary library is determined before its prediction class identification is determined, which realizes scene pre-judgement and makes Chinese instruction recognition scene-oriented and more targeted, so background computing resources can be effectively saved. If the test sample does not include vocabulary matching the preset scene vocabulary library, a prompt signal is sent and the prediction class identification is not determined, so irrelevant test samples are filtered out and background computing resources are further saved. When the test sample does include such vocabulary, the corresponding vocabulary in the test sample is replaced by the matched vocabulary of the library before the prediction class identification is determined, which improves the degree of standardization of the test samples entering the prediction models, helps the prediction models output prediction class identifications matching the actual class identifications, and further improves the accuracy of Chinese instruction recognition.
For example, if the scene is set as a kitchen scene, the preset scene vocabulary library may include the following vocabulary: category 1, common food materials (about 450 selected common food materials such as apple, celery and potato, and their synonyms); category 2, common recipes (about 10000 selected common recipes such as sauerkraut fish and fish-flavored shredded pork, and their synonyms); category 3, flavor (subclasses such as sour, spicy and light, and their synonyms); category 4, season and festival (subclasses such as morning and Valentine's Day, and their synonyms); category 5, nutritional efficacy (subclasses such as weight loss and insomnia, and their synonyms); category 6, special groups (subclasses such as drivers, teachers and examinees, and their synonyms); category 7, disease conditioning (subclasses such as hypertension, cold and toothache, and their synonyms); category 8, beauty and slimming (subclasses such as whitening, acne removal and freckle removal, and their synonyms); category 9, dish types (subclasses such as snacks, barbecue and overnight dishes, and their synonyms); category 10, scenes (subclasses such as being single, afternoon tea and transitions, and their synonyms).
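The scene pre-judgement step could look roughly like the following sketch (the tiny vocabulary, the synonym mapping and the function name are hypothetical placeholders, not the patent's actual library):

```python
# Hypothetical kitchen-scene vocabulary: canonical library term -> synonyms.
SCENE_VOCAB = {
    "apple": {"apple", "apples"},
    "sauerkraut fish": {"sauerkraut fish", "pickled cabbage fish"},
    "weight loss": {"weight loss", "slimming"},
}

def scene_precheck(text):
    """Return the test sample with matched words replaced by their canonical
    library form, or None (standing in for the prompt signal) when no scene
    vocabulary is present and class prediction should be skipped."""
    matched = False
    for canonical, synonyms in SCENE_VOCAB.items():
        for word in synonyms:
            if word in text:
                text = text.replace(word, canonical)
                matched = True
    return text if matched else None
```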
In any one of the above technical solutions, preferably, raising the sampling probability of a misclassified sample specifically includes: re-determining the sampling probability of the misclassified sample according to a third preset formula, wherein the third preset formula is:
W_{k+1} = W_k + 1 / Σ(y_k ≠ h(k))
where y_k is the actual class identification of test sample k, h(k) is the prediction class identification of test sample k, W_{k+1} is the re-determined sampling probability of the misclassified sample k (W_k being its sampling probability before re-determination), and Σ(y_k ≠ h(k)) is the total number of misclassified samples.
In this technical scheme, the sampling probability of a misclassified sample is re-determined by the third preset formula, so that it is raised according to a fixed rule, which favors extracting a sample set containing misclassified samples for correcting the prediction weight of each prediction model and extracting misclassified samples as new test samples. The sampling probability computed by the third preset formula increases progressively: a sample misclassified once has a higher sampling probability than an ordinary sample, and if it is extracted as a new test sample and misclassified again, its sampling probability continues to rise, so a sample misclassified twice has a higher sampling probability than one misclassified once. Multiple rounds of such training yield more appropriate prediction weights for each prediction model, which effectively improves the accuracy of Chinese instruction recognition.
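A sketch of this sampling-probability update, under the assumed form of the third preset formula given above (each misclassification adds 1 divided by the number of misclassified samples, followed by renormalisation):

```python
import numpy as np

def update_sampling_probability(probs, y_true, y_pred):
    """Raise the sampling probability of every misclassified test sample so
    that repeatedly misclassified samples become ever more likely to be
    drawn into the next sample set or as new test samples."""
    probs = np.asarray(probs, dtype=float)
    wrong = np.asarray(y_true) != np.asarray(y_pred)
    n_wrong = int(wrong.sum())
    if n_wrong:
        probs[wrong] += 1.0 / n_wrong     # assumed form of the third preset formula
    return probs / probs.sum()            # keep a valid probability distribution
```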
In any one of the above technical solutions, preferably, before correcting the prediction weight of each prediction model according to the sample set comprising misclassified samples and the first preset formula, the method further includes: constructing the prediction models from a preset corpus based on preset rules, and presetting the prediction weight of each prediction model.
In this technical scheme, the prediction models are constructed from the preset corpus based on the preset rules, and the prediction weight of each prediction model is then preset, which facilitates training of the prediction models; for example, with 4 prediction models, the prediction weight of each model can be preset to 0.25.
The preset rules are a support vector machine algorithm, a random forest algorithm, a K-nearest-neighbor (KNN) algorithm and a naive Bayes algorithm; each algorithm independently constructs a prediction model, and combining the prediction models further improves the accuracy of Chinese instruction recognition.
The preset corpus is used both to construct the prediction models and to provide corpora for training; the test samples and the sample set comprising misclassified samples are extracted from the preset corpus. Specifically, corpora of 4 types, namely interrogative, imperative, exclamatory and declarative sentences, are collected, sorted and labeled to form a prediction-model training and test set T = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x ∈ χ, the instance space χ ⊆ R^n, and y_n belongs to the label set {1, 2, 3, 4}, whose elements correspond to the 4 class identifications of interrogative, imperative, exclamatory and declarative sentences respectively. Each class of corpus includes its related subclasses: interrogative sentences include 4 subclasses (specific questions, alternative questions, affirmative-negative questions and yes-no questions); imperative sentences include 4 subclasses (command, request, prohibition and solicitation imperatives); exclamatory sentences include 4 subclasses (exclamatory-word, noun, verbal and adverbial exclamatory sentences); declarative sentences include 2 subclasses (negative statements and affirmative statements).
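For illustration, the four prediction models could be built as follows in Python (the character n-gram TF-IDF features and the specific scikit-learn classes are assumptions; the patent only names the four algorithms and the preset weight of 0.25):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

def build_models():
    """Four independently constructed prediction models (SVM, random forest,
    KNN, naive Bayes) over simple character n-gram features, each starting
    with the preset prediction weight 0.25."""
    classifiers = [LinearSVC(), RandomForestClassifier(),
                   KNeighborsClassifier(), MultinomialNB()]
    models = [make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(1, 2)), clf)
              for clf in classifiers]
    weights = [0.25] * len(models)
    return models, weights

# Label set: 1 interrogative, 2 imperative, 3 exclamatory, 4 declarative.
```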
A technical solution of a second aspect of the present invention provides a scene-oriented Chinese instruction recognition apparatus, including: a correcting unit for correcting the prediction weight of each prediction model according to a first preset formula and a sample set comprising misclassified samples, wherein the misclassified samples are test samples whose prediction class identifications do not match their actual class identifications.
In this technical scheme, the prediction weight of each prediction model is corrected according to the first preset formula and a sample set comprising misclassified samples, that is, test samples whose prediction class identification does not match the actual class identification are used to correct the prediction weights. The prediction models can thus be trained effectively and their prediction accuracy improved, which effectively improves the accuracy of Chinese instruction identification. When the prediction class identification of a test sample does not match its actual class identification, the test sample is marked as a misclassified sample and its sampling probability is raised, so that misclassified samples are preferentially extracted both into the sample set used to correct the prediction weights and as new test samples. This reduces manual intervention to a certain extent and raises the intelligence level of prediction-model training and, in turn, of Chinese instruction recognition.
In addition, the sample set including the misclassified samples may contain all of the misclassified samples, or a part of the misclassified samples together with a part of the correctly predicted samples, provided it is large enough to serve the purpose of correcting the prediction weight of each prediction model.
In the above technical solution, preferably, the apparatus further includes: a verification unit for cross-validating each prediction model on the sample set comprising misclassified samples to determine the prediction accuracy of each prediction model; the correcting unit is further configured to correct the prediction weight of each prediction model according to the first preset formula and the prediction accuracy, wherein the first preset formula is:
ω_i = p_i / Σ_i p_i
where ω_i is the prediction weight of the i-th prediction model, p_i is the prediction accuracy of the i-th prediction model, and Σ_i p_i is the sum of the prediction accuracies of all prediction models.
In this technical scheme, the prediction accuracy of each prediction model is determined by cross-validating each prediction model on the sample set comprising misclassified samples. Specifically, 10-fold cross-validation can be adopted: the sample set is divided into 10 parts, 9 parts are used as training data and 1 part as test data, each test yields a corresponding accuracy, and the average accuracy of the 10 results is taken as the prediction accuracy of the prediction model. Generally, the 10-fold cross-validation is repeated several times, for example 10 times, and the results are averaged, which makes the determined prediction accuracy more reliable.
The corrected prediction weight of each prediction model is then calculated from the first preset formula and the prediction accuracy, which improves the accuracy of the determined prediction weights and, in turn, the accuracy of Chinese instruction identification.
In any one of the above technical solutions, preferably, the apparatus further includes: a determining unit for determining the prediction class identification of a test sample according to the prediction weight of each prediction model and a second preset formula, the determining unit being further configured to determine the test sample as a misclassified sample when its actual class identification does not match the prediction class identification; and an improving unit for raising the sampling probability of the misclassified sample, so that a sample set comprising misclassified samples can be extracted and misclassified samples can be extracted as new test samples, wherein the second preset formula is:
pred = Max(ω_i · n_j)
where ω_i is the prediction weight of the i-th prediction model, n_j is the number of occurrences of the j-th class identification among the outputs of all prediction models, and pred is the class identification corresponding to Max(ω_i · n_j), i.e. the prediction class identification.
In this technical scheme, the prediction class identification of a test sample is determined according to the prediction weight of each prediction model and the second preset formula, and a test sample whose prediction class identification does not match its actual class identification is marked as a misclassified sample, which tests the prediction models and facilitates their next round of training. By raising the sampling probability of misclassified samples, they can be preferentially extracted into the sample set used to correct the prediction weights and as new test samples, which reduces manual intervention to a certain extent, raises the intelligence level of prediction-model training, and further improves the accuracy of Chinese instruction recognition.
In any one of the above technical solutions, preferably, the determining unit is further configured to determine whether the test sample includes vocabulary matching a preset scene vocabulary library; the Chinese instruction recognition apparatus further includes: a prompting unit for sending a prompt signal, without determining the prediction class identification of the test sample, when it is determined that the test sample does not include vocabulary matching the preset scene vocabulary library; and a replacing unit for replacing the corresponding vocabulary in the test sample with the matched vocabulary of the preset scene vocabulary library and determining the prediction class identification of the test sample when it is determined that the test sample includes vocabulary matching the preset scene vocabulary library.
In this technical scheme, whether the test sample includes vocabulary matching the preset scene vocabulary library is determined before its prediction class identification is determined, which realizes scene pre-judgement and makes Chinese instruction recognition scene-oriented and more targeted, so background computing resources can be effectively saved. If the test sample does not include vocabulary matching the preset scene vocabulary library, a prompt signal is sent and the prediction class identification is not determined, so irrelevant test samples are filtered out and background computing resources are further saved. When the test sample does include such vocabulary, the corresponding vocabulary in the test sample is replaced by the matched vocabulary of the library before the prediction class identification is determined, which improves the degree of standardization of the test samples entering the prediction models, helps the prediction models output prediction class identifications matching the actual class identifications, and further improves the accuracy of Chinese instruction recognition.
For example, if the scene is set as a kitchen scene, the preset scene vocabulary library may include the following vocabulary: category 1, common food materials (about 450 selected common food materials such as apple, celery and potato, and their synonyms); category 2, common recipes (about 10000 selected common recipes such as sauerkraut fish and fish-flavored shredded pork, and their synonyms); category 3, flavor (subclasses such as sour, spicy and light, and their synonyms); category 4, season and festival (subclasses such as morning and Valentine's Day, and their synonyms); category 5, nutritional efficacy (subclasses such as weight loss and insomnia, and their synonyms); category 6, special groups (subclasses such as drivers, teachers and examinees, and their synonyms); category 7, disease conditioning (subclasses such as hypertension, cold and toothache, and their synonyms); category 8, beauty and slimming (subclasses such as whitening, acne removal and freckle removal, and their synonyms); category 9, dish types (subclasses such as snacks, barbecue and overnight dishes, and their synonyms); category 10, scenes (subclasses such as being single, afternoon tea and transitions, and their synonyms).
In any one of the above technical solutions, preferably, the determining unit is further configured to re-determine the sampling probability of a misclassified sample according to a third preset formula, wherein the third preset formula is:
W_{k+1} = W_k + 1 / Σ(y_k ≠ h(k))
where y_k is the actual class identification of test sample k, h(k) is the prediction class identification of test sample k, W_{k+1} is the re-determined sampling probability of the misclassified sample k (W_k being its sampling probability before re-determination), and Σ(y_k ≠ h(k)) is the total number of misclassified samples.
In this technical scheme, the sampling probability of a misclassified sample is re-determined by the third preset formula, so that it is raised according to a fixed rule, which favors extracting a sample set containing misclassified samples for correcting the prediction weight of each prediction model and extracting misclassified samples as new test samples. The sampling probability computed by the third preset formula increases progressively: a sample misclassified once has a higher sampling probability than an ordinary sample, and if it is extracted as a new test sample and misclassified again, its sampling probability continues to rise, so a sample misclassified twice has a higher sampling probability than one misclassified once. Multiple rounds of such training yield more appropriate prediction weights for each prediction model, which effectively improves the accuracy of Chinese instruction recognition.
In any one of the above technical solutions, preferably, the apparatus further includes: a presetting unit for constructing the prediction models from a preset corpus based on preset rules and presetting the prediction weight of each prediction model.
In this technical scheme, the prediction models are constructed from the preset corpus based on the preset rules, and the prediction weight of each prediction model is then preset, which facilitates training of the prediction models; for example, with 4 prediction models, the prediction weight of each model can be preset to 0.25.
The preset rules are a support vector machine algorithm, a random forest algorithm, a K-nearest-neighbor (KNN) algorithm and a naive Bayes algorithm; each algorithm independently constructs a prediction model, and combining the prediction models further improves the accuracy of Chinese instruction recognition.
The preset corpus is used both to construct the prediction models and to provide corpora for training; the test samples and the sample set comprising misclassified samples are extracted from the preset corpus. Specifically, corpora of 4 types, namely interrogative, imperative, exclamatory and declarative sentences, are collected, sorted and labeled to form a prediction-model training and test set T = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x ∈ χ, the instance space χ ⊆ R^n, and y_n belongs to the label set {1, 2, 3, 4}, whose elements correspond to the 4 class identifications of interrogative, imperative, exclamatory and declarative sentences respectively. Each class of corpus includes its related subclasses: interrogative sentences include 4 subclasses (specific questions, alternative questions, affirmative-negative questions and yes-no questions); imperative sentences include 4 subclasses (command, request, prohibition and solicitation imperatives); exclamatory sentences include 4 subclasses (exclamatory-word, noun, verbal and adverbial exclamatory sentences); declarative sentences include 2 subclasses (negative statements and affirmative statements).
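Purely as an illustration of how the units of the apparatus could fit together, the sketches above can be composed into one class (all names are hypothetical, and the models are assumed to have been fitted on the preset corpus beforehand):

```python
class ChineseInstructionRecognizer:
    """Loose mirror of the apparatus: presetting, verification/correcting,
    prompting/replacing and determining units become methods."""

    def __init__(self):
        self.models, self.weights = build_models()              # presetting unit

    def correct_weights(self, X, y):                             # verification + correcting units
        self.weights = corrected_weights(self.models, X, y)

    def predict_class(self, text):                               # determining unit
        normalised = scene_precheck(text)                        # prompting / replacing units
        if normalised is None:
            return None                                          # prompt signal: no scene vocabulary
        return ensemble_predict(self.models, self.weights, normalised)
```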
A technical solution of a third aspect of the present invention provides a computer device, which includes a processor configured to implement, when executing a computer program stored in a memory, the steps of the scene-oriented Chinese instruction recognition method according to any technical solution of the first aspect of the present invention.
In this technical solution, the computer device includes a processor configured to implement, when executing the computer program stored in the memory, the steps of any scene-oriented Chinese instruction recognition method proposed in the first aspect of the present invention, so that all the beneficial effects of that method are achieved and are not repeated here.
A technical solution of a fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of any scene-oriented Chinese instruction recognition method proposed in the first aspect of the present invention.
In this technical solution, the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of any scene-oriented Chinese instruction recognition method proposed in the first aspect of the present invention, so that all the beneficial effects of that method are achieved and are not repeated here.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 shows a schematic flow chart of a scene-oriented Chinese instruction recognition method according to one embodiment of the present invention;
FIG. 2 shows a schematic block diagram of a scene-oriented Chinese instruction recognition apparatus according to one embodiment of the present invention;
FIG. 3 shows a schematic flow chart of a scene-oriented Chinese instruction recognition method according to another embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example 1
As shown in FIG. 1, the scene-oriented Chinese instruction recognition method according to an embodiment of the present invention includes: step S102, correcting the prediction weight of each prediction model according to a first preset formula and a sample set comprising misclassified samples, wherein the misclassified samples are test samples whose prediction class identifications do not match their actual class identifications.
In this embodiment, the prediction weight of each prediction model is corrected according to the first preset formula and a sample set comprising misclassified samples, that is, test samples whose prediction class identification does not match the actual class identification are used to correct the prediction weights. The prediction models can thus be trained effectively and their prediction accuracy improved, which effectively improves the accuracy of Chinese instruction identification. When the prediction class identification of a test sample does not match its actual class identification, the test sample is marked as a misclassified sample and its sampling probability is raised, so that misclassified samples are preferentially extracted both into the sample set used to correct the prediction weights and as new test samples. This reduces manual intervention to a certain extent and raises the intelligence level of prediction-model training and, in turn, of Chinese instruction recognition.
In addition, the sample set including the misclassified samples may contain all of the misclassified samples, or a part of the misclassified samples together with a part of the correctly predicted samples, provided it is large enough to serve the purpose of correcting the prediction weight of each prediction model.
In the foregoing embodiment, preferably, correcting the prediction weight of each prediction model according to the sample set comprising misclassified samples and the first preset formula specifically includes: cross-validating each prediction model on the sample set comprising misclassified samples to determine the prediction accuracy of each prediction model; and correcting the prediction weight of each prediction model according to the first preset formula and the prediction accuracy, wherein the first preset formula is:
ω_i = p_i / Σ_i p_i
where ω_i is the prediction weight of the i-th prediction model, p_i is the prediction accuracy of the i-th prediction model, and Σ_i p_i is the sum of the prediction accuracies of all prediction models.
In this embodiment, the prediction accuracy of each prediction model is determined by cross-validating each prediction model on the sample set comprising misclassified samples. Specifically, 10-fold cross-validation can be adopted: the sample set is divided into 10 parts, 9 parts are used as training data and 1 part as test data, each test yields a corresponding accuracy, and the average accuracy of the 10 results is taken as the prediction accuracy of the prediction model. Generally, the 10-fold cross-validation is repeated several times, for example 10 times, and the results are averaged, which makes the determined prediction accuracy more reliable.
The corrected prediction weight of each prediction model is then calculated from the first preset formula and the prediction accuracy, which improves the accuracy of the determined prediction weights and, in turn, the accuracy of Chinese instruction identification.
In any of the above embodiments, preferably, before correcting the prediction weight of each prediction model according to the sample set comprising misclassified samples and the first preset formula, the method further includes: determining the prediction class identification of a test sample according to the prediction weight of each prediction model and a second preset formula; if the actual class identification of the test sample does not match the prediction class identification, determining the test sample as a misclassified sample; and raising the sampling probability of the misclassified sample, so that a sample set comprising misclassified samples can be extracted and misclassified samples can be extracted as new test samples, wherein the second preset formula is:
pred = Max(ω_i · n_j)
where ω_i is the prediction weight of the i-th prediction model, n_j is the number of occurrences of the j-th class identification among the outputs of all prediction models, and pred is the class identification corresponding to Max(ω_i · n_j), i.e. the prediction class identification.
In this embodiment, the prediction class identification of a test sample is determined according to the prediction weight of each prediction model and the second preset formula, and a test sample whose prediction class identification does not match its actual class identification is marked as a misclassified sample, which tests the prediction models and facilitates their next round of training. By raising the sampling probability of misclassified samples, they can be preferentially extracted into the sample set used to correct the prediction weights and as new test samples, which reduces manual intervention to a certain extent, raises the intelligence level of prediction-model training, and further improves the accuracy of Chinese instruction recognition.
In any of the above embodiments, preferably, before determining the prediction class identification of the test sample according to the prediction weight of each prediction model and the second preset formula, the method further includes: determining whether the test sample includes vocabulary matching a preset scene vocabulary library; if the test sample does not include vocabulary matching the preset scene vocabulary library, sending a prompt signal and not determining the prediction class identification of the test sample; and if the test sample includes vocabulary matching the preset scene vocabulary library, replacing the corresponding vocabulary in the test sample with the matched vocabulary of the preset scene vocabulary library and determining the prediction class identification of the test sample.
In this embodiment, whether the test sample includes vocabulary matching the preset scene vocabulary library is determined before its prediction class identification is determined, which realizes scene pre-judgement and makes Chinese instruction recognition scene-oriented and more targeted, so background computing resources can be effectively saved. If the test sample does not include vocabulary matching the preset scene vocabulary library, a prompt signal is sent and the prediction class identification is not determined, so irrelevant test samples are filtered out and background computing resources are further saved. When the test sample does include such vocabulary, the corresponding vocabulary in the test sample is replaced by the matched vocabulary of the library before the prediction class identification is determined, which improves the degree of standardization of the test samples entering the prediction models, helps the prediction models output prediction class identifications matching the actual class identifications, and further improves the accuracy of Chinese instruction recognition.
For example, if the scene is set as a kitchen scene, the preset scene vocabulary library may include the following vocabulary: category 1, common food materials (about 450 selected common food materials such as apple, celery and potato, and their synonyms); category 2, common recipes (about 10000 selected common recipes such as sauerkraut fish and fish-flavored shredded pork, and their synonyms); category 3, flavor (subclasses such as sour, spicy and light, and their synonyms); category 4, season and festival (subclasses such as morning and Valentine's Day, and their synonyms); category 5, nutritional efficacy (subclasses such as weight loss and insomnia, and their synonyms); category 6, special groups (subclasses such as drivers, teachers and examinees, and their synonyms); category 7, disease conditioning (subclasses such as hypertension, cold and toothache, and their synonyms); category 8, beauty and slimming (subclasses such as whitening, acne removal and freckle removal, and their synonyms); category 9, dish types (subclasses such as snacks, barbecue and overnight dishes, and their synonyms); category 10, scenes (subclasses such as being single, afternoon tea and transitions, and their synonyms).
In any of the foregoing embodiments, preferably, raising the sampling probability of a misclassified sample specifically includes: re-determining the sampling probability of the misclassified sample according to a third preset formula, wherein the third preset formula is:
W_{k+1} = W_k + 1 / Σ(y_k ≠ h(k))
where y_k is the actual class identification of test sample k, h(k) is the prediction class identification of test sample k, W_{k+1} is the re-determined sampling probability of the misclassified sample k (W_k being its sampling probability before re-determination), and Σ(y_k ≠ h(k)) is the total number of misclassified samples.
In this embodiment, the sampling probability of a misclassified sample is re-determined by the third preset formula, so that it is raised according to a fixed rule, which favors extracting a sample set containing misclassified samples for correcting the prediction weight of each prediction model and extracting misclassified samples as new test samples. The sampling probability computed by the third preset formula increases progressively: a sample misclassified once has a higher sampling probability than an ordinary sample, and if it is extracted as a new test sample and misclassified again, its sampling probability continues to rise, so a sample misclassified twice has a higher sampling probability than one misclassified once. Multiple rounds of such training yield more appropriate prediction weights for each prediction model, which effectively improves the accuracy of Chinese instruction recognition.
In any of the above embodiments, preferably, before correcting the prediction weight of each prediction model according to the sample set comprising misclassified samples and the first preset formula, the method further includes: constructing the prediction models from a preset corpus based on preset rules, and presetting the prediction weight of each prediction model.
In this embodiment, the prediction models are constructed from the preset corpus based on the preset rules, and the prediction weight of each prediction model is then preset, which facilitates training of the prediction models; for example, with 4 prediction models, the prediction weight of each model can be preset to 0.25.
The preset rules are a support vector machine algorithm, a random forest algorithm, a K-nearest-neighbor (KNN) algorithm and a naive Bayes algorithm; each algorithm independently constructs a prediction model, and combining the prediction models further improves the accuracy of Chinese instruction recognition.
The preset corpus is used both to construct the prediction models and to provide corpora for training; the test samples and the sample set comprising misclassified samples are extracted from the preset corpus. Specifically, corpora of 4 types, namely interrogative, imperative, exclamatory and declarative sentences, are collected, sorted and labeled to form a prediction-model training and test set T = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where x ∈ χ, the instance space χ ⊆ R^n, and y_n belongs to the label set {1, 2, 3, 4}, whose elements correspond to the 4 class identifications of interrogative, imperative, exclamatory and declarative sentences respectively. Each class of corpus includes its related subclasses: interrogative sentences include 4 subclasses (specific questions, alternative questions, affirmative-negative questions and yes-no questions); imperative sentences include 4 subclasses (command, request, prohibition and solicitation imperatives); exclamatory sentences include 4 subclasses (exclamatory-word, noun, verbal and adverbial exclamatory sentences); declarative sentences include 2 subclasses (negative statements and affirmative statements).
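One full training round of this embodiment might then be sketched as follows (the sample-set size, helper names and the requirement that the models are already fitted are all assumptions):

```python
import numpy as np

def training_round(models, weights, probs, texts, labels, set_size=200):
    """Predict every test sample, mark misclassifications, raise their
    sampling probability, draw a sample set biased toward misclassified
    samples, and re-correct the prediction weights."""
    preds = [ensemble_predict(models, weights, t) for t in texts]      # second preset formula
    probs = update_sampling_probability(probs, labels, preds)          # third preset formula (assumed form)
    idx = np.random.choice(len(texts), size=set_size, replace=False, p=probs)
    sample_texts = [texts[i] for i in idx]
    sample_labels = np.asarray(labels)[idx]
    weights = corrected_weights(models, sample_texts, sample_labels)   # first preset formula
    return weights, probs
```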
Example 2
As shown in FIG. 2, a scene-oriented Chinese instruction recognition apparatus 200 according to an embodiment of the present invention includes: a correcting unit 201 for correcting the prediction weight of each prediction model according to a first preset formula and a sample set comprising misclassified samples, wherein the misclassified samples are test samples whose prediction class identifications do not match their actual class identifications.
In this embodiment, the prediction weight of each prediction model is corrected according to the first preset formula and a sample set comprising misclassified samples, that is, test samples whose prediction class identification does not match the actual class identification are used to correct the prediction weights. The prediction models can thus be trained effectively and their prediction accuracy improved, which effectively improves the accuracy of Chinese instruction identification. When the prediction class identification of a test sample does not match its actual class identification, the test sample is marked as a misclassified sample and its sampling probability is raised, so that misclassified samples are preferentially extracted both into the sample set used to correct the prediction weights and as new test samples. This reduces manual intervention to a certain extent and raises the intelligence level of prediction-model training and, in turn, of Chinese instruction recognition.
In addition, the sample set including the misclassified samples may contain all of the misclassified samples, or a part of the misclassified samples together with a part of the correctly predicted samples, provided it is large enough to serve the purpose of correcting the prediction weight of each prediction model.
In the above embodiment, preferably, the apparatus further includes: a verification unit 202 for cross-validating each prediction model on the sample set comprising misclassified samples to determine the prediction accuracy of each prediction model; the correcting unit 201 is further configured to correct the prediction weight of each prediction model according to the first preset formula and the prediction accuracy, wherein the first preset formula is:
ω_i = p_i / Σ_i p_i
where ω_i is the prediction weight of the i-th prediction model, p_i is the prediction accuracy of the i-th prediction model, and Σ_i p_i is the sum of the prediction accuracies of all prediction models.
In this embodiment, the prediction accuracy of each prediction model is determined by cross-verifying each prediction model with a sample set including misclassified samples, specifically, a 10-fold cross-verification method may be adopted, that is, the sample set including misclassified samples is divided into 10 samples, 9 samples are used as training data, and 1 sample is used as test data, and the test is performed, where each test results in a corresponding accuracy, an average value of the accuracy of 10 results is used as the prediction accuracy of the prediction model, and generally, 10-fold cross-verification is performed for multiple times, for example, 10 times, and then the average value is calculated, so as to improve the accuracy of determining the prediction accuracy of the prediction model.
The corrected prediction weight of each prediction model is then obtained by calculating the weight from the first preset formula and the prediction accuracies, which improves the accuracy with which the prediction weights are determined and, in turn, the accuracy of Chinese instruction recognition.
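As a non-authoritative illustration of this correction step, the following sketch assumes scikit-learn classifiers and an in-memory labeled sample set; the function name correct_prediction_weights and the choice of repeated, shuffled 10-fold splits are illustrative, not prescribed by the embodiment.

import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score

def correct_prediction_weights(models, X, y, folds=10, repeats=10):
    """Re-estimate each prediction model's weight from its cross-validated
    accuracy on a sample set that includes the misclassified samples."""
    accuracies = []
    for model in models:
        scores = []
        for _ in range(repeats):
            # A fresh shuffled 10-fold split per repetition; the repetitions
            # are averaged to stabilize the accuracy estimate.
            cv = StratifiedKFold(n_splits=folds, shuffle=True)
            scores.append(cross_val_score(model, X, y, cv=cv).mean())
        accuracies.append(np.mean(scores))
    accuracies = np.asarray(accuracies)
    # First preset formula: omega_i = p_i / sum_m(p_m)
    return accuracies / accuracies.sum()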
In any one of the above embodiments, preferably, the apparatus further includes: a determining unit 206, configured to determine the prediction class identifier of the test sample according to the prediction weight of each prediction model and a second preset formula; the determining unit 206 is further configured to determine the test sample as a misclassified sample when the actual class identifier of the test sample does not match the prediction class identifier; and an increasing unit 208, configured to increase the sampling probability of the misclassified sample, so that a sample set including the misclassified sample can be extracted and the misclassified sample can be extracted as a new test sample, where the second preset formula is:
pred = Max(ω_i · n_j)
where ω_i is the prediction weight of the i-th prediction model, n_j is the number of occurrences of the j-th class identifier among the predictions of all prediction models, and pred is the class identifier corresponding to Max(ω_i · n_j), i.e. the prediction class identifier.
In this embodiment, the prediction class identifier of the test sample is determined according to the prediction weight of each prediction model and the second preset formula, and a test sample whose prediction class identifier does not match its actual class identifier is marked as a misclassified sample, thereby testing the prediction models and facilitating their next round of training. By increasing the sampling probability of the misclassified samples, the misclassified samples can be preferentially extracted both as the sample set used to correct the prediction weight of each prediction model and as new test samples, which reduces manual intervention to a certain extent, improves the intelligence level of prediction-model training, and further improves the accuracy of Chinese instruction recognition.
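The weighted voting expressed by the second preset formula can be sketched as follows; this is one reasonable reading of pred = Max(ω_i · n_j), namely scoring each model's vote by its weight times how often that class was predicted overall, and the helper name predict_class is illustrative.

from collections import Counter

def predict_class(models, weights, text):
    """Determine the prediction class identifier of a test sample from the
    per-model predictions and the prediction weights (second preset formula)."""
    votes = [model.predict([text])[0] for model in models]
    counts = Counter(votes)  # n_j: occurrences of class j among all model predictions
    # Score each vote as omega_i * n_j and return the class of the best score.
    best_weight, best_class = max(
        zip(weights, votes), key=lambda wc: wc[0] * counts[wc[1]]
    )
    return best_class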
In any of the above embodiments, preferably, the determining unit 206 is further configured to determine whether the test sample includes a vocabulary matched with a preset scene vocabulary library. The Chinese instruction recognition device further includes: a prompting unit 210, configured to send a prompt signal when it is determined that the test sample does not include a vocabulary matched with the preset scene vocabulary library, without determining the prediction class identifier of the test sample; and a replacing unit 212, configured to, when it is determined that the test sample includes a vocabulary matched with the preset scene vocabulary library, replace the corresponding vocabulary in the test sample with the matched vocabulary in the preset scene vocabulary library and determine the prediction class identifier of the test sample.
In this embodiment, scene prejudgment is achieved by determining, before the prediction class identifier of the test sample is determined, whether the test sample includes a vocabulary matched with the preset scene vocabulary library, so that Chinese instruction recognition is scene-oriented, more targeted, and able to save background computing resources. If the test sample is determined not to include any vocabulary matched with the preset scene vocabulary library, a prompt signal is sent and the prediction class identifier is not determined, so that irrelevant test samples are filtered out and background computing resources are further saved. When the test sample is determined to include a vocabulary matched with the preset scene vocabulary library, the corresponding vocabulary in the test sample is replaced by the matched vocabulary from the library before the prediction class identifier is determined, which improves the degree of standardization of the test samples entering the prediction models, helps the prediction models output prediction class identifiers that match the actual class identifiers, and further improves the accuracy of Chinese instruction recognition.
For example, if the scene is set as a kitchen scene, the preset scene vocabulary library may include the following vocabularies: a first category, common food materials (450 selected common food materials such as apple, celery and potato, and their synonyms); a second category, common recipes (10000 selected common recipes such as sauerkraut fish and fish-flavored shredded pork, and their synonyms); a third category, flavor (multiple subclasses such as sour, spicy and light, and their synonyms); a fourth category, season (multiple subclasses such as morning and Valentine's Day, and their synonyms); a fifth category, nutritional efficacy (multiple subclasses such as weight loss and insomnia, and their synonyms); a sixth category, special groups (multiple subclasses such as drivers, teachers and examinees, and their synonyms); a seventh category, disease conditioning (multiple subclasses such as hypertension, cold and toothache, and their synonyms); an eighth category, beauty and slimming (multiple subclasses such as whitening, acne removal and freckle removal, and their synonyms); a ninth category, dish types (subclasses such as snacks, barbecue and overnight dishes, and their synonyms); and a tenth category, scenes (multiple subclasses such as being single, afternoon tea and transition, and their synonyms).
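A minimal sketch of the scene prejudgment and synonym replacement described above, assuming the jieba word-segmentation library and a toy stand-in for the preset kitchen-scene vocabulary library; the mapping SCENE_VOCABULARY and the helper scene_prejudge are illustrative, the real library is defined by the embodiment, not by this sketch.

import jieba  # a commonly used Chinese word-segmentation library (an assumed choice)

# Toy stand-in for the preset scene vocabulary library: each known word or
# synonym maps to its canonical vocabulary entry.
SCENE_VOCABULARY = {
    "苹果": "苹果",      # apple, a common food material
    "酸菜鱼": "酸菜鱼",  # sauerkraut fish, a common recipe
    "减肥": "减肥",      # weight loss, a nutritional-efficacy subclass
    "瘦身": "减肥",      # a synonym normalized to its canonical entry
}

def scene_prejudge(text):
    """Return (in_scene, tokens). If no token matches the preset scene
    vocabulary library, the caller sends a prompt signal and skips prediction;
    otherwise matched tokens are replaced by their canonical vocabulary."""
    tokens = list(jieba.cut(text))
    if not any(t in SCENE_VOCABULARY for t in tokens):
        return False, tokens
    return True, [SCENE_VOCABULARY.get(t, t) for t in tokens]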
In any of the above embodiments, preferably, the determining unit 206 is further configured to re-determine the sampling probability of the misclassified sample according to a third preset formula, where the third preset formula is:
(The third preset formula is given as an equation image in the original and is not reproduced in this text; it expresses W_{k+1} in terms of y_k, h(k) and Σ(y_k ≠ h(k)).)

Here y_k is the actual class identifier of test sample k, h(k) is the prediction class identifier of test sample k, W_{k+1} is the re-determined sampling probability of misclassified sample k, and Σ(y_k ≠ h(k)) is the total number of all misclassified samples.
In this embodiment, the sampling probability of each misclassified sample is re-determined by the third preset formula, so that the sampling probability of misclassified samples is increased according to a fixed rule. This makes it easier both to extract a sample set containing the misclassified samples for correcting the prediction weight of each prediction model and to extract the misclassified samples as new test samples. The sampling probability calculated by the third preset formula increases progressively: a sample misclassified once has a higher sampling probability than an ordinary sample, and if a misclassified sample extracted as a new test sample is misclassified again, its sampling probability continues to increase, so a sample misclassified twice has a higher sampling probability than a sample misclassified once. Through multiple rounds of such training, more appropriate prediction weights for the prediction models can be obtained, which effectively improves the accuracy of Chinese instruction recognition.
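Since the equation image of the third preset formula is not reproduced in this text, the sketch below uses an assumed additive update that matches the behaviour just described (every misclassified sample gains probability in proportion to one over the total number of misclassified samples, and gains again if it is misclassified in a later round); treat the exact update rule, and the helper name update_sampling_probabilities, as assumptions.

import numpy as np

def update_sampling_probabilities(probs, y_true, y_pred):
    """Assumed reading of the third preset formula: raise the sampling
    probability of each misclassified sample, then renormalize."""
    probs = np.asarray(probs, dtype=float)
    mis = np.asarray(y_true) != np.asarray(y_pred)   # y_k != h(k)
    n_mis = int(mis.sum())                           # sum(y_k != h(k))
    if n_mis:
        probs = probs + mis / n_mis                  # misclassified samples gain weight
    return probs / probs.sum()                       # keep a valid probability distribution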
In any one of the above embodiments, preferably, the apparatus further includes: a presetting unit 214, configured to construct the prediction models according to a preset corpus based on preset rules, and to preset the prediction weight of each prediction model.
In this embodiment, the prediction models are constructed according to the preset corpus based on the preset rules, and the prediction weight of each prediction model is then preset, which facilitates the training of the prediction models. For example, with 4 prediction models, the prediction weight of each prediction model may be preset to 0.25.
The preset rules are a support vector machine algorithm, a random forest algorithm, a K-nearest neighbor (KNN) algorithm and a naive Bayes algorithm; each algorithm independently constructs a prediction model, and combining the prediction models can further improve the accuracy of Chinese instruction recognition.
The preset corpus is used both to construct the prediction models and to provide corpora for training, and the test samples and the sample set including the misclassified samples are extracted from the preset corpus. Specifically, four types of corpora, namely interrogative sentences, imperative sentences, exclamatory sentences and declarative sentences, are collected and sorted as the preset corpus and labeled to form a prediction-model training test set T = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, where x ∈ χ, the instance space χ ⊆ R^n, and y_i belongs to the label set {1, 2, 3, 4}, whose elements correspond to the four class identifiers of interrogative, imperative, exclamatory and declarative sentences respectively. Each class of corpus includes related subclasses: the interrogative sentences include 4 subclasses (special questions, alternative questions, affirmative-negative questions and yes-no questions); the imperative sentences include 4 subclasses (such as command imperative sentences, request imperative sentences and persuasive imperative sentences); the exclamatory sentences include 4 subclasses (interjection exclamations, noun exclamations, spoken exclamations and adverb exclamations); and the declarative sentences include 2 subclasses (negative declarative sentences and affirmative declarative sentences).
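A hedged scikit-learn sketch of constructing the four prediction models over such a labeled, pre-segmented corpus and presetting equal weights of 0.25; the TF-IDF featurization, the default hyperparameters, and the name build_prediction_models are assumptions, as the embodiment does not fix a text-representation scheme.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def build_prediction_models(segmented_texts, labels):
    """Construct the four prediction models (SVM, random forest, KNN and
    naive Bayes) from space-joined segmented sentences labeled 1-4, and
    preset each model's prediction weight to 0.25."""
    models = [
        make_pipeline(TfidfVectorizer(), SVC()),
        make_pipeline(TfidfVectorizer(), RandomForestClassifier()),
        make_pipeline(TfidfVectorizer(), KNeighborsClassifier()),
        make_pipeline(TfidfVectorizer(), MultinomialNB()),
    ]
    for model in models:
        model.fit(segmented_texts, labels)
    weights = [0.25] * len(models)  # preset prediction weights
    return models, weights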
Example 3
According to a computer device of an embodiment of the present invention, the computer device includes a processor, and the processor is configured to implement, when executing a computer program stored in a memory, the steps of the scene-oriented Chinese instruction recognition method according to any one of the embodiments of the present invention set forth above.
In this embodiment, the computer device includes a processor configured to implement, when executing the computer program stored in the memory, the steps of any one of the scene-oriented Chinese instruction recognition methods proposed in the embodiments of the present invention, and therefore achieves all the beneficial effects of any one of those methods, which are not repeated here.
Example 4
The computer-readable storage medium according to an embodiment of the present invention has a computer program stored thereon, and the computer program, when executed by a processor, implements the steps of the scene-oriented Chinese instruction recognition method of any one of the embodiments of the present invention set forth above.
In this embodiment, a computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of any one of the scene-oriented Chinese instruction recognition methods provided in the embodiments of the present invention described above, and therefore achieves all the beneficial effects of any one of those methods, which are not repeated here.
Example 5
As shown in fig. 3, according to the scene-oriented Chinese instruction recognition method of an embodiment of the present invention, 4 prediction models are first constructed from a corpus through a support vector machine algorithm, a random forest algorithm, a KNN nearest-neighbor algorithm and a naive Bayes algorithm, and weights ω1, ω2, ω3 and ω4 are preset for them respectively. A test sample is then extracted from the corpus and read to obtain the text character string returned by speech recognition. At the text parsing layer, natural language processing techniques are used to perform Chinese word segmentation, stop-word filtering, dictionary customization and text deduplication on the text, yielding the processed text character-string array of the test sample. At the scene topic layer, it is judged whether the text includes words from the preset scene vocabulary library. If not, a prediction result is output indicating that the question is irrelevant to the scene. If so, the class identifier of the test text is predicted by each of the 4 prediction models, and the prediction results are integrated according to the preset weights ω1, ω2, ω3 and ω4 to obtain the prediction class identifier of the test text. Misclassification judgment is then performed: if the actual class identifier of the test text does not match the prediction class identifier, the test text is determined to be a misclassified text and the prediction weight of each prediction model is corrected; if they match, the prediction result, i.e. the prediction class identifier, which equals the actual class identifier, is output. Correcting the prediction weight of each prediction model according to the misclassified samples in this way can effectively improve the accuracy of Chinese instruction recognition.
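To tie the flow of fig. 3 together, the following sketch runs one illustrative training round using the helper functions sketched earlier in this section (all of them assumptions rather than the patented implementation); the sample-set size and the handling of out-of-scene samples are likewise assumed.

import numpy as np

def training_round(models, weights, probs, texts, labels, set_size=200):
    """One illustrative round: scene-check and predict each test sample,
    mark misclassified samples, raise their sampling probability, draw a
    correction sample set, and correct the prediction weights."""
    predicted = []
    for text in texts:
        in_scene, tokens = scene_prejudge(text)
        # Out-of-scene samples trigger a prompt signal and are not predicted.
        predicted.append(predict_class(models, weights, " ".join(tokens)) if in_scene else None)
    # Treat skipped (out-of-scene) samples as not misclassified for the update.
    y_pred = [p if p is not None else y for p, y in zip(predicted, labels)]
    probs = update_sampling_probabilities(probs, labels, y_pred)
    rng = np.random.default_rng()
    chosen = rng.choice(len(texts), size=min(set_size, len(texts)), replace=False, p=probs)
    sample_texts = [" ".join(scene_prejudge(texts[i])[1]) for i in chosen]
    sample_labels = np.asarray([labels[i] for i in chosen])
    weights = correct_prediction_weights(models, sample_texts, sample_labels)
    return weights, probs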
The technical solution of the present invention has been described in detail above with reference to the accompanying drawings. The invention provides a scene-oriented Chinese instruction recognition method, device, equipment and storage medium, in which the prediction weight of each prediction model is corrected according to a sample set including misclassified samples and a first preset formula, so that the accuracy of Chinese instruction recognition is effectively improved, background computing resources are effectively saved through scene prejudgment, and the intelligence level of Chinese instruction recognition is improved.
The steps in the method of the invention can be sequentially adjusted, combined and deleted according to actual needs.
The units in the device of the invention can be merged, divided and deleted according to actual needs.
It will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium. The storage medium includes a Read-Only Memory (ROM), a Random Access Memory (RAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disc memory, a magnetic disk memory, a tape memory, or any other computer-readable medium that can be used to carry or store data.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A scene-oriented Chinese instruction identification method is characterized by comprising the following steps:
modifying the prediction weight of each prediction model according to a sample set including the misclassified samples and a first preset formula,
the misclassification sample is a test sample with unmatched prediction class identification and actual class identification;
the correcting the prediction weight of each prediction model according to the sample set including the misclassified samples and the first preset formula specifically includes:
cross-validating each of the prediction models according to the sample set comprising the misclassified samples to determine a prediction accuracy of each of the prediction models;
correcting the prediction weight of each prediction model according to the first preset formula and the prediction precision,
wherein the first preset formula comprises:
ω_i = p_i / Σ_{m=1}^{M} p_m

where ω_i is the prediction weight of the i-th prediction model, p_i is the prediction accuracy of the i-th prediction model, and Σ_{m=1}^{M} p_m is the sum of the prediction accuracies of all M prediction models;
before the modifying the prediction weight of each prediction model according to the sample set including the misclassified samples and the first preset formula, the method further includes:
determining a prediction class identifier of the test sample according to the prediction weight of each prediction model and a second preset formula;
if the actual class identifier of the test sample is not matched with the predicted class identifier, determining the test sample as the misclassified sample;
increasing the sampling probability of the misclassified samples to extract the sample set including the misclassified samples and to extract the misclassified samples as new test samples,
wherein the second preset formula comprises:
pred = Max(ω_i · n_j)
where ω_i is the prediction weight of the i-th prediction model, n_j is the number of occurrences of the j-th class identifier among the predictions of all prediction models, and pred is the class identifier corresponding to Max(ω_i · n_j), namely the prediction class identifier;
before the modifying the prediction weight of each prediction model according to the sample set including the misclassified samples and the first preset formula, the method further includes:
and constructing the prediction models according to a preset corpus based on a preset rule, and presetting the prediction weight of each prediction model.
2. The scene-oriented Chinese instruction recognition method of claim 1, further comprising, before the determining of the prediction class identifier of the test sample according to the preset weight of each prediction model and the second preset formula:
determining whether the test sample comprises vocabularies matched with a preset scene vocabulary library or not;
if it is determined that the test sample does not include a vocabulary matched with the preset scene vocabulary library, sending a prompt signal and not determining the prediction class identifier of the test sample;
and if it is determined that the test sample includes a vocabulary matched with the preset scene vocabulary library, replacing the corresponding vocabulary in the test sample with the matched vocabulary in the preset scene vocabulary library, and determining the prediction class identifier of the test sample.
3. The scene-oriented Chinese instruction recognition method of claim 1, wherein the increasing of the sampling probability of the misclassified samples specifically comprises:
re-determining the sampling probability of the misclassified samples according to a third preset formula,
wherein the third preset formula comprises:
(The third preset formula is given as an equation image in the original and is not reproduced in this text; it expresses W_{k+1} in terms of y_k, h(k) and Σ(y_k ≠ h(k)).)

where y_k is the actual class identifier of test sample k, h(k) is the prediction class identifier of test sample k, W_{k+1} is the re-determined sampling probability of misclassified sample k, and Σ(y_k ≠ h(k)) is the total number of all misclassified samples.
4. A scene-oriented Chinese instruction recognition device is characterized by comprising:
a correction unit for correcting the prediction weight of each prediction model according to a sample set including the misclassified samples and a first preset formula,
the misclassification sample is a test sample with unmatched prediction class identification and actual class identification;
a verification unit, configured to cross-verify each prediction model according to the sample set including the misclassified samples to determine the prediction accuracy of each prediction model;
the correction unit is further configured to: correcting the prediction weight of each prediction model according to the first preset formula and the prediction precision,
wherein the first preset formula comprises:
ω_i = p_i / Σ_{m=1}^{M} p_m

where ω_i is the prediction weight of the i-th prediction model, p_i is the prediction accuracy of the i-th prediction model, and Σ_{m=1}^{M} p_m is the sum of the prediction accuracies of all M prediction models;
the determining unit is used for determining the prediction class identification of the test sample according to the prediction weight of each prediction model and a second preset formula;
the determination unit is further configured to: when the actual class identification of the test sample is not matched with the prediction class identification, determining the test sample as the misclassified sample;
an increasing unit for increasing a sampling probability of the misclassified samples to extract the sample set including the misclassified samples and to extract the misclassified samples as new test samples,
wherein the second preset formula comprises:
pred = Max(ω_i · n_j)
where ω_i is the prediction weight of the i-th prediction model, n_j is the number of occurrences of the j-th class identifier among the predictions of all prediction models, and pred is the class identifier corresponding to Max(ω_i · n_j), namely the prediction class identifier;
and the presetting unit is used for constructing the prediction models according to a preset corpus based on a preset rule and presetting the prediction weight of each prediction model.
5. The scene-oriented Chinese instruction recognition device of claim 4,
the determination unit is further configured to: determining whether the test sample comprises vocabularies matched with a preset scene vocabulary library or not;
the Chinese instruction recognition device further comprises:
the prompting unit is used for sending a prompt signal when it is determined that the test sample does not include a vocabulary matched with the preset scene vocabulary library, without determining the prediction class identifier of the test sample;
and the replacing unit is used for replacing the corresponding vocabulary in the test sample by the matched vocabulary in the preset scene vocabulary library when the test sample is determined to comprise the vocabulary matched with the preset scene vocabulary library, and determining the prediction type identification of the test sample.
6. The scene-oriented Chinese instruction recognition device of claim 4,
the determination unit is further configured to: re-determining the sampling probability of the misclassified samples according to a third preset formula,
wherein the third preset formula comprises:
(The third preset formula is given as an equation image in the original and is not reproduced in this text; it expresses W_{k+1} in terms of y_k, h(k) and Σ(y_k ≠ h(k)).)

where y_k is the actual class identifier of test sample k, h(k) is the prediction class identifier of test sample k, W_{k+1} is the re-determined sampling probability of misclassified sample k, and Σ(y_k ≠ h(k)) is the total number of all misclassified samples.
7. A computer device, characterized in that it comprises a processor configured to implement the steps of the scene-oriented Chinese instruction recognition method according to any one of claims 1 to 3 when executing a computer program stored in a memory.
8. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the scene-oriented Chinese instruction recognition method according to any one of claims 1 to 3.
CN201710620448.7A 2017-07-26 2017-07-26 Scene-oriented Chinese instruction identification method, device, equipment and storage medium Active CN107507613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710620448.7A CN107507613B (en) 2017-07-26 2017-07-26 Scene-oriented Chinese instruction identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN107507613A CN107507613A (en) 2017-12-22
CN107507613B true CN107507613B (en) 2021-03-16

Family

ID=60689769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710620448.7A Active CN107507613B (en) 2017-07-26 2017-07-26 Scene-oriented Chinese instruction identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN107507613B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110602307A (en) * 2018-06-12 2019-12-20 范世汶 Data processing method, device and equipment
CN110689135B (en) * 2019-09-05 2022-10-11 第四范式(北京)技术有限公司 Anti-money laundering model training method and device and electronic equipment
CN111651686B (en) * 2019-09-24 2021-02-26 北京嘀嘀无限科技发展有限公司 Test processing method and device, electronic equipment and storage medium
CN113096642A (en) * 2021-03-31 2021-07-09 南京地平线机器人技术有限公司 Speech recognition method and device, computer readable storage medium, electronic device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7831380B2 (en) * 2006-03-03 2010-11-09 Inrix, Inc. Assessing road traffic flow conditions using data obtained from mobile data sources

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361010A (en) * 2014-10-11 2015-02-18 北京中搜网络技术股份有限公司 Automatic classification method for correcting news classification
CN104573013A (en) * 2015-01-09 2015-04-29 上海大学 Category weight combined integrated learning classifying method
CN106548210A (en) * 2016-10-31 2017-03-29 腾讯科技(深圳)有限公司 Machine learning model training method and device

Also Published As

Publication number Publication date
CN107507613A (en) 2017-12-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 230088 Building No. 198, building No. 198, Mingzhu Avenue, Anhui high tech Zone, Anhui

Applicant after: Hefei Hualing Co.,Ltd.

Address before: 230601 R & D building, No. 176, Jinxiu Road, Hefei economic and Technological Development Zone, Anhui 501

Applicant before: Hefei Hualing Co.,Ltd.

GR01 Patent grant