CN110377911A - Intent recognition method and device under a dialogue framework - Google Patents
Intent recognition method and device under a dialogue framework
- Publication number
- CN110377911A (application number CN201910666196.0A)
- Authority
- CN
- China
- Prior art keywords
- intent recognition
- corpus
- model
- training
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The present invention provides an intent recognition method and device under a dialogue framework. The method comprises: obtaining the matching degree between a corpus to be recognized and each rule monomer in a preset rule template, the preset rule template including a plurality of rule monomers, each rule monomer corresponding to one label; judging whether any matching degree is greater than a preset threshold; if so, taking the label of the rule monomer with the maximum matching degree as the intent recognition result; if not, inputting the corpus to be recognized into a pretrained intent recognition model and taking the output of that model as the intent recognition result. By falling back to a machine learning model only when the rule template cannot effectively identify the user's intent, the method accurately identifies user intent with a lower demand for training samples, promoting the development of artificial intelligence devices such as question answering systems and dialogue robots.
Description
Technical field
The present invention relates to the field of artificial intelligence, and more particularly to an intent recognition method and device under a dialogue framework.
Background art
With the development of search engine technology, modern search engines, question answering systems and dialogue robots are no longer limited to simple relevance-based information retrieval; they must deeply understand users' information needs and accurately provide the required services. Correctly identifying the user's intent is a key step toward this goal.
At present, user intent is mainly identified with machine learning models, which generally require hundreds of thousands of annotated samples to train a model with acceptable recognition performance. A newly built question answering system or dialogue robot cannot collect dialogue-scenario corpora at such a scale, so an effective machine learning model cannot be trained and user intent cannot be accurately identified. In addition, for dialogue-based intent recognition, user inputs are usually very short, which makes accurately identifying intent under a dialogue framework even harder and severely constrains the development of artificial intelligence devices such as question answering systems and dialogue robots.
Summary of the invention
In view of the problems in the prior art, the present invention provides an intent recognition method and device under a dialogue framework, an electronic device, and a computer-readable storage medium, which can at least partially solve the problems in the prior art.
To achieve the goals above, the present invention adopts the following technical scheme:
In a first aspect, an intent recognition method under a dialogue framework is provided, comprising:
obtaining the matching degree between a corpus to be recognized and each rule monomer in a preset rule template, the preset rule template including a plurality of rule monomers, each rule monomer corresponding to one label;
judging whether any matching degree is greater than a preset threshold;
if so, taking the label of the rule monomer with the maximum matching degree as the intent recognition result;
if not, inputting the corpus to be recognized into a pretrained intent recognition model and taking the output of the pretrained intent recognition model as the intent recognition result.
Further, obtaining the matching degree between the corpus to be recognized and the rule monomers in the preset rule template comprises:
parsing the corpus to be recognized to obtain a plurality of feature words to be matched and their parts of speech;
obtaining related rule monomers from the rule template according to the plurality of feature words to be matched;
calculating the matching degree between the corpus to be recognized and each related rule monomer according to the plurality of feature words to be matched and their parts of speech.
Further, parsing the corpus to be recognized to obtain a plurality of feature words to be matched and their parts of speech comprises:
segmenting the corpus to be recognized to obtain a plurality of feature words;
removing stopwords from the plurality of feature words to obtain the feature words to be matched;
tagging the parts of speech of the feature words to be matched to obtain the feature words to be matched and their parts of speech.
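The three parsing sub-steps (segment, remove stopwords, tag parts of speech) can be sketched as follows. This is a toy stand-in: a real Chinese pipeline would use a proper segmenter and tagger such as jieba, and `STOPWORDS` / `POS_LEXICON` below are invented illustrative data, not anything from the patent:

```python
# Illustrative parser for the three steps above: segmentation, stopword
# removal, part-of-speech tagging. Whitespace split and a toy POS lexicon
# stand in for a real segmenter/tagger.
STOPWORDS = {"i", "to", "the", "a", "want"}
POS_LEXICON = {"check": "verb", "transfer": "verb", "balance": "noun", "500": "numeral"}

def parse_corpus(corpus):
    tokens = corpus.lower().split()                       # step 1: segmentation
    features = [t for t in tokens if t not in STOPWORDS]  # step 2: stopword removal
    return [(t, POS_LEXICON.get(t, "unknown")) for t in features]  # step 3: POS tags

print(parse_corpus("I want to check the balance"))
# [('check', 'verb'), ('balance', 'noun')]
```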
Further, each rule monomer is composed of a plurality of word slots, and each word slot holds a keyword and its synonyms, or a part of speech.
Obtaining related rule monomers from the rule template according to the plurality of feature words to be matched comprises: searching the rule template by each feature word to be matched, and taking every rule monomer whose word slots include that feature word as a related rule monomer.
Further, each word slot corresponds to a weight value.
Calculating the matching degree between the corpus to be recognized and a related rule monomer according to the plurality of feature words to be matched and their parts of speech comprises:
matching the plurality of feature words to be matched and their parts of speech against the word slots at the corresponding positions of the related rule monomer, by word or by part of speech;
accumulating the weight values of the successfully matched word slots to obtain the matching degree between the corpus to be recognized and the related rule monomer.
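The per-slot weighting just described might look like the following sketch. The slot field names (`keyword`, `synonyms`, `pos`, `weight`) are assumptions made for illustration, not the patent's actual data format:

```python
# Hedged sketch of the weighted word-slot match: a slot matches if the feature
# word equals the slot keyword (or a synonym) or shares its part of speech;
# matched slot weights are summed into the matching degree.
def matching_degree(features, rule_monomer):
    """features: list of (word, pos); rule_monomer: list of slot dicts."""
    degree = 0.0
    for (word, pos), slot in zip(features, rule_monomer):
        word_ok = word == slot.get("keyword") or word in slot.get("synonyms", ())
        pos_ok = slot.get("pos") is not None and pos == slot["pos"]
        if word_ok or pos_ok:
            degree += slot["weight"]     # accumulate weights of matched slots
    return degree

rule = [
    {"pos": "verb", "weight": 0.4},                                 # slot A: any verb
    {"keyword": "balance", "synonyms": ("funds",), "weight": 0.6},  # slot B: keyword
]
print(matching_degree([("check", "verb"), ("balance", "noun")], rule))  # 1.0
```

The positional `zip` mirrors the "corresponding position" wording above: slot i is only ever compared with feature word i.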
Further, the method also comprises:
constructing an intent recognition model;
training the intent recognition model with unlabeled corpus samples to obtain the pretrained intent recognition model.
Further, training the intent recognition model with unlabeled corpus samples to obtain the pretrained intent recognition model comprises:
clustering the unlabeled corpus samples to obtain unlabeled corpus samples of preset categories;
sampling the unlabeled corpus samples of each preset category to obtain initial samples;
training the intent recognition model with the annotated initial samples to obtain an initial intent recognition model;
inputting the unlabeled corpus samples remaining after sampling into the initial intent recognition model to obtain a label for each remaining unlabeled corpus sample;
further training the initial intent recognition model with the remaining samples, after their labels have been corrected, to obtain the pretrained intent recognition model.
Further, the method also comprises:
obtaining test corpora with known labels;
testing the pretrained intent recognition model with the test corpora of known labels, taking the output of the model as the test result;
judging, based on the test result and the known labels, whether the pretrained intent recognition model meets a preset requirement;
if so, taking the current model as the target model for intent recognition.
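The test stage reduces to comparing model outputs with the known labels against an acceptance bar. A minimal sketch; the 0.9 accuracy requirement and all names are invented placeholders, since the patent leaves the preset requirement unspecified:

```python
# Hedged sketch: accept the pretrained model only if its accuracy on labeled
# test corpora meets the preset requirement.
def meets_requirement(model, test_set, required_accuracy=0.9):
    correct = sum(1 for text, label in test_set if model(text) == label)
    return correct / len(test_set) >= required_accuracy

model = lambda text: "query_balance" if "balance" in text else "other"
sample_tests = [("my balance", "query_balance"), ("weather", "other")]
print(meets_requirement(model, sample_tests))  # True
```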
Further, the method also comprises:
if the current model does not meet the preset requirement, optimizing the current model and/or re-training the model with an updated training sample set.
In a second aspect, an intent recognition device under a dialogue framework is provided, comprising:
a matching degree obtaining module, which obtains the matching degree between a corpus to be recognized and each rule monomer in a preset rule template, the preset rule template including a plurality of rule monomers, each rule monomer corresponding to one label;
a matching judgment module, which judges whether any matching degree is greater than a preset threshold;
a first intent recognition module, which, if a matching degree greater than the preset threshold exists, takes the label of the rule monomer with the maximum matching degree as the intent recognition result;
a second intent recognition module, which, if no matching degree greater than the preset threshold exists, inputs the corpus to be recognized into a pretrained intent recognition model and takes the output of the pretrained intent recognition model as the intent recognition result.
Further, the matching degree obtaining module comprises:
a parsing unit, which parses the corpus to be recognized to obtain a plurality of feature words to be matched and their parts of speech;
a rule filtering unit, which obtains related rule monomers from the rule template according to the plurality of feature words to be matched;
a matching degree computing unit, which calculates the matching degree between the corpus to be recognized and each related rule monomer according to the plurality of feature words to be matched and their parts of speech.
Further, the parsing unit comprises:
a segmentation subunit, which segments the corpus to be recognized to obtain a plurality of feature words;
a stopword removal subunit, which removes stopwords from the plurality of feature words to obtain the feature words to be matched;
a part-of-speech tagging subunit, which tags the parts of speech of the feature words to be matched to obtain the feature words to be matched and their parts of speech.
Further, each rule monomer is composed of a plurality of word slots, and each word slot holds a keyword and its synonyms, or a part of speech.
The rule filtering unit comprises: a search subunit, which searches the rule template by each feature word to be matched and takes every rule monomer whose word slots include that feature word as a related rule monomer.
Further, each word slot corresponds to a weight value.
The matching degree computing unit comprises:
a matching subunit, which matches the plurality of feature words to be matched and their parts of speech against the word slots at the corresponding positions of a related rule monomer, by word or by part of speech;
a weight accumulation subunit, which accumulates the weight values of the successfully matched word slots to obtain the matching degree between the corpus to be recognized and the related rule monomer.
Further, the device also comprises:
a model construction module, which constructs an intent recognition model;
a training module, which trains the intent recognition model with unlabeled corpus samples to obtain the pretrained intent recognition model.
Further, the training module comprises:
a clustering unit, which clusters the unlabeled corpus samples to obtain unlabeled corpus samples of preset categories;
a sampling unit, which samples the unlabeled corpus samples of each preset category to obtain initial samples;
a first training unit, which trains the intent recognition model with the annotated initial samples to obtain an initial intent recognition model;
a labeling unit, which inputs the unlabeled corpus samples remaining after sampling into the initial intent recognition model to obtain a label for each remaining unlabeled corpus sample;
a second training unit, which further trains the initial intent recognition model with the remaining samples, after their labels have been corrected, to obtain the pretrained intent recognition model.
Further, the device also comprises:
a test sample obtaining module, which obtains test corpora with known labels;
a test module, which tests the pretrained intent recognition model with the test corpora of known labels, taking the output of the model as the test result;
a test judgment module, which judges, based on the test result and the known labels, whether the pretrained intent recognition model meets a preset requirement;
a model output module, which, if the pretrained intent recognition model meets the preset requirement, takes the current model as the target model for intent recognition.
Further, the device also comprises:
a retraining module, which, if the pretrained intent recognition model does not meet the preset requirement, optimizes the current model and/or re-trains the model with an updated training sample set.
In a third aspect, an electronic device is provided, comprising a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the program, implements:
obtaining the matching degree between a corpus to be recognized and each rule monomer in a preset rule template, the preset rule template including a plurality of rule monomers, each rule monomer corresponding to one label;
judging whether any matching degree is greater than a preset threshold;
if so, taking the label of the rule monomer with the maximum matching degree as the intent recognition result.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, the computer program, when executed by a processor, implementing:
obtaining the matching degree between a corpus to be recognized and each rule monomer in a preset rule template, the preset rule template including a plurality of rule monomers, each rule monomer corresponding to one label;
judging whether any matching degree is greater than a preset threshold;
if so, taking the label of the rule monomer with the maximum matching degree as the intent recognition result.
Embodiments of the present invention provide an intent recognition method and device under a dialogue framework, an electronic device, and a computer-readable storage medium. The method comprises: obtaining the matching degree between a corpus to be recognized and each rule monomer in a preset rule template, the preset rule template including a plurality of rule monomers, each rule monomer corresponding to one label; judging whether any matching degree is greater than a preset threshold; if so, taking the label of the rule monomer with the maximum matching degree as the intent recognition result; if not, inputting the corpus to be recognized into a pretrained intent recognition model and taking the output of that model as the intent recognition result. By falling back to a machine learning model only when the rule template cannot effectively identify the user's intent, user intent can be accurately identified with a lower demand for training samples, promoting the development of artificial intelligence devices such as question answering systems and dialogue robots.
In addition, embodiments of the present invention use unlabeled corpora as training samples: the unlabeled corpora are classified with a clustering algorithm and uniform training samples are obtained by sampling, which effectively guarantees the coverage of the samples, effectively reduces the annotation workload, and improves the speed and convenience of model training.
To make the above and other objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; those of ordinary skill in the art may derive other drawings from them without creative effort. In the drawings:
Fig. 1 is an architecture diagram of the server S1 and a client device B1 in an embodiment of the present invention;
Fig. 2 is an architecture diagram of the server S1, a client device B1 and a database server S2 in an embodiment of the present invention;
Fig. 3 is flow diagram one of the intent recognition method under the dialogue framework in an embodiment of the present invention;
Fig. 4 shows the specific steps of step S100 in Fig. 3;
Fig. 5 shows the specific steps of step S110 in Fig. 4;
Fig. 6 shows the specific steps of step S130 in Fig. 4;
Fig. 7 is flow diagram two of the intent recognition method under the dialogue framework in an embodiment of the present invention;
Fig. 8 shows one set of specific steps of step S20 in Fig. 7;
Fig. 9 shows another set of specific steps of step S20 in Fig. 7;
Fig. 10 is a structural block diagram of the intent recognition device under the dialogue framework in an embodiment of the present invention;
Fig. 11 is a structure diagram of an electronic device according to an embodiment of the present invention.
Specific embodiment
To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the scope of protection of the present application.
Those skilled in the art will understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
It should be noted that the terms "comprise" and "have" in the description and claims of the present application and in the above drawings, and any variants thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device containing a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product or device.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features therein may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Intent recognition means that, when a user communicates with a robot, the robot can quickly judge the user's true intent from the direct or indirect information the user provides. For a dialogue robot, intent recognition is a core function point: correctly identifying the user's intent is a basic capability of the robot. The present application proposes a robot intent recognition method that provides a basis for quickly building robots meeting user needs; the method is described in detail below using the construction of a collection robot for the financial industry as an example.
The prior art mainly identifies user intent with machine learning models, which generally require hundreds of thousands of annotated samples to train a model with acceptable recognition performance. A newly built question answering system or dialogue robot cannot collect dialogue-scenario corpora at such a scale, so an effective machine learning model cannot be trained and user intent cannot be accurately identified. In addition, for dialogue-based intent recognition, user inputs are usually very short, which makes accurately identifying intent under a dialogue framework even harder and severely constrains the development of artificial intelligence devices such as question answering systems and dialogue robots.
To at least partly solve the above technical problems in the prior art, embodiments of the present invention provide an intent recognition method under a dialogue framework that falls back to a machine learning model only when the rule template cannot effectively identify the user's intent. User intent can thus be accurately identified with a lower demand for training samples, promoting the development of artificial intelligence devices such as question answering systems and dialogue robots.
In addition, embodiments of the present invention use unlabeled corpora as training samples: the unlabeled corpora are classified with a clustering algorithm and uniform training samples are obtained by sampling, which effectively guarantees the coverage of the samples, effectively reduces the annotation workload, and improves the speed and convenience of model training.
In view of this, the present application provides an intent recognition device under a dialogue framework. The device may be a server S1. Referring to Fig. 1, the server S1 may be communicatively connected with at least one client device B1; the client device B1 may send a corpus to be recognized to the server S1, and the server S1 may receive the corpus to be recognized online. The server S1 may preprocess the obtained corpus to be recognized online or offline, obtain the matching degree between the corpus to be recognized and each rule monomer in a preset rule template (the preset rule template including a plurality of rule monomers, each rule monomer corresponding to one label), and judge whether any matching degree is greater than a preset threshold. If so, the label of the rule monomer with the maximum matching degree is taken as the intent recognition result; if not, the corpus to be recognized is input into a pretrained intent recognition model, and the output of the pretrained intent recognition model is taken as the intent recognition result.
In addition, referring to Fig. 2, the server S1 may also be communicatively connected with at least one database server S2, the database server S2 being used to store the pretrained intent recognition model and/or historical unlabeled corpus samples and the preset rule template. The database server S2 sends the pretrained intent recognition model and/or historical unlabeled corpus samples and the preset rule template to the server S1 online.
Based on the above, the database server S2 may also be used to store test corpora with known labels. The database server S2 sends the test corpora of known labels to the server S1 online, and the server S1 may receive them online, obtain test samples from at least one test corpus of known label, apply the test samples to test the model, and take the output of the model as the test result. Then, based on the test result and the known evaluation results of the at least one test corpus of known label, the server judges whether the current model meets a preset requirement. If so, the current model is taken as the target model for intent recognition under the dialogue framework; if the current model does not meet the preset requirement, the current model is optimized and/or model training is performed again with an updated training sample set.
It can be understood that the client device B1 may include a smartphone, a tablet device, a network set-top box, a portable computer, a desktop computer, a personal digital assistant (PDA), a vehicle-mounted device, a smart wearable device, and the like. The smart wearable device may include smart glasses, a smart watch, a smart bracelet, and the like.
In practical applications, part of the intent recognition under the dialogue framework may be performed on the server S1 side as described above, i.e., in the architecture shown in Fig. 1; alternatively, all operations may be completed in the client device B1, with the client device B1 communicating directly with the database server S2. The choice may be made according to the processing capability of the client device B1, the restrictions of the user's usage scenario, and so on; the present application does not limit this. If all operations are completed in the client device B1, the client device B1 may further include a processor for the specific processing of intent recognition under the dialogue framework.
Any suitable network protocol may be used for communication between the server and the client device, including network protocols not yet developed as of the filing date of the present application. The network protocol may include, for example, TCP/IP, UDP/IP, HTTP, and HTTPS. Of course, the network protocol may also include, for example, RPC (Remote Procedure Call) and REST (Representational State Transfer) protocols used on top of the above protocols.
In one or more embodiments of the present application, the test corpora are unlabeled corpus samples that were not included in model training, and for each test corpus its known label needs to be obtained.
Fig. 3 is flow diagram one of the intent recognition method under the dialogue framework in an embodiment of the present invention. As shown in Fig. 3, the intent recognition method under the dialogue framework may include the following:
Step S100: obtaining the matching degree between a corpus to be recognized and each rule monomer in a preset rule template, the preset rule template including a plurality of rule monomers, each rule monomer corresponding to one label.
It is worth noting that, whether the customer service is human or a dialogue robot, voice dialogues are produced in the course of serving customers; these voice dialogues can be transcribed into corpora to be recognized by speech recognition technology.
A rule monomer may include a plurality of word slots. Each word slot may hold one word or define the part of speech of the word in the slot; in addition, a synonym table is provided for the word in each slot to extend the ways the word can be expressed.
For example, a rule monomer may have the structure "slot A + slot B", where the part of speech of slot A is adjective and the word in slot B is "Industrial and Commercial Bank"; this rule monomer can be written as "「adjective」 Industrial and Commercial Bank".
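A rule monomer like the one in the example might be represented as a small data structure; the field names and the `icbc_related` label below are illustrative assumptions, not the patent's format:

```python
# Hypothetical representation of the example rule monomer: slot A constrains
# part of speech, slot B fixes a keyword with a synonym list.
icbc_rule = {
    "label": "icbc_related",            # invented label for this monomer
    "slots": [
        {"pos": "adjective", "keyword": None, "synonyms": ()},      # slot A
        {"pos": None, "keyword": "industrial and commercial bank",  # slot B
         "synonyms": ("ICBC",)},
    ],
}

def slot_accepts(slot, word, pos):
    return (word == slot["keyword"] or word in slot["synonyms"]
            or (slot["pos"] is not None and pos == slot["pos"]))

print(slot_accepts(icbc_rule["slots"][1], "ICBC", "noun"))  # True (via synonym)
```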
Obtaining the matching degree between the corpus to be recognized and a rule monomer means obtaining the similarity between the corpus to be recognized and the rule template. The rule template is set by maintenance personnel by summarizing the common corpora of the application scenario.
Step S200: judging whether any matching degree is greater than a preset threshold; if so, executing step S300; otherwise, executing step S400.
The matching degree quantifies the similarity between the corpus to be recognized and each rule in the rule template: the higher the similarity, the larger the matching degree.
When the matching degree between the corpus to be recognized and at least one rule template is greater than the preset value, the similarity between the corpus to be recognized and the rule template has reached the preset requirement, and the rule template can be considered able to identify the intent of the corpus to be recognized.
When the matching degrees between the corpus to be recognized and all rule templates are less than the preset value, the similarity between the corpus to be recognized and the rule templates has not reached the preset requirement; the rule templates are then considered unable to identify the intent of the corpus to be recognized, and the pretrained intent recognition model is needed to identify its intent.
Specifically, the threshold is chosen according to practical application requirements and may be, for example, any value in [0.4, 2], such as 0.5, 0.7, 0.8, 0.9, 1.2 or 1.5.
Step S300: taking the label of the rule monomer with the maximum matching degree as the intent recognition result.
That is, the label of the rule monomer most similar to the corpus to be recognized is selected as the intent recognition result.
For example, suppose the corpus to be recognized is "I want to check the balance" and the rule monomer is "「verb」 balance" with the corresponding label "query balance". The similarity between the corpus to be recognized and the rule monomer is very high, so the intent recognition result for this corpus is "query balance".
Step S400: inputting the corpus to be recognized into the pretrained intent recognition model and taking the output of the pretrained intent recognition model as the intent recognition result.
The pretrained intent recognition model is a kind of machine learning or deep learning model, such as a text classification model, obtained after training on unlabeled corpus samples.
The above technical solution shows that, in the intention recognition method under a dialogue framework provided by the embodiments of the present invention, a machine learning model is used to identify the user's intention whenever the rule template cannot do so effectively. Rule matching takes effect immediately, which overcomes the problem that a machine learning model needs training time and cannot take effect in real time. The method identifies user intentions accurately, has a low requirement on the quantity of training samples, and promotes the development of artificial intelligence devices such as question answering systems and dialogue robots. By fully combining the advantages of rule templates and machine learning models, it significantly improves recognition accuracy even when little corpus is available, effectively raising the intelligence of dialogue robots and accelerating their production.
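The dispatch logic of steps S100 to S400 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the rule set, the threshold value, and the stand-in fallback classifier are all invented, and slot matching is reduced to keyword lookup.

```python
# Try the rule template first; fall back to a (stand-in) pre-trained model
# when no rule monomer exceeds the threshold.

def rule_match(corpus_tokens, rule_monomers):
    """Return (best_label, best_score) over all rule monomers (S100)."""
    best_label, best_score = None, 0.0
    for monomer in rule_monomers:
        score = sum(w for word, w in monomer["slots"] if word in corpus_tokens)
        if score > best_score:
            best_label, best_score = monomer["label"], score
    return best_label, best_score

def fallback_model(corpus_tokens):
    # Stand-in for the pre-trained intention assessment model.
    return "label_other"

def recognize_intent(corpus_tokens, rule_monomers, threshold=0.8):
    label, score = rule_match(corpus_tokens, rule_monomers)
    if score > threshold:                     # S200/S300: rule label wins
        return label
    return fallback_model(corpus_tokens)      # S400: model fallback

rules = [{"label": "query_balance", "slots": [("check", 0.3), ("balance", 0.7)]}]
print(recognize_intent(["check", "balance"], rules))   # rule fires: 1.0 > 0.8
print(recognize_intent(["hello"], rules))              # falls back to the model
```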
In an alternative embodiment, step S100 may include the following, referring to Fig. 4:
Step S110: parse the corpus to be identified to obtain multiple feature words to be matched and their parts of speech.
For example, if the corpus to be identified is "I want to transfer 500 yuan", then "transfer" (verb) and "500 yuan" (numeral-quantifier) can be extracted.
Parts of speech include types such as quantifier, adjective, verb, pronoun, adverb, time word, noun, and proper noun.
Step S120: obtain relevant rule monomers from the rule template according to the multiple feature words to be matched.
The rule template contains multiple rule monomers; each rule monomer consists of multiple word slots, and each word slot either specifies a word or specifies the part of speech of the word it can hold.
For each of the feature words to be matched, every rule monomer in the rule template is traversed, and any rule monomer whose word slots contain that feature word is taken as a relevant rule monomer; the relevant rule monomers obtained by traversing with all the feature words are combined into a group. During the traversal, a feature word is compared not only with the keyword in a rule monomer but also with the synonyms in the thesaurus associated with that keyword, so that relevant rule monomers are obtained more completely.
If the traversal with the feature words to be matched yields no rule monomer containing any of them, the matching degree of the feature words to be matched is 0.
Step S130: calculate the matching degree between the corpus to be identified and the relevant rule monomers according to the multiple feature words to be matched and their parts of speech.
Specifically, the feature words to be matched and their parts of speech are matched against the keywords and parts of speech of the word slots in the relevant rule monomers.
In an alternative embodiment, step S110 may include the following, referring to Fig. 5:
Step S111: segment the corpus to be identified to obtain multiple feature words;
Step S112: remove the stop words from the multiple feature words to obtain multiple feature words to be matched;
Step S113: annotate the parts of speech of the multiple feature words to be matched, obtaining the feature words to be matched together with their parts of speech.
The work of segmenting the corpus to be identified, removing stop words, and annotating parts of speech can be done with open-source text processing software such as jieba.
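Steps S111 to S113 can be sketched as below. In practice a toolkit such as jieba would perform the segmentation and part-of-speech tagging; to keep the sketch self-contained, a toy whitespace tokenizer, an invented stop word list, and a hand-made POS dictionary stand in for it.

```python
# Minimal stand-in for the parse step: segment, drop stop words, tag POS.
STOP_WORDS = {"i", "to", "the", "want"}
POS_DICT = {"transfer": "verb", "500": "numeral", "yuan": "quantifier",
            "check": "verb", "balance": "noun"}

def parse(corpus):
    tokens = corpus.lower().split()                       # S111: segmentation
    tokens = [t for t in tokens if t not in STOP_WORDS]   # S112: stop words
    return [(t, POS_DICT.get(t, "unknown")) for t in tokens]  # S113: POS tags

print(parse("I want to transfer 500 yuan"))
# [('transfer', 'verb'), ('500', 'numeral'), ('yuan', 'quantifier')]
```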
In an alternative embodiment, step S130 may include the following, referring to Fig. 6:
Step S131: match the multiple feature words to be matched and their parts of speech, by word or by part of speech, against the word slots at the corresponding positions of a relevant rule monomer;
Step S132: sum the weight values of the successfully matched word slots to obtain the matching degree between the corpus to be identified and the relevant rule monomer.
The weight value of each word slot is defined in the rule monomer.
Specifically, the weight of each word slot can be configured manually when the rule template is set up. Alternatively, word statistics can be computed over existing positive and negative sample sets of a corpus: the difference between the occurrence frequencies of a word or part of speech in the two sets, divided by its total number of occurrences, is taken as the weight of that word.
In the following, the calculation of the matching degree is illustrated with an example.
For example, take the rule monomer "[adjective] + ICBC", where the weight of the adjective word slot is 0.2 and the weight of the ICBC word slot is 0.7. The matching degree between the corpus to be identified "smart ICBC" and the rule is 0.2 + 0.7 = 0.9, while the matching degree between the corpus to be identified "developing Alibaba" and the rule is 0.2 + 0 = 0.2.
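The weighted word-slot accumulation (S131/S132) and the corpus-statistics weight rule can be sketched as follows, reproducing the worked example above. The slot matchers and the tiny positive/negative corpora are illustrative assumptions.

```python
# Matching degree: sum the weights of the word slots that match by word or POS.
def matching_degree(features, slots):
    """features: list of (word, pos); slots: list of (matcher, weight)."""
    score = 0.0
    for matcher, weight in slots:
        if any(matcher(word, pos) for word, pos in features):
            score += weight
    return score

slots = [
    (lambda w, p: p == "adjective", 0.2),   # slot A: any adjective
    (lambda w, p: w == "ICBC", 0.7),        # slot B: the keyword "ICBC"
]

print(matching_degree([("smart", "adjective"), ("ICBC", "noun")], slots))       # ~0.9
print(matching_degree([("developing", "adjective"), ("Alibaba", "noun")], slots))  # 0.2

# Corpus-statistics slot weight: (positive-set frequency - negative-set
# frequency) divided by the total number of occurrences, as in the text.
def slot_weight(word, positive, negative):
    pos_n = sum(s.split().count(word) for s in positive)
    neg_n = sum(s.split().count(word) for s in negative)
    total = pos_n + neg_n
    return (pos_n - neg_n) / total if total else 0.0
```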
Fig. 7 is a second flow diagram of the intention recognition method under a dialogue framework in an embodiment of the present invention. Referring to Fig. 7, on the basis of the steps shown in Fig. 3, the intention recognition method under a dialogue framework may further include:
Step S10: build the intention assessment model.
The model can be an encoder-decoder framework model, which includes a coding layer. Encoding converts the input sequence into a vector of fixed length; decoding converts that fixed-length vector into the output sequence. During the training of an encoder-decoder model, a semantic encoding with good representation and generalization ability can be obtained. The Encoder-Decoder framework is a model framework in deep learning; models of this framework include but are not limited to sequence-to-sequence (Sequence to Sequence, abbreviated Seq2Seq) models. The Encoder and Decoder layers of a Seq2Seq model can be implemented with Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) algorithms, and the Encoder-Decoder layers can also be implemented with the Transformer algorithm.
For example, the Seq2Seq model includes a Word embedding layer, an Encoder layer, and a Softmax layer. The question contained in each piece of training sample data, such as "Hello, may I ask whether you are XXX", is input to the Word embedding layer, which converts each word of the question into a term vector of fixed length and outputs it to the Encoder layer. The Encoder layer processes the input term vectors and outputs a state variable C to the Softmax layer, where C serves as the initial value of the Softmax layer. After training, the Softmax layer can output the label corresponding to the question; for example, for the question "Hello, may I ask whether you are XXX" it outputs the corresponding label (e.g., "label_ask_whether_me").
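The embedding → Encoder → Softmax pipeline above can be sketched in miniature. This is a toy stand-in, not the patent's model: mean pooling replaces the LSTM/GRU/Transformer encoder, and the vocabulary, dimensions, labels, and random weights are all invented.

```python
# Toy pipeline: Word embedding -> "Encoder" (mean pooling into state C)
# -> Softmax over intent labels.
import math
import random

random.seed(0)
VOCAB = ["hello", "are", "you", "xxx", "balance"]
DIM = 4
LABELS = ["label_ask_identity", "label_query_balance"]
EMB = {w: [random.uniform(-1, 1) for _ in range(DIM)] for w in VOCAB}
W = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in LABELS]

def encode(tokens):
    """'Encoder layer': mean-pool word vectors into a fixed-length state C."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    return [e / sum(exps) for e in exps]

def classify(tokens):
    c = encode(tokens)                                   # state variable C
    logits = [sum(wi * ci for wi, ci in zip(row, c)) for row in W]
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs

label, probs = classify(["hello", "are", "you", "xxx"])
print(label, [round(p, 3) for p in probs])
```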
Step S20: train the intention assessment model with unlabeled corpus samples to obtain the pre-trained intention assessment model.
To address the heavy workload of manual annotation in machine learning model training, a scheme that trains the model with only a small number of annotated samples is designed, with the following process:
Referring to Fig. 8, the training process may include the following steps:
Step S21: cluster the unlabeled corpus samples to obtain unlabeled corpus samples of preset categories.
To obtain the unlabeled corpus samples, saved customer service recordings can be transcribed offline into text using Automatic Speech Recognition (ASR), yielding the original corpus. The original corpus is then proofread manually by dialogue scenario to obtain the unlabeled corpus samples; the proofreading includes but is not limited to error correction and sentence alignment. Each piece of corpus data in the resulting unlabeled corpus samples contains one question and one answer. For example: Question: Hello, may I ask what the bank's housing loan interest rate is? Answer: Hello, the current rate is 5.6%. Or: Question: What is my current credit card limit? Answer: Hello, your current limit is 50,000 RMB.
The clustering algorithm can be LDA (Latent Dirichlet Allocation) or K-means. Clustering requires the number of preset categories, but the labels corresponding to the preset categories are not yet known.
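Step S21 can be sketched as below. The text names LDA or K-means; this is a tiny pure-Python K-means over bag-of-words vectors, with invented sample questions and naive initialization, standing in for a real clustering library.

```python
# Cluster unlabeled corpus samples into a preset number of categories (k).
def bow(text, vocab):
    return [text.split().count(w) for w in vocab]

def kmeans(points, k, iters=10):
    centers = points[:k]                      # naive init: first k points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        assign = []
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            i = d.index(min(d))
            groups[i].append(p)
            assign.append(i)
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                   for g, c in zip(groups, centers)]
    return assign

corpus = ["loan interest rate", "loan rate today",
          "credit card amount", "card amount now"]
vocab = sorted({w for s in corpus for w in s.split()})
labels = kmeans([bow(s, vocab) for s in corpus], k=2)
print(labels)   # the two loan questions and the two card questions separate
```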
Step S22: sample the unlabeled corpus samples of each preset category to obtain the original sample.
That is, a certain number of corpus samples are drawn from the unlabeled corpus samples of each preset category to obtain the original sample. The ratio or quantity of the sampling is configured according to actual needs and is not limited by the embodiments of the present invention.
The small sample obtained after sampling is annotated manually according to the preset categories, yielding the label corresponding to each piece of corpus data and thus the annotated original sample.
Step S23: train the intention assessment model with the annotated original sample to obtain an initial intention assessment model.
The model may include a coding (Encoder) layer and a classification layer; the output of the Encoder layer is the input of the classification layer, and the classification layer can use the Softmax algorithm. The question contained in each piece of training sample data serves as the input of the model, and the label contained in that piece of training sample data serves as the output of the model; the parameters of the Encoder layer are kept constant while the model is trained. Training only the parameters of the classification layer in this way both exploits the relationships among the unlabeled corpus samples and trains the classification layer in a supervised manner, completing the text classification task and yielding a model with higher generalization from fewer training samples.
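The frozen-encoder training of step S23 can be sketched as follows. The "frozen encoder" here is a fixed, deterministic featurizer standing in for the pre-trained Encoder layer, and the classification head is a logistic-regression layer trained by gradient descent; the data, dimensions, and learning rate are invented.

```python
# Train only the classification head; the encoder's parameters never change.
import math

def frozen_encoder(text, dim=8):
    """Stand-in for the pre-trained Encoder: fixed, never updated."""
    v = [0.0] * dim
    for tok in text.split():
        v[sum(ord(c) for c in tok) % dim] += 1.0
    return v

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_head(data, dim=8, lr=0.5, epochs=200):
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for text, y in data:
            x = frozen_encoder(text, dim)           # encoder stays frozen
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y                               # gradient of the log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

data = [("check my balance", 0), ("balance please", 0),
        ("talk to human agent", 1), ("human please", 1)]
w, b = train_head(data)
preds = [sigmoid(sum(wi * xi for wi, xi in zip(w, frozen_encoder(t))) + b) > 0.5
         for t, _ in data]
print(preds)   # matches the labels: [False, False, True, True]
```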
Step S24: input the unlabeled corpus samples remaining after sampling into the initial intention assessment model to obtain a label for each remaining unlabeled corpus sample.
Specifically, after the initial model is obtained, it is used to annotate the remaining unlabeled corpus samples: the question contained in each remaining piece of corpus data is fed to the initial model as input, yielding the label corresponding to that piece of corpus data. The labels of the remaining pieces of corpus data are then corrected manually, and the corrected data serve as supplementary training samples; each piece of supplementary training sample data contains one question and the corresponding label.
Step S25: further train the initial intention assessment model with the corrected remaining unlabeled corpus samples to obtain the pre-trained intention assessment model.
This scheme solves the problem of scarce corpus in the early stage of a project; as the number of human-machine dialogues grows after the project goes online, the newly added data continuously flow into the model for self-learning and continuous optimization.
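Steps S24 and S25 amount to a self-training loop, which can be sketched as below. The "model" here is a nearest-centroid classifier over toy bag-of-words vectors, manual correction is omitted, and all data are invented.

```python
# Pseudo-label the remaining unlabeled samples with the initial model,
# then retrain on the enlarged training set.
def featurize(text, vocab):
    return [text.split().count(w) for w in vocab]

def centroids(samples, vocab):
    sums, counts = {}, {}
    for text, label in samples:
        v = featurize(text, vocab)
        acc = sums.setdefault(label, [0.0] * len(vocab))
        sums[label] = [a + b for a, b in zip(acc, v)]
        counts[label] = counts.get(label, 0) + 1
    return {l: [x / counts[l] for x in s] for l, s in sums.items()}

def predict(text, cents, vocab):
    v = featurize(text, vocab)
    return min(cents, key=lambda l: sum((a - b) ** 2 for a, b in zip(cents[l], v)))

labeled = [("check balance", "balance"), ("transfer money", "transfer")]
unlabeled = ["balance now please", "money transfer today"]
vocab = sorted({w for t, _ in labeled for w in t.split()}
               | {w for t in unlabeled for w in t.split()})

cents = centroids(labeled, vocab)                             # initial model (S23)
pseudo = [(t, predict(t, cents, vocab)) for t in unlabeled]   # S24: pseudo-labels
cents = centroids(labeled + pseudo, vocab)                    # S25: retrain
print(pseudo)
```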
In an alternative embodiment, step S20 may further include the following steps, referring to Fig. 9:
Step S26: obtain test corpora with known labels;
Step S27: test the pre-trained intention assessment model with the test corpora of known labels, taking the output of the model as the test result;
Step S28: based on the test result and the known labels, judge whether the pre-trained intention assessment model meets the preset requirement;
if so, execute step S29; if not, execute step S30.
Step S29: take the current model as the target model for intention assessment.
Step S30: optimize the current model and/or update the training sample set, then return to step S21.
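The evaluation gate of steps S26 to S30 can be sketched as follows; the model, the test corpora, and the 90% preset requirement are stand-ins for illustration.

```python
# Score the pre-trained model on labeled test corpora; keep it only if the
# accuracy clears the preset requirement, otherwise send it back for retraining.
def evaluate(model, test_set):
    hits = sum(1 for text, label in test_set if model(text) == label)
    return hits / len(test_set)

def toy_model(text):
    return "query_balance" if "balance" in text else "other"

tests = [("check balance", "query_balance"), ("hello there", "other"),
         ("balance please", "query_balance"), ("transfer money", "other")]

accuracy = evaluate(toy_model, tests)
meets_requirement = accuracy >= 0.9        # preset requirement, e.g. 90%
print(accuracy, meets_requirement)         # 1.0 True
```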
Based on the same inventive concept, the embodiments of the present application also provide an intention assessment device under a dialogue framework, which can be used to implement the method described in the above embodiments, as described in the following embodiments. Since the principle by which the intention assessment device under a dialogue framework solves the problem is similar to that of the above method, the implementation of the device may refer to the implementation of the method, and repeated parts will not be described again. As used below, the term "unit" or "module" may denote a combination of software and/or hardware implementing predetermined functions. Although the device described in the following embodiments is preferably realized in software, a realization in hardware or in a combination of software and hardware is also possible and contemplated.
Fig. 10 is a structural block diagram of the intention assessment device under a dialogue framework in an embodiment of the present invention. As shown in Fig. 10, the intention assessment device under a dialogue framework may include: a matching degree obtaining module 10, a matching judgment module 20, a first intention recognition module 30, and a second intention recognition module 40.
The matching degree obtaining module 10 obtains the matching degree between the corpus to be identified and the rule monomers in a preset rule template; the preset rule template includes a plurality of rule monomers, each with a corresponding label.
It is worth noting that, whether the customer service is human or a dialogue robot, voice dialogues are produced in the course of serving customers; these voice dialogues can be transcribed into the corpus to be identified by speech recognition technology.
A rule monomer may include multiple word slots; each word slot can hold a word or define the part of speech of the word in the slot. In addition, a thesaurus is provided for the word in each slot to cover alternative expressions of the word.
For example, a rule monomer can have the structure word slot A + word slot B, where the part of speech of word slot A is adjective and the word in word slot B is ICBC; this rule monomer can be expressed as "[adjective] + ICBC".
Obtaining the matching degree between the corpus to be identified and a rule monomer amounts to obtaining the similarity between the corpus to be identified and the rule template. The rule template is set by maintenance personnel by summarizing the common corpora of the application scenario.
The matching judgment module 20 judges whether any matching degree is greater than the preset threshold.
Here the matching degree quantifies the similarity between the corpus to be identified and each rule in the rule template: the higher the similarity, the larger the matching degree.
When the matching degree between the corpus to be identified and at least one rule monomer of the template exceeds the preset value, the similarity between the corpus and the rule template has reached the preset requirement; in that case, the rule template can be used to identify the intention of the corpus to be identified.
When the matching degrees between the corpus to be identified and all rule monomers are below the preset value, the similarity cannot reach the preset requirement; the rule template then cannot identify the intention of the corpus to be identified, and the pre-trained intention assessment model is needed instead.
Specifically, the preset value is chosen according to the practical application requirements; it may be, for example, any value in the range [0.4, 2], such as 0.5, 0.7, 0.8, 0.9, 1.2, or 1.5.
If there exists a matching degree greater than the preset threshold, the first intention recognition module 30 takes the label of the rule monomer with the maximum matching degree as the intention assessment result.
That is: the label of the rule monomer most similar to the corpus to be identified is selected as the intention assessment result.
For example: suppose the corpus to be identified is "I want to check my balance" and the rule monomer is "[verb] + balance", with corresponding label "query balance". The corpus and the rule monomer are highly similar, so the intention assessment result for this corpus is "query balance".
If no matching degree is greater than the preset threshold, the second intention recognition module 40 inputs the corpus to be identified into the pre-trained intention assessment model and takes the output of the pre-trained intention assessment model as the intention assessment result.
The pre-trained intention assessment model is a machine learning or deep learning model, such as a text classification model, obtained by training on unlabeled corpus samples.
The above technical solution shows that, in the intention assessment device under a dialogue framework provided by the embodiments of the present invention, a machine learning model is used to identify the user's intention whenever the rule template cannot do so effectively. Rule matching takes effect immediately, which overcomes the problem that a machine learning model needs training time and cannot take effect in real time. The device identifies user intentions accurately, has a low requirement on the quantity of training samples, and promotes the development of artificial intelligence devices such as question answering systems and dialogue robots. By fully combining the advantages of rule templates and machine learning models, it significantly improves recognition accuracy even when little corpus is available, effectively raising the intelligence of dialogue robots and accelerating their production.
In an alternative embodiment, the matching degree obtaining module includes: a parsing unit, a rule filtering unit, and a matching degree computing unit.
The parsing unit parses the corpus to be identified to obtain multiple feature words to be matched and their parts of speech.
For example, if the corpus to be identified is "I want to transfer 500 yuan", then "transfer" (verb) and "500 yuan" (numeral-quantifier) can be extracted.
Parts of speech include types such as quantifier, adjective, verb, pronoun, adverb, time word, noun, and proper noun.
The rule filtering unit obtains relevant rule monomers from the rule template according to the multiple feature words to be matched.
The rule template contains multiple rule monomers; each rule monomer consists of multiple word slots, and each word slot either specifies a word or specifies the part of speech of the word it can hold.
For each of the feature words to be matched, every rule monomer in the rule template is traversed, and any rule monomer whose word slots contain that feature word is taken as a relevant rule monomer; the relevant rule monomers obtained by traversing with all the feature words are combined into a group. During the traversal, a feature word is compared not only with the keyword in a rule monomer but also with the synonyms in the thesaurus associated with that keyword, so that relevant rule monomers are obtained more completely.
If the traversal with the feature words to be matched yields no rule monomer containing any of them, the matching degree of the feature words to be matched is 0.
The matching degree computing unit calculates the matching degree between the corpus to be identified and the relevant rule monomers according to the multiple feature words to be matched and their parts of speech.
Specifically, the feature words to be matched and their parts of speech are matched against the keywords and parts of speech of the word slots in the relevant rule monomers.
In an alternative embodiment, the parsing unit includes: a segmentation subunit, a stop word removal subunit, and a part-of-speech annotation subunit.
The segmentation subunit segments the corpus to be identified to obtain multiple feature words;
the stop word removal subunit removes the stop words from the multiple feature words to obtain multiple feature words to be matched;
the part-of-speech annotation subunit annotates the parts of speech of the multiple feature words to be matched, obtaining the feature words to be matched together with their parts of speech.
The work of segmenting the corpus to be identified, removing stop words, and annotating parts of speech can be done with open-source text processing software such as jieba.
In an alternative embodiment, the rule monomer consists of multiple word slots, each word slot being a keyword and its synonyms, or a part of speech.
The rule filtering unit includes a search subunit, which searches the rule template with a feature word to be matched and takes any rule monomer whose word slots contain that feature word as a relevant rule monomer.
In an alternative embodiment, each word slot has a corresponding weight value.
The matching degree computing unit includes: a matching subunit and a weight accumulation subunit.
The matching subunit matches the multiple feature words to be matched and their parts of speech, by word or by part of speech, against the word slots at the corresponding positions of a relevant rule monomer.
The weight accumulation subunit sums the weight values of the successfully matched word slots to obtain the matching degree between the corpus to be identified and the relevant rule monomer.
The weight value of each word slot is defined in the rule monomer.
Specifically, the weight of each word slot can be configured manually when the rule template is set up. Alternatively, word statistics can be computed over existing positive and negative sample sets of a corpus: the difference between the occurrence frequencies of a word or part of speech in the two sets, divided by its total number of occurrences, is taken as the weight of that word.
In the following, the calculation of the matching degree is illustrated with an example.
For example, take the rule monomer "[adjective] + ICBC", where the weight of the adjective word slot is 0.2 and the weight of the ICBC word slot is 0.7. The matching degree between the corpus to be identified "smart ICBC" and the rule is 0.2 + 0.7 = 0.9, while the matching degree between the corpus to be identified "developing Alibaba" and the rule is 0.2 + 0 = 0.2.
In an alternative embodiment, the intention assessment device under a dialogue framework further includes: a model construction module and a training module.
The model construction module builds the intention assessment model.
The model can be an encoder-decoder framework model, which includes a coding layer. Encoding converts the input sequence into a vector of fixed length; decoding converts that fixed-length vector into the output sequence. During the training of an encoder-decoder model, a semantic encoding with good representation and generalization ability can be obtained. The Encoder-Decoder framework is a model framework in deep learning; models of this framework include but are not limited to sequence-to-sequence (Sequence to Sequence, abbreviated Seq2Seq) models. The Encoder and Decoder layers of a Seq2Seq model can be implemented with Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) algorithms, and the Encoder-Decoder layers can also be implemented with the Transformer algorithm.
For example, the Seq2Seq model includes a Word embedding layer, an Encoder layer, and a Softmax layer. The question contained in each piece of training sample data, such as "Hello, may I ask whether you are XXX", is input to the Word embedding layer, which converts each word of the question into a term vector of fixed length and outputs it to the Encoder layer. The Encoder layer processes the input term vectors and outputs a state variable C to the Softmax layer, where C serves as the initial value of the Softmax layer. After training, the Softmax layer can output the label corresponding to the question; for example, for the question "Hello, may I ask whether you are XXX" it outputs the corresponding label (e.g., "label_ask_whether_me").
The training module trains the intention assessment model with unlabeled corpus samples to obtain the pre-trained intention assessment model.
To address the heavy workload of manual annotation in machine learning model training, in an alternative embodiment the training module includes: a clustering unit, a sampling unit, a first training unit, an annotation unit, and a second training unit; through the cooperation of these units, the heavy manual annotation workload during training is reduced.
The clustering unit clusters the unlabeled corpus samples to obtain unlabeled corpus samples of preset categories.
To obtain the unlabeled corpus samples, saved customer service recordings can be transcribed offline into text using Automatic Speech Recognition (ASR), yielding the original corpus. The original corpus is then proofread manually by dialogue scenario to obtain the unlabeled corpus samples; the proofreading includes but is not limited to error correction and sentence alignment. Each piece of corpus data in the resulting unlabeled corpus samples contains one question and one answer. For example: Question: Hello, may I ask what the bank's housing loan interest rate is? Answer: Hello, the current rate is 5.6%. Or: Question: What is my current credit card limit? Answer: Hello, your current limit is 50,000 RMB.
The clustering algorithm can be LDA (Latent Dirichlet Allocation) or K-means. Clustering requires the number of preset categories, but the labels corresponding to the preset categories are not yet known.
The sampling unit samples the unlabeled corpus samples of each preset category to obtain the original sample.
That is, a certain number of corpus samples are drawn from the unlabeled corpus samples of each preset category to obtain the original sample. The ratio or quantity of the sampling is configured according to actual needs and is not limited by the embodiments of the present invention.
The small sample obtained after sampling is annotated manually according to the preset categories, yielding the label corresponding to each piece of corpus data and thus the annotated original sample.
The first training unit trains the intention assessment model with the annotated original sample to obtain an initial intention assessment model.
The model may include a coding (Encoder) layer and a classification layer; the output of the Encoder layer is the input of the classification layer, and the classification layer can use the Softmax algorithm. The question contained in each piece of training sample data serves as the input of the model, and the label contained in that piece of training sample data serves as the output of the model; the parameters of the Encoder layer are kept constant while the model is trained. Training only the parameters of the classification layer in this way both exploits the relationships among the unlabeled corpus samples and trains the classification layer in a supervised manner, completing the text classification task and yielding a model with higher generalization from fewer training samples.
The annotation unit inputs the unlabeled corpus samples remaining after sampling into the initial intention assessment model to obtain a label for each remaining unlabeled corpus sample.
Specifically, after the initial model is obtained, it is used to annotate the remaining unlabeled corpus samples: the question contained in each remaining piece of corpus data is fed to the initial model as input, yielding the label corresponding to that piece of corpus data. The labels of the remaining pieces of corpus data are then corrected manually, and the corrected data serve as supplementary training samples; each piece of supplementary training sample data contains one question and the corresponding label.
The second training unit further trains the initial intention assessment model with the corrected remaining unlabeled corpus samples to obtain the pre-trained intention assessment model.
This scheme solves the problem of scarce corpus in the early stage of a project; as the number of human-machine dialogues grows after the project goes online, the newly added data continuously flow into the model for self-learning and continuous optimization.
In an alternative embodiment, the intention assessment device further includes: a test sample obtaining module, a test module, a test judgment module, a model output module, and a retraining module.
The test sample obtaining module obtains test corpora with known labels;
the test module tests the pre-trained intention assessment model with the test corpora of known labels, taking the output of the model as the test result;
the test judgment module judges, based on the test result and the known labels, whether the pre-trained intention assessment model meets the preset requirement;
if the pre-trained intention assessment model meets the preset requirement, the model output module takes the current model as the target model for intention assessment;
if the pre-trained intention assessment model does not meet the preset requirement, the retraining module optimizes the current model and/or re-performs model training with the updated training sample set.
The devices, modules, or units described in the above embodiments can be realized by a computer chip or entity, or by a product with certain functions. A typical realization device is an electronic device; specifically, the electronic device may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an electronic mail device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In a typical example, the electronic device specifically includes a memory, a processor, and a computer program stored in the memory and runnable on the processor, and the processor implements the following steps when executing the program:
obtaining the matching degree between a corpus to be identified and the rule units in a preset rule template, wherein the preset rule template includes a plurality of rule units and every rule unit corresponds to a label;
judging whether there is a matching degree greater than a preset threshold;
if so, taking the label of the rule unit with the maximum matching degree as the intention recognition result;
if not, inputting the corpus to be identified into a pre-trained intention recognition model, and taking the output of the pre-trained intention recognition model as the intention recognition result.
As can be seen from the above description, the electronic device provided in an embodiment of the present invention can be used for intention recognition under a dialogue framework. By falling back to a machine learning model when the rule template cannot effectively identify the user's intention, it can accurately identify the user's intention with a lower requirement on the quantity of training samples, promoting the development of artificial intelligence devices such as question answering systems and dialogue robots.
Referring now to Figure 11, it shows a schematic structural diagram of an electronic device 600 suitable for implementing the embodiments of the present application.
As shown in Figure 11, the electronic device 600 includes a central processing unit (CPU) 601, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores the various programs and data required for the operation of the device 600. The CPU 601, the ROM 602 and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The I/O interface 605 is connected to the following components: an input section 606 including a keyboard, a mouse and the like; an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD) and the like, as well as a loudspeaker and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 executes communication processing via a network such as the Internet. A driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the driver 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as a computer software program. For example, an embodiment of the present invention includes a computer-readable storage medium on which a computer program is stored, and the computer program implements the following steps when executed by a processor:
obtaining the matching degree between a corpus to be identified and the rule units in a preset rule template, wherein the preset rule template includes a plurality of rule units and every rule unit corresponds to a label;
judging whether there is a matching degree greater than a preset threshold;
if so, taking the label of the rule unit with the maximum matching degree as the intention recognition result;
if not, inputting the corpus to be identified into a pre-trained intention recognition model, and taking the output of the pre-trained intention recognition model as the intention recognition result.
As can be seen from the above description, the computer-readable storage medium provided in an embodiment of the present invention can be used for intention recognition under a dialogue framework. By falling back to a machine learning model when the rule template cannot effectively identify the user's intention, it can accurately identify the user's intention with a lower requirement on the quantity of training samples, promoting the development of artificial intelligence devices such as question answering systems and dialogue robots.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
For convenience of description, the above apparatus is described by dividing its functions into various units. Of course, when implementing the present application, the functions of the units may be realized in one or more pieces of software and/or hardware.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present invention. It should be understood that every flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be realized by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can guide a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured article including an instruction apparatus, and the instruction apparatus realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thus provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
It should also be noted that the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or also includes elements inherent to such a process, method, commodity or device. In the absence of further restrictions, an element defined by the sentence "including a ..." does not exclude the existence of other identical elements in the process, method, commodity or device that includes the element.
Those skilled in the art will understand that the embodiments of the present application may be provided as a method, a system or a computer program product. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage and the like) containing computer-usable program code.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform specific tasks or implement specific abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are executed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in local and remote computer storage media including storage devices.
All the embodiments in this specification are described in a progressive manner; the same and similar parts between the embodiments may refer to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is basically similar to the method embodiment, its description is relatively simple, and for the relevant parts refer to the explanation of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. For those skilled in the art, various changes and variations are possible in the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application should be included within the scope of the claims of this application.
Claims (20)
1. An intention recognition method under a dialogue framework, characterized by comprising:
obtaining the matching degree between a corpus to be identified and the rule units in a preset rule template, wherein the preset rule template comprises a plurality of rule units and every rule unit corresponds to a label;
judging whether there is a matching degree greater than a preset threshold;
if so, taking the label of the rule unit with the maximum matching degree as the intention recognition result;
if not, inputting the corpus to be identified into a pre-trained intention recognition model, and taking the output of the pre-trained intention recognition model as the intention recognition result.
2. The intention recognition method under a dialogue framework according to claim 1, characterized in that the obtaining of the matching degree between the corpus to be identified and the rule units in the preset rule template comprises:
parsing the corpus to be identified to obtain a plurality of feature words to be matched and their parts of speech;
obtaining relevant rule units from the rule template according to the plurality of feature words to be matched;
calculating the matching degree between the corpus to be identified and the relevant rule units according to the plurality of feature words to be matched and their parts of speech.
3. The intention recognition method under a dialogue framework according to claim 2, characterized in that the parsing of the corpus to be identified to obtain a plurality of feature words to be matched and their parts of speech comprises:
segmenting the corpus to be identified to obtain a plurality of feature words;
removing the stop words among the plurality of feature words to obtain a plurality of feature words to be matched;
tagging the parts of speech of the plurality of feature words to be matched to obtain the plurality of feature words to be matched and their parts of speech.
4. The intention recognition method under a dialogue framework according to claim 2, characterized in that a rule unit is composed of a plurality of word slots, and each word slot gives the words in the slot or the part of speech of the words in the slot;
the obtaining of relevant rule units from the rule template according to the plurality of feature words to be matched comprises:
searching the rule template according to each feature word to be matched, and taking the rule units whose word slots contain the feature word to be matched as relevant rule units.
5. The intention recognition method under a dialogue framework according to claim 4, characterized in that every word slot corresponds to a weighted value;
the calculating of the matching degree between the corpus to be identified and the relevant rule units according to the plurality of feature words to be matched and their parts of speech comprises:
matching the plurality of feature words to be matched and their parts of speech, by word or by part of speech, against the word slots at the corresponding positions of a relevant rule unit;
accumulating the weighted values of the successfully matched word slots to obtain the matching degree between the corpus to be identified and the relevant rule unit.
6. The intention recognition method under a dialogue framework according to claim 1, characterized by further comprising:
constructing an intention recognition model;
training the intention recognition model with unlabeled corpus samples to obtain the pre-trained intention recognition model.
7. The intention recognition method under a dialogue framework according to claim 6, characterized in that the training of the intention recognition model with unlabeled corpus samples to obtain the pre-trained intention recognition model comprises:
clustering the unlabeled corpus samples to obtain unlabeled corpus samples of preset categories;
sampling the unlabeled corpus samples of each preset category to obtain initial samples;
training the intention recognition model with the annotated initial samples to obtain an initial intention recognition model;
inputting the unlabeled corpus samples remaining after the sampling into the initial intention recognition model to obtain the label of every remaining unlabeled corpus sample;
further training the initial intention recognition model with the remaining unlabeled corpus samples whose labels have been corrected, to obtain the pre-trained intention recognition model.
8. The intention recognition method under a dialogue framework according to claim 6, characterized by further comprising:
obtaining test corpus with known labels;
testing the pre-trained intention recognition model with the test corpus with the known labels, and taking the output of the model as the test result;
judging, based on the test result and the known labels, whether the pre-trained intention recognition model meets a preset requirement;
if so, taking the current model as the target model for intention recognition.
9. The intention recognition method under a dialogue framework according to claim 8, characterized by further comprising:
if the current model does not meet the preset requirement, optimizing the current model and/or retraining the model with an updated training sample set.
10. An intention recognition apparatus under a dialogue framework, characterized by comprising:
a matching degree acquisition module, which obtains the matching degree between a corpus to be identified and the rule units in a preset rule template, wherein the preset rule template comprises a plurality of rule units and every rule unit corresponds to a label;
a matching judgment module, which judges whether there is a matching degree greater than a preset threshold;
a first intention recognition module, which, if there is a matching degree greater than the preset threshold, takes the label of the rule unit with the maximum matching degree as the intention recognition result;
a second intention recognition module, which, if there is no matching degree greater than the preset threshold, inputs the corpus to be identified into a pre-trained intention recognition model and takes the output of the pre-trained intention recognition model as the intention recognition result.
11. The intention recognition apparatus under a dialogue framework according to claim 10, characterized in that the matching degree acquisition module comprises:
a parsing unit, which parses the corpus to be identified to obtain a plurality of feature words to be matched and their parts of speech;
a rule filtering unit, which obtains relevant rule units from the rule template according to the plurality of feature words to be matched;
a matching degree calculation unit, which calculates the matching degree between the corpus to be identified and the relevant rule units according to the plurality of feature words to be matched and their parts of speech.
12. The intention recognition apparatus under a dialogue framework according to claim 11, characterized in that the parsing unit comprises:
a segmentation subunit, which segments the corpus to be identified to obtain a plurality of feature words;
a stop-word removal subunit, which removes the stop words among the plurality of feature words to obtain a plurality of feature words to be matched;
a part-of-speech tagging subunit, which tags the parts of speech of the plurality of feature words to be matched to obtain the plurality of feature words to be matched and their parts of speech.
13. The intention recognition apparatus under a dialogue framework according to claim 11, characterized in that a rule unit is composed of a plurality of word slots, and each word slot is a keyword and its synonyms or a part of speech;
the rule filtering unit comprises:
a search subunit, which searches the rule template according to each feature word to be matched and takes the rule units whose word slots contain the feature word to be matched as relevant rule units.
14. The intention recognition apparatus under a dialogue framework according to claim 13, characterized in that every word slot corresponds to a weighted value;
the matching degree calculation unit comprises:
a matching subunit, which matches the plurality of feature words to be matched and their parts of speech, by word or by part of speech, against the word slots at the corresponding positions of a relevant rule unit;
a weighted value accumulation subunit, which accumulates the weighted values of the successfully matched word slots to obtain the matching degree between the corpus to be identified and the relevant rule unit.
15. The intention recognition apparatus under a dialogue framework according to claim 10, characterized by further comprising:
a model construction module, which constructs an intention recognition model;
a training module, which trains the intention recognition model with unlabeled corpus samples to obtain the pre-trained intention recognition model.
16. The intention recognition apparatus under a dialogue framework according to claim 15, characterized in that the training module comprises:
a clustering unit, which clusters the unlabeled corpus samples to obtain unlabeled corpus samples of preset categories;
a sampling unit, which samples the unlabeled corpus samples of each preset category to obtain initial samples;
a first training unit, which trains the intention recognition model with the annotated initial samples to obtain an initial intention recognition model;
a labeling unit, which inputs the unlabeled corpus samples remaining after the sampling into the initial intention recognition model to obtain the label of every remaining unlabeled corpus sample;
a second training unit, which further trains the initial intention recognition model with the remaining unlabeled corpus samples whose labels have been corrected, to obtain the pre-trained intention recognition model.
17. The intention recognition apparatus under a dialogue framework according to claim 15, characterized by further comprising:
a test sample acquisition module, which obtains test corpus with known labels;
a test module, which tests the pre-trained intention recognition model with the test corpus with the known labels, and takes the output of the model as the test result;
a test judgment module, which judges, based on the test result and the known labels, whether the pre-trained intention recognition model meets a preset requirement;
a model output module, which, if the pre-trained intention recognition model meets the preset requirement, takes the current model as the target model for intention recognition.
18. The intention recognition apparatus under a dialogue framework according to claim 17, characterized by further comprising:
a retraining module, which, if the pre-trained intention recognition model does not meet the preset requirement, optimizes the current model and/or retrains the model with an updated training sample set.
19. An electronic device comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, characterized in that the processor, when executing the program, realizes the steps of the intention recognition method under a dialogue framework according to any one of claims 1 to 9.
20. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, realizes the steps of the intention recognition method under a dialogue framework according to any one of claims 1 to 9.
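The position-wise word-slot matching described in claims 4, 5, 13 and 14 (each slot specifies allowed words or a part of speech plus a weight; weights of matched slots are accumulated) can be sketched as follows. The slot contents, weights and part-of-speech tags are hypothetical illustrations, not values from the patent:

```python
# A rule unit is a sequence of word slots; each slot specifies either the
# allowed words or an allowed part of speech, plus a weight (hypothetical).
rule_unit = [
    {"words": {"transfer", "send"}, "pos": None,   "weight": 0.5},
    {"words": None,                 "pos": "NOUN", "weight": 0.3},
    {"words": {"please"},           "pos": None,   "weight": 0.2},
]

def slot_matches(slot, word, pos):
    # A slot matches by word when it lists words, otherwise by part of speech.
    if slot["words"] is not None:
        return word in slot["words"]
    return pos == slot["pos"]

def matching_degree(features, rule_unit):
    """features: [(word, part_of_speech), ...] from the corpus to be identified
    after segmentation, stop-word removal and POS tagging (claims 2-3).
    Slots are compared position by position; weights of matched slots are
    accumulated into the matching degree."""
    degree = 0.0
    for slot, (word, pos) in zip(rule_unit, features):
        if slot_matches(slot, word, pos):
            degree += slot["weight"]
    return degree

features = [("send", "VERB"), ("money", "NOUN"), ("now", "ADV")]
print(matching_degree(features, rule_unit))  # weights of the two matched slots
```

This matching degree is what the method compares against the preset threshold when deciding between the rule path and the model fallback.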
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910666196.0A CN110377911B (en) | 2019-07-23 | 2019-07-23 | Method and device for identifying intention under dialog framework |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110377911A true CN110377911A (en) | 2019-10-25 |
CN110377911B CN110377911B (en) | 2023-07-21 |
Family
ID=68255091
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910666196.0A Active CN110377911B (en) | 2019-07-23 | 2019-07-23 | Method and device for identifying intention under dialog framework |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110377911B (en) |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110851581A (en) * | 2019-11-19 | 2020-02-28 | 东软集团股份有限公司 | Model parameter determination method, device, equipment and storage medium |
CN110890097A (en) * | 2019-11-21 | 2020-03-17 | 京东数字科技控股有限公司 | Voice processing method and device, computer storage medium and electronic equipment |
CN111027667A (en) * | 2019-12-06 | 2020-04-17 | 北京金山安全软件有限公司 | Intention category identification method and device |
CN111143561A (en) * | 2019-12-26 | 2020-05-12 | 北京百度网讯科技有限公司 | Intention recognition model training method and device and electronic equipment |
CN111143524A (en) * | 2019-12-06 | 2020-05-12 | 联想(北京)有限公司 | User intention determining method and electronic equipment |
CN111177351A (en) * | 2019-12-20 | 2020-05-19 | 北京淇瑀信息科技有限公司 | Method, device and system for acquiring natural language expression intention based on rule |
CN111259625A (en) * | 2020-01-16 | 2020-06-09 | 平安科技(深圳)有限公司 | Intention recognition method, device, equipment and computer readable storage medium |
CN111291156A (en) * | 2020-01-21 | 2020-06-16 | 同方知网(北京)技术有限公司 | Question-answer intention identification method based on knowledge graph |
CN111324727A (en) * | 2020-02-19 | 2020-06-23 | 百度在线网络技术(北京)有限公司 | User intention recognition method, device, equipment and readable storage medium |
CN111339770A (en) * | 2020-02-18 | 2020-06-26 | 百度在线网络技术(北京)有限公司 | Method and apparatus for outputting information |
CN111368045A (en) * | 2020-02-21 | 2020-07-03 | 平安科技(深圳)有限公司 | User intention identification method, device, equipment and computer readable storage medium |
CN111400466A (en) * | 2020-03-05 | 2020-07-10 | 中国工商银行股份有限公司 | Intelligent dialogue method and device based on reinforcement learning |
CN111523311A (en) * | 2020-04-21 | 2020-08-11 | 上海优扬新媒信息技术有限公司 | Search intention identification method and device |
CN111737436A (en) * | 2020-06-24 | 2020-10-02 | 网易(杭州)网络有限公司 | Corpus intention identification method and device, electronic equipment and storage medium |
CN111858900A (en) * | 2020-09-21 | 2020-10-30 | 杭州摸象大数据科技有限公司 | Method, device, equipment and storage medium for generating question semantic parsing rule template |
CN112035641A (en) * | 2020-08-31 | 2020-12-04 | 康键信息技术(深圳)有限公司 | Intention extraction model verification method and device, computer equipment and storage medium |
CN112069302A (en) * | 2020-09-15 | 2020-12-11 | 腾讯科技(深圳)有限公司 | Training method of conversation intention recognition model, conversation intention recognition method and device |
CN112131357A (en) * | 2020-08-21 | 2020-12-25 | 国网浙江省电力有限公司杭州供电公司 | User intention identification method and device based on intelligent dialogue model |
CN112149429A (en) * | 2020-10-21 | 2020-12-29 | 成都小美伴旅信息技术有限公司 | High-accuracy semantic understanding and identifying method based on word slot order model |
CN112395392A (en) * | 2020-11-27 | 2021-02-23 | 浪潮云信息技术股份公司 | Intention identification method and device and readable storage medium |
CN112417121A (en) * | 2020-11-20 | 2021-02-26 | 平安普惠企业管理有限公司 | Client intention recognition method and device, computer equipment and storage medium |
CN112597748A (en) * | 2020-12-18 | 2021-04-02 | 深圳赛安特技术服务有限公司 | Corpus generation method, apparatus, device and computer readable storage medium |
CN112784024A (en) * | 2021-01-11 | 2021-05-11 | 软通动力信息技术(集团)股份有限公司 | Man-machine conversation method, device, equipment and storage medium |
CN113065364A (en) * | 2021-03-29 | 2021-07-02 | 网易(杭州)网络有限公司 | Intention recognition method and device, electronic equipment and storage medium |
WO2021135534A1 (en) * | 2020-06-16 | 2021-07-08 | 平安科技(深圳)有限公司 | Speech recognition-based dialogue management method, apparatus, device and medium |
CN113381973A (en) * | 2021-04-26 | 2021-09-10 | 深圳市任子行科技开发有限公司 | Method, system and computer readable storage medium for identifying SSR flow |
CN113449089A (en) * | 2021-06-11 | 2021-09-28 | 车智互联(北京)科技有限公司 | Intent recognition method of query statement, question answering method and computing device |
CN113495489A (en) * | 2020-04-07 | 2021-10-12 | 深圳爱根斯通科技有限公司 | Automatic configuration method and device, electronic equipment and storage medium |
CN113807148A (en) * | 2020-06-16 | 2021-12-17 | 阿里巴巴集团控股有限公司 | Text recognition matching method and device and terminal equipment |
CN113887643A (en) * | 2021-10-12 | 2022-01-04 | 西安交通大学 | New dialogue intention recognition method based on pseudo label self-training and source domain retraining |
WO2022089546A1 (en) * | 2020-10-28 | 2022-05-05 | 华为云计算技术有限公司 | Label generation method and apparatus, and related device |
TWI768513B (en) * | 2020-10-20 | 2022-06-21 | 宏碁股份有限公司 | Artificial intelligence training system and artificial intelligence training method |
WO2022141875A1 (en) * | 2020-12-30 | 2022-07-07 | 平安科技(深圳)有限公司 | User intention recognition method and apparatus, device, and computer-readable storage medium |
CN115269809A (en) * | 2022-09-19 | 2022-11-01 | 支付宝(杭州)信息技术有限公司 | Method and device for training intention recognition model and method and device for recognizing intention |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107909101A (en) * | 2017-11-10 | 2018-04-13 | 清华大学 | Semi-supervised transfer learning character identifying method and system based on convolutional neural networks |
US20180365322A1 (en) * | 2017-06-20 | 2018-12-20 | Accenture Global Solutions Limited | Automatic extraction of a training corpus for a data classifier based on machine learning algorithms |
CN109063221A (en) * | 2018-11-02 | 2018-12-21 | 北京百度网讯科技有限公司 | Query intention recognition methods and device based on mixed strategy |
CN109241255A (en) * | 2018-08-20 | 2019-01-18 | 华中师范大学 | A kind of intension recognizing method based on deep learning |
CN109376847A (en) * | 2018-08-31 | 2019-02-22 | 深圳壹账通智能科技有限公司 | User's intension recognizing method, device, terminal and computer readable storage medium |
- 2019-07-23: application CN201910666196.0A filed in China; granted as CN110377911B (active)
Cited By (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110851581A (en) * | 2019-11-19 | 2020-02-28 | 东软集团股份有限公司 | Model parameter determination method, device, equipment and storage medium |
CN110890097A (en) * | 2019-11-21 | 2020-03-17 | 京东数字科技控股有限公司 | Voice processing method and device, computer storage medium and electronic equipment |
CN111027667A (en) * | 2019-12-06 | 2020-04-17 | 北京金山安全软件有限公司 | Intention category identification method and device |
CN111143524A (en) * | 2019-12-06 | 2020-05-12 | 联想(北京)有限公司 | User intention determining method and electronic equipment |
CN111027667B (en) * | 2019-12-06 | 2023-10-17 | 北京金山安全软件有限公司 | Method and device for identifying intention category |
CN111177351A (en) * | 2019-12-20 | 2020-05-19 | 北京淇瑀信息科技有限公司 | Method, device and system for acquiring natural language expression intention based on rule |
CN111143561A (en) * | 2019-12-26 | 2020-05-12 | 北京百度网讯科技有限公司 | Intention recognition model training method and device and electronic equipment |
CN111143561B (en) * | 2019-12-26 | 2023-04-07 | 北京百度网讯科技有限公司 | Intention recognition model training method and device and electronic equipment |
CN111259625A (en) * | 2020-01-16 | 2020-06-09 | 平安科技(深圳)有限公司 | Intention recognition method, device, equipment and computer readable storage medium |
CN111259625B (en) * | 2020-01-16 | 2023-06-27 | 平安科技(深圳)有限公司 | Intention recognition method, device, equipment and computer readable storage medium |
CN111291156A (en) * | 2020-01-21 | 2020-06-16 | 同方知网(北京)技术有限公司 | Question-answer intention identification method based on knowledge graph |
CN111291156B (en) * | 2020-01-21 | 2024-01-12 | 同方知网(北京)技术有限公司 | Knowledge graph-based question and answer intention recognition method |
CN111339770B (en) * | 2020-02-18 | 2023-07-21 | 百度在线网络技术(北京)有限公司 | Method and device for outputting information |
CN111339770A (en) * | 2020-02-18 | 2020-06-26 | 百度在线网络技术(北京)有限公司 | Method and apparatus for outputting information |
CN111324727B (en) * | 2020-02-19 | 2023-08-01 | 百度在线网络技术(北京)有限公司 | User intention recognition method, device, equipment and readable storage medium |
CN111324727A (en) * | 2020-02-19 | 2020-06-23 | 百度在线网络技术(北京)有限公司 | User intention recognition method, device, equipment and readable storage medium |
CN111368045A (en) * | 2020-02-21 | 2020-07-03 | 平安科技(深圳)有限公司 | User intention identification method, device, equipment and computer readable storage medium |
CN111368045B (en) * | 2020-02-21 | 2024-05-07 | 平安科技(深圳)有限公司 | User intention recognition method, device, equipment and computer readable storage medium |
CN111400466A (en) * | 2020-03-05 | 2020-07-10 | 中国工商银行股份有限公司 | Intelligent dialogue method and device based on reinforcement learning |
CN113495489A (en) * | 2020-04-07 | 2021-10-12 | 深圳爱根斯通科技有限公司 | Automatic configuration method and device, electronic equipment and storage medium |
CN111523311B (en) * | 2020-04-21 | 2023-10-03 | 度小满科技(北京)有限公司 | Search intention recognition method and device |
CN111523311A (en) * | 2020-04-21 | 2020-08-11 | 上海优扬新媒信息技术有限公司 | Search intention identification method and device |
CN113807148A (en) * | 2020-06-16 | 2021-12-17 | 阿里巴巴集团控股有限公司 | Text recognition matching method and device and terminal equipment |
WO2021135534A1 (en) * | 2020-06-16 | 2021-07-08 | 平安科技(深圳)有限公司 | Speech recognition-based dialogue management method, apparatus, device and medium |
CN111737436A (en) * | 2020-06-24 | 2020-10-02 | 网易(杭州)网络有限公司 | Corpus intention identification method and device, electronic equipment and storage medium |
CN112131357A (en) * | 2020-08-21 | 2020-12-25 | 国网浙江省电力有限公司杭州供电公司 | User intention identification method and device based on intelligent dialogue model |
CN112035641A (en) * | 2020-08-31 | 2020-12-04 | 康键信息技术(深圳)有限公司 | Intention extraction model verification method and device, computer equipment and storage medium |
CN112069302A (en) * | 2020-09-15 | 2020-12-11 | 腾讯科技(深圳)有限公司 | Training method of conversation intention recognition model, conversation intention recognition method and device |
CN112069302B (en) * | 2020-09-15 | 2024-03-08 | 腾讯科技(深圳)有限公司 | Training method of conversation intention recognition model, conversation intention recognition method and device |
CN111858900A (en) * | 2020-09-21 | 2020-10-30 | 杭州摸象大数据科技有限公司 | Method, device, equipment and storage medium for generating question semantic parsing rule template |
TWI768513B (en) * | 2020-10-20 | 2022-06-21 | 宏碁股份有限公司 | Artificial intelligence training system and artificial intelligence training method |
CN112149429A (en) * | 2020-10-21 | 2020-12-29 | 成都小美伴旅信息技术有限公司 | High-accuracy semantic understanding and identifying method based on word slot order model |
WO2022089546A1 (en) * | 2020-10-28 | 2022-05-05 | 华为云计算技术有限公司 | Label generation method and apparatus, and related device |
CN112417121A (en) * | 2020-11-20 | 2021-02-26 | 平安普惠企业管理有限公司 | Client intention recognition method and device, computer equipment and storage medium |
CN112395392A (en) * | 2020-11-27 | 2021-02-23 | 浪潮云信息技术股份公司 | Intention identification method and device and readable storage medium |
CN112597748B (en) * | 2020-12-18 | 2023-08-11 | 深圳赛安特技术服务有限公司 | Corpus generation method, corpus generation device, corpus generation equipment and computer-readable storage medium |
CN112597748A (en) * | 2020-12-18 | 2021-04-02 | 深圳赛安特技术服务有限公司 | Corpus generation method, apparatus, device and computer readable storage medium |
WO2022141875A1 (en) * | 2020-12-30 | 2022-07-07 | 平安科技(深圳)有限公司 | User intention recognition method and apparatus, device, and computer-readable storage medium |
CN112784024A (en) * | 2021-01-11 | 2021-05-11 | 软通动力信息技术(集团)股份有限公司 | Man-machine conversation method, device, equipment and storage medium |
CN112784024B (en) * | 2021-01-11 | 2023-10-31 | 软通动力信息技术(集团)股份有限公司 | Man-machine conversation method, device, equipment and storage medium |
CN113065364A (en) * | 2021-03-29 | 2021-07-02 | 网易(杭州)网络有限公司 | Intention recognition method and device, electronic equipment and storage medium |
CN113381973B (en) * | 2021-04-26 | 2023-02-28 | 深圳市任子行科技开发有限公司 | Method, system and computer readable storage medium for identifying SSR flow |
CN113381973A (en) * | 2021-04-26 | 2021-09-10 | 深圳市任子行科技开发有限公司 | Method, system and computer readable storage medium for identifying SSR flow |
CN113449089B (en) * | 2021-06-11 | 2023-12-01 | 车智互联(北京)科技有限公司 | Intent recognition method, question-answering method and computing device of query statement |
CN113449089A (en) * | 2021-06-11 | 2021-09-28 | 车智互联(北京)科技有限公司 | Intent recognition method of query statement, question answering method and computing device |
CN113887643A (en) * | 2021-10-12 | 2022-01-04 | 西安交通大学 | New dialogue intention recognition method based on pseudo label self-training and source domain retraining |
CN115269809A (en) * | 2022-09-19 | 2022-11-01 | 支付宝(杭州)信息技术有限公司 | Method and device for training intention recognition model and method and device for recognizing intention |
Also Published As
Publication number | Publication date |
---|---|
CN110377911B (en) | 2023-07-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110377911A (en) | Intention recognition method and device under a dialogue framework | |
CN106776936B (en) | Intelligent interaction method and system | |
CN109635117A (en) | A knowledge-graph-based user intention recognition method and device | |
CN109101620A (en) | Similarity calculating method, clustering method, device, storage medium and electronic equipment | |
CN110309514A (en) | A semantic recognition method and device | |
CN109408622A (en) | Sentence processing method and device, equipment and storage medium | |
US11966389B2 (en) | Natural language to structured query generation via paraphrasing | |
US20220100963A1 (en) | Event extraction from documents with co-reference | |
CN108416032A (en) | A text classification method, device and storage medium | |
CN110597966A (en) | Automatic question answering method and device | |
CN112966089A (en) | Question processing method, device, equipment, medium and product based on a knowledge base | |
CN110297909A (en) | A classification method and device for unlabeled corpus | |
CN112015896B (en) | Emotion classification method and device based on artificial intelligence | |
CN109684354A (en) | Data query method and apparatus | |
CN110210038A (en) | Core entity determination method and system, server and computer-readable medium | |
CN112989761A (en) | Text classification method and device | |
CN104077327B (en) | Core word importance recognition method and device, and search result ranking method and device | |
CN114782054A (en) | Customer service quality detection method based on deep learning algorithm and related equipment | |
CN110209561A (en) | Evaluating method and evaluating apparatus for dialogue platform | |
CN107305559A (en) | An application recommendation method and apparatus | |
CN110059172A (en) | Method and apparatus for recommending an answer based on natural language understanding | |
EP4222635A1 (en) | Lifecycle management for customized natural language processing | |
CN117291185A (en) | Task processing method, entity identification method and task processing data processing method | |
CN117076598A (en) | Semantic retrieval model fusion method and system based on self-adaptive weight | |
CN116756278A (en) | Machine question-answering method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||