CN110532400A

CN110532400A - Knowledge base maintenance method and device based on text classification prediction

Info

Publication number: CN110532400A
Application number: CN201910830001.1A
Authority: CN
Inventors: 李加庆; 沈春泽; 王景斌
Original assignee: Jiangsu Suning Bank Co Ltd
Current assignee: Jiangsu Suning Bank Co Ltd
Priority date: 2019-09-04
Filing date: 2019-09-04
Publication date: 2019-12-03

Abstract

The invention discloses a knowledge base maintenance method and device based on text classification prediction. The method first trains a knowledge point classification prediction model on a classified knowledge base corpus and then maintains the knowledge base based on that model: a text similarity algorithm recalls the standard questions closest to an input knowledge point, providing secondary confirmation and a means of associating answers. This makes the manual knowledge base maintenance process intelligent, helps customer service personnel update knowledge points accurately, and improves the efficiency and quality of knowledge base maintenance. In addition, by dynamically monitoring the knowledge base in real time, the classification model is retrained and updated in near real time and iterated continuously, which improves the recall and accuracy of the customer service robot as a whole and the overall user experience of the intelligent customer service robot.

Description

Knowledge base maintenance method and device based on text classification prediction

Technical field

The present invention relates to the field of artificial intelligence technology, and in particular to a knowledge base maintenance method and device based on text classification prediction.

Background art

Intelligent customer service robots are now widely used across industries: given a user's query, the robot looks up related questions and answers in a knowledge base. The quality of the knowledge base largely determines the effectiveness of the robot, and therefore the user experience of customer service.

The knowledge base maintenance system is an important part of an intelligent customer service system. Conventionally, customer service personnel maintain the data in the knowledge base system. To improve the effectiveness and accuracy of knowledge base retrieval, they need to classify the knowledge base data accurately and add as many similar-question variants of each standard question as possible, so as to improve the intelligent customer service system's ability to recognize the intent of user queries. A knowledge base for a vertical domain usually contains classes for multiple business scenarios, and each class contains standard questions for multiple business issues. So that user questions are understood more accurately, one standard question generally corresponds to several similar questions, which cover the same standard question in different phrasings. This hierarchy of "class -> knowledge point standard question -> similar question -> knowledge point answer" constitutes the logical structure of the knowledge base. Customer service personnel maintain the knowledge base data according to this logical structure, following updates to business scenarios and knowledge points.
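The hierarchy described above can be pictured as a small nested data structure. The following is a minimal sketch only: the names `StandardQuestion`, `KnowledgeBase`, `add_standard`, and `add_similar` are hypothetical, and the answer string is a placeholder, none of which come from the source.

```python
from dataclasses import dataclass, field


@dataclass
class StandardQuestion:
    """A knowledge point: one standard question, its answer, and its variants."""
    text: str
    answer: str
    similar: list[str] = field(default_factory=list)


@dataclass
class KnowledgeBase:
    """Maps each class label to the standard questions filed under it."""
    classes: dict[str, list[StandardQuestion]] = field(default_factory=dict)

    def add_standard(self, label: str, text: str, answer: str) -> StandardQuestion:
        # Add a new knowledge point standard question under a class.
        question = StandardQuestion(text, answer)
        self.classes.setdefault(label, []).append(question)
        return question

    def add_similar(self, label: str, standard_text: str, similar_text: str) -> StandardQuestion:
        # Attach a similar question to an existing standard question;
        # it inherits that standard question's class and answer.
        for question in self.classes.get(label, []):
            if question.text == standard_text:
                question.similar.append(similar_text)
                return question
        raise KeyError(standard_text)


kb = KnowledgeBase()
kb.add_standard("Personal insurance", "What life insurance products are there?", "(answer text)")
linked = kb.add_similar(
    "Personal insurance",
    "What life insurance products are there?",
    "Which kind of life insurance should I choose?",
)
```

Keeping the class label as the outer key mirrors the requirement that closely related knowledge points stay in the same class.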

Knowledge base maintenance generally includes adding new knowledge point standard questions, updating existing knowledge points, and adding similar questions to a knowledge point. In particular, adding a new standard question and supplementing a standard question with similar questions both require business personnel to assign a class. Closely related knowledge points should belong to the same class, which is necessary for recalling questions, so class assignment must remain consistent.

In practice, however, business personnel sometimes cannot judge accurately which class a new knowledge point or a new similar question belongs to. Searching the existing knowledge points for identical or similar ones in order to judge the class of a new knowledge point by hand is inconvenient. In particular, when different customer service personnel maintain the same knowledge base, classes easily become confused and disordered, which impairs the accurate recall of knowledge point answers and degrades the user experience of the customer service robot.

In view of this, a knowledge base maintenance method that solves the above problems has been developed.

Summary of the invention

The purpose of the present invention is to solve the above problems by providing a knowledge base maintenance method and device based on text classification prediction.

To achieve the above object, in a first aspect, the present invention provides a knowledge base maintenance method based on text classification prediction, comprising the following steps:

1) obtaining the knowledge point question text input by a user at the knowledge base management front-end interface;

2) performing string processing on the knowledge point question text input by the user in step 1) and representing the text as word vectors;

3) calling a pre-trained knowledge point classification prediction model to perform classification prediction on the word-vector representation from step 2), computing a score for each class, forming (class, score) pairs from the scores and class labels, and sorting the pairs in descending order of score to obtain a class list;

4) sending the class list from step 3) to the knowledge base management front-end interface so that the user can choose one class from it, and receiving from the front-end interface the class the user has chosen and confirmed;

5) computing the similarity between the knowledge point question input by the user and all standard questions in the knowledge base under the class from step 4); sending the several standard questions with the highest similarity to the knowledge base management front-end interface as reference standard questions for the input knowledge point question, so that the user either associates the input knowledge point question with the most similar standard question among them, or treats the input knowledge point question as a new knowledge point question and enters the corresponding answer; and receiving, from the knowledge base management front-end interface, either the user-confirmed association between the input knowledge point question and a standard question, or the input knowledge point question as a new knowledge point question together with its answer, and storing it in the knowledge base database.

Further, the string processing of the knowledge point question text input by the user in step 2) includes character cleaning, text error correction, business term normalization, and word segmentation.

Further, the knowledge point classification prediction model in step 3) is trained in advance by the following steps:

3a) pairing the standard questions of the knowledge base and their corresponding similar questions with their classification labels, and saving the pairs as a text file or as an iterable generator object;

3b) preprocessing all standard questions and corresponding similar questions from step 3a) to obtain a corpus data set;

3c) dividing the corpus data set from step 3b) into a training set and a test set, representing the knowledge point question texts in the training set as word vectors, training a neural network on them, and testing the trained neural network model, so as to build a knowledge point classification prediction model whose prediction accuracy meets a threshold;

3d) keeping real-time statistics on changes to the underlying knowledge base database, and returning to step 3a) to retrain and update the model when the number of changed classes, or the number of changed knowledge points within any one class, exceeds a set threshold.

Further, the preprocessing in step 3b) includes character cleaning, text error correction, business term normalization, and word segmentation.

Further, the neural network used in step 3c) is TextCNN or LSTM, and the word-vector representation uses a Word2Vec model.

Further, step 3d) specifically includes:

Real-time statistics are kept on knowledge point changes in the underlying knowledge base database. The total number of classes is denoted N, the number of changed classes is denoted M, the number of changed knowledge points in class i is denoted KNi, and the total number of knowledge points in class i before the change is denoted Ki, where 1≤i≤N. When M/N is greater than threshold A, or there exists an i for which KNi/Ki is greater than threshold B, the method returns to step 3a), pulls the knowledge base data, and retrains and updates the model.

Further, the text similarity in step 5) is computed by the following formula:

where x and y denote knowledge point texts, V_x and V_y denote the feature vectors of the texts, t_x and t_y denote the one-hot vectors of the classification labels of the corresponding texts, and θ ∈ (0,1) is a weight adjustment parameter.
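The formula itself is not reproduced in this text. One reading that is consistent with the symbol definitions above and with the worked example in the detailed description (0.7 × 0.5 + 0.3 × 1 with θ = 0.3) is a weighted sum of a feature-vector similarity and a label-agreement term. The use of cosine similarity for the feature term is an assumption, not something the source states:

```latex
\mathrm{Sim}(x, y) = (1 - \theta)\,\cos(V_x, V_y) + \theta\, t_x^{\top} t_y,
\qquad
\cos(V_x, V_y) = \frac{V_x \cdot V_y}{\lVert V_x \rVert \, \lVert V_y \rVert}
```

Because t_x and t_y are one-hot vectors, the product t_x^T t_y equals 1 when the two texts carry the same class label and 0 otherwise, so θ controls how strongly a class match raises the score.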

In a second aspect, the present invention further provides a knowledge base maintenance device based on text classification prediction. The device includes a knowledge point classification prediction model training module and a knowledge base maintenance module. The knowledge base maintenance module maintains the knowledge base based on the knowledge point classification prediction model produced by the training module, and includes: a knowledge point question text acquisition module, a knowledge point question text processing module, a knowledge point classification prediction module, a class output and confirmation module, and a knowledge point answer prediction and class verification module;

The knowledge point classification prediction model training module trains the knowledge point classification prediction model and monitors the knowledge base in real time, so that the model is retrained and updated in near real time;

The knowledge point question text acquisition module obtains the knowledge point question text input by the user at the knowledge base management front-end interface and sends it to the knowledge point question text processing module;

The knowledge point question text processing module performs string processing on the knowledge point question text input by the user, represents the text as word vectors, and sends the word-vector representation to the knowledge point classification prediction module;

The knowledge point classification prediction module calls the pre-trained knowledge point classification prediction model to compute a score for each class from the word-vector representation produced by the text processing module, forms (class, score) pairs from the scores and class labels, and sorts them by score from largest to smallest to obtain a class list;

The class output and confirmation module sends the class list to the knowledge base management front-end interface so that the user can choose one class from it, and receives the user-confirmed class fed back by the front-end interface;

The knowledge point answer prediction and class verification module computes the similarity between the knowledge point question input by the user and all standard questions in the knowledge base under that class; sends the several standard questions with the highest similarity to the knowledge base management front-end interface as reference standard questions for the input knowledge point question, so that the user either associates the input knowledge point question with the most similar standard question among them, or treats the input knowledge point question as a new knowledge point question and enters the corresponding answer; and receives, from the knowledge base management front-end interface, either the user-confirmed association between the input knowledge point question and a standard question, or the input knowledge point question as a new knowledge point question together with its answer, and stores it in the knowledge base database.

Further, the knowledge point classification prediction model training module includes:

A knowledge base data integration module, which pairs the standard questions of the knowledge base and their corresponding similar questions with their classification labels and saves the pairs as a text file or as an iterable generator object;

A text corpus preprocessing module, which preprocesses all standard questions and corresponding similar questions in the knowledge base to obtain a corpus data set;

A model training and verification module, which divides the corpus data set into a training set and a test set, represents the knowledge point question texts in the training set as word vectors, trains a neural network on them, and tests the trained neural network model, so as to build a knowledge point classification prediction model whose prediction accuracy meets a threshold;

A near-real-time model update module, which keeps real-time statistics on changes to the underlying knowledge base database and, when the number of changed classes in the knowledge base exceeds a preset threshold, has the knowledge base data integration module, the text corpus preprocessing module, and the model training and verification module retrain and update the model.

Further, the neural network used by the model training and verification module is TextCNN or LSTM, and the word-vector representation uses a Word2Vec model.

The knowledge base maintenance method based on text classification prediction proposed by the present invention trains a knowledge point classification prediction model on a classified knowledge base corpus and maintains the knowledge base based on that model. A text similarity algorithm recalls the standard questions closest to the knowledge point, providing secondary confirmation and a means of associating answers. This makes the manual knowledge base maintenance process intelligent, helps customer service personnel update knowledge points accurately, and improves the efficiency and quality of knowledge base maintenance. In addition, by dynamically monitoring the knowledge base in real time, the classification model is retrained and updated in near real time and iterated continuously, which improves the recall and accuracy of the customer service robot as a whole and the overall user experience of the intelligent customer service robot.

Brief description of the drawings

Fig. 1 is a flow chart of the online training of the knowledge point classification prediction model provided by an embodiment of the present invention;

Fig. 2 is a structure diagram of the knowledge points in the knowledge base provided by an embodiment of the present invention;

Fig. 3 is a flow chart of the knowledge base maintenance method based on the knowledge point classification prediction model provided by an embodiment of the present invention;

Fig. 4 is a structure diagram of the knowledge base maintenance device based on the knowledge point classification model provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. Note that the drawings are schematic and not drawn strictly to scale; parts may be enlarged or reduced for ease of description, and some conventional structures may be omitted.

The knowledge base maintenance method based on text classification prediction proposed by the present invention comprises step 1, training the knowledge point classification prediction model, and step 2, maintaining the knowledge base based on that model.

Take an open-source insurance customer service corpus as an example. An insurance-domain knowledge base built from part of that corpus includes insurance business classes such as personal insurance, health insurance, car insurance, medical insurance, retirement insurance, long-term care insurance, and annuities.

Step 1: train the knowledge point classification prediction model, as shown in Fig. 1.

Step 1-1: corpus integration.

Fig. 2 shows one logical association form of the knowledge point structure in the knowledge base: one standard question can correspond to multiple similar questions. The standard questions of the insurance-domain knowledge base and their corresponding similar questions are paired with their classification labels as (knowledge point question, label) data pairs and saved as a text file in storage, or as an iterable generator object. The data items have the following format:

Serial number	Knowledge point question	Class label
1	Q1: What life insurance products are there? (standard question)	Personal insurance
1-1	Which kind of life insurance should I choose? (similar question)	Personal insurance
1-2	Products recommended by life insurance agents (similar question)	Personal insurance
……	……	……
X	Q2: What is the relationship between driving record and car insurance? (standard question)	Car insurance
X-1	Does a personal driving record affect car insurance? (similar question)	Car insurance
Y	Q3: I want to buy car insurance; what options are there? (standard question)	Car insurance
……	……	……
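The pairing of questions with class labels can be sketched as a Python generator. This is a minimal illustration, assuming the knowledge base is available as a nested dictionary; the names `iter_labeled_pairs`, `to_tsv_lines`, and `insurance_kb` are hypothetical, and the sample rows are taken from the table above.

```python
def iter_labeled_pairs(knowledge_base):
    """Yield (question_text, class_label) for every standard and similar question."""
    for label, standard_questions in knowledge_base.items():
        for standard, similars in standard_questions.items():
            yield standard, label
            for similar in similars:
                yield similar, label


def to_tsv_lines(pairs):
    """Format the pairs as tab-separated lines, ready to be written to a text file."""
    return [f"{question}\t{label}" for question, label in pairs]


# class label -> {standard question -> [similar questions]}
insurance_kb = {
    "Personal insurance": {
        "What life insurance products are there?": [
            "Which kind of life insurance should I choose?",
        ],
    },
    "Car insurance": {
        "What is the relationship between driving record and car insurance?": [
            "Does a personal driving record affect car insurance?",
        ],
        "I want to buy car insurance; what options are there?": [],
    },
}

pairs = list(iter_labeled_pairs(insurance_kb))
lines = to_tsv_lines(pairs)
```

A generator avoids loading the whole corpus at once, which matches the "iterable generator object" option in the text.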

Step 1-2: text corpus preprocessing.

All standard questions and corresponding similar questions from step 1-1 are preprocessed to obtain a corpus data set. Specifically: the knowledge point question texts in the knowledge base corpus undergo character cleaning, in which they are converted to UTF-8 encoding and regular expressions remove tabs, garbled characters, punctuation marks, special characters, and the like; text error correction and term normalization are applied to the business vocabulary in the knowledge points using N-gram or homophone substitution; and each knowledge point is segmented into Chinese words using the open-source Python word segmentation tool Jieba with an added insurance business dictionary, yielding the segmented corpus data set.
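The cleaning and normalization stages can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: `TERM_MAP` is a hypothetical normalization dictionary, the error-correction stage is omitted, and whitespace splitting stands in for Jieba segmentation so that the sketch has no third-party dependency.

```python
import re

# Hypothetical business-term normalization table (not from the source).
TERM_MAP = {"life insurance": "personal insurance"}


def clean_text(text: str) -> str:
    """Remove tabs, punctuation, and special characters, then collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text)      # drop punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()  # tabs and newlines become single spaces


def normalize_terms(text: str) -> str:
    """Replace each known variant with its canonical business term."""
    for variant, canonical in TERM_MAP.items():
        text = text.replace(variant, canonical)
    return text


def preprocess(text: str) -> list[str]:
    """Clean, normalize, and tokenize one knowledge point question."""
    # Stand-in tokenizer: a Chinese corpus would need a real word segmenter here.
    return normalize_terms(clean_text(text)).split()


tokens = preprocess("I want to buy life insurance,\twhat options are there?")
```

Because `\w` matches Unicode word characters in Python 3, the same cleaning pattern keeps Chinese characters while removing punctuation.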

Step 1-3: model training and verification.

The labeled data processed in step 1-2 is divided into a training set and a test set. The knowledge point question texts in the training set are represented as word vectors and used to train a neural network, and the trained neural network model is tested, so as to build a text classification prediction model whose prediction accuracy meets a threshold. The word vectors use the Word2Vec model, a text vector representation that carries semantic information and expresses the semantic similarity of texts as distance in a low-dimensional vector space. The neural network used for training may be any of several network models such as TextCNN or LSTM; the specific training strategy is not limited.
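The split-and-accept logic around the model can be sketched independently of the network itself. In this illustration the names `split_corpus`, `accuracy`, and `accept_model`, the 20% test ratio, and the 0.9 accuracy threshold are all assumptions, and `predict` is any callable standing in for the trained TextCNN or LSTM classifier.

```python
import random


def split_corpus(pairs, test_ratio=0.2, seed=0):
    """Shuffle the labeled pairs and split them into (training set, test set)."""
    items = list(pairs)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_ratio))
    return items[n_test:], items[:n_test]


def accuracy(predict, test_set):
    """Fraction of test questions whose predicted class equals the label."""
    correct = sum(1 for text, label in test_set if predict(text) == label)
    return correct / len(test_set)


def accept_model(predict, test_set, threshold=0.9):
    """Accept the model only if its test accuracy meets the threshold."""
    return accuracy(predict, test_set) >= threshold


corpus = [(f"question {i}", "A" if i % 2 else "B") for i in range(10)]
train_set, test_set = split_corpus(corpus)
lookup = dict(corpus)
perfect_model = lookup.get  # toy predictor that always answers correctly
```

A model that fails the check would be retrained or tuned rather than deployed to the maintenance flow.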

Step 1-4: near-real-time model update.

Real-time statistics are kept on changes to the underlying knowledge base database. The total number of classes is denoted N, the number of changed classes is denoted M, the number of changed knowledge points in class i is denoted KNi, and the total number of knowledge points in class i before the change is denoted Ki, where 1≤i≤N. When M/N is greater than threshold A (for example 10%), or there exists an i for which KNi/Ki is greater than threshold B (for example 15%), the method returns to step 1-1, pulls the knowledge base data, and retrains and updates the model.
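The retraining trigger is a simple two-part threshold test. A minimal sketch follows, using the example thresholds A = 10% and B = 15% from the text; the function name `needs_retraining` and the dictionary-based inputs are assumptions, as is the choice to skip the ratio test for classes that had no knowledge points before the change.

```python
def needs_retraining(total_classes, changed_counts, totals_before, a=0.10, b=0.15):
    """Return True when M/N > A, or KN_i/K_i > B for some class i.

    total_classes:  N, the total number of classes
    changed_counts: {class: KN_i}, changed knowledge points per class
    totals_before:  {class: K_i}, knowledge points per class before the change
    """
    # M: number of classes with at least one changed knowledge point
    changed_classes = sum(1 for count in changed_counts.values() if count > 0)
    if changed_classes / total_classes > a:
        return True
    # Per-class ratio test; classes with K_i == 0 are skipped to avoid dividing by zero.
    return any(
        count / totals_before[name] > b
        for name, count in changed_counts.items()
        if totals_before.get(name, 0) > 0
    )
```

For instance, with 50 classes, a single class in which 20 of 100 knowledge points changed triggers retraining through the per-class test even though M/N is only 2%.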

Step 2: maintain the knowledge base based on the knowledge point classification prediction model, as shown in Fig. 3.

Step 2-1: obtain the knowledge point question text input by the user at the knowledge base management front-end interface. For example, the user inputs the following knowledge point at the front-end interface: "I want to buy life insurance; what options are there?"

Step 2-2: preprocess the knowledge point question text input by the user, "I want to buy life insurance; what options are there?", denoted Q. Preprocessing includes character cleaning, text error correction, business term normalization, and word segmentation, and yields: "I want to buy personal insurance what options are there". The segmented knowledge point Q is then represented as word vectors, and the knowledge point classification prediction model trained in step 1 is called to perform classification prediction on the word-vector representation and compute a score for each class (for example, if the knowledge base has 50 classes, 50 scores are obtained). The scores and class labels are formed into (class, score) pairs, which are sorted in descending order of score to obtain the class list [(personal insurance, 0.6), (car insurance, 0.2), (other insurance, 0.08) ...]. Here the maximum value 0.6 corresponds to the class "personal insurance" and the second-largest value 0.2 corresponds to the class "car insurance".
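Building the class list from the model's scores is a one-line sort. The sketch below assumes the model output is available as a dictionary from class label to score; the function name `rank_classes` is hypothetical, and the scores are the example values from the text.

```python
def rank_classes(scores):
    """Turn {class: score} into a list of (class, score) pairs, highest score first."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


class_list = rank_classes(
    {"car insurance": 0.2, "personal insurance": 0.6, "other insurance": 0.08}
)
```

The front end can then show the first few entries of `class_list` for the user to confirm.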

Step 2-3: class output and confirmation. The class list from step 2-2, [(personal insurance, 0.6), (car insurance, 0.2), (other insurance, 0.08) ...], is sent to the knowledge base management front-end interface so that the user can choose one class from it (for example, the user confirms that "personal insurance" is the correct class), and the class chosen and confirmed by the user is received from the front-end interface.

Step 2-4: knowledge point answer prediction and class verification. Similarity is computed between the knowledge point question Q input by the user and all standard questions in the knowledge base under that class (personal insurance). The several standard questions with the highest similarity are sent to the knowledge base management front-end interface as reference standard questions for the input knowledge point question, so that the user can select according to the actual situation. The text similarity is computed by the following formula:

The above algorithm yields multiple standard questions, say k of them, which may come from the class the user confirmed or specified, or from other classes. The standard question with the highest similarity serves as the reference standard question for the user's input knowledge point. The purpose of this module is to provide knowledge point answer prediction and class verification, covering two cases:

(1) If the knowledge point added by the user belongs to an existing standard question, the standard question with the highest similarity is the best choice inferred by the program. The user may associate the knowledge point with that standard question, or with the second or third option, and so on. After association, the new knowledge point becomes a supplementary similar question of the standard question; its answer is the same as the standard question's, and its class is set to match.

For example, the user has confirmed the class personal insurance in step 2-3, and the candidate standard questions include Q1: "What life insurance products are there?" and Q3: "I want to buy car insurance; what options are there?". Take θ = 0.3. The final class-corrected similarities are Sim(Q, Q1) = 0.7 × 0.5 + 0.3 × 1 = 0.65 and Sim(Q, Q3) = 0.7 × 0.7 + 0.3 × 0 = 0.49. The remaining standard questions under personal insurance are scored in the same way. If Q1 has the highest similarity to Q among the results, Q1 is taken as the best-matching standard question predicted for the knowledge point question. The top-ranked standard questions are then sent to the knowledge base management front-end interface. Once the user confirms, the answer to standard question Q1 serves as the answer to the input knowledge point; the input knowledge point is associated with that standard question at the front end, its class remains unchanged, and it is saved to the knowledge base database.

(2) If the knowledge point added by the user is a new knowledge point standard question, the user directly enters the corresponding knowledge point answer. Suppose the user enters the new knowledge point question "Is a physical examination required to buy life insurance?". After the same processing as above, the class of the knowledge point question is confirmed as "personal insurance" in step 2-3. Step 2-4 computes the similarity between the input question and all standard questions under the "personal insurance" class, and the several standard questions with the highest similarity are sent to the knowledge base management front-end interface as reference standard questions, so that the user can select according to the actual situation. Because the user knows that the question being entered is a new one, the user will not find a reference standard question that relates to it. The user then designates the input knowledge point question as a new knowledge point standard question and enters the corresponding answer, and the new knowledge point question, its answer, and its class are saved to the knowledge base database.
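The arithmetic in case (1) can be checked with a short sketch. It assumes, based on the numbers in the example, that the score is a weighted sum of a feature-vector similarity (0.5 for Q1, 0.7 for Q3) and a label-agreement term that is 1 when the class labels match and 0 otherwise. The function names are hypothetical, and the feature similarities are passed in as given rather than computed from word vectors.

```python
def corrected_similarity(feature_sim, label_x, label_y, theta=0.3):
    """Class-corrected similarity: (1 - theta) * feature_sim + theta * label_match."""
    label_match = 1.0 if label_x == label_y else 0.0
    return (1 - theta) * feature_sim + theta * label_match


def rank_standard_questions(candidates, confirmed_label, theta=0.3):
    """Score candidates given as (id, feature_sim, label) and sort best first."""
    scored = [
        (qid, corrected_similarity(sim, confirmed_label, label, theta))
        for qid, sim, label in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_standard_questions(
    [("Q3", 0.7, "car insurance"), ("Q1", 0.5, "personal insurance")],
    confirmed_label="personal insurance",
)
```

Under these assumptions Q1 ranks above Q3 even though its raw feature similarity is lower, which is the class-verification effect the step describes.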

Note that the present invention is suited to knowledge bases with a fairly complete classification system. For any question the user inputs, the user knows whether it is a new knowledge point question or a similar question supplementing an existing standard question. For a supplement, the invention assists the user in judging the knowledge point class and recalling the standard question, that is, in finding which class of the knowledge base the question belongs to and matching it to an existing standard question, which reduces manual workload and error rate. For a completely new knowledge point, the invention assists in judging the class.

The above steps of knowledge point entry, class prediction, and standard question association require only a few manual confirmation steps. Customer service personnel no longer have to determine the class of a knowledge point or match knowledge point answers by hand in an increasingly large and complex knowledge base, which reduces the workload of knowledge base maintenance and improves its efficiency and quality. Fig. 4 is a structure diagram of the knowledge base maintenance device based on the knowledge point classification model provided by an embodiment of the present invention.

As shown in Fig. 4, the knowledge base maintenance device based on the knowledge point classification model provided by an embodiment of the present invention includes a knowledge point classification prediction model training module 1 and a knowledge base maintenance module 2. The knowledge base maintenance module 2 maintains the knowledge base based on the knowledge point classification model generated by the training module 1, and includes: a knowledge point question text acquisition module 21, a knowledge point question text processing module 22, a knowledge point classification prediction module 23, a class output and confirmation module 24, and a knowledge point answer prediction and class verification module 25.

The knowledge point classification prediction model training module 1 trains the knowledge point classification prediction model and monitors the knowledge base in real time, so that the model is retrained and updated in near real time.

The knowledge point question text acquisition module 21 obtains the knowledge point question text input by the user at the knowledge base management front-end interface and sends it to the knowledge point question text processing module 22.

The knowledge point question text processing module 22 performs string processing on the knowledge point question text input by the user, including character cleaning, text error correction, business term normalization, and word segmentation, then represents the segmented text as word vectors and sends the word-vector representation to the knowledge point classification prediction module 23.

The knowledge point classification prediction module 23 calls the knowledge point classification prediction model to compute a score for each class from the word-vector representation produced by the text processing module, forms (class, score) pairs from the scores and class labels, and sorts them by score from largest to smallest to obtain a class list.

The class output and confirmation module 24 sends the class list to the knowledge base management front-end interface so that the user can choose one class from it, and receives the user-confirmed class fed back by the front-end interface.

The knowledge point answer prediction and class verification module 25 computes the similarity between the knowledge point question input by the user and all standard questions in the knowledge base under that class; sends the several standard questions with the highest similarity to the knowledge base management front-end interface as reference standard questions for the input knowledge point question, so that the user either associates the input knowledge point question with the most similar standard question among them, or treats the input knowledge point question as a new knowledge point question and enters the corresponding answer; and receives, from the knowledge base management front-end interface, either the user-confirmed association between the input knowledge point question and a standard question, or the input knowledge point question as a new knowledge point question together with its answer, and stores it in the knowledge base database.

Wherein, classification prediction model training module 1 in knowledge point includes:

Knowledge base Data Integration module 11, to by the standard of knowledge base ask and it is corresponding it is similar ask, with corresponding point Class label is made into data pair, saves as text file form or grey iterative generation device object form.

The text corpus preprocessing module 12 preprocesses all standard questions in the knowledge base and their corresponding similar questions, including character cleaning, text error correction, business term normalization and word segmentation, to obtain the corpus data set.
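
A simplified version of this preprocessing chain is sketched below. It covers character cleaning, business term normalization and word segmentation; text error correction is omitted because it needs a dedicated model or dictionary. The synonym table `TERM_MAP` is hypothetical, and whitespace splitting stands in for a real word segmenter, which Chinese text would require.

```python
import re
from typing import Dict, List

# Hypothetical synonym table for business term normalization.
TERM_MAP: Dict[str, str] = {"e-bank": "online banking", "netbank": "online banking"}


def clean_characters(text: str) -> str:
    """Replace punctuation and symbols with spaces, collapse whitespace, lowercase."""
    text = re.sub(r"[^\w\s-]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def normalize_terms(text: str, term_map: Dict[str, str] = TERM_MAP) -> str:
    """Rewrite each known variant of a business term to its canonical form."""
    for variant, canonical in term_map.items():
        text = re.sub(rf"\b{re.escape(variant)}\b", canonical, text)
    return text


def segment(text: str) -> List[str]:
    """Stand-in word segmentation: split on whitespace."""
    return text.split()


def preprocess(text: str) -> List[str]:
    """Run cleaning, term normalization and segmentation in order."""
    return segment(normalize_terms(clean_characters(text)))


tokens = preprocess("How do I open  E-Bank??")
print(tokens)
```

Normalizing terms before segmentation means that different spellings of the same product collapse to one token sequence, so the classifier sees them as the same input.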

The model training and verification module 13 divides the corpus data set into a training set and a test set, represents the knowledge point question texts in the training set as word vectors, trains a neural network on them, tests and validates the trained neural network model, and builds a text classification model whose prediction accuracy meets a threshold. The word vectors use the Word2Vec model, a text vector representation that carries semantic information and expresses the semantic similarity of texts as distance in a low-dimensional vector space. The neural network used for training may be a network model such as TextCNN or LSTM, and the specific training strategy is not limited.
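
The split-and-validate gate around the model can be sketched independently of the network itself. In the sketch below, `predict` is any callable standing in for the trained TextCNN or LSTM model; the 80/20 split ratio, the fixed seed and the `>=` comparison against the threshold are assumptions, and all names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, str]  # (question text, category label)


def split_corpus(
    samples: Sequence[Sample], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle the corpus reproducibly and split it into (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict: Callable[[str], str], test_set: Sequence[Sample]) -> float:
    """Fraction of test samples whose predicted label equals the true label."""
    correct = sum(1 for text, label in test_set if predict(text) == label)
    return correct / len(test_set)


def meets_threshold(
    predict: Callable[[str], str], test_set: Sequence[Sample], threshold: float
) -> bool:
    """Accept the model only if its test accuracy reaches the threshold."""
    return accuracy(predict, test_set) >= threshold


# Toy corpus and a toy "model" that labels a question by the parity of its index.
samples = [(f"q{i}", "even" if i % 2 == 0 else "odd") for i in range(10)]
train_set, test_set = split_corpus(samples)


def parity_model(text: str) -> str:
    return "even" if int(text[1:]) % 2 == 0 else "odd"


print(meets_threshold(parity_model, test_set, threshold=0.9))
```

A model that fails the gate would be retrained or tuned rather than deployed, so that only a model meeting the accuracy threshold serves the maintenance workflow.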

The near-real-time model update module 14 keeps real-time statistics on changes to the underlying knowledge base database; when the number of category changes in the knowledge base exceeds a preset threshold, retraining and updating are carried out through the knowledge base data integration module 11, the text corpus preprocessing module 12 and the model training and verification module 13. Specifically: the total number of categories is denoted N, the number of changed categories is denoted M, the number of changed knowledge points in category i is denoted KNi, and the total number of knowledge points in category i before the change is denoted Ki, where 1 ≤ i ≤ N. When M/N is greater than threshold A, or there exists an i for which KNi/Ki is greater than threshold B, the process goes to step 2a), pulls the knowledge base data and performs the model update training step, retraining and updating the model. By dynamically monitoring the knowledge base in real time, the near-real-time model update module 14 achieves near-real-time training updates of the classification model and continuous iteration of the model, improving the recall and accuracy of the entire customer service robot and the overall customer experience of the intelligent customer service robot.
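
The retraining condition can be written directly from the quantities N, M, KNi and Ki. The sketch below assumes that a category counts as changed when at least one of its knowledge points has changed, and that categories with no knowledge points before the change are skipped in the per-category test; the function and parameter names are hypothetical.

```python
from typing import Dict


def should_retrain(
    changed_points: Dict[str, int],  # KNi: changed knowledge points per category
    total_points: Dict[str, int],    # Ki: knowledge points per category before the change
    threshold_a: float,
    threshold_b: float,
) -> bool:
    """Return True when M/N > A, or when KNi/Ki > B for at least one category i."""
    n = len(total_points)  # N: total number of categories
    changed = {c: kn for c, kn in changed_points.items() if kn > 0}
    m = len(changed)       # M: number of changed categories
    if n and m / n > threshold_a:
        return True
    return any(
        total_points.get(category, 0) > 0
        and kn / total_points[category] > threshold_b
        for category, kn in changed.items()
    )


totals = {"a": 10, "b": 10, "c": 10, "d": 10}
print(should_retrain({"a": 1}, totals, threshold_a=0.5, threshold_b=0.3))
print(should_retrain({"a": 4}, totals, threshold_a=0.5, threshold_b=0.3))
```

The two conditions are complementary: the first catches many small edits spread across categories, while the second catches a heavy rewrite of a single category that would barely move M/N.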

In summary, the knowledge base maintenance method and device based on text classification prediction proposed by the present invention train a knowledge point classification prediction model on the knowledge base classification corpus, maintain the knowledge base on the basis of that model, recall standard questions similar to the knowledge point through a text similarity algorithm, and provide means for secondary confirmation and answer association. This makes the manual knowledge base maintenance process intelligent, helps customer service staff update and maintain knowledge points accurately, and improves the efficiency and quality of knowledge base maintenance. In addition, real-time dynamic monitoring of the knowledge base enables near-real-time training updates of the classification model and continuous iteration of the model, improving the recall and accuracy of the entire customer service robot and the overall customer experience of the intelligent customer service robot.

The specific embodiments described above further explain the purpose, technical solutions and beneficial effects of the present invention in detail. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A knowledge base maintenance method based on text classification prediction, characterized by comprising the following steps:

3) calling the pre-trained knowledge point classification prediction model to perform classification prediction on the text word-vector representation from step 2) and obtain a score for each category, then pairing each score with its category label as a tuple (category, score) and sorting the tuples by score in descending order to obtain a category list;

4) sending the category list from step 3) to the knowledge base management front-end interface so that the user can select one category from the list, and receiving the category selected and confirmed by the user from the category list, as sent by the knowledge base management front-end interface;

5) computing the similarity between every standard question under the category of step 4) in the knowledge base and the knowledge point question input by the user, and sending the several standard questions with the highest similarity to the knowledge base management system front-end interface as reference standard questions for the input knowledge point question, so that the user can either associate the input knowledge point question with the most similar standard question, or treat the input knowledge point question as a new knowledge point question and enter the corresponding answer; and receiving, from the knowledge base management system front-end interface, either the user-confirmed association between the input knowledge point question and a standard question, or the input knowledge point question as a new knowledge point question together with its answer, and storing it in the knowledge base database.

2. The knowledge base maintenance method based on text classification prediction according to claim 1, characterized in that the string processing of the user-input knowledge point question text in step 2) includes character cleaning, text error correction, business term normalization and word segmentation.

3. The knowledge base maintenance method based on text classification prediction according to claim 1, characterized in that pre-training the knowledge point classification prediction model in step 3) specifically comprises the following steps:

3a) pairing each standard question in the knowledge base and its corresponding similar questions with the corresponding category label to form data pairs, and saving them in text file form or as an iterable generator object;

3b) preprocessing all the standard questions and corresponding similar questions from step 3a) to obtain a corpus data set;

3c) dividing the corpus data set from step 3b) into a training set and a test set, representing the knowledge point question texts in the training set as word vectors, training a neural network on them, testing and validating the trained neural network model, and building a knowledge point classification prediction model whose prediction accuracy meets a threshold;

3d) keeping real-time statistics on changes to the underlying knowledge base database, and when the number of changed categories, or the number of changed knowledge points in any one category, exceeds a set threshold, going to step 2a) to retrain and update the model.

4. The knowledge base maintenance method based on text classification prediction according to claim 3, characterized in that the preprocessing in step 3b) includes character cleaning, text error correction, business term normalization and word segmentation.

5. The knowledge base maintenance method based on text classification prediction according to claim 3, characterized in that the neural network used in step 3c) is TextCNN or LSTM, and the word-vector representation uses the Word2Vec model.

6. The knowledge base maintenance method based on text classification prediction according to claim 3, characterized in that step 3d) specifically comprises:

keeping real-time statistics on knowledge point changes in the underlying knowledge base database, where the total number of categories is denoted N, the number of changed categories is denoted M, the number of changed knowledge points in category i is denoted KNi, and the total number of knowledge points in category i before the change is denoted Ki, with 1 ≤ i ≤ N; and when M/N is greater than threshold A, or there exists an i for which KNi/Ki is greater than threshold B, going to step 2a), pulling the knowledge base data and performing the model update training step, so as to retrain and update the model.

7. The knowledge base maintenance method based on text classification prediction according to claim 1, characterized in that the text similarity in step 5) is computed by the following equation:

where x and y denote knowledge point texts, V_x and V_y denote the feature vectors of the texts, t_x and t_y denote the one-hot vector representations of the category labels of the corresponding texts, and θ ∈ (0, 1) is a weight adjustment parameter.

8. A knowledge base maintenance device based on text classification prediction, characterized by comprising a knowledge point classification prediction model training module and a knowledge base maintenance module, wherein the knowledge base maintenance module maintains the knowledge base on the basis of the knowledge point classification prediction model training module and comprises: a knowledge point question text acquisition module, a knowledge point question text processing module, a knowledge point classification prediction module, a category output and confirmation module, and a knowledge point answer prediction and category verification module;

the knowledge point classification prediction model training module trains the knowledge point classification prediction model and monitors the knowledge base in real time to achieve near-real-time training updates of the knowledge point classification prediction model;

the knowledge point question text acquisition module obtains the knowledge point question text input by the user at the knowledge base management front-end interface and sends it to the knowledge point question text processing module;

the knowledge point question text processing module performs string processing on the knowledge point question text input by the user, converts it to a text word-vector representation, and sends the text word-vector representation to the knowledge point classification prediction module;

the knowledge point classification prediction module calls the knowledge point classification prediction model on the text word-vector representation produced by the text preprocessing module to compute a score for each category, pairs each score with its category label as a tuple (category, score), and sorts the tuples by score in descending order to obtain the category list;

the category output and confirmation module sends the category list to the knowledge base management front-end interface so that the user can select one category from the list, and receives the user-confirmed category returned by the knowledge base management front-end interface;

the knowledge point answer prediction and category verification module computes the similarity between every standard question under that category in the knowledge base and the knowledge point question input by the user, and sends the several standard questions with the highest similarity to the knowledge base management front-end interface as reference standard questions for the input knowledge point question, so that the user can either associate the input knowledge point question with the most similar standard question, or treat the input knowledge point question as a new knowledge point question and enter the corresponding answer; and receives, from the knowledge base management system front-end interface, either the user-confirmed association between the input knowledge point question and a standard question, or the input knowledge point question as a new knowledge point question together with its answer, and stores it in the knowledge base database.

9. The knowledge base maintenance device based on text classification prediction according to claim 8, characterized in that the knowledge point classification prediction model training module comprises:

a knowledge base data integration module, which pairs each standard question in the knowledge base and its corresponding similar questions with the corresponding category label to form data pairs, and saves them in text file form or as an iterable generator object;

a text corpus preprocessing module, which preprocesses all standard questions in the knowledge base and their corresponding similar questions to obtain a corpus data set;

a model training and verification module, which divides the corpus data set into a training set and a test set, represents the knowledge point question texts in the training set as word vectors, trains a neural network on them, tests and validates the trained neural network model, and builds a knowledge point classification prediction model whose prediction accuracy meets a threshold;

a near-real-time model update module, which keeps real-time statistics on changes to the underlying knowledge base database and, when the number of category changes in the knowledge base exceeds a preset threshold, carries out retraining and updating through the knowledge base data integration module, the text corpus preprocessing module and the model training and verification module.

10. The knowledge base maintenance device based on text classification prediction according to claim 8, characterized in that the neural network used by the model training and verification module is TextCNN or LSTM, and the word-vector representation uses the Word2Vec model.