CN110532400A - Knowledge base maintenance method and device based on text classification prediction - Google Patents
- Publication number
- CN110532400A CN201910830001.1A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3347—Query execution using vector based model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Animal Behavior & Ethology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a knowledge base maintenance method and device based on text classification prediction. The method first trains a knowledge point category prediction model on the categorized knowledge base corpus, then maintains the knowledge base with that model, using a text similarity algorithm to recall standard questions close to the input knowledge point so that the user can confirm the match and associate an answer. This makes the manual maintenance process more intelligent, helps customer service staff update knowledge points accurately, and improves the efficiency and quality of knowledge base maintenance. In addition, dynamic monitoring of the knowledge base enables near-real-time retraining of the classification model, so the model is iterated continuously, which improves the recall and accuracy of the customer service robot and the overall user experience.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a knowledge base maintenance method and device based on text classification prediction.
Background
Intelligent customer service robots are now widely used across industries: given a user's query, the robot looks up related questions and answers in a knowledge base. The quality of the knowledge base largely determines how well the robot performs, and therefore the user experience of the customer service.
The knowledge base maintenance system is an important part of an intelligent customer service system. The usual practice is to rely on customer service staff to maintain the data in the knowledge base. To improve the effectiveness and accuracy of knowledge base retrieval, staff must classify the knowledge base data accurately and add as many similar-question variants of each standard question as possible, which improves the system's ability to recognize the intent of user queries. A knowledge base for a vertical domain usually contains categories for several business scenarios, and each category contains standard questions for several business issues. So that user questions are understood more accurately, one standard question generally corresponds to several similar questions, which cover the same standard question in different phrasings. This "category - knowledge point standard question - similar questions - knowledge point answer" hierarchy forms the logical structure of the knowledge base. Customer service staff maintain the data according to this logical structure, together with updates to business scenarios and knowledge points.
Knowledge base maintenance generally includes adding new knowledge point standard questions, updating existing knowledge points, and adding similar questions to knowledge points. In particular, adding a new standard question and adding a similar question to an existing standard question both require business staff to assign a category. Closely related knowledge points should belong to the same category, which is necessary for recalling questions, so category assignment must be kept consistent.
In practice, however, business staff are sometimes unsure which category a new knowledge point or new similar question belongs to. Searching the existing knowledge points for identical or similar ones in order to judge the category manually is inconvenient. When different customer service staff maintain the same knowledge base, categories easily become confused and inconsistent, which affects the accurate recall of knowledge point answers and degrades the experience of the customer service robot.
In view of this, a knowledge base maintenance method that solves the above problems has been developed.
Summary of the invention
The present invention aims to solve the above problems by providing a knowledge base maintenance method and device based on text classification prediction.
To achieve the above object, in a first aspect the present invention provides a knowledge base maintenance method based on text classification prediction, comprising the following steps:
1) obtain the knowledge point question text entered by the user in the knowledge base management front-end interface;
2) apply string processing to the knowledge point question text from step 1) and represent it as text word vectors;
3) call a pre-trained knowledge point category prediction model to run classification prediction on the word vector representation from step 2) and compute a score for each category, then form (category, score) tuples from the scores and category labels and sort them by score in descending order to obtain a category list;
4) send the category list from step 3) to the knowledge base management front-end interface so that the user can select one category from it, and receive from the front-end interface the category selected and confirmed by the user;
5) compute the similarity between the knowledge point question entered by the user and all standard questions under the category from step 4) in the knowledge base, and send the several standard questions with the highest similarity to the knowledge base management front-end interface as reference standard questions for the input question, so that the user can either associate the input question with the most similar standard question or treat the input question as a new knowledge point question and enter its answer; then receive from the front-end interface either the user-confirmed association between the input question and a standard question, or the input question as a new knowledge point question together with its answer, and store it in the knowledge base database.
Further, the string processing applied to the user's knowledge point question text in step 2) includes character cleaning, text error correction, business term normalization, and word segmentation.
Further, the knowledge point category prediction model of step 3) is trained in advance by the following steps:
3a) pair each standard question in the knowledge base, and its corresponding similar questions, with the corresponding category label, and save the pairs as a text file or as an iterable generator object;
3b) preprocess all standard questions and their similar questions from step 3a) to obtain a corpus data set;
3c) split the corpus data set from step 3b) into a training set and a test set, represent the knowledge point question texts in the training set as word vectors, train a neural network on them, test and validate the trained model, and thereby build a knowledge point category prediction model whose prediction accuracy meets a threshold;
3d) keep real-time statistics on changes to the underlying knowledge base database, and when the number of changed categories, or the number of changed knowledge points in some category, exceeds a set threshold, return to step 3a) to retrain and update the model.
Further, the preprocessing in step 3b) includes character cleaning, text error correction, business term normalization, and word segmentation.
Further, the neural network used in step 3c) is TextCNN or LSTM, and the word vector representation uses a Word2Vec model.
Further, step 3d) specifically includes:
Real-time statistics are kept on knowledge point changes in the underlying knowledge base database. Let N be the total number of categories, M the number of categories that have changed, KNi the number of changed knowledge points in category i, and Ki the total number of knowledge points in category i before the change, with 1 ≤ i ≤ N. When M/N exceeds threshold A, or there exists an i for which KNi/Ki exceeds threshold B, the method returns to step 3a), pulls the knowledge base data, and runs the model training steps again to retrain and update the model.
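Further, the text similarity in step 5) is computed by the following formula, whose form is recovered here from the worked example in the embodiment (0.7 × 0.5 + 0.3 × 1 with θ = 0.3):

$$\mathrm{Sim}(x, y) = (1 - \theta)\,\mathrm{sim}(V_x, V_y) + \theta\,(t_x \cdot t_y)$$

where x and y are knowledge point texts, $V_x$ and $V_y$ are the feature vectors of the texts, $\mathrm{sim}(V_x, V_y)$ is the similarity of the two feature vectors, $t_x$ and $t_y$ are the one-hot vectors of the corresponding category labels, and θ ∈ (0, 1) is a weight adjustment parameter.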
In a second aspect, the present invention also provides a knowledge base maintenance device based on text classification prediction. The device includes a knowledge point category prediction model training module and a knowledge base maintenance module. The knowledge base maintenance module maintains the knowledge base using the model produced by the training module, and comprises: a knowledge point question text acquisition module, a knowledge point question text processing module, a knowledge point category prediction module, a category output and confirmation module, and a knowledge point answer prediction and category verification module;
The knowledge point category prediction model training module trains the knowledge point category prediction model and monitors the knowledge base in real time, so that the model is retrained and updated in near real time;
The knowledge point question text acquisition module obtains the knowledge point question text entered by the user in the knowledge base management front-end interface and sends it to the knowledge point question text processing module;
The knowledge point question text processing module applies string processing to the user's knowledge point question text, represents it as text word vectors, and sends the word vector representation to the knowledge point category prediction module, which uses the pre-trained model;
The knowledge point category prediction module calls the knowledge point category prediction model on the word vector representation from the text processing module to compute a score for each category, forms (category, score) tuples from the scores and category labels, and sorts them by score in descending order to obtain a category list;
The category output and confirmation module sends the category list to the knowledge base management front-end interface so that the user can select one category from it, and receives the user-confirmed category returned by the front-end interface;
The knowledge point answer prediction and category verification module computes the similarity between the knowledge point question entered by the user and all standard questions under that category in the knowledge base, and sends the several standard questions with the highest similarity to the knowledge base management front-end interface as reference standard questions for the input question, so that the user can either associate the input question with the most similar standard question or treat the input question as a new knowledge point question and enter its answer; it then receives from the front-end interface either the user-confirmed association between the input question and a standard question, or the input question as a new knowledge point question together with its answer, and stores it in the knowledge base database.
Further, the knowledge point category prediction model training module includes:
A knowledge base data integration module, which pairs each standard question in the knowledge base, and its corresponding similar questions, with the corresponding category label, and saves the pairs as a text file or as an iterable generator object;
A text corpus preprocessing module, which preprocesses all standard questions in the knowledge base and their similar questions to obtain a corpus data set;
A model training and validation module, which splits the corpus data set into a training set and a test set, represents the knowledge point question texts in the training set as word vectors, trains a neural network on them, tests and validates the trained model, and thereby builds a knowledge point category prediction model whose prediction accuracy meets a threshold;
A near-real-time model update module, which keeps real-time statistics on changes to the underlying knowledge base database and, when the number of category changes in the knowledge base exceeds a preset threshold, retrains and updates the model through the knowledge base data integration module, the text corpus preprocessing module, and the model training and validation module.
Further, the neural network used by the model training and validation module is TextCNN or LSTM, and the word vector representation uses a Word2Vec model.
The knowledge base maintenance method based on text classification prediction proposed by the present invention trains a knowledge point category prediction model on the categorized knowledge base corpus, maintains the knowledge base with that model, and uses a text similarity algorithm to recall standard questions close to the input knowledge point, giving the user a way to confirm the match and associate an answer. This makes the manual maintenance process more intelligent, helps customer service staff update knowledge points accurately, and improves the efficiency and quality of knowledge base maintenance. In addition, dynamic monitoring of the knowledge base enables near-real-time retraining of the classification model, so the model is iterated continuously, which improves the recall and accuracy of the customer service robot and the overall user experience.
Brief description of the drawings
Fig. 1 is a flow chart of the online training of the knowledge point category prediction model provided by an embodiment of the present invention;
Fig. 2 is a diagram of the knowledge point structure of the knowledge base provided by an embodiment of the present invention;
Fig. 3 is a flow chart of the knowledge base maintenance method based on the knowledge point category prediction model provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of the knowledge base maintenance device based on the knowledge point classification model provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. Note that the drawings are schematic and not drawn strictly to scale; parts may be enlarged or reduced for convenience of description, and some conventional structures may be omitted.
The knowledge base maintenance method based on text classification prediction proposed by the present invention comprises Step 1, training the knowledge point category prediction model, and Step 2, maintaining the knowledge base using the knowledge point category prediction model.
Take an open-source insurance customer service corpus as an example. The insurance domain knowledge base built from part of this corpus covers insurance business categories such as personal insurance, health insurance, car insurance, medical insurance, retirement insurance, long-term care insurance, and annuities.
Step 1: training the knowledge point category prediction model, as shown in Fig. 1.
Step 1-1: corpus integration.
Fig. 2 shows one form of logical association in the knowledge point structure of the knowledge base: one standard question can correspond to several similar questions. The standard questions of the insurance domain knowledge base and their similar questions are paired with the corresponding category labels as (knowledge point question, label) pairs and saved as a text file in storage, or as an iterable generator object. The data items have the following format:
No. | Knowledge point question | Category label |
1 | Q1: What life insurance products are there? (standard question) | Personal insurance |
1-1 | Which kind of life insurance should I choose? (similar question) | Personal insurance |
1-2 | Life insurance products recommended by agents (similar question) | Personal insurance |
…… | …… | …… |
X | Q2: How is a driving record related to car insurance? (standard question) | Car insurance |
X-1 | Does a personal driving record affect car insurance? (similar question) | Car insurance |
Y | Q3: I want to buy car insurance; what options are there? (standard question) | Car insurance |
…… | …… | …… |
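The "iterable generator object" form of these pairs can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the nested-dictionary layout `KNOWLEDGE_BASE` and the function name `iter_labeled_pairs` are assumptions made for the example.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory layout: category -> standard question -> similar questions.
KNOWLEDGE_BASE: Dict[str, Dict[str, List[str]]] = {
    "Personal insurance": {
        "What life insurance products are there?": [
            "Which kind of life insurance should I choose?",
        ],
    },
    "Car insurance": {
        "How is a driving record related to car insurance?": [
            "Does a personal driving record affect car insurance?",
        ],
        "I want to buy car insurance; what options are there?": [],
    },
}


def iter_labeled_pairs(
    kb: Dict[str, Dict[str, List[str]]]
) -> Iterator[Tuple[str, str]]:
    """Lazily yield (question, category label) pairs, standard question first."""
    for category, standards in kb.items():
        for standard, similars in standards.items():
            yield standard, category
            for similar in similars:
                yield similar, category


pairs = list(iter_labeled_pairs(KNOWLEDGE_BASE))
```

A generator keeps memory use flat when the knowledge base is large, since pairs are produced one at a time as the training pipeline consumes them.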
Step 1-2: text corpus preprocessing.
All standard questions and their similar questions from Step 1-1 are preprocessed to obtain a corpus data set. Specifically: the knowledge point question texts in the knowledge base corpus are cleaned at the character level by converting them to UTF-8 and using regular expressions to remove tabs, garbled characters, punctuation, and special characters; business vocabulary in the knowledge points is error-corrected and normalized using N-gram or homophone substitution; and each knowledge point is segmented into Chinese words with the open-source Python tool Jieba, extended with an insurance business dictionary, yielding the segmented corpus data set.
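A rough sketch of this preprocessing chain, using only the Python standard library, is shown below. Several parts are assumptions for illustration: the `TERM_MAP` entries are invented, the error-correction step is omitted, and whitespace splitting stands in for Jieba segmentation so that the example runs without third-party packages.

```python
import re
import unicodedata
from typing import Dict, List

# Hypothetical normalization table: variant term -> canonical business term.
TERM_MAP: Dict[str, str] = {
    "life insurance": "personal insurance",
    "vehicle insurance": "car insurance",
}


def purify(text: str) -> str:
    """Character cleaning: normalize Unicode, drop punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"[^\w\s]", " ", text)  # punctuation and special characters
    return re.sub(r"\s+", " ", text).strip().lower()  # tabs, repeated spaces


def normalize_terms(text: str, term_map: Dict[str, str]) -> str:
    """Replace each known variant with its canonical business term."""
    for variant, canonical in term_map.items():
        text = text.replace(variant, canonical)
    return text


def tokenize(text: str) -> List[str]:
    """Stand-in for word segmentation; the patent uses Jieba for Chinese text."""
    return text.split()


def preprocess(text: str) -> List[str]:
    return tokenize(normalize_terms(purify(text), TERM_MAP))
```

For Chinese input, `tokenize` would be replaced by a real segmenter, since whitespace splitting does not separate Chinese words.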
Step 1-3: model training and validation.
The labeled data processed in Step 1-2 is split into a training set and a test set. The knowledge point question texts in the training set are represented as word vectors and used to train a neural network; the trained model is then tested and validated to build a text classification prediction model whose prediction accuracy meets a threshold. The word vectors use a Word2Vec model, a text vector representation that carries semantic information and expresses the semantic similarity of texts as distance in a low-dimensional vector space. The neural network used for training can be any of several network models, such as TextCNN or LSTM, and the specific training strategy is not limited.
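The split-and-validate part of this step can be sketched independently of the network itself. In the sketch below, the 20% test ratio, the 0.9 accuracy threshold, and all function names are assumptions for illustration; `predict` stands for whatever trained model (for example TextCNN or LSTM) is being evaluated.

```python
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (question text, category label)


def split_dataset(
    data: Sequence[Example], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[Example], List[Example]]:
    """Shuffle deterministically and split into (training set, test set)."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_ratio))
    return items[n_test:], items[:n_test]


def accuracy(predict: Callable[[str], str], test_set: Sequence[Example]) -> float:
    """Fraction of test questions whose predicted category matches the label."""
    correct = sum(1 for text, label in test_set if predict(text) == label)
    return correct / len(test_set)


def accept_model(
    predict: Callable[[str], str], test_set: Sequence[Example], threshold: float = 0.9
) -> bool:
    """Accept the trained model only if its test accuracy meets the threshold."""
    return accuracy(predict, test_set) >= threshold
```

Keeping the acceptance check separate from training makes it easy to reuse the same gate each time the model is retrained.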
Step 1-4: near-real-time model update.
Real-time statistics are kept on changes to the underlying knowledge base database. Let N be the total number of categories, M the number of categories that have changed, KNi the number of changed knowledge points in category i, and Ki the total number of knowledge points in category i before the change, with 1 ≤ i ≤ N. When M/N exceeds threshold A (for example 10%), or there exists an i for which KNi/Ki exceeds threshold B (for example 15%), the method returns to Step 1-1, pulls the knowledge base data, and runs the model training steps again to retrain and update the model.
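The retraining trigger amounts to two ratio checks. A minimal sketch follows, using the example thresholds from the text (A = 10%, B = 15%); the function name, the dictionary-based inputs, and the decision to skip categories with Ki = 0 in the second check are assumptions made for the example.

```python
from typing import Dict


def needs_retraining(
    changed: Dict[str, int],
    totals: Dict[str, int],
    threshold_a: float = 0.10,
    threshold_b: float = 0.15,
) -> bool:
    """Decide whether to retrain.

    changed[i] is KN_i, the number of changed knowledge points in category i;
    totals[i] is K_i, the number of knowledge points in category i before the change.
    """
    n = len(totals)  # N: total number of categories
    m = sum(1 for c in totals if changed.get(c, 0) > 0)  # M: changed categories
    if n and m / n > threshold_a:
        return True
    # Any single category whose change ratio KN_i / K_i exceeds threshold B.
    return any(
        k > 0 and changed.get(c, 0) / k > threshold_b for c, k in totals.items()
    )
```

With 20 categories of 100 knowledge points each, changing 5 points in one category triggers nothing (1/20 = 5% and 5/100 = 5%), while changing 20 points in one category triggers retraining through the second check.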
Step 2: knowledge base maintenance using the knowledge point category prediction model, as shown in Fig. 3.
Step 2-1: obtain the knowledge point question text entered by the user in the knowledge base management front-end interface. For example, the user enters the knowledge point "I want to buy life insurance, what options are there" in the front-end interface.
Step 2-2: the knowledge point question text entered by the user, "I want to buy life insurance, what options are there", denoted Q, is preprocessed. Preprocessing includes character cleaning, text error correction, business term normalization, and word segmentation, and yields "I want to buy personal insurance what options are there". The segmented knowledge point Q is then represented as text word vectors, and the knowledge point category prediction model trained in Step 1 is called on this representation to compute a score for each category (for example, if the knowledge base has 50 categories, 50 scores are obtained). The scores and category labels are formed into (category, score) tuples and sorted by score in descending order to obtain the category list [(personal insurance, 0.6), (car insurance, 0.2), (other insurance, 0.08) ...]. Here the highest score, 0.6, corresponds to the category "personal insurance", and the second highest, 0.2, corresponds to the category "car insurance".
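Building the sorted category list from the model's per-category scores is a one-line sort. The sketch below uses the scores from the example above; the function name `rank_categories` is an assumption for illustration.

```python
from typing import List, Sequence, Tuple


def rank_categories(
    labels: Sequence[str], scores: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each category label with its score and sort by score, highest first."""
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)


category_list = rank_categories(
    ["car insurance", "other insurance", "personal insurance"],
    [0.2, 0.08, 0.6],
)
```

The front end can then display the head of `category_list` and let the user confirm one entry.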
Step 2-3: category output and confirmation. The category list from Step 2-2, [(personal insurance, 0.6), (car insurance, 0.2), (other insurance, 0.08) ...], is sent to the knowledge base management front-end interface so that the user can select one category from it (for example, the user confirms that "personal insurance" is the correct category), and the category selected and confirmed by the user is received from the front-end interface.
Step 2-4: knowledge point answer prediction and category verification. The similarity between the knowledge point question Q entered by the user and all standard questions under the confirmed category (personal insurance) in the knowledge base is computed, and the several standard questions with the highest similarity are sent to the knowledge base management front-end interface as reference standard questions for the input question, so that the user can choose according to the actual situation. The text similarity is computed by the following formula, whose form is recovered here from the worked example in case (1) below (0.7 × 0.5 + 0.3 × 1 with θ = 0.3):

$$\mathrm{Sim}(x, y) = (1 - \theta)\,\mathrm{sim}(V_x, V_y) + \theta\,(t_x \cdot t_y)$$

where x and y are knowledge point texts, $V_x$ and $V_y$ are the feature vectors of the texts, $\mathrm{sim}(V_x, V_y)$ is the similarity of the two feature vectors, $t_x$ and $t_y$ are the one-hot vectors of the corresponding category labels, and θ ∈ (0, 1) is a weight adjustment parameter.
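The worked example in case (1) of this step (0.7 × 0.5 + 0.3 × 1 = 0.65 with θ = 0.3) suggests a weighted sum of a feature-vector similarity and a category-label match. The sketch below follows that reading; it is an assumption, not the patent's stated formula, and the use of cosine similarity for the feature vectors is likewise an assumption, since the text does not name the vector similarity measure.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def corrected_similarity(
    vec_sim: float, t_x: Sequence[int], t_y: Sequence[int], theta: float = 0.3
) -> float:
    """Category-corrected similarity: (1 - theta) * vec_sim + theta * (t_x . t_y).

    t_x and t_y are one-hot category label vectors, so their dot product is
    1 when the two texts share a category and 0 otherwise.
    """
    label_match = sum(a * b for a, b in zip(t_x, t_y))
    return (1 - theta) * vec_sim + theta * label_match


# Values from the worked example: vector similarities 0.5 and 0.7, theta = 0.3.
sim_q_q1 = corrected_similarity(0.5, [1, 0], [1, 0])  # same category
sim_q_q3 = corrected_similarity(0.7, [1, 0], [0, 1])  # different category
```

Under this reading, a larger θ gives more weight to category agreement, so a same-category standard question can outrank one whose text is closer but whose category differs, as happens with Q1 and Q3 here.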
The above algorithm yields several standard questions, say k of them. They may come from the category the user confirmed or specified, or from other categories. The one with the highest similarity serves as the reference standard question for the user's input knowledge point. The purpose of this module is to provide knowledge point answer prediction and category verification, which covers two cases:
(1) If the knowledge point the user adds belongs to an existing standard question, the standard question with the highest similarity is the program's best guess. The user can associate the knowledge point with that standard question, or with the second or third option, and so on. After association, the new knowledge point becomes an additional similar question of the standard question; its answer is the same as that of the standard question, and its category is set to match as well.
For example, the candidate standard questions include Q1: "What life insurance products are there", under the personal insurance category that the user confirmed in Step 2-3, and Q3: "I want to buy car insurance; what options are there", which is labeled car insurance in the table of Step 1-1. Take θ = 0.3, with feature-vector similarities of 0.5 for (Q, Q1) and 0.7 for (Q, Q3) and category-label matches of 1 and 0 respectively. The final category-corrected similarities are Sim(Q, Q1) = 0.7 × 0.5 + 0.3 × 1 = 0.65 and Sim(Q, Q3) = 0.7 × 0.7 + 0.3 × 0 = 0.49. The remaining standard questions under personal insurance are handled in the same way. If Q1 has the highest similarity to Q among the results, Q1 is the best-matching standard question predicted for the knowledge point question. The top-ranked standard questions are then sent to the knowledge base management front-end interface. After the user confirms, the answer to Q1 serves as the answer to the input knowledge point; in the front end the input knowledge point is associated with that standard question, the category stays unchanged, and the result is saved to the knowledge base database.
(2) If the knowledge point the user adds is a new knowledge point standard question, the user enters the corresponding answer directly. Suppose the user enters the new knowledge point question "Is a physical examination required to buy life insurance". After the same processing as above, its category is confirmed as "personal insurance" in Step 2-3. Step 2-4 then computes the similarity between this question and all standard questions under the "personal insurance" category, and the several standard questions with the highest similarity are sent to the knowledge base management front-end interface as reference standard questions, so that the user can choose according to the actual situation. Because the user knows that the question entered is a new one, the user will not find a reference standard question that matches it. The user therefore designates the input question as a new knowledge point standard question and enters the corresponding answer, and the new knowledge point question, its answer, and its category are saved to the knowledge base database.
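The two outcomes above, attaching the input as a similar question or storing it as a new standard question, can be sketched with a small in-memory structure. The class and method names here are hypothetical, and a real system would persist to the knowledge base database rather than a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KnowledgePoint:
    standard: str  # standard question text
    category: str
    answer: str
    similars: List[str] = field(default_factory=list)


@dataclass
class KnowledgeBase:
    points: Dict[str, KnowledgePoint] = field(default_factory=dict)

    def associate(self, question: str, standard: str) -> KnowledgePoint:
        """Case (1): attach the input as a similar question of an existing
        standard question; it inherits that question's answer and category."""
        point = self.points[standard]
        point.similars.append(question)
        return point

    def add_new(self, question: str, category: str, answer: str) -> KnowledgePoint:
        """Case (2): store the input as a new standard question with its answer."""
        point = KnowledgePoint(question, category, answer)
        self.points[question] = point
        return point


kb = KnowledgeBase()
kb.add_new("What life insurance products are there?", "Personal insurance", "answer text")
kb.associate(
    "I want to buy life insurance, what options are there",
    "What life insurance products are there?",
)
```

Storing the answer and category only on the standard question keeps every similar question consistent with it by construction.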
It should be noted that the present invention is suited to knowledge bases with a fairly complete category system. For a question the user enters, the user knows whether it is a new knowledge point question or a similar question supplementing a standard question. For a supplement, the invention helps the user judge the knowledge point category and recall the standard question, that is, find which category in the knowledge base the question belongs to and match it to an existing standard question, which reduces manual workload and the error rate. For a completely new knowledge point, the invention assists with the category judgment.
The above steps of knowledge point entry, category prediction, and standard question association require only a few manual confirmation steps. Customer service staff no longer need to determine knowledge point categories or match knowledge point answers by hand in an increasingly large and complex knowledge base, which reduces the workload of knowledge base maintenance and improves its efficiency and quality. Fig. 4 is a structural diagram of the knowledge base maintenance device based on the knowledge point classification model provided by an embodiment of the present invention.
As shown in Fig. 4, the knowledge base maintenance device based on the knowledge point classification model provided by an embodiment of the present invention includes a knowledge point category prediction model training module 1 and a knowledge base maintenance module 2. The knowledge base maintenance module 2 maintains the knowledge base using the knowledge point classification model generated by the training module 1, and comprises: a knowledge point question text acquisition module 21, a knowledge point question text processing module 22, a knowledge point category prediction module 23, a category output and confirmation module 24, and a knowledge point answer prediction and category verification module 25.
The knowledge point classification prediction model training module 1 is configured to train the knowledge point classification prediction model and to monitor the knowledge base in real time, so that the model is retrained and updated in near real time.
The knowledge point question text acquisition module 21 is configured to obtain the knowledge point question text entered by the user in the knowledge base management front-end interface and to send it to the knowledge point question text processing module 22.
The knowledge point question text processing module 22 is configured to perform string processing on the knowledge point question text entered by the user, including character cleaning, text error correction, business term normalization, and word segmentation; it then converts the segmented text into a text word vector representation and sends that representation to the pre-trained knowledge point category prediction module 23.
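The processing chain described for module 22 can be sketched roughly as follows. This is only an illustration under stated assumptions: the term map and the three-dimensional embedding table are hypothetical toy data standing in for a business glossary and a trained Word2Vec table, whitespace splitting stands in for a real word segmenter, text error correction is omitted, and term normalization is applied after segmentation for simplicity.

```python
import re

# Hypothetical term map standing in for "business term normalization".
TERM_MAP = {"pwd": "password", "acct": "account"}

# Toy 3-dimensional embeddings standing in for a trained Word2Vec table.
EMBEDDINGS = {
    "reset": [1.0, 0.0, 0.0],
    "password": [0.0, 1.0, 0.0],
    "account": [0.0, 0.0, 1.0],
}


def clean(text):
    """Character cleaning: lowercase, replace runs of non-word characters with one space."""
    return re.sub(r"[^\w]+", " ", text.lower()).strip()


def segment(text):
    """Word segmentation stand-in: split on whitespace."""
    return text.split()


def normalize(tokens):
    """Map known business abbreviations to their canonical term."""
    return [TERM_MAP.get(token, token) for token in tokens]


def to_vector(tokens, dim=3):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    known = [EMBEDDINGS[token] for token in tokens if token in EMBEDDINGS]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def preprocess(text):
    """Run the whole chain and return (tokens, averaged word vector)."""
    tokens = normalize(segment(clean(text)))
    return tokens, to_vector(tokens)
```

Averaging word vectors is one simple way to obtain a fixed-length text representation; the source does not say how the word vectors are combined before classification.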
The knowledge point category prediction module 23 is configured to call the knowledge point classification prediction model to compute a score for each category from the text word vector representation produced by the text processing module, form each score and its category label into a pair (category, score), and sort the pairs by score in descending order to obtain a category list.
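The scoring and ranking step of module 23 might look like the sketch below. It assumes the classifier emits one raw score (logit) per category and that a softmax turns these into comparable scores; the softmax is an assumption, since the source only states that a score is computed for each category.

```python
import math


def softmax(logits):
    """Convert raw model outputs into scores that sum to 1."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_categories(labels, logits):
    """Pair each category label with its score and sort by score, highest first."""
    scores = softmax(logits)
    pairs = list(zip(labels, scores))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

The resulting list of (category, score) pairs is what would be handed to the front-end interface for the user to confirm.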
The category output and confirmation module 24 sends the category list to the knowledge base management front-end interface so that the user can choose a category from the list, and receives the user-confirmed category fed back by the front-end interface.
The knowledge point answer prediction and category verification module 25 computes the similarity between every standard question under the confirmed category in the knowledge base and the knowledge point question entered by the user, and sends the several standard questions with the highest similarity to the knowledge base management front-end interface as reference standard questions for the input question. The user then either associates the input question with the most similar of these standard questions, or treats the input question as a new knowledge point question and enters the corresponding answer. The module receives, from the knowledge base management front-end interface, either the user-confirmed association between the input question and a standard question, or the new knowledge point question together with its answer, and stores it in the knowledge base database.
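The recall of reference standard questions in module 25 can be illustrated as below. The sketch assumes plain cosine similarity between text vectors and a hypothetical cutoff `k`; the patent defines its own similarity formula in claim 7, which this example does not reproduce.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recall_standard_questions(query_vec, candidates, k=3):
    """Return the k most similar (standard_question, similarity) pairs.

    candidates: list of (standard_question, vector) pairs taken from the
    category the user confirmed.
    """
    scored = [(question, cosine(query_vec, vec)) for question, vec in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Restricting the candidates to the confirmed category keeps the comparison set small, which is the point of predicting the category first.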
The knowledge point classification prediction model training module 1 includes:
The knowledge base data integration module 11, configured to pair each standard question in the knowledge base, and its corresponding similar questions, with the corresponding category label, and to save the resulting data pairs as a text file or as an iterable generator object.
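The generator form of the data integration step could be sketched as follows. The record layout (keys `standard`, `similar`, `category`) is a hypothetical schema chosen for illustration; the source does not describe how knowledge base entries are stored.

```python
def iter_labeled_questions(knowledge_base):
    """Yield (question_text, category_label) pairs.

    Each standard question and each of its similar questions is paired
    with the category label of the knowledge point it belongs to.
    """
    for entry in knowledge_base:
        yield entry["standard"], entry["category"]
        for similar in entry.get("similar", []):
            yield similar, entry["category"]
```

A generator avoids loading the whole knowledge base into memory, which matters when the pairs are streamed into preprocessing and training.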
The text corpus preprocessing module 12, configured to preprocess all standard questions in the knowledge base and their corresponding similar questions, including character cleaning, text error correction, business term normalization, and word segmentation, to obtain a corpus data set.
The model training and verification module 13, configured to divide the corpus data set into a training set and a test set, represent the knowledge point question texts in the training set as word vectors, train a neural network on them, and test the trained neural network model, so as to build a text classification model whose prediction accuracy meets a threshold. The word vectors use the Word2Vec model, a text vector representation that carries semantic information and expresses the semantic similarity of texts as distance in a low-dimensional vector space. The neural network used for training may be any of various network models such as TextCNN or LSTM, and the specific training strategy is not limited.
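The split-and-verify logic of module 13 can be sketched independently of the network itself. In the example below, `predict` is a hypothetical callable standing in for the trained TextCNN or LSTM classifier, and the test ratio and accuracy threshold are assumed values; the source does not state either.

```python
import random


def split_dataset(pairs, test_ratio=0.2, seed=0):
    """Shuffle (text, label) pairs and split them into (train, test) lists."""
    items = list(pairs)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_ratio))
    return items[n_test:], items[:n_test]


def accuracy(predict, test_set):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(1 for text, label in test_set if predict(text) == label)
    return correct / len(test_set)


def accept_model(predict, test_set, threshold=0.9):
    """Accept the model only if its test accuracy meets the threshold."""
    return accuracy(predict, test_set) >= threshold
```

A model that fails the gate would be retrained or tuned rather than deployed to the category prediction module.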
The near-real-time model update module 14 keeps real-time statistics on changes to the underlying knowledge base database. When the number of changed categories in the knowledge base exceeds a preset threshold, retraining and updating are carried out by the knowledge base data integration module 11, the text corpus preprocessing module 12, and the model training and verification module 13. Specifically: let N be the total number of categories, M the number of categories that have changed, KNi the number of knowledge points changed in category i, and Ki the total number of knowledge points in category i before the change, where 1≤i≤N. When M/N is greater than threshold A, or there exists an i for which KNi/Ki is greater than threshold B, the process goes to step 2a) to pull the knowledge base data and run the model update training steps, retraining and updating the model. By dynamically monitoring the knowledge base in real time, the near-real-time model update module 14 achieves near-real-time retraining of the classification model and continuous iteration of the model, which improves the recall and accuracy of the customer service robot as a whole and the overall customer experience of the intelligent customer service robot.
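The retraining condition (M/N greater than A, or KNi/Ki greater than B for some category i) translates directly into a small check. The function and parameter names below are illustrative, and the threshold values are left to the caller because the source does not give them.

```python
def should_retrain(total_categories, changed_categories,
                   changed_points, original_points,
                   threshold_a, threshold_b):
    """Return True when the knowledge base has changed enough to retrain.

    total_categories   -- N, total number of categories
    changed_categories -- M, number of categories that changed
    changed_points     -- dict: category -> KN_i, knowledge points changed
    original_points    -- dict: category -> K_i, knowledge points before the change
    """
    # Condition 1: share of changed categories exceeds threshold A.
    if total_categories > 0 and changed_categories / total_categories > threshold_a:
        return True
    # Condition 2: some category's share of changed knowledge points exceeds threshold B.
    for category, kn in changed_points.items():
        k = original_points.get(category, 0)
        if k > 0 and kn / k > threshold_b:
            return True
    return False
```

For example, with N = 10, M = 1, A = 0.2 and B = 0.5, a single category in which 6 of 10 knowledge points changed is enough to trigger retraining, because 6/10 exceeds B even though 1/10 does not exceed A.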
In summary, the knowledge base maintenance method and device based on text classification prediction proposed by the present invention train a knowledge point classification prediction model from the knowledge base's classification system, maintain the knowledge base using that model, and use a text similarity algorithm to recall standard questions close to the knowledge point, providing a means of secondary confirmation and answer association. This makes the manual knowledge base maintenance process intelligent, helps customer service staff update and maintain knowledge points accurately, and improves the efficiency and quality of knowledge base maintenance. Moreover, by dynamically monitoring the knowledge base in real time, the classification model is retrained and updated in near real time and iterated continuously, which improves the recall and accuracy of the customer service robot as a whole and the overall customer experience of the intelligent customer service robot.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
1. A knowledge base maintenance method based on text classification prediction, characterized by comprising the following steps:
1) obtaining the knowledge point question text entered by a user in the knowledge base management front-end interface;
2) performing string processing on the knowledge point question text entered by the user in step 1) and converting it into a text word vector representation;
3) calling a pre-trained knowledge point classification prediction model to perform classification prediction on the text word vector representation of step 2) and obtain a score for each category, then forming each score and its category label into a pair (category, score) and sorting the pairs by score in descending order to obtain a category list;
4) sending the category list of step 3) to the knowledge base management front-end interface so that the user chooses a category from the list, and receiving the category chosen and confirmed by the user, as sent by the knowledge base management front-end interface;
5) computing the similarity between every standard question under the category of step 4) in the knowledge base and the knowledge point question entered by the user; sending the several standard questions with the highest similarity to the knowledge base management front-end interface as reference standard questions for the input knowledge point question, so that the user either associates the input knowledge point question with the most similar of these standard questions or treats the input knowledge point question as a new knowledge point question and enters the corresponding answer; and receiving, from the knowledge base management front-end interface, the user-confirmed association between the input knowledge point question and the standard question, or the input knowledge point question as a new knowledge point question together with its answer, and storing it in the knowledge base database.
2. The knowledge base maintenance method based on text classification prediction according to claim 1, characterized in that the string processing of the knowledge point question text entered by the user in step 2) includes character cleaning, text error correction, business term normalization, and word segmentation.
3. The knowledge base maintenance method based on text classification prediction according to claim 1, characterized in that pre-training the knowledge point classification prediction model in step 3) specifically comprises the following steps:
3a) pairing each standard question in the knowledge base, and its corresponding similar questions, with the corresponding category label, and saving the data pairs as a text file or as an iterable generator object;
3b) preprocessing all the standard questions and corresponding similar questions of step 3a) to obtain a corpus data set;
3c) dividing the corpus data set of step 3b) into a training set and a test set, representing the knowledge point question texts in the training set as word vectors, training a neural network on them, and testing the trained neural network model, so as to build a knowledge point classification prediction model whose prediction accuracy meets a threshold;
3d) keeping real-time statistics on changes to the underlying knowledge base database, and, when the number of changed categories or the number of changed knowledge points in a category exceeds a set threshold, going to step 2a) to retrain and update.
4. The knowledge base maintenance method based on text classification prediction according to claim 3, characterized in that the preprocessing in step 3b) includes character cleaning, text error correction, business term normalization, and word segmentation.
5. The knowledge base maintenance method based on text classification prediction according to claim 3, characterized in that the neural network used in step 3c) is TextCNN or LSTM, and the word vector representation uses the Word2Vec model.
6. The knowledge base maintenance method based on text classification prediction according to claim 3, characterized in that step 3d) specifically includes:
keeping real-time statistics on knowledge point changes in the underlying knowledge base database, where N is the total number of categories, M is the number of categories that have changed, KNi is the number of knowledge points changed in category i, and Ki is the total number of knowledge points in category i before the change, 1≤i≤N; when M/N is greater than threshold A, or there exists an i for which KNi/Ki is greater than threshold B, going to step 2a) to pull the knowledge base data and run the model update training steps, retraining and updating the model.
7. The knowledge base maintenance method based on text classification prediction according to claim 1, characterized in that the text similarity in step 5) is calculated by the following formula:
[formula not reproduced in the source text]
where x and y denote knowledge point texts, V_x and V_y denote the feature vectors of the texts, t_x and t_y denote the one-hot vector representations of the category labels of the corresponding texts, and θ ∈ (0,1) is a weight adjustment parameter.
8. A knowledge base maintenance device based on text classification prediction, characterized by comprising a knowledge point classification prediction model training module and a knowledge base maintenance module, the knowledge base maintenance module being configured to maintain the knowledge base based on the knowledge point classification prediction model training module, and comprising: a knowledge point question text acquisition module, a knowledge point question text processing module, a knowledge point category prediction module, a category output and confirmation module, and a knowledge point answer prediction and category verification module;
the knowledge point classification prediction model training module is configured to train the knowledge point classification prediction model and to monitor the knowledge base in real time, so that the knowledge point classification prediction model is retrained and updated in near real time;
the knowledge point question text acquisition module is configured to obtain the knowledge point question text entered by the user in the knowledge base management front-end interface and to send it to the knowledge point question text processing module;
the knowledge point question text processing module is configured to perform string processing on the knowledge point question text entered by the user, convert it into a text word vector representation, and send that representation to the knowledge point category prediction module;
the knowledge point category prediction module is configured to call the knowledge point classification prediction model to compute a score for each category from the text word vector representation produced by the text processing module, form each score and its category label into a pair (category, score), and sort the pairs by score in descending order to obtain a category list;
the category output and confirmation module sends the category list to the knowledge base management front-end interface so that the user chooses a category from the list, and receives the user-confirmed category fed back by the knowledge base management front-end interface;
the knowledge point answer prediction and category verification module computes the similarity between every standard question under the confirmed category in the knowledge base and the knowledge point question entered by the user, and sends the several standard questions with the highest similarity to the knowledge base management front-end interface as reference standard questions for the input knowledge point question, so that the user either associates the input knowledge point question with the most similar of these standard questions or treats the input knowledge point question as a new knowledge point question and enters the corresponding answer; and receives, from the knowledge base management front-end interface, the user-confirmed association between the input knowledge point question and the standard question, or the input knowledge point question as a new knowledge point question together with its answer, and stores it in the knowledge base database.
9. The knowledge base maintenance device based on text classification prediction according to claim 8, characterized in that the knowledge point classification prediction model training module includes:
a knowledge base data integration module, configured to pair each standard question in the knowledge base, and its corresponding similar questions, with the corresponding category label, and to save the data pairs as a text file or as an iterable generator object;
a text corpus preprocessing module, configured to preprocess all standard questions in the knowledge base and their corresponding similar questions to obtain a corpus data set;
a model training and verification module, configured to divide the corpus data set into a training set and a test set, represent the knowledge point question texts in the training set as word vectors, train a neural network on them, and test the trained neural network model, so as to build a knowledge point classification prediction model whose prediction accuracy meets a threshold;
a near-real-time model update module, configured to keep real-time statistics on changes to the underlying knowledge base database and, when the number of changed categories in the knowledge base exceeds a preset threshold, to have the knowledge base data integration module, the text corpus preprocessing module, and the model training and verification module retrain and update the model.
10. The knowledge base maintenance device based on text classification prediction according to claim 8, characterized in that the neural network used by the model training and verification module is TextCNN or LSTM, and the word vector representation uses the Word2Vec model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910830001.1A CN110532400A (en) | 2019-09-04 | 2019-09-04 | Knowledge base maintenance method and device based on text classification prediction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910830001.1A CN110532400A (en) | 2019-09-04 | 2019-09-04 | Knowledge base maintenance method and device based on text classification prediction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110532400A true CN110532400A (en) | 2019-12-03 |
Family
ID=68666534
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910830001.1A Withdrawn CN110532400A (en) | 2019-09-04 | 2019-09-04 | Knowledge base maintenance method and device based on text classification prediction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110532400A (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111143533A (en) * | 2019-12-26 | 2020-05-12 | 苏宁金融科技(南京)有限公司 | Customer service method and system based on user behavior data |
CN111221799A (en) * | 2019-12-16 | 2020-06-02 | 广州科腾信息技术有限公司 | IT knowledge intelligent operation management system |
CN111241258A (en) * | 2020-01-08 | 2020-06-05 | 泰康保险集团股份有限公司 | Data cleaning method and device, computer equipment and readable storage medium |
CN111259115A (en) * | 2020-01-15 | 2020-06-09 | 车智互联(北京)科技有限公司 | Training method and device for content authenticity detection model and computing equipment |
CN111400413A (en) * | 2020-03-10 | 2020-07-10 | 支付宝(杭州)信息技术有限公司 | Method and system for determining category of knowledge points in knowledge base |
CN111553140A (en) * | 2020-05-13 | 2020-08-18 | 金蝶软件(中国)有限公司 | Data processing method, data processing apparatus, and computer storage medium |
CN112035325A (en) * | 2020-09-01 | 2020-12-04 | 中国银行股份有限公司 | Automatic monitoring method and device for text robot |
CN112256850A (en) * | 2019-12-31 | 2021-01-22 | 北京来也网络科技有限公司 | Data processing method, equipment and storage medium combining RPA and AI |
CN112766255A (en) * | 2021-01-19 | 2021-05-07 | 上海微盟企业发展有限公司 | Optical character recognition method, device, equipment and storage medium |
CN113064887A (en) * | 2021-03-22 | 2021-07-02 | 平安银行股份有限公司 | Data management method, device, equipment and storage medium |
CN113127769A (en) * | 2021-04-07 | 2021-07-16 | 华东师范大学 | Exercise label prediction system based on label tree and artificial intelligence |
CN113723975A (en) * | 2021-09-13 | 2021-11-30 | 国泰君安证券股份有限公司 | System, method, device, processor and computer readable storage medium for realizing intelligent quality inspection processing in intelligent return visit service |
CN117236934A (en) * | 2023-11-01 | 2023-12-15 | 山东经纬信息集团有限公司 | Industrial Internet remote monitoring operation and maintenance management system and working method thereof |
CN117520475A (en) * | 2023-12-29 | 2024-02-06 | 四川互慧软件有限公司 | Construction method of nursing knowledge base |
-
2019
- 2019-09-04 CN CN201910830001.1A patent/CN110532400A/en not_active Withdrawn
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111221799A (en) * | 2019-12-16 | 2020-06-02 | 广州科腾信息技术有限公司 | IT knowledge intelligent operation management system |
CN111143533A (en) * | 2019-12-26 | 2020-05-12 | 苏宁金融科技(南京)有限公司 | Customer service method and system based on user behavior data |
CN111143533B (en) * | 2019-12-26 | 2023-06-30 | 苏宁金融科技(南京)有限公司 | Customer service method and system based on user behavior data |
CN112256850A (en) * | 2019-12-31 | 2021-01-22 | 北京来也网络科技有限公司 | Data processing method, equipment and storage medium combining RPA and AI |
CN111241258A (en) * | 2020-01-08 | 2020-06-05 | 泰康保险集团股份有限公司 | Data cleaning method and device, computer equipment and readable storage medium |
CN111259115B (en) * | 2020-01-15 | 2023-06-02 | 车智互联(北京)科技有限公司 | Training method and device for content authenticity detection model and computing equipment |
CN111259115A (en) * | 2020-01-15 | 2020-06-09 | 车智互联(北京)科技有限公司 | Training method and device for content authenticity detection model and computing equipment |
CN111400413A (en) * | 2020-03-10 | 2020-07-10 | 支付宝(杭州)信息技术有限公司 | Method and system for determining category of knowledge points in knowledge base |
CN111400413B (en) * | 2020-03-10 | 2023-06-30 | 支付宝(杭州)信息技术有限公司 | Method and system for determining category of knowledge points in knowledge base |
CN111553140A (en) * | 2020-05-13 | 2020-08-18 | 金蝶软件(中国)有限公司 | Data processing method, data processing apparatus, and computer storage medium |
CN111553140B (en) * | 2020-05-13 | 2024-03-19 | 金蝶软件(中国)有限公司 | Data processing method, data processing apparatus, and computer storage medium |
CN112035325B (en) * | 2020-09-01 | 2023-08-18 | 中国银行股份有限公司 | Text robot automatic monitoring method and device |
CN112035325A (en) * | 2020-09-01 | 2020-12-04 | 中国银行股份有限公司 | Automatic monitoring method and device for text robot |
CN112766255A (en) * | 2021-01-19 | 2021-05-07 | 上海微盟企业发展有限公司 | Optical character recognition method, device, equipment and storage medium |
CN113064887A (en) * | 2021-03-22 | 2021-07-02 | 平安银行股份有限公司 | Data management method, device, equipment and storage medium |
CN113064887B (en) * | 2021-03-22 | 2023-12-08 | 平安银行股份有限公司 | Data management method, device, equipment and storage medium |
CN113127769A (en) * | 2021-04-07 | 2021-07-16 | 华东师范大学 | Exercise label prediction system based on label tree and artificial intelligence |
CN113127769B (en) * | 2021-04-07 | 2022-07-29 | 华东师范大学 | Exercise label prediction system based on label tree and artificial intelligence |
CN113723975A (en) * | 2021-09-13 | 2021-11-30 | 国泰君安证券股份有限公司 | System, method, device, processor and computer readable storage medium for realizing intelligent quality inspection processing in intelligent return visit service |
CN117236934A (en) * | 2023-11-01 | 2023-12-15 | 山东经纬信息集团有限公司 | Industrial Internet remote monitoring operation and maintenance management system and working method thereof |
CN117236934B (en) * | 2023-11-01 | 2024-05-07 | 山东经纬信息集团有限公司 | Industrial Internet remote monitoring operation and maintenance management system |
CN117520475A (en) * | 2023-12-29 | 2024-02-06 | 四川互慧软件有限公司 | Construction method of nursing knowledge base |
CN117520475B (en) * | 2023-12-29 | 2024-03-19 | 四川互慧软件有限公司 | Construction method of nursing knowledge base |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110532400A (en) | Knowledge base maintenance method and device based on text classification prediction | |
CN109493166B (en) | Construction method for task type dialogue system aiming at e-commerce shopping guide scene | |
CN107818138B (en) | Case law regulation recommendation method and system | |
CN110175227B (en) | Dialogue auxiliary system based on team learning and hierarchical reasoning | |
CN109783639B (en) | Mediated case intelligent dispatching method and system based on feature extraction | |
CN109241255A (en) | A kind of intension recognizing method based on deep learning | |
CN110826320A (en) | Sensitive data discovery method and system based on text recognition | |
CN113051365A (en) | Industrial chain map construction method and related equipment | |
CN110516057B (en) | Petition question answering method and device | |
CN111309887B (en) | Method and system for training text key content extraction model | |
CN109325780A (en) | A kind of exchange method of the intelligent customer service system in E-Governance Oriented field | |
CN112417132B (en) | New meaning identification method for screening negative samples by using guest information | |
CN111159336A (en) | Semi-supervised judicial entity and event combined extraction method | |
CN116010581A (en) | Knowledge graph question-answering method and system based on power grid hidden trouble shooting scene | |
CN112016313A (en) | Spoken language element identification method and device and alarm situation analysis system | |
CN111462752A (en) | Client intention identification method based on attention mechanism, feature embedding and BI-L STM | |
CN115906842A (en) | Policy information identification method | |
CN107818173A (en) | A kind of false comment filter method of Chinese based on vector space model | |
CN112950414B (en) | Legal text representation method based on decoupling legal elements | |
CN112200674B (en) | Stock market emotion index intelligent calculation information system | |
CN111507849A (en) | Authority guaranteeing method and related device and equipment | |
CN109635289A (en) | Entry classification method and audit information abstracting method | |
CN112749530A (en) | Text encoding method, device, equipment and computer readable storage medium | |
CN117875921B (en) | Human resource management method and system based on artificial intelligence | |
CN110852070A (en) | Document vector generation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20191203 |