CN110532400A - Knowledge base maintenance method and device based on text classification prediction - Google Patents

Knowledge base maintenance method and device based on text classification prediction

Info

Publication number
CN110532400A
Authority
CN
China
Prior art keywords
knowledge
knowledge point
text
knowledge base
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910830001.1A
Other languages
Chinese (zh)
Inventor
李加庆
沈春泽
王景斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Suning Bank Co Ltd
Original Assignee
Jiangsu Suning Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Suning Bank Co Ltd
Priority to CN201910830001.1A
Publication of CN110532400A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3347 Query execution using vector based model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge base maintenance method and device based on text classification prediction. The method first trains a knowledge point classification prediction model on the classified corpus of the knowledge base, then maintains the knowledge base with the help of that model, using a text similarity algorithm to recall the standard questions closest to a knowledge point and providing secondary confirmation and answer association. This makes the manual maintenance process intelligent, helps customer service staff update knowledge points accurately, and improves the efficiency and quality of knowledge base maintenance. In addition, by monitoring the knowledge base dynamically in real time, the classification model is retrained and updated in near real time and iterated continuously, which improves the recall and accuracy of the customer service robot as a whole and the overall user experience of the intelligent customer service robot.

Description

Knowledge base maintenance method and device based on text classification prediction
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a knowledge base maintenance method and device based on text classification prediction.
Background art
Intelligent customer service robots are now widely used across industries: given a user's query, the robot looks up related questions and answers in a knowledge base. The quality of the knowledge base largely determines how well the robot performs, and therefore the user experience of the customer service.
The knowledge base maintenance system is an important part of an intelligent customer service system. The usual practice is for customer service staff to maintain the data in the knowledge base system. To improve the effectiveness and accuracy of knowledge base retrieval, staff must classify the knowledge base data accurately and add as many similar-question variants of each standard question as possible, so as to improve the system's ability to recognize the intent behind user queries. A knowledge base for a vertical domain usually contains categories for several business scenarios, and each category contains standard questions for several business problems. So that user questions are understood more accurately, a standard question generally has several similar questions that cover the same standard question in different wordings. This hierarchy of "category → knowledge point standard question → similar questions → knowledge point answer" forms the logical structure of the knowledge base. Customer service staff maintain the data according to this logical structure, together with updates to business scenarios and knowledge points.
Maintaining a knowledge base generally involves adding new knowledge point standard questions, updating existing knowledge points, adding similar questions to a knowledge point, and similar operations. In particular, when adding a new standard question or supplementing a standard question with a similar question, business staff must assign it a category. Closely related knowledge points should belong to the same category, which is necessary for questions to be recalled, so category assignment must be kept consistent.
In practice, however, business staff are sometimes unsure which category a new knowledge point or new similar question belongs to. Searching the existing knowledge points for identical or similar ones in order to judge the category of a new knowledge point by hand is inconvenient. In particular, when different staff maintain the same knowledge base, categories easily become confused and disordered, which impairs accurate recall of knowledge point answers and degrades the experience of the customer service robot.
In view of this, a knowledge base maintenance method that solves the above problems has been developed.
Summary of the invention
The purpose of the present invention is to solve the above problems by providing a knowledge base maintenance method and device based on text classification prediction.
To achieve this purpose, in a first aspect the present invention provides a knowledge base maintenance method based on text classification prediction, comprising the following steps:
1) obtaining the knowledge point question text entered by the user in the knowledge base management front-end interface;
2) performing string processing on the knowledge point question text entered in step 1) and representing the text as word vectors;
3) calling a pre-trained knowledge point classification prediction model to run classification prediction on the word vector representation from step 2) and compute a score for each category, then forming (category, score) pairs from the scores and category labels and sorting them by score in descending order to obtain a category list;
4) sending the category list from step 3) to the knowledge base management front-end interface so that the user can select one category from it, and receiving from the front-end interface the category the user selected and confirmed;
5) computing the similarity between the knowledge point question entered by the user and every standard question in the knowledge base under the category from step 4), and sending the standard questions with the highest similarity to the front-end interface as reference standard questions for the entered question, so that the user can either associate the entered question with the most similar standard question or treat it as a new knowledge point question and enter the corresponding answer; and receiving from the front-end interface either the user-confirmed association between the entered question and a standard question, or the entered question as a new knowledge point question together with its answer, and storing it in the knowledge base database.
Further, the string processing of the knowledge point question text in step 2) comprises character cleaning, text error correction, business term normalization, and word segmentation.
Further, the knowledge point classification prediction model in step 3) is trained in advance by the following steps:
3a) pairing each standard question in the knowledge base, and its corresponding similar questions, with the corresponding category label, and saving the pairs as a text file or as an iterable generator object;
3b) preprocessing all the standard questions and corresponding similar questions from step 3a) to obtain a corpus data set;
3c) dividing the corpus data set from step 3b) into a training set and a test set, representing the knowledge point question texts in the training set as word vectors, training a neural network, and testing the trained model, so as to build a knowledge point classification prediction model whose prediction accuracy meets a threshold;
3d) collecting real-time statistics on changes to the underlying knowledge base database, and returning to step 3a) to retrain and update the model when the number of changed categories, or the number of changed knowledge points in some category, exceeds a set threshold.
Further, the preprocessing in step 3b) comprises character cleaning, text error correction, business term normalization, and word segmentation.
Further, the neural network used in step 3c) is TextCNN or LSTM, and the word vector representation uses a Word2Vec model.
Further, step 3d) specifically comprises:
Real-time statistics are collected on knowledge point changes in the underlying knowledge base database. The total number of categories is denoted N, the number of changed categories is denoted M, the number of changed knowledge points in category i is denoted KNi, and the number of knowledge points in category i before the change is denoted Ki, where 1 ≤ i ≤ N. When M/N exceeds threshold A, or there exists an i for which KNi/Ki exceeds threshold B, the method returns to step 3a), pulls the knowledge base data, and runs the model update training steps to retrain and update the model.
Further, the text similarity in step 5) is computed by the following formula:
where x and y denote knowledge point texts, Vx and Vy denote the feature vectors of the texts, tx and ty denote the one-hot vectors of the category labels of the corresponding texts, and θ ∈ (0, 1) is a weight adjustment parameter.
In a second aspect, the present invention further provides a knowledge base maintenance device based on text classification prediction. The device comprises a knowledge point classification prediction model training module and a knowledge base maintenance module. The knowledge base maintenance module maintains the knowledge base on the basis of the knowledge point classification prediction model training module, and comprises: a knowledge point question text acquisition module, a knowledge point question text processing module, a knowledge point classification prediction module, a category output and confirmation module, and a knowledge point answer prediction and category verification module;
The knowledge point classification prediction model training module is configured to train the knowledge point classification prediction model and to monitor the knowledge base in real time, so that the model is retrained and updated in near real time;
The knowledge point question text acquisition module is configured to obtain the knowledge point question text entered by the user in the knowledge base management front-end interface and send it to the text preprocessing and classification prediction modules;
The knowledge point question text processing module is configured to perform string processing on the knowledge point question text entered by the user, represent the text as word vectors, and send the word vector representation to the pre-trained knowledge point classification model training module;
The knowledge point classification prediction module is configured to call the knowledge point classification prediction model to compute a score for each category from the word vector representation produced by the text preprocessing module, form (category, score) pairs from the scores and category labels, and sort them by score in descending order to obtain a category list;
The category output and confirmation module is configured to send the category list to the knowledge base management front-end interface so that the user can select one category from it, and to receive the user-confirmed category returned by the front-end interface;
The knowledge point answer prediction and category verification module is configured to compute the similarity between the knowledge point question entered by the user and every standard question in the knowledge base under that category, and to send the standard questions with the highest similarity to the front-end interface as reference standard questions for the entered question, so that the user can either associate the entered question with the most similar standard question or treat it as a new knowledge point question and enter the corresponding answer; and to receive from the front-end interface either the user-confirmed association between the entered question and a standard question, or the entered question as a new knowledge point question together with its answer, and store it in the knowledge base database.
Further, the knowledge point classification prediction model training module comprises:
A knowledge base data integration module, configured to pair each standard question in the knowledge base, and its corresponding similar questions, with the corresponding category label, and to save the pairs as a text file or as an iterable generator object;
A text corpus preprocessing module, configured to preprocess all the standard questions and corresponding similar questions in the knowledge base to obtain a corpus data set;
A model training and verification module, configured to divide the corpus data set into a training set and a test set, represent the knowledge point question texts in the training set as word vectors, train a neural network, and test the trained model, so as to build a knowledge point classification prediction model whose prediction accuracy meets a threshold;
A near-real-time model update module, configured to collect real-time statistics on changes to the underlying knowledge base database; when the number of changed categories in the knowledge base exceeds a preset threshold, retraining and updating are carried out by the knowledge base data integration module, the text corpus preprocessing module, and the model training and verification module.
Further, the neural network used by the model training and verification module is TextCNN or LSTM, and the word vector representation uses a Word2Vec model.
The knowledge base maintenance method based on text classification prediction proposed by the present invention trains a knowledge point classification prediction model on the classified corpus of the knowledge base and maintains the knowledge base on the basis of that model. A text similarity algorithm recalls the standard questions closest to a knowledge point, providing a means of secondary confirmation and answer association. This makes the manual maintenance process intelligent, helps customer service staff update knowledge points accurately, and improves the efficiency and quality of knowledge base maintenance. In addition, by monitoring the knowledge base dynamically in real time, the classification model is retrained and updated in near real time and iterated continuously, which improves the recall and accuracy of the customer service robot as a whole and the overall user experience of the intelligent customer service robot.
Brief description of the drawings
Fig. 1 is a flow chart of the online training of the knowledge point classification prediction model provided by an embodiment of the present invention;
Fig. 2 is a diagram of the knowledge point structure of the knowledge base provided by an embodiment of the present invention;
Fig. 3 is a flow chart of the knowledge base maintenance method based on the knowledge point classification prediction model provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of the knowledge base maintenance device based on the knowledge point classification model provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. Note that the drawings are only illustrative and are not drawn strictly to scale; parts may be enlarged or reduced for ease of description, and some conventional structures may be omitted.
The knowledge base maintenance method based on text classification prediction proposed by the present invention comprises step 1, training the knowledge point classification prediction model, and step 2, maintaining the knowledge base on the basis of that model.
Taking an open-source insurance customer service corpus as an example, an insurance-domain knowledge base is built from part of the corpus. It covers insurance business categories such as personal insurance, health insurance, car insurance, medical insurance, retirement insurance, long-term care insurance, and annuities.
Step 1: knowledge point classification prediction model training, as shown in Fig. 1.
Step 1-1: corpus integration.
Fig. 2 shows one logical association form of the knowledge point structure in the knowledge base: one standard question can correspond to several similar questions. Each standard question in the insurance-domain knowledge base, and its corresponding similar questions, is paired with the corresponding category label as a (knowledge point question, label) pair and saved as a text file in storage or as an iterable generator object. The data items have the following format:
Serial number | Knowledge point question | Category label
1 | Q1: What life insurance products are available? (standard question) | Personal insurance
1-1 | Which kind of life insurance should I choose? (similar question) | Personal insurance
1-2 | Recommend products for a life insurance agent (similar question) | Personal insurance
…… | …… | ……
X | Q2: What is the relationship between a driving record and car insurance? (standard question) | Car insurance
X-1 | Does a personal driving record affect car insurance? (similar question) | Car insurance
Y | Q3: I want to buy car insurance; what options are there? (standard question) | Car insurance
…… | …… | ……
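The pairing described in step 1-1 can be sketched as a Python generator. This is a minimal illustration, not the patent's implementation: the nested-dictionary layout, the name `iter_labeled_pairs`, and the English sample questions are all assumptions made for the example.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical in-memory layout (an assumption for this sketch):
# category -> standard question -> list of similar questions.
KnowledgeBase = Dict[str, Dict[str, List[str]]]


def iter_labeled_pairs(kb: KnowledgeBase) -> Iterator[Tuple[str, str]]:
    """Yield (knowledge point question, category label) pairs.

    Each standard question is emitted first, followed by its similar
    questions, all carrying the label of the enclosing category.
    """
    for category, standards in kb.items():
        for standard, similars in standards.items():
            yield standard, category
            for similar in similars:
                yield similar, category


sample_kb: KnowledgeBase = {
    "Personal insurance": {
        "What life insurance products are available?": [
            "Which kind of life insurance should I choose?",
            "Recommend products for a life insurance agent",
        ],
    },
    "Car insurance": {
        "What is the relationship between a driving record and car insurance?": [
            "Does a personal driving record affect car insurance?",
        ],
    },
}

pairs = list(iter_labeled_pairs(sample_kb))
```

Because the function is a generator, the same pairs can be streamed into a text file or consumed directly by a training loop, which matches the two storage forms the step mentions.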
Step 1-2: text corpus preprocessing.
All the standard questions and corresponding similar questions from step 1-1 are preprocessed to obtain the corpus data set. Specifically: the knowledge point question texts in the knowledge base corpus undergo character cleaning, in which they are converted to UTF-8 encoding and regular expressions remove tab characters, garbled characters, punctuation marks, special characters, and the like; text error correction and term normalization are applied to the business vocabulary in the knowledge points using N-grams or homophone substitution; and the open-source Python word segmentation tool Jieba, extended with an insurance business dictionary, segments each knowledge point into Chinese words, yielding the segmented corpus data set.
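A rough sketch of this preprocessing chain, using only the Python standard library, might look as follows. Several parts are assumptions: the `TERM_MAP` entries are invented, a plain dictionary lookup stands in for the N-gram or homophone-based error correction, and whitespace splitting stands in for Chinese word segmentation. In the setting the text describes, the `segment` argument would presumably be a tokenizer such as `jieba.lcut`, with the insurance dictionary loaded through `jieba.load_userdict`.

```python
import re
import unicodedata
from typing import Callable, Dict, List

# Hypothetical term map (an assumption): variant wording -> canonical business term.
TERM_MAP: Dict[str, str] = {
    "life insurance": "personal insurance",
    "vehicle insurance": "car insurance",
}

_NOISE = re.compile(r"[^\w\s]")  # punctuation and special symbols
_SPACES = re.compile(r"\s+")     # tabs, newlines, repeated spaces


def clean(text: str) -> str:
    """Character cleaning: normalize width, drop noise, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)  # fold full-width forms
    text = text.replace("\ufffd", " ")          # replacement chars left by bad decoding
    text = _NOISE.sub(" ", text)
    return _SPACES.sub(" ", text).strip().lower()


def normalize_terms(text: str, term_map: Dict[str, str] = TERM_MAP) -> str:
    """Business term normalization by simple substring replacement."""
    for variant, canonical in term_map.items():
        text = text.replace(variant, canonical)
    return text


def preprocess(text: str, segment: Callable[[str], List[str]] = str.split) -> List[str]:
    """Clean, normalize, then segment into tokens."""
    return segment(normalize_terms(clean(text)))


tokens = preprocess("I want to buy life insurance,\twhat options are there?")
```

Keeping the tokenizer as a parameter lets the cleaning and normalization steps be tested without any third-party dependency.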
Step 1-3: model training and verification.
The labeled data processed in step 1-2 is divided into a training set and a test set. The knowledge point question texts in the training set are represented as word vectors and used to train a neural network; the trained model is then tested, so as to build a text classification prediction model whose prediction accuracy meets a threshold. The word vectors use a Word2Vec model, a text vector representation that carries semantic information and expresses the semantic similarity of texts as distance in a low-dimensional vector space. The neural network used for training may be any of several network models, such as TextCNN or LSTM, and the specific training strategy is not limited.
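The split, test, and accept-on-threshold logic of this step can be sketched independently of the network itself. In the sketch below the trainer is pluggable, and `train_majority` is only a stand-in that predicts the most frequent label; it is not TextCNN, LSTM, or Word2Vec. The 20% test ratio and the 0.9 accuracy threshold are assumptions, since the text does not give values.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[str, str]            # (question text, category label)
Predictor = Callable[[str], str]     # maps question text to a predicted label


def split_corpus(data: Sequence[Example], test_ratio: float = 0.2,
                 seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Shuffle deterministically and split into (training set, test set)."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_ratio))
    return items[n_test:], items[:n_test]


def accuracy(predict: Predictor, test_set: Sequence[Example]) -> float:
    """Fraction of test examples whose predicted label matches the true label."""
    hits = sum(1 for text, label in test_set if predict(text) == label)
    return hits / len(test_set)


def train_majority(train_set: Sequence[Example]) -> Predictor:
    """Stand-in trainer (an assumption): always predict the most frequent label."""
    top = Counter(label for _, label in train_set).most_common(1)[0][0]
    return lambda text: top


def build_model(data: Sequence[Example],
                train: Callable[[Sequence[Example]], Predictor],
                threshold: float = 0.9) -> Tuple[Optional[Predictor], float]:
    """Train, test, and return the model only if accuracy meets the threshold."""
    train_set, test_set = split_corpus(data)
    model = train(train_set)
    acc = accuracy(model, test_set)
    return (model if acc >= threshold else None), acc
```

A real trainer would replace `train_majority` with a function that fits a neural classifier on word-vector inputs and returns its prediction function; the acceptance check stays the same.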
Step 1-4: near-real-time model update.
Real-time statistics are collected on changes to the underlying knowledge base database. The total number of categories is denoted N, the number of changed categories is denoted M, the number of changed knowledge points in category i is denoted KNi, and the number of knowledge points in category i before the change is denoted Ki, where 1 ≤ i ≤ N. When M/N exceeds threshold A (for example 10%), or there exists an i for which KNi/Ki exceeds threshold B (for example 15%), the method returns to step 1-1, pulls the knowledge base data, and runs the model update training steps to retrain and update the model.
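The retraining trigger can be written as a small predicate. The sketch below uses the example thresholds from the text (10% and 15%) as defaults; the function name, the dictionary inputs, and the rule that a category counts as changed when it has at least one changed knowledge point are assumptions.

```python
from typing import Dict


def should_retrain(changed: Dict[str, int], totals: Dict[str, int],
                   threshold_a: float = 0.10, threshold_b: float = 0.15) -> bool:
    """Decide whether to retrain the classification model.

    changed[c] is KN_c, the number of changed knowledge points in category c.
    totals[c] is K_c, the number of knowledge points in c before the change.
    """
    n = len(totals)  # N: total number of categories
    if n == 0:
        return False
    # M: categories with at least one changed knowledge point (assumed definition).
    m = sum(1 for c in totals if changed.get(c, 0) > 0)
    if m / n > threshold_a:
        return True
    # Otherwise retrain if any single category changed by more than threshold B.
    return any(changed.get(c, 0) / k > threshold_b
               for c, k in totals.items() if k > 0)


totals = {"c%d" % i: 100 for i in range(20)}
small_change = should_retrain({"c0": 5}, totals)
large_change = should_retrain({"c0": 20}, totals)
wide_change = should_retrain({"c0": 1, "c1": 1, "c2": 1}, totals)
```

With 20 categories of 100 knowledge points each, a 5-point change in one category trips neither condition, a 20-point change trips threshold B, and single changes in three categories trip threshold A.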
Step 2: knowledge base maintenance based on the knowledge point classification prediction model, as shown in Fig. 3.
Step 2-1: obtain the knowledge point question text entered by the user in the knowledge base management front-end interface. For example, the user enters the knowledge point: "I want to buy life insurance; what options are there?"
Step 2-2: preprocess the knowledge point question text entered by the user, "I want to buy life insurance; what options are there?", denoted Q. Preprocessing comprises character cleaning, text error correction, business term normalization, and word segmentation, and yields: "I want to buy personal insurance what options are there". The segmented knowledge point Q is then represented as word vectors, and the knowledge point classification prediction model trained in step 1 is called to run classification prediction on the word vector representation and compute a score for each category (for example, if the knowledge base has 50 categories, 50 scores are obtained). The scores and category labels are formed into (category, score) pairs, which are sorted by score in descending order to obtain the category list [(personal insurance, 0.6), (car insurance, 0.2), (other insurance, 0.08), ...]. The highest score, 0.6, corresponds to the category "personal insurance", and the second highest, 0.2, to "car insurance".
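Building the sorted category list from the model's scores is a one-line sort. The sketch below uses the scores from the example above; the function name `rank_categories` and the optional `top_k` cut-off are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def rank_categories(scores: Dict[str, float],
                    top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Return (category, score) pairs sorted by score, highest first."""
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


scores = {"car insurance": 0.2, "personal insurance": 0.6, "other insurance": 0.08}
category_list = rank_categories(scores)
```

The first element of `category_list` is the candidate offered to the user as the most likely category.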
Step 2-3: category output and confirmation. The category list from step 2-2, [(personal insurance, 0.6), (car insurance, 0.2), (other insurance, 0.08), ...], is sent to the knowledge base management front-end interface so that the user can select one category from it (for example, the user confirms that "personal insurance" is the correct category), and the category the user selected and confirmed is received from the front-end interface.
Step 2-4: knowledge point answer prediction and category verification. The similarity between the question Q entered by the user and every standard question in the knowledge base under this category (personal insurance) is computed, and the standard questions with the highest similarity are sent to the front-end interface as reference standard questions for the entered question, so that the user can choose according to the actual situation. The text similarity is computed by the following formula:
where x and y denote knowledge point texts, Vx and Vy denote the feature vectors of the texts, tx and ty denote the one-hot vectors of the category labels of the corresponding texts, and θ ∈ (0, 1) is a weight adjustment parameter.
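The formula itself does not appear in this text. One form that agrees with the variable definitions above and with the worked example later in this section (θ = 0.3, Sim(Q, Q1) = 0.7 × 0.5 + 0.3 × 1 = 0.65) is given below. The weighting by (1 − θ) and θ follows from that example; the use of cosine similarity for the feature-vector term is an assumption, since the text does not name the vector similarity measure.

```latex
\mathrm{Sim}(x, y) \;=\; (1 - \theta)\,
  \frac{V_x \cdot V_y}{\lVert V_x \rVert \, \lVert V_y \rVert}
  \;+\; \theta \,\bigl(t_x \cdot t_y\bigr),
\qquad \theta \in (0, 1)
```

Since tx and ty are one-hot vectors, their dot product is 1 when the two texts share a category label and 0 otherwise, so the second term acts as a category-agreement bonus.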
The algorithm above returns several standard questions, say k of them, which may come from the category the user confirmed or specified, or from other categories. The standard question with the highest similarity serves as the reference standard question for the entered knowledge point. The purpose of this module is to provide knowledge point answer prediction and category verification, and it covers two cases:
(1) If the knowledge point added by the user belongs to an existing standard question, the standard question with the highest similarity is the program's best guess. The user can associate the knowledge point with that standard question, or with the second or third option, and so on. After association, the new knowledge point becomes a supplementary similar question of the standard question; its answer is the same as the standard question's, and its category is set to match.
For example, the candidate standard questions include Q1, "What life insurance products are available?", under the personal insurance category the user confirmed in step 2-3, and Q3, "I want to buy car insurance; what options are there?", which is labeled car insurance in the corpus table above. Taking θ = 0.3, the final category-corrected similarities are Sim(Q, Q1) = 0.7 × 0.5 + 0.3 × 1 = 0.65 and Sim(Q, Q3) = 0.7 × 0.7 + 0.3 × 0 = 0.49. The remaining standard questions under personal insurance are scored in the same way. If Q1 has the highest similarity to Q, Q1 is the best-matching standard question predicted for the knowledge point question. The top-ranked standard questions are sent to the knowledge base management front-end interface. Once the user confirms, the answer to Q1 serves as the answer to the entered knowledge point; in the front end the entered knowledge point is associated with that standard question, its category stays unchanged, and it is saved to the knowledge base database.
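The arithmetic of this example can be reproduced with a short sketch. The weighting (1 − θ) × base similarity + θ × label agreement is read off the numbers in the example; the `cosine` helper is an assumed choice of base measure, and the two-element one-hot vectors are invented for illustration.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (assumed base measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_agreement(tx: Sequence[int], ty: Sequence[int]) -> float:
    """Dot product of one-hot label vectors: 1.0 if same category, else 0.0."""
    return float(sum(a * b for a, b in zip(tx, ty)))


def corrected_similarity(base_sim: float, tx: Sequence[int], ty: Sequence[int],
                         theta: float = 0.3) -> float:
    """Category-corrected similarity: (1 - theta) * base + theta * agreement."""
    return (1 - theta) * base_sim + theta * label_agreement(tx, ty)


# Hypothetical one-hot labels over two categories.
personal, car = [1, 0], [0, 1]

sim_q_q1 = corrected_similarity(0.5, personal, personal)  # 0.7 * 0.5 + 0.3 * 1
sim_q_q3 = corrected_similarity(0.7, personal, car)       # 0.7 * 0.7 + 0.3 * 0
```

Although Q3 has the higher base similarity (0.7 against 0.5), the category term lifts Q1 above it, which is the correction effect the example is meant to show.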
(2) If the knowledge point added by the user is a new standard question, the user enters the corresponding answer directly. Suppose the user enters the new knowledge point question "Does buying life insurance require a medical examination?". After the same processing as above, the category of the question is confirmed as "personal insurance" in step 2-3, and step 2-4 computes the similarity between the entered question and every standard question under "personal insurance". The standard questions with the highest similarity are sent to the front-end interface as reference standard questions so that the user can choose according to the actual situation. Because the user knows that the entered question is a new one, none of the reference standard questions will match it. The user then designates the entered question as a new knowledge point standard question and enters the corresponding answer, and the new question, its answer, and its category are saved to the knowledge base database.
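The two outcomes, associating the entered question with an existing standard question or storing it as a new one, can be sketched with a small in-memory store. The class and method names are hypothetical, a dictionary stands in for the knowledge base database, and the sample answers are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KnowledgePoint:
    """One standard question with its answer, category, and similar questions."""
    standard_question: str
    answer: str
    category: str
    similar_questions: List[str] = field(default_factory=list)


class KnowledgeBase:
    """Hypothetical in-memory stand-in for the knowledge base database."""

    def __init__(self) -> None:
        self.points: Dict[str, KnowledgePoint] = {}

    def add_standard(self, question: str, answer: str, category: str) -> KnowledgePoint:
        """Case (2): store the entered question as a new standard question."""
        if question in self.points:
            raise ValueError("standard question already exists")
        point = KnowledgePoint(question, answer, category)
        self.points[question] = point
        return point

    def associate(self, question: str, standard_question: str) -> KnowledgePoint:
        """Case (1): attach the entered question as a similar question.

        The entered question inherits the answer and category of the
        standard question it is associated with.
        """
        point = self.points[standard_question]
        if question not in point.similar_questions:
            point.similar_questions.append(question)
        return point


kb = KnowledgeBase()
kb.add_standard("What life insurance products are available?",
                "placeholder answer 1", "personal insurance")
linked = kb.associate("I want to buy life insurance; what options are there?",
                      "What life insurance products are available?")
kb.add_standard("Does buying life insurance require a medical examination?",
                "placeholder answer 2", "personal insurance")
```

Keeping similar questions inside the standard question's record means the answer and category are stored once, so they cannot drift apart after association.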
It should be noted that the present invention is adapted to the more complete knowledge base of classification system.It the problem of for user's input, uses What family knew that newly-increased knowledge point problem or supplement standard are asked similar asks information.For supplement, through the invention may be used Auxiliary user recalls for what the judgement of knowledge point classification and standard were asked.That is: search which class is the problem belong in knowledge base Not, and the existing standard of matching is asked, can reduce artificial workload and error rate in this way.And the knowledge completely new for one Point, the present invention can play secondary category judgement.
The step of above knowledge point typing, class prediction, standard asks association, only needs a small amount of manual confirmation step in the process, Customer service the personnel type of manual confirmation knowledge point or artificial matching knowledge point in increasingly numerous and jumbled knowledge base is avoided to answer Case alleviates the workload of customer service knowledge base maintenance, improves the efficiency and quality of knowledge base maintenance.Fig. 4 is that the present invention is real The knowledge base maintenance structure drawing of device of the knowledge based point disaggregated model of example offer is provided.
As shown in Fig. 4, the knowledge base maintenance device based on the knowledge point classification model provided by an embodiment of the present invention includes a knowledge point classification prediction model training module 1 and a knowledge base maintenance module 2. The knowledge base maintenance module 2 maintains the knowledge base using the knowledge point classification model generated by the training module 1, and includes: a knowledge point question text acquisition module 21, a knowledge point question text processing module 22, a knowledge point category prediction module 23, a category output and confirmation module 24, and a knowledge point answer prediction and category verification module 25.
The knowledge point classification prediction model training module 1 trains the knowledge point classification prediction model and monitors the knowledge base in real time, so that the model is retrained and updated in near real time.
The knowledge point question text acquisition module 21 obtains the knowledge point question text entered by the user in the knowledge base management front-end interface and sends it to the knowledge point question text processing module 22.
The knowledge point question text processing module 22 performs string processing on the question text entered by the user, including character cleaning, text error correction, business term normalization, and word segmentation. It then converts the segmented text into a text word-vector representation and sends that representation to the pre-trained knowledge point category prediction module 23.
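The four string-processing stages named above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the lookup tables `TYPO_FIXES` and `TERM_ALIASES`, the function names, and the whitespace tokenizer (standing in for a Chinese word segmenter) are hypothetical and do not come from the patent.

```python
import re

# Hypothetical lookup tables; a real system would load these from business data.
TYPO_FIXES = {"insurence": "insurance"}
TERM_ALIASES = {"life cover": "life insurance"}


def clean_characters(text: str) -> str:
    """Character cleaning: lowercase, drop punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def correct_typos(text: str) -> str:
    """Text error correction via a simple token-level lookup (assumption)."""
    return " ".join(TYPO_FIXES.get(tok, tok) for tok in text.split())


def normalize_terms(text: str) -> str:
    """Business term normalization: map aliases to one canonical term."""
    for alias, canonical in TERM_ALIASES.items():
        text = text.replace(alias, canonical)
    return text


def segment(text: str) -> list[str]:
    """Word segmentation; whitespace split stands in for a real segmenter."""
    return text.split()


def preprocess(question: str) -> list[str]:
    """Run the four stages in the order the module description lists them."""
    return segment(normalize_terms(correct_typos(clean_characters(question))))
```

The resulting token list would then be mapped to word vectors before being passed to the category prediction module.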
The knowledge point category prediction module 23 calls the knowledge point classification prediction model to compute a score for every category from the text word-vector representation supplied by the text processing module, pairs each score with its category label as a tuple (category, score), and sorts the tuples by score in descending order to obtain a category list.
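A minimal sketch of how the (category, score) list might be built from raw model outputs. The softmax normalization and the function name `rank_categories` are assumptions made for illustration; the patent only states that scores are computed per category and sorted in descending order.

```python
import math


def rank_categories(raw_scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (category, score) tuples sorted by score, highest first.

    Softmax is applied here only as one common way to turn raw model
    outputs into comparable scores; it is an assumption, not the patent's.
    """
    peak = max(raw_scores.values())  # subtract the max for numerical stability
    exps = {cat: math.exp(val - peak) for cat, val in raw_scores.items()}
    total = sum(exps.values())
    pairs = [(cat, val / total) for cat, val in exps.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

The front end would display this list so that the user can confirm one category.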
The category output and confirmation module 24 sends the category list to the knowledge base management front-end interface so that the user selects one category from the list, and receives the user-confirmed category returned by the front-end interface.
The knowledge point answer prediction and category verification module 25 computes the similarity between all standard questions in the knowledge base under the confirmed category and the knowledge point question entered by the user, and sends the several standard questions with the highest similarity to the knowledge base management front-end interface as reference standard questions for the entered question. The user then either associates the most similar standard question with the entered question, or treats the entered question as a new knowledge point question and enters the corresponding answer. The module receives, from the knowledge base management front-end interface, the user-confirmed association between the entered question and a standard question, or the entered question as a new knowledge point question together with its answer, and stores it in the knowledge base database.
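The recall of reference standard questions can be illustrated as a top-k similarity search within the confirmed category. The sketch below uses plain cosine similarity over bag-of-words counts as a stand-in; the patent's own similarity measure may differ, and the names `cosine` and `recall_standard_questions` as well as the `top_k` default are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[tok] * b[tok] for tok in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall_standard_questions(
    query_tokens: list[str],
    standard_questions: dict[str, list[str]],
    top_k: int = 3,
) -> list[tuple[str, float]]:
    """Score every standard question in one category and keep the best top_k.

    standard_questions maps the question text to its token list.
    """
    query = Counter(query_tokens)
    scored = [
        (text, cosine(query, Counter(tokens)))
        for text, tokens in standard_questions.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

If none of the returned candidates fits, the user would store the entered question as a new standard question instead of associating it.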
The knowledge point classification prediction model training module 1 includes:
A knowledge base data integration module 11, which pairs each standard question in the knowledge base and its corresponding similar questions with the corresponding category label to form data pairs, and saves them as a text file or as an iterable generator object.
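As a sketch of the data integration step, the generator below yields one (question text, category label) pair per standard question and per similar question, and a helper writes the pairs to a text file. The record field names (`standard_question`, `similar_questions`, `category`) and the tab-separated file layout are assumptions chosen for illustration.

```python
from typing import Iterable, Iterator


def iter_labeled_questions(knowledge_base: list[dict]) -> Iterator[tuple[str, str]]:
    """Yield (question_text, category_label) for every standard and similar question."""
    for point in knowledge_base:
        label = point["category"]
        yield point["standard_question"], label
        for similar in point.get("similar_questions", []):
            yield similar, label


def save_as_text(pairs: Iterable[tuple[str, str]], path: str) -> None:
    """Write one 'label<TAB>text' line per pair (hypothetical file layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        for text, label in pairs:
            handle.write(f"{label}\t{text}\n")
```

Using a generator keeps memory use flat when the knowledge base is large, while the text file form suits offline training runs.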
A text corpus preprocessing module 12, which preprocesses all standard questions in the knowledge base and their corresponding similar questions, including character cleaning, text error correction, business term normalization, and word segmentation, to obtain a corpus data set.
A model training and verification module 13, which divides the corpus data set into a training set and a test set, represents the knowledge point question texts in the training set as word vectors, trains a neural network on them, tests and verifies the trained neural network model, and builds a text classification model whose prediction accuracy meets a threshold. The word vectors use the Word2Vec model, a text vector representation that carries semantic information and expresses the semantic similarity of texts as distances in a low-dimensional vector space. The neural network used for training may be a network model such as TextCNN or LSTM, and the specific training strategy is not limited.
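The split-and-verify logic around training can be sketched independently of the network itself. In the sketch below, `predict` stands for any trained classifier (for example a TextCNN or LSTM wrapped in a function); the split ratio, the seed, the 0.9 accuracy threshold, and all function names are hypothetical values chosen for illustration.

```python
import random
from typing import Callable

Sample = tuple[str, str]  # (question_text, category_label)


def split_corpus(
    samples: list[Sample], test_ratio: float = 0.2, seed: int = 0
) -> tuple[list[Sample], list[Sample]]:
    """Shuffle deterministically and split into (training set, test set)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def accuracy(predict: Callable[[str], str], test_set: list[Sample]) -> float:
    """Fraction of test questions whose predicted category matches the label."""
    if not test_set:
        return 0.0
    hits = sum(1 for text, label in test_set if predict(text) == label)
    return hits / len(test_set)


def accept_model(
    predict: Callable[[str], str], test_set: list[Sample], threshold: float = 0.9
) -> bool:
    """Keep the model only if its test accuracy meets the threshold."""
    return accuracy(predict, test_set) >= threshold
```

A model that fails the check would be retrained or tuned before it replaces the model currently serving predictions.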
A near-real-time model update module 14, which keeps real-time statistics on changes to the underlying knowledge base database. When the number of changed categories in the knowledge base exceeds a preset threshold, retraining and updating are carried out by the knowledge base data integration module 11, the text corpus preprocessing module 12, and the model training and verification module 13. Specifically, the total number of categories is denoted N, the number of changed categories is denoted M, the number of changed knowledge points in category i is denoted KNi, and the total number of knowledge points in category i before the change is denoted Ki, with 1≤i≤N. When M/N is greater than threshold A, or there exists an i for which KNi/Ki is greater than threshold B, the process goes to step 2a), pulls the knowledge base data, and retrains and updates the model. By dynamically monitoring the knowledge base in real time, the near-real-time model update module 14 achieves near-real-time retraining of the classification model and iterates on the model continuously, which improves the recall and accuracy of the whole customer service robot and the overall customer experience of the intelligent customer service robot.
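The retraining condition above (M/N greater than A, or some KNi/Ki greater than B) translates directly into a small predicate. The function and parameter names below are hypothetical, and the threshold values in the usage note are example numbers, not values from the patent.

```python
def should_retrain(
    total_categories: int,            # N
    changed_categories: int,          # M
    changed_points: dict[str, int],   # KN_i per category
    points_before: dict[str, int],    # K_i per category, before the change
    threshold_a: float,               # A
    threshold_b: float,               # B
) -> bool:
    """Return True when M/N > A or when any category has KN_i / K_i > B."""
    if total_categories > 0 and changed_categories / total_categories > threshold_a:
        return True
    for category, changed in changed_points.items():
        before = points_before.get(category, 0)
        # Categories with no prior knowledge points are skipped to avoid
        # dividing by zero; how to treat them is left open here.
        if before > 0 and changed / before > threshold_b:
            return True
    return False
```

For instance, with A = 0.2 and B = 0.3, three changed categories out of ten would trigger retraining, and so would five changed knowledge points in a category that previously held ten.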
In summary, the knowledge base maintenance method and device based on text classification prediction proposed by the present invention train a knowledge point classification prediction model from the classification system of the knowledge base, maintain the knowledge base on the basis of that model, recall standard questions close to the knowledge point by means of a text similarity algorithm, and provide means for secondary confirmation and answer association. This makes the manual process of maintaining the knowledge base intelligent, helps customer service staff update and maintain knowledge points accurately, and improves the efficiency and quality of knowledge base maintenance. In addition, dynamic real-time monitoring of the knowledge base enables near-real-time retraining of the classification model and continuous iteration on the model, which improves the recall and accuracy of the whole customer service robot and the overall customer experience of the intelligent customer service robot.
The specific embodiments described above explain the purpose, technical solutions, and beneficial effects of the present invention in further detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A knowledge base maintenance method based on text classification prediction, characterized by comprising the following steps:
1) obtaining the knowledge point question text entered by a user in the knowledge base management front-end interface;
2) performing string processing on the knowledge point question text entered by the user in step 1) and converting it into a text word-vector representation;
3) calling a pre-trained knowledge point classification prediction model to perform classification prediction on the text word-vector representation of step 2) and obtain a score for each category, then forming tuples (category, score) from the scores and category labels and sorting them by score in descending order to obtain a category list;
4) sending the category list of step 3) to the knowledge base management front-end interface so that the user selects one category from the list, and receiving the category selected and confirmed by the user, as sent by the knowledge base management front-end interface;
5) computing the similarity between all standard questions in the knowledge base under the category of step 4) and the knowledge point question entered by the user, and sending the several standard questions with the highest similarity to the knowledge base management front-end interface as reference standard questions for the entered question, so that the user either associates the most similar standard question with the entered question or treats the entered question as a new knowledge point question and enters the corresponding answer; and receiving, from the knowledge base management front-end interface, the user-confirmed association between the entered question and a standard question, or the entered question as a new knowledge point question together with its answer, and storing it in the knowledge base database.
2. The knowledge base maintenance method based on text classification prediction according to claim 1, characterized in that the string processing of the knowledge point question text entered by the user in step 2) includes character cleaning, text error correction, business term normalization, and word segmentation.
3. The knowledge base maintenance method based on text classification prediction according to claim 1, characterized in that the pre-training of the knowledge point classification prediction model in step 3) specifically comprises the following steps:
3a) pairing each standard question in the knowledge base and its corresponding similar questions with the corresponding category label to form data pairs, and saving them as a text file or as an iterable generator object;
3b) preprocessing all standard questions and corresponding similar questions of step 3a) to obtain a corpus data set;
3c) dividing the corpus data set of step 3b) into a training set and a test set, representing the knowledge point question texts in the training set as word vectors, training a neural network on them, testing and verifying the trained neural network model, and building a knowledge point classification prediction model whose prediction accuracy meets a threshold;
3d) keeping real-time statistics on changes to the underlying knowledge base database, and, when the number of changed categories or the number of changed knowledge points in a category exceeds a set threshold, going to step 2a) to retrain and update.
4. The knowledge base maintenance method based on text classification prediction according to claim 3, characterized in that the preprocessing in step 3b) includes character cleaning, text error correction, business term normalization, and word segmentation.
5. The knowledge base maintenance method based on text classification prediction according to claim 3, characterized in that the neural network used in step 3c) is TextCNN or LSTM, and the word-vector representation uses the Word2Vec model.
6. The knowledge base maintenance method based on text classification prediction according to claim 3, characterized in that step 3d) specifically includes:
keeping real-time statistics on knowledge point changes in the underlying knowledge base database, where the total number of categories is denoted N, the number of changed categories is denoted M, the number of changed knowledge points in category i is denoted KNi, and the total number of knowledge points in category i before the change is denoted Ki, with 1≤i≤N; and, when M/N is greater than threshold A, or there exists an i for which KNi/Ki is greater than threshold B, going to step 2a), pulling the knowledge base data, and retraining and updating the model.
7. The knowledge base maintenance method based on text classification prediction according to claim 1, characterized in that the text similarity in step 5) is computed by the following formula:
[formula not reproduced in this text]
where x and y denote knowledge point texts, V_x and V_y denote the feature vectors of the texts, t_x and t_y denote the one-hot vector representations of the category labels of the corresponding texts, and θ ∈ (0,1) is a weight adjustment parameter.
8. A knowledge base maintenance device based on text classification prediction, characterized by comprising a knowledge point classification prediction model training module and a knowledge base maintenance module, the knowledge base maintenance module maintaining the knowledge base on the basis of the knowledge point classification prediction model training module and including: a knowledge point question text acquisition module, a knowledge point question text processing module, a knowledge point category prediction module, a category output and confirmation module, and a knowledge point answer prediction and category verification module;
the knowledge point classification prediction model training module trains the knowledge point classification prediction model and monitors the knowledge base in real time, so that the model is retrained and updated in near real time;
the knowledge point question text acquisition module obtains the knowledge point question text entered by the user in the knowledge base management front-end interface and sends it to the knowledge point question text processing module;
the knowledge point question text processing module performs string processing on the knowledge point question text entered by the user, converts it into a text word-vector representation, and sends that representation to the knowledge point category prediction module;
the knowledge point category prediction module calls the knowledge point classification prediction model to compute a score for every category from the text word-vector representation supplied by the text processing module, pairs each score with its category label as a tuple (category, score), and sorts the tuples by score in descending order to obtain a category list;
the category output and confirmation module sends the category list to the knowledge base management front-end interface so that the user selects one category from the list, and receives the user-confirmed category returned by the front-end interface;
the knowledge point answer prediction and category verification module computes the similarity between all standard questions in the knowledge base under the confirmed category and the knowledge point question entered by the user, and sends the several standard questions with the highest similarity to the knowledge base management front-end interface as reference standard questions for the entered question, so that the user either associates the most similar standard question with the entered question or treats the entered question as a new knowledge point question and enters the corresponding answer; and receives, from the knowledge base management front-end interface, the user-confirmed association between the entered question and a standard question, or the entered question as a new knowledge point question together with its answer, and stores it in the knowledge base database.
9. The knowledge base maintenance device based on text classification prediction according to claim 8, characterized in that the knowledge point classification prediction model training module includes:
a knowledge base data integration module, which pairs each standard question in the knowledge base and its corresponding similar questions with the corresponding category label to form data pairs, and saves them as a text file or as an iterable generator object;
a text corpus preprocessing module, which preprocesses all standard questions in the knowledge base and their corresponding similar questions to obtain a corpus data set;
a model training and verification module, which divides the corpus data set into a training set and a test set, represents the knowledge point question texts in the training set as word vectors, trains a neural network on them, tests and verifies the trained neural network model, and builds a knowledge point classification prediction model whose prediction accuracy meets a threshold;
a near-real-time model update module, which keeps real-time statistics on changes to the underlying knowledge base database and, when the number of changed categories in the knowledge base exceeds a preset threshold, has the knowledge base data integration module, the text corpus preprocessing module, and the model training and verification module retrain and update the model.
10. The knowledge base maintenance device based on text classification prediction according to claim 8, characterized in that the neural network used by the model training and verification module is TextCNN or LSTM, and the word-vector representation uses the Word2Vec model.
CN201910830001.1A 2019-09-04 2019-09-04 Knowledge base maintenance method and device based on text classification prediction Withdrawn CN110532400A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910830001.1A CN110532400A (en) 2019-09-04 2019-09-04 Knowledge base maintenance method and device based on text classification prediction

Publications (1)

Publication Number Publication Date
CN110532400A true CN110532400A (en) 2019-12-03

Family

ID=68666534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910830001.1A Withdrawn CN110532400A (en) 2019-09-04 2019-09-04 Knowledge base maintenance method and device based on text classification prediction

Country Status (1)

Country Link
CN (1) CN110532400A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20191203