CN113806558B - Question selection method, knowledge graph construction device and electronic equipment - Google Patents


Info

Publication number
CN113806558B
Authority
CN
China
Prior art keywords
knowledge graph
question
semantic similarity
target
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111105937.1A
Other languages
Chinese (zh)
Other versions
CN113806558A (en)
Inventor
李海滨
郭玮
储开龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Tiantian Digital Chain Technology Co ltd
Original Assignee
Hubei Tiantian Digital Chain Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Tiantian Digital Chain Technology Co ltd filed Critical Hubei Tiantian Digital Chain Technology Co ltd
Priority to CN202111105937.1A
Publication of CN113806558A
Application granted
Publication of CN113806558B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a question selection method, a knowledge graph construction device, and electronic equipment, belonging to the technical field of computers. The question selection method comprises: obtaining a pre-established question knowledge graph, wherein the question knowledge graph comprises a question set and the semantic similarity of any two questions in the question set; determining a first question from the question knowledge graph; determining, from the question knowledge graph, a target question related to the first question by semantic similarity; and outputting the first question and the target question. Because the target question is related to the first question by semantic similarity, the target question and the first question examine the same knowledge area. Asking the interviewee both the target question and the first question therefore examines the interviewee's mastery of that knowledge area more comprehensively, which assists the interviewer in asking questions and reduces the interview's demands on the interviewer's personal professional ability.

Description

Question selection method, knowledge graph construction device and electronic equipment
Technical Field
The present invention relates to the technical field of computers, and in particular to a question selection method, a knowledge graph construction device, and electronic equipment.
Background
In an interview, the interviewer asks the interviewee questions to judge whether the interviewee meets the interview requirements. However, the current interview method relies on the interviewer's personal ability to question the interviewee: it usually requires at least one professional in human resource management and at least one professional in the field of the interview post to serve as interviewers, and after the interviewee answers a question in the technical field of the post, the interviewer must follow up with further questions in that field in order to comprehensively examine the interviewee's professional ability. This requires the interviewer to have comprehensive mastery of the professional knowledge in the field and places high demands on the interviewer's personal professional ability. As a result, each interview needs an interviewer with strong professional ability, which raises the interview cost, and the existing interview mode is inefficient.
Disclosure of Invention
The application provides a question selection method, a knowledge graph construction device, and electronic equipment, in order to solve the problems of low interview efficiency and high labor cost in the existing interview mode.
In a first aspect, the present application provides a question selection method, comprising: obtaining a pre-established question knowledge graph, wherein the question knowledge graph comprises a question set and the semantic similarity of any two questions in the question set, the questions in the question set are represented as question nodes in the question knowledge graph, and the semantic similarity of any two questions is represented as the connection between the two corresponding question nodes; determining a first question from the question knowledge graph; determining, from the question knowledge graph, a target question related to the first question by semantic similarity; and outputting the first question and the target question.
In the embodiment of the application, the question knowledge graph is established in advance, so that when the first question is asked during an interview, a target question related to the first question by semantic similarity can be determined from the question knowledge graph. Because the target question is related to the first question by semantic similarity, the target question and the first question examine the same knowledge area. Asking the interviewee both the target question and the first question therefore examines the interviewee's mastery of that knowledge area comprehensively, assists the interviewer in asking questions, and reduces the interview's demands on the interviewer's personal professional ability, thereby alleviating the low efficiency and high labor cost of the existing interview mode.
With reference to the technical solution provided by the first aspect, in some possible implementations, determining, from the question knowledge graph, a target question related to the first question by semantic similarity comprises: determining, from the question knowledge graph, a target question whose semantic similarity to the first question is greater than a preset threshold.
In the embodiment of the application, a question in the knowledge graph whose semantic similarity to the first question is greater than the preset threshold is taken as the target question. Screening the questions in the knowledge graph through the preset threshold increases the correlation between the resulting target question and the first question, ensuring as far as possible that the knowledge points examined by the target question and the first question belong to the same knowledge area.
With reference to the technical solution provided by the first aspect, in some possible implementations, determining, from the question knowledge graph, a target question related to the first question by semantic similarity comprises: determining, from the question knowledge graph, a second question whose semantic similarity to the first question is greater than a first preset threshold, and deleting the first question from the question knowledge graph to obtain a second question knowledge graph; determining, from the second question knowledge graph, a third question whose semantic similarity to the second question is greater than a second preset threshold, and deleting the second question from the second question knowledge graph to obtain a third question knowledge graph; and so on until a preset stopping condition is met. The target questions include the second question and the third question.
In the embodiment of the application, the questions in the question knowledge graph are screened through the first preset threshold to obtain a second question whose semantic similarity to the first question is greater than the first preset threshold, and the first question is deleted from the question knowledge graph to obtain a second question knowledge graph; the questions in the second question knowledge graph are then screened through the second preset threshold to obtain a third question whose semantic similarity to the second question is greater than the second preset threshold, and the second question is deleted from the second question knowledge graph to obtain a third question knowledge graph; and so on until a preset stopping condition is met, yielding target questions that include the second question and the third question. On this basis, the knowledge points examined by the second question and the first question belong to the same knowledge area, and the knowledge points examined by the third question and the second question belong to the same knowledge area, so that the interviewee is questioned progressively, layer by layer, and the interviewee's mastery of the knowledge area can be examined more comprehensively.
With reference to the technical solution provided by the first aspect, in some possible implementations, obtaining a pre-established question knowledge graph comprises: obtaining a question set, wherein the question set comprises at least two questions; obtaining a knowledge-graph vector for each question in the question set based on a pre-trained BERT (Bidirectional Encoder Representations from Transformers) model; obtaining the semantic similarity of any two questions in the question set based on the knowledge-graph vector of each question; and constructing the question knowledge graph based on the question set and the semantic similarity of any two questions in the question set.
In the embodiment of the application, the knowledge-graph vector of each question in the question set is obtained through the pre-trained BERT model, the semantic similarity of any two questions in the question set is then derived from the knowledge-graph vectors, each question in the question set is used as a node in the question knowledge graph, and each semantic similarity is used as the connection between the two nodes of the two questions to which it corresponds. Because questions examining the same knowledge area necessarily share common words, questions whose examined knowledge points are closer have greater semantic similarity; taking the semantic similarity between questions as the connection between question knowledge graph nodes therefore yields a question knowledge graph that conforms to actual requirements. By obtaining the knowledge-graph vector of each question, deriving the semantic similarity of any two questions, and building the question knowledge graph accordingly, target questions related to the first question can be found quickly during the interview.
With reference to the technical solution provided by the first aspect, in some possible implementations, obtaining the knowledge-graph vector of each question in the question set based on a pre-trained BERT model comprises: obtaining all entities included in each question in the question set; for each question, obtaining the semantic vector of each entity in the question based on the pre-trained BERT model and all entities included in the question; and obtaining the knowledge-graph vector of the question based on the semantic vectors of all entities in the question and a preset rule.
In the embodiment of the application, the semantic vectors of all entities in a question are obtained through the BERT model, and the knowledge-graph vector of the question is then obtained based on those semantic vectors and a preset rule. Because the knowledge-graph vector of a question is determined only by the entities in the question, the influence on the vector of words irrelevant to the examined knowledge points, such as modal particles and connectives, can be effectively reduced. The knowledge-graph vector thus represents the examined knowledge points of the question more accurately, and the resulting question knowledge graph reflects the correlation of different questions more accurately.
With reference to the technical solution provided by the first aspect, in some possible implementations, the process of training the BERT model comprises: obtaining a training question set, wherein the entities of each question in the training question set are labeled with true named-entity tags; inputting the training question set into a BERT pre-training model to obtain training semantic vectors of the words included in each question in the training question set; inputting the obtained training semantic vectors into a classification model to obtain the predicted named-entity tag of each entity; and updating the parameters of the BERT model and the classification model based on the true named-entity tags, the predicted named-entity tags, and a back-propagation algorithm until a preset condition is met, obtaining a trained BERT model.
In the embodiment of the application, when the BERT model is trained, a classification model is introduced to obtain the predicted named-entity tag of each entity, and the parameters of the BERT model and the classification model are then updated based on the true named-entity tags, the predicted named-entity tags, and the back-propagation algorithm, which accelerates training, shortens training time, and improves the accuracy of the model.
With reference to the technical solution provided by the first aspect, in some possible implementations, obtaining the entities included in each question in the question set comprises: inputting the question set into the BERT model to obtain semantic vectors of the words included in each question in the question set; inputting the obtained semantic vectors into a trained classification model to obtain the named-entity tag of each entity; and obtaining the entities included in each question according to the named-entity tags of the entities included in the question.
In a second aspect, the present application further provides a knowledge graph construction method, comprising: obtaining a question set, wherein the question set comprises at least two questions; obtaining a knowledge-graph vector for each question in the question set based on a pre-trained BERT model; obtaining the semantic similarity of any two questions in the question set based on the knowledge-graph vector of each question; and constructing a question knowledge graph based on the question set and the semantic similarity of any two questions in the question set, wherein the questions in the question set are represented as question nodes in the question knowledge graph, and the semantic similarity of any two questions is represented as the connection between the two corresponding question nodes.
In a third aspect, the present application further provides a question selection apparatus, comprising an acquisition module, a determination module, and an output module. The acquisition module is configured to obtain a pre-established question knowledge graph, wherein the question knowledge graph comprises a question set and the semantic similarity of any two questions in the question set, the questions in the question set are represented as question nodes in the question knowledge graph, and the semantic similarity of any two questions is represented as the connection between the two corresponding question nodes. The determination module is configured to determine a first question from the question knowledge graph, and is further configured to determine, from the question knowledge graph, a target question related to the first question by semantic similarity. The output module is configured to output the first question and the target question.
In a fourth aspect, the present application further provides a knowledge graph construction apparatus, comprising an acquisition module and a processing module. The acquisition module is configured to obtain a question set, wherein the question set comprises at least two questions. The processing module is configured to obtain a knowledge-graph vector for each question in the question set based on a pre-trained BERT model; it is further configured to obtain the semantic similarity of any two questions in the question set based on the knowledge-graph vector of each question, and to construct a question knowledge graph based on the question set and the semantic similarity of any two questions in the question set, wherein the questions in the question set are represented as question nodes in the question knowledge graph, and the semantic similarity of any two questions is represented as the connection between the two corresponding question nodes.
In a fifth aspect, embodiments of the present application further provide an electronic device, comprising a memory and a processor, the memory being connected to the processor. The memory is configured to store a program, and the processor is configured to invoke the program stored in the memory to perform the method provided by the embodiments of the first aspect and/or any possible implementation in combination with the embodiments of the first aspect, or to perform the method provided by the embodiments of the second aspect.
In a sixth aspect, embodiments of the present application further provide a computer-readable storage medium having stored thereon a computer program which, when executed by a computer, performs the method provided by the embodiments of the first aspect and/or any possible implementation in combination with the embodiments of the first aspect, or performs the method provided by the embodiments of the second aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting of the scope; other related drawings may be obtained from these drawings without inventive effort by a person skilled in the art.
FIG. 1 is a schematic flow chart of a question selection method provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a question knowledge graph provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a named-entity labeling model provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart of a knowledge graph construction method provided by an embodiment of the present application;
FIG. 5 is a structural block diagram of a question selection apparatus provided by an embodiment of the present application;
FIG. 6 is a structural block diagram of a knowledge graph construction apparatus provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that like reference numerals and letters denote like items in the following figures; thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Also, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between them. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises that element.
Furthermore, the term "and/or" in this application merely describes an association relation between associated objects, indicating that three relations may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone.
The technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings.
Referring to FIG. 1, FIG. 1 is a schematic flow chart of a question selection method provided by an embodiment of the present application. The steps of the question selection method are described below with reference to FIG. 1.
S110: Obtain a pre-established question knowledge graph.
The question knowledge graph may be created in advance, stored in a database, and called directly when needed.
The question knowledge graph comprises a question set and the semantic similarity of any two questions in the question set; the questions in the question set are represented as question nodes in the question knowledge graph, and the semantic similarity of any two questions is represented as the connection between the two corresponding question nodes.
In one embodiment, the questions included in the question knowledge graph are all questions in a specific technical field. For example, technical interview questions for an algorithm engineer post might be: What is the principle of the XGBoost algorithm? How does XGBoost prevent overfitting? What are the connections and differences between GBDT and XGBoost? What is the basic principle of the gradient boosting tree (GBDT)? And so on. The examples here are only for ease of understanding and should not be taken as limiting the present application.
In one embodiment, the process of obtaining the pre-established question knowledge graph may be: first, obtain a question set, wherein the question set comprises at least two questions; then, based on a pre-trained BERT model, obtain a knowledge-graph vector for each question in the question set; next, obtain the semantic similarity of any two questions in the question set based on the knowledge-graph vector of each question; and finally, construct the question knowledge graph based on the question set and the semantic similarity of any two questions in the question set. The questions in the question set serve as nodes in the question knowledge graph, and each semantic similarity serves as the connection between the two nodes corresponding to its two questions.
In one embodiment, the cosine similarity of the knowledge-graph vectors of two questions is used as the semantic similarity of the two questions. In addition, when computing semantic similarity, a question's similarity with itself need not be computed.
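The graph-construction steps above can be sketched in a few lines. This is a minimal pure-Python illustration, not the patent's implementation: the question texts and the three-dimensional vectors below are invented stand-ins for the knowledge-graph vectors that the BERT model would produce, and cosine similarity is used as the edge weight as described.

```python
import math

def cosine_similarity(u, v):
    # Cosine similarity of two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def build_question_graph(vectors):
    """Build the question knowledge graph as a dict of dicts:
    graph[q1][q2] = semantic similarity of q1 and q2 (the edge weight)."""
    questions = list(vectors)
    graph = {q: {} for q in questions}
    for i, q1 in enumerate(questions):
        for q2 in questions[i + 1:]:           # skip self-similarity
            sim = cosine_similarity(vectors[q1], vectors[q2])
            graph[q1][q2] = sim                # undirected edge:
            graph[q2][q1] = sim                # store both directions
    return graph

# Invented 3-dimensional knowledge-graph vectors, for illustration only.
vectors = {
    "What is the principle of the XGBoost algorithm?": [0.9, 0.1, 0.2],
    "How does XGBoost prevent overfitting?":           [0.8, 0.2, 0.3],
    "What is the basic principle of GBDT?":            [0.1, 0.9, 0.1],
}
graph = build_question_graph(vectors)
```

As expected of questions examining the same knowledge area, the two XGBoost questions end up connected by a much higher similarity than either is to the GBDT question.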
In one embodiment, when the question set contains duplicate questions, the semantic similarity between two identical questions is set to 0.
For ease of understanding, please refer to FIG. 2, which is a schematic structural diagram of a question knowledge graph provided by an embodiment of the present application. The knowledge graph shown in FIG. 2 includes four questions: question 1, question 2, question 3, and question 4. Similarity 1 characterizes the semantic similarity of question 1 and question 2; similarity 2, that of question 2 and question 3; similarity 3, that of question 3 and question 4; similarity 4, that of question 1 and question 4; similarity 5, that of question 2 and question 4; and similarity 6, that of question 1 and question 3. Any two questions are connected by their semantic similarity.
In one embodiment, the process of obtaining the knowledge-graph vector of each question in the question set based on the pre-trained BERT model may be: first, obtain all entities included in each question in the question set; then, for each question, obtain the semantic vector of each entity in the question based on the pre-trained BERT model and all entities included in the question; finally, obtain the knowledge-graph vector of the question based on the semantic vectors of all entities in the question and a preset rule.
The preset rule here may be to take the element-wise average of the semantic vectors of all entities in the same question; alternatively, it may be to take the element-wise sum of the semantic vectors of all entities in the same question.
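Both variants of the preset rule can be sketched directly. A minimal illustration, assuming the per-entity semantic vectors have already been produced by the BERT model (the vectors below are invented stand-ins):

```python
def question_vector(entity_vectors, rule="average"):
    """Combine the semantic vectors of all entities in one question
    into the question's knowledge-graph vector, per the preset rule."""
    if not entity_vectors:
        raise ValueError("question has no recognized entities")
    summed = [sum(dims) for dims in zip(*entity_vectors)]  # element-wise sum
    if rule == "sum":
        return summed
    return [x / len(entity_vectors) for x in summed]       # element-wise average

# Invented semantic vectors for the entities of one question, e.g. the
# entities "XGBoost" and "overfitting" of one interview question.
entities = [[0.8, 0.2, 0.4], [0.4, 0.6, 0.0]]
avg_vec = question_vector(entities)          # average rule
sum_vec = question_vector(entities, "sum")   # sum rule
```

Averaging keeps the question vector on the same scale regardless of how many entities a question contains, which is why it is a natural default when questions vary in length.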
The process of obtaining the entities included in each question in the question set may be: completing entity labeling through manual named-entity annotation, and then taking the labeled words as the entities; or completing named entity recognition (NER) of the entities included in each question in the question set, obtaining the named-entity label of each entity, and extracting the words in each question that carry a named-entity label; or first inputting the question set into the BERT model to obtain the semantic vector of each word in each question, then inputting the obtained semantic vectors into a trained classification model to obtain the named-entity label of each entity, and finally extracting the words with named-entity labels in each question; the words carrying named-entity labels in a question are that question's entities.
The words included in each question above may be the individual characters of the text, or the individual words of text in another language, for example each word in English.
In one embodiment, the process of training the BERT model may be: first, obtain a training question set, where the entities of each question in the training question set are labeled with true named-entity tags; then, input the training question set into a BERT pre-training model to obtain training semantic vectors of the words included in each question in the training question set; input the obtained training semantic vectors into a classification model to obtain the predicted named-entity tag of each entity; and finally, update the parameters of the BERT pre-training model and the classification model based on the true named-entity tags, the predicted named-entity tags, and a back-propagation algorithm until a preset condition is met, obtaining the trained BERT model.
The preset condition here may be that the error between the true named-entity tags and the predicted named-entity tags is less than or equal to a preset error threshold, where the preset error threshold may be set according to actual requirements and is not limited here.
In one embodiment, a linear layer followed by a softmax function may be used as the classification model described above.
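Such a classification head maps each word's semantic vector to a probability per NER label. A toy pure-Python illustration: the weight matrix, bias, input vector, and the tag set below are all invented for demonstration, and in practice the linear layer is trained jointly with BERT on real annotated questions.

```python
import math

LABELS = ["O", "B-ENT", "I-ENT"]  # invented NER tag set for illustration

def softmax(scores):
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(semantic_vector, weights, bias):
    # Linear layer: one score per label, then softmax into probabilities.
    scores = [sum(w * x for w, x in zip(row, semantic_vector)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(scores)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs

# Invented 4-dimensional word vector and 3x4 weight matrix.
vec = [0.2, -0.1, 0.7, 0.4]
W = [[0.1, 0.0, -0.2, 0.1],    # scores for "O"
     [0.3, -0.1, 0.8, 0.2],    # scores for "B-ENT"
     [0.0, 0.2, 0.1, 0.0]]     # scores for "I-ENT"
b = [0.0, 0.1, -0.1]
label, probs = classify(vec, W, b)
```

During training, the cross-entropy between these probabilities and the true tag would drive the back-propagation updates described above.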
The training question set may be a subset of an overall interview question set, where the overall interview question set includes all questions prepared for interviews and the training question set includes 60%-80% of its questions; the specific proportion may be set according to the actual situation and is not limited here. The questions in the overall interview question set other than the training question set constitute a test question set for testing the trained BERT model.
For ease of understanding, refer to FIG. 3, which is a structural diagram, provided by an embodiment of the present application, of labeling the entities of a sentence with the BERT model plus a linear layer and softmax function. The BERT model feeds the semantic vector of each word of the input question data (semantic vector 1, semantic vector 2, semantic vector 3, ..., semantic vector n) into the linear layer, and the linear layer gives the NER label, that is, the named-entity label, of each word in the input sentence.
S120: a first question is determined from a question knowledge graph.
The first question is determined from the question knowledge graph, and in one embodiment, one question may be automatically selected from the question knowledge graph according to a set program to be used as the first question, for example, one question may be randomly selected from the question knowledge graph to be used as the first question. In another embodiment, in response to an operation of selecting a question by a user, a question corresponding to the user operation may be selected from a question knowledge graph as the first question.
S130: a target question related to the semantic similarity of the first question is determined from the question knowledge graph.
And taking the problem related to the semantic similarity of the first problem in the problem knowledge graph as a target problem. The number of the target questions may be set according to actual requirements, for example, 1, 2, 3, 4, 5, 6 … … N, where N is a positive integer.
In one embodiment, the implementation process of S130 may be: and determining the target problem with the semantic similarity with the first problem greater than a preset threshold value from the problem knowledge graph. The preset threshold may be set according to actual requirements, for example, may be 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1 … … K,0< K <1.
Suppose N target questions are required. When exactly N questions in the question knowledge graph have a semantic similarity to the first question greater than the preset threshold, those questions are taken as the target questions. When more than N questions exceed the threshold, N of them are selected at random as the target questions. When fewer than N questions exceed the threshold, the preset threshold is reduced until at least N questions exceed the modified threshold, and N questions are then selected at random from them as the target questions. The amount by which the threshold is reduced may be a preset fixed value, for example 0.01 per reduction, the specific value not being limited; the threshold may instead be reduced in equal proportion, for example by 1% of its current value each time, the specific proportion likewise not being limited here.
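The fixed-decrement fallback described above might be sketched as follows. The similarity table, the question identifiers, and the 0.01 step are illustrative assumptions; it is also assumed that the graph holds at least N questions besides the first, otherwise the loop would not terminate.

```python
import random

def pick_targets(similarities, first, n, threshold, step=0.01):
    """Select n target questions whose semantic similarity to `first`
    exceeds `threshold`; when too few qualify, lower the threshold by a
    fixed `step` and retry (the fixed-decrement variant in the text)."""
    scores = similarities[first]  # {question: similarity to `first`}
    while True:
        candidates = [q for q, s in scores.items() if s > threshold]
        if len(candidates) == n:
            return candidates
        if len(candidates) > n:
            return random.sample(candidates, n)
        threshold -= step  # too few candidates: relax the threshold

# Toy similarity table; the values are made up for illustration.
sims = {"Q1": {"Q2": 0.92, "Q3": 0.85, "Q4": 0.40, "Q5": 0.10}}
print(sorted(pick_targets(sims, "Q1", 2, 0.9)))  # ['Q2', 'Q3']
```

The equal-proportion variant would replace the last line of the loop with `threshold *= 0.99`, i.e. a 1% reduction of the current value per retry.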
In one embodiment, the first N questions in the question knowledge graph with the greatest semantic similarity to the first question are taken as the target questions; that is, the questions are ranked from highest similarity to lowest and the top N are selected. N is a positive integer.
In one embodiment, S130 may be implemented as follows: determining, from the question knowledge graph, a second question whose semantic similarity to the first question is greater than a first preset threshold, and deleting the first question from the question knowledge graph to obtain a second question knowledge graph; determining, from the second question knowledge graph, a third question whose semantic similarity to the second question is greater than a second preset threshold, and deleting the second question from the second question knowledge graph to obtain a third question knowledge graph; and so on until a preset stopping condition is met. The target questions include the second question and the third question. The second knowledge graph is the question knowledge graph with the first question deleted; the third knowledge graph is the second knowledge graph with the second question deleted.
The preset stopping condition may be that the number of iterations reaches a preset value, which is a positive integer. When the preset value is 1, the loop stops after the second question is obtained, and the target questions then include the second question; when the preset value is 2, the target questions include the second question and the third question; when the preset value is n (n being a positive integer), the target questions include the second question, the third question, …, and the (n+1)-th question. The first preset threshold, the second preset threshold, …, and the n-th preset threshold may all take the same value, or may be partly the same or entirely different; their specific values may be set according to actual requirements and are not limited here.
In one embodiment, the preset stopping condition may be that a preset number of target questions has been obtained. For example, when the preset number is 4 and one question is selected from the question knowledge graph in each iteration, the loop stops after the fifth question is obtained, and the target questions then include the second, third, fourth, and fifth questions. The preset number may be set according to actual requirements and is a positive integer; the above is given only for ease of understanding and should not be taken as limiting the application.
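The select-and-delete iteration of this embodiment (take a question sufficiently similar to the current one, delete the current question to form the next knowledge graph, stop once enough targets are collected) might be sketched as follows. The graph, the thresholds, and the greedy most-similar choice of the next question are illustrative assumptions.

```python
def chain_select(graph, first, thresholds, max_targets):
    """Greedy variant of the iterative embodiment: in each round, delete
    the current question from (a copy of) the graph, then move to its
    most similar remaining neighbour above that round's threshold."""
    graph = {q: dict(neigh) for q, neigh in graph.items()}  # keep caller's graph intact
    targets, current = [], first
    for threshold in thresholds:
        neighbours = graph.pop(current)        # delete the current question ...
        for neigh in graph.values():
            neigh.pop(current, None)           # ... and its incident edges
        candidates = {q: s for q, s in neighbours.items() if s > threshold}
        if not candidates:                     # no question qualifies: stop early
            break
        current = max(candidates, key=candidates.get)
        targets.append(current)
        if len(targets) == max_targets:        # preset number of targets reached
            break
    return targets

# Toy question knowledge graph with similarity-weighted edges.
graph = {
    "Q1": {"Q2": 0.9, "Q3": 0.6},
    "Q2": {"Q1": 0.9, "Q3": 0.8},
    "Q3": {"Q1": 0.6, "Q2": 0.8},
}
print(chain_select(graph, "Q1", [0.5, 0.5, 0.5], 2))  # ['Q2', 'Q3']
```

The random-selection alternative mentioned below would replace the `max(...)` line with a random choice among the qualifying candidates.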
In one embodiment, determining from the question knowledge graph the (n+1)-th question, whose semantic similarity to the n-th question is greater than the n-th preset threshold, may consist of randomly selecting one such question as the (n+1)-th question, or of taking the question with the greatest semantic similarity to the n-th question as the (n+1)-th question.
In yet another embodiment, when determining from the question knowledge graph the second questions whose semantic similarity to the first question is greater than the first preset threshold, k1 such questions are selected as second questions, k1 being a positive integer. When determining from the question knowledge graph the third questions whose semantic similarity to the second questions is greater than the second preset threshold, k2 questions are determined for each of the second questions, k2 being a positive integer, so that the third questions comprise k1 × k2 questions. By analogy, in the n-th iteration, kn questions whose semantic similarity to each n-th question is greater than the n-th preset threshold are determined from the question knowledge graph as (n+1)-th questions, kn being a positive integer, so that the (n+1)-th questions comprise k1 × k2 × … × kn questions. The values k1, k2, …, kn may all be equal, or may be partly equal or entirely unequal; their specific values may be set according to actual needs and are not limited here.
The specific manner of selecting a question from the question knowledge graph is the same as the manner of determining a target question from the question knowledge graph described above, and is not repeated here.
S140: outputting the first question and the target question.
The first question and the target questions are output so that, for example, the interviewing party can question a candidate on the basis of them.
Referring to fig. 4, fig. 4 is a schematic diagram of a knowledge graph construction method according to an embodiment of the present application; the steps included in the method are described below with reference to fig. 4.
S210: a problem set is obtained.
S220: and obtaining a knowledge graph vector of each problem in the problem set based on the pre-trained BERT model.
S230: based on the knowledge graph vector of each problem, the semantic similarity of any two problems in the problem set is obtained.
S240: and constructing a problem knowledge graph based on the problem set and the semantic similarity of any two problems in the problem set.
The specific process of constructing the knowledge graph has been described above and is not repeated here.
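Steps S220-S240 might be sketched as below with toy vectors. Cosine similarity is used here as an assumed semantic similarity measure, and the two-dimensional "knowledge graph vectors" stand in for the BERT-derived ones; the graph is represented as an adjacency dictionary whose edge weights are the pairwise similarities, i.e. the connection relationships between question nodes.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity, assumed here as the semantic similarity measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_question_graph(vectors):
    """Questions become nodes; every pair of nodes is connected by an
    edge weighted with the similarity of their knowledge-graph vectors."""
    graph = {q: {} for q in vectors}
    for q1, q2 in combinations(vectors, 2):
        sim = cosine(vectors[q1], vectors[q2])
        graph[q1][q2] = sim
        graph[q2][q1] = sim
    return graph

# Toy 2-dimensional knowledge-graph vectors (stand-ins for BERT output).
vecs = {"Q1": [1.0, 0.0], "Q2": [0.8, 0.6], "Q3": [0.0, 1.0]}
g = build_question_graph(vecs)
```

Building every pairwise edge is quadratic in the number of questions; a practical system might instead keep only edges above a similarity floor, which is equally compatible with the selection steps above.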
Referring to fig. 5, fig. 5 is a schematic diagram of a problem selection apparatus 10 according to an embodiment of the present application, which includes an obtaining module 110, a determining module 120, and an output module 130.
The obtaining module 110 is configured to obtain a pre-established problem knowledge graph, where the problem knowledge graph includes a problem set and the semantic similarity of any two problems in the problem set; the problems in the problem set represent problem nodes in the problem knowledge graph, and the semantic similarity of any two problems represents the connection relationship between the two corresponding problem nodes.
The determining module 120 is configured to determine a first question from the question knowledge graph.
The determining module 120 is further configured to determine a target question related to the semantic similarity of the first question from the question knowledge graph.
The output module 130 is configured to output the first question and the target question.
The determining module 120 is specifically configured to determine, from the problem knowledge graph, a target problem having a semantic similarity with the first problem greater than a preset threshold.
The determining module 120 is specifically configured to determine a second problem with a semantic similarity with the first problem being greater than a first preset threshold from the problem knowledge graph, and delete the first problem from the problem knowledge graph to obtain a second problem knowledge graph; determining a third problem with the semantic similarity to the second problem larger than a second preset threshold value from the second problem knowledge graph, deleting the second problem from the second problem knowledge graph, and obtaining a third problem knowledge graph until a preset stopping condition is met; the target problem includes the second problem and the third problem.
The question selection device 10 further comprises a construction module for obtaining a question set comprising at least two questions; obtaining a knowledge graph vector of each problem in the problem set based on a pre-trained BERT model; based on the knowledge graph vector of each problem, obtaining the semantic similarity of any two problems in the problem set; and constructing the problem knowledge graph based on the problem set and the semantic similarity of any two problems in the problem set.
The construction module is specifically configured to obtain all entities included in each question in the question set; for each question, obtaining a semantic vector for each word in the question based on the pre-trained BERT model and all entities included in the question; and obtaining the knowledge graph vector of the problem based on the semantic vectors of all the entities in the problem and a preset rule.
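The "preset rule" for combining the entity semantic vectors of a question into one knowledge-graph vector is not fixed by the text; element-wise averaging is used below purely as an assumption to make the step concrete.

```python
def question_vector(entity_vectors):
    """Combine a question's entity semantic vectors into one
    knowledge-graph vector; element-wise averaging is an assumed
    stand-in for the unspecified preset rule."""
    dim = len(entity_vectors[0])
    count = len(entity_vectors)
    return [sum(vec[i] for vec in entity_vectors) / count
            for i in range(dim)]

# Toy semantic vectors for the two entities of one question.
entities = [[0.2, 0.4], [0.6, 0.0]]
qvec = question_vector(entities)  # roughly [0.4, 0.2]
```

Other plausible rules (weighted sums, max pooling, concatenation with projection) fit the same interface; whichever is chosen must be applied uniformly so that the similarities of S230 remain comparable across questions.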
The construction module is specifically configured to input the question set into the BERT model to obtain the semantic vectors of the words included in each question in the question set; input the obtained semantic vectors into a trained classification model to obtain the named entity label of each entity; and obtain the entities included in each question according to the named entity labels of the entities included in that question.
The question selection device 10 further includes a training module, configured to obtain a training question set, where entities of each question in the training question set are labeled with a true named entity label; inputting the training problem set into a BERT pre-training model to obtain training semantic vectors of entities included in each problem in the training problem set; inputting the obtained training semantic vector into a classification model to obtain a predicted named entity label of each entity; and updating parameters of the BERT pre-training model and the classification model based on the real named entity label, the predicted named entity label and a back propagation algorithm until preset conditions are met, so as to obtain a trained BERT model.
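The training loop above (predict labels, compare with the true labels, update parameters by back propagation) might be reduced to the following toy gradient step on a linear tag classifier under softmax cross-entropy loss. This is only a stand-in: the real method updates the BERT pre-training model jointly with the classification model, which this sketch does not capture, and the vectors, learning rate, and tag count are assumptions.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(W, b, vec, true_tag, lr=0.1):
    """One gradient-descent update of a linear tag classifier; returns
    the cross-entropy loss measured before the parameters are updated."""
    logits = [sum(v * w for v, w in zip(vec, row)) + bi
              for row, bi in zip(W, b)]
    probs = softmax(logits)
    for t in range(len(W)):
        # dL/dlogit_t = p_t - 1[t == true_tag] (softmax cross-entropy).
        grad = probs[t] - (1.0 if t == true_tag else 0.0)
        b[t] -= lr * grad
        for i in range(len(vec)):
            W[t][i] -= lr * grad * vec[i]
    return -math.log(probs[true_tag])

# Repeated updates on one labelled "semantic vector" drive the loss down.
W, b = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
losses = [train_step(W, b, [1.0, 0.5], true_tag=1) for _ in range(20)]
print(round(losses[0], 3), "->", round(losses[-1], 3))
```

The "preset conditions" for stopping training (a loss floor, an iteration budget, or validation accuracy) are left open by the text; any of them would wrap this update in an outer loop.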
The specific operation and implementation principle of the problem selection device 10 are already described and will not be described herein.
Referring to fig. 6, fig. 6 is a schematic diagram of a knowledge graph construction apparatus 20 according to an embodiment of the present application, which includes an obtaining module 210 and a processing module 220.
The obtaining module 210 is configured to obtain a question set, where the question set includes at least two questions;
the processing module 220 is configured to obtain a knowledge-graph vector of each question in the question set based on a pre-trained BERT model;
the processing module 220 is further configured to obtain semantic similarity of any two questions in the question set based on the knowledge graph vector of each question;
the processing module 220 is further configured to construct a problem knowledge graph based on the problem set and semantic similarities of any two problems in the problem set, where the problems in the problem set represent problem nodes in the problem knowledge graph, and the semantic similarities of any two problems represent connection relationships between two corresponding problem nodes.
The specific working contents and implementation principles of the knowledge graph construction device 20 are already described above, and will not be described herein.
Referring to fig. 7, fig. 7 shows an electronic device provided in an embodiment of the present application. The electronic device 300 includes a transceiver 310, a memory 320, a communication bus 330, and a processor 340.
The transceiver 310, the memory 320, and the processor 340 are electrically connected to one another, directly or indirectly, to realize data transmission or interaction; for example, these components may be electrically connected via one or more communication buses 330 or signal lines. The transceiver 310 is used for receiving and transmitting data. The memory 320 is used to store a computer program, such as the software function modules shown in fig. 5 or fig. 6, that is, the problem selection device 10 in fig. 5 or the knowledge graph construction device 20 in fig. 6. The problem selection device 10 includes at least one software function module that may be stored in the memory 320 in the form of software or firmware, or embedded in the operating system (OS) of the electronic device 300. The processor 340 is configured to execute the executable modules stored in the memory 320.
For example, the processor 340, when executing the software functional modules or computer programs included in the problem selection device 10, is configured to: acquiring a pre-established problem knowledge graph, wherein the problem knowledge graph comprises a problem set and semantic similarity of any two problems in the problem set, the problems in the problem set represent problem nodes in the problem knowledge graph, and the semantic similarity of any two problems represent the connection relationship of the two corresponding problem nodes; determining a first question from the question knowledge graph; determining target questions related to the semantic similarity of the first questions from the question knowledge graph; outputting the first question and the target question.
For example, the processor 340, when executing the software functional modules or computer programs included in the knowledge graph construction apparatus 20, is configured to: acquiring a problem set, wherein the problem set comprises at least two problems; obtaining a knowledge graph vector of each problem in the problem set based on a pre-trained BERT model; based on the knowledge graph vector of each problem, obtaining the semantic similarity of any two problems in the problem set; based on the problem set and the semantic similarity of any two problems in the problem set, a problem knowledge graph is constructed, the problems in the problem set represent problem nodes in the problem knowledge graph, and the semantic similarity of any two problems represents the connection relation of the two corresponding problem nodes.
The Memory 320 may be, but is not limited to, a random access Memory (Random Access Memory, RAM), a Read Only Memory (ROM), a programmable Read Only Memory (Programmable Read-Only Memory, PROM), an erasable Read Only Memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable Read Only Memory (Electric Erasable Programmable Read-Only Memory, EEPROM), etc.
The processor 340 may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, capable of implementing or executing the methods, steps, and logical blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor 340 may be any conventional processor or the like.
The electronic device 300 includes, but is not limited to, a personal computer, a server, and the like.
The embodiments of the present application also provide a non-volatile computer readable storage medium (hereinafter referred to as a storage medium) on which a computer program is stored, where the computer program, when executed by a computer such as the electronic device 300 described above, performs the problem selection method and/or the knowledge graph construction method described above.
Wherein, the storage medium comprises: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing describes only the preferred embodiments of the present application and is not intended to limit it; various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (9)

1. A method of question selection, comprising:
acquiring a pre-established problem knowledge graph, wherein the problem knowledge graph comprises a problem set and the semantic similarity of any two problems in the problem set, the problems in the problem set represent problem nodes in the problem knowledge graph, and the semantic similarity of any two problems represents the connection relationship of the two corresponding problem nodes;
determining a first question from the question knowledge graph;
determining target questions related to the semantic similarity of the first questions from the question knowledge graph;
Outputting the first question and the target question;
wherein determining, from the problem knowledge graph, a target problem related to semantic similarity of the first problem, includes:
determining a second problem with the semantic similarity to the first problem larger than a first preset threshold value from the problem knowledge graph, deleting the first problem from the problem knowledge graph, and obtaining a second problem knowledge graph;
determining a third problem with the semantic similarity to the second problem larger than a second preset threshold value from the second problem knowledge graph, deleting the second problem from the second problem knowledge graph, and obtaining a third problem knowledge graph until a preset stopping condition is met; the target problem includes the second problem and the third problem.
2. The method of claim 1, wherein determining a target question from the question knowledge graph that relates to semantic similarity of the first question comprises:
and determining a target problem with the semantic similarity with the first problem greater than a preset threshold value from the problem knowledge graph.
3. The method of claim 1, wherein the obtaining a pre-established knowledge-graph of the problem comprises:
Acquiring a problem set, wherein the problem set comprises at least two problems;
obtaining a knowledge graph vector of each problem in the problem set based on a pre-trained BERT model;
based on the knowledge graph vector of each problem, obtaining the semantic similarity of any two problems in the problem set;
and constructing the problem knowledge graph based on the problem set and the semantic similarity of any two problems in the problem set.
4. A method according to claim 3, wherein the deriving a knowledge-graph vector for each question in the set of questions based on the pre-trained BERT model comprises:
acquiring all entities included in each question in the question set;
for each problem, obtaining a semantic vector of each entity in the problem based on the pre-trained BERT model and all entities included in the problem;
and obtaining the knowledge graph vector of the problem based on the semantic vectors of all the entities in the problem and a preset rule.
5. A method according to claim 3, wherein the process of training the BERT model comprises:
acquiring a training problem set, wherein the entity of each problem in the training problem set is marked with a real named entity label;
Inputting the training problem set into a BERT pre-training model to obtain training semantic vectors of words included in each problem in the training problem set;
inputting the obtained training semantic vector into a classification model to obtain a predicted named entity label of each entity;
and updating parameters of the BERT pre-training model and the classification model based on the real named entity label, the predicted named entity label and a back propagation algorithm until preset conditions are met, so as to obtain a trained BERT model.
6. The method of claim 4, wherein obtaining the entity that each question in the set of questions comprises:
inputting the question set into the BERT model to obtain semantic vectors of all words included in each question in the question set;
inputting the obtained semantic vector into a trained classification model to obtain a named entity label of an entity included in each problem;
and obtaining the entity included in each question according to the named entity label of the entity included in the question.
7. A question selecting apparatus, comprising:
an obtaining module, configured to obtain a pre-established problem knowledge graph, the problem knowledge graph comprising a problem set and the semantic similarity of any two problems in the problem set, wherein the problems in the problem set represent problem nodes in the problem knowledge graph, and the semantic similarity of any two problems represents the connection relationship of the two corresponding problem nodes;
The determining module is used for determining a first problem from the problem knowledge graph;
the determining module is further configured to determine, from the problem knowledge graph, a target problem related to semantic similarity of the first problem;
the output module is used for outputting the first problem and the target problem;
the determining module is specifically configured to determine a second problem with a semantic similarity with the first problem being greater than a first preset threshold from the problem knowledge graph, and delete the first problem from the problem knowledge graph to obtain a second problem knowledge graph; determining a third problem with the semantic similarity to the second problem larger than a second preset threshold value from the second problem knowledge graph, deleting the second problem from the second problem knowledge graph, and obtaining a third problem knowledge graph until a preset stopping condition is met; the target problem includes the second problem and the third problem.
8. An electronic device, comprising: the device comprises a memory and a processor, wherein the memory is connected with the processor;
the memory is used for storing programs;
the processor is configured to invoke a program stored in the memory to perform the method of any of claims 1-6.
9. A computer-readable storage medium, on which a computer program is stored, which computer program, when being run by a computer, performs the method according to any of claims 1-6.
CN202111105937.1A 2021-09-22 2021-09-22 Question selection method, knowledge graph construction device and electronic equipment Active CN113806558B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111105937.1A CN113806558B (en) 2021-09-22 2021-09-22 Question selection method, knowledge graph construction device and electronic equipment


Publications (2)

Publication Number Publication Date
CN113806558A CN113806558A (en) 2021-12-17
CN113806558B true CN113806558B (en) 2024-03-26

Family

ID=78939790


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105589976A (en) * 2016-03-08 2016-05-18 重庆文理学院 Object entity determining method and device based on semantic correlations
CN108764835A (en) * 2018-05-24 2018-11-06 广州合摩计算机科技有限公司 Reverse talent's pushed information method and apparatus
CN109062961A (en) * 2018-06-27 2018-12-21 淮阴工学院 A kind of expert's combination recommended method of knowledge based map
CN109840468A (en) * 2018-12-14 2019-06-04 深圳壹账通智能科技有限公司 A kind of generation method and equipment of customer analysis report
CN109857872A (en) * 2019-02-18 2019-06-07 浪潮软件集团有限公司 The information recommendation method and device of knowledge based map
CN110188208A (en) * 2019-06-04 2019-08-30 河海大学 A kind of the information resources inquiry recommended method and system of knowledge based map
CN110516055A (en) * 2019-08-16 2019-11-29 西北工业大学 A kind of cross-platform intelligent answer implementation method for teaching task of combination BERT
CN110619051A (en) * 2019-08-16 2019-12-27 科大讯飞(苏州)科技有限公司 Question and sentence classification method and device, electronic equipment and storage medium
CN112163076A (en) * 2020-09-27 2021-01-01 北京字节跳动网络技术有限公司 Knowledge question bank construction method, question and answer processing method, device, equipment and medium
CN113221547A (en) * 2021-01-21 2021-08-06 重庆邮电大学 Test question recommendation method based on information extraction and knowledge graph

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376309B (en) * 2018-12-28 2022-05-17 北京百度网讯科技有限公司 Document recommendation method and device based on semantic tags




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant