CN117371523A - Education knowledge graph construction method and system based on man-machine hybrid enhancement - Google Patents


Info

Publication number
CN117371523A
Authority
CN
China
Prior art keywords
model
knowledge
layer
education
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311388715.4A
Other languages
Chinese (zh)
Inventor
蔡林沁
王灏澜
蔡志伟
任波
唐晓铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202311388715.4A
Publication of CN117371523A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method and a system for constructing an education knowledge graph based on man-machine hybrid enhancement, and belongs to the technical field of knowledge graphs. The method designs a multi-level education knowledge graph ontology model based on outcome-based education (OBE). Its deep learning component comprises an educational entity recognition model that deeply fuses a domain dictionary, a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF), and an educational entity relation extraction model that deeply fuses a whole-word-masking Chinese pre-trained model, a bidirectional gated recurrent unit network (BiGRU) and an attention mechanism, so that named entities and relations are extracted automatically and efficiently. The system automatically completes educational entity recognition, relation extraction and graph construction by integrating the entity recognition and relation extraction models on a computer; education experts then adjust and optimize the educational entities, relations and graph using their domain knowledge and experience. Through repeated man-machine interaction, man-machine hybrid-enhanced intelligence is realized and a high-performance education knowledge graph is constructed.

Description

Education knowledge graph construction method and system based on man-machine hybrid enhancement
Technical Field
The invention belongs to the technical field of knowledge graphs, and relates to a method and system for constructing an education knowledge graph of the automation discipline based on man-machine hybrid enhancement.
Background
At present, methods for constructing education knowledge graphs focus on describing the semantic relationships among knowledge items, but cannot describe the mapping from educational resources to comprehensive competence and quality, so it is difficult to provide learning services that truly meet talent-training requirements. In fact, there is an inherent logical structure among knowledge, teaching resources, courses and competences. Therefore, how to mine the hierarchical mapping between knowledge and these elements and construct a multi-level education knowledge graph is a key problem to be solved.
In knowledge extraction tasks in the education domain, the text contains many specialized terms, nested entities and blurred word boundaries, so domain-specific entities cannot be identified accurately, and extracting relations between entities requires considering more of the relevant semantic features of the text. Knowledge extraction for automation-discipline education also lacks a specialized dataset, and the prior art cannot accurately extract the educational knowledge entities and relations of the automation discipline.
Methods for constructing education knowledge graphs include manual construction and automated construction. Manual construction generally requires extracting subject knowledge in the education domain by hand or by hand-written rules, and is inefficient; automated construction reduces the tedious work but cannot guarantee the quality and accuracy of the resulting graph. Therefore, a new education knowledge graph construction method and system is needed for efficiently constructing the education knowledge graph of the automation discipline.
Disclosure of Invention
In view of the above, the invention aims to provide a method and system for constructing an education knowledge graph of the automation discipline based on man-machine hybrid enhancement, which solve the problems that current construction methods cover only a single kind of knowledge content, extract entities and relations with low accuracy, and cannot combine manual and automated construction. For the automation discipline, the invention designs a multi-level education knowledge graph ontology based on outcome-based education (OBE). An Edu-BBiLC model and an Edu-BBiGA model are designed that take full account of the deep semantic features of automation-discipline text and extract educational knowledge entities and relations efficiently and accurately. In the construction system, educational entity recognition, relation extraction and graph construction are completed automatically by integrating the named entity recognition model and the relation extraction model on a computer; education experts then adjust and optimize the educational entities, relations and graph using their domain knowledge and experience; and through repeated man-machine interaction, man-machine hybrid-enhanced intelligence is realized and a high-performance education knowledge graph of the automation discipline is constructed.
In order to achieve the above purpose, the present invention provides the following technical solutions:
Scheme 1:
A method for constructing an education knowledge graph of the automation discipline based on man-machine hybrid enhancement uses a multi-level knowledge graph ontology based on outcome-based education (OBE) to connect the knowledge, courses and comprehensive competences in educational resources into a discipline system for training professionals. On the basis of named entity recognition and relation extraction datasets constructed for the automation discipline, the semantic information in education-domain text is fully exploited: a named entity recognition model that fuses a domain dictionary, and a relation extraction model that combines a whole-word-masking mechanism with an attention mechanism, are designed to accurately extract the educational knowledge entities and relations of the automation discipline, so that the education knowledge graph is constructed efficiently.
Finally, the construction system provided by the invention adjusts, supplements and optimizes the entities and relations automatically extracted from text by combining domain expert knowledge with machine intelligence, thereby improving the authority and accuracy of the constructed education knowledge graph of the automation discipline.
The method specifically comprises the following steps:
S1: Design and construct a multi-level education knowledge graph ontology based on outcome-based education (OBE) for the automation discipline;
S2: Construct a knowledge extraction dataset for the automation discipline, comprising a named entity recognition dataset and a relation extraction dataset, each of which comprises a training set, a validation set and a test set;
S3: Build an educational entity recognition model (Edu-BBiLC) for automatic named entity recognition; the Edu-BBiLC model deeply fuses a domain dictionary, a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF), improving the accuracy of extracting domain-specific educational entities; train the Edu-BBiLC model on the training set of the named entity recognition dataset constructed in step S2;
S4: Apply an early-stopping strategy using the validation set of the named entity recognition dataset constructed in step S2 and select the best-performing Edu-BBiLC model; test the selected model on the test set of the named entity recognition dataset to obtain the optimal named entity recognition model;
S5: Build an educational entity relation extraction model (Edu-BBiGA) that deeply fuses a whole-word-masking Chinese pre-trained model (BERT-wwm), a bidirectional gated recurrent unit network (BiGRU) and an attention mechanism (Attention) and automatically performs educational entity relation extraction; train the Edu-BBiGA model on the training set of the relation extraction dataset constructed in step S2;
S6: Apply an early-stopping strategy using the validation set of the relation extraction dataset constructed in step S2 and select the best-performing Edu-BBiGA model; test the selected model on the test set of the relation extraction dataset to obtain the optimal relation extraction model;
S7: Invoke the optimal named entity recognition model and the optimal relation extraction model from steps S4 and S6 to process input education-domain text; education experts then adjust and optimize the educational entities, relations and graph output by the models, realizing man-machine hybrid-enhanced intelligence and completing the construction of a high-performance education knowledge graph. This step specifically comprises:
S71: Wrap the optimal named entity recognition model (the Edu-BBiLC model) and the optimal relation extraction model (the Edu-BBiGA model) trained in steps S4 and S6 behind interfaces and deploy them to the system client.
S72: Invoke the interfaces from step S71 to process the input education-domain text; education experts then adjust and optimize the educational entities, relations and graph output by the models, realizing man-machine hybrid-enhanced intelligence and completing the construction of the high-performance education knowledge graph.
Further, in step S1, the OBE-based multi-level education knowledge graph ontology comprises four layers: a teaching resource layer, a knowledge system layer, a course system layer and a competence layer. The teaching resource layer comprises courseware, teaching plans, course outlines, teaching audio and video, and network resources. The knowledge system layer comprises knowledge points, knowledge units and knowledge fields together with their relations and attributes; typical relations include containment, prerequisite, sibling, identity and support relationships, and typical attributes include knowledge point importance, knowledge point difficulty, knowledge concept and support strength. The course system layer comprises courses and course objectives together with their relations and attributes; typical relations include containment, prerequisite and support relationships, and typical attributes include the course description. The competence layer comprises graduation requirements and competence index points together with their relations and attributes; the typical relation is containment, and typical attributes include the index point description.
Further, in step S2, constructing the knowledge extraction dataset for the automation discipline specifically comprises: extracting knowledge, including entities and relations, from automation-discipline material by semi-automatic labeling with manual assistance; labeling the dataset according to the requirements of the schema layer; and dividing the dataset into a training set, a validation set and a test set in proportion, thereby completing the construction of the named entity recognition dataset and the relation extraction dataset.
Further, in step S3, the Edu-BBiLC model comprises an educational text input layer, a character representation layer, a BiLSTM encoding layer, a CRF decoding layer and an entity output layer, specifically:
(1) In the educational text input layer, the characters of the education-domain Chinese text are first converted into feature vectors carrying semantic information; that is, the input Chinese text is represented as a character vector sequence through BERT-based character embedding;
(2) The character vector sequence is matched against the vocabulary of the domain dictionary, and the word features from the dictionary are concatenated with the character vector sequence to obtain a word-feature-enhanced character vector sequence, as follows: first, the words matched from the domain dictionary are classified into word tag sets; second, the word tag sets of each character are compressed into fixed-dimension vectors by word-frequency weighting; finally, the representations of the word tag sets are concatenated into a fixed-dimension feature, which is appended to the representation of the corresponding character in the character representation layer;
(3) The education-domain text is embedded at the character representation layer by the Edu-BERT model, segmented into a token sequence, and input into the BiLSTM encoding layer and the CRF decoding layer; the BiLSTM extracts semantic features from the context in both the forward and backward directions, and the CRF layer obtains the optimal predicted sequence by learning the constraints between labels; here the Edu-BERT model denotes a Chinese pre-trained model fused with the domain dictionary, and CRF denotes a conditional random field;
(4) After the input feature vector sequence has been encoded and decoded by the model, the predicted label of each character in the sequence is output; these labels represent the recognized education-domain entities and their semantic types.
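The per-character labels of item (4) still have to be grouped into entities. The following is a minimal sketch of that grouping, assuming labels of the form `B-kno`, `M-kno`, `E-kno`, `S-kno` and `O` for characters outside any entity; the `prefix-type` label format, the `O` label and the function name `decode_bmes` are illustrative assumptions, not details given in the text.

```python
from typing import List, Tuple


def decode_bmes(chars: List[str], tags: List[str]) -> List[Tuple[str, str, int, int]]:
    """Group BMES labels into (text, type, start, end) entity spans.

    Malformed sequences (for example an 'M' with no preceding 'B') are
    dropped rather than repaired.
    """
    entities = []
    start, current_type = None, None
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "S":
            # Single-character entity.
            entities.append((chars[i], etype, i, i))
            start, current_type = None, None
        elif prefix == "B":
            start, current_type = i, etype
        elif prefix == "M" and start is not None and etype == current_type:
            continue
        elif prefix == "E" and start is not None and etype == current_type:
            entities.append(("".join(chars[start:i + 1]), etype, start, i))
            start, current_type = None, None
        else:
            # 'O' or an inconsistent label: abandon any open span.
            start, current_type = None, None
    return entities


chars = list("根轨迹法用于系统分析")
tags = ["B-kno", "M-kno", "M-kno", "E-kno", "O", "O", "O", "O", "O", "O"]
print(decode_bmes(chars, tags))
```

The example sentence and its labels are made up for illustration only.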
Further, in step S5, the Edu-BBiGA model comprises an educational text input layer, an embedding layer, a BiGRU encoding layer, an Attention layer and a relation output layer, specifically:
(1) For the education-domain Chinese text input, a whole-word-masking mechanism is introduced in the pre-training stage of the embedding layer, so that the model learns word-level semantic information during pre-training, and the text is converted into a character vector sequence carrying word information;
(2) The BiGRU encoding layer extracts deep features containing forward and backward information from the sequence through a bidirectional gated recurrent unit network; after feature extraction, an attention mechanism is introduced in the Attention layer to assign different weights to different feature vectors, so that the vector sequence is fused into a single feature and the semantic information of strongly related words is captured better;
(3) The relation output layer outputs the relation prediction labels after normalization; the relation prediction labels correspond one-to-one to the relation categories of the educational knowledge entities.
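Items (2) and (3) can be sketched in a few lines of plain Python. The text does not say how attention scores are computed or how the output layer is parameterized, so this sketch assumes dot-product scoring against a learned query vector and a single linear layer followed by softmax; `attention_pool`, `classify`, `query` and `class_weights` are hypothetical names.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[List[float]], query: List[float]) -> List[float]:
    """Weight each hidden state by softmax(h . query) and sum them."""
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]


def classify(pooled: List[float], class_weights: List[List[float]]) -> List[float]:
    """Linear layer plus softmax: one probability per relation category."""
    logits = [sum(p * w for p, w in zip(pooled, row)) for row in class_weights]
    return softmax(logits)


# Toy values standing in for BiGRU outputs; not taken from the patent.
hidden_states = [[1.0, 0.0], [0.0, 1.0]]
pooled = attention_pool(hidden_states, query=[0.0, 0.0])
probs = classify(pooled, class_weights=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(pooled, probs)
```

With a zero query every position receives the same weight, so the pooled vector is the plain average of the hidden states; a trained query would shift weight toward the positions most indicative of the relation.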
Further, in step S4 or S6, an Adam optimizer is used to optimally train the Edu-BBiLC model or the Edu-BBiGA model.
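The early-stopping selection in steps S4 and S6 can be sketched independently of the optimizer. The sketch below assumes the validation metric is "higher is better" and that training stops after `patience` epochs without improvement; `train_one_epoch`, `evaluate`, `patience` and the score values are hypothetical placeholders, and a real run would call the Adam-optimized training step of the Edu-BBiLC or Edu-BBiGA model.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    train_one_epoch: Callable[[int], None],
    evaluate: Callable[[int], float],
    max_epochs: int = 50,
    patience: int = 3,
) -> Tuple[int, float]:
    """Return (best_epoch, best_score) on the validation set."""
    best_epoch, best_score, waited = -1, float("-inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        score = evaluate(epoch)
        if score > best_score:
            # A real implementation would also save a model checkpoint here.
            best_epoch, best_score, waited = epoch, score, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_score


# Placeholder validation scores, for illustration only.
val_scores = [0.61, 0.70, 0.74, 0.73, 0.74, 0.72, 0.71]
best = train_with_early_stopping(
    train_one_epoch=lambda epoch: None,
    evaluate=lambda epoch: val_scores[epoch],
    max_epochs=len(val_scores),
    patience=3,
)
print(best)
```

The checkpoint from the best epoch is the one that would then be evaluated once on the test set.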
Further, in step S71, the trained optimal model is encapsulated as an API service; the service is built with the Flask back-end framework to provide the interface and is deployed to a server.
Scheme 2:
A system for constructing an education knowledge graph of the automation discipline based on man-machine hybrid enhancement comprises a text input module, a knowledge extraction module, a hybrid enhancement module and a data storage module;
the text input module is used for inputting unstructured text of the automation discipline;
the knowledge extraction module is used for invoking the optimal named entity recognition model and the optimal relation extraction model to automatically extract entities and relations; the named entity recognition model and the relation extraction model are respectively the Edu-BBiLC model and the Edu-BBiGA model according to any one of claims 1 to 6;
the hybrid enhancement module allows experts in the education domain to further adjust and optimize the extracted educational entities and relations through man-machine interaction;
the data storage module is used for storing the entity-relation triples finalized after optimization and adjustment in a graph database.
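The flow through the four modules can be sketched as follows. This is an illustrative outline only: the two extractor callables stand in for the Edu-BBiLC and Edu-BBiGA services, `ExpertEdit` models an expert's correction, and `InMemoryGraphStore` stands in for the graph database; all of these names, and the example triples, are assumptions rather than parts of the described system.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


@dataclass
class ExpertEdit:
    action: str  # "add" or "remove"
    triple: Triple


def extract_triples(
    text: str,
    recognize_entities: Callable[[str], List[str]],
    classify_relation: Callable[[str, str, str], Optional[str]],
) -> List[Triple]:
    """Knowledge extraction module: entities first, then pairwise relations."""
    entities = recognize_entities(text)
    triples = []
    for head in entities:
        for tail in entities:
            if head == tail:
                continue
            relation = classify_relation(text, head, tail)
            if relation is not None:
                triples.append((head, relation, tail))
    return triples


def apply_expert_edits(triples: List[Triple], edits: List[ExpertEdit]) -> List[Triple]:
    """Hybrid enhancement module: apply expert additions and removals."""
    result = list(dict.fromkeys(triples))  # de-duplicate, keep order
    for edit in edits:
        if edit.action == "remove" and edit.triple in result:
            result.remove(edit.triple)
        elif edit.action == "add" and edit.triple not in result:
            result.append(edit.triple)
    return result


class InMemoryGraphStore:
    """Data storage module stand-in; a real system would write to a graph database."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def save(self, triples: List[Triple]) -> None:
        self.triples.update(triples)


# Stub models for illustration only.
def fake_ner(text: str) -> List[str]:
    return ["PID control", "proportional control"]


def fake_re(text: str, head: str, tail: str) -> Optional[str]:
    return "contains" if head == "PID control" else None


auto = extract_triples("PID control includes proportional control.", fake_ner, fake_re)
final = apply_expert_edits(auto, [ExpertEdit("add", ("PID control", "contains", "integral control"))])
store = InMemoryGraphStore()
store.save(final)
print(sorted(store.triples))
```

Keeping the expert edits as explicit add/remove records makes each round of man-machine interaction auditable and repeatable.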
Further, the system adopts a front-end and back-end separated architecture; the front end uses the Vue framework to implement the user interface and interaction logic; the backend uses a SpringBoot framework to implement backend services and API interfaces.
The invention has the beneficial effects that:
(1) Compared with traditional education knowledge graph ontologies, the ontology model of the invention describes not only the semantic relationships among knowledge items but also the mapping from educational resources to comprehensive competence and quality, and can therefore support learning services that truly meet talent-training needs.
(2) In knowledge extraction, domain-specific entities are hard to identify because education-domain text contains many specialized terms, nested entities and blurred word boundaries; the Edu-BBiLC model provided by the invention addresses this by combining a domain dictionary to introduce more word-level features. To extract relations between education-domain entities more quickly and accurately, the invention adopts the Edu-BBiGA model, which introduces a whole-word-masking mechanism and combines a bidirectional gated recurrent network with an attention mechanism to capture the relevant information between words, improving the efficiency of relation extraction.
(3) With the method and system provided by the invention, experts in the education domain can manually adjust the entities and relations on top of the automatic knowledge extraction, thereby optimizing the whole education knowledge network. Compared with purely manual and purely automated construction methods, the method and system are more accurate and convenient, and improve the efficiency of constructing the education knowledge graph of the automation discipline.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is a schematic flow chart of the method for constructing an education knowledge graph of the automation discipline based on man-machine hybrid enhancement;
FIG. 2 is a diagram of the multi-level education knowledge graph ontology model based on outcome-based education (OBE);
FIG. 3 is a diagram of the named entity recognition model (Edu-BBiLC) used in knowledge extraction;
FIG. 4 is a diagram of the relation extraction model (Edu-BBiGA) used in knowledge extraction;
FIG. 5 is a block diagram of the system for constructing an education knowledge graph of the automation discipline based on man-machine hybrid enhancement.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may also be practiced or applied in other embodiments, and the details of this description may be modified or varied without departing from the spirit and scope of the present invention. It should be noted that the illustrations provided in the following embodiments only illustrate the basic idea of the present invention schematically, and the following embodiments and the features in the embodiments may be combined with each other where no conflict arises.
Referring to FIG. 1 to FIG. 5, the invention provides a method for constructing an education knowledge graph of the automation discipline based on man-machine hybrid enhancement, which specifically comprises the following steps:
Step one: obtain a corpus from the knowledge sources and construct the OBE-based education knowledge graph ontology of the automation discipline. Specifically, concepts and relations are abstracted from course outlines, teaching materials and teaching plans; following the outcome-based approach and taking the cultivation of students' comprehensive competence and quality as the goal, a multi-level education knowledge graph ontology model is obtained and serves as the schema layer of the knowledge graph, which constrains and standardizes the data layer.
In the embodiment of the invention, with the goal of cultivating students' ability to solve complex engineering problems, an outcome-oriented multi-level education knowledge graph ontology model is constructed based on the OBE approach. The ontology model is shown in FIG. 2 and comprises, from bottom to top, four layers: teaching resources, knowledge system, course system and competence.
The teaching resource layer is the basis of the other layers. It contains the various educational resources involved in the teaching process, such as courseware, teaching plans and course outlines, as well as network resources (such as material related to the automation discipline on Baidu Baike and Wikipedia), which are the sources and foundation of knowledge.
The knowledge system layer standardizes the knowledge system. In terms of knowledge content it is divided into three levels, and it comprises knowledge points, knowledge units and knowledge fields together with their relations and attributes; typical relations include containment, prerequisite, sibling, identity and support relationships, and typical attributes include knowledge point importance, knowledge point difficulty, knowledge concept and support strength.
The course system layer mainly connects knowledge, courses and objectives. It comprises courses and course objectives together with their relations and attributes; typical relations include containment, prerequisite and support relationships, and typical attributes include the course description.
The competence layer takes the cultivation of students' competence and quality as its starting point and is based on the training program of the discipline; it comprehensively evaluates the actual skills and abilities that students acquire during learning by defining graduation requirements and competence index points. The competence layer comprises graduation requirements and competence index points together with their relations and attributes; the typical relation is containment, and typical attributes include the index point description.
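The four layers described above can be written down as a small schema that the data layer is checked against. The sketch below is one possible encoding, assuming snake_case identifiers for the entity types, relations and attributes named in the text; the identifiers and the `is_allowed` helper are illustrative, and the lists are not exhaustive because the description enumerates only typical relations and attributes.

```python
from typing import Dict, List

# Layers listed from bottom to top, as in the description.
LAYER_ORDER: List[str] = [
    "teaching_resource",
    "knowledge_system",
    "course_system",
    "competence",
]

ONTOLOGY: Dict[str, Dict[str, List[str]]] = {
    "teaching_resource": {
        "entities": ["courseware", "teaching_plan", "course_outline",
                     "audio_video", "network_resource"],
        "relations": [],
        "attributes": [],
    },
    "knowledge_system": {
        "entities": ["knowledge_point", "knowledge_unit", "knowledge_field"],
        "relations": ["contains", "prerequisite", "sibling", "identical", "supports"],
        "attributes": ["importance", "difficulty", "concept", "support_strength"],
    },
    "course_system": {
        "entities": ["course", "course_objective"],
        "relations": ["contains", "prerequisite", "supports"],
        "attributes": ["course_description"],
    },
    "competence": {
        "entities": ["graduation_requirement", "competence_index_point"],
        "relations": ["contains"],
        "attributes": ["index_point_description"],
    },
}


def is_allowed(layer: str, entity_type: str, relation: str) -> bool:
    """Check a candidate (entity type, relation) pair against the schema layer."""
    schema = ONTOLOGY[layer]
    return entity_type in schema["entities"] and relation in schema["relations"]


print(is_allowed("competence", "graduation_requirement", "contains"))
print(is_allowed("competence", "graduation_requirement", "sibling"))
```

A check of this kind is one way the schema layer can constrain the data layer: extracted triples whose relation is not defined for the layer are rejected or sent to an expert for review.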
Step two: carry out the knowledge extraction tasks under the constraint of the schema layer. In the embodiment of the invention, datasets for automation-discipline knowledge extraction are constructed first, comprising a named entity recognition dataset and a relation extraction dataset. Knowledge (including entities and relations) is extracted semi-manually from several automation-discipline courses, such as Automatic Control Principles, Modern Control Theory and Computer Control Technology; the datasets are then labeled according to the requirements of the schema layer and divided into training, validation and test sets in suitable proportions.
(1) A named entity recognition dataset is constructed for educational concept recognition. First, an entity label set is created according to the entity labeling standard provided by domain experts; entity labels fall into three categories: "kno" represents knowledge points, "edu" represents teaching requirements, and "cou" represents course objectives. Second, the unstructured text in the corpus is labeled semi-manually according to the entities in the label set, using the BMES labeling scheme. Finally, the labeled dataset is divided into a training set, a validation set and a test set in the ratio 6:2:2.
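As an illustration of BMES labeling with the three entity categories, the sketch below converts character spans into per-character labels. It assumes that characters outside any entity are labeled `O` and that the category is attached as a suffix (for example `B-kno`); neither convention is stated in the text, and `bmes_tags` and the example sentence are hypothetical.

```python
from typing import List, Tuple


def bmes_tags(sentence: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Label each character; spans are (start, end_exclusive, category)."""
    tags = ["O"] * len(sentence)
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"  # single-character entity
        else:
            tags[start] = f"B-{label}"  # beginning
            for i in range(start + 1, end - 1):
                tags[i] = f"M-{label}"  # middle
            tags[end - 1] = f"E-{label}"  # end
    return tags


sentence = "掌握传递函数"
labels = bmes_tags(sentence, [(2, 6, "kno")])
for char, tag in zip(sentence, labels):
    print(char, tag)
```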
(2) For the relation classification task of educational relation extraction, this embodiment constructs a relation extraction dataset. Based on the determined entities, the relations between entities are classified into six types: containment, sibling, prerequisite, method-of-use, support and identity relationships. In this embodiment, the text in the corpus is processed by semi-manual labeling; after careful review and comprehensive evaluation by domain experts, text data with relation labels is obtained. In each record, the text before the "$" marker gives the entity, the "#" marker indicates the entity's position in the sentence, and the numeric tag at the start of the record indicates the corresponding relation type. Finally, the data is divided into a training set, a validation set and a test set in the ratio 8:1:1.
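The two splits (6:2:2 for entity recognition and 8:1:1 for relation extraction) can be produced by one helper. This is a minimal sketch assuming a shuffled random split with a fixed seed; the text does not say whether the data is shuffled or stratified, and `split_dataset` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    samples: Sequence, ratios: Tuple[int, int, int] = (6, 2, 2), seed: int = 0
) -> Tuple[List, List, List]:
    """Shuffle and split into (train, validation, test) by integer ratios."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


ner_train, ner_val, ner_test = split_dataset(range(100), ratios=(6, 2, 2))
re_train, re_val, re_test = split_dataset(range(100), ratios=(8, 1, 1))
print(len(ner_train), len(ner_val), len(ner_test))
print(len(re_train), len(re_val), len(re_test))
```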
Step three: build the educational entity recognition model.
The educational entity recognition model in this embodiment is the Edu-BBiLC model, whose architecture is shown in FIG. 3. It comprises five layers: an educational text input layer, a character representation layer, a BiLSTM encoding layer, a CRF decoding layer and an entity output layer.
The details are as follows:
(1) In the input layer, characters in Chinese text need to be converted into feature vectors with semantic information. Let the input text be the character sequence s= { c 1 ,c 2 ,...,c n Each character c i Represented as vectors by embeddingWherein e B c Represents BERT-based character embedding. Thereby obtaining a character vector sequence
(2) In this embodiment, word features from the domain dictionary are integrated and concatenated with the character vector sequence, yielding a word-feature-enhanced character vector sequence representation.
First, the words matched from the domain dictionary are classified. This embodiment uses four word tag sets, "B", "M", "E" and "S", to group all words in the domain dictionary that match each character $c_i$. For each character $c_i$ in the input text sequence $S$, the tag sets are defined as follows:
where $L_d$ denotes the set of all words in the domain dictionary, $w$ denotes a matched word, $c_i$ denotes the character being matched, $i$ indicates the position of the character within the word, $p$ denotes a position that is not the first but precedes $i$, and $q$ denotes a position that is not the last but follows $i$. If a word tag set contains no matching word for the character, "NULL" is added to that empty set as a placeholder.
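A minimal sketch of this matching step follows. It assumes that a dictionary word places its first character in "B", its last character in "E", its inner characters in "M", and that a single-character dictionary word goes into "S". The function name and the `max_word_len` cut-off are hypothetical, and the exact set definitions of the embodiment may differ.

```python
def match_tag_sets(sentence, lexicon, max_word_len=8):
    """Collect, for every character, the dictionary words it begins (B),
    sits inside (M), ends (E), or forms on its own (S)."""
    n = len(sentence)
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in range(n)]
    for start in range(n):
        for end in range(start + 1, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if end - start == 1:
                sets[start]["S"].add(word)
                continue
            sets[start]["B"].add(word)
            sets[end - 1]["E"].add(word)
            for k in range(start + 1, end - 1):
                sets[k]["M"].add(word)
    # Empty sets receive the "NULL" placeholder described in the text.
    for char_sets in sets:
        for tag in "BMES":
            if not char_sets[tag]:
                char_sets[tag].add("NULL")
    return sets
```

For example, with a dictionary containing 控制 and 控制系统, the character 制 in 控制系统 ends the first word and lies inside the second, so it appears in both an "E" set and an "M" set.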
Second, the word tag set of each character is compressed into a fixed-dimension vector by word-frequency weighting. Let $f(w)$ denote the frequency with which word $w$ occurs in the statistical data; from it, the word-frequency-weighted representation of a word tag set $S_L$ is obtained:
where the word embedding used in the formula is BERT-based. The word statistics used above are computed from the task training set.
Finally, the representations of the four word tag sets are concatenated into a fixed-dimension feature $X_w$, which is appended to the representation of each corresponding character in the character representation layer: $X_w = [v_s(B); v_s(M); v_s(E); v_s(S)]$ and $x^{c\prime} = [x^c; X_w]$, where $v_s$ is the word-frequency-weighted representation function.
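The weighting and concatenation can be sketched as follows, in plain Python lists. The sketch assumes that each tag set is normalized by the total frequency of its own words and that "NULL" or unseen words contribute a zero vector. The weighting formula itself is not reproduced in the text, so this normalization is an assumption, and the function names are hypothetical.

```python
def weighted_set_vector(words, freq, embed, dim):
    """Frequency-weighted average of the embeddings of one word tag set."""
    total = sum(freq.get(w, 0) for w in words)
    vec = [0.0] * dim
    if total == 0:
        return vec                      # e.g. a set holding only "NULL"
    for w in words:
        count = freq.get(w, 0)
        if count == 0:
            continue                    # unseen words carry no weight
        weight = count / total
        for d, value in enumerate(embed[w]):
            vec[d] += weight * value
    return vec


def enhance_char_vector(char_vec, tag_sets, freq, embed, dim):
    """Append [v(B); v(M); v(E); v(S)] to the character vector."""
    x_w = []
    for tag in "BMES":
        x_w.extend(weighted_set_vector(tag_sets[tag], freq, embed, dim))
    return list(char_vec) + x_w
```

The output length is the character dimension plus four times the word embedding dimension, which keeps the feature size fixed regardless of how many dictionary words match a character.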
(3) The educational text is embedded into vectors at the character representation layer by the Edu-BERT model, split into a token sequence, and fed into the BiLSTM encoding layer and the CRF decoding layer. Let the input sequence be $X = \{x_1, x_2, x_3, \ldots, x_n\}$. For an input $x_i$, the feature vector of the preceding context, $\overrightarrow{h_i}$, is computed by the forward LSTM, and the feature vector of the following context, $\overleftarrow{h_i}$, is computed by the backward LSTM. The two are combined into the feature vector $h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$. From $h_i$, the trained BiLSTM network produces the label prediction probability matrix $P_i$ of the sequence. From $P_i$, the CRF layer computes the state transition matrix $A_i$ between adjacent labels. After processing by the model, the final sequence labeling prediction is $Y = \{y_1, y_2, y_3, \ldots, y_n\}$. The probability of a predicted labeling is $p(Y \mid X) = \frac{e^{s(X, Y)}}{\sum_{\tilde{Y} \in Y_X} e^{s(X, \tilde{Y})}}$, where $Y_X$ is the set of all label sequences that the input sequence $X$ may correspond to, $\tilde{Y}$ is a label sequence in that set, and $s(X, Y)$ is the prediction score of a labeling.
(4) In the final decoding, the label sequence $y^*$ with the highest prediction score is selected as the final label sequence prediction: $y^* = \arg\max_{\tilde{Y} \in Y_X} s(X, \tilde{Y})$. After the input feature vector sequence has been processed by the model, the predicted label of each character in the sequence is output. These labels represent the identified entities and their semantic types. In this embodiment, the three entity types have distinct labels: "kno" represents knowledge points, "edu" represents teaching requirements, and "cou" represents course goals; entities are labeled using the BMES scheme.
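Selecting the highest-scoring label sequence is commonly done with Viterbi decoding. The sketch below assumes the usual linear-chain form in which the score of a sequence is the sum of per-position emission scores and label-to-label transition scores; the text does not spell out the score function, so this form and the function name are assumptions.

```python
def viterbi_decode(emissions, transitions):
    """Return the best label index path and its score.

    emissions[t][k]   : score of label k at position t
    transitions[a][b] : score of moving from label a to label b
    """
    n_labels = len(transitions)
    best = list(emissions[0])           # best score ending in each label
    backptr = []
    for scores in emissions[1:]:
        step_best, step_ptr = [], []
        for cur in range(n_labels):
            prev = max(range(n_labels),
                       key=lambda p: best[p] + transitions[p][cur])
            step_ptr.append(prev)
            step_best.append(best[prev] + transitions[prev][cur] + scores[cur])
        best = step_best
        backptr.append(step_ptr)
    # Trace the best path backwards from the best final label.
    last = max(range(n_labels), key=lambda k: best[k])
    path = [last]
    for step_ptr in reversed(backptr):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path, best[last]
```

In a BMES setting the transition scores are what discourage invalid sequences, such as an "E" tag directly following an "S" tag of a different entity type.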
Step four: build the educational relation extraction model.
The educational relation extraction model in this embodiment is the Edu-BBiGA model, whose architecture is shown in Fig. 4. It comprises an educational text input layer, an embedding layer, a BiGRU encoding layer, an Attention layer, and a relation output layer. The details are as follows:
(1) For the character sequence $S = \{c_1, c_2, \ldots, c_n\}$ of an input text, each character $c_i$ is represented as a vector by embedding, which yields the character vector sequence. When training samples are generated in the pre-training stage, if part of a word is masked, the whole word masking mechanism masks all Chinese characters that make up that word.
(2) In this embodiment, the BiGRU encoding layer is written in simplified form as $h_{jit} = \mathrm{BiGRU}(c_{jit})$. At time step $t$, the vector of the $i$-th character of the $j$-th input sentence is denoted $c_{jit}$, and $h_{jit}$ is the feature vector obtained after feature extraction by the BiGRU encoding layer.
(3) The semantic connections between educational knowledge entities are critical to relation classification. Therefore, to highlight the importance of different words to the overall relation classification task, this embodiment introduces an attention mechanism at the Attention layer. By assigning different weights to different feature vectors, the vector sequence is fused so that the most important semantic information is retained. The weight coefficients of the attention mechanism are calculated as follows:
$u_{jit} = \tanh(w_w h_{jit} + b_w)$
where $u_{jit}$ denotes the attention matrix after the linear-layer transformation, $\alpha_{jit}$ denotes the attention coefficient, $s_{jit}$ denotes the weighted feature vector, $w_w$ denotes the weight coefficient, $h_{jit}$ denotes the feature vector extracted by the BiGRU layer, $b_w$ denotes the bias coefficient, and $u_w$ denotes a randomly initialized attention matrix.
(4) Finally, the result is normalized and the prediction is output: $y_j = \mathrm{softmax}(w_k s_{jit} + b_k)$, where $w_k$ denotes the trainable weight matrix from the Attention layer to the output layer and $b_k$ denotes the corresponding trainable bias. $y_j$ is the relation prediction label output for the $j$-th sentence; the labels correspond one-to-one to the relation categories between educational knowledge entities.
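The attention and output steps can be sketched in plain Python. Only the tanh transformation and the final softmax appear explicitly in the text; the sketch assumes the common additive-attention form in which the coefficient for each time step is a softmax over the dot product of the transformed vector with $u_w$, and the sentence vector is the coefficient-weighted sum of the BiGRU outputs. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def attention_pool(hidden, w_w, b_w, u_w):
    """Weight each BiGRU output vector and sum them into one sentence vector."""
    scores = []
    for h in hidden:
        u = [math.tanh(z + b) for z, b in zip(matvec(w_w, h), b_w)]
        scores.append(sum(a * b for a, b in zip(u, u_w)))
    alpha = softmax(scores)             # assumed form of the coefficients
    dim = len(hidden[0])
    pooled = [sum(a * h[d] for a, h in zip(alpha, hidden)) for d in range(dim)]
    return alpha, pooled


def classify(pooled, w_k, b_k):
    """Relation probabilities: softmax(w_k * s + b_k)."""
    logits = [z + b for z, b in zip(matvec(w_k, pooled), b_k)]
    return softmax(logits)
```

The index of the largest probability returned by `classify` would then be mapped to one of the six relation categories.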
Step five: the named entity recognition dataset and the relation extraction dataset are each fed into the knowledge extraction framework for model training, and the optimal trained model is obtained through model selection on the validation set and evaluation on the test set. Four commonly used evaluation metrics are adopted: accuracy, precision, recall, and F1 score.
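As an illustration of the four metrics, the sketch below computes them for a list of gold and predicted labels. Macro-averaging over classes is an assumption; the text does not say which averaging scheme is used, and the function name is hypothetical.

```python
def classification_metrics(gold, pred):
    """Accuracy plus macro-averaged precision, recall and F1."""
    labels = sorted(set(gold) | set(pred))
    pairs = list(zip(gold, pred))
    accuracy = sum(g == p for g, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For entity recognition, the same function could be applied to per-character tags, although entity-level scoring (matching whole spans) is also common and would give different numbers.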
The Edu-BBiLC model is trained within the knowledge extraction framework, which is built on a deep learning framework, with the following settings: the total number of training iterations is 30, the maximum length of the text input sequence is 150, and the hidden layer dimension of the LSTM network in the model is 128. The batch size for training and validation is 32. In the experiments, the Adam optimizer is used, with the learning rate set to 3e-5 and the dropout rate set to 0.3.
The Edu-BBiGA model is trained within the same knowledge extraction framework, with the following settings: the total number of training iterations is 30, the maximum length of the text input sequence is 120, and the hidden layer dimension of the GRU network in the model is 128. The batch size for training and validation is 16. In the experiments, the Adam optimizer is used, with the learning rate set to 1e-4 and the dropout rate set to 0.2.
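The two sets of hyperparameters can be captured in a small configuration object, shown here together with a sketch of validation-based model selection. The class and function names are hypothetical, and the `patience` value is an assumption, since the text gives no stopping threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    iterations: int        # total training iterations
    max_seq_len: int       # maximum input sequence length
    hidden_dim: int        # hidden size of the LSTM / GRU network
    batch_size: int        # training and validation batch size
    learning_rate: float   # Adam learning rate
    dropout: float         # dropout rate


EDU_BBILC = TrainConfig(iterations=30, max_seq_len=150, hidden_dim=128,
                        batch_size=32, learning_rate=3e-5, dropout=0.3)
EDU_BBIGA = TrainConfig(iterations=30, max_seq_len=120, hidden_dim=128,
                        batch_size=16, learning_rate=1e-4, dropout=0.2)


def select_best_iteration(val_scores, patience=3):
    """Return (index, score) of the best validation result, stopping once
    `patience` consecutive iterations bring no improvement."""
    best_idx, best_score = -1, float("-inf")
    for idx, score in enumerate(val_scores):
        if score > best_score:
            best_idx, best_score = idx, score
        elif idx - best_idx >= patience:
            break
    return best_idx, best_score
```

The checkpoint saved at the returned index would be the one evaluated on the test set.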
Step six: the automation discipline teaching knowledge graph construction system based on man-machine hybrid enhancement adopts a front-end/back-end separated architecture and covers database connection and storage, deployment and invocation of the knowledge extraction models, and a user interface for man-machine hybrid enhancement. The details are as follows:
(1) Front-end/back-end separated architecture:
The front end uses the Vue framework to implement the user interface and interaction logic.
The back end uses the SpringBoot framework to implement back-end services and API interfaces.
(2) Database selection:
The Neo4j graph database is used to store and manage knowledge graph data.
(3) Knowledge extraction module:
The module integrates the trained deep learning models: a named entity recognition (NER) model and a relation extraction (RE) model.
When the user enters unstructured text in the text input module, the optimal NER and RE models are invoked to extract entity and relation information.
The extracted entities and relations are output as triples.
(4) Hybrid enhancement module:
Interfaces and operations are provided for domain experts to manually adjust and optimize the entity-relation triples.
Triples can be added, deleted, or edited to improve the accuracy and completeness of the data.
(5) Data storage module:
The finalized entity-relation triples are stored in the Neo4j graph database.
The back end stores and queries knowledge graph data using Neo4j's graph structure and query language.
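The flow from expert-edited triples to graph storage can be sketched as follows. The sketch only builds Cypher text and parameters; the `Entity` label, the `name` property, the helper names, and the restriction of relation names to ASCII are illustrative assumptions, not the system's actual schema. Python is used here purely for illustration, whereas the described back end is built on SpringBoot.

```python
import re


def relation_type(name):
    """Turn a relation name into a Cypher-safe relationship type."""
    cleaned = re.sub(r"\W+", "_", name.strip(), flags=re.ASCII)
    cleaned = cleaned.strip("_").upper()
    if not cleaned or cleaned[0].isdigit():
        raise ValueError(f"unusable relation name: {name!r}")
    return cleaned


def apply_expert_edits(triples, add=(), remove=()):
    """Apply expert additions and deletions to the extracted triples."""
    removed = set(remove)
    kept = [t for t in triples if t not in removed]
    for triple in add:
        if triple not in kept:
            kept.append(triple)
    return kept


def triple_to_cypher(head, relation, tail):
    """Build a MERGE statement and its parameters for one triple.

    Relationship types cannot be passed as query parameters, so the
    sanitized type is placed directly in the query text.
    """
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{relation_type(relation)}]->(t)"
    )
    return query, {"head": head, "tail": tail}
```

Using MERGE rather than CREATE keeps repeated imports of the same triple from producing duplicate nodes or relationships.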
The system supports building and updating the automation discipline education knowledge graph. When the user enters unstructured text, the system automatically extracts the entities and relations in it, and the results are refined and completed through manual operation. Finally, the extracted and completed entity-relation triples are stored in the Neo4j graph database for subsequent knowledge graph query and application.
In the automation discipline education knowledge graph construction system based on man-machine hybrid enhancement designed by the invention, the entity relations automatically extracted from text are adjusted, supplemented and optimized through man-machine cooperation. This realizes the hybrid enhancement of domain expert knowledge and machine intelligence and further improves the authority and accuracy of the constructed educational knowledge graph.
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims (8)

1. A method for constructing an educational knowledge graph based on man-machine hybrid enhancement, characterized by comprising the following steps:
S1: designing and constructing an outcome-based multi-level educational knowledge graph ontology for the automation discipline field;
S2: constructing a knowledge extraction dataset for the automation discipline field, wherein the knowledge extraction dataset comprises a named entity recognition dataset and a relation extraction dataset, each of which comprises a training dataset, a validation dataset and a test dataset;
S3: constructing an educational entity recognition model Edu-BBiLC for the automatic named entity recognition task, wherein the Edu-BBiLC model deeply fuses a domain dictionary, a bidirectional long short-term memory network BiLSTM and a conditional random field CRF, improving the extraction accuracy for education-specific entities; and training the Edu-BBiLC model using the training dataset of the named entity recognition dataset constructed in step S2;
S4: executing an early stopping strategy using the validation dataset of the named entity recognition dataset constructed in step S2, and selecting the Edu-BBiLC model with the best performance; testing the selected Edu-BBiLC model using the test dataset of the named entity recognition dataset to obtain the optimal named entity recognition model;
S5: constructing an educational entity relation extraction model Edu-BBiGA, which deeply fuses the whole-word-masking Chinese pre-trained model BERT-wwm, a bidirectional gated recurrent unit network BiGRU and an attention mechanism (Attention); and training the Edu-BBiGA model using the training dataset of the relation extraction dataset constructed in step S2;
S6: executing an early stopping strategy using the validation dataset of the relation extraction dataset constructed in step S2, and selecting the Edu-BBiGA model with the best performance; testing the selected Edu-BBiGA model using the test dataset of the relation extraction dataset to obtain the optimal relation extraction model;
S7: invoking the optimal named entity recognition model and the optimal relation extraction model obtained in steps S4 and S6 to process input educational text, with education domain experts further adjusting and optimizing the educational entities, relations and patterns output by the models; man-machine hybrid enhanced intelligence is thereby realized and the construction of a high-performance educational knowledge graph is completed.
2. The method for constructing an educational knowledge graph based on man-machine hybrid enhancement according to claim 1, wherein in step S1 the outcome-based multi-level educational knowledge graph ontology comprises four levels, namely a teaching resource layer, a knowledge system layer, a course system layer and a competence layer; the teaching resource layer comprises courseware, teaching plan, course outline, teaching plan, teaching audio and video, and network resources; the knowledge system layer comprises knowledge points, knowledge units, knowledge fields, and their relations and attributes, wherein typical relations include inclusion, prerequisite, sibling, identical and supporting relations, and typical attributes include knowledge point importance, knowledge point difficulty, knowledge concepts and supporting strength; the course system layer comprises courses, course goals, and their relations and attributes, wherein typical relations include inclusion, prerequisite and supporting relations, and typical attributes include course descriptions; the competence layer comprises graduation requirements, capability index points, competence index points, and their relations and attributes, wherein typical relations include the inclusion relation, and typical attributes include index point descriptions.
3. The method for constructing an educational knowledge graph based on man-machine hybrid enhancement according to claim 1, wherein in step S2 constructing the knowledge extraction dataset for the automation discipline field specifically comprises: extracting knowledge comprising entities and relations from the automation discipline by a semi-manual labeling method, then annotating the dataset with reference to the requirements of the schema layer, and dividing the dataset into a training set, a validation set and a test set in proportion, thereby completing the construction of the named entity recognition dataset and the relation extraction dataset.
4. The method for constructing an educational knowledge graph based on man-machine hybrid enhancement according to claim 1, wherein in step S3 the constructed Edu-BBiLC model comprises an educational text input layer, a character representation layer, a BiLSTM encoding layer, a CRF decoding layer and an entity output layer, and specifically:
(1) in the educational text input layer, the characters of the Chinese educational text are first converted into feature vectors carrying semantic information, that is, the input Chinese text is represented as a character vector sequence through BERT-based character embedding;
(2) the character vector sequence is matched against the vocabulary of the domain dictionary, and the word features from the domain dictionary are integrated and concatenated with the character vector sequence to obtain a word-feature-enhanced character vector sequence representation, comprising: first, classifying the words matched from the domain dictionary by means of word tag sets; second, compressing the word tag set of each character into a fixed-dimension vector by word-frequency weighting; finally, concatenating the representations of the word tag sets into a fixed-dimension feature and appending it to the representation of each corresponding character in the character representation layer;
(3) the educational text is embedded into vectors at the character representation layer by the Edu-BERT model and then split into a token sequence that is fed into the BiLSTM encoding layer and the CRF decoding layer; the BiLSTM extracts semantic features from the forward and backward context, and the CRF layer obtains the optimal prediction sequence by learning the constraints between labels; wherein the Edu-BERT model denotes a Chinese pre-trained model fused with the domain dictionary, and CRF denotes a conditional random field;
(4) after the input feature vector sequence has been encoded and decoded by the model, the predicted label of each character in the sequence is output; these labels represent the identified educational entities and their semantic types.
5. The method for constructing an educational knowledge graph based on man-machine hybrid enhancement according to claim 1, wherein in step S5 the constructed Edu-BBiGA model comprises an educational text input layer, an embedding layer, a BiGRU encoding layer, an Attention layer and a relation output layer, and specifically:
(1) in the pre-training stage for Chinese educational text input, a whole word masking mechanism is introduced into the embedding layer, so that the model learns word-level semantic information during pre-training and the text is converted into a character vector sequence carrying word information;
(2) the BiGRU encoding layer extracts deep features containing forward and backward information from the sequence through a bidirectional gated recurrent unit network; after feature extraction, an attention mechanism is introduced at the Attention layer, which assigns different weights to different feature vectors and fuses the vector sequence accordingly, so that the semantic information of strongly related words is better captured;
(3) the relation prediction labels are normalized and output by the relation output layer, the relation prediction labels corresponding one-to-one to the relation categories of the educational knowledge entities.
6. The method for constructing an educational knowledge graph based on man-machine hybrid enhancement according to claim 1, wherein in step S4 or S6 an Adam optimizer is used to train the Edu-BBiLC model or the Edu-BBiGA model.
7. An educational knowledge graph construction system based on man-machine hybrid enhancement, characterized by comprising a text input module, a knowledge extraction module, a hybrid enhancement module and a data storage module;
the text input module is used for inputting unstructured text from the automation discipline education field;
the knowledge extraction module is used for invoking the optimal named entity recognition model and the optimal relation extraction model to automatically extract entities and relations; the named entity recognition model and the relation extraction model are respectively the Edu-BBiLC model and the Edu-BBiGA model according to any one of claims 1 to 6;
the hybrid enhancement module enables teaching domain experts to further adjust and optimize the extracted educational entities and relations through man-machine interaction;
the data storage module is used for storing the entity-relation triples finalized after optimization and adjustment in the graph database.
8. The educational knowledge graph construction system based on man-machine hybrid enhancement according to claim 7, wherein the system adopts a front-end/back-end separated architecture; the front end uses the Vue framework to implement the user interface and interaction logic; the back end uses the SpringBoot framework to implement back-end services and API interfaces.
CN202311388715.4A 2023-10-24 2023-10-24 Education knowledge graph construction method and system based on man-machine hybrid enhancement Pending CN117371523A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311388715.4A CN117371523A (en) 2023-10-24 2023-10-24 Education knowledge graph construction method and system based on man-machine hybrid enhancement

Publications (1)

Publication Number Publication Date
CN117371523A true CN117371523A (en) 2024-01-09

Family

ID=89388864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311388715.4A Pending CN117371523A (en) 2023-10-24 2023-10-24 Education knowledge graph construction method and system based on man-machine hybrid enhancement

Country Status (1)

Country Link
CN (1) CN117371523A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117808085A (en) * 2024-02-29 2024-04-02 南京师范大学 Automatic discipline knowledge framework construction method, device, equipment and storage medium
CN117808085B (en) * 2024-02-29 2024-05-07 南京师范大学 Automatic discipline knowledge framework construction method, device, equipment and storage medium
CN117910567A (en) * 2024-03-20 2024-04-19 道普信息技术有限公司 Vulnerability knowledge graph construction method based on safety dictionary and deep learning network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination