CN117077786A - Knowledge graph-based data knowledge dual-drive intelligent medical dialogue system and method - Google Patents
- Publication number: CN117077786A (application CN202310829332.XA)
- Authority: CN (China)
- Prior art keywords: knowledge, medical, entity, patient, entities
- Legal status: Pending
Classifications
- G06N5/041: Inference or reasoning models; Abduction
- G06F16/3329: Natural language query formulation or dialogue systems
- G06F16/3334: Selection or weighting of terms from queries, including natural language queries
- G06F16/367: Creation of semantic tools; Ontology
- G06F40/216: Parsing using statistical methods
- G06F40/295: Named entity recognition
- G06N3/0442: Recurrent networks characterised by memory or gating, e.g. LSTM or GRU
- G06N3/045: Combinations of networks
- G06N3/047: Probabilistic or stochastic networks
- G06N3/048: Activation functions
- G06N3/084: Backpropagation, e.g. using gradient descent
- G06N3/088: Non-supervised learning, e.g. competitive learning
- G06N3/092: Reinforcement learning
- G06N3/0985: Hyperparameter optimisation; Meta-learning; Learning-to-learn
- G06N5/022: Knowledge engineering; Knowledge acquisition
- G16H20/00: ICT specially adapted for therapies or health-improving plans
- G16H50/20: ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
Abstract
The invention discloses a knowledge graph-based, data knowledge dual-drive intelligent medical dialogue system and method. The system extracts medical named entities from the patient's question via a medical named entity recognition module and feeds them to a medical knowledge graph matching module, which matches knowledge entities to obtain professional medical background knowledge. A knowledge entity sampling module then selects the knowledge entities most relevant to the question, reducing the influence of irrelevant knowledge entities on answer generation. Next, the question and the sampled knowledge entities are input together into a large language model for fine-tuning training, and the answer is finally output by a dialogue generation module. The method achieves higher scores on the BLEU (bilingual evaluation understudy) and ROUGE (recall-oriented understudy for gisting evaluation) metrics, and the generated answers are closer to the level of human doctors. The invention significantly improves the practicality of medical dialogue systems.
Description
Technical Field
The invention relates to a knowledge graph-based, data knowledge dual-drive intelligent medical dialogue system and method, and belongs to the technical field at the intersection of healthcare informatics and artificial intelligence.
Background
Large Language Models (LLMs) are becoming an important tool in the field of intelligent medicine. These advanced models are trained with powerful computing power on massive data, giving machines the ability to understand and generate human language. In smart medicine, they can help doctors diagnose and predict diseases and provide patient consultation and health management services. As the demand for intelligent medical diagnosis continues to grow, large language models are becoming increasingly important for diagnosing diseases accurately and efficiently and for reducing human error.
Large language models are built by learning statistical regularities of language from large amounts of data. The medical field, however, contains many unique terms, jargon, and text formats, so a large language model must be fine-tuned to achieve optimal performance there.
Recently, some researchers have used medical corpora to fine-tune large language models, adjusting their weight parameters to better understand the language of the medical field. Current medical dialogue systems are mainly optimized for English, and their Chinese question-answering ability is weak. To address this, Honglin Xiong, Sheng Wang, Yitao Zhu et al., "DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task" (arXiv preprint arXiv:2305.07340, 2023), collected a large Chinese medical dialogue dataset with the help of ChatGPT and fine-tuned the ChatGLM-6B model on a single A100 80G GPU in 13 hours, making it easier to deploy a Chinese large language dialogue model for medical use. The paper "ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge" (Cureus 15(6): e40895, 2023) by Li Y, Li Z, Zhang K et al. uses 700 collected diseases with their corresponding symptoms and 5,000 generated doctor-patient dialogues to support medical test and drug recommendation. In addition, that work acquired 200,000 real doctor-patient dialogues from an online medical consultation website. By fine-tuning the large language model (LLaMA) on 205,000 doctor-patient dialogues, the model gains the ability to understand patient needs, provide intelligent advice, and offer valuable assistance in a variety of medical fields. The paper "HuaTuo (Hua Tuo): Tuning LLaMA Model with Chinese Medical Knowledge" (arXiv preprint arXiv:2304.06975, 2023) by Haochun Wang, Chi Liu, Nuwa Xi et al. likewise uses LLaMA as the base model and fine-tunes it on a Chinese medical dialogue dataset with the instruct-tuning technique, an effective fine-tuning method.
Its principle is to guide the behavior of the language model by providing explicit instructions or examples and to fine-tune it for specific tasks or fields. To avoid monotony of instructions or examples, that paper introduces a medical knowledge graph to construct eight thousand instructions. Beyond instruction or example sets, Chang Shu, Baian Chen, Fangyu Liu et al., "Visual Med-Alpaca: A Parameter-Efficient Biomedical LLM with Visual Capabilities," integrate LLaMA models with medical vision models for multimodal biomedical tasks. Their model can efficiently perform various medical tasks by means of a few hours of instruction tuning and pluggable vision modules.
Fine-tuning has been demonstrated to significantly improve the performance of large language models on medical tasks. However, existing tuning techniques based only on corpus-style dialogue datasets are not ideal and may produce misleading or inaccurate answers due to the lack of specialized medical knowledge. To better meet the requirements of the medical field, fine-tuning must be combined with professional medical knowledge, so that the large language model is more accurate and reliable when providing medical consultation and advice.
Disclosure of Invention
Addressing the defects or shortcomings of the prior art, the invention provides a knowledge graph-based, data knowledge dual-drive intelligent medical dialogue system and method that combine the advantages of knowledge graphs, large language models, deep reinforcement learning, and other technologies. The approach solves the medical dialogue system's lack of specialized knowledge, so that the answers generated by the large language model come closer to the level of human doctors; specifically, higher BLEU and ROUGE scores are achieved.
The technical scheme adopted to solve this problem is as follows: a knowledge graph-based, data knowledge dual-drive intelligent medical dialogue system comprising a medical named entity recognition module, a medical knowledge graph matching module, a knowledge entity sampling module, a large language model fine-tuning module, and a dialogue generation module.
The medical named entity recognition module extracts medical named entities from patient questions, including disease names, body parts, medical procedures, drugs, and departments. This module is the basis for the subsequent medical knowledge graph matching module.
After the relevant medical named entities (disease names, medical procedures, and the like) are obtained, the medical knowledge graph matching module matches them against nodes in the medical knowledge graph to obtain relevant professional background knowledge.
The number of knowledge entities obtained by the knowledge graph matching module is often large, and many of them are irrelevant to the question raised by the patient. The knowledge entity sampling module samples the knowledge entities obtained by the medical knowledge graph matching module to obtain those best suited to answering the patient's question.
The large language model fine-tuning module inputs the patient's question together with the knowledge entities obtained by the knowledge entity sampling module into the large language model for fine-tuning training, so that the large language model can generate answers close to the level of a human doctor.
The dialogue generation module is the system's output module. Using the model fine-tuned by the large language model fine-tuning module, it outputs the generated answer for the input patient question and related knowledge entities.
The invention also provides a method for implementing the knowledge graph-based, data knowledge dual-drive intelligent medical dialogue system, comprising the following steps:
Step 1: dataset acquisition and knowledge graph preparation. A patient-doctor question-and-answer dataset and a medical knowledge graph are acquired.
Step 2: medical named entity recognition. A BERT-BiLSTM-CRF model is used to extract medical named entities from the patient's question.
Step 3: medical knowledge graph matching. The extracted medical named entities are matched against the head entities of triples in the medical knowledge graph. If a match succeeds, the tail entities corresponding to all matched head entities are taken as knowledge entities.
Step 4: knowledge entity sampling. A sampler based on deep reinforcement learning samples the knowledge entities matched in step 3 to obtain those best suited to answering the patient's question.
Step 5: large language model fine-tuning. The knowledge entities sampled in step 4 and the patient's original question are input into the large language model for fine-tuning training.
Step 6: question-answer generation. Answers are generated with the model parameters obtained from the fine-tuning in step 5.
Step 7: model evaluation. The quality of the answers generated by the model is verified on the validation set.
Beneficial effects:
1. The invention introduces a medical knowledge graph into the fine-tuning of large language models for medical dialogue, supplying domain expertise to the large language model. Through fine-tuning, the model better understands medical concepts and terminology, improving the accuracy and professionalism of answer generation and the practicality of the medical dialogue system.
2. The invention designs a sampler based on deep reinforcement learning that samples the knowledge entities, reducing the influence of irrelevant knowledge entities on generation quality and making the answers generated by the model more targeted.
3. Compared with conventional fine-tuned large language models in the medical field, introducing the knowledge graph through the deep reinforcement learning sampler yields large improvements on the BLEU and ROUGE metrics: BLEU improves by 7.1% and ROUGE by 2.7%.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 compares the performance of the invention's fine-tuned large language model with other fine-tuned large language models in the medical field on the BLEU and ROUGE metrics.
Description of the drawings: Fig. 2a compares the BLEU scores of the invention's method and other methods; Fig. 2b compares the ROUGE scores.
FIG. 3 compares generation examples of the invention's fine-tuned large language model with other fine-tuned large language models in the medical field.
Detailed Description
The invention is described in further detail below with reference to the drawings.
It should be understood that the description is only illustrative and is not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
As shown in fig. 1, the invention provides a method for implementing a data knowledge dual-drive intelligent medical dialogue system based on a knowledge graph, which comprises the following steps:
Step 1: dataset acquisition and knowledge graph preparation.
The invention first collects a patient-doctor dialogue dataset and a medical knowledge graph. Taking the Chinese Medical Dialogue dataset (CMD) and the Chinese Medical Multi-modal Knowledge Graph (CM3KG) as examples: the Chinese medical dialogue dataset consists of 792,099 question-answer pairs covering andrology, internal medicine, obstetrics and gynecology, oncology, pediatrics, and surgery. The Chinese medical multimodal knowledge graph contains 8,808 symptom nodes, 3,353 medical examination nodes, 17,318 drug nodes, and 366 food nodes. Table 1 gives an example of a patient-doctor dialogue, and Table 2 an example of nodes in the knowledge graph and their relationships.
Table 1 patient-doctor dialogue example
Table 2 node in knowledge graph and relationship example thereof
Step 2: medical named entity recognition. After the patient-doctor dialogue dataset and medical knowledge graph are acquired in step 1, a BERT-BiLSTM-CRF model extracts medical named entities from the patient's question. The medical named entity recognition task can be viewed as a character-level classification problem, in which each character of the input sequence carries a label indicating whether it belongs to a medical named entity. These labels are typically assigned with the BIO tagging scheme, which marks the beginning (B-), inside (I-), and outside (O) positions of named entities in the sequence. For example, consider the input sequence "感冒可以服用感冒灵颗粒" ("For a cold, you may take Ganmaoling granules"), where "感冒灵颗粒" (Ganmaoling granules) is a drug. Under the BIO scheme, the first character of the drug name is tagged B-MEDICINE to mark the beginning of the drug entity, its remaining characters are tagged I-MEDICINE to mark the inside of the entity, and all other characters are tagged O.
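The BIO labeling described above can be sketched as follows (a hypothetical helper for illustration, not the patent's implementation):

```python
# Assign B-/I-/O labels to a character sequence given known entity spans.
# spans: list of (start, end, type) with end exclusive.

def bio_tags(chars, spans):
    tags = ["O"] * len(chars)          # default: outside any entity
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"     # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"     # remaining characters are inside
    return tags

# In "感冒可以服用感冒灵颗粒" the drug entity 感冒灵颗粒 occupies characters 6..10.
chars = list("感冒可以服用感冒灵颗粒")
tags = bio_tags(chars, [(6, 11, "MEDICINE")])
print(tags[6], tags[7], tags[0])  # B-MEDICINE I-MEDICINE O
```

The same helper works for any span-annotated sequence, which is what makes character-level NER reducible to per-character classification.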
To achieve character-level classification, each character of the sentence is first embedded into a continuous latent space. The invention uses a combination of BERT and LSTM as the encoder for this purpose. BERT uses a bidirectional Transformer architecture pre-trained on large amounts of unannotated text with two unsupervised learning objectives: masked language modeling and next-sentence prediction. This pre-training lets BERT learn rich contextual representations of words and sentences, which can then be fine-tuned for downstream tasks such as named entity recognition. In addition to BERT, the encoder architecture uses an LSTM as a sequential modeling component; the LSTM captures dependencies between tokens across the entire sequence. Specifically, a bidirectional LSTM encodes the forward and backward context of each token, allowing the model to use information from both directions for the classification task. To further strengthen the model's ability to capture global dependencies between labels, the invention adds a conditional random field (CRF) layer above the BERT-LSTM encoder. The CRF layer models sequential dependencies between label assignments via transition probabilities between labels, helping optimize the output sequence produced by the classifier. Specifically, the CRF layer receives the encoded feature sequence from the BERT-LSTM encoder as input and outputs a label sequence corresponding to each token of the input sentence.
Assuming q is the input sequence, the predicted tag sequence $\hat{y}$ can be expressed as:

$$\hat{y} = \arg\max_{y} P(y \mid q)$$
the loss function of the medical named entity recognition network is defined as:
wherein N represents the number of samples, M represents the number of categories, y ij Indicating whether the real label of sample i is of category j,the probability that the model predicts sample i as class j is represented.
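The loss above is a standard character-level cross-entropy; a minimal pure-Python sketch with illustrative values (not from the patent's training data):

```python
import math

# y:     one-hot gold labels, shape [N][M]
# y_hat: predicted class probabilities, shape [N][M]
def ner_loss(y, y_hat):
    n = len(y)
    return -sum(
        y[i][j] * math.log(y_hat[i][j])
        for i in range(n)
        for j in range(len(y[i]))
    ) / n

y = [[1, 0], [0, 1]]              # gold: class 0, then class 1
y_hat = [[0.9, 0.1], [0.2, 0.8]]  # model probabilities
print(round(ner_loss(y, y_hat), 4))  # → 0.1643
```

Only the terms where $y_{ij}=1$ contribute, so the loss is the average negative log-probability assigned to the correct label.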
Step 3: medical knowledge graph matching. The medical named entities extracted in step 2 are matched against the head entities of triples in the medical knowledge graph. If a medical named entity coincides with the head entity of a triple in the knowledge graph, the tail entity corresponding to that head entity is taken as a knowledge entity.
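Under the assumption that the knowledge graph is stored as (head, relation, tail) triples, the matching in this step can be sketched as follows (the entity and relation names are illustrative, not taken from the CM3KG graph):

```python
# Return the tail entities of all triples whose head matches an
# extracted medical named entity.
def match_knowledge(entities, triples):
    heads = {}
    for h, r, t in triples:
        heads.setdefault(h, []).append(t)   # index triples by head entity
    matched = []
    for e in entities:
        matched.extend(heads.get(e, []))    # collect tails for each match
    return matched

kg = [
    ("cold", "recommended_drug", "Ganmaoling granules"),
    ("cold", "common_symptom", "runny nose"),
    ("gastritis", "recommended_drug", "omeprazole"),
]
print(match_knowledge(["cold"], kg))  # tails of all triples headed by "cold"
```

In practice a real graph store would replace the dictionary, but the head-entity lookup is the same operation.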
Step 4: knowledge entity sampling. The number of knowledge entities matched in step 3 is often large, so a sampler based on deep reinforcement learning is designed to sample them and obtain the knowledge entities best suited to answering the patient's question. The state, action, reward, and optimization of the deep reinforcement learning are modeled as follows:

State: the matched knowledge entities are encoded with BERT to obtain their hidden representations, which are then average-pooled to form the current state representation. Mathematically, the current state has the form:

$$s = \frac{1}{n}\sum_{k=1}^{n} \mathrm{BERT}(e_k)$$

where s is the current state, n is the number of matched knowledge entities, and $e_k$ is the k-th knowledge entity.
Action: to determine the probability of selecting each knowledge entity, the invention employs a three-layer multi-layer perceptron (MLP) as the policy network. The input layer of the MLP contains 768 neurons, matching the size of the state representation. The output layer contains n neurons with a softmax activation, where n is the number of matched knowledge entities; each output neuron gives the probability of selecting the corresponding knowledge entity. Mathematically:
p=softmax(MLP(s))
where s is the current state and $p[k] \in p$ is the probability of selecting the k-th knowledge entity. The policy network then samples each knowledge entity according to $p[k]$. The action for the k-th entity is denoted $a[k] \in \{0,1\}$, drawn with probability $p[k]$: $a[k]=1$ means the k-th knowledge entity is selected, and $a[k]=0$ means it is not. The joint probability density of the policy network's output can therefore be expressed as:

$$\pi_\theta(a \mid s) = \prod_{k=1}^{n} p[k]^{a[k]}\,\bigl(1-p[k]\bigr)^{1-a[k]}$$

where θ denotes the parameters of the policy network.
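A minimal pure-Python sketch of this action step, with the MLP replaced by a stand-in logits vector (an assumption for brevity; the patent computes logits from the state via the policy network):

```python
import math
import random

def softmax(logits):
    m = max(logits)                          # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_actions(p, rng):
    # each entity k is independently kept with probability p[k]
    return [1 if rng.random() < pk else 0 for pk in p]

def joint_prob(p, a):
    # product of p[k]^a[k] * (1 - p[k])^(1 - a[k])
    prob = 1.0
    for pk, ak in zip(p, a):
        prob *= pk if ak == 1 else (1.0 - pk)
    return prob

rng = random.Random(0)
p = softmax([2.0, 1.0, 0.1])   # stand-in for MLP(s)
a = sample_actions(p, rng)
assert abs(sum(p) - 1.0) < 1e-9
print(p, a, joint_prob(p, a))
```

The joint probability is exactly the product form given above, which is what makes the log-probability (and hence the policy gradient) easy to compute.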
Reward: the loss value of the large language model is used to construct the reward function, defined as:

$$r = -L_{llm} + c$$

where r is the reward, $L_{llm}$ is the loss of the large language model (defined in the fine-tuning step below), and c is a hyperparameter.
Optimization: the goal is to optimize the parameter θ by maximizing the expected cumulative reward over all possible policy trajectories. The optimal θ can be expressed as:

$$\theta^{*} = \arg\max_{\theta}\; J(\theta)$$

where τ is a policy trajectory consisting of states s, actions a, and rewards r. The expected cumulative reward of a trajectory can be expressed as:

$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{m=1}^{B} r_m\right]$$

where B is the total number of states and $r_m$ is the reward obtained by taking action $a_m$ in the m-th state $s_m$.
The invention uses gradient-based updates of the parameter θ to maximize J(θ). To prevent excessively large updates caused by large reward values, a baseline is subtracted during the update, giving the gradient:

$$\nabla_\theta J(\theta) = \mathbb{E}\left[\sum_{m=1}^{B} A_m\, \nabla_\theta \log \pi_\theta(a_m \mid s_m)\right]$$

where $A_m = r_m - \bar{r}_m$ is the advantage in reinforcement learning, and $\bar{r}_m$ is the reward expectation, computable from $s_m$ and $a_m$. The action $a_m$ is drawn directly from the output policy distribution p:

$$a_m \sim \pi_\theta(\cdot \mid s_m)$$
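The update can be illustrated with a toy REINFORCE-style step. For a Bernoulli selection policy, the score function has the closed form d log π / d p_k = a_k/p_k - (1 - a_k)/(1 - p_k); the sketch below applies it directly to the probabilities, which is a simplification (the patent updates the MLP parameters θ, not p itself):

```python
# One advantage-weighted policy-gradient step for a Bernoulli selection
# policy (illustrative only, not the patent's network).
def policy_gradient_step(p, a, reward, baseline, lr=0.01):
    advantage = reward - baseline              # A_m = r_m - baseline
    new_p = []
    for pk, ak in zip(p, a):
        score = ak / pk - (1 - ak) / (1 - pk)  # d log pi / d p_k
        pk = pk + lr * advantage * score       # gradient ascent on J
        new_p.append(min(max(pk, 1e-6), 1 - 1e-6))  # keep a valid probability
    return new_p

p = [0.5, 0.5]
# the action kept entity 0 only, and the reward beat the baseline:
print(policy_gradient_step(p, [1, 0], reward=1.0, baseline=0.2))
```

With a positive advantage, the probability of the taken action (keep entity 0) rises and that of the untaken one falls, which is exactly the direction the gradient formula prescribes.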
and 5, fine tuning the large language model. By using the medical named entity recognition network and the knowledge entity sampling network, i.e. step 2, step 3 and step 4, the knowledge entity most suitable for answering the patient's questions can be obtained. The invention uses the sampled knowledge entity and the original patient questions as the input of the large language model for fine adjustment, thus realizing accurate answer to the patient questions. The invention adopts ChatGLM-6B as a basic model. The ChatGLM-6B model is an open bilingual language model based on the Generic Language Model (GLM) framework, with 62 billion parameters. The invention adopts a parameter adjustment (p-turn) technique. The technique only trims 0.1% of the parameters and can achieve good performance. The loss function of the ChatGLM-6B fine tuning network is defined as:
where cross sentropy represents the cross entropy loss function, z represents the actual answer,representing the answer generated by ChatGLM-6B.
Step 6: question-answer generation. Answers are generated with the model parameters obtained from the fine-tuning in step 5.
Step 7: testing the model's effectiveness.
In order to test the model effect, 80% of patient-doctor data are randomly selected as a training set, the rest 20% of data are selected as a test set, and 10% of data are randomly selected from the training set as a verification set to adjust the model super-parameters. The test model of the invention is used for testing the performance of bilingual evaluation auxiliary indexes (BLEU), retrospective oriented generalized evaluation auxiliary indexes (ROUGE) and other indexes. The calculation formula of these indexes is as follows:
wherein BP is a brevity penalty term that penalizes generated text shorter than the reference text, p_n measures the proportion of n-grams in the generated text that also appear in the reference text, N is the maximum n-gram length, G is the generated sentence, S is the reference sentence, Count_S(w) represents the number of occurrences of word w in the reference sentence S, Count_G(w) represents the number of occurrences of word w in the generated sentence G, and β² is a constant, typically set to 1.2.
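A minimal sketch of a BLEU-style score along these lines is shown below: clipped n-gram precision combined with a brevity penalty, for a single sentence pair. It is a simplified illustration rather than the exact formula above, and the example sentences are hypothetical:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(generated, reference, max_n=2):
    """Sentence-level BLEU sketch: geometric mean of clipped n-gram
    precisions p_n, multiplied by the brevity penalty BP."""
    log_p = 0.0
    for n in range(1, max_n + 1):
        gen_counts = Counter(ngrams(generated, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in gen_counts.items())
        total = max(sum(gen_counts.values()), 1)
        log_p += math.log(max(clipped, 1e-9) / total) / max_n
    # Brevity penalty: penalize generated text shorter than the reference.
    bp = 1.0 if len(generated) >= len(reference) else \
        math.exp(1 - len(reference) / max(len(generated), 1))
    return bp * math.exp(log_p)

reference = "take the medicine twice daily after meals".split()
perfect = bleu(reference, reference)            # identical sentences → 1.0
partial = bleu("take medicine".split(), reference)
```

ROUGE differs mainly in being recall-oriented: it counts overlapping units relative to the reference length rather than the generated length.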
The effects of the present invention will be described in further detail with reference to simulation experiments.
1. Simulation conditions and parameter settings:
the simulation experiments of the invention are carried out on a simulation platform of Python 3.9.0, PyTorch 1.11 and CUDA 11.3. The CPU of the computer is an Intel Core i9-12900K, and the GPU is an NVIDIA GeForce RTX 3090. The learning rate is set to 2e-5.
2. The simulation content:
fig. 2 shows the performance comparison of the technical solution of the present invention with other fine-tuned large language models in the medical field on the Bilingual Evaluation Understudy (BLEU) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. The abscissa shows the different technical schemes. In fig. 2(a) the ordinate is BLEU, and in fig. 2(b) the ordinate is ROUGE. The comparison shows that the answers generated by the method are closer to the level of a real doctor.
FIG. 3 shows a comparison of the technical solution of the present invention with a general-domain large language model on a question-answer example. The comparison shows that the answers generated by the method are more specific, while the answers generated by other large language models are more vague.
In summary of the simulation results and analysis, the knowledge-graph-based data knowledge dual-driven intelligent medical dialogue system provided by the invention achieves better performance on the Bilingual Evaluation Understudy (BLEU) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. Compared with question-answer examples of other large language models, the answers generated by the method are more targeted. This shows that the answers generated by the method of the present invention are closer to the level of a real doctor, so that the invention can be better applied in actual medical dialogue scenarios.
It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explanation of the principles of the present invention and are in no way limiting of the invention. Accordingly, any modification, equivalent replacement, improvement, etc. made without departing from the spirit and scope of the present invention should be included in the scope of the present invention. Furthermore, the appended claims are intended to cover all such changes and modifications that fall within the scope and boundary of the appended claims, or equivalents of such scope and boundary.
Claims (9)
1. The data knowledge double-drive intelligent medical dialogue system based on the knowledge graph is characterized by comprising a medical named entity recognition module, a medical knowledge graph matching module, a knowledge entity sampling module, a large language model fine tuning module and a dialogue generating module;
the medical named entity recognition module is used for extracting medical named entities in the patient's questions, including disease names, body parts, medical procedures, drugs and departments, and is the basis of the subsequent medical knowledge graph matching module;
the medical knowledge graph matching module is used for matching the extracted medical named entities, such as disease names and medical procedures, with the nodes in the medical knowledge graph so as to obtain relevant professional background knowledge;
the number of knowledge entities obtained by the knowledge graph matching module is huge and includes knowledge entities unrelated to the question raised by the patient; the knowledge entity sampling module samples the knowledge entities obtained by the medical knowledge graph matching module to obtain the knowledge entities best able to answer the patient's question;
the large language model fine tuning module inputs the questions raised by the patient and the knowledge entity obtained by the knowledge entity sampling module into the large language model for fine tuning training, so that the large language model can generate answers close to the level of human doctors;
the dialogue generating module is the output module of the system; using the model trained by the large language model fine-tuning module, it outputs the generated answer for the input patient question and the related knowledge entities.
2. The method for realizing the data knowledge dual-drive intelligent medical dialogue system based on the knowledge graph is characterized by comprising the following steps of:
step 1: acquiring a data set and preparing a related knowledge graph;
acquiring a patient-doctor question and answer data set and a medical knowledge graph;
step 2: medical named entity identification;
extracting the medical named entities in the patient questions by using a BERT-BiLSTM-CRF model;
step 3: medical knowledge graph matching;
matching the extracted medical named entity with head entities of the triples in the medical knowledge graph, and taking tail entities corresponding to all the head entities as knowledge entities if the matching is successful;
step 4: sampling a knowledge entity;
sampling the knowledge entity matched in the step 3 by using a sampler based on deep reinforcement learning to obtain the knowledge entity most suitable for answering the patient questions;
step 5: fine tuning of a large language model;
inputting the knowledge entity obtained by sampling in the step 4 and the original problem of the patient into a large language model for fine tuning training;
step 6: generating questions and answers;
generating an answer according to the model parameters obtained by the fine tuning training in the step 5;
step 7: testing the model effect;
the quality of the model generated answer is verified on the verification set.
3. The method for implementing a knowledge-based data knowledge dual-driven intelligent medical dialogue system according to claim 2, wherein the step 1 comprises: a patient-physician session dataset and a medical knowledge-graph are collected.
4. The method for implementing a knowledge-graph-based data knowledge dual-driven intelligent medical dialogue system according to claim 2, wherein the step 2 comprises: after the patient-physician dialogue dataset and the medical knowledge graph are acquired according to step 1, to achieve character-level classification, each character of the sentence is first embedded into a continuous latent space, using the combination of BERT and LSTM as the encoder; BERT utilizes a bidirectional Transformer architecture that is pre-trained on large amounts of unannotated text data using two unsupervised learning objectives, masked language modeling and next-sentence prediction; this pre-training enables BERT to learn rich word and sentence context representations and to be fine-tuned for the downstream named entity recognition task; in addition to BERT, the encoder architecture uses LSTM as a sequential modeling component; the LSTM captures dependencies across the whole sequence, and a bidirectional LSTM is used to encode the forward and backward context of each token, enabling the model to use past and future information in the classification task; a Conditional Random Field (CRF) layer is added on top of the BERT-LSTM encoder; the CRF layer models sequential dependencies between label assignments using transition probabilities between labels, receives the encoded feature sequence obtained from the BERT-LSTM encoder as input, and outputs a label corresponding to each token in the input sentence;
assuming q is the input sequence, the predicted tag sequence ŷ is expressed as:
the loss function of the medical named entity recognition network is defined as:
wherein N represents the number of samples, M represents the number of categories, y_ij indicates whether the true label of sample i is category j, and ŷ_ij represents the probability that the model predicts sample i as category j.
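The role of the CRF layer described in this claim — choosing the tag sequence that maximizes the emission scores from the encoder plus the tag-to-tag transition scores — can be illustrated with a small Viterbi decoder. The tag set and all score values below are hypothetical toy numbers, not outputs of the actual BERT-BiLSTM encoder:

```python
def viterbi_decode(emissions, transitions):
    """CRF-style decoding: find the tag sequence that maximizes the sum of
    per-token emission scores and tag-to-tag transition scores.
    emissions: [T][K] encoder scores for each of T tokens over K tags;
    transitions: [K][K] score of moving from tag i to tag j."""
    T, K = len(emissions), len(emissions[0])
    score = list(emissions[0])          # best score ending in each tag
    back = []                           # backpointers per step
    for t in range(1, T):
        new_score, ptr = [], []
        for j in range(K):
            best_i = max(range(K), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        back.append(ptr)
    # Trace the best path backwards through the stored pointers.
    best = max(range(K), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(back):
        best = ptr[best]
        path.append(best)
    return list(reversed(path))

# Hypothetical tags: 0 = O (outside), 1 = B-Disease, 2 = I-Disease.
transitions = [
    [0.0, 0.0, -10.0],   # O -> I is strongly penalized (invalid BIO move)
    [0.0, 0.0, 1.0],     # B -> I is favored
    [0.0, 0.0, 1.0],     # I -> I is favored
]
emissions = [
    [1.0, 2.0, 1.5],     # token 1: encoder favors B-Disease
    [1.0, 0.0, 1.2],     # token 2: ambiguous between O and I-Disease
    [2.0, 0.0, 0.0],     # token 3: clearly O
]
best_path = viterbi_decode(emissions, transitions)  # → [1, 2, 0] (B, I, O)
```

The transition matrix is what lets the CRF rule out invalid label sequences (such as an I tag with no preceding B) even when the per-token scores are ambiguous.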
5. The method for implementing a knowledge-graph-based data knowledge dual-driven intelligent medical dialogue system according to claim 2, wherein the step 3 comprises: matching the medical named entity extracted in step 2 with the head entities of the triples in the medical knowledge graph, and if the medical named entity is consistent with a head entity of a triple in the knowledge graph, taking the corresponding tail entity as a knowledge entity.
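The head-entity matching in this step amounts to collecting the tail entities of all triples whose head equals an extracted named entity. A minimal sketch follows; the entity names and triples form a hypothetical mini knowledge graph, not the actual one used by the invention:

```python
def match_knowledge_entities(named_entities, triples):
    """Collect tail entities of all (head, relation, tail) triples whose
    head entity matches one of the extracted medical named entities."""
    return [tail for head, relation, tail in triples if head in named_entities]

# Hypothetical mini medical knowledge graph as (head, relation, tail) triples.
triples = [
    ("diabetes", "common_symptom", "excessive thirst"),
    ("diabetes", "recommended_drug", "metformin"),
    ("hypertension", "common_symptom", "headache"),
]
matched = match_knowledge_entities({"diabetes"}, triples)
# → ["excessive thirst", "metformin"]
```

As the claim notes, this matching is exhaustive and typically over-generates, which is why the sampling step of claim 6 is needed to keep only the entities relevant to the patient's question.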
6. The method for implementing a knowledge-based data knowledge dual-driven intelligent medical dialogue system according to claim 2, wherein the step 4 comprises: sampling the knowledge entity matched in the step 3 by using a sampler based on deep reinforcement learning to obtain the knowledge entity most suitable for answering the patient questions, wherein the state, action, rewards and optimization modeling of the deep reinforcement learning comprises:
status: according to the matched knowledge entities, all the knowledge entities are encoded using BERT to obtain their hidden representations, which are then subjected to an averaging pooling operation to obtain a current state representation in the form of:
where s represents the current state, n is the number of matched knowledge entities, and e_k represents the k-th knowledge entity;
action: to determine the probability of selecting each knowledge entity, a three-layer multi-layer perceptron (MLP) is employed as the policy network; the input layer of the MLP contains 768 neurons, corresponding to the size of the state representation; the output layer contains n neurons and is activated using a softmax function, where n is the number of matched knowledge entities; each neuron in the output layer represents the probability of selecting the corresponding knowledge entity, expressed as:
p=softmax(MLP(s))
where s is the current state and p[k] ∈ p represents the probability of selecting the k-th knowledge entity; the policy network samples knowledge entities according to p[k]; the action corresponding to the k-th entity is denoted a[k] ∈ {0, 1}, where a[k] = 1 denotes selecting the k-th knowledge entity and a[k] = 0 denotes not selecting it; the joint probability density function of the policy network output is expressed as:
wherein θ is a parameter of the policy network;
reward: a reward function is constructed using the loss value of the large language model, defined as:
r=-L llm +c
where r is the reward, L_llm is the loss of the large language model, and c is a hyperparameter;
optimization: the goal is to optimize the parameter θ by maximizing the expected cumulative reward over all possible policy trajectories; the optimal θ is expressed as:
where τ is a policy trajectory consisting of states s, actions a and rewards r;
the expected cumulative reward of a trajectory is expressed as:
where B is the total number of states, and r_m is the reward obtained by taking action a_m in the m-th state s_m;
the parameter θ is updated using gradient-based optimization to maximize J(θ); in order to prevent large reward values from over-updating the network, a baseline value is subtracted during the update, and the gradient is expressed as:
wherein A_m = r_m − r̄_m is the advantage in reinforcement learning, r̄_m is the reward expectation, calculated from s_m and a_m, and the action a_m directly selects a knowledge entity according to the output policy distribution p; a_m is expressed as:
7. The method for implementing a knowledge-graph-based data knowledge dual-driven intelligent medical dialogue system according to claim 2, wherein the step 5 comprises: the sampled knowledge entities and the original patient question are used as the input of the large language model for fine-tuning, achieving accurate answers to the patient's questions; ChatGLM-6B, an open bilingual language model based on the General Language Model (GLM) framework with 6.2 billion parameters, is used as the base model; the P-Tuning technique is adopted; and the loss function of the ChatGLM-6B fine-tuning network is defined as:
where CrossEntropy represents the cross-entropy loss function, z represents the ground-truth answer, and ẑ represents the answer generated by ChatGLM-6B.
8. The method for implementing a knowledge-based data knowledge dual-driven intelligent medical dialogue system according to claim 2, wherein the step 6 comprises: and (5) generating an answer according to the model parameters obtained by the fine tuning training in the step (5).
9. The method for implementing a knowledge-graph-based data knowledge dual-driven intelligent medical dialogue system according to claim 2, wherein the step 7 comprises: randomly selecting 80% of the patient-doctor data as the training set and the remaining 20% as the test set, and randomly selecting 10% of the data from the training set as a validation set to tune the model hyperparameters; the model is evaluated on the Bilingual Evaluation Understudy (BLEU) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, whose calculation formulas are as follows:
wherein BP is a brevity penalty term that penalizes generated text shorter than the reference text, p_n measures the proportion of n-grams in the generated text that also appear in the reference text, N is the maximum n-gram length, G is the generated sentence, S is the reference sentence, Count_S(w) represents the number of occurrences of word w in the reference sentence S, Count_G(w) represents the number of occurrences of word w in the generated sentence G, and β² is a constant set to 1.2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310829332.XA CN117077786A (en) | 2023-07-07 | 2023-07-07 | Knowledge graph-based data knowledge dual-drive intelligent medical dialogue system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310829332.XA CN117077786A (en) | 2023-07-07 | 2023-07-07 | Knowledge graph-based data knowledge dual-drive intelligent medical dialogue system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117077786A true CN117077786A (en) | 2023-11-17 |
Family
ID=88715994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310829332.XA Pending CN117077786A (en) | 2023-07-07 | 2023-07-07 | Knowledge graph-based data knowledge dual-drive intelligent medical dialogue system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117077786A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117520508A (en) * | 2023-11-20 | 2024-02-06 | 广州方舟信息科技有限公司 | Medical dialogue answer generation method, device, electronic equipment and storage medium |
CN117573843A (en) * | 2024-01-15 | 2024-02-20 | 图灵人工智能研究院(南京)有限公司 | Knowledge calibration and retrieval enhancement-based medical auxiliary question-answering method and system |
CN117709441A (en) * | 2024-02-06 | 2024-03-15 | 云南联合视觉科技有限公司 | Method for training professional medical large model through gradual migration field |
CN117933364A (en) * | 2024-03-20 | 2024-04-26 | 烟台海颐软件股份有限公司 | Power industry model training method based on cross-language knowledge migration and experience driving |
CN117995426A (en) * | 2024-04-07 | 2024-05-07 | 北京惠每云科技有限公司 | Medical knowledge graph construction method and device, electronic equipment and storage medium |
CN118116620A (en) * | 2024-04-28 | 2024-05-31 | 支付宝(杭州)信息技术有限公司 | Medical question answering method and device and electronic equipment |
CN118133883A (en) * | 2024-05-06 | 2024-06-04 | 杭州海康威视数字技术股份有限公司 | Graph sampling method, graph prediction method, and storage medium |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117520508A (en) * | 2023-11-20 | 2024-02-06 | 广州方舟信息科技有限公司 | Medical dialogue answer generation method, device, electronic equipment and storage medium |
CN117520508B (en) * | 2023-11-20 | 2024-05-28 | 广州方舟信息科技有限公司 | Medical dialogue answer generation method, device, electronic equipment and storage medium |
CN117573843A (en) * | 2024-01-15 | 2024-02-20 | 图灵人工智能研究院(南京)有限公司 | Knowledge calibration and retrieval enhancement-based medical auxiliary question-answering method and system |
CN117573843B (en) * | 2024-01-15 | 2024-04-02 | 图灵人工智能研究院(南京)有限公司 | Knowledge calibration and retrieval enhancement-based medical auxiliary question-answering method and system |
CN117709441A (en) * | 2024-02-06 | 2024-03-15 | 云南联合视觉科技有限公司 | Method for training professional medical large model through gradual migration field |
CN117709441B (en) * | 2024-02-06 | 2024-05-03 | 云南联合视觉科技有限公司 | Method for training professional medical large model through gradual migration field |
CN117933364A (en) * | 2024-03-20 | 2024-04-26 | 烟台海颐软件股份有限公司 | Power industry model training method based on cross-language knowledge migration and experience driving |
CN117933364B (en) * | 2024-03-20 | 2024-06-04 | 烟台海颐软件股份有限公司 | Power industry model training method based on cross-language knowledge migration and experience driving |
CN117995426A (en) * | 2024-04-07 | 2024-05-07 | 北京惠每云科技有限公司 | Medical knowledge graph construction method and device, electronic equipment and storage medium |
CN118116620A (en) * | 2024-04-28 | 2024-05-31 | 支付宝(杭州)信息技术有限公司 | Medical question answering method and device and electronic equipment |
CN118133883A (en) * | 2024-05-06 | 2024-06-04 | 杭州海康威视数字技术股份有限公司 | Graph sampling method, graph prediction method, and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN117077786A (en) | Knowledge graph-based data knowledge dual-drive intelligent medical dialogue system and method | |
Van Aken et al. | Clinical outcome prediction from admission notes using self-supervised knowledge integration | |
CN110705293A (en) | Electronic medical record text named entity recognition method based on pre-training language model | |
Liu et al. | Medical-vlbert: Medical visual language bert for covid-19 ct report generation with alternate learning | |
CN108628824A (en) | A kind of entity recognition method based on Chinese electronic health record | |
CN109378066A (en) | A kind of control method and control device for realizing disease forecasting based on feature vector | |
CN113688248B (en) | Medical event identification method and system under condition of small sample weak labeling | |
CN109003677B (en) | Structured analysis processing method for medical record data | |
CN112420191A (en) | Traditional Chinese medicine auxiliary decision making system and method | |
CN111651991A (en) | Medical named entity identification method utilizing multi-model fusion strategy | |
CN115293128A (en) | Model training method and system based on multi-modal contrast learning radiology report generation | |
CN112182168B (en) | Medical record text analysis method and device, electronic equipment and storage medium | |
Colla et al. | Semantic coherence markers: The contribution of perplexity metrics | |
Hsu et al. | Multi-label classification of ICD coding using deep learning | |
CN114417836A (en) | Deep learning-based Chinese electronic medical record text semantic segmentation method | |
Melnyk et al. | Generative artificial intelligence terminology: a primer for clinicians and medical researchers | |
CN116403706A (en) | Diabetes prediction method integrating knowledge expansion and convolutional neural network | |
CN111222325A (en) | Medical semantic labeling method and system of bidirectional stack type recurrent neural network | |
Zhang et al. | Bert with enhanced layer for assistant diagnosis based on Chinese obstetric EMRs | |
Mou et al. | Named entity recognition based on transformer encoder in the medical field | |
CN113643825A (en) | Medical case knowledge base construction method and system based on clinical key characteristic information | |
Dao et al. | Patient Similarity using Electronic Health Records and Self-supervised Learning | |
Jiang et al. | Dual memory network for medical dialogue generation | |
Kong et al. | TCM disease diagnosis based on convolutional cyclic neural network algorithm | |
CN117194604B (en) | Intelligent medical patient inquiry corpus construction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |