CN117747087A - Training method of large inquiry model, inquiry method and device based on large inquiry model - Google Patents
- Publication number
- CN117747087A (application CN202311758183.9A)
- Authority
- CN
- China
- Prior art keywords
- inquiry
- model
- doctor
- patient
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The disclosure provides a training method for a large inquiry model, an inquiry method based on the large inquiry model, and a corresponding apparatus, and relates to the technical field of artificial intelligence. The specific implementation scheme is as follows: acquiring multiple rounds of doctor-patient inquiry dialogue in the medical field, where the multiple rounds of dialogue comprise multiple rounds of patient input information and the doctor response information corresponding to each round of patient input information; determining prompt words for the large inquiry model, and generating a training corpus based on the prompt words and the multiple rounds of doctor-patient inquiry dialogue; inputting the training corpus into the large inquiry model, outputting a doctor predicted answer for each round of patient input information, and determining a loss function of the large inquiry model according to the doctor predicted answer and the doctor response information for each round of patient input information; and adjusting the large inquiry model based on the loss function and continuing to train the adjusted model until a training end condition is met, to obtain the target large inquiry model.
Description
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to a training method for a large inquiry model and an inquiry method and apparatus based on the large inquiry model.
Background
In the current medical and health industry, effective communication between doctors and patients is critical for accurate diagnosis and high-quality medical service. To address this problem, large models can be used to assist doctors in communicating with patients efficiently.
Disclosure of Invention
The present disclosure provides a training method for a large inquiry model, an inquiry method based on the large inquiry model, an apparatus, an electronic device, a storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided a training method for a large inquiry model, including: acquiring multiple rounds of doctor-patient inquiry dialogue in the medical field, where the multiple rounds of doctor-patient inquiry dialogue comprise multiple rounds of patient input information and the doctor response information corresponding to each round of patient input information; determining prompt words for the large inquiry model, and generating a training corpus based on the prompt words and the multiple rounds of doctor-patient inquiry dialogue; inputting the training corpus into the large inquiry model, outputting a doctor predicted answer for each round of patient input information, and determining a loss function of the large inquiry model according to the doctor predicted answer and the doctor response information for each round of patient input information; and adjusting the large inquiry model based on the loss function and continuing to train the adjusted model until a training end condition is met, to obtain the target large inquiry model.
According to another aspect of the present disclosure, there is provided an inquiry method based on the large inquiry model, including: calling a target large inquiry model, and inputting role-playing prompt information to the target large inquiry model; inputting at least one piece of inquiry information to the target large inquiry model to obtain an inquiry reply output by the target large inquiry model that matches each piece of inquiry information, until the inquiry ends and the target large inquiry model is exited; where the target large inquiry model is a model obtained by training with the above training method for the large inquiry model.
According to another aspect of the present disclosure, there is provided a training apparatus for a large inquiry model, including: an acquisition module for acquiring multiple rounds of doctor-patient inquiry dialogue in the medical field, where the multiple rounds of doctor-patient inquiry dialogue comprise multiple rounds of patient input information and the doctor response information corresponding to each round of patient input information; a generation module for determining prompt words for the large inquiry model and generating a training corpus based on the prompt words and the multiple rounds of doctor-patient inquiry dialogue; a training module for inputting the training corpus into the large inquiry model, outputting a doctor predicted answer for each round of patient input information, and determining a loss function of the large inquiry model according to the doctor predicted answer and the doctor response information for each round of patient input information; and an adjusting module for adjusting the large inquiry model based on the loss function and continuing to train the adjusted model until a training end condition is met, to obtain the target large inquiry model.
According to another aspect of the present disclosure, there is provided an inquiry apparatus based on the large inquiry model, including: a calling module for calling the target large inquiry model and inputting role-playing prompt information to it; and an inquiry module for inputting at least one piece of inquiry information to the target large inquiry model to obtain an inquiry reply output by the target large inquiry model that matches each piece of inquiry information, until the inquiry ends and the target large inquiry model is exited; where the target large inquiry model is a model obtained by training with the above training method for the large inquiry model.
According to another aspect of the present disclosure, there is provided an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method for the large inquiry model and the inquiry method based on the large inquiry model according to the embodiments of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the training method for the large inquiry model and the inquiry method based on the large inquiry model according to the embodiments of the above aspects.
According to another aspect of the disclosure, there is provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the training method for the large inquiry model and the inquiry method based on the large inquiry model according to the embodiments of the above aspects.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic flow chart of a training method of a large inquiry model according to an embodiment of the disclosure;
FIG. 2 is a flow chart of another method for training a large inquiry model according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of a multi-round doctor-patient inquiry dialogue acquisition provided in an embodiment of the disclosure;
FIG. 4 is a schematic diagram of masking according to an embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating a process of determining a minimum semantic unit of a large inquiry model in a training method of the large inquiry model according to an embodiment of the present disclosure;
FIG. 6 is a flow chart of another method for training a large inquiry model according to an embodiment of the present disclosure;
FIG. 7 is a schematic flow chart of an inquiry method based on the large inquiry model according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an inquiry conducted with the target large inquiry model provided by an embodiment of the present disclosure;
FIG. 9 is a schematic flow chart of an inquiry based on the large inquiry model provided by an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of an application framework based on the target large inquiry model provided by an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a training apparatus for a large inquiry model according to an embodiment of the present disclosure;
fig. 12 is a schematic structural diagram of an inquiry apparatus based on the large inquiry model according to an embodiment of the present disclosure;
fig. 13 is a block diagram of an electronic device for implementing the training method of the large inquiry model of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Artificial intelligence (Artificial Intelligence, AI) is a technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. At present, AI technology has the advantages of a high degree of automation, high accuracy, and low cost, and is widely applied.
Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics. It is mainly applied to machine translation, public opinion monitoring, automatic abstracting, viewpoint extraction, text classification, question answering, text semantic comparison, speech recognition, and the like.
Deep Learning (DL) is a new research direction in the field of Machine Learning (ML). It learns the inherent rules and representation hierarchies of sample data so that a machine can, like a person, analyze and learn from data such as text, images, and sounds; it is widely applied in speech and image recognition.
Large models (Large models) refer to models with a large number of parameters and complex structures in the fields of machine learning and artificial intelligence. These models typically require extensive computational resources to train and deploy, and are capable of processing and understanding larger amounts of more complex data. Large models can be applied in many fields, such as automatic writing, chat robots, virtual assistants, voice assistants, and automatic translation.
It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals related to the present disclosure are all authorized by the user or sufficiently authorized by the parties, and the collection, use, and processing of the related data requires compliance with the relevant laws and regulations and standards, without violating the public welfare.
The disclosed embodiments are applicable to a variety of medical and health institutions, including but not limited to hospitals, clinics, and telemedicine service platforms. 1. In primary medical institutions and resource-scarce areas, doctors can be assisted in preliminary screening and diagnosis, with standardized, specialized inquiry suggestions. 2. In hospitals and on telemedicine service platforms, patient information and basic symptoms can be collected in advance during the pre-diagnosis stage, helping doctors conduct a preliminary inquiry and form a professional judgment. 3. In medical education and training, medical students and junior doctors can be assisted in clinical communication and case-analysis training, improving their professional skills and clinical experience.
Fig. 1 is a flow chart of a training method of a large inquiry model according to an embodiment of the disclosure. As shown in fig. 1, the training method of the large inquiry model may include:
s101, acquiring a plurality of rounds of doctor-patient consultation dialogues in the medical field, wherein the plurality of rounds of doctor-patient consultation dialogues comprise a plurality of rounds of patient input information and doctor response information corresponding to each round of patient input information.
It should be noted that, the execution body of the training method of the big inquiry model in the embodiment of the present disclosure may be a hardware device with information processing capability and/or software necessary for driving the hardware device to work. Alternatively, the execution body may include a server, a user terminal, and other intelligent devices. Optionally, the user terminal includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, etc. Alternatively, the server may be a cloud server, a server of a distributed system, a server combined with a blockchain, or the like. The embodiments of the present disclosure are not particularly limited.
In some implementations, the multiple rounds of doctor-patient inquiry dialogue may be based on a knowledge base of the medical field. Optionally, they may be acquired based on medical knowledge graph (Medical Knowledge Graph, MKG) data, patient case data, an online inquiry system, and the like, or generated based on an existing large model. The multiple rounds of doctor-patient inquiry dialogue comprise multiple rounds of patient input information and the doctor response information corresponding to each round of patient input information.
S102, determining prompt words of the large inquiry model, and generating training corpus based on the prompt words and multiple rounds of doctor-patient inquiry dialogue.
In some implementations, preset prompt words can help position the large inquiry model for the medical field and evoke its capabilities there, and the training corpus of the large inquiry model is generated from the multiple rounds of doctor-patient inquiry dialogue together with the prompt words.
Optionally, the training corpus is obtained by combining the prompt words of the large inquiry model and multiple rounds of doctor-patient inquiry dialogues. The prompt term may be determined based on the purpose of the large inquiry model. For example, if a large model is used for a consultation, the large model may be used as a doctor, and "please reply as a doctor" may be used as a prompt word for the large model.
Illustratively, table 1 shows one example of a training corpus. Based on the prompt word "please help as a doctor to make a professional inquiry to the patient, such as please reply to ask you what is uncomfortable? And carrying out multi-round doctor-patient inquiry dialogue to obtain training corpus.
TABLE 1 example of training corpus
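The corpus-generation step of S102, combining a prompt word with a multi-round doctor-patient dialogue as in Table 1, can be sketched roughly as follows; the role labels, prompt text, and function name are illustrative assumptions, not the patent's actual format:

```python
# Hypothetical sketch of training-corpus assembly: prepend the prompt word,
# then serialize each round as a patient turn followed by a doctor turn.
def build_corpus_sample(prompt, dialogue_rounds):
    """dialogue_rounds: list of (patient_input, doctor_response) pairs."""
    lines = [prompt]
    for patient_input, doctor_response in dialogue_rounds:
        lines.append(f"Patient: {patient_input}")   # assumed turn label
        lines.append(f"Doctor: {doctor_response}")  # assumed turn label
    return "\n".join(lines)

sample = build_corpus_sample(
    "Please act as a doctor and conduct a professional inquiry.",
    [("I have had a cough for 2 days.", "Do you also have a fever?")],
)
```

Each such sample pairs the patient inputs with their reference doctor responses, matching the supervision signal used in S103.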
S103, inputting training corpus into the large inquiry model, outputting doctor prediction answers of each round of patient input information, and determining a loss function of the large inquiry model according to the doctor prediction answers of each round of patient input information and doctor response information.
In some implementations, after the training corpus is obtained, it is input into the large inquiry model for training, and the large inquiry model generates a corresponding doctor predicted answer based on each round of patient input information in the training corpus, so that the patient input information and the doctor predicted answers together form the inquiry dialogue predicted by the large inquiry model.
Further, to adjust the large inquiry model so that it meets high-performance requirements, its loss function can be determined by computing a loss between the doctor predicted answer and the doctor response information. Alternatively, a cross-entropy loss may be computed between the doctor predicted answer and the doctor response information, measuring the difference between the probability distribution predicted by the large inquiry model and the probability distribution of the response information.
S104, adjusting the large inquiry model based on the loss function, and continuing training the adjusted model until the training ending condition is met to obtain the target large inquiry model.
In some implementations, to prevent overfitting, regularization may be applied; the gradient of the loss function is calculated, and the model parameters of the large inquiry model are adjusted based on the gradient. The adjusted model is then trained further until the training end condition is met, obtaining the target large inquiry model.
Optionally, the number of training rounds of the large inquiry model may be set as the training end condition: when the number of training rounds reaches the set number, the training end condition is determined to be met and the target large inquiry model is obtained. The training end condition may also be determined based on the accuracy of the predicted replies output by the large inquiry model: when the accuracy of the predicted replies is greater than a set threshold, the training end condition is determined to be met, obtaining the target large inquiry model.
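The adjust-and-continue loop of S104, with its two optional end conditions (a set number of training rounds, or a prediction-accuracy threshold), might be sketched as below; `DummyModel` and its steadily improving accuracy are stand-ins for the real large inquiry model and its evaluation, not the patent's implementation:

```python
# Illustrative sketch (not the patent's implementation) of the training loop:
# run a training step, then stop when either the round limit is reached or
# prediction accuracy exceeds the set threshold.
class DummyModel:
    def __init__(self):
        self.steps = 0
        self.accuracy = 0.0

    def train_step(self, corpus):
        # Placeholder for: compute loss, compute gradient, adjust parameters.
        self.steps += 1
        self.accuracy = min(1.0, self.accuracy + 0.2)  # pretend it improves

def train_until_done(model, corpus, max_steps=100, acc_threshold=0.95):
    for _ in range(max_steps):
        model.train_step(corpus)            # loss computation + adjustment
        if model.accuracy > acc_threshold:  # accuracy-based end condition
            break
    return model

m = train_until_done(DummyModel(), corpus=[])
```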
According to the training method for the large inquiry model provided by the embodiments of the present disclosure, multiple rounds of doctor-patient inquiry dialogue in the medical field are obtained, and the training corpus of the large inquiry model is generated based on its prompt words. The large inquiry model is trained on the training corpus, its loss function is calculated, and the model is adjusted based on the loss function until the target large inquiry model is obtained. This improves the target model's understanding of the medical field, so that it can simulate a doctor conducting a free-form inquiry, adapt to a variety of complex and changeable situations, and offer better flexibility and extensibility.
Fig. 2 is a flow chart of a training method of a large inquiry model according to an embodiment of the disclosure. As shown in fig. 2, the training method of the large inquiry model may include:
S201, acquiring a plurality of rounds of doctor-patient consultation dialogues in the medical field, wherein the plurality of rounds of doctor-patient consultation dialogues comprise a plurality of rounds of patient input information and doctor response information corresponding to each round of patient input information.
In some implementations, to improve the reliability of the doctor-patient inquiry dialogue, errors in the dialogue are reduced by obtaining multiple rounds of initial doctor-patient inquiry dialogue and filtering them to obtain the multiple rounds of doctor-patient inquiry dialogue.
Optionally, role segmentation can be performed on the multiple rounds of initial doctor-patient inquiry dialogue to distinguish the doctor role from the patient role and obtain the dialogue data of each role; the roles can be distinguished using regular expressions. Further, scores of the doctor dialogue data in multiple dimensions are obtained and weighted to obtain a composite score of the doctor dialogue data, and the multiple rounds of initial doctor-patient inquiry dialogue are then filtered based on this composite score to obtain the multiple rounds of doctor-patient inquiry dialogue.
Alternatively, the doctor dialogue data may be scored manually across multiple dimensions, or scored automatically by a large model, to obtain its composite score. The multiple rounds of initial doctor-patient inquiry dialogue whose doctor dialogue data has a composite score greater than a set threshold can be selected as the multiple rounds of doctor-patient inquiry dialogue.
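The weighted multi-dimension scoring and threshold filtering described above can be illustrated with a small sketch; the dimension names, weights, and threshold are invented for illustration, since the text does not specify them:

```python
# Hypothetical sketch of composite scoring and quality filtering of doctor
# dialogue data; dimensions and weights are illustrative assumptions.
def composite_score(scores, weights):
    """scores/weights: dicts keyed by scoring dimension."""
    return sum(scores[d] * weights[d] for d in weights)

def filter_dialogues(dialogues, weights, threshold):
    """Keep dialogues whose doctor-turn composite score exceeds the threshold."""
    return [d for d in dialogues
            if composite_score(d["doctor_scores"], weights) > threshold]

weights = {"professionalism": 0.5, "fluency": 0.3, "safety": 0.2}
dialogues = [
    {"id": 1, "doctor_scores": {"professionalism": 0.9, "fluency": 0.8, "safety": 0.9}},
    {"id": 2, "doctor_scores": {"professionalism": 0.3, "fluency": 0.4, "safety": 0.5}},
]
kept = filter_dialogues(dialogues, weights, threshold=0.7)  # keeps dialogue 1
```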
In some implementations, the multiple rounds of initial doctor-patient inquiry dialogue may be acquired from the medical knowledge graph, online inquiry dialogues, and patients' case information, so as to obtain highly reliable doctor-patient inquiry dialogue, which helps improve the reliability of the trained large inquiry model.
Alternatively, medical data may be obtained from a clinical trial database, and knowledge-graph mining may be performed on the medical data to obtain a medical knowledge-graph. And then the medical knowledge graph can be input into a large language model to generate multiple rounds of initial doctor-patient inquiry dialogs.
An example of a medical knowledge graph, taking hyperlipidemia as an example, is shown in Table 2 below:
TABLE 2
Wherein the data listed in table 2 are the data items required to generate multiple rounds of initial doctor-patient interview sessions.
Alternatively, online inquiry dialogues may be acquired and used as multiple rounds of initial doctor-patient inquiry dialogue. Case information can also be acquired and multiple rounds of initial doctor-patient inquiry dialogue obtained from it: the doctor's questions and the patient's answers can be extracted from the case information, thereby generating the multiple rounds of initial doctor-patient inquiry dialogue.
Illustratively, table 3 shows an example of data for an online interview session, where the session information in table 3 may be used as a multi-round initial doctor-patient interview session.
TABLE 3 Table 3
Table 4 shows an example of case information based on which a physician's question and patient's answer may be obtained, thereby generating multiple rounds of initial physician-to-patient interview sessions.
TABLE 4 Table 4
For example, from the chief complaint "cough for 2 days, fever starting today", a doctor's question "What symptoms have appeared?" can be generated, forming a round of initial doctor-patient inquiry dialogue: Doctor: "What symptoms have appeared?" Patient: "Cough for 2 days, and fever starting today."
A flow chart of acquiring the multiple rounds of doctor-patient inquiry dialogue is shown in fig. 3. The multiple rounds of doctor-patient inquiry dialogue can be obtained by acquiring multiple rounds of initial doctor-patient inquiry dialogue and performing role segmentation and quality filtering on them. The initial dialogues may be generated based on the medical knowledge graph, acquired from online inquiry dialogues, or acquired from case information.
Optionally, regular expressions may be used to perform role segmentation on the multiple rounds of initial doctor-patient inquiry dialogue, obtaining patient dialogue data and doctor dialogue data respectively. The doctor dialogue data is then scored, and the initial dialogues are screened based on the scoring results to obtain the multiple rounds of doctor-patient inquiry dialogue.
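The regex-based role segmentation described here might look roughly like the following; the `Doctor:`/`Patient:` turn labels are an assumed transcript format, not one specified in the text:

```python
import re

# Hedged sketch of role segmentation with a regular expression: split a raw
# transcript into doctor turns and patient turns.
TURN_RE = re.compile(r"(Doctor|Patient):\s*(.+)")

def split_roles(transcript):
    doctor, patient = [], []
    for line in transcript.splitlines():
        m = TURN_RE.match(line.strip())
        if m:
            # Route the utterance to the matched role's list.
            (doctor if m.group(1) == "Doctor" else patient).append(m.group(2))
    return doctor, patient

doc, pat = split_roles("Patient: I have a cough.\nDoctor: For how long?")
```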
S202, determining prompt words of the large inquiry model, and generating training corpus based on the prompt words and multiple rounds of doctor-patient inquiry dialogue.
S203, inputting the training corpus into the large inquiry model and outputting doctor prediction answers of each round of patient input information.
The relevant content of steps S202-S203 can be seen in the above embodiments, and will not be described here again.
S204, masking each round of patient input information, retaining only the doctor predicted answer for each round of patient input information.
In some implementations, to reduce training time of the large inquiry model and improve training efficiency of the large inquiry model, each round of patient input information may be masked based on a prediction dialogue of the patient and the doctor output by the large inquiry model, and only doctor prediction answers of each round of patient input information may be retained.
Alternatively, a mask may be used to hide the patient input information in the prediction dialogue between the patient and the doctor output by the large inquiry model, so that only the doctor prediction answers for each round of patient input information are retained.
As shown in the schematic diagram of masking in fig. 4, the gray part is the masked part and the white part is the reserved part. Fig. 4a shows the situation before masking, in which only the last doctor prediction answer, namely doctor 3, is displayed in the prediction dialogue between the patient and the doctor output by the large inquiry model. Fig. 4b shows the situation after each round of patient input information is masked, in which the doctor prediction answers for each round of patient input information, namely doctor 1, doctor 2 and doctor 3, are retained.
S205, performing cross entropy loss calculation on doctor prediction answers and doctor response information of each round of patient input information, and determining a loss function of the large inquiry model.
In some implementations, parameters of the large interview model may be optimized and performance improved by calculating a loss function between doctor's predicted answers and doctor's response information for each round of patient input information, measuring the error between the probability distribution of the predicted results of the large interview model and the probability distribution of the real data.
Alternatively, the loss function of the big inquiry model may be determined by determining the probability distribution of the doctor's predicted response and the probability distribution of the doctor's response information, and calculating the cross entropy loss between the doctor's predicted response and the doctor's response information. The calculated cross entropy loss formula is as follows:
L = -∑_i y_i log(p_i)    (1)
wherein L represents the loss function, y_i represents the probability distribution of the doctor response information, and p_i represents the probability distribution of the doctor prediction answer.
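The masked cross-entropy of formula (1) can be sketched as follows. This is a minimal illustration: patient-input positions carry mask weight 0 and are skipped, and the loss is accumulated only over the doctor's predicted-answer positions. The toy vocabulary and probabilities are illustrative assumptions, not real model outputs.

```python
import math

def masked_cross_entropy(pred_probs, target_ids, mask):
    """pred_probs: per-position probability distributions (list of dicts),
    target_ids: gold token at each position,
    mask: 1 = doctor-answer token (kept), 0 = patient-input token (masked)."""
    total, kept = 0.0, 0
    for probs, target, keep in zip(pred_probs, target_ids, mask):
        if not keep:
            continue
        # y_i is one-hot, so the sum collapses to -log p(target).
        total += -math.log(probs[target])
        kept += 1
    return total / kept  # mean loss over retained doctor positions

pred = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}, {"a": 0.5, "b": 0.5}]
gold = ["a", "b", "a"]
mask = [0, 1, 1]  # first position is patient input -> masked out
loss = masked_cross_entropy(pred, gold, mask)
```

Because the masked positions contribute nothing, gradient computation is spent only on the doctor replies, which is the stated motivation for the masking step.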
S206, adjusting the large inquiry model based on the loss function, and continuing training the adjusted model until the training ending condition is met to obtain the target large inquiry model.
The relevant content of step S206 may be referred to the above embodiments, and will not be described herein.
In some implementations, after the target large inquiry model is obtained, the target large inquiry model may be tested. The target large inquiry model may be used for inquiry to obtain inquiry results, and an evaluation test is performed on the target large inquiry model based on these results.
Optionally, a plurality of specified candidate diseases may be selected, and inquiry is performed on the candidate diseases based on the target large inquiry model, so as to obtain predicted inquiry results of the target large inquiry model for the candidate diseases. Further, reference inquiry results of an expert system for the candidate diseases may be obtained, and the target large inquiry model is evaluated based on the predicted inquiry results and the reference inquiry results.
Optionally, if the evaluation result indicates that the target large inquiry model fails to pass the evaluation, fine tuning is continued on the target large inquiry model. If the evaluation result indicates that the target large inquiry model passes the evaluation, the target large inquiry model can be put into use so as to assist doctors in inquiry.
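The evaluation step can be sketched as follows. This is a minimal illustration, assuming the predicted and reference inquiry results are compared per candidate disease and the model passes only if the agreement rate reaches a threshold; the disease names, result labels, and 0.8 threshold are illustrative assumptions.

```python
def evaluate_model(predicted, reference, pass_threshold=0.8):
    """predicted/reference: dicts mapping candidate disease -> inquiry result.
    Returns the agreement rate and whether the model passes the evaluation."""
    matches = sum(1 for d in reference if predicted.get(d) == reference[d])
    agreement = matches / len(reference)
    return agreement, agreement >= pass_threshold

# Toy results: the model agrees with the expert system on 2 of 3 diseases.
predicted = {"rash": "allergic dermatitis", "cough": "bronchitis", "fever": "flu"}
reference = {"rash": "allergic dermatitis", "cough": "bronchitis", "fever": "pneumonia"}
agreement, passed = evaluate_model(predicted, reference)
```

A failed evaluation here would trigger the continued fine-tuning described above; a passed one would allow the model to be put into use.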
According to the training method of the large inquiry model provided by the embodiments of the present disclosure, multiple rounds of initial doctor-patient inquiry dialogues are acquired and screened to obtain multiple rounds of doctor-patient inquiry dialogues, and the training corpus of the large inquiry model is generated based on the prompt words of the large inquiry model, so that the large inquiry model can accumulate knowledge and logical reasoning capability when facing different symptoms and different ways of speaking, and can accurately ask questions of diagnostic significance. The large inquiry model is trained based on the training corpus, and the loss function of the large inquiry model is determined by performing cross entropy loss calculation on it. The large inquiry model is adjusted based on the loss function until the target large inquiry model is obtained. This improves the understanding of the medical field by the target large inquiry model, so that it can imitate a doctor to conduct free inquiry, adapt to various complex and changeable situations, and has better flexibility and scalability.
On the basis of the above embodiments, the process of determining the minimum semantic unit of the big inquiry model according to the embodiments of the present disclosure may be explained, as shown in fig. 5, and may include:
S501, acquiring medical literature and acquiring vocabulary of the medical field based on the medical literature.
It can be understood that, because the big inquiry model needs to be dedicated to the medical field, a semantic unit of the medical field needs to be constructed, and knowledge of the medical field and the big inquiry model are combined, so that the big inquiry model is trained, the big inquiry model can be helped to understand the semantics of the medical field, and the performance of the big inquiry model in the medical field is improved.
In some implementations, medical documents, such as medical guidelines, medical articles, and the like, may be obtained based on academic search engines, medical databases, academic libraries, and the like. And then, the medical literature can be segmented to acquire the vocabulary of the medical field, so that the large inquiry model can understand the vocabulary of the medical field, and the performance of the large inquiry model is improved.
In some implementations, the vocabulary of the medical field may be determined by counting words that occur with high frequency in the medical literature. Optionally, a word segmentation set of the medical documents can be obtained by segmenting the medical documents, and for each word segment in the word segmentation set, the term frequency (TF)-inverse document frequency (IDF) of the word segment is obtained.
Alternatively, the TF of a word segment may be determined based on the total number of word segments in the word segmentation set and the number of times the word segment occurs in the set. The formula for calculating the TF of any word segment is as follows:
TF = N / J    (2)
wherein TF represents the term frequency of the word segment, N represents the number of times the word segment occurs in the word segmentation set, and J represents the total number of word segments in the word segmentation set.
Further, the number of documents containing the word segment and the total number of medical documents are obtained, and the IDF of the word segment is determined from them. The formula for calculating the IDF of the word segment is as follows:
IDF = log(M / I)    (3)
wherein IDF represents the inverse document frequency, M represents the total number of medical documents, and I represents the number of documents containing the word segment.
Further, based on the TF and IDF of any word, the TF-IDF of the word can be determined. The formula for calculating the TF-IDF of the segmentation is as follows:
TF-IDF=TF*IDF (4)
where TF represents the term frequency of the word segment and IDF represents its inverse document frequency.
Alternatively, candidate word segments may be obtained from the word segmentation set based on their TF-IDF, and the vocabulary of the medical field may be determined based on the candidate word segments. A TF-IDF threshold may be set, and word segments whose TF-IDF is greater than or equal to the threshold are selected from the word segmentation set as candidate word segments.
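Formulas (2)-(4) and the threshold selection can be sketched together as follows. This is a minimal illustration with a natural logarithm for IDF; the toy corpus and the 0.1 threshold are illustrative assumptions.

```python
import math
from collections import Counter

def tfidf_candidates(documents, threshold):
    """Return the word segments whose TF-IDF meets the threshold.
    documents: list of token lists (the segmented medical documents)."""
    all_tokens = [tok for doc in documents for tok in doc]
    counts = Counter(all_tokens)  # N per word segment
    total = len(all_tokens)       # J, total word segments in the set
    m = len(documents)            # M, total number of documents
    scores = {}
    for tok, n in counts.items():
        i = sum(1 for doc in documents if tok in doc)  # I, docs containing tok
        scores[tok] = (n / total) * math.log(m / i)    # TF * IDF
    return {tok for tok, s in scores.items() if s >= threshold}

docs = [
    ["dyspnea", "cough", "fever"],
    ["cough", "fever", "headache"],
    ["dyspnea", "dyspnea", "rash"],
]
candidates = tfidf_candidates(docs, threshold=0.1)
```

Common terms that appear in most documents ("cough", "fever") score low and are filtered out, while domain-distinctive terms survive as candidates for the medical vocabulary.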
Alternatively, it may be determined whether a candidate word segment duplicates a word in the initial vocabulary. If the candidate word segment does not duplicate any word in the vocabulary, it is added to the initial vocabulary as vocabulary of the medical field, which helps the large inquiry model understand the semantics of the medical field and improves its performance in the medical field.
In some implementations, words in the medical document may also be clustered based on the character sequence to obtain a vocabulary for the medical field. The method comprises the steps of segmenting a medical document to obtain a segmented set of the medical document, and performing character segmentation on segmented words in the segmented set to obtain a character sequence.
Further, the characters in the character sequence and the characters in the initial vocabulary are clustered based on byte-level byte pair encoding (BBPE), so as to obtain the vocabulary of the medical field. The characters may be clustered based on the frequency of occurrence of adjacent character pairs.
Optionally, the occurrence frequencies of all adjacent character pairs are obtained from the character sequence, adjacent characters with the highest occurrence frequency are selected from all adjacent character pairs to be combined, a combined character segment is obtained, and the combined character segment is combined with the initial vocabulary.
Illustratively, the frequency of occurrence f(C1, C2) of all adjacent character pairs is calculated, and the adjacent character pair (C1, C2) with the highest frequency is selected and combined into a combined character segment. The combined character segment is taken as a new token m, and the initial vocabulary V is merged with it: V = V ∪ {m}.
Further, the above process is repeated until an iteration end condition is met, and a target vocabulary is obtained; the newly added words in the target vocabulary are the vocabulary of the medical field. Based on the target vocabulary, the large inquiry model can better understand the knowledge and semantics of the medical field. Alternatively, the iteration end condition may be that the vocabulary size reaches a set value, or that no adjacent character pair with a sufficiently high occurrence frequency remains to be combined.
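The merge loop above can be sketched as follows. This is a minimal illustration of the byte-pair merging idea, using plain characters in place of raw bytes; the toy corpus and vocabulary-size cap are illustrative assumptions.

```python
from collections import Counter

def merge_pair(seq, c1, c2):
    """Replace every adjacent occurrence of (c1, c2) in seq with c1 + c2."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == c1 and seq[i + 1] == c2:
            out.append(c1 + c2)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def bpe_merge(corpus, vocab, max_vocab_size):
    sequences = [list(word) for word in corpus]  # character segmentation
    while len(vocab) < max_vocab_size:
        pairs = Counter()
        for seq in sequences:
            pairs.update(zip(seq, seq[1:]))  # f(C1, C2) over adjacent pairs
        # Iteration-end condition: no adjacent pair occurs more than once.
        if not pairs or pairs.most_common(1)[0][1] < 2:
            break
        (c1, c2), _ = pairs.most_common(1)[0]
        vocab = vocab | {c1 + c2}  # V = V ∪ {m}
        sequences = [merge_pair(seq, c1, c2) for seq in sequences]
    return vocab

corpus = ["pneumonia", "pneumonitis", "pneumothorax"]
vocab = set("abcdefghijklmnopqrstuvwxyz")  # initial single-character vocabulary
vocab = bpe_merge(corpus, vocab, max_vocab_size=30)
```

Repeated merging grows domain-specific fragments (here the shared prefix "pneum") into single vocabulary entries, which is how multi-character medical terms end up as one semantic unit.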
S502, carrying out vector coding on the vocabulary based on the large inquiry model, and taking the coded word vector in the medical field as the minimum semantic unit of the large inquiry model.
In some implementations, the medical domain code word vector can be obtained by vector encoding the medical domain vocabulary, and the medical domain code word vector is used as the minimum semantic unit of the large inquiry model. The words can be vector coded by the large inquiry model to obtain coded word vectors.
Illustratively, before the minimum semantic unit of the large inquiry model is constructed, a term such as "chronic obstructive pneumonia" may be split during tokenization into separate fragments such as "chronic", "obstructive" and "pneumonia"; after the construction, it is encoded as a single semantic unit, which helps the large inquiry model better understand the meaning of "chronic obstructive pneumonia".
According to the training method of the large inquiry model, before the training corpus is input into the large inquiry model for training, the large inquiry model is trained on the basis of the minimum semantic unit and the training corpus by constructing the minimum semantic unit of the large inquiry model in the medical field, so that the large inquiry model can be helped to understand the semantics of the medical field, and the performance of the large inquiry model in the medical field is improved. And determining a loss function of the large inquiry model by performing cross entropy loss calculation on the large inquiry model. And adjusting the large inquiry model based on the loss function until a target large inquiry model is obtained. The understanding capability of the target inquiry large model to the medical field can be improved, so that the target inquiry large model can simulate doctors to conduct free inquiry, can adapt to various complex and changeable conditions, and has better flexibility and expandability.
Fig. 6 is a flowchart of a training method of a large inquiry model according to an embodiment of the present disclosure. As shown in fig. 6, the training method of the large inquiry model may include:
s601, acquiring multiple rounds of initial doctor-patient inquiry dialogues, and filtering the multiple rounds of initial doctor-patient inquiry dialogues to obtain multiple rounds of doctor-patient inquiry dialogues.
S602, determining prompt words of the large inquiry model, and generating training corpus based on the prompt words and multiple rounds of doctor-patient inquiry dialogue.
S603, acquiring medical literature, and acquiring vocabulary of the medical field based on the medical literature.
S604, carrying out vector coding on the vocabulary based on the large inquiry model, and taking the coded word vector in the medical field as the minimum semantic unit of the large inquiry model.
S605, inputting training corpus into the large inquiry model, and outputting doctor prediction answers of each round of patient input information.
S606, masking and shielding each round of patient input information, and only reserving doctor prediction answers of each round of patient input information.
S607, performing cross entropy loss calculation on doctor prediction answers and doctor response information of each round of patient input information, and determining a loss function of the large inquiry model.
And S608, adjusting the large inquiry model based on the loss function, and continuing to train the adjusted model until the training ending condition is met to obtain the target large inquiry model.
According to the training method of the large inquiry model, which is provided by the embodiment of the disclosure, multiple rounds of doctor-patient inquiry dialogues in the medical field are obtained, and training corpus of the large inquiry model is generated based on prompt words of the large inquiry model. And training the large inquiry model based on the training corpus, calculating a loss function of the large inquiry model, and adjusting the large inquiry model based on the loss function until a target large inquiry model is obtained. The understanding capability of the target inquiry large model to the medical field can be improved, so that the target inquiry large model can simulate doctors to conduct free inquiry, can adapt to various complex and changeable conditions, and has better flexibility and expandability.
Fig. 7 is a flow chart of a method for inquiry based on a large model according to an embodiment of the disclosure.
As shown in fig. 7, the large model-based inquiry method may include:
s701, calling a target inquiry large model, and inputting playing role prompt information to the target inquiry large model.
It should be noted that the target big inquiry model may be obtained by using the training method of the big inquiry model shown in fig. 1-6, which is not described herein.
In some implementations, the target large inquiry model is called and role-play prompt information is input into it, so as to guide the target large inquiry model to play the role of a doctor and answer the user's questions using professional knowledge of the medical field.
For example, fig. 8 shows a schematic diagram of an inquiry using the target large inquiry model. The prompt word "please play a doctor" is input to guide the target large inquiry model to answer the user's questions using the expertise of the medical field, or to ask about the user's condition.
S702, inputting at least one time of inquiry information to the target inquiry large model to obtain inquiry replies which are output by the target inquiry large model and are matched with each time of inquiry information until the inquiry is ended and the target inquiry large model is exited.
In some implementations, after the target large inquiry model is prompted to play the role of a doctor based on the role-play prompt information, one or more pieces of inquiry information describing the condition may be input into the target large inquiry model. The target large inquiry model generates a corresponding inquiry reply for each piece of inquiry information input by the user, until the user ends the inquiry and exits the target large inquiry model.
Illustratively, fig. 8 shows a schematic diagram of an inquiry using the target large inquiry model. The right side shows the inquiry information input by the user (patient), and the left side shows the inquiry replies output by the target large inquiry model. The user inputs "Please play a doctor" as the role-play prompt information. The target large inquiry model receives the prompt and takes the role of the doctor: "OK, I will play the doctor and chat with you. In the following chat, I will take on this role. If you want to exit role playing, just tell me directly, and I will stop playing the role." The user inputs the inquiry information "Hello, I have developed a red rash", and the target large inquiry model outputs the inquiry reply "When did the red rash develop?" The user further describes the condition based on the reply: "Last night; it itches." The target large inquiry model outputs the inquiry reply "Do you have other symptoms, such as fever, headache, cough or diarrhea?" The user replies "No other symptoms." The target large inquiry model outputs "Have you eaten any special food recently, or been to places you do not usually go?" The user replies "I ate a lot of seafood last night." The target large inquiry model outputs "Have you used any drugs for treatment?" The user replies "I applied a little erythromycin." Based on this information, the target large inquiry model can output the inquiry reply "According to your description, it may be allergic dermatitis. I will prescribe some medications for you. You need to pay close attention to your diet and avoid exposure to allergens." as the result of the current inquiry, after which the user may end the inquiry and exit the target large inquiry model.
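The role-play inquiry loop can be sketched as follows. This is a minimal illustration of the control flow only (role prompt, rounds of inquiry, exit); the `query_model` callable is a stand-in assumption for the deployed model's API, simulated here by a small canned-reply table so the flow is runnable.

```python
# Canned replies simulating the target large inquiry model (assumption).
CANNED_REPLIES = {
    "please play a doctor": "OK, I will play the doctor and chat with you.",
    "hello, i have developed a red rash": "When did the red rash develop?",
    "exit": "Role playing stopped.",
}

def query_model(message: str) -> str:
    """Stand-in for the target large inquiry model's API call."""
    return CANNED_REPLIES.get(message.lower(),
                              "Could you describe your symptoms further?")

def inquiry_session(messages):
    """Feed the role prompt, then each round of inquiry info, until 'exit'."""
    transcript = []
    for msg in messages:
        reply = query_model(msg)
        transcript.append((msg, reply))
        if msg.lower() == "exit":
            break  # the user ends the inquiry and exits the model
    return transcript

session = inquiry_session([
    "Please play a doctor",
    "Hello, I have developed a red rash",
    "exit",
])
```

A real deployment would replace `query_model` with the API call to the model service shown in fig. 10.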
According to the big model-based inquiry method provided by the embodiment of the disclosure, the target big inquiry model is called and plays roles based on the prompt information, so that the target big inquiry model can imitate doctors to perform free inquiry, a patient can be helped to know and become familiar with own illness, economic cost and time cost of both doctors and patients are saved, and the utilization rate of medical resources is improved.
A flow chart of the large model-based interrogation is shown in fig. 9. The training corpus of the large inquiry model is generated by acquiring multiple rounds of doctor-patient inquiry dialogues in the medical field and according to the prompt words of the large inquiry model and the multiple rounds of doctor-patient inquiry dialogues. Training the large inquiry model according to the training corpus, calculating a loss function of the large inquiry model, adjusting the large inquiry model based on the loss function, and continuously training the adjusted large inquiry model until the training ending condition is met to obtain the target large inquiry model. Furthermore, the inquiry can be performed based on the target inquiry large model, and inquiry replies matched with the inquiry information are output by the target inquiry large model through inputting the inquiry information.
FIG. 10 illustrates an application framework diagram based on a target interview big model. The user (patient) can input the inquiry information on the interactive interface and receive the inquiry reply output by the target inquiry large model, wherein the inquiry reply is displayed on the output interface. The output interface calls a model service through an application programming interface (Application Programming Interface, API), and a target inquiry large model is used for searching a database to acquire information matched with inquiry information as inquiry replies. Meanwhile, the target inquiry large model can also feed back and iterate based on the data, so that the performance of the model is further improved, and the use requirement of a user is met.
The embodiment of the disclosure is mainly applied to medical consultation, and the application scene comprises the steps of helping doctors to collect enough information in advance and helping the doctors to make decisions, so that the time of the doctors in the process of seeing the doctor is greatly reduced. Exemplary description:
1. Pre-consultation scenario:
When waiting for treatment in a hospital or using an Internet hospital, a patient can interact via the API with the target large inquiry model through an applet or a hospital application program, so that the patient's basic condition is collected in advance and the doctor can quickly understand the condition during the visit.
2. Inquiry/guide scenario:
In the case that the patient has not yet visited a doctor, the patient can interact with the target large inquiry model in advance through an applet, a web page, or a hospital application program, so as to obtain the condition in advance and judge whether to go to a hospital and which department to visit for diagnosis and treatment.
Corresponding to the training method of the big inquiry model provided by the above embodiments, an embodiment of the present disclosure further provides a training device of the big inquiry model, and since the training device of the big inquiry model provided by the embodiment of the present disclosure corresponds to the training method of the big inquiry model provided by the above embodiments, the implementation of the training method of the big inquiry model is also applicable to the training device of the big inquiry model provided by the embodiment of the present disclosure, which is not described in detail in the following embodiments.
Fig. 11 is a schematic structural diagram of a training device of a big inquiry model according to an embodiment of the present disclosure.
As shown in fig. 11, a training apparatus 1100 of a large-scale inquiry model according to an embodiment of the present disclosure includes: an acquisition module 1101, a generation module 1102, a training module 1103 and an adjustment module 1104.
The acquiring module 1101 is configured to acquire a plurality of rounds of doctor-patient inquiry dialogues in the medical field, where the plurality of rounds of doctor-patient inquiry dialogues include a plurality of rounds of patient input information and doctor response information corresponding to each round of patient input information.
The generating module 1102 is configured to determine a prompt word of the big inquiry model, and generate a training corpus based on the prompt word and the multiple rounds of doctor-patient inquiry dialogue.
The training module 1103 is configured to input the training corpus into a large inquiry model, output a doctor prediction answer of each round of patient input information, and determine a loss function of the large inquiry model according to the doctor prediction answer of each round of patient input information and the doctor response information.
And the adjusting module 1104 is configured to adjust the big inquiry model based on the loss function, and continue training the adjusted model until the training end condition is met, thereby obtaining the target big inquiry model.
In one embodiment of the present disclosure, the training module 1103 is further configured to: masking and shielding each round of patient input information, and only reserving the doctor prediction reply of each round of patient input information; and performing cross entropy loss calculation on the doctor prediction answer and the doctor response information of each round of patient input information, and determining a loss function of the large inquiry model.
In one embodiment of the present disclosure, the obtaining module 1101 is further configured to: and acquiring a plurality of rounds of initial doctor-patient inquiry dialogues, and filtering the plurality of rounds of initial doctor-patient inquiry dialogues to obtain the plurality of rounds of doctor-patient inquiry dialogues.
In one embodiment of the present disclosure, the obtaining module 1101 is further configured to: performing role differentiation of doctors and patients on the multiple rounds of initial doctor-patient consultation dialogues to obtain respective dialogue data of the roles; obtaining scores of doctor dialogue data in a plurality of dimensions, and weighting the scores of the doctor dialogue data in the plurality of dimensions to obtain a comprehensive score of the doctor dialogue data; and filtering the multiple rounds of initial doctor-patient inquiry dialogs based on the comprehensive scores of the doctor dialogue data to obtain the multiple rounds of doctor-patient inquiry dialogs.
In one embodiment of the present disclosure, the obtaining module 1101 is further configured to: acquiring medical data from a clinical trial database; mining the knowledge graph of the medical data to obtain a medical knowledge graph; inputting the medical knowledge graph into a large language model to generate the multiple rounds of initial doctor-patient inquiry dialogs.
In one embodiment of the present disclosure, the obtaining module 1101 is further configured to: acquiring an online consultation dialogue, and acquiring the rounds of initial doctor-patient consultation dialogue according to the online consultation dialogue; and/or acquiring case information, and acquiring the rounds of initial doctor-patient inquiry dialogs according to the case information.
In one embodiment of the present disclosure, the training module 1103 is further configured to: acquiring medical literature, and acquiring vocabulary of the medical field based on the medical literature; and carrying out vector coding on the vocabulary based on the large inquiry model, and taking the coded word vector in the medical field as a minimum semantic unit of the large inquiry model.
In one embodiment of the present disclosure, the training module 1103 is further configured to: performing word segmentation on the medical document to obtain a word segmentation set of the medical document; aiming at each word in a word segmentation set, acquiring word frequency TF-inverse text frequency index IDF of the word; based on the TF-IDF of the word segmentation, candidate word segmentation is obtained from the word segmentation set; and determining the vocabulary of the medical field based on the candidate word segmentation.
In one embodiment of the present disclosure, the training module 1103 is further configured to: judging whether the candidate word segmentation and the word in the initial vocabulary are repeated or not; and if the candidate word is not repeated with the word in the vocabulary, updating the candidate word as the vocabulary of the medical field into the initial vocabulary.
In one embodiment of the present disclosure, the training module 1103 is further configured to: performing word segmentation on the medical document to obtain a word segmentation set of the medical document; performing character segmentation on the segmented words in the segmented word set to obtain a character sequence; and carrying out clustering processing on the characters in the character sequence and the characters in the initial vocabulary based on the byte-level byte pair coding BBPE to obtain the vocabulary in the medical field.
In one embodiment of the present disclosure, the training module 1103 is further configured to: acquiring the occurrence frequency of all adjacent character pairs from the character sequence; selecting adjacent characters with highest occurrence frequency from all adjacent character pairs for combination to obtain a combined character segment, and combining the combined character segment with an initial vocabulary; and (3) repeating the process until the iteration ending condition is met, and obtaining a target vocabulary, wherein the newly added vocabulary in the target vocabulary is the vocabulary in the medical field.
In one embodiment of the present disclosure, the adjusting module 1104 is further configured to: selecting a plurality of specified candidate diseases, and carrying out inquiry on the candidate diseases based on the target inquiry large model to obtain a predicted inquiry result of the target inquiry large model on the candidate diseases; acquiring a reference inquiry result of an expert system on the candidate diseases; evaluating the target big inquiry model based on the predicted inquiry result and the reference inquiry result; and if the evaluation result indicates that the target large inquiry model fails to pass the evaluation, continuing fine adjustment on the target large inquiry model.
According to the training device of the large inquiry model, which is provided by the embodiment of the disclosure, through acquiring multiple rounds of doctor-patient inquiry dialogues in the medical field and based on the prompt words of the large inquiry model, training corpus of the large inquiry model is generated. And training the large inquiry model based on the training corpus, calculating a loss function of the large inquiry model, and adjusting the large inquiry model based on the loss function until a target large inquiry model is obtained. The understanding capability of the target inquiry large model to the medical field can be improved, so that the target inquiry large model can simulate doctors to conduct free inquiry, can adapt to various complex and changeable conditions, and has better flexibility and expandability.
According to an embodiment of the present disclosure, the present disclosure further provides a large model-based interrogation device, which is configured to implement the large model-based interrogation method described above.
Fig. 12 is a schematic structural view of a large model-based interrogation apparatus according to a first embodiment of the present disclosure.
As shown in fig. 12, a large model-based interrogation apparatus 1200 of an embodiment of the present disclosure includes: call module 1201, inquiry module 1202.
The calling module is used for calling the target inquiry large model and inputting playing role prompt information to the target inquiry large model.
The inquiry module is used for inputting at least one inquiry information to the target inquiry large model to obtain inquiry replies which are output by the target inquiry large model and are matched with each inquiry information until the inquiry is ended and the target inquiry large model is exited.
According to the big model-based inquiry device provided by the embodiment of the disclosure, the target inquiry big model is called and plays roles based on the prompt information, so that the target inquiry big model can imitate doctors to perform free inquiry, a patient can be helped to know and become familiar with own illness, economic cost and time cost of both doctors and patients are saved, and the utilization rate of medical resources is improved.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 13 illustrates a schematic block diagram of an example electronic device 1300 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 13, the electronic device 1300 includes a computing unit 1301 that can perform various appropriate actions and processes according to computer programs/instructions stored in a read-only memory (ROM) 1302 or loaded from a storage unit 1308 into a random access memory (RAM) 1303. In the RAM 1303, various programs and data required for the operation of the device 1300 can also be stored. The computing unit 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304. An input/output (I/O) interface 1305 is also connected to the bus 1304.
Various components in device 1300 are connected to I/O interface 1305, including: an input unit 1306 such as a keyboard, a mouse, or the like; an output unit 1307 such as various types of displays, speakers, and the like; storage unit 1308, such as a magnetic disk, optical disk, etc.; and a communication unit 1309 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1309 allows the device 1300 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1301 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1301 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1301 performs the methods and processes described above, for example, the training method of the large inquiry model. For example, in some embodiments, the training method of the large inquiry model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1308. In some embodiments, part or all of the computer program/instructions may be loaded and/or installed onto the device 1300 via the ROM 1302 and/or the communication unit 1309. When the computer program/instructions are loaded into the RAM 1303 and executed by the computing unit 1301, one or more steps of the training method of the large inquiry model described above may be performed. Alternatively, in other embodiments, the computing unit 1301 may be configured to perform the training method of the large inquiry model in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs/instructions that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special or general purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs/instructions running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (29)
1. A method for training a large inquiry model, wherein the method comprises:
acquiring a plurality of rounds of doctor-patient consultation dialogues in the medical field, wherein the plurality of rounds of doctor-patient consultation dialogues comprise a plurality of rounds of patient input information and doctor response information corresponding to each round of patient input information;
determining prompt words of a large inquiry model, and generating training corpus based on the prompt words and the multiple rounds of doctor-patient inquiry dialogue;
inputting the training corpus into the large inquiry model, outputting a doctor predicted reply for each round of patient input information, and determining a loss function of the large inquiry model according to the doctor predicted reply and the doctor response information of each round of patient input information;
and adjusting the large inquiry model based on the loss function, and continuing training the adjusted model until the training ending condition is met to obtain the target large inquiry model.
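The corpus-generation step of claim 1 — pairing the prompt words and dialogue history with each doctor reply — can be sketched as follows. This is an illustrative Python sketch only; the sample prompt, role labels, and dialogue content are hypothetical and not specified by the claim:

```python
def build_corpus(prompt, dialogues):
    """Flatten multi-turn doctor-patient dialogues into training samples.

    Each sample pairs the prompt words plus the dialogue history so far
    (ending with the patient's latest input) with the doctor's reply as target.
    """
    samples = []
    for dialogue in dialogues:
        history = [prompt]
        for turn in dialogue:
            history.append("患者: " + turn["patient"])
            samples.append({"input": "\n".join(history), "target": turn["doctor"]})
            history.append("医生: " + turn["doctor"])  # reply joins the history
    return samples

# Hypothetical prompt words and a two-round dialogue.
prompt = "你是一名医生，请根据患者描述进行问诊。"
dialogues = [[
    {"patient": "我最近头痛。", "doctor": "头痛持续多久了？"},
    {"patient": "大约三天。", "doctor": "是否伴随发热？"},
]]
corpus = build_corpus(prompt, dialogues)
```

Each later round's input thus contains all earlier patient inputs and doctor replies, which is what lets the model learn multi-round inquiry behavior.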
2. The method of claim 1, wherein the determining a loss function of the large inquiry model according to the doctor predicted reply and the doctor response information of each round of patient input information comprises:
masking each round of patient input information, so that only the doctor predicted reply for each round of patient input information is retained;
and performing a cross-entropy loss calculation on the doctor predicted reply and the doctor response information of each round of patient input information to determine the loss function of the large inquiry model.
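The masking-plus-cross-entropy computation of claim 2 corresponds to the common practice of marking patient-turn token positions with an ignore label so that only doctor-reply positions contribute to the loss. A minimal plain-Python sketch with hypothetical toy logits (training frameworks typically implement the same idea via an ignore index such as -100):

```python
import math

IGNORE = -100  # label value marking masked (patient-turn) positions

def masked_cross_entropy(logits, labels):
    """Mean cross-entropy over positions whose label is not IGNORE.

    logits: per-position lists of raw scores, one score per vocabulary id
    labels: target token ids, with patient-turn positions set to IGNORE
    """
    total, count = 0.0, 0
    for scores, y in zip(logits, labels):
        if y == IGNORE:
            continue  # masked patient tokens contribute no loss
        m = max(scores)  # stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[y]
        count += 1
    return total / max(count, 1)

# Toy sequence: positions 0-1 are the patient turn (masked),
# positions 2-3 are the doctor's reply (supervised).
logits = [[2.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 3.0]]
labels = [IGNORE, IGNORE, 0, 1]
loss = masked_cross_entropy(logits, labels)
```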
3. The method of claim 1, wherein the acquiring multiple rounds of doctor-patient interview sessions for a medical field comprises:
and acquiring a plurality of rounds of initial doctor-patient inquiry dialogues, and filtering the plurality of rounds of initial doctor-patient inquiry dialogues to obtain the plurality of rounds of doctor-patient inquiry dialogues.
4. The method of claim 3, wherein the filtering the multiple rounds of initial doctor-patient interview sessions to obtain the multiple rounds of doctor-patient interview sessions comprises:
performing role differentiation of doctors and patients on the multiple rounds of initial doctor-patient consultation dialogues to obtain respective dialogue data of the roles;
obtaining scores of doctor dialogue data in a plurality of dimensions, and weighting the scores of the doctor dialogue data in the plurality of dimensions to obtain a comprehensive score of the doctor dialogue data;
and filtering the multiple rounds of initial doctor-patient inquiry dialogs based on the comprehensive scores of the doctor dialogue data to obtain the multiple rounds of doctor-patient inquiry dialogs.
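The weighting-and-filtering step of claim 4 can be illustrated as below. The scoring dimensions, weights, and threshold are hypothetical placeholders; the claim does not fix which dimensions are scored or how the weights are chosen:

```python
def composite_score(dim_scores, weights):
    """Weighted sum of a doctor-dialogue's per-dimension scores."""
    assert set(dim_scores) == set(weights)
    return sum(dim_scores[d] * weights[d] for d in dim_scores)

def filter_dialogues(dialogues, weights, threshold):
    """Keep dialogues whose doctor-side composite score meets the threshold."""
    return [d for d in dialogues
            if composite_score(d["doctor_scores"], weights) >= threshold]

# Hypothetical dimensions and weights (illustrative only).
weights = {"professionalism": 0.5, "completeness": 0.3, "fluency": 0.2}
dialogues = [
    {"id": 1, "doctor_scores": {"professionalism": 0.9, "completeness": 0.8, "fluency": 0.9}},
    {"id": 2, "doctor_scores": {"professionalism": 0.3, "completeness": 0.4, "fluency": 0.5}},
]
kept = filter_dialogues(dialogues, weights, threshold=0.7)
```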
5. The method of claim 3, wherein the acquiring multiple rounds of initial doctor-patient interview sessions comprises:
acquiring medical data from a clinical trial database;
mining the knowledge graph of the medical data to obtain a medical knowledge graph;
inputting the medical knowledge graph into a large language model to generate the multiple rounds of initial doctor-patient inquiry dialogs.
6. The method of claim 3, wherein the acquiring multiple rounds of initial doctor-patient interview sessions comprises:
acquiring an online consultation dialogue, and obtaining the multiple rounds of initial doctor-patient inquiry dialogues according to the online consultation dialogue; and/or,
acquiring case information, and obtaining the multiple rounds of initial doctor-patient inquiry dialogues according to the case information.
7. The method of claim 1, wherein before the inputting the training corpus into the large inquiry model, the method further comprises:
acquiring medical literature, and acquiring vocabulary of the medical field based on the medical literature;
and carrying out vector coding on the vocabulary based on the large inquiry model, and taking the coded word vector in the medical field as a minimum semantic unit of the large inquiry model.
8. The method of claim 7, wherein the obtaining the vocabulary of the medical domain based on the medical document comprises:
performing word segmentation on the medical document to obtain a word segmentation set of the medical document;
for each word segment in the word segmentation set, acquiring the term frequency-inverse document frequency (TF-IDF) of the word segment;
obtaining candidate word segments from the word segmentation set based on the TF-IDF of each word segment;
and determining the vocabulary of the medical field based on the candidate word segmentation.
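The TF-IDF-based candidate selection of claim 8 can be sketched as follows. The smoothing formula, threshold, and sample word segments are illustrative assumptions; the claim only requires that candidates be selected based on TF-IDF:

```python
import math
from collections import Counter

def tfidf_candidates(doc_tokens, corpus_docs, threshold):
    """Select candidate domain word segments whose TF-IDF meets a threshold."""
    tf = Counter(doc_tokens)
    n = len(corpus_docs)
    out = []
    for w, c in tf.items():
        df = sum(1 for d in corpus_docs if w in d)  # document frequency
        idf = math.log((n + 1) / (df + 1)) + 1      # smoothed IDF (an assumption)
        score = (c / len(doc_tokens)) * idf
        if score >= threshold:
            out.append(w)
    return out

# Hypothetical segmented text: the medical term "胸痛" (chest pain) is rare
# across the corpus, while the function word "的" appears everywhere and scores low.
doc = ["胸痛", "胸痛", "心悸", "的", "的"]
corpus_docs = [set(doc), {"的", "患者"}, {"的", "主诉"}]
candidates = tfidf_candidates(doc, corpus_docs, threshold=0.5)
```

Rare, document-specific terms get high scores and survive; ubiquitous function words are filtered out.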
9. The method of claim 8, wherein the determining the vocabulary of the medical field based on the candidate word segment comprises:
judging whether the candidate word segment duplicates a word in an initial vocabulary;
and if the candidate word segment does not duplicate any word in the initial vocabulary, adding the candidate word segment, as vocabulary of the medical field, to the initial vocabulary.
10. The method of claim 7, wherein the obtaining the vocabulary of the medical domain based on the medical document comprises:
performing word segmentation on the medical document to obtain a word segmentation set of the medical document;
performing character segmentation on the segmented words in the segmented word set to obtain a character sequence;
and carrying out clustering processing on the characters in the character sequence and the characters in the initial vocabulary based on the byte-level byte pair coding BBPE to obtain the vocabulary in the medical field.
11. The method of claim 10, wherein the encoding BBPE based on byte-level byte pairs clusters the characters in the character sequence with the characters in the initial vocabulary to obtain the vocabulary of the medical field, comprising:
acquiring the occurrence frequency of all adjacent character pairs from the character sequence;
selecting adjacent characters with highest occurrence frequency from all adjacent character pairs for combination to obtain a combined character segment, and combining the combined character segment with an initial vocabulary;
and repeating the above process until an iteration ending condition is met to obtain a target vocabulary, wherein the newly added words in the target vocabulary are the vocabulary of the medical field.
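The merge loop of claim 11 is the classic greedy byte-pair-encoding iteration: count adjacent pairs, merge the most frequent one, add the merged segment to the vocabulary, and repeat. A simplified sketch follows (BBPE operates on bytes; characters are used here for readability, and the fixed merge budget stands in for the claim's iteration ending condition):

```python
from collections import Counter

def bpe_merges(sequences, base_vocab, num_merges):
    """Greedy pair-merge loop; returns the newly added (domain) vocabulary."""
    vocab = set(base_vocab)
    seqs = [list(s) for s in sequences]
    for _ in range(num_merges):  # iteration ending condition: merge budget
        pairs = Counter()
        for seq in seqs:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1  # frequency of each adjacent pair
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]  # most frequent adjacent pair
        merged = a + b
        vocab.add(merged)  # merged character segment joins the vocabulary
        for seq in seqs:   # rewrite sequences using the merged segment
            i, out = 0, []
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            seq[:] = out
    return vocab - set(base_vocab)  # newly added entries = domain vocabulary

# Hypothetical character sequences from segmented medical text.
new_words = bpe_merges(["头痛", "头痛", "头晕"], base_vocab={"头", "痛", "晕"}, num_merges=1)
```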
12. The method of claim 1, wherein after obtaining the target large inquiry model, the method further comprises:
selecting a plurality of specified candidate diseases, and conducting an inquiry on the candidate diseases based on the target large inquiry model to obtain a predicted inquiry result of the target large inquiry model for the candidate diseases;
acquiring a reference inquiry result of an expert system on the candidate diseases;
evaluating the target large inquiry model based on the predicted inquiry result and the reference inquiry result;
and if the evaluation result indicates that the target large inquiry model fails the evaluation, continuing to fine-tune the target large inquiry model.
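The evaluation step of claim 12 can be illustrated as a simple agreement check between the model's predicted inquiry results and the expert system's reference results. The pass-rate threshold and the per-disease comparison are hypothetical; the claim does not specify the evaluation metric:

```python
def evaluate(predicted, reference, pass_rate=0.8):
    """Agreement rate between model and expert-system results per candidate disease."""
    agree = sum(predicted[d] == reference[d] for d in reference)
    rate = agree / len(reference)
    return rate, rate >= pass_rate  # below the threshold -> continue fine-tuning

# Hypothetical candidate diseases and inquiry conclusions.
predicted = {"流感": "流感", "偏头痛": "偏头痛", "肺炎": "支气管炎"}
reference = {"流感": "流感", "偏头痛": "偏头痛", "肺炎": "肺炎"}
rate, passed = evaluate(predicted, reference)
```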
13. A large model-based inquiry method, wherein the method comprises:
calling a target large inquiry model, and inputting role-playing prompt information to the target large inquiry model;
inputting at least one piece of inquiry information to the target large inquiry model to obtain an inquiry reply, output by the target large inquiry model, that matches each piece of inquiry information, until the inquiry ends and the target large inquiry model is exited;
wherein the target large inquiry model is a large model trained by the training method according to any one of claims 1-12.
14. A training device for a large inquiry model, wherein the device comprises:
the acquisition module is used for acquiring a plurality of rounds of doctor-patient inquiry dialogues in the medical field, wherein the rounds of doctor-patient inquiry dialogues comprise a plurality of rounds of patient input information and doctor response information corresponding to each round of patient input information;
the generation module is used for determining prompt words of the large inquiry model and generating training corpus based on the prompt words and the multiple rounds of doctor-patient inquiry dialogue;
the training module is used for inputting the training corpus into the large inquiry model, outputting a doctor predicted reply for each round of patient input information, and determining a loss function of the large inquiry model according to the doctor predicted reply and the doctor response information of each round of patient input information;
and the adjusting module is used for adjusting the large inquiry model based on the loss function, and continuing to train the adjusted model until a training ending condition is met to obtain the target large inquiry model.
15. The apparatus of claim 14, wherein the training module is further to:
masking each round of patient input information, so that only the doctor predicted reply for each round of patient input information is retained;
and performing a cross-entropy loss calculation on the doctor predicted reply and the doctor response information of each round of patient input information to determine the loss function of the large inquiry model.
16. The apparatus of claim 14, wherein the acquisition module is further configured to:
and acquiring a plurality of rounds of initial doctor-patient inquiry dialogues, and filtering the plurality of rounds of initial doctor-patient inquiry dialogues to obtain the plurality of rounds of doctor-patient inquiry dialogues.
17. The apparatus of claim 16, wherein the acquisition module is further configured to:
performing role differentiation of doctors and patients on the multiple rounds of initial doctor-patient consultation dialogues to obtain respective dialogue data of the roles;
obtaining scores of doctor dialogue data in a plurality of dimensions, and weighting the scores of the doctor dialogue data in the plurality of dimensions to obtain a comprehensive score of the doctor dialogue data;
and filtering the multiple rounds of initial doctor-patient inquiry dialogs based on the comprehensive scores of the doctor dialogue data to obtain the multiple rounds of doctor-patient inquiry dialogs.
18. The apparatus of claim 16, wherein the acquisition module is further configured to:
acquiring medical data from a clinical trial database;
mining the knowledge graph of the medical data to obtain a medical knowledge graph;
inputting the medical knowledge graph into a large language model to generate the multiple rounds of initial doctor-patient inquiry dialogs.
19. The apparatus of claim 16, wherein the acquisition module is further configured to:
acquiring an online consultation dialogue, and obtaining the multiple rounds of initial doctor-patient inquiry dialogues according to the online consultation dialogue; and/or,
acquiring case information, and obtaining the multiple rounds of initial doctor-patient inquiry dialogues according to the case information.
20. The apparatus of claim 14, wherein the training module is further to:
acquiring medical literature, and acquiring vocabulary of the medical field based on the medical literature;
and carrying out vector coding on the vocabulary based on the large inquiry model, and taking the coded word vector in the medical field as a minimum semantic unit of the large inquiry model.
21. The apparatus of claim 20, wherein the training module is further to:
performing word segmentation on the medical document to obtain a word segmentation set of the medical document;
for each word segment in the word segmentation set, acquiring the term frequency-inverse document frequency (TF-IDF) of the word segment;
obtaining candidate word segments from the word segmentation set based on the TF-IDF of each word segment;
and determining the vocabulary of the medical field based on the candidate word segmentation.
22. The apparatus of claim 21, wherein the training module is further to:
judging whether the candidate word segment duplicates a word in an initial vocabulary;
and if the candidate word segment does not duplicate any word in the initial vocabulary, adding the candidate word segment, as vocabulary of the medical field, to the initial vocabulary.
23. The apparatus of claim 20, wherein the training module is further to:
performing word segmentation on the medical document to obtain a word segmentation set of the medical document;
performing character segmentation on the segmented words in the segmented word set to obtain a character sequence;
and carrying out clustering processing on the characters in the character sequence and the characters in the initial vocabulary based on the byte-level byte pair coding BBPE to obtain the vocabulary in the medical field.
24. The apparatus of claim 23, wherein the training module is further configured to:
acquiring the occurrence frequency of all adjacent character pairs from the character sequence;
selecting adjacent characters with highest occurrence frequency from all adjacent character pairs for combination to obtain a combined character segment, and combining the combined character segment with an initial vocabulary;
and repeating the above process until an iteration ending condition is met to obtain a target vocabulary, wherein the newly added words in the target vocabulary are the vocabulary of the medical field.
25. The apparatus of claim 14, wherein the adjustment module is further configured to:
selecting a plurality of specified candidate diseases, and conducting an inquiry on the candidate diseases based on the target large inquiry model to obtain a predicted inquiry result of the target large inquiry model for the candidate diseases;
acquiring a reference inquiry result of an expert system on the candidate diseases;
evaluating the target large inquiry model based on the predicted inquiry result and the reference inquiry result;
and if the evaluation result indicates that the target large inquiry model fails the evaluation, continuing to fine-tune the target large inquiry model.
26. A large model-based inquiry device, wherein the device comprises:
the calling module, used for calling a target large inquiry model and inputting role-playing prompt information to the target large inquiry model;
the inquiry module, used for inputting at least one piece of inquiry information to the target large inquiry model to obtain an inquiry reply, output by the target large inquiry model, that matches each piece of inquiry information, until the inquiry ends and the target large inquiry model is exited;
wherein the target large inquiry model is a large model trained by the training device according to any one of claims 14-25.
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
28. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-13.
29. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of any of claims 1-13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311758183.9A CN117747087A (en) | 2023-12-20 | 2023-12-20 | Training method of large inquiry model, inquiry method and device based on large inquiry model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117747087A true CN117747087A (en) | 2024-03-22 |
Family
ID=90278875
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311758183.9A Pending CN117747087A (en) | 2023-12-20 | 2023-12-20 | Training method of large inquiry model, inquiry method and device based on large inquiry model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117747087A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118013021A (en) * | 2024-04-08 | 2024-05-10 | 浙江口碑网络技术有限公司 | Medicine answering method, device, equipment and medium based on large language model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111444709B (en) | Text classification method, device, storage medium and equipment | |
WO2021233112A1 (en) | Multimodal machine learning-based translation method, device, equipment, and storage medium | |
EP4060565A1 (en) | Method and apparatus for acquiring pre-trained model | |
Dharwadkar et al. | A medical chatbot | |
CN112131366B (en) | Method, device and storage medium for training text classification model and text classification | |
Shou et al. | Conversational emotion recognition studies based on graph convolutional neural networks and a dependent syntactic analysis | |
WO2022007823A1 (en) | Text data processing method and device | |
CN108491486B (en) | Method, device, terminal equipment and storage medium for simulating patient inquiry dialogue | |
CN110427486B (en) | Body condition text classification method, device and equipment | |
US11934790B2 (en) | Neural network training method and apparatus, semantic classification method and apparatus and medium | |
CN111274397B (en) | Method and device for establishing entity relation detection model | |
CN113407677B (en) | Method, apparatus, device and storage medium for evaluating consultation dialogue quality | |
CN117747087A (en) | Training method of large inquiry model, inquiry method and device based on large inquiry model | |
CN112100406A (en) | Data processing method, device, equipment and medium | |
CN110399472A (en) | Reminding method, device, computer equipment and storage medium are putd question in interview | |
CN110929532B (en) | Data processing method, device, equipment and storage medium | |
CN115374771A (en) | Text label determination method and device | |
CN110991183A (en) | Method, device, equipment and storage medium for determining predicate of problem | |
Bai et al. | Ofasys: A multi-modal multi-task learning system for building generalist models | |
CN116402166B (en) | Training method and device of prediction model, electronic equipment and storage medium | |
CN117422067A (en) | Information processing method, information processing device, electronic equipment and storage medium | |
CN116894498A (en) | Training method, strategy identification method, device and equipment of network model | |
WO2023124837A1 (en) | Inquiry processing method and apparatus, device, and storage medium | |
CN116994695A (en) | Training method, device, equipment and storage medium of report generation model | |
CN113761899A (en) | Medical text generation method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||