CN117747087A - Training method of large inquiry model, inquiry method and device based on large inquiry model - Google Patents


Info

Publication number
CN117747087A
Authority
CN
China
Prior art keywords
inquiry
model
doctor
patient
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311758183.9A
Other languages
Chinese (zh)
Inventor
郭佳昌
夏源
陈俊
黄海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311758183.9A
Publication of CN117747087A
Legal status: Pending

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The disclosure provides a training method for a large inquiry model, and an inquiry method and apparatus based on the large inquiry model, relating to the technical field of artificial intelligence. A specific implementation scheme is as follows: acquire multiple rounds of doctor-patient inquiry dialogue in the medical field, where the multiple rounds of dialogue comprise multiple rounds of patient input information and the doctor response information corresponding to each round of patient input information; determine prompt words for the large inquiry model, and generate a training corpus based on the prompt words and the multiple rounds of doctor-patient inquiry dialogue; input the training corpus into the large inquiry model, output a doctor-predicted answer for each round of patient input information, and determine a loss function of the large inquiry model from the doctor-predicted answer and the doctor response information for each round of patient input information; and adjust the large inquiry model based on the loss function, continuing to train the adjusted model until a training-end condition is met, to obtain the target large inquiry model.

Description

Training method of large inquiry model, inquiry method and device based on large inquiry model
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to a training method for a large inquiry model, and an inquiry method and apparatus based on the large inquiry model.
Background
In the modern medical and health industry, effective communication between doctors and patients is critical to accurate diagnosis and high-quality medical service. Large models can be used to assist doctors in communicating with patients efficiently.
Disclosure of Invention
The present disclosure provides a training method for a large inquiry model, an inquiry method based on the large inquiry model, corresponding apparatuses, an electronic device, a storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided a training method for a large inquiry model, comprising: acquiring multiple rounds of doctor-patient inquiry dialogue in the medical field, the multiple rounds of dialogue comprising multiple rounds of patient input information and doctor response information corresponding to each round of patient input information; determining prompt words for the large inquiry model, and generating a training corpus based on the prompt words and the multiple rounds of doctor-patient inquiry dialogue; inputting the training corpus into the large inquiry model, outputting a doctor-predicted answer for each round of patient input information, and determining a loss function of the large inquiry model from the doctor-predicted answer and the doctor response information for each round of patient input information; and adjusting the large inquiry model based on the loss function, and continuing to train the adjusted model until a training-end condition is met, to obtain a target large inquiry model.
According to another aspect of the present disclosure, there is provided an inquiry method based on a large model, comprising: calling a target large inquiry model, and inputting role-playing prompt information to the target large inquiry model; and inputting at least one piece of inquiry information to the target large inquiry model to obtain inquiry replies output by the target large inquiry model that match each piece of inquiry information, until the inquiry ends and the target large inquiry model is exited; wherein the target large inquiry model is a large model trained by the above training method for a large inquiry model.
According to another aspect of the present disclosure, there is provided a training apparatus for a large inquiry model, comprising: an acquisition module configured to acquire multiple rounds of doctor-patient inquiry dialogue in the medical field, the multiple rounds of dialogue comprising multiple rounds of patient input information and doctor response information corresponding to each round of patient input information; a generation module configured to determine prompt words for the large inquiry model and to generate a training corpus based on the prompt words and the multiple rounds of doctor-patient inquiry dialogue; a training module configured to input the training corpus into the large inquiry model, output a doctor-predicted answer for each round of patient input information, and determine a loss function of the large inquiry model from the doctor-predicted answer and the doctor response information for each round of patient input information; and an adjustment module configured to adjust the large inquiry model based on the loss function, and to continue training the adjusted model until a training-end condition is met, to obtain a target large inquiry model.
According to another aspect of the present disclosure, there is provided an inquiry apparatus based on a large model, comprising: a calling module configured to call a target large inquiry model and to input role-playing prompt information to the target large inquiry model; and an inquiry module configured to input at least one piece of inquiry information to the target large inquiry model to obtain inquiry replies output by the target large inquiry model that match each piece of inquiry information, until the inquiry ends and the target large inquiry model is exited; wherein the target large inquiry model is a large model trained by the above training method for a large inquiry model.
According to another aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory storing instructions executable by the at least one processor to enable the at least one processor to perform the training method for a large inquiry model and the inquiry method based on the large model according to the embodiments of the above aspects.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the training method for a large inquiry model and the inquiry method based on the large model according to the embodiments of the above aspects.
According to another aspect of the disclosure, there is provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the training method for a large inquiry model and the inquiry method based on the large model according to the embodiments of the above aspects.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow chart of a training method for a large inquiry model according to an embodiment of the disclosure;
FIG. 2 is a flow chart of another training method for a large inquiry model according to an embodiment of the present disclosure;
FIG. 3 is a flow chart of acquiring a multi-round doctor-patient inquiry dialogue according to an embodiment of the disclosure;
FIG. 4 is a schematic diagram of masking according to an embodiment of the present disclosure;
FIG. 5 is a flow chart of determining the minimum semantic units of a large inquiry model in a training method for the large inquiry model according to an embodiment of the present disclosure;
FIG. 6 is a flow chart of another training method for a large inquiry model according to an embodiment of the present disclosure;
FIG. 7 is a schematic flow chart of an inquiry method based on a large model according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an inquiry using a target large inquiry model according to an embodiment of the present disclosure;
FIG. 9 is a schematic flow chart of an inquiry based on a large model according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of an application framework based on a target large inquiry model according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a training apparatus for a large inquiry model according to an embodiment of the present disclosure;
FIG. 12 is a schematic structural diagram of an inquiry apparatus based on a large model according to an embodiment of the present disclosure;
FIG. 13 is a block diagram of an electronic device for implementing a training method for a large inquiry model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Artificial intelligence (AI) is a technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. AI technology currently offers a high degree of automation, high accuracy, and low cost, and is widely applied.
Natural language processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods for effective communication between humans and computers in natural language, integrating linguistics, computer science, and mathematics. NLP is mainly applied to machine translation, public opinion monitoring, automatic summarization, opinion extraction, text classification, question answering, text semantic comparison, speech recognition, and the like.
Deep learning (DL) is a research direction in the field of machine learning (ML). It learns the inherent rules and representation hierarchies of sample data so that machines can analyze and learn in a human-like way, recognizing data such as text, images, and sounds; it is widely applied in speech and image recognition.
Large models refer to models in the fields of machine learning and artificial intelligence with a large number of parameters and complex structures. These models typically require extensive computational resources to train and deploy, and can process and understand larger amounts of more complex data. Large models can be applied in many fields, such as automatic writing, chatbots, virtual assistants, voice assistants, and automatic translation.
It should be noted that the information (including but not limited to user-equipment information and personal information), data (including but not limited to data for analysis, stored data, and presented data), and signals involved in the present disclosure are all authorized by the user or sufficiently authorized by the relevant parties, and the collection, use, and processing of the related data comply with the relevant laws, regulations, and standards and do not violate the public interest.
The disclosed embodiments are applicable to a variety of medical and health institutions, including but not limited to hospitals, clinics, and telemedicine service platforms. 1. In primary medical institutions and resource-scarce areas, doctors can be assisted with preliminary screening and diagnosis, with standardized, specialized inquiry suggestions. 2. In hospitals and on telemedicine service platforms, patient information and basic symptoms can be collected in advance during the pre-diagnosis stage, helping doctors conduct a preliminary inquiry and form a professional judgment. 3. In medical education and training, medical students and junior doctors can be assisted with clinical communication and case-analysis training, improving their professional skills and clinical experience.
Fig. 1 is a flow chart of a training method of a large inquiry model according to an embodiment of the disclosure. As shown in fig. 1, the training method of the large inquiry model may include:
s101, acquiring a plurality of rounds of doctor-patient consultation dialogues in the medical field, wherein the plurality of rounds of doctor-patient consultation dialogues comprise a plurality of rounds of patient input information and doctor response information corresponding to each round of patient input information.
It should be noted that the execution body of the training method for a large inquiry model in the embodiments of the present disclosure may be a hardware device with information-processing capability and/or the software necessary to drive that hardware. Optionally, the execution body may include a server, a user terminal, or another intelligent device. The user terminal includes, but is not limited to, a mobile phone, a computer, or an intelligent voice-interaction device. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain. The embodiments of the present disclosure are not particularly limited in this regard.
In some implementations, the multiple rounds of doctor-patient inquiry dialogue may be drawn from a knowledge base of the medical field. Optionally, they may be acquired based on medical knowledge graph (MKG) data, patient case data, an online inquiry system, and the like, or generated by an existing large model. The multi-round doctor-patient inquiry dialogue comprises multiple rounds of patient input information and the doctor response information corresponding to each round of patient input information.
S102, determining prompt words of the large inquiry model, and generating training corpus based on the prompt words and multiple rounds of doctor-patient inquiry dialogue.
In some implementations, preset prompt words can help position the large inquiry model in the medical field and evoke its domain capabilities, and the training corpus of the large inquiry model is generated from the multiple rounds of doctor-patient inquiry dialogue and the prompt words.
Optionally, the training corpus is obtained by combining the prompt words of the large inquiry model with the multiple rounds of doctor-patient inquiry dialogue. The prompt words may be determined based on the purpose of the large inquiry model. For example, if the large model is to act as a doctor during an inquiry, "please reply as a doctor" may be used as a prompt word for the large model.
Illustratively, table 1 shows one example of a training corpus. Based on the prompt word "please help as a doctor to make a professional inquiry to the patient, such as please reply to ask you what is uncomfortable? And carrying out multi-round doctor-patient inquiry dialogue to obtain training corpus.
TABLE 1 example of training corpus
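The corpus example in Table 1 is not reproduced in this text, so a minimal sketch of step S102 is given below; the sample format, role labels, and prompt text are illustrative assumptions, not the patent's exact corpus layout.

```python
def build_corpus_sample(prompt, turns):
    """Combine a prompt word with multi-round doctor-patient dialogue
    into one training-corpus string.

    turns: list of (patient_input, doctor_response) pairs.
    The "Patient:"/"Doctor:" label format is a hypothetical choice.
    """
    lines = [prompt]
    for patient, doctor in turns:
        lines.append(f"Patient: {patient}")
        lines.append(f"Doctor: {doctor}")
    return "\n".join(lines)

sample = build_corpus_sample(
    "Please reply as a doctor and conduct a professional inquiry.",
    [("I have had a cough for 2 days.", "Do you also have a fever?")],
)
```

In practice each such string would be tokenized and batched for training; this sketch only shows how prompt words and dialogue rounds might be concatenated.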
S103, inputting training corpus into the large inquiry model, outputting doctor prediction answers of each round of patient input information, and determining a loss function of the large inquiry model according to the doctor prediction answers of each round of patient input information and doctor response information.
In some implementations, after the training corpus is obtained, it is input into the large inquiry model for training; the large inquiry model generates a corresponding doctor-predicted answer for each round of patient input information in the training corpus, so that the patient input information and the doctor-predicted answers together form the predicted inquiry dialogue of the large inquiry model.
Further, in order to adjust the large inquiry model so that it meets the performance requirement, the loss function of the large inquiry model can be determined from the doctor-predicted answers and the doctor response information. Optionally, a cross-entropy loss may be computed between the doctor-predicted answer and the doctor response information, measuring the difference between the probability distribution predicted by the large inquiry model and the probability distribution of the response information.
S104, adjusting the large inquiry model based on the loss function, and continuing training the adjusted model until the training ending condition is met to obtain the target large inquiry model.
In some implementations, to prevent overfitting, regularization may be applied; the gradient of the loss function is computed, and the model parameters of the large inquiry model are adjusted based on the gradient. The adjusted model is then trained further until the training-end condition is met, to obtain the target large inquiry model.
Optionally, the number of training rounds of the large inquiry model may be set as the training-end condition: when the number of training rounds reaches the set number, the training-end condition is met and the target large inquiry model is obtained. The training-end condition may also be determined based on the accuracy of the predicted replies output by the large inquiry model: when the accuracy of the predicted replies is greater than a set threshold, the training-end condition is met and the target large inquiry model is obtained.
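The two training-end conditions described above (a set number of training rounds, or prediction accuracy above a set threshold) can be sketched as a simple predicate; the parameter values are illustrative assumptions:

```python
def should_stop(epoch, accuracy, max_epochs=10, acc_threshold=0.95):
    """Training-end condition for the large inquiry model: stop when
    the set number of training rounds is reached, or when the accuracy
    of the predicted replies exceeds the set threshold.

    max_epochs and acc_threshold are hypothetical values; the patent
    leaves the concrete numbers unspecified.
    """
    return epoch >= max_epochs or accuracy > acc_threshold
```

A training loop would call this after every evaluation step and keep the latest checkpoint as the target large inquiry model once it returns true.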
According to the training method for a large inquiry model provided by the embodiments of the present disclosure, multiple rounds of doctor-patient inquiry dialogue in the medical field are acquired, and the training corpus of the large inquiry model is generated based on the prompt words of the large inquiry model. The large inquiry model is trained on the corpus, its loss function is calculated, and the model is adjusted based on the loss function until the target large inquiry model is obtained. This improves the target model's understanding of the medical field, so that it can simulate a doctor conducting a free-form inquiry, adapt to various complex and changing situations, and offer better flexibility and scalability.
Fig. 2 is a flow chart of a training method of a large inquiry model according to an embodiment of the disclosure. As shown in fig. 2, the training method of the large inquiry model may include:
S201, acquiring a plurality of rounds of doctor-patient consultation dialogues in the medical field, wherein the plurality of rounds of doctor-patient consultation dialogues comprise a plurality of rounds of patient input information and doctor response information corresponding to each round of patient input information.
In some implementations, to improve the reliability of the doctor-patient inquiry dialogue and reduce errors, multiple rounds of initial doctor-patient inquiry dialogue are acquired and then filtered to obtain the multiple rounds of doctor-patient inquiry dialogue.
Optionally, role segmentation can be performed on the multiple rounds of initial doctor-patient inquiry dialogue to distinguish the doctor and patient roles and obtain the dialogue data of each role; the roles may be distinguished with regular expressions. Further, the doctor dialogue data is scored across multiple dimensions, and the scores are weighted to obtain a composite score for the doctor dialogue data; the multiple rounds of initial doctor-patient inquiry dialogue are then filtered based on the composite score to obtain the multiple rounds of doctor-patient inquiry dialogue.
Optionally, the doctor dialogue data may be scored manually across multiple dimensions, or scored automatically by a large model, to obtain its composite score. The initial inquiry dialogues whose doctor dialogue data has a composite score greater than a set threshold may be selected as the multiple rounds of doctor-patient inquiry dialogue.
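The weighting-and-thresholding step can be sketched as follows; the scoring dimensions, weights, and threshold are hypothetical, since the patent does not specify them:

```python
def composite_score(dim_scores, weights):
    """Weighted sum of per-dimension scores for doctor dialogue data."""
    return sum(s * w for s, w in zip(dim_scores, weights))

def filter_dialogues(dialogues, weights, threshold):
    """Keep initial dialogues whose doctor dialogue data scores above
    the set threshold.

    dialogues: list of (dialogue, dim_scores) pairs; the dimensions
    (e.g. professionalism, fluency, relevance) and weights are
    illustrative assumptions.
    """
    return [d for d, scores in dialogues
            if composite_score(scores, weights) > threshold]

weights = [0.5, 0.3, 0.2]  # hypothetical per-dimension weights
kept = filter_dialogues(
    [("good dialogue", [0.9, 0.8, 0.9]), ("bad dialogue", [0.2, 0.1, 0.3])],
    weights, 0.6,
)
```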
In some implementations, multiple rounds of initial doctor-patient inquiry dialogue may be acquired from a medical knowledge graph, online inquiry dialogue, and patient case information, to obtain highly reliable doctor-patient inquiry dialogue, which helps improve the reliability of training the large inquiry model.
Alternatively, medical data may be obtained from a clinical trial database, and knowledge-graph mining performed on the medical data to obtain a medical knowledge graph. The medical knowledge graph can then be input into a large language model to generate multiple rounds of initial doctor-patient inquiry dialogue.
An example medical knowledge graph entry for hyperlipidemia is shown in Table 2 below:
TABLE 2
The data listed in Table 2 are the data items required to generate multiple rounds of initial doctor-patient inquiry dialogue.
Alternatively, online inquiry dialogue may be acquired and used directly as multiple rounds of initial doctor-patient inquiry dialogue. Case information can also be acquired, and the doctor's questions and the patient's answers extracted from it, to generate multiple rounds of initial doctor-patient inquiry dialogue.
Illustratively, table 3 shows an example of data for an online interview session, where the session information in table 3 may be used as a multi-round initial doctor-patient interview session.
TABLE 3 Table 3
Table 4 shows an example of case information, from which a doctor's questions and a patient's answers may be obtained to generate multiple rounds of initial doctor-patient inquiry dialogue.
TABLE 4
For example, from the chief-complaint information "cough for 2 days, fever since today", the doctor's question "What symptoms have appeared?" can be derived, generating one round of initial doctor-patient inquiry dialogue: Doctor: "What symptoms have appeared?" Patient: "Cough for 2 days, and a fever since today."
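The derivation of one dialogue round from chief-complaint case information might look like the following sketch; the fixed opening question is an illustrative assumption:

```python
def complaint_to_round(complaint, question="What symptoms have appeared?"):
    """Turn chief-complaint text from case information into one round
    of initial doctor-patient inquiry dialogue.

    The opening question is a hypothetical template; a real pipeline
    could vary the question by complaint type.
    """
    return [("Doctor", question), ("Patient", complaint)]

round_ = complaint_to_round("Cough for 2 days, fever since today.")
```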
A flow chart of acquiring a multi-round doctor-patient inquiry dialogue is shown in FIG. 3. The multiple rounds of doctor-patient inquiry dialogue are obtained by acquiring multiple rounds of initial doctor-patient inquiry dialogue and performing role segmentation and quality filtering on them. The initial dialogues may be generated from a medical knowledge graph, acquired from online inquiry dialogue, or acquired from case information.
Optionally, regular expressions may be used to perform role segmentation on the multiple rounds of initial doctor-patient inquiry dialogue to obtain patient dialogue data and doctor dialogue data respectively. The doctor dialogue data is scored, and the initial dialogues are screened based on the scoring results to obtain the multiple rounds of doctor-patient inquiry dialogue.
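A minimal sketch of regex-based role segmentation, assuming dialogue lines are labeled `Doctor:` and `Patient:` (the real transcripts are in Chinese and the label format is an assumption):

```python
import re

def split_roles(dialogue):
    """Split a raw dialogue transcript into per-role utterance lists
    with a regular expression."""
    pattern = re.compile(r"(Doctor|Patient):\s*(.*)")
    doctor, patient = [], []
    for line in dialogue.splitlines():
        m = pattern.match(line.strip())
        if m:
            # Route the utterance to the matched role's list.
            (doctor if m.group(1) == "Doctor" else patient).append(m.group(2))
    return doctor, patient

doc, pat = split_roles("Doctor: What brings you in?\nPatient: A cough.")
```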
S202, determining prompt words of the large inquiry model, and generating training corpus based on the prompt words and multiple rounds of doctor-patient inquiry dialogue.
S203, inputting the training corpus into the large inquiry model and outputting doctor prediction answers of each round of patient input information.
The relevant content of steps S202-S203 can be seen in the above embodiments, and will not be described here again.
S204, masking each round of patient input information and retaining only the doctor-predicted answer for each round of patient input information.
In some implementations, to reduce the training time and improve the training efficiency of the large inquiry model, each round of patient input information may be masked in the patient-doctor prediction dialogue output by the large inquiry model, retaining only the doctor-predicted answer for each round of patient input information.
Optionally, a mask may be used to hide the patient input information in the patient-doctor prediction dialogue output by the large inquiry model, displaying only the doctor-predicted answer for each round of patient input information.
In the masking schematic of FIG. 4, the gray parts are masked and the white parts are retained. FIG. 4a shows the schematic before masking, in which only the last doctor-predicted answer, namely Doctor 3, is retained in the predicted patient-doctor dialogue output by the large inquiry model. FIG. 4b shows each round of patient input information masked, with the doctor-predicted answers for every round, namely Doctor 1, Doctor 2, and Doctor 3, retained.
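The masking idea can be sketched with the common convention of setting masked label positions to -100 so they are ignored by the loss; the patent does not name this convention, so it is an assumption:

```python
IGNORE_INDEX = -100  # common convention for tokens excluded from the loss

def mask_patient_tokens(token_ids, is_doctor_flags):
    """Mask patient-input tokens so the loss is computed only on the
    doctor-predicted answers.

    Sketch over plain lists; real training would set label tensors to
    IGNORE_INDEX at the masked positions.
    """
    return [tok if is_doc else IGNORE_INDEX
            for tok, is_doc in zip(token_ids, is_doctor_flags)]

labels = mask_patient_tokens([11, 12, 13, 14], [False, False, True, True])
```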
S205, performing cross entropy loss calculation on doctor prediction answers and doctor response information of each round of patient input information, and determining a loss function of the large inquiry model.
In some implementations, the parameters of the large inquiry model may be optimized and its performance improved by calculating a loss function between the doctor-predicted answer and the doctor response information for each round of patient input information, measuring the error between the probability distribution of the model's predictions and the probability distribution of the real data.
Alternatively, the loss function of the big inquiry model may be determined by determining the probability distribution of the doctor's predicted response and the probability distribution of the doctor's response information, and calculating the cross entropy loss between the doctor's predicted response and the doctor's response information. The calculated cross entropy loss formula is as follows:
L = -∑_i y_i log(p_i)   (1)
where L denotes the loss function, y_i the probability distribution of the doctor response information, and p_i the probability distribution of the doctor-predicted answer.
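Equation (1) can be implemented directly; the small epsilon for numerical stability is an added assumption:

```python
import math

def cross_entropy(y, p, eps=1e-12):
    """Equation (1): L = -sum_i y_i * log(p_i).

    y: probability distribution of the doctor response information.
    p: probability distribution of the doctor-predicted answer.
    eps guards against log(0).
    """
    return -sum(yi * math.log(pi + eps) for yi, pi in zip(y, p))
```

For a one-hot target this reduces to the negative log-probability assigned to the correct token, the usual language-model training objective.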
S206, adjusting the large inquiry model based on the loss function, and continuing training the adjusted model until the training ending condition is met to obtain the target large inquiry model.
The relevant content of step S206 may be referred to the above embodiments, and will not be described herein.
In some implementations, after the target large inquiry model is obtained, it may be tested: the model is used to conduct inquiries, the inquiry results are obtained, and an evaluation test is performed on the target large inquiry model.
Optionally, several specified candidate diseases can be selected, and inquiries about them conducted with the target large inquiry model, obtaining its predicted inquiry results. Reference inquiry results for the candidate diseases can then be obtained from an expert system, and the target large inquiry model evaluated based on the predicted and reference inquiry results.
Optionally, if the evaluation result indicates that the target large inquiry model fails the evaluation, fine-tuning of the model continues. If the evaluation result indicates that the model passes, it can be put into use to assist doctors with inquiries.
According to the training method for a large inquiry model provided by this embodiment, multiple rounds of initial doctor-patient inquiry dialogue are acquired and screened to obtain multiple rounds of doctor-patient inquiry dialogue, and the training corpus is generated based on the prompt words of the large inquiry model, so that the model can learn the knowledge reserve and logical reasoning needed when facing different symptoms and ways of speaking, and can accurately ask diagnostically significant questions. The large inquiry model is trained on the corpus, its loss function determined by cross-entropy loss calculation, and the model adjusted based on the loss function until the target large inquiry model is obtained. This improves the target model's understanding of the medical field, so that it can simulate a doctor conducting a free-form inquiry, adapt to complex and changing situations, and offer better flexibility and scalability.
On the basis of the above embodiments, the process of determining the minimum semantic unit of the big inquiry model according to the embodiments of the present disclosure may be explained, as shown in fig. 5, and may include:
s501, acquiring medical literature and acquiring vocabulary in the medical field based on the medical literature.
It can be understood that, since the large inquiry model needs to be dedicated to the medical field, minimum semantic units of the medical field need to be constructed, so that knowledge of the medical field is combined with the large inquiry model during training. This helps the large inquiry model understand the semantics of the medical field and improves its performance in the medical field.
In some implementations, medical documents, such as medical guidelines, medical articles, and the like, may be obtained based on academic search engines, medical databases, academic libraries, and the like. And then, the medical literature can be segmented to acquire the vocabulary of the medical field, so that the large inquiry model can understand the vocabulary of the medical field, and the performance of the large inquiry model is improved.
In some implementations, the vocabulary of the medical field may be determined by counting high-frequency words in the medical literature. Alternatively, a word segmentation set of the medical documents can be obtained by segmenting the medical documents, and for each word segment in the word segmentation set, its Term Frequency (TF)-Inverse Document Frequency (IDF) is obtained.
Alternatively, the TF of a word segment may be determined based on the total number of word segments in the word segmentation set and the number of times the word segment occurs in the set. The formula for calculating the TF of any word segment is as follows:

TF = N / J

wherein TF represents the term frequency of the word segment, N represents the number of times the word segment occurs in the word segmentation set, and J represents the total number of word segments in the word segmentation set.
Further, the number of documents containing the word segment and the total number of medical documents are obtained, and the IDF of the word segment is determined from them. The formula for calculating the IDF of the word segment is as follows:

IDF = log(M / I)

wherein IDF represents the inverse document frequency, M represents the total number of medical documents, and I represents the number of documents containing the word segment.
Further, based on the TF and IDF of any word segment, the TF-IDF of the word segment can be determined. The formula for calculating the TF-IDF of the word segment is as follows:

TF-IDF = TF * IDF (4)

wherein TF represents the term frequency of the word segment and IDF represents the inverse document frequency.
Alternatively, candidate word segments may be obtained from the word segmentation set based on their TF-IDF, and the vocabulary of the medical field may be determined based on the candidate word segments. A TF-IDF threshold may be set, and word segments whose TF-IDF is greater than or equal to the threshold are selected from the word segmentation set as candidate word segments.
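The TF, IDF, and thresholding steps above can be sketched as follows; the helper name and the threshold value are illustrative, and the logarithmic IDF is the standard definition assumed here.

```python
import math
from collections import Counter

def tfidf_candidates(documents, threshold):
    """Select candidate medical terms by TF-IDF over a segmented corpus.

    documents: list of token lists, one per medical document.
    Returns the set of tokens whose TF-IDF meets the threshold.
    """
    all_tokens = [tok for doc in documents for tok in doc]
    counts = Counter(all_tokens)            # N: occurrences of each token
    total = len(all_tokens)                 # J: total tokens in the set
    num_docs = len(documents)               # M: total number of documents
    doc_freq = Counter()                    # I: documents containing the token
    for doc in documents:
        doc_freq.update(set(doc))

    candidates = set()
    for tok, n in counts.items():
        tf = n / total                      # TF = N / J
        idf = math.log(num_docs / doc_freq[tok])  # IDF = log(M / I)
        if tf * idf >= threshold:           # TF-IDF = TF * IDF
            candidates.add(tok)
    return candidates
```

Note that a token appearing in every document gets IDF = 0 and is never selected, which is exactly the behavior wanted for generic, non-medical words.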
Alternatively, it may be determined whether a candidate word segment duplicates a word in the initial vocabulary. If the candidate word segment does not duplicate any word in the initial vocabulary, it is added to the initial vocabulary as vocabulary of the medical field, which helps the large inquiry model understand the semantics of the medical field and improves its performance in the medical field.
In some implementations, words in the medical documents may also be clustered based on character sequences to obtain the vocabulary of the medical field. Specifically, the medical documents are segmented to obtain a word segmentation set, and the word segments in the set are further split into characters to obtain a character sequence.
Further, the characters in the character sequence and the characters in the initial vocabulary are clustered based on Byte-Level Byte Pair Encoding (BBPE) to obtain the vocabulary of the medical field. The characters may be clustered based on the frequency of occurrence of adjacent character pairs.
Optionally, the occurrence frequencies of all adjacent character pairs are obtained from the character sequence, adjacent characters with the highest occurrence frequency are selected from all adjacent character pairs to be combined, a combined character segment is obtained, and the combined character segment is combined with the initial vocabulary.
Illustratively, the occurrence frequency f(C1, C2) of each adjacent character pair (C1, C2) is calculated, and the adjacent character pair with the highest occurrence frequency is selected and merged to obtain a combined character segment, which is taken as a new token m replacing the pair (C1, C2). The initial vocabulary V is then merged with the combined character segment: V = V ∪ {m}.
Further, the above process is repeated until an iteration end condition is met, and a target vocabulary is obtained, wherein the newly added words in the target vocabulary are the vocabulary of the medical field. Based on the target vocabulary, the large inquiry model can better understand the knowledge and semantics of the medical field. Alternatively, the iteration end condition may be that the vocabulary size reaches a set value, or that no adjacent character pair with a sufficiently high occurrence frequency remains to be merged.
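A minimal sketch of this pair-merging loop, in the spirit of BPE; for readability it operates on characters rather than the raw UTF-8 bytes that BBPE actually uses, and the stopping rule shown (merge budget exhausted, or no adjacent pair occurring at least twice) is one possible iteration-end condition.

```python
from collections import Counter

def merge_pairs(words, initial_vocab, num_merges):
    """Greedily merge the most frequent adjacent pair (C1, C2) into a new
    token m and add it to the vocabulary: V = V ∪ {m}.

    words: iterable of strings to split into characters;
    initial_vocab: initial vocabulary as a set of tokens.
    """
    seqs = [list(w) for w in words]
    vocab = set(initial_vocab)
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            for a, b in zip(seq, seq[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break                           # nothing left to merge
        (a, b), freq = pairs.most_common(1)[0]
        if freq < 2:
            break                           # no pair repeats: iteration ends
        m = a + b                           # combined character segment
        vocab.add(m)                        # V = V ∪ {m}
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                    out.append(m)           # replace (C1, C2) with token m
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return vocab
```

After enough merges, frequent medical terms are represented by single tokens rather than scattered characters.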
S502, carrying out vector coding on the vocabulary based on the large inquiry model, and taking the coded word vector in the medical field as the minimum semantic unit of the large inquiry model.
In some implementations, the medical domain code word vector can be obtained by vector encoding the medical domain vocabulary, and the medical domain code word vector is used as the minimum semantic unit of the large inquiry model. The words can be vector coded by the large inquiry model to obtain coded word vectors.
Illustratively, before the minimum semantic units of the large inquiry model are constructed, the term "chronic obstructive pneumonia" may be split into separate tokens such as "chronic", "obstructive" and "pneumonia"; after construction, it becomes a single token, which helps the large inquiry model better understand the meaning of "chronic obstructive pneumonia".
According to the training method of the large inquiry model provided by the embodiment of the disclosure, before the training corpus is input into the large inquiry model for training, the minimum semantic units of the large inquiry model in the medical field are constructed, and the large inquiry model is trained based on the minimum semantic units and the training corpus, which helps the large inquiry model understand the semantics of the medical field and improves its performance in the medical field. The loss function of the large inquiry model is determined by performing cross entropy loss calculation on the large inquiry model, and the large inquiry model is adjusted based on the loss function until the target inquiry large model is obtained. The understanding capability of the target inquiry large model in the medical field can be improved, so that the target inquiry large model can imitate a doctor to conduct free inquiry, can adapt to various complex and changeable situations, and has better flexibility and expandability.
Fig. 6 is a flowchart of a training method of a large inquiry model according to an embodiment of the present disclosure. As shown in fig. 6, the training method of the large inquiry model may include:
s601, acquiring multiple rounds of initial doctor-patient inquiry dialogues, and filtering the multiple rounds of initial doctor-patient inquiry dialogues to obtain multiple rounds of doctor-patient inquiry dialogues.
S602, determining prompt words of the large inquiry model, and generating training corpus based on the prompt words and multiple rounds of doctor-patient inquiry dialogue.
S603, acquiring medical literature, and acquiring vocabulary of the medical field based on the medical literature.
S604, carrying out vector coding on the vocabulary based on the large inquiry model, and taking the coded word vector in the medical field as the minimum semantic unit of the large inquiry model.
S605, inputting training corpus into the large inquiry model, and outputting doctor prediction answers of each round of patient input information.
S606, masking and shielding each round of patient input information, and only reserving doctor prediction answers of each round of patient input information.
S607, performing cross entropy loss calculation on doctor prediction answers and doctor response information of each round of patient input information, and determining a loss function of the large inquiry model.
And S608, adjusting the large inquiry model based on the loss function, and continuing to train the adjusted model until the training ending condition is met to obtain the target large inquiry model.
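Steps S606 and S607 (masking the patient turns and computing cross entropy only over doctor-reply tokens) can be sketched as follows, with per-position log-probabilities standing in for the model's output distribution.

```python
import math

def masked_cross_entropy(logprobs, targets, loss_mask):
    """Token-level cross entropy where patient-input tokens are masked out.

    logprobs: per-position dict of token -> log-probability (a stand-in
    for the model's predicted distribution); targets: gold tokens;
    loss_mask: 1 for doctor-reply tokens, 0 for masked patient tokens.
    """
    total, count = 0.0, 0
    for lp, tok, m in zip(logprobs, targets, loss_mask):
        if m == 0:
            continue                        # patient input: excluded from loss
        total += -lp[tok]                   # negative log-likelihood
        count += 1
    return total / max(count, 1)            # mean over doctor-reply tokens
```

Only the doctor prediction answers contribute gradient signal; the patient's own words are reproduced but never penalized.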
According to the training method of the large inquiry model, which is provided by the embodiment of the disclosure, multiple rounds of doctor-patient inquiry dialogues in the medical field are obtained, and training corpus of the large inquiry model is generated based on prompt words of the large inquiry model. And training the large inquiry model based on the training corpus, calculating a loss function of the large inquiry model, and adjusting the large inquiry model based on the loss function until a target large inquiry model is obtained. The understanding capability of the target inquiry large model to the medical field can be improved, so that the target inquiry large model can simulate doctors to conduct free inquiry, can adapt to various complex and changeable conditions, and has better flexibility and expandability.
Fig. 7 is a flow chart of a method for inquiry based on a large model according to an embodiment of the disclosure.
As shown in fig. 7, the large model-based inquiry method may include:
s701, calling a target inquiry large model, and inputting playing role prompt information to the target inquiry large model.
It should be noted that the target big inquiry model may be obtained by using the training method of the big inquiry model shown in fig. 1-6, which is not described herein.
In some implementations, the target inquiry large model is called, and role-playing prompt information is input into it to guide it to play the role of a doctor, so that the user's questions are answered using professional knowledge of the medical field.
For example, fig. 8 is a schematic diagram of an inquiry conducted with the target inquiry large model. The prompt "Please play a doctor" is input to guide the target inquiry large model to answer the user's questions with professional knowledge of the medical field, or to ask about the user's condition.
S702, inputting inquiry information to the target inquiry large model at least once, and obtaining an inquiry reply output by the target inquiry large model and matching each piece of inquiry information, until the inquiry is ended and the target inquiry large model is exited.
In some implementations, after the target inquiry large model is prompted to play the role of a doctor based on the role-playing prompt information, one or more pieces of inquiry information may be input into the target inquiry large model to describe the condition. The target inquiry large model generates a corresponding inquiry reply for each piece of inquiry information input by the user, until the user ends the inquiry and exits the target inquiry large model.
Illustratively, fig. 8 is a schematic diagram of an inquiry conducted with the target inquiry large model. The right side shows the inquiry information input by the user (patient), and the left side shows the inquiry replies output by the target inquiry large model. The user inputs "Please play a doctor" as the role-playing prompt information, and the target inquiry large model receives the prompt information and takes on the role of a doctor: "OK, I will play a doctor and chat with you. In the following chat, I will take on this role. If you want to exit role playing, just tell me directly, and I will stop playing the role." The user inputs the inquiry information "Hello, I have developed a red rash", and the target inquiry large model outputs the inquiry reply "When did the red rash appear?" The user further inputs "Last night; it itches" to describe the condition. Based on this, the target inquiry large model outputs the inquiry reply "Do you have other symptoms, such as fever, headache, cough or diarrhea?" The user replies "No other symptoms." The target inquiry large model then asks "Have you eaten any special food recently, or been to places you do not usually go?" The user replies "I ate a lot of seafood last night." The target inquiry large model asks "Have you used any drugs for treatment?" The user replies "I applied a little erythromycin", and the target inquiry large model outputs the inquiry reply "According to your description, it may be allergic dermatitis. I will prescribe some medications for you. You need to pay close attention to your diet and avoid exposure to allergens." With this inquiry result, the user may end the inquiry and exit the target inquiry large model.
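The interaction shown in fig. 8 can be sketched as a simple dialogue loop; `model_reply` is a stand-in for the target inquiry large model's generation call, which this disclosure does not name.

```python
def run_inquiry(model_reply, patient_messages, role_prompt="Please play a doctor"):
    """Drive one inquiry session: send the role-playing prompt first,
    then relay each patient turn and collect the model's reply.

    model_reply(history) -> str is an assumed interface standing in for
    the deployed target inquiry large model; returns the full dialogue.
    """
    history = [("user", role_prompt)]          # S701: role-playing prompt
    history.append(("assistant", model_reply(history)))
    for message in patient_messages:           # S702: inquiry turns
        history.append(("user", message))
        history.append(("assistant", model_reply(history)))
    return history                             # session ends, model exited
```

In a real deployment the loop would read user input interactively and stop on an explicit exit message rather than consuming a fixed list.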
According to the inquiry method based on the large model provided by the embodiment of the disclosure, the target inquiry large model is called and plays a role based on the prompt information, so that the target inquiry large model can imitate a doctor to conduct free inquiry, help patients understand and become familiar with their own conditions, save the economic and time costs of both doctors and patients, and improve the utilization rate of medical resources.
A flow chart of the large model-based interrogation is shown in fig. 9. The training corpus of the large inquiry model is generated by acquiring multiple rounds of doctor-patient inquiry dialogues in the medical field and according to the prompt words of the large inquiry model and the multiple rounds of doctor-patient inquiry dialogues. Training the large inquiry model according to the training corpus, calculating a loss function of the large inquiry model, adjusting the large inquiry model based on the loss function, and continuously training the adjusted large inquiry model until the training ending condition is met to obtain the target large inquiry model. Furthermore, the inquiry can be performed based on the target inquiry large model, and inquiry replies matched with the inquiry information are output by the target inquiry large model through inputting the inquiry information.
FIG. 10 illustrates an application framework diagram based on the target inquiry large model. The user (patient) can input inquiry information on the interactive interface and receive the inquiry reply output by the target inquiry large model, where the inquiry reply is displayed on the output interface. The output interface calls the model service through an Application Programming Interface (API), and the target inquiry large model searches the database to acquire information matching the inquiry information as the inquiry reply. Meanwhile, the target inquiry large model can also be iterated based on feedback from this data, further improving the performance of the model and meeting the use requirements of users.
The embodiment of the disclosure is mainly applied to medical consultation, and the application scene comprises the steps of helping doctors to collect enough information in advance and helping the doctors to make decisions, so that the time of the doctors in the process of seeing the doctor is greatly reduced. Exemplary description:
1. pre-consultation scenario:
In scenarios such as waiting for treatment in a hospital or using an Internet hospital, the patient can interact with the target inquiry large model via an API, in the form of a mini program or a hospital application, so that the patient's basic condition is collected in advance and the doctor can quickly understand the condition during the visit.
2. Inquiry/guide scenario:
When the patient has not yet visited a doctor, the patient can interact with the target inquiry large model in advance through a mini program, a web page, or a hospital application, so as to learn about the condition in advance and judge whether to go to a hospital and which department to visit for diagnosis and treatment.
Corresponding to the training method of the big inquiry model provided by the above embodiments, an embodiment of the present disclosure further provides a training device of the big inquiry model, and since the training device of the big inquiry model provided by the embodiment of the present disclosure corresponds to the training method of the big inquiry model provided by the above embodiments, the implementation of the training method of the big inquiry model is also applicable to the training device of the big inquiry model provided by the embodiment of the present disclosure, which is not described in detail in the following embodiments.
Fig. 11 is a schematic structural diagram of a training device of a big inquiry model according to an embodiment of the present disclosure.
As shown in fig. 11, a training apparatus 1100 of a large-scale inquiry model according to an embodiment of the present disclosure includes: an acquisition module 1101, a generation module 1102, a training module 1103 and an adjustment module 1104.
The acquiring module 1101 is configured to acquire a plurality of rounds of doctor-patient inquiry dialogues in the medical field, where the plurality of rounds of doctor-patient inquiry dialogues include a plurality of rounds of patient input information and doctor response information corresponding to each round of patient input information.
The generating module 1102 is configured to determine a prompt word of the big inquiry model, and generate a training corpus based on the prompt word and the multiple rounds of doctor-patient inquiry dialogue.
The training module 1103 is configured to input the training corpus into a large inquiry model, output a doctor prediction answer of each round of patient input information, and determine a loss function of the large inquiry model according to the doctor prediction answer of each round of patient input information and the doctor response information.
And the adjusting module 1104 is configured to adjust the big inquiry model based on the loss function, and continue training the adjusted model until the training end condition is met, thereby obtaining the target big inquiry model.
In one embodiment of the present disclosure, the training module 1103 is further configured to: masking and shielding each round of patient input information, and only reserving the doctor prediction reply of each round of patient input information; and performing cross entropy loss calculation on the doctor prediction answer and the doctor response information of each round of patient input information, and determining a loss function of the large inquiry model.
In one embodiment of the present disclosure, the obtaining module 1101 is further configured to: and acquiring a plurality of rounds of initial doctor-patient inquiry dialogues, and filtering the plurality of rounds of initial doctor-patient inquiry dialogues to obtain the plurality of rounds of doctor-patient inquiry dialogues.
In one embodiment of the present disclosure, the obtaining module 1101 is further configured to: performing role differentiation of doctors and patients on the multiple rounds of initial doctor-patient consultation dialogues to obtain respective dialogue data of the roles; obtaining scores of doctor dialogue data in a plurality of dimensions, and weighting the scores of the doctor dialogue data in the plurality of dimensions to obtain a comprehensive score of the doctor dialogue data; and filtering the multiple rounds of initial doctor-patient inquiry dialogs based on the comprehensive scores of the doctor dialogue data to obtain the multiple rounds of doctor-patient inquiry dialogs.
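The scoring-and-filtering performed by the acquiring module can be sketched as follows; the dimension names, weights, and threshold are illustrative assumptions, since the disclosure does not fix them.

```python
def composite_score(dimension_scores, weights):
    """Weighted sum of the doctor dialogue data's scores across dimensions."""
    return sum(dimension_scores[d] * weights[d] for d in weights)

def filter_dialogues(dialogues, weights, threshold):
    """Keep only the doctor-patient inquiry dialogues whose doctor-side
    composite score meets the threshold.

    dialogues: list of dicts with a "doctor_scores" entry mapping each
    scoring dimension to a score (an assumed data shape).
    """
    return [d for d in dialogues
            if composite_score(d["doctor_scores"], weights) >= threshold]
```

The surviving dialogues then form the multiple rounds of doctor-patient inquiry dialogue used to build the training corpus.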
In one embodiment of the present disclosure, the obtaining module 1101 is further configured to: acquiring medical data from a clinical trial database; mining the knowledge graph of the medical data to obtain a medical knowledge graph; inputting the medical knowledge graph into a large language model to generate the multiple rounds of initial doctor-patient inquiry dialogs.
In one embodiment of the present disclosure, the obtaining module 1101 is further configured to: acquiring an online consultation dialogue, and acquiring the rounds of initial doctor-patient consultation dialogue according to the online consultation dialogue; and/or acquiring case information, and acquiring the rounds of initial doctor-patient inquiry dialogs according to the case information.
In one embodiment of the present disclosure, the training module 1103 is further configured to: acquiring medical literature, and acquiring vocabulary of the medical field based on the medical literature; and carrying out vector coding on the vocabulary based on the large inquiry model, and taking the coded word vector in the medical field as a minimum semantic unit of the large inquiry model.
In one embodiment of the present disclosure, the training module 1103 is further configured to: performing word segmentation on the medical document to obtain a word segmentation set of the medical document; aiming at each word in a word segmentation set, acquiring word frequency TF-inverse text frequency index IDF of the word; based on the TF-IDF of the word segmentation, candidate word segmentation is obtained from the word segmentation set; and determining the vocabulary of the medical field based on the candidate word segmentation.
In one embodiment of the present disclosure, the training module 1103 is further configured to: judging whether the candidate word segmentation and the word in the initial vocabulary are repeated or not; and if the candidate word is not repeated with the word in the vocabulary, updating the candidate word as the vocabulary of the medical field into the initial vocabulary.
In one embodiment of the present disclosure, the training module 1103 is further configured to: performing word segmentation on the medical document to obtain a word segmentation set of the medical document; performing character segmentation on the segmented words in the segmented word set to obtain a character sequence; and carrying out clustering processing on the characters in the character sequence and the characters in the initial vocabulary based on the byte-level byte pair coding BBPE to obtain the vocabulary in the medical field.
In one embodiment of the present disclosure, the training module 1103 is further configured to: acquiring the occurrence frequency of all adjacent character pairs from the character sequence; selecting adjacent characters with highest occurrence frequency from all adjacent character pairs for combination to obtain a combined character segment, and combining the combined character segment with an initial vocabulary; and (3) repeating the process until the iteration ending condition is met, and obtaining a target vocabulary, wherein the newly added vocabulary in the target vocabulary is the vocabulary in the medical field.
In one embodiment of the present disclosure, the adjusting module 1104 is further configured to: selecting a plurality of specified candidate diseases, and carrying out inquiry on the candidate diseases based on the target inquiry large model to obtain a predicted inquiry result of the target inquiry large model on the candidate diseases; acquiring a reference inquiry result of an expert system on the candidate diseases; evaluating the target big inquiry model based on the predicted inquiry result and the reference inquiry result; and if the evaluation result indicates that the target large inquiry model fails to pass the evaluation, continuing fine adjustment on the target large inquiry model.
According to the training device of the large inquiry model, which is provided by the embodiment of the disclosure, through acquiring multiple rounds of doctor-patient inquiry dialogues in the medical field and based on the prompt words of the large inquiry model, training corpus of the large inquiry model is generated. And training the large inquiry model based on the training corpus, calculating a loss function of the large inquiry model, and adjusting the large inquiry model based on the loss function until a target large inquiry model is obtained. The understanding capability of the target inquiry large model to the medical field can be improved, so that the target inquiry large model can simulate doctors to conduct free inquiry, can adapt to various complex and changeable conditions, and has better flexibility and expandability.
According to an embodiment of the present disclosure, the present disclosure further provides a large model-based interrogation device, which is configured to implement the large model-based interrogation method described above.
Fig. 12 is a schematic structural view of a large model-based interrogation apparatus according to a first embodiment of the present disclosure.
As shown in fig. 12, a large model-based interrogation apparatus 1200 of an embodiment of the present disclosure includes: call module 1201, inquiry module 1202.
The calling module is used for calling the target inquiry large model and inputting playing role prompt information to the target inquiry large model.
The inquiry module is used for inputting inquiry information to the target inquiry large model at least once and obtaining an inquiry reply output by the target inquiry large model and matching each piece of inquiry information, until the inquiry is ended and the target inquiry large model is exited.
According to the inquiry device based on the large model provided by the embodiment of the disclosure, the target inquiry large model is called and plays a role based on the prompt information, so that the target inquiry large model can imitate a doctor to conduct free inquiry, help patients understand and become familiar with their own conditions, save the economic and time costs of both doctors and patients, and improve the utilization rate of medical resources.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 13 illustrates a schematic block diagram of an example electronic device 1300 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 13, the electronic device 1300 includes a computing unit 1301 that can perform various appropriate actions and processes according to computer programs/instructions stored in a Read Only Memory (ROM) 1302 or loaded from a storage unit 1308 into a Random Access Memory (RAM) 1303. In the RAM 1303, various programs and data required for the operation of the device 1300 can also be stored. The computing unit 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304. An input/output (I/O) interface 1305 is also connected to bus 1304.
Various components in device 1300 are connected to I/O interface 1305, including: an input unit 1306 such as a keyboard, a mouse, or the like; an output unit 1307 such as various types of displays, speakers, and the like; storage unit 1308, such as a magnetic disk, optical disk, etc.; and a communication unit 1309 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 1309 allows the device 1300 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1301 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1301 performs the respective methods and processes described above, for example, the training method of the large inquiry model. For example, in some embodiments, the training method of the large inquiry model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1308. In some embodiments, part or all of the computer program/instructions may be loaded and/or installed onto the device 1300 via the ROM 1302 and/or the communication unit 1309. When the computer program/instructions are loaded into the RAM 1303 and executed by the computing unit 1301, one or more steps of the training method of the large inquiry model described above may be performed. Alternatively, in other embodiments, the computing unit 1301 may be configured to perform the training method of the large inquiry model in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs/instructions that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the Internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs/instructions running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted in the various flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (29)

1. A method of training a large inquiry model, wherein the method comprises:
acquiring a plurality of rounds of doctor-patient consultation dialogues in the medical field, wherein the plurality of rounds of doctor-patient consultation dialogues comprise a plurality of rounds of patient input information and doctor response information corresponding to each round of patient input information;
determining prompt words of a large inquiry model, and generating training corpus based on the prompt words and the multiple rounds of doctor-patient inquiry dialogue;
inputting the training corpus into the large inquiry model, outputting a doctor prediction answer for each round of patient input information, and determining a loss function of the large inquiry model according to the doctor prediction answer and the doctor response information for each round of patient input information;
and adjusting the large inquiry model based on the loss function, and continuing training the adjusted model until the training ending condition is met to obtain the target large inquiry model.
2. The method of claim 1, wherein the determining a loss function of the large inquiry model according to the doctor prediction answer and the doctor response information for each round of patient input information comprises:
masking each round of patient input information so that only the doctor prediction answer for each round of patient input information is retained;
and performing cross-entropy loss calculation between the doctor prediction answer and the doctor response information for each round of patient input information to determine the loss function of the large inquiry model.
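By way of illustration, the masked loss calculation recited in claim 2 — masking the patient input tokens so that cross-entropy is computed only over the doctor reply tokens — can be sketched as follows. This is a minimal pure-Python sketch with hypothetical toy probabilities; a real implementation would operate on logits inside a deep-learning framework.

```python
import math

def masked_cross_entropy(token_probs, loss_mask):
    """Average cross-entropy over unmasked (doctor-reply) positions only.

    token_probs: probability the model assigns to the true token at each position
    loss_mask:   1 for doctor-reply tokens, 0 for masked patient-input tokens
    """
    total, count = 0.0, 0
    for p, m in zip(token_probs, loss_mask):
        if m == 1:                 # masked patient-input tokens contribute no loss
            total += -math.log(p)
            count += 1
    return total / max(count, 1)

# One dialogue turn: two patient tokens followed by two doctor tokens.
probs = [0.10, 0.20, 0.50, 0.25]   # hypothetical model probabilities
mask  = [0,    0,    1,    1]      # only the doctor reply is scored
loss = masked_cross_entropy(probs, mask)
# loss = -(ln 0.5 + ln 0.25) / 2 ≈ 1.0397
```

Masking the patient tokens keeps the model from being penalized on text it is never asked to generate, which is the purpose of the masking step recited above.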
3. The method of claim 1, wherein the acquiring multiple rounds of doctor-patient interview sessions for a medical field comprises:
and acquiring a plurality of rounds of initial doctor-patient inquiry dialogues, and filtering the plurality of rounds of initial doctor-patient inquiry dialogues to obtain the plurality of rounds of doctor-patient inquiry dialogues.
4. The method of claim 3, wherein the filtering the multiple rounds of initial doctor-patient interview sessions to obtain the multiple rounds of doctor-patient interview sessions comprises:
performing role differentiation of doctors and patients on the multiple rounds of initial doctor-patient consultation dialogues to obtain respective dialogue data of the roles;
obtaining scores of doctor dialogue data in a plurality of dimensions, and weighting the scores of the doctor dialogue data in the plurality of dimensions to obtain a comprehensive score of the doctor dialogue data;
and filtering the multiple rounds of initial doctor-patient inquiry dialogs based on the comprehensive scores of the doctor dialogue data to obtain the multiple rounds of doctor-patient inquiry dialogs.
5. The method of claim 3, wherein the acquiring multiple rounds of initial doctor-patient interview sessions comprises:
acquiring medical data from a clinical trial database;
performing knowledge graph mining on the medical data to obtain a medical knowledge graph;
inputting the medical knowledge graph into a large language model to generate the multiple rounds of initial doctor-patient inquiry dialogs.
6. The method of claim 3, wherein the acquiring multiple rounds of initial doctor-patient interview sessions comprises:
acquiring an online consultation dialogue, and acquiring the multiple rounds of initial doctor-patient inquiry dialogues according to the online consultation dialogue; and/or
acquiring case information, and acquiring the multiple rounds of initial doctor-patient inquiry dialogues according to the case information.
7. The method of claim 1, wherein, before the inputting the training corpus into the large inquiry model, the method further comprises:
acquiring medical literature, and acquiring vocabulary of the medical field based on the medical literature;
and carrying out vector coding on the vocabulary based on the large inquiry model, and taking the coded word vector in the medical field as a minimum semantic unit of the large inquiry model.
8. The method of claim 7, wherein the obtaining the vocabulary of the medical domain based on the medical document comprises:
performing word segmentation on the medical document to obtain a word segmentation set of the medical document;
for each word segment in the word segmentation set, acquiring a term frequency (TF)-inverse document frequency (IDF) index of the word segment;
based on the TF-IDF of the word segmentation, candidate word segmentation is obtained from the word segmentation set;
and determining the vocabulary of the medical field based on the candidate word segmentation.
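The TF-IDF-based candidate selection recited in claim 8 can be sketched as follows. The toy corpus and the threshold value are hypothetical, and a production pipeline would additionally filter stop words before keeping candidates.

```python
import math
from collections import Counter

def tfidf_candidates(docs, threshold=0.1):
    """Select candidate domain terms whose TF-IDF score exceeds a threshold."""
    doc_tokens = [doc.split() for doc in docs]
    df = Counter()                         # document frequency of each term
    for tokens in doc_tokens:
        df.update(set(tokens))
    n_docs = len(docs)

    candidates = set()
    for tokens in doc_tokens:
        tf = Counter(tokens)
        for term in tf:
            # terms occurring in every document get IDF = 0 and are dropped
            score = (tf[term] / len(tokens)) * math.log(n_docs / df[term])
            if score > threshold:
                candidates.add(term)
    return candidates

docs = [
    "patient reports persistent cough and fever",
    "patient denies fever but reports dyspnea",
    "patient reports chest pain radiating to arm",
]
cands = tfidf_candidates(docs)
# ubiquitous terms such as "patient" score 0 and are excluded
```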
9. The method of claim 8, wherein the determining the vocabulary of the medical field based on the candidate word segment comprises:
judging whether the candidate word segment repeats a word in an initial vocabulary;
and if the candidate word segment does not repeat a word in the initial vocabulary, adding the candidate word segment to the initial vocabulary as vocabulary of the medical field.
10. The method of claim 7, wherein the obtaining the vocabulary of the medical domain based on the medical document comprises:
performing word segmentation on the medical document to obtain a word segmentation set of the medical document;
performing character segmentation on the segmented words in the segmented word set to obtain a character sequence;
and clustering the characters in the character sequence with the characters in an initial vocabulary based on byte-level byte-pair encoding (BBPE) to obtain the vocabulary of the medical field.
11. The method of claim 10, wherein the clustering the characters in the character sequence with the characters in the initial vocabulary based on byte-level byte-pair encoding (BBPE) to obtain the vocabulary of the medical field comprises:
acquiring the occurrence frequency of all adjacent character pairs from the character sequence;
selecting an adjacent character pair with the highest occurrence frequency from all the adjacent character pairs and merging it to obtain a merged character segment, and adding the merged character segment to the initial vocabulary;
and repeating the above process until an iteration ending condition is met to obtain a target vocabulary, wherein newly added entries in the target vocabulary are the vocabulary of the medical field.
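The iterative pair-merging recited in claim 11 can be sketched with a simplified character-level merge loop. True BBPE operates on UTF-8 bytes rather than characters, and the sample sequence below is hypothetical; the sketch only illustrates the merge-and-extend-vocabulary iteration.

```python
from collections import Counter

def bpe_merges(sequence, initial_vocab, num_merges=3):
    """Iteratively merge the most frequent adjacent pair, adding each merged
    segment to the vocabulary (simplified character-level BPE)."""
    seq = list(sequence)
    vocab = set(initial_vocab) | set(seq)
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), freq = pairs.most_common(1)[0]
        if freq < 2:              # iteration-end condition: no pair repeats
            break
        merged = a + b
        vocab.add(merged)         # newly added entries form the domain vocabulary
        out, i = [], 0            # rebuild the sequence with the pair merged
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(merged)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return vocab, seq

vocab, seq = bpe_merges("咳嗽咳嗽发热", initial_vocab={"咳", "嗽", "发", "热"})
# the pair 咳+嗽 appears twice, so "咳嗽" is merged into one vocabulary entry
```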
12. The method of claim 1, wherein after obtaining the target inquiry-size model, further comprising:
selecting a plurality of specified candidate diseases, and carrying out inquiry on the candidate diseases based on the target inquiry large model to obtain a predicted inquiry result of the target inquiry large model on the candidate diseases;
acquiring a reference inquiry result of an expert system on the candidate diseases;
evaluating the target big inquiry model based on the predicted inquiry result and the reference inquiry result;
and if the evaluation result indicates that the target large inquiry model fails to pass the evaluation, continuing fine adjustment on the target large inquiry model.
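The evaluation step recited in claim 12 — comparing the model's predicted inquiry results against an expert-system reference and continuing fine-tuning on failure — can be sketched as follows. The disease names, result strings, and pass threshold are all hypothetical.

```python
def evaluate_model(predicted, reference, pass_rate=0.8):
    """Compare predicted inquiry results with expert reference results;
    the model passes evaluation if agreement meets the threshold."""
    assert predicted.keys() == reference.keys()
    agree = sum(predicted[d] == reference[d] for d in predicted)
    accuracy = agree / len(predicted)
    return accuracy, accuracy >= pass_rate

predicted = {"influenza": "rest and fluids", "angina": "cardiology referral",
             "migraine": "triptan therapy", "asthma": "inhaled bronchodilator"}
reference = {"influenza": "rest and fluids", "angina": "cardiology referral",
             "migraine": "triptan therapy", "asthma": "pulmonology referral"}
accuracy, passed = evaluate_model(predicted, reference)
# 3 of 4 results agree: accuracy 0.75, below 0.8, so fine-tuning continues
```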
13. An inquiry method based on a large inquiry model, wherein the method comprises:
calling a target inquiry large model, and inputting playing role prompt information to the target inquiry large model;
inputting at least one piece of inquiry information to the target inquiry large model to obtain an inquiry reply that is output by the target inquiry large model and matched with each piece of inquiry information, until the inquiry ends and the target inquiry large model is exited;
Wherein the target inquiry large model is a large model trained by the training method according to any one of claims 1 to 12.
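A minimal sketch of the inquiry flow recited in claim 13: the role prompt is set once, then rounds of inquiry information are fed in and matched replies collected. The stub model below merely stands in for the trained target inquiry large model; its name and reply format are hypothetical.

```python
def run_inquiry(model, role_prompt, inquiry_rounds):
    """Drive a multi-round inquiry session: set the role prompt once, then
    feed each round of inquiry information and collect the matched reply."""
    history = [("system", role_prompt)]
    replies = []
    for info in inquiry_rounds:
        history.append(("patient", info))
        reply = model(history)          # hypothetical call to the large model
        history.append(("doctor", reply))
        replies.append(reply)
    return replies

# A stub standing in for the trained target inquiry large model:
def stub_model(history):
    last = history[-1][1]
    return f"Noted: {last}. Please tell me more."

replies = run_inquiry(stub_model, "You are an experienced physician.",
                      ["I have a headache", "It started yesterday"])
```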
14. A training device for a large inquiry model, wherein the device comprises:
the acquisition module is used for acquiring a plurality of rounds of doctor-patient inquiry dialogues in the medical field, wherein the rounds of doctor-patient inquiry dialogues comprise a plurality of rounds of patient input information and doctor response information corresponding to each round of patient input information;
the generation module is used for determining prompt words of the large inquiry model and generating training corpus based on the prompt words and the multiple rounds of doctor-patient inquiry dialogue;
the training module is used for inputting the training corpus into a large inquiry model, outputting doctor prediction answers of each round of patient input information, and determining a loss function of the large inquiry model according to the doctor prediction answers of each round of patient input information and the doctor response information;
and the adjusting module is used for adjusting the big inquiry model based on the loss function, and continuously training the adjusted model until the training ending condition is met to obtain the target big inquiry model.
15. The apparatus of claim 14, wherein the training module is further to:
masking each round of patient input information so that only the doctor prediction answer for each round of patient input information is retained;
and performing cross-entropy loss calculation between the doctor prediction answer and the doctor response information for each round of patient input information to determine the loss function of the large inquiry model.
16. The apparatus of claim 14, wherein the acquisition module is further configured to:
and acquiring a plurality of rounds of initial doctor-patient inquiry dialogues, and filtering the plurality of rounds of initial doctor-patient inquiry dialogues to obtain the plurality of rounds of doctor-patient inquiry dialogues.
17. The apparatus of claim 16, wherein the acquisition module is further configured to:
performing role differentiation of doctors and patients on the multiple rounds of initial doctor-patient consultation dialogues to obtain respective dialogue data of the roles;
obtaining scores of doctor dialogue data in a plurality of dimensions, and weighting the scores of the doctor dialogue data in the plurality of dimensions to obtain a comprehensive score of the doctor dialogue data;
and filtering the multiple rounds of initial doctor-patient inquiry dialogs based on the comprehensive scores of the doctor dialogue data to obtain the multiple rounds of doctor-patient inquiry dialogs.
18. The apparatus of claim 16, wherein the acquisition module is further configured to:
Acquiring medical data from a clinical trial database;
performing knowledge graph mining on the medical data to obtain a medical knowledge graph;
inputting the medical knowledge graph into a large language model to generate the multiple rounds of initial doctor-patient inquiry dialogs.
19. The apparatus of claim 16, wherein the acquisition module is further configured to:
acquiring an online consultation dialogue, and acquiring the multiple rounds of initial doctor-patient inquiry dialogues according to the online consultation dialogue; and/or
acquiring case information, and acquiring the multiple rounds of initial doctor-patient inquiry dialogues according to the case information.
20. The apparatus of claim 14, wherein the training module is further to:
acquiring medical literature, and acquiring vocabulary of the medical field based on the medical literature;
and carrying out vector coding on the vocabulary based on the large inquiry model, and taking the coded word vector in the medical field as a minimum semantic unit of the large inquiry model.
21. The apparatus of claim 20, wherein the training module is further to:
performing word segmentation on the medical document to obtain a word segmentation set of the medical document;
for each word segment in the word segmentation set, acquiring a term frequency (TF)-inverse document frequency (IDF) index of the word segment;
Based on the TF-IDF of the word segmentation, candidate word segmentation is obtained from the word segmentation set;
and determining the vocabulary of the medical field based on the candidate word segmentation.
22. The apparatus of claim 21, wherein the training module is further to:
judging whether the candidate word segment repeats a word in an initial vocabulary;
and if the candidate word segment does not repeat a word in the initial vocabulary, adding the candidate word segment to the initial vocabulary as vocabulary of the medical field.
23. The apparatus of claim 20, wherein the training module is further to:
performing word segmentation on the medical document to obtain a word segmentation set of the medical document;
performing character segmentation on the segmented words in the segmented word set to obtain a character sequence;
and clustering the characters in the character sequence with the characters in an initial vocabulary based on byte-level byte-pair encoding (BBPE) to obtain the vocabulary of the medical field.
24. The apparatus of claim 23, wherein the training module is further configured to:
acquiring the occurrence frequency of all adjacent character pairs from the character sequence;
selecting an adjacent character pair with the highest occurrence frequency from all the adjacent character pairs and merging it to obtain a merged character segment, and adding the merged character segment to the initial vocabulary;
and repeating the above process until an iteration ending condition is met to obtain a target vocabulary, wherein newly added entries in the target vocabulary are the vocabulary of the medical field.
25. The apparatus of claim 14, wherein the adjustment module is further configured to:
selecting a plurality of specified candidate diseases, and carrying out inquiry on the candidate diseases based on the target inquiry large model to obtain a predicted inquiry result of the target inquiry large model on the candidate diseases;
acquiring a reference inquiry result of an expert system on the candidate diseases;
evaluating the target big inquiry model based on the predicted inquiry result and the reference inquiry result;
and if the evaluation result indicates that the target large inquiry model fails to pass the evaluation, continuing fine adjustment on the target large inquiry model.
26. An inquiry device based on a large inquiry model, wherein the device comprises:
the calling module is used for calling the target inquiry large model and inputting playing role prompt information to the target inquiry large model;
the inquiry module is used for inputting at least one piece of inquiry information to the target inquiry large model to obtain an inquiry reply that is output by the target inquiry large model and matched with each piece of inquiry information, until the inquiry ends and the target inquiry large model is exited;
Wherein the target inquiry large model is a large model trained by the training device according to any one of claims 14-25.
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
28. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-13.
29. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of any of claims 1-13.
CN202311758183.9A 2023-12-20 2023-12-20 Training method of large inquiry model, inquiry method and device based on large inquiry model Pending CN117747087A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311758183.9A CN117747087A (en) 2023-12-20 2023-12-20 Training method of large inquiry model, inquiry method and device based on large inquiry model


Publications (1)

Publication Number Publication Date
CN117747087A true CN117747087A (en) 2024-03-22

Family

ID=90278875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311758183.9A Pending CN117747087A (en) 2023-12-20 2023-12-20 Training method of large inquiry model, inquiry method and device based on large inquiry model

Country Status (1)

Country Link
CN (1) CN117747087A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118013021A (en) * 2024-04-08 2024-05-10 浙江口碑网络技术有限公司 Medicine answering method, device, equipment and medium based on large language model



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination