WO2021139232A1 - Triage method, device, equipment, and storage medium based on a medical knowledge graph - Google Patents

Triage method, device, equipment, and storage medium based on a medical knowledge graph

Info

Publication number
WO2021139232A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity
department
user
knowledge graph
triage
Prior art date
Application number
PCT/CN2020/118139
Other languages
English (en)
French (fr)
Inventor
林桂
黎旭东
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021139232A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms

Definitions

  • This application relates to the field of smart medical care in digital medical care, and in particular to a triage method, device, equipment, and storage medium based on a medical knowledge graph.
  • the input consists mainly of colloquial symptom descriptions, which are difficult to map directly to standard symptoms, so recognition is poor and inflexible.
  • existing products rely mainly on rigid department-recommendation rules, such as sending every patient with "fever" to respiratory medicine. When a user enters a complex combination of symptoms, such rules rarely recommend the correct department.
  • the main purpose of this application is to provide a triage method, device, equipment, and storage medium based on a medical knowledge graph, aiming to solve the technical problems that current triage products require a lengthy interaction with the user, recognize colloquial symptoms poorly, and give poor guidance.
  • this application proposes a triage method based on a medical knowledge graph, including:
  • an embodiment of the present application also provides a triage device based on a medical knowledge graph, including:
  • the receiving unit is used to receive the symptom description sentence input by the user;
  • the entity acquisition unit is configured to use the pre-trained BERT model to perform character-level encoding on the sentence to generate character vectors, and to use the BILSTM model and the CRF model to decode the character vectors to obtain the disease entity;
  • the entity linking unit is used to link the disease entity to the standard disease entity in the knowledge graph using an entity linking algorithm;
  • the classification unit is used to vectorize the standard disease entity text and input it into the XGBoost classification model pre-trained on the medical knowledge graph, the output of which is the recommended department;
  • the returning unit is used to return the recommended department to the user.
  • the present application also provides a computer device, including a memory and a processor, the memory stores a computer program, and when the processor executes the computer program, a triage method based on a medical knowledge graph is implemented.
  • the steps of the triage method based on the medical knowledge graph include:
  • the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, a triage method based on a medical knowledge graph is implemented, the steps of which include:
  • the triage method, device, computer equipment, and readable storage medium of this application treat the user's chief complaint directly as serialized text and classify it with the XGBoost classification model. The classification result is the recommended department, which spares the user multiple rounds of input, gives a good user experience, has clear advantages in recognizing colloquial language, and yields accurate triage results.
  • FIG. 1 is a schematic flowchart of a triage method based on a medical knowledge graph according to an embodiment of the application
  • FIG. 2 is a schematic block diagram of the structure of a triage device based on a medical knowledge graph according to an embodiment of the application;
  • FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.
  • This application relates to the field of smart medical care in digital medicine.
  • an embodiment of this application provides a triage method based on a medical knowledge graph, which includes the steps:
  • This solution proposes a triage method based on a medical knowledge graph. It can be applied to intelligent guidance robots and self-service registration machines in hospitals, as well as to users' smart mobile devices such as mobile phones and tablets, collectively referred to below as user terminals. These devices can connect to a server over the Internet so that data storage and classification are carried out on the server side, improving running and response speed.
  • the intelligent triage system is a system that automatically recommends medical departments based on the user's symptom description, and first needs to obtain the user's symptoms.
  • the input methods commonly used by users include text input and voice input.
  • the voice input by the user needs to be converted into corresponding text.
  • the disease description sentence input by the user usually contains a disease entity, which may be either a disease that the user has self-diagnosed or a symptom.
  • in this solution, a BERT (Bidirectional Encoder Representations from Transformers) model is used for entity recognition, which makes it possible to identify colloquial disease entities. Taking "What should I do if my head hurts" ("头很痛怎么办") as an example, the pre-trained BERT model encodes the sentence at the character level, producing one vector per character as input.
  • the core model uses the classic BILSTM+CRF architecture, where BILSTM refers to bidirectional LSTM (Long Short-Term Memory) and CRF is the abbreviation of Conditional Random Field.
  • the bidirectional LSTM mainly encodes the sentence. It is worth mentioning that the bidirectional LSTM works better than a unidirectional LSTM or GRU (Gated Recurrent Unit): because it traverses the sentence both forward and backward, it captures semantic features better and serves as the feature extractor. Its output is then fed into the CRF layer for decoding, which computes the label of each character in the sequence.
  • after the above operations, the model labels the three characters of "头很痛" ("my head hurts") as B, I, I, where B is Begin, the beginning of a noun phrase, and I is Intermediate, the middle of a noun phrase; the other characters in the sentence are all labeled O, that is, Other, meaning not part of a noun phrase. Therefore "头很痛" is a noun phrase, and here it is a symptom entity.
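  • As a minimal sketch of how per-character B/I/O labels like those above could be collected into entity strings: the function name `decode_bio` is hypothetical and not part of the application, and the BERT, BILSTM, and CRF models themselves are not reproduced here; the labels are simply taken from the example in the text.

```python
from typing import List


def decode_bio(chars: List[str], tags: List[str]) -> List[str]:
    """Collect entity strings from per-character B/I/O tags."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    entities: List[str] = []
    current: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B":
            # A new entity starts; close any entity that is still open.
            if current:
                entities.append("".join(current))
            current = [ch]
        elif tag == "I" and current:
            # Continue the entity that is currently open.
            current.append(ch)
        else:
            # "O" (or an "I" with no open entity) closes any open entity.
            if current:
                entities.append("".join(current))
            current = []
    if current:
        entities.append("".join(current))
    return entities


sentence = list("头很痛怎么办")
labels = ["B", "I", "I", "O", "O", "O"]  # labels from the example in the text
entities = decode_bio(sentence, labels)
```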
  • in step S2, using the BERT model allows some colloquial symptoms to be identified; recognition is more accurate and the range of application is wider.
  • the entity linking algorithm links colloquial symptoms to the standard symptoms in the knowledge graph. For example, the colloquial symptom entity "my head hurts" is linked to the symptom entity "headache" in the knowledge graph.
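  • The application does not say which entity linking algorithm is used. Purely as an illustration, the sketch below links a colloquial mention to the closest standard entity by string similarity, using Python's standard `difflib`; the function name `link_entity`, the similarity cutoff, and the list of standard symptoms are all assumptions.

```python
import difflib
from typing import List, Optional


def link_entity(mention: str, standard_entities: List[str],
                cutoff: float = 0.6) -> Optional[str]:
    """Return the most similar standard entity, or None if none is close enough."""
    matches = difflib.get_close_matches(mention, standard_entities, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical standard symptom entities standing in for the knowledge graph.
standard_symptoms = ["头痛", "发热", "咳嗽"]
linked = link_entity("头很痛", standard_symptoms)
```

  • A production system would more likely combine synonym dictionaries or embedding similarity with such surface matching; string similarity alone is only the simplest stand-in.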
  • the XGBoost classification model pre-trained on the knowledge graph is used to classify the serialized symptom text, and the output of the model is the recommended department.
  • the classification model is obtained by training a machine learning algorithm on training data from the knowledge graph. The training data includes symptom data and recommended-department data; the symptom data is used as input and the recommended department as output to train the classification model.
  • the XGBoost model handles both classification and regression tasks well. Its most notable feature is that it automatically uses multiple CPU threads for parallel computation, while algorithmic improvements increase accuracy. This solution uses the XGBoost classification model to classify the user's serialized disease text directly, and the classification result is the recommended department.
  • the overall accuracy of the algorithm reaches 94.8%, and the accuracy in actual deployment reaches 85%. The approach avoids the conventional triage pipeline of symptom entity recognition, disease reasoning, and department recommendation, which spares the user multiple rounds of input and gives a good user experience.
  • in step S5, after the model gives the recommended department, the recommended department is returned to the user.
  • the step of receiving the symptom description sentence input by the user includes: judging whether the sentence input by the user is speech; if so, converting the speech into text and using the text as the symptom description sentence.
  • the triage method of this solution can be implemented as an application program on a hospital device or a user's mobile terminal device.
  • voice input is a commonly used input method that is convenient for users. If the symptom description sentence is entered by voice, the speech must be recognized and converted into a corresponding symptom description sentence in text format so that it can be processed by a computer or server.
  • the training method of the XGBoost classification model includes:
  • the training data of the XGBoost classification model comes from a comprehensive medical knowledge graph.
  • the training data is randomly split into a training set, a development set, and a test set at a ratio of 8:1:1.
  • the training set is used to train the model parameters, the development set is used to tune the parameters, and the test set is used to evaluate the model.
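  • A random 8:1:1 split of this kind can be sketched as follows. The function name `split_8_1_1` and the fixed seed are assumptions made for reproducibility; the application does not describe how the sampling is implemented.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_8_1_1(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the samples and split them into train/dev/test at a ratio of 8:1:1."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_dev = len(shuffled) // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # the remainder goes to the test set
    return train, dev, test


train, dev, test = split_8_1_1(list(range(100)))
```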
  • the data is input to the model, which performs feature selection and node splitting and finally produces its output through a softmax layer.
  • the main parameters of the trained model are "max_depth": 6, "eta": 0.5, and "num_class": 32; that is, the maximum tree depth is 6, the learning rate is 0.5, and there are 32 output categories, corresponding to the 32 national-standard first-level departments. Here max_depth refers to the maximum depth of a tree and num_class refers to the number of output categories.
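  • Expressed as an XGBoost parameter dictionary, the reported settings might look like the sketch below. The three values come from the text; the `objective` entry is an assumption, chosen because the text mentions a softmax output, and the training call is shown only as a comment.

```python
# Parameter values reported in the text; the "objective" entry is an assumption.
params = {
    "objective": "multi:softmax",  # assumed multi-class objective with softmax output
    "max_depth": 6,                # maximum tree depth
    "eta": 0.5,                    # learning rate
    "num_class": 32,               # one class per first-level department
}

# With the xgboost package installed, these would typically be passed as
#   booster = xgboost.train(params, dtrain)
# where dtrain is an xgboost.DMatrix built from the feature vectors and labels.
```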
  • the step of extracting data from the medical knowledge graph and obtaining training data includes: obtaining the department entities in the knowledge graph and the triage attributes of each department entity, where a triage attribute is a disease entity and/or symptom attribute that the department entity can triage and treat; vectorizing and storing the department entities as a label data set, and vectorizing and storing the triage attributes as a feature value data set; and establishing a mapping between the label data set and the feature value data set according to the correspondence between department entities and triage attributes to obtain the training data.
  • disease entities and symptom attributes in the knowledge graph serve as triage attributes, so that triage can be performed according to them to determine the corresponding department entity. For example, if the department entity is the Department of Respiratory Medicine, its triage attributes include the disease entity pneumonia and may also include symptom attributes of pneumonia such as "dyspnea" and "loss of appetite".
  • the training data is a mapping data set of department entities and triage attributes, in which department entities are used as label data and triage attributes as feature value data.
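  • The mapping from department entities (labels) to triage attributes (features) can be illustrated with a multi-hot encoding. This is a sketch under assumptions: the application does not specify the vectorization, the function name `build_training_data` is hypothetical, and the "Neurology" entry is invented for the example. A real data set would contain many samples per department rather than one.

```python
from typing import Dict, List, Tuple


def build_training_data(
    triage_attributes: Dict[str, List[str]]
) -> Tuple[List[List[int]], List[int], List[str], List[str]]:
    """Encode each department as an integer label and its attributes as a multi-hot row."""
    departments = sorted(triage_attributes)
    vocabulary = sorted({a for attrs in triage_attributes.values() for a in attrs})
    index = {attr: i for i, attr in enumerate(vocabulary)}
    features: List[List[int]] = []
    labels: List[int] = []
    for label, dept in enumerate(departments):
        row = [0] * len(vocabulary)
        for attr in triage_attributes[dept]:
            row[index[attr]] = 1
        features.append(row)
        labels.append(label)
    return features, labels, departments, vocabulary


kg = {
    # Example from the text.
    "Department of Respiratory Medicine": ["pneumonia", "dyspnea", "loss of appetite"],
    # Hypothetical second department, added only so the example has two classes.
    "Neurology": ["headache"],
}
features, labels, departments, vocabulary = build_training_data(kg)
```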
  • the step of returning the recommended department to the user includes:
  • the generated department recommendation results are sent to the user terminal, which can be sent by application, SMS, WeChat notification, or email, and then displayed on the interface of the user terminal or broadcast to the user by voice.
  • after the step of returning the recommended department to the user, the method includes: receiving a treatment feedback result sent by the user terminal and extracting the actual treatment department from it; judging whether the recommended department is consistent with the actual treatment department; if not, generating corrected sample data from the standard disease entity and the actual treatment department; and updating the knowledge graph according to the corrected sample data.
  • after receiving the department recommendation through the terminal, the user registers and sees a doctor, then gives feedback based on the actual visit: whether the department was correct, whether a referral was required, and so on.
  • the user enters the outcome of the visit into the terminal and can also enter the department actually visited, by text, voice, or other means. The user terminal generates a treatment feedback result from the actual department information and sends it to the server.
  • after receiving the treatment feedback result, the server extracts the user's actual treatment department from it and compares it with the recommended department in the corresponding department recommendation result. When the two are inconsistent, the server generates corrected sample data from the standard disease entity and the actual treatment department and writes it to the knowledge graph to keep the knowledge graph accurate.
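  • The comparison step of this feedback loop can be sketched as below. The function name `make_correction` and the dictionary layout of the corrected sample are assumptions; how the sample is written back into the knowledge graph is not specified in the text and is left out.

```python
from typing import Dict, List, Optional


def make_correction(standard_entities: List[str], recommended: str,
                    actual: str) -> Optional[Dict[str, object]]:
    """Return a corrected sample when the recommendation was wrong, else None."""
    if recommended == actual:
        return None  # recommendation confirmed; nothing to correct
    # Pair the standard disease entities with the department actually visited.
    return {"entities": list(standard_entities), "department": actual}


# Hypothetical feedback: the user was recommended one department but visited another.
sample = make_correction(["headache"], "Department of Respiratory Medicine", "Neurology")
```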
  • the method further includes: obtaining the user's appointment time and user location information; finding the doctors on duty who match the department recommendation result and the user's appointment time; obtaining the hospital score, hospital location information, and doctor score for each such doctor; calculating a hospital distance score from the hospital location information and the user location information; calculating a doctor recommendation score from the hospital score, the doctor score, and the hospital distance score; and sorting the doctors by doctor recommendation score and generating a doctor appointment recommendation result from the sorted doctors.
  • the user's appointment time is the time at which the user plans to seek treatment, and the user location information is the user's usual residential address.
  • the user can enter the planned visit time and usual residential address into the user terminal, or the terminal can locate the user automatically. The user terminal converts the address into latitude and longitude to generate the user location information, and sends the appointment time and the user location information to the server.
  • the server receives the appointment time and user location information, finds the doctors in the recommended department, obtains each doctor's on-duty times, and selects the doctors whose on-duty time matches the user's appointment time.
  • the server obtains the hospital score, hospital location information, and doctor score for each selected doctor. The hospital location information is the latitude and longitude of the hospital.
  • the server calculates the distance between the hospital and the user from the user location information and the hospital location information, and looks up the hospital distance score corresponding to that distance. The distance scores are set and stored in advance; for example, a distance within 5 km scores 10 points, a distance of 5 km to 10 km scores 8 points, and so on.
  • the server obtains the preset weights for the hospital score, doctor score, and hospital distance score and computes the weighted sum of the scores to obtain each selected doctor's recommendation score. It sorts the selected doctors by recommendation score from high to low, takes those ranked above a preset position, and obtains their doctor information.
  • the doctor information can include the doctor's name, visiting hours, hospital, hospital address, title, years of practice, field of expertise, and other information. The server generates a doctor appointment recommendation result from this information and returns it to the user terminal. By obtaining the user's appointment time and location, the server can automatically match the user with doctors who have higher overall scores, making it easier for the user to choose a hospital and doctor.
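  • The scoring described above can be sketched as follows. Only the two distance bands (within 5 km scores 10, 5 km to 10 km scores 8) come from the text; the score beyond 10 km, the weights, the haversine distance formula, and all function and field names are assumptions made for illustration.

```python
import math
from typing import Dict, List


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def distance_score(km: float) -> float:
    """Map a distance to a preset score."""
    if km <= 5:
        return 10.0  # band given in the text
    if km <= 10:
        return 8.0   # band given in the text
    return 5.0       # hypothetical value; the text does not give further bands


# Hypothetical weights; the text only says that preset weights are used.
WEIGHTS = {"hospital": 0.3, "doctor": 0.5, "distance": 0.2}


def recommendation_score(hospital_score: float, doctor_score: float, km: float) -> float:
    """Weighted sum of hospital score, doctor score, and distance score."""
    return (WEIGHTS["hospital"] * hospital_score
            + WEIGHTS["doctor"] * doctor_score
            + WEIGHTS["distance"] * distance_score(km))


def rank_doctors(doctors: List[Dict], top_k: int = 3) -> List[Dict]:
    """Sort doctors by recommendation score, highest first, and keep the top_k."""
    ranked = sorted(
        doctors,
        key=lambda d: recommendation_score(d["hospital_score"], d["doctor_score"], d["km"]),
        reverse=True,
    )
    return ranked[:top_k]
```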
  • the triage method based on the medical knowledge graph in this embodiment treats the user's chief complaint directly as serialized text and classifies it with the XGBoost classification model. The classification result is the recommended department, which spares the user multiple rounds of input, gives a good user experience, has clear advantages in recognizing colloquial language, and yields accurate triage results.
  • an embodiment of the present application also provides a triage device based on a medical knowledge graph, including:
  • the receiving unit 1 is used to receive the symptom description sentence input by the user;
  • the entity acquisition unit 2 is configured to use the pre-trained BERT model to perform character-level encoding on the sentence to generate character vectors, and to use the BILSTM model and the CRF model to decode the character vectors to obtain the disease entity;
  • the entity linking unit 3 is used to link the disease entity to the standard disease entity in the knowledge graph using an entity linking algorithm;
  • the classification unit 4 is used to vectorize the standard disease entity text and input it into the XGBoost classification model pre-trained on the medical knowledge graph, the output of which is the recommended department;
  • the returning unit 5 is used to return the recommended department to the user.
  • the components of the triage device proposed in this application can realize the functions of any of the above triage methods based on the medical knowledge graph, and their specific structure is not described again here.
  • an embodiment of the present application also provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium.
  • the database of the computer device is used to store data such as medical knowledge graphs.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize a triage method based on the medical knowledge graph.
  • the triage method based on the medical knowledge graph executed by the above processor includes: receiving a disease description sentence input by a user; using a pre-trained BERT model to perform character-level encoding on the sentence to generate character vectors, and using a BILSTM model and a CRF model to decode the character vectors to obtain the disease entity; using an entity linking algorithm to link the disease entity to the standard disease entity in the knowledge graph; vectorizing the text of the standard disease entity and inputting it into the XGBoost classification model pre-trained on the medical knowledge graph, the output of which is the recommended department; and returning the recommended department to the user.
  • the step of receiving the symptom description sentence input by the user includes: judging whether the sentence input by the user is speech; if so, converting the speech into text and using the text as the symptom description sentence.
  • the step of extracting data from the medical knowledge graph and obtaining training data includes: obtaining the department entities in the knowledge graph and the triage attributes of each department entity, where a triage attribute is a disease entity and/or symptom attribute that the department entity can triage and treat; vectorizing and storing the department entities as a label data set, and vectorizing and storing the triage attributes as a feature value data set; and establishing a mapping between the label data set and the feature value data set according to the correspondence between department entities and triage attributes to obtain the training data.
  • the step of returning recommended departments to the user includes: sending the results of the department recommendations to the user terminal; displaying on the interface of the user terminal or broadcasting to the user via voice.
  • the method further includes: receiving a treatment feedback result sent by the user terminal and extracting the actual treatment department from it; generating corrected sample data from the standard disease entity and the actual treatment department; and updating the knowledge graph according to the corrected sample data.
  • the method further includes: obtaining the user's appointment time and user location information; finding the doctors on duty who match the department recommendation result and the user's appointment time; obtaining the hospital score, hospital location information, and doctor score for each such doctor; calculating a hospital distance score from the hospital location information and the user location information; calculating a doctor recommendation score from the hospital score, the doctor score, and the hospital distance score; and sorting the doctors by doctor recommendation score and generating a doctor appointment recommendation result from the sorted doctors.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • a computer program is stored thereon; when the computer program is executed by a processor, a triage method based on the medical knowledge graph is implemented, including the steps of: receiving a disease description sentence input by a user; using a pre-trained BERT model to perform character-level encoding on the sentence to generate character vectors, and using a BILSTM model and a CRF model to decode the character vectors to obtain the disease entity; using an entity linking algorithm to link the disease entity to the standard disease entity in the knowledge graph; vectorizing the text of the standard disease entity and inputting it into the XGBoost classification model pre-trained on the medical knowledge graph, the output of which is the recommended department; and returning the recommended department to the user.
  • the above implementation of the triage method based on the medical knowledge graph treats the user's chief complaint directly as serialized text and classifies it with the XGBoost classification model. The classification result is the recommended department, which spares the user multiple rounds of input, gives a good user experience, has clear advantages in recognizing colloquial language, and yields accurate triage results.
  • the step of receiving the symptom description sentence input by the user includes: judging whether the sentence input by the user is speech; if so, converting the speech into text and using the text as the symptom description sentence.
  • the step of extracting data from the medical knowledge graph and obtaining training data includes: obtaining the department entities in the knowledge graph and the triage attributes of each department entity, where a triage attribute is a disease entity and/or symptom attribute that the department entity can triage and treat; vectorizing and storing the department entities as a label data set, and vectorizing and storing the triage attributes as a feature value data set; and establishing a mapping between the label data set and the feature value data set according to the correspondence between department entities and triage attributes to obtain the training data.
  • the step of returning recommended departments to the user includes: sending the results of the department recommendations to the user terminal; displaying on the interface of the user terminal or broadcasting to the user via voice.
  • the method further includes: receiving a treatment feedback result sent by the user terminal and extracting the actual treatment department from it; generating corrected sample data from the standard disease entity and the actual treatment department; and updating the knowledge graph according to the corrected sample data.
  • the method further includes: obtaining the user's appointment time and user location information; finding the doctors on duty who match the department recommendation result and the user's appointment time; obtaining the hospital score, hospital location information, and doctor score for each such doctor; calculating a hospital distance score from the hospital location information and the user location information; calculating a doctor recommendation score from the hospital score, the doctor score, and the hospital distance score; and sorting the doctors by doctor recommendation score and generating a doctor appointment recommendation result from the sorted doctors.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Computational Linguistics (AREA)
  • Primary Health Care (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

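A triage method, device, equipment, and storage medium based on a medical knowledge graph, relating to the field of digital medical care. The method includes: receiving a disease description sentence input by a user (S1); using a pre-trained BERT model to perform character-level encoding on the sentence to generate character vectors, and using a BILSTM model and a CRF model to decode the character vectors to obtain a disease entity (S2); using an entity linking algorithm to link the disease entity to a standard disease entity in the knowledge graph (S3); vectorizing the text of the standard disease entity and inputting it into an XGBoost classification model pre-trained on the medical knowledge graph, the output of which is a recommended department (S4); and returning the recommended department to the user (S5). The method treats the user's chief complaint directly as serialized text and classifies it with the XGBoost classification model, so that the classification result is the recommended department; it offers a good user experience, clear advantages in recognizing colloquial language, and accurate triage results.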

Description

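Triage method, device, equipment, and storage medium based on a medical knowledge graph
This application claims priority to Chinese patent application No. 202010621760.X, filed with the China Patent Office on June 30, 2020 and entitled "Triage method, device, equipment, and storage medium based on a medical knowledge graph", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of smart medical care in digital medical care, and in particular to a triage method, device, equipment, and storage medium based on a medical knowledge graph.
Background
The traditional approach to triage in a hospital is to set up a triage desk, where patients who are unfamiliar with the hospital's departments or unsure about their condition can consult the staff. However, patients can usually describe their condition only in colloquial language rather than medical terms, and many triage-desk staff are not trained physicians. As a result, the accuracy of the departments recommended at the triage desk is hard to guarantee, and this approach is inefficient and labor-intensive.
The inventors found that some intelligent triage products, such as guidance robots and triage applications, already exist on the market, but in actual use they have the following shortcomings. Functionally, existing products lean toward symptom-questionnaire self-diagnosis: through multiple rounds of interaction, often more than three, the user repeatedly selects related symptoms suggested by the application, and a pre-designed classification algorithm recommends the department closest to the user's description. The recommendation therefore usually comes from a rather lengthy interaction, and multiple rounds are still required even when the user's symptoms are simple or obvious. Symptom recognition is not very intelligent: triage products target ordinary users, whose input consists mainly of colloquial symptoms that are difficult to map directly to standard symptoms, so results are poor and flexibility is lacking. In addition, because the products rely mainly on rules to recommend departments, such as the rigid rule of sending patients with "fever" to respiratory medicine, they have difficulty recommending the correct answer when a user enters a complex combination of symptoms.
Technical Problem
The main purpose of this application is to provide a triage method, device, equipment, and storage medium based on a medical knowledge graph, aiming to solve the technical problems that current triage products require a lengthy interaction with the user, recognize colloquial symptoms poorly, and give poor guidance.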
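Technical Solution
To achieve the above purpose, the following technical solutions are proposed:
In a first aspect, this application proposes a triage method based on a medical knowledge graph, including:
receiving a disease description sentence input by a user;
using a pre-trained BERT model to perform character-level encoding on the sentence to generate character vectors, and using a BILSTM model and a CRF model to decode the character vectors to obtain a disease entity;
using an entity linking algorithm to link the disease entity to a standard disease entity in the knowledge graph;
vectorizing the text of the standard disease entity and inputting it into an XGBoost classification model pre-trained on the medical knowledge graph, the output of which is a recommended department;
returning the recommended department to the user.
In a second aspect, an embodiment of this application also provides a triage device based on a medical knowledge graph, including:
a receiving unit, used to receive a symptom description sentence input by a user;
an entity acquisition unit, used to perform character-level encoding on the sentence with a pre-trained BERT model to generate character vectors, and to decode the character vectors with a BILSTM model and a CRF model to obtain a disease entity;
an entity linking unit, which uses an entity linking algorithm to link the disease entity to a standard disease entity in the knowledge graph;
a classification unit, used to vectorize the text of the standard disease entity and input it into an XGBoost classification model pre-trained on the medical knowledge graph, the output of which is a recommended department;
a returning unit, used to return the recommended department to the user.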
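In a third aspect, this application further provides a computer device including a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements a triage method based on a medical knowledge graph, the steps of which include:
receiving a condition description sentence input by a user;
character-encoding the sentence using a pre-trained BERT model to generate character vectors, and decoding the character vectors using a BILSTM model and a CRF model to obtain a condition entity;
linking the condition entity to a standard condition entity in the knowledge graph using an entity linking algorithm;
vectorizing the text of the standard condition entity and inputting it into a pre-trained XGBoost classification model based on the medical knowledge graph, the output of the model being a recommended department;
returning the recommended department to the user.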
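In a fourth aspect, this application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a triage method based on a medical knowledge graph, the steps of which include:
receiving a condition description sentence input by a user;
character-encoding the sentence using a pre-trained BERT model to generate character vectors, and decoding the character vectors using a BILSTM model and a CRF model to obtain a condition entity;
linking the condition entity to a standard condition entity in the knowledge graph using an entity linking algorithm;
vectorizing the text of the standard condition entity and inputting it into a pre-trained XGBoost classification model based on the medical knowledge graph, the output of the model being a recommended department;
returning the recommended department to the user.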
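Beneficial Effects
The triage method, apparatus, computer device, and readable storage medium based on a medical knowledge graph of this application treat the user's chief complaint directly as serialized text and classify it with the XGBoost classification model, the classification result being the recommended department. This spares the user the trouble of multiple rounds of input, provides a good user experience, has a clear advantage in recognizing colloquial expressions, and yields accurate triage results.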
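Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a triage method based on a medical knowledge graph according to an embodiment of this application;
FIG. 2 is a schematic structural block diagram of a triage apparatus based on a medical knowledge graph according to an embodiment of this application;
FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of this application.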
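Best Mode for Carrying Out the Invention
To make the purpose, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain this application and are not intended to limit it.
This application relates to the field of smart healthcare within digital healthcare. Referring to FIG. 1, an embodiment of this application provides a triage method based on a medical knowledge graph, including the steps of: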
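S1: receiving a symptom description sentence input by a user;
S2: character-encoding the sentence using a pre-trained BERT model to generate character vectors, and decoding the character vectors using a BILSTM model and a CRF model to obtain a condition entity;
S3: linking the condition entity to a standard condition entity in the knowledge graph using an entity linking algorithm;
S4: vectorizing the text of the standard condition entity and inputting it into an XGBoost classification model pre-trained on the medical knowledge graph, the output of the model being a recommended department;
S5: returning the recommended department to the user.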
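As a reading aid, steps S1 to S5 can be sketched as a simple composition of three stages. This is only a minimal sketch under stated assumptions: the function names `triage`, `toy_extract`, `toy_link`, and `toy_classify` are hypothetical, the toy stand-ins replace the BERT+BILSTM+CRF recognizer, the entity linker, and the XGBoost classifier, and the department label "Neurology" is an illustrative value that does not come from the application.

```python
from typing import Callable, List


def triage(sentence: str,
           extract_entities: Callable[[str], List[str]],
           link_entity: Callable[[str], str],
           classify: Callable[[List[str]], str]) -> str:
    """Run S2-S4 on a received sentence (S1) and return the department (S5)."""
    mentions = extract_entities(sentence)            # S2: entity recognition
    standard = [link_entity(m) for m in mentions]    # S3: entity linking
    return classify(standard)                        # S4: classification


# Toy stand-ins (hypothetical) so that the sketch runs end to end.
def toy_extract(sentence: str) -> List[str]:
    return ["头很痛"] if "头很痛" in sentence else []


def toy_link(mention: str) -> str:
    return {"头很痛": "头痛"}.get(mention, mention)


def toy_classify(entities: List[str]) -> str:
    # Illustrative labels only; the real model outputs one of 32 departments.
    return "Neurology" if "头痛" in entities else "General Practice"


print(triage("头很痛怎么办", toy_extract, toy_link, toy_classify))
```

Passing the stages in as callables keeps the flow independent of any particular model library, so each stage can be replaced without touching the others.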
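This solution proposes a triage method based on a medical knowledge graph that can be applied to a hospital's intelligent guidance robots and self-service registration machines, and also to users' smart mobile terminal devices such as mobile phones and tablet computers, hereinafter collectively referred to as user terminals. These devices can connect to a server over the Internet, and data storage and classification computation are completed on the server side, improving running and response speed.
As described in step S1 above, an intelligent triage system is a system that automatically recommends a department based on the user's symptom description, so the user's symptoms must be obtained first. Users currently tend to use text input or voice input; so that the classification model can analyze and process the symptom description sentence input by the user, voice input must be converted into the corresponding text.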
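As described in step S2 above, the condition description sentence input by the user usually contains condition entities, which may be disease entities or symptom entities that the user has self-diagnosed. For example, if a user has a fever and a runny nose, any patient with some common sense will first suspect a cold. In this solution, performing entity recognition with the BERT (Bidirectional Encoder Representations from Transformers) model makes it possible to recognize colloquial condition entities. Taking "头很痛怎么办" ("My head hurts a lot, what should I do?") as an example, the pre-trained BERT model encodes the sentence at the character level, producing one character vector per character as input. The core model uses the classic BILSTM+CRF architecture, where BILSTM refers to a bidirectional LSTM (Long Short-Term Memory) and CRF is the abbreviation of Conditional Random Field. The bidirectional LSTM encodes the sentence; notably, it performs better than a unidirectional LSTM or a GRU (Gated Recurrent Unit) because it traverses the sentence both forward and backward and therefore captures semantic features better, serving as the feature extractor. Its output is then fed into the CRF layer for decoding, which computes a label for each character in the sequence. After these operations, the model outputs the labels B, I, I for the three characters "头很痛", where B stands for Begin and marks the start of a noun phrase, and I stands for Intermediate and marks the middle of a noun phrase; all other characters in the sentence are labeled O, for Other, meaning they are not part of a noun phrase. "头很痛" is therefore a noun phrase, here a symptom entity.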
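The last part of step S2, turning per-character B/I/O labels into entity strings, can be illustrated with a short sketch. It assumes the labels have already been produced by the CRF layer; the function name `decode_bio` is hypothetical, and the label sequence below is the one given in the example above.

```python
from typing import List


def decode_bio(chars: List[str], tags: List[str]) -> List[str]:
    """Collect the character spans labeled B, I, I, ... into entity strings."""
    entities: List[str] = []
    current: List[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "B":
            # A new entity starts; flush any entity still being built.
            if current:
                entities.append("".join(current))
            current = [ch]
        elif tag == "I" and current:
            # Continue the entity that is currently open.
            current.append(ch)
        else:
            # "O" (or a stray "I" with no open entity) closes the current span.
            if current:
                entities.append("".join(current))
            current = []
    if current:
        entities.append("".join(current))
    return entities


sentence = "头很痛怎么办"
tags = ["B", "I", "I", "O", "O", "O"]
print(decode_bio(list(sentence), tags))  # ['头很痛']
```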
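As described in step S3 above, in step S2 the BERT model can recognize some colloquial symptoms, which makes recognition more accurate and more widely applicable. In this step, the recognized colloquial symptoms are linked to standard symptoms in the knowledge graph using the entity linking algorithm. For example, the symptom entity "头很痛" ("head hurts a lot") is linked to the symptom entity "头痛" ("headache") in the knowledge graph.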
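The application does not specify which entity linking algorithm is used. Purely as an illustration, the sketch below links a colloquial mention to the most similar standard entity by character-level string similarity, using `difflib.SequenceMatcher` from the Python standard library. The function name `link_entity`, the candidate list, and the threshold of 0.5 are assumptions introduced here, not details from the application.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def link_entity(mention: str,
                standard_entities: List[str],
                threshold: float = 0.5) -> Optional[str]:
    """Return the standard entity most similar to the mention, or None."""
    best: Optional[str] = None
    best_score = 0.0
    for candidate in standard_entities:
        # ratio() is 2*M/T: M matched characters, T total characters.
        score = SequenceMatcher(None, mention, candidate).ratio()
        if score > best_score:
            best, best_score = candidate, score
    # Reject weak matches instead of linking to an unrelated entity.
    return best if best_score >= threshold else None


# Hypothetical standard symptom entities from a knowledge graph.
standard = ["头痛", "头晕", "腹痛"]
print(link_entity("头很痛", standard))
```

A production linker would more plausibly combine synonym dictionaries or embedding similarity; the string-similarity version is shown only because it is self-contained.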
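As described in step S4 above, the XGBoost classification model pre-trained on the knowledge graph classifies the serialized symptom text, and the model output is the recommended department. The classification model is obtained by training on training data from the knowledge graph using a machine learning algorithm; the training data include symptom data and recommended-department data, with the symptom data as input and the recommended department as output. The XGBoost model handles classification and regression tasks well; its most notable feature is that it automatically uses multiple CPU threads for parallel computation, while algorithmic improvements raise its accuracy. This solution uses the XGBoost classification model to classify the user's symptom sequence text directly, unlike the traditional flow from symptom extraction to disease diagnosis to department, so the classification result is itself the recommended department. The overall accuracy of the algorithm is as high as 94.8%, and the accuracy in actual deployment is as high as 85%. This avoids the conventional triage pipeline of symptom entity recognition, disease inference, and department recommendation, spares the user the trouble of multiple rounds of input, and provides a good user experience.
As described in step S5 above, after the model produces the recommended department, the recommended department is returned to the user.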
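In a specific embodiment, the step of receiving the symptom description sentence input by the user includes:
determining whether the sentence input by the user is speech;
if so, converting the speech into text and using the text as the condition description sentence.
As described above, the triage method of this solution can be applied to an application program and implemented on hospital equipment or on the user's mobile terminal device. Voice input is currently a common input method that is convenient for users. If the user inputs the symptom description sentence by voice, speech recognition must be performed on it to convert it into a condition description sentence in text form that a computer or server can process.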
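In a specific embodiment, the training method of the XGBoost classification model includes:
extracting data from the medical knowledge graph to obtain training data;
splitting the training data into a training set, a development set, and a test set in the ratio 8:1:1, and training the XGBoost classification model, where the main model parameters are maximum tree depth max_depth=6, learning rate eta=0.5, and number of classes num_class=32.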
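As described above, the training data of the XGBoost classification model come from a well-developed medical knowledge graph. During training, the training data are randomly sampled into a training set, a development set, and a test set in the ratio 8:1:1, where the training set is used to train the model parameters, the development set is used to tune the parameters, and the test set is used to evaluate the model. The data are input into the model, which performs feature selection and node splitting and finally outputs through a softmax layer. The main parameters of the trained model are "max_depth": 6, "eta": 0.5, "num_class": 32, that is, a maximum tree depth of 6, a learning rate of 0.5, and 32 output classes corresponding to the 32 national-standard first-level departments. Here max_depth is the maximum depth of a tree, and the larger max_depth is, the more specific the patterns the model learns; eta is the learning rate, also called the iteration rate, and its value affects the accuracy and running speed of the model; num_class is the number of output classes.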
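The random 8:1:1 split and the stated parameters can be sketched as follows. The sketch uses only the Python standard library so that it runs without XGBoost installed. The function name `split_8_1_1` and the fixed seed are assumptions, and the `objective` entry is likewise an assumption: the application only mentions a softmax output, and "multi:softmax" is the XGBoost objective name that corresponds to it.

```python
import random
from typing import List, Sequence, Tuple


def split_8_1_1(samples: Sequence, seed: int = 0) -> Tuple[List, List, List]:
    """Randomly split samples into training, development, and test sets (8:1:1)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # random sampling, reproducible via seed
    n_train = len(shuffled) * 8 // 10
    n_dev = len(shuffled) // 10
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]       # the remainder goes to the test set
    return train, dev, test


# Parameters stated in the application, in the dictionary form XGBoost accepts.
params = {
    "max_depth": 6,                 # maximum tree depth
    "eta": 0.5,                     # learning rate
    "num_class": 32,                # 32 first-level departments
    "objective": "multi:softmax",   # assumption: multi-class softmax objective
}

train, dev, test = split_8_1_1(range(100))
print(len(train), len(dev), len(test))  # 80 10 10
```

With XGBoost available, a dictionary of this shape would be passed to the training call together with the training set, and the development set would serve as the evaluation set during tuning.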
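In a specific embodiment, the step of extracting data from the medical knowledge graph to obtain training data includes:
obtaining the department entities in the knowledge graph and the triage attributes of each department entity, where the triage attributes are the disease entities and/or symptom attributes that the department entity can triage and treat;
vectorizing the department entities and storing them as a label dataset, and vectorizing the triage attributes and storing them as a feature dataset;
establishing a mapping between the label dataset and the feature dataset according to the correspondence between the department entities and the triage attributes, to obtain the training data.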
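As described in the above steps, both the disease entities and the symptom attributes can serve as triage attributes in the knowledge graph that distinguish one disease entity from others, so that triage can be performed according to the triage attribute to determine the corresponding department entity. For example, if the department entity is respiratory medicine, its triage attributes include the disease entity pneumonia and may also include symptom attributes of pneumonia such as "difficulty breathing" and "loss of appetite". If a natural-language inquiry sentence contains the two condition keywords "pneumonia" and "difficulty breathing", then from these keywords (where pneumonia is a disease entity among the triage attributes and difficulty breathing is a symptom attribute among the triage attributes) the disease entity can be determined to be "pneumonia", and from that disease entity (that is, the triage attribute) the triage department (department entity) can be determined to be respiratory medicine. Therefore, when training the classification model, the training data are a mapping dataset of department entities and triage attributes, in which the department entities serve as label data and the triage attributes serve as feature data.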
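One possible way to turn the department-to-attribute mapping into a label dataset and a feature dataset is sketched below. The application does not state how the entities are vectorized, so the integer label ids and one-hot feature rows are assumptions, as is the function name `build_training_data`. The department 呼吸内科 (respiratory medicine) and its attributes come from the example above, while the second department and its attribute are hypothetical additions used only to give the sketch two classes.

```python
from typing import Dict, List, Tuple


def build_training_data(
    dept_attrs: Dict[str, List[str]]
) -> Tuple[List[List[int]], List[int], List[str], List[str]]:
    """Map each (department, triage attribute) pair to a (feature row, label id)."""
    departments = sorted(dept_attrs)   # label id = position in this list
    vocabulary = sorted({a for attrs in dept_attrs.values() for a in attrs})
    index = {attr: i for i, attr in enumerate(vocabulary)}

    features: List[List[int]] = []
    labels: List[int] = []
    for label_id, dept in enumerate(departments):
        for attr in dept_attrs[dept]:
            row = [0] * len(vocabulary)   # one-hot encoding of the attribute
            row[index[attr]] = 1
            features.append(row)
            labels.append(label_id)
    return features, labels, departments, vocabulary


graph = {
    # From the example: respiratory medicine with pneumonia and two symptoms.
    "呼吸内科": ["肺炎", "呼吸困难", "食欲减退"],
    # Hypothetical second department, not taken from the application.
    "神经内科": ["头痛"],
}
features, labels, departments, vocabulary = build_training_data(graph)
for row, label in zip(features, labels):
    print(row, departments[label])
```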
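In a specific embodiment, the step of returning the recommended department to the user includes:
sending the department recommendation result to the user terminal;
displaying it on the interface of the user terminal or announcing it to the user by voice.
As described above, the generated department recommendation result is sent to the user terminal, for example via in-app message, SMS, WeChat notification, or email, and is then displayed on the interface of the user terminal or announced to the user by voice.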
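In a specific embodiment, after the step of returning the recommended department to the user, the method includes:
receiving a visit feedback result sent by the user terminal, and extracting the actual department visited from the visit feedback result;
when the recommended department is inconsistent with the actual department visited, generating correction sample data from the standard condition entity and the actual department visited;
updating the knowledge graph according to the correction sample data.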
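As described above, after receiving the department recommendation result through the terminal, the user registers and sees a doctor, and then gives feedback based on the actual visit, such as whether the department was correct and whether a referral was needed. The user inputs the visit feedback result into the terminal. When the department visited differs from the department that actually provided treatment, the user can also input the information of the actual treating department into the terminal, by text, voice, or other means. After the user terminal obtains the actual department information input by the user, it generates a visit feedback result from that information and sends it to the server. After receiving the visit feedback result from the terminal, the server extracts the user's actual department from it. The server compares the extracted actual visit information with the recommended department in the corresponding department recommendation result and determines whether the two are consistent. When they are inconsistent, the server generates correction sample data from the standard condition entity and the actual department visited, and updates the knowledge graph with the correction sample data to keep the knowledge graph data accurate.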
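The comparison and correction step on the server can be sketched as follows. This is a minimal sketch: the `CorrectionSample` type, the `make_correction` function, and the English department names are hypothetical, and how the sample is written back into the knowledge graph is not specified by the application and is left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class CorrectionSample:
    """A corrected (standard condition entities -> department) training pair."""
    standard_entities: Tuple[str, ...]
    actual_department: str


def make_correction(standard_entities: Sequence[str],
                    recommended_department: str,
                    actual_department: str) -> Optional[CorrectionSample]:
    """Return a correction sample only when the recommendation was wrong."""
    if recommended_department == actual_department:
        return None   # recommendation confirmed, nothing to correct
    return CorrectionSample(tuple(standard_entities), actual_department)


# Illustrative values only.
sample = make_correction(["头痛"], "Respiratory Medicine", "Neurology")
print(sample)
```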
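In a specific embodiment, after the step of returning the recommended department to the user, the method further includes:
obtaining the user's appointment time and the user's location information;
finding on-duty doctors corresponding to the department recommendation result and the user's appointment time;
obtaining the hospital rating, hospital location information, and doctor rating corresponding to each on-duty doctor;
calculating a hospital distance score from the hospital location information and the user's location information;
calculating a doctor recommendation score from the hospital rating, the doctor rating, and the hospital distance score;
ranking the on-duty doctors by doctor recommendation score, and generating a doctor appointment recommendation result from the ranked on-duty doctors.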
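As described above, the user's appointment time is the time at which the user wishes to see a doctor, and the user's location information is the user's usual residential address. The user can input the appointment time and usual residential address into the user terminal, or the user terminal can locate the user automatically. The user terminal converts the residential address into latitude and longitude to generate the user's location information, and sends the user's appointment time and location information to the server. The server receives them, finds the on-duty doctors corresponding to the recommended department in the department recommendation result, obtains their duty hours, and filters out the on-duty doctors whose duty hours match the user's appointment time. The server obtains the hospital rating, hospital location information, and doctor rating for each filtered doctor, where the hospital location information is the latitude and longitude of the hospital. The server calculates the distance between the hospital and the user from the two locations and looks up the hospital distance score corresponding to that distance. The correspondence between distance and hospital distance score is set and stored in advance; for example, a distance within 5 km corresponds to a score of 10, a distance of 5 km to 10 km corresponds to a score of 8, and so on. The server obtains the preset weights for the hospital rating, doctor rating, and hospital distance score, and computes the weighted sum of the scores to obtain the doctor recommendation score of each filtered doctor. It then ranks the filtered doctors by doctor recommendation score from high to low, extracts the doctors ranked above a preset position, and obtains their doctor information, which may include the doctor's name, consultation hours, hospital, hospital address, professional title, years of practice, areas of expertise, and the like. The server generates a doctor appointment recommendation result from the doctor information and returns it to the user terminal. By obtaining the user's appointment time and location information, the server can automatically match the user with on-duty doctors that have a high overall score, making it easier for the user to choose a hospital and a doctor.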
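The scoring and ranking described above can be sketched as follows. Only the two distance bands (within 5 km scores 10, 5 km to 10 km scores 8) come from the application. The score of 5 for longer distances, the treatment of the 5 km and 10 km boundaries, the weights, the 0 to 10 rating scale, the use of the haversine formula for the distance, and all names and coordinates are assumptions made for illustration.

```python
import math
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_score(km: float) -> float:
    """Look up the preset hospital distance score for a distance."""
    if km <= 5:
        return 10.0   # from the application: within 5 km
    if km <= 10:
        return 8.0    # from the application: 5 km to 10 km
    return 5.0        # assumption: the application does not list further bands


# Assumed weights; the application only says they are preset.
WEIGHTS = {"hospital": 0.3, "doctor": 0.5, "distance": 0.2}


def recommendation_score(doc: Dict, user_lat: float, user_lon: float,
                         weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of hospital rating, doctor rating, and distance score."""
    km = haversine_km(user_lat, user_lon, doc["lat"], doc["lon"])
    return (weights["hospital"] * doc["hospital_rating"]
            + weights["doctor"] * doc["doctor_rating"]
            + weights["distance"] * distance_score(km))


def rank_doctors(doctors: List[Dict], user_lat: float, user_lon: float,
                 top_n: int = 3) -> List[Dict]:
    """Sort doctors by recommendation score, highest first, and keep the top ones."""
    ranked = sorted(doctors,
                    key=lambda d: recommendation_score(d, user_lat, user_lon),
                    reverse=True)
    return ranked[:top_n]


doctors = [
    {"name": "B", "hospital_rating": 9, "doctor_rating": 7, "lat": 22.60, "lon": 114.05},
    {"name": "A", "hospital_rating": 8, "doctor_rating": 9, "lat": 22.54, "lon": 114.05},
]
print([d["name"] for d in rank_doctors(doctors, 22.54, 114.05)])  # ['A', 'B']
```

The sketch assumes the doctors passed in have already been filtered by recommended department and by duty hours matching the appointment time.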
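The triage method based on a medical knowledge graph of the embodiments of this application treats the user's chief complaint directly as serialized text and classifies it with the XGBoost classification model, the classification result being the recommended department. This spares the user the trouble of multiple rounds of input, provides a good user experience, has a clear advantage in recognizing colloquial expressions, and yields accurate triage results.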
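Referring to FIG. 2, an embodiment of this application further provides a triage apparatus based on a medical knowledge graph, including:
a receiving unit 1, configured to receive a symptom description sentence input by a user;
an entity acquisition unit 2, configured to character-encode the sentence using a pre-trained BERT model to generate character vectors, and to decode the character vectors using a BILSTM model and a CRF model to obtain a condition entity;
an entity linking unit 3, configured to link the condition entity to a standard condition entity in the knowledge graph using an entity linking algorithm;
a classification unit 4, configured to vectorize the text of the standard condition entity and input it into an XGBoost classification model pre-trained on the medical knowledge graph, the output of the model being a recommended department;
a returning unit 5, configured to return the recommended department to the user.
As described above, it can be understood that the components of the triage apparatus based on a medical knowledge graph proposed in this application can implement the functions of any embodiment of the triage method based on a medical knowledge graph described above, and the specific structure is not described again.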
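Referring to FIG. 3, an embodiment of this application further provides a computer device, which may be a server and whose internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store data such as the medical knowledge graph. The network interface of the computer device is used to communicate with external terminals over a network connection. When executed by the processor, the computer program implements a triage method based on a medical knowledge graph.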
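The triage method based on a medical knowledge graph executed by the above processor includes: receiving a condition description sentence input by a user; character-encoding the sentence using a pre-trained BERT model to generate character vectors, and decoding the character vectors using a BILSTM model and a CRF model to obtain a condition entity; linking the condition entity to a standard condition entity in the knowledge graph using an entity linking algorithm; vectorizing the text of the standard condition entity and inputting it into a pre-trained XGBoost classification model based on the medical knowledge graph, the output of the model being a recommended department; and returning the recommended department to the user.
In one embodiment, the step of receiving the symptom description sentence input by the user includes: determining whether the sentence input by the user is speech; if so, converting the speech into text and using the text as the condition description sentence.
In one embodiment, the training method of the XGBoost classification model includes: extracting data from the medical knowledge graph to obtain training data; splitting the training data into a training set, a development set, and a test set in the ratio 8:1:1, and training the XGBoost classification model, where the main model parameters are max_depth=6, eta=0.5, and num_class=32.
In a specific embodiment, the step of extracting data from the medical knowledge graph to obtain training data includes: obtaining the department entities in the knowledge graph and the triage attributes of each department entity, where the triage attributes are the disease entities and/or symptom attributes that the department entity can triage and treat; vectorizing the department entities and storing them as a label dataset, and vectorizing the triage attributes and storing them as a feature dataset; and establishing a mapping between the label dataset and the feature dataset according to the correspondence between the department entities and the triage attributes, to obtain the training data.
In a specific embodiment, the step of returning the recommended department to the user includes: sending the department recommendation result to the user terminal; and displaying it on the interface of the user terminal or announcing it to the user by voice.
In a specific embodiment, after the step of returning the recommended department to the user, the method includes: receiving a visit feedback result sent by the user terminal, and extracting the actual department visited from the visit feedback result; when the recommended department is inconsistent with the actual department visited, generating correction sample data from the standard condition entity and the actual department visited; and updating the knowledge graph according to the correction sample data.
In a specific embodiment, after the step of returning the recommended department to the user, the method further includes: obtaining the user's appointment time and the user's location information; finding on-duty doctors corresponding to the department recommendation result and the user's appointment time; obtaining the hospital rating, hospital location information, and doctor rating corresponding to each on-duty doctor; calculating a hospital distance score from the hospital location information and the user's location information; calculating a doctor recommendation score from the hospital rating, the doctor rating, and the hospital distance score; and ranking the on-duty doctors by doctor recommendation score and generating a doctor appointment recommendation result from the ranked on-duty doctors.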
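An embodiment of this application further provides a computer-readable storage medium, which may be non-volatile or volatile, storing a computer program which, when executed by a processor, implements a triage method based on a medical knowledge graph, including the steps of: receiving a condition description sentence input by a user; character-encoding the sentence using a pre-trained BERT model to generate character vectors, and decoding the character vectors using a BILSTM model and a CRF model to obtain a condition entity; linking the condition entity to a standard condition entity in the knowledge graph using an entity linking algorithm; vectorizing the text of the standard condition entity and inputting it into a pre-trained XGBoost classification model based on the medical knowledge graph, the output of the model being a recommended department; and returning the recommended department to the user.
The triage method based on a medical knowledge graph executed as above treats the user's chief complaint directly as serialized text and classifies it with the XGBoost classification model, the classification result being the recommended department. This spares the user the trouble of multiple rounds of input, provides a good user experience, has a clear advantage in recognizing colloquial expressions, and yields accurate triage results.
In one embodiment, the step of receiving the symptom description sentence input by the user includes: determining whether the sentence input by the user is speech; if so, converting the speech into text and using the text as the condition description sentence.
In one embodiment, the training method of the XGBoost classification model includes: extracting data from the medical knowledge graph to obtain training data; splitting the training data into a training set, a development set, and a test set in the ratio 8:1:1, and training the XGBoost classification model, where the main model parameters are max_depth=6, eta=0.5, and num_class=32.
In a specific embodiment, the step of extracting data from the medical knowledge graph to obtain training data includes: obtaining the department entities in the knowledge graph and the triage attributes of each department entity, where the triage attributes are the disease entities and/or symptom attributes that the department entity can triage and treat; vectorizing the department entities and storing them as a label dataset, and vectorizing the triage attributes and storing them as a feature dataset; and establishing a mapping between the label dataset and the feature dataset according to the correspondence between the department entities and the triage attributes, to obtain the training data.
In a specific embodiment, the step of returning the recommended department to the user includes: sending the department recommendation result to the user terminal; and displaying it on the interface of the user terminal or announcing it to the user by voice.
In a specific embodiment, after the step of returning the recommended department to the user, the method includes: receiving a visit feedback result sent by the user terminal, and extracting the actual department visited from the visit feedback result; when the recommended department is inconsistent with the actual department visited, generating correction sample data from the standard condition entity and the actual department visited; and updating the knowledge graph according to the correction sample data.
In a specific embodiment, after the step of returning the recommended department to the user, the method further includes: obtaining the user's appointment time and the user's location information; finding on-duty doctors corresponding to the department recommendation result and the user's appointment time; obtaining the hospital rating, hospital location information, and doctor rating corresponding to each on-duty doctor; calculating a hospital distance score from the hospital location information and the user's location information; calculating a doctor recommendation score from the hospital rating, the doctor rating, and the hospital distance score; and ranking the on-duty doctors by doctor recommendation score and generating a doctor appointment recommendation result from the ranked on-duty doctors.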
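A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).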
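It should be noted that, in this document, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes that element.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.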

Claims (20)

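  1. A triage method based on a medical knowledge graph, comprising:
    receiving a condition description sentence input by a user;
    character-encoding the sentence using a pre-trained BERT model to generate character vectors, and decoding the character vectors using a BILSTM model and a CRF model to obtain a condition entity;
    linking the condition entity to a standard condition entity in the knowledge graph using an entity linking algorithm;
    vectorizing the text of the standard condition entity and inputting it into a pre-trained XGBoost classification model based on the medical knowledge graph, the output of the model being a recommended department;
    returning the recommended department to the user.
  2. The triage method based on a medical knowledge graph according to claim 1, wherein the step of receiving the symptom description sentence input by the user comprises:
    determining whether the sentence input by the user is speech;
    if so, converting the speech into text and using the text as the condition description sentence.
  3. The triage method based on a medical knowledge graph according to claim 1, wherein the training method of the XGBoost classification model comprises:
    extracting data from the medical knowledge graph to obtain training data;
    splitting the training data into a training set, a development set, and a test set in the ratio 8:1:1, and training the XGBoost classification model, wherein the main model parameters are maximum tree depth max_depth=6, learning rate eta=0.5, and number of classes num_class=32.
  4. The triage method based on a medical knowledge graph according to claim 3, wherein the step of extracting data from the medical knowledge graph to obtain training data comprises:
    obtaining the department entities in the knowledge graph and the triage attributes of each department entity, the triage attributes being the disease entities and/or symptom attributes that the department entity can triage and treat;
    vectorizing the department entities and storing them as a label dataset, and vectorizing the triage attributes and storing them as a feature dataset;
    establishing a mapping between the label dataset and the feature dataset according to the correspondence between the department entities and the triage attributes, to obtain the training data.
  5. The triage method based on a medical knowledge graph according to claim 1, wherein the step of returning the recommended department to the user comprises:
    sending the department recommendation result to the user terminal;
    displaying it on the interface of the user terminal or announcing it to the user by voice.
  6. The triage method based on a medical knowledge graph according to claim 1, wherein, after the step of returning the recommended department to the user, the method comprises:
    receiving a visit feedback result sent by the user terminal, and extracting the actual department visited from the visit feedback result;
    when the recommended department is inconsistent with the actual department visited, generating correction sample data from the standard condition entity and the actual department visited;
    updating the knowledge graph according to the correction sample data.
  7. The triage method based on a medical knowledge graph according to claim 1, wherein,
    after the step of returning the recommended department to the user, the method further comprises:
    obtaining the user's appointment time and the user's location information;
    finding on-duty doctors corresponding to the department recommendation result and the user's appointment time;
    obtaining the hospital rating, hospital location information, and doctor rating corresponding to each on-duty doctor;
    calculating a hospital distance score from the hospital location information and the user's location information;
    calculating a doctor recommendation score from the hospital rating, the doctor rating, and the hospital distance score;
    ranking the on-duty doctors by doctor recommendation score, and generating a doctor appointment recommendation result from the ranked on-duty doctors.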
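  8. A triage apparatus based on a medical knowledge graph, comprising:
    a receiving unit, configured to receive a symptom description sentence input by a user;
    an entity acquisition unit, configured to character-encode the sentence using a pre-trained BERT model to generate character vectors, and to decode the character vectors using a BILSTM model and a CRF model to obtain a condition entity;
    an entity linking unit, configured to link the condition entity to a standard condition entity in the knowledge graph using an entity linking algorithm;
    a classification unit, configured to vectorize the text of the standard condition entity and input it into an XGBoost classification model pre-trained on the medical knowledge graph, the output of the model being a recommended department;
    a returning unit, configured to return the recommended department to the user.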
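  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements a triage method based on a medical knowledge graph, the steps of which comprise:
    receiving a condition description sentence input by a user;
    character-encoding the sentence using a pre-trained BERT model to generate character vectors, and decoding the character vectors using a BILSTM model and a CRF model to obtain a condition entity;
    linking the condition entity to a standard condition entity in the knowledge graph using an entity linking algorithm;
    vectorizing the text of the standard condition entity and inputting it into a pre-trained XGBoost classification model based on the medical knowledge graph, the output of the model being a recommended department;
    returning the recommended department to the user.
  10. The computer device according to claim 9, wherein the step of receiving the symptom description sentence input by the user comprises:
    determining whether the sentence input by the user is speech;
    if so, converting the speech into text and using the text as the condition description sentence.
  11. The computer device according to claim 9, wherein the training method of the XGBoost classification model comprises:
    extracting data from the medical knowledge graph to obtain training data;
    splitting the training data into a training set, a development set, and a test set in the ratio 8:1:1, and training the XGBoost classification model, wherein the main model parameters are maximum tree depth max_depth=6, learning rate eta=0.5, and number of classes num_class=32.
  12. The computer device according to claim 11, wherein the step of extracting data from the medical knowledge graph to obtain training data comprises:
    obtaining the department entities in the knowledge graph and the triage attributes of each department entity, the triage attributes being the disease entities and/or symptom attributes that the department entity can triage and treat;
    vectorizing the department entities and storing them as a label dataset, and vectorizing the triage attributes and storing them as a feature dataset;
    establishing a mapping between the label dataset and the feature dataset according to the correspondence between the department entities and the triage attributes, to obtain the training data.
  13. The computer device according to claim 9, wherein the step of returning the recommended department to the user comprises:
    sending the department recommendation result to the user terminal;
    displaying it on the interface of the user terminal or announcing it to the user by voice.
  14. The computer device according to claim 9, wherein, after the step of returning the recommended department to the user, the method comprises:
    receiving a visit feedback result sent by the user terminal, and extracting the actual department visited from the visit feedback result;
    when the recommended department is inconsistent with the actual department visited, generating correction sample data from the standard condition entity and the actual department visited;
    updating the knowledge graph according to the correction sample data.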
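  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements a triage method based on a medical knowledge graph, the steps of which comprise:
    receiving a condition description sentence input by a user;
    character-encoding the sentence using a pre-trained BERT model to generate character vectors, and decoding the character vectors using a BILSTM model and a CRF model to obtain a condition entity;
    linking the condition entity to a standard condition entity in the knowledge graph using an entity linking algorithm;
    vectorizing the text of the standard condition entity and inputting it into a pre-trained XGBoost classification model based on the medical knowledge graph, the output of the model being a recommended department;
    returning the recommended department to the user.
  16. The computer-readable storage medium according to claim 15, wherein the step of receiving the symptom description sentence input by the user comprises:
    determining whether the sentence input by the user is speech;
    if so, converting the speech into text and using the text as the condition description sentence.
  17. The computer-readable storage medium according to claim 15, wherein the training method of the XGBoost classification model comprises:
    extracting data from the medical knowledge graph to obtain training data;
    splitting the training data into a training set, a development set, and a test set in the ratio 8:1:1, and training the XGBoost classification model, wherein the main model parameters are maximum tree depth max_depth=6, learning rate eta=0.5, and number of classes num_class=32.
  18. The computer-readable storage medium according to claim 17, wherein the step of extracting data from the medical knowledge graph to obtain training data comprises:
    obtaining the department entities in the knowledge graph and the triage attributes of each department entity, the triage attributes being the disease entities and/or symptom attributes that the department entity can triage and treat;
    vectorizing the department entities and storing them as a label dataset, and vectorizing the triage attributes and storing them as a feature dataset;
    establishing a mapping between the label dataset and the feature dataset according to the correspondence between the department entities and the triage attributes, to obtain the training data.
  19. The computer-readable storage medium according to claim 15, wherein the step of returning the recommended department to the user comprises:
    sending the department recommendation result to the user terminal;
    displaying it on the interface of the user terminal or announcing it to the user by voice.
  20. The computer-readable storage medium according to claim 15, wherein, after the step of returning the recommended department to the user, the method comprises:
    receiving a visit feedback result sent by the user terminal, and extracting the actual department visited from the visit feedback result;
    when the recommended department is inconsistent with the actual department visited, generating correction sample data from the standard condition entity and the actual department visited;
    updating the knowledge graph according to the correction sample data.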
PCT/CN2020/118139 2020-06-30 2020-09-27 Triage method, apparatus, device, and storage medium based on medical knowledge graph WO2021139232A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010621760.X 2020-06-30
CN202010621760.XA CN111785368A (zh) 2020-06-30 2020-06-30 Triage method, apparatus, device, and storage medium based on medical knowledge graph

Publications (1)

Publication Number Publication Date
WO2021139232A1 true WO2021139232A1 (zh) 2021-07-15

Family

ID=72760410

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118139 WO2021139232A1 (zh) 2020-06-30 2020-09-27 Triage method, apparatus, device, and storage medium based on medical knowledge graph

Country Status (2)

Country Link
CN (1) CN111785368A (zh)
WO (1) WO2021139232A1 (zh)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
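CN112164460B (zh) * 2020-10-19 2023-06-30 集美大学 Intelligent disease auxiliary diagnosis system based on medical knowledge graph
CN112151188A (zh) * 2020-10-19 2020-12-29 科技谷(厦门)信息技术有限公司 Intelligent disease prediction system based on medical knowledge graph
CN112331283A (zh) * 2020-10-27 2021-02-05 贵州精准医疗电子有限公司 Health monitoring method, apparatus, and computer-readable medium
CN112530576A (zh) * 2020-11-30 2021-03-19 百度健康(北京)科技有限公司 Online doctor-patient matching method, apparatus, electronic device, and storage medium
CN112700862B (zh) * 2020-12-25 2024-04-16 上海钛米机器人股份有限公司 Method and apparatus for determining target department, electronic device, and storage medium
CN112614578B (zh) * 2020-12-29 2023-06-09 深圳平安智慧医健科技有限公司 Intelligent doctor recommendation method, apparatus, electronic device, and storage medium
CN113782165A (zh) * 2021-04-02 2021-12-10 北京京东拓先科技有限公司 Triage method and apparatus, and computer-storable medium
CN113327691B (zh) * 2021-06-01 2022-08-12 平安科技(深圳)有限公司 Language-model-based inquiry method, apparatus, computer device, and storage medium
US20240203569A1 (en) * 2021-06-30 2024-06-20 Boe Technology Group Co., Ltd. Intelligent triage method and device, storage medium and electronic device
CN113724854A (zh) * 2021-07-27 2021-11-30 广州医科大学附属第二医院 Machine-learning-based graded triage method, system, and computer device
CN114093468A (zh) * 2021-07-27 2022-02-25 北京好欣晴移动医疗科技有限公司 Method, apparatus, and system for annotating and recognizing cardiovascular disease information entities
CN113707285A (zh) * 2021-08-30 2021-11-26 康键信息技术(深圳)有限公司 Department triage method, system, device, and storage medium
CN113656588B (zh) * 2021-09-01 2024-05-10 深圳平安医疗健康科技服务有限公司 Knowledge-graph-based data code-matching method, apparatus, device, and storage medium
CN113610194B (zh) * 2021-09-09 2023-08-11 重庆数字城市科技有限公司 Automatic classification method for digital archives
CN114328979A (zh) * 2022-03-14 2022-04-12 北京泽桥医疗科技股份有限公司 Medical point digital data recommendation algorithm based on medical knowledge graph
CN114783580B (zh) * 2022-06-20 2022-09-13 武汉博科国泰信息技术有限公司 Medical data quality assessment method and system
CN117149998B (zh) * 2023-10-30 2024-01-23 北京南师信息技术有限公司 Intelligent medical visit recommendation method and system based on multi-objective optimization
CN117497158B (zh) * 2023-12-28 2024-04-16 智业软件股份有限公司 Method for recommending triage level based on dimension combination and weight calculation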

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109659013A (zh) * 2018-11-28 2019-04-19 平安科技(深圳)有限公司 Condition triage and path optimization method, apparatus, device, and storage medium
CN110069631A (zh) * 2019-04-08 2019-07-30 腾讯科技(深圳)有限公司 Text processing method, apparatus, and related device
CN110085307A (zh) * 2019-04-04 2019-08-02 华东理工大学 Intelligent hospital guidance method and system based on multi-source knowledge graph fusion
US20190317994A1 (en) * 2018-04-16 2019-10-17 Tata Consultancy Services Limited Deep learning techniques based multi-purpose conversational agents for processing natural language queries
CN110516260A (zh) * 2019-08-30 2019-11-29 腾讯科技(深圳)有限公司 Entity recommendation method, apparatus, storage medium, and device
CN110993078A (zh) * 2019-11-27 2020-04-10 华中科技大学同济医学院附属协和医院 Medical triage method, apparatus, and storage medium
CN111310471A (zh) * 2020-01-19 2020-06-19 陕西师范大学 Tourism named entity recognition method based on BBLC model
CN111339277A (zh) * 2020-02-28 2020-06-26 中国工商银行股份有限公司 Machine-learning-based question-answering interaction method and apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
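CN109635122A (zh) * 2018-11-28 2019-04-16 平安科技(深圳)有限公司 Intelligent disease inquiry method, apparatus, device, and storage medium
CN110489538B (zh) * 2019-08-27 2020-12-25 腾讯科技(深圳)有限公司 Artificial-intelligence-based sentence response method, apparatus, and electronic device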

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
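CN113707304A (zh) * 2021-08-30 2021-11-26 平安国际智慧城市科技股份有限公司 Triage data processing method, apparatus, device, and storage medium
CN113707304B (zh) * 2021-08-30 2023-08-01 深圳平安智慧医健科技有限公司 Triage data processing method, apparatus, device, and storage medium
CN114974554A (zh) * 2022-02-23 2022-08-30 北京爱医声科技有限公司 Method, apparatus, and storage medium for fusing graph knowledge to enhance medical record features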

Also Published As

Publication number Publication date
CN111785368A (zh) 2020-10-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20912581

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20912581

Country of ref document: EP

Kind code of ref document: A1