CN113284577B - Medicine prediction method, device, equipment and storage medium - Google Patents

Medicine prediction method, device, equipment and storage medium

Info

Publication number
CN113284577B
Authority
CN
China
Prior art keywords
inquiry
data
historical
medicine
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110566394.7A
Other languages
Chinese (zh)
Other versions
CN113284577A (en
Inventor
吴汉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kangjian Information Technology Shenzhen Co Ltd
Original Assignee
Kangjian Information Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kangjian Information Technology Shenzhen Co Ltd filed Critical Kangjian Information Technology Shenzhen Co Ltd
Priority to CN202110566394.7A priority Critical patent/CN113284577B/en
Publication of CN113284577A publication Critical patent/CN113284577A/en
Priority to PCT/CN2022/088787 priority patent/WO2022247549A1/en
Application granted granted Critical
Publication of CN113284577B publication Critical patent/CN113284577B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G16H20/13 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients delivered from dispensers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses a medicine prediction method, device, equipment and storage medium, which address the technical problem of low prediction accuracy in prior-art medicine prediction methods. The method comprises the following steps: acquiring a plurality of historical inquiry records from authorized historical inquiry data, and extracting first inquiry features; counting the number of historical inquiry records corresponding to each first inquiry feature, and generating distribution data of each first inquiry feature in the historical inquiry data; cleaning the historical inquiry data, and training a preset deep learning tool on an inquiry data training set formed according to the distribution data to obtain a medicine prediction model; acquiring an inquiry information text based on a medicine prediction request, extracting a second inquiry feature from the inquiry information text, and inputting the second inquiry feature into the medicine prediction model to obtain a medicine prediction result. In addition, the invention also relates to blockchain technology, and information related to the medicine prediction can be stored in a blockchain.

Description

Medicine prediction method, device, equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to a method, apparatus, device, and storage medium for predicting a drug.
Background
When a patient visits a doctor, the doctor must assess the condition comprehensively from the patient's description of the illness and the examination results, and select the patient's medication according to that assessment. With the development of artificial intelligence, various industries are gradually adopting it to assist or replace people in simple tasks. For example, drug prediction can be performed from inquiry information, and a doctor or patient can select a drug based on the predicted result.
However, existing medicine prediction methods require data learning and training, and in the current training process the training data set is cleaned without considering the regularity and specificity of the original inquiry data it contains. Because the inquiry data are cleaned directly, the regularity and specificity of the data set are damaged to some extent during data processing, so the trained medicine prediction model recommends inaccurately and the accuracy of medicine prediction is reduced.
Disclosure of Invention
The main purpose of the invention is to solve the technical problem of low prediction accuracy in prior-art medicine prediction methods.
The first aspect of the present invention provides a drug prediction method, comprising: acquiring authorized historical inquiry data and extracting all first inquiry features in the historical inquiry data, wherein the historical inquiry data comprises a plurality of historical inquiry records; counting the historical inquiry records in the historical inquiry data according to the first inquiry features to obtain the number of historical inquiry records corresponding to each first inquiry feature, and generating distribution data of the corresponding first inquiry feature in the historical inquiry data based on the number; cleaning the historical inquiry records corresponding to each first inquiry feature, and forming an inquiry data training set from the cleaned historical inquiry records and the corresponding distribution data; training a preset deep learning tool according to the inquiry data training set to obtain a medicine prediction model; after receiving a medicine prediction request, acquiring an inquiry information text corresponding to the medicine prediction request, and extracting second inquiry features from the inquiry information text; and inputting the second inquiry features into the medicine prediction model to perform medicine prediction, so as to obtain a medicine prediction result corresponding to the second inquiry features.
Optionally, in a first implementation manner of the first aspect of the present invention, the acquiring authorized historical inquiry data and extracting all first inquiry features in the historical inquiry data includes: acquiring a plurality of historical inquiry records from authorized historical inquiry data, and performing format conversion on the historical inquiry records to obtain historical inquiry character string data; extracting inquiry information features and medication information from the historical inquiry character string data, and calculating a correlation coefficient between each inquiry information feature and the medication information; and screening out the inquiry information features whose correlation coefficients meet a preset correlation coefficient condition to obtain the first inquiry features.
Optionally, in a second implementation manner of the first aspect of the present invention, the screening out the inquiry information features whose correlation coefficients meet the preset correlation coefficient condition to obtain the first inquiry features includes: sorting the correlation coefficients from high to low by value to obtain a correlation coefficient sequence; and selecting a plurality of inquiry information features in order from the correlation coefficient sequence, and taking the selected inquiry information features as the first inquiry features.
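As a purely illustrative sketch, and not the patented implementation, the correlation-based screening and sorting described above could look as follows in Python. The Pearson coefficient, the 0/1 encoding of features and medication, and the names `pearson`, `select_first_features`, `fever`, `cough`, `is_adult` and `drug_a` are all assumptions made for the example; the text does not specify which correlation coefficient is used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear association
    return cov / sqrt(var_x * var_y)


def select_first_features(records, feature_names, drug_key, top_k):
    """Sort features by correlation with the medication column (high to low)
    and keep the first top_k of the sequence."""
    drug = [r[drug_key] for r in records]
    scored = [(pearson([r[name] for r in records], drug), name)
              for name in feature_names]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical toy records: 1 = present, 0 = absent.
records = [
    {"fever": 1, "cough": 1, "is_adult": 1, "drug_a": 1},
    {"fever": 1, "cough": 0, "is_adult": 1, "drug_a": 1},
    {"fever": 0, "cough": 1, "is_adult": 0, "drug_a": 0},
    {"fever": 0, "cough": 0, "is_adult": 1, "drug_a": 0},
]
selected = select_first_features(
    records, ["fever", "cough", "is_adult"], "drug_a", top_k=2
)
print(selected)
```

In this toy data `fever` tracks the medication exactly and `cough` is uncorrelated with it, so the two retained features are `fever` and `is_adult`. A real system might rank by absolute value instead, so that strongly negative correlations are also kept; that choice is not stated in the text.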
Optionally, in a third implementation manner of the first aspect of the present invention, the counting the historical inquiry records in the historical inquiry data according to the first inquiry features to obtain the number of historical inquiry records corresponding to each first inquiry feature, and generating the distribution data of the corresponding first inquiry feature in the historical inquiry data based on the number includes: classifying the historical inquiry records according to the medication information to obtain classified inquiry record sets; invoking principal component analysis to analyze the first inquiry features in each classified inquiry record set to obtain the first inquiry feature with the largest correlation in that set, and marking it as the main feature of the classified inquiry records; and generating distribution data of the first inquiry features in the historical inquiry data based on the number of historical inquiry records containing each main feature.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the cleaning the historical inquiry records corresponding to each first inquiry feature, and forming an inquiry data training set from the cleaned historical inquiry records and the corresponding distribution data includes: performing primary cleaning on the historical inquiry data and removing erroneous data to obtain a primary cleaning data set; performing secondary cleaning on the primary cleaning data set and removing historical inquiry data that do not conform to the distribution data to obtain a secondary cleaning data set; and extracting historical inquiry data from the secondary cleaning data set according to the distribution data, and forming the inquiry data training set from the extracted historical inquiry data.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the performing primary cleaning on the historical inquiry data and removing erroneous data to obtain a primary cleaning data set includes: pre-cleaning the historical inquiry data and removing dirty data to obtain a pre-cleaning data set; and performing validity matching cleaning on the pre-cleaning data set and removing invalid data to obtain the primary cleaning data set.
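The two-stage primary cleaning can be pictured with the following minimal Python sketch. It rests on assumptions that this passage does not state: "dirty data" is taken to mean records with missing required fields or exact duplicates, and "invalid data" is taken to mean values that fail simple per-field validity rules. The function names, field names and rules are hypothetical.

```python
def pre_clean(records, required_fields):
    """Pre-cleaning: drop records with missing required fields and exact
    duplicates (an assumed reading of 'dirty data')."""
    seen = set()
    kept = []
    for record in records:
        if any(record.get(field) in (None, "") for field in required_fields):
            continue  # incomplete record
        fingerprint = tuple(record[field] for field in required_fields)
        if fingerprint in seen:
            continue  # duplicate of a record already kept
        seen.add(fingerprint)
        kept.append(record)
    return kept


def validity_clean(records, rules):
    """Validity matching: keep only records whose fields satisfy every rule."""
    return [r for r in records
            if all(rule(r[field]) for field, rule in rules.items())]


# Hypothetical records and validity rules.
raw_records = [
    {"age": 34, "sex": "F", "drug": "drug_a"},
    {"age": 34, "sex": "F", "drug": "drug_a"},    # duplicate
    {"age": None, "sex": "M", "drug": "drug_b"},  # missing age
    {"age": 210, "sex": "M", "drug": "drug_b"},   # implausible age
    {"age": 8, "sex": "M", "drug": "drug_b"},
]
rules = {
    "age": lambda age: 0 <= age <= 120,
    "sex": lambda sex: sex in {"F", "M"},
}

pre_cleaned = pre_clean(raw_records, ["age", "sex", "drug"])
primary_cleaned = validity_clean(pre_cleaned, rules)
print(len(pre_cleaned), len(primary_cleaned))
```

Running pre-cleaning first means the validity rules never see missing values, which keeps each rule a simple predicate.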
Optionally, in a sixth implementation manner of the first aspect of the present invention, the performing secondary cleaning on the primary cleaning data set and removing the historical inquiry data that do not conform to the distribution data to obtain the secondary cleaning data set includes: acquiring the medication information in the primary cleaning data set, and drawing a box plot according to the type of the medication information and the first inquiry features corresponding to that type; screening the historical inquiry data in the primary cleaning data set based on the box plot to obtain abnormal data, and removing the abnormal data; and forming the secondary cleaning data set from the historical inquiry data remaining in the primary cleaning data set.
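The box plot screening can be sketched as follows, assuming the conventional box plot rule in which values beyond 1.5 times the interquartile range from the quartiles are treated as abnormal. The text does not state the whisker length, and the numeric feature (`age`), the grouping key (`drug`) and the helper names are hypothetical choices for illustration only.

```python
from collections import defaultdict


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted, non-empty list."""
    position = (len(sorted_values) - 1) * q
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def box_plot_fences(values, whisker=1.5):
    """Lower and upper fences of a box plot (assumed 1.5 * IQR convention)."""
    ordered = sorted(values)
    q1 = quantile(ordered, 0.25)
    q3 = quantile(ordered, 0.75)
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def secondary_clean(records, drug_key, feature_key):
    """Screen each medication type separately so that a value is judged
    against records of the same type only."""
    groups = defaultdict(list)
    for record in records:
        groups[record[drug_key]].append(record)
    kept, removed = [], []
    for members in groups.values():
        low, high = box_plot_fences([m[feature_key] for m in members])
        for member in members:
            target = kept if low <= member[feature_key] <= high else removed
            target.append(member)
    return kept, removed


# Hypothetical data: patient age per prescribed medication type.
records = (
    [{"drug": "drug_a", "age": age} for age in [30, 32, 35, 31, 33, 90]]
    + [{"drug": "drug_b", "age": age} for age in [5, 6, 7, 6, 5]]
)
kept, removed = secondary_clean(records, "drug", "age")
print(len(kept), removed)
```

Here the fences for `drug_a` are 26.375 and 39.375, so only the record with age 90 is removed, and every `drug_b` record is kept because it is compared with its own group.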
A second aspect of the present invention provides a medicine predicting apparatus comprising: a first feature acquisition module, used for acquiring authorized historical inquiry data and extracting all first inquiry features in the historical inquiry data, wherein the historical inquiry data comprises a plurality of historical inquiry records; a distribution data calculation module, used for counting the historical inquiry records in the historical inquiry data according to the first inquiry features to obtain the number of historical inquiry records corresponding to each first inquiry feature, and generating the distribution data of the corresponding first inquiry feature in the historical inquiry data based on the number; a training set construction module, used for cleaning the historical inquiry records corresponding to each first inquiry feature and forming an inquiry data training set from the cleaned historical inquiry records and the corresponding distribution data; a training module, used for training a preset deep learning tool according to the inquiry data training set to obtain a medicine prediction model; a second feature acquisition module, used for acquiring an inquiry information text corresponding to a medicine prediction request after receiving the medicine prediction request, and extracting second inquiry features from the inquiry information text; and a prediction module, used for inputting the second inquiry features into the medicine prediction model to perform medicine prediction, so as to obtain a medicine prediction result corresponding to the second inquiry features.
Optionally, in a first implementation manner of the second aspect of the present invention, the first feature acquisition module includes: a character string acquisition unit, used for acquiring a plurality of historical inquiry records from authorized historical inquiry data, and performing format conversion on the historical inquiry records to obtain historical inquiry character string data; a correlation coefficient calculation unit, used for extracting inquiry information features and medication information from the historical inquiry character string data, and calculating a correlation coefficient between each inquiry information feature and the medication information; and a feature screening unit, used for screening out the inquiry information features whose correlation coefficients meet a preset correlation coefficient condition to obtain the first inquiry features.
Optionally, in a second implementation manner of the second aspect of the present invention, the feature screening unit is specifically configured to: sort the correlation coefficients from high to low by value to obtain a correlation coefficient sequence; and select a plurality of inquiry information features in order from the correlation coefficient sequence, and take the selected inquiry information features as the first inquiry features.
Optionally, in a third implementation manner of the second aspect of the present invention, the distribution data calculation module includes: a data classification unit, used for classifying the historical inquiry records according to the medication information to obtain classified inquiry record sets; a feature analysis unit, used for invoking principal component analysis to analyze the first inquiry features in each classified inquiry record set to obtain the first inquiry feature with the largest correlation in that set, and marking it as the main feature of the classified inquiry records; and a calculation unit, configured to generate distribution data of the first inquiry features in the historical inquiry data based on the number of historical inquiry records containing each main feature.
Optionally, in a fourth implementation manner of the second aspect of the present invention, the training set construction module includes: a primary cleaning unit, used for performing primary cleaning on the historical inquiry data and removing erroneous data to obtain a primary cleaning data set; a secondary cleaning unit, used for performing secondary cleaning on the primary cleaning data set and removing historical inquiry data that do not conform to the distribution data to obtain a secondary cleaning data set; and a training set construction unit, used for extracting historical inquiry data from the secondary cleaning data set according to the distribution data, and forming the inquiry data training set from the extracted historical inquiry data.
Optionally, in a fifth implementation manner of the second aspect of the present invention, the primary cleaning unit includes: a pre-cleaning subunit, used for pre-cleaning the historical inquiry data and removing dirty data to obtain a pre-cleaning data set; and a validity cleaning subunit, used for performing validity matching cleaning on the pre-cleaning data set and removing invalid data to obtain the primary cleaning data set.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the secondary cleaning unit includes: a box plot drawing subunit, used for acquiring the medication information in the primary cleaning data set and drawing a box plot according to the type of the medication information and the first inquiry features corresponding to that type; an abnormal value removing subunit, configured to screen the historical inquiry data in the primary cleaning data set based on the box plot to obtain abnormal data, and remove the abnormal data; and a data set construction subunit, used for forming the secondary cleaning data set from the historical inquiry data remaining in the primary cleaning data set.
A third aspect of the present application provides a medicine predicting apparatus comprising: a memory and at least one processor, the memory having instructions stored therein; the at least one processor invokes the instructions in the memory to cause the drug prediction device to perform the drug prediction method described above.
A fourth aspect of the present application provides a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the above-described drug prediction method.
In the technical scheme provided by the application, historical inquiry data are acquired and all first inquiry features in the historical inquiry data are extracted, wherein the historical inquiry data comprise a plurality of historical inquiry records; the number of historical inquiry records corresponding to each first inquiry feature is counted, and distribution data of the corresponding first inquiry feature in the historical inquiry data is generated; the historical inquiry records corresponding to each first inquiry feature are cleaned, and an inquiry data training set is formed according to the generated distribution data; a preset deep learning tool is trained on the inquiry data training set to obtain a medicine prediction model; after a medicine prediction request is received, an inquiry information text is acquired and second inquiry features are extracted; and the second inquiry features are input into the medicine prediction model to perform medicine prediction, obtaining a medicine prediction result corresponding to the second inquiry features. In the embodiment of the application, when the inquiry data training set used to generate the medicine prediction model is processed, the original distribution of the historical inquiry data is taken into account, which improves the accuracy of the medicine prediction model and therefore the accuracy of medicine prediction.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a method for predicting drugs according to the present application;
FIG. 2 is a schematic diagram of another embodiment of a drug prediction method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of another embodiment of a drug prediction method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another embodiment of a drug prediction method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a box diagram used in an embodiment of the application;
FIG. 6 is a schematic diagram of an embodiment of a medicine predicting apparatus according to the present application;
FIG. 7 is a schematic diagram of another embodiment of a medicine predicting apparatus according to the present application;
FIG. 8 is a schematic diagram of an embodiment of a medicine predicting apparatus according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a medicine prediction method, device, equipment and storage medium. Specifically, historical inquiry data are acquired and all first inquiry features in the historical inquiry data are extracted, wherein the historical inquiry data comprise a plurality of historical inquiry records; the number of historical inquiry records corresponding to each first inquiry feature is counted, and distribution data of the corresponding first inquiry feature in the historical inquiry data is generated; the historical inquiry records corresponding to each first inquiry feature are cleaned, and an inquiry data training set is formed according to the generated distribution data; a preset deep learning tool is trained on the inquiry data training set to obtain a medicine prediction model; after a medicine prediction request is received, an inquiry information text is acquired and second inquiry features are extracted; and the second inquiry features are input into the medicine prediction model to perform medicine prediction, obtaining a medicine prediction result corresponding to the second inquiry features. In the embodiment of the application, when the inquiry data training set used to generate the medicine prediction model is processed, the original distribution of the historical inquiry data is taken into account, which improves the accuracy of the medicine prediction model and therefore the accuracy of medicine prediction.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
For ease of understanding, a specific flow of an embodiment of the present invention is described below. Referring to FIG. 1, an embodiment of the medicine prediction method includes:
101. Acquiring authorized historical inquiry data, and extracting all first inquiry features in the historical inquiry data;
It is to be understood that the execution subject of the present invention may be a drug prediction device, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as the execution subject as an example.
In this embodiment, the historical inquiry data includes a plurality of historical inquiry records, whose contents are extracted from an inquiry platform or the network by an information extraction tool. These records are information data for which usage rights were obtained with the consent of the inquiring party, and they include patient information, inquiry information, diagnosis results and medication information. The patient information includes the patient's age, sex, vaccination status, allergy history, contraindications and the like; the inquiry information includes the department visited, the content of the chief complaint and the like. Because the obtained information data includes structured, semi-structured and unstructured data types, the information data are first organized and converted into a unified data format to obtain the historical inquiry data.
Because the patient information and inquiry information in the historical inquiry data are correlated with the diagnosis result, and the diagnosis result is in turn directly correlated with the medication information, this embodiment uses a filter method to extract the data features contained in the historical inquiry data and scores the relevance of each data feature to the medication information. The data features with higher relevance scores are selected and stored as first inquiry features, and in this way all first inquiry features contained in the historical inquiry data are extracted.
102. Counting the historical inquiry records in the historical inquiry data according to the first inquiry features to obtain the number of the historical inquiry records corresponding to each first inquiry feature, and generating distribution data of the corresponding first inquiry features in the historical inquiry data based on the number;
In this embodiment, after the first inquiry features are obtained, the historical inquiry records are searched and screened according to each first inquiry feature, and the number of historical inquiry records containing each first inquiry feature is counted. The resulting counts for the different inquiry features form the feature statistics, and the distribution data of each first inquiry feature in the historical inquiry data is calculated from the feature statistics.
Further, since one piece of historical inquiry data may include a plurality of first inquiry features, this embodiment may also use principal component analysis to extract the most influential first inquiry feature in each piece of historical inquiry data, that is, the first inquiry feature with the largest correlation, and mark it as the first inquiry feature of that piece of data. The historical inquiry data are then classified according to this feature to obtain a plurality of historical inquiry data classification sets, and the number of pieces of historical inquiry data in each classification set is counted to obtain the feature statistics. The distribution data of the first inquiry features is calculated from the feature statistics. For example, if the number of pieces of historical inquiry data related to gynecological patients is a and the total number of pieces of historical inquiry data acquired in the previous step is 10a, the distribution data of the gynecology first inquiry feature is 10%.
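The calculation in the gynecology example amounts to dividing each class count by the total number of records. A minimal Python sketch follows, assuming each record has already been labelled with its main first inquiry feature; the key `main_feature`, the function name and the class names other than gynecology are hypothetical.

```python
from collections import Counter


def distribution_data(records, main_feature_key):
    """Share of historical inquiry records per main first inquiry feature."""
    counts = Counter(record[main_feature_key] for record in records)
    total = sum(counts.values())
    return {feature: count / total for feature, count in counts.items()}


# Hypothetical labelled records: a = 1 gynecology record out of 10a = 10.
records = (
    [{"main_feature": "gynecology"}] * 1
    + [{"main_feature": "pediatrics"}] * 4
    + [{"main_feature": "internal medicine"}] * 5
)
distribution = distribution_data(records, "main_feature")
print(distribution["gynecology"])  # 0.1, i.e. 10%
```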
103. Cleaning the history inquiry records corresponding to each first inquiry feature, and forming an inquiry data training set by the cleaned history inquiry records and the corresponding distribution data;
Because the historical inquiry records in the historical inquiry data have a certain regularity and specificity, screening the data directly without regard to that regularity and specificity would damage the structure of the data set. Therefore, in this embodiment, when the historical inquiry data are cleaned, the historical inquiry records are first classified according to the first inquiry features; each classified inquiry record set is then cleaned separately to remove dirty data and noise, yielding a primary cleaning data set; finally, based on the distribution data obtained in the previous step, cleaned inquiry data are extracted from the primary cleaning data set in the corresponding proportions to form the inquiry data training set.
The data distribution in the inquiry data training set is kept the same as the original data distribution in the historical inquiry data set, which preserves the original regularity and specificity of the data and prevents the structure of the data set from being damaged during cleaning. This is also why the classified historical inquiry data are cleaned separately. For example, suppose the data for a certain medicine corresponding to a certain first inquiry feature account for 10% of the gynecological patients; because gynecological patients account for only 10% of the historical inquiry data set, those data account for only 10% × 10% = 1% of the whole historical inquiry data set. If the data were screened directly, they could be removed, damaging the integrity of the historical inquiry data.
104. Training a preset deep learning tool according to the inquiry data training set to obtain a medicine prediction model;
After the cleaned inquiry data training set is obtained in the previous step, the historical inquiry data in it is divided into a training set, a testing set and a verification set, where the distribution data of the historical inquiry data in each of the three sets is identical to the distribution data in the inquiry data training set. The preset deep learning tool, which comprises a deep learning algorithm, is trained with the training set, the testing set and the verification set: the original parameters of the deep learning algorithm are adjusted based on the inquiry data training set to obtain training parameters, and the medicine prediction model is obtained based on the training parameters.
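One way to obtain three subsets with the same distribution is a stratified split, sketched below. The 80/10/10 ratios, the `main_feature` key and the function name are assumptions made for illustration; the embodiment does not specify them.

```python
import random
from collections import defaultdict


def stratified_split(records, key, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split records into train/test/validation, group by group,
    so that each subset keeps the per-feature proportions."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    train, test, val = [], [], []
    for group in groups.values():
        rng.shuffle(group)
        n_train = round(len(group) * ratios[0])
        n_test = round(len(group) * ratios[1])
        train += group[:n_train]
        test += group[n_train:n_train + n_test]
        val += group[n_train + n_test:]  # remainder goes to validation
    return train, test, val


# Hypothetical data: 10% gynecology, 90% internal medicine.
records = [
    {"id": i, "main_feature": "gynecology" if i < 10 else "internal medicine"}
    for i in range(100)
]
train, test, val = stratified_split(records, "main_feature")
```

Because each feature group is split separately, small groups are not lost by chance, which is the property the paragraph above requires.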
105. After receiving the medicine prediction request, acquiring a query information text corresponding to the medicine prediction request, and extracting a second query feature in the query information text;
After the medicine prediction model is built, a medicine prediction request is received, the query information text contained in the request is acquired, and the second inquiry features contained in the currently received query information text are extracted from its content. The second inquiry features are similar in content to the first inquiry features extracted in the previous steps: patient information such as age, sex, vaccination status, allergy history and contraindications is obtained from the query information text, as is inquiry information such as the consulting department and the chief complaint. The obtained information is matched against the data features screened in the previous steps to obtain the second inquiry features contained in the current query information text.
106. And inputting the second inquiry feature into a medicine prediction model to perform medicine prediction, and obtaining a medicine prediction result corresponding to the second inquiry feature.
The second inquiry feature is input into the constructed medicine prediction model for processing, and the medicine prediction result corresponding to the second inquiry feature is output. The medicine prediction result is a candidate medicine that the medicine prediction model outputs on the basis of the medicines used in the historical inquiry data. In addition, after the candidate medicine is output, a pre-established medicine database may be searched for substitute medicines that are the same as or highly similar to the candidate medicine, and these are output as recommended medicines.
According to the embodiment of the application, when data processing is performed on the inquiry data training set used to generate the medicine prediction model, the original distribution data in the historical inquiry data is processed, so that the accuracy of medicine prediction by the medicine prediction method is improved.
Referring to fig. 2, another embodiment of the method for predicting drugs according to the present application includes:
201. acquiring a plurality of historical inquiry records in authorized historical inquiry data, and performing format conversion on the historical inquiry records to obtain historical inquiry character string data;
It is to be understood that the execution subject of the present invention may be a drug prediction device, and may also be a terminal or a server, which is not limited herein. The embodiment of the invention is described by taking a server as an execution main body as an example.
In this embodiment, the historical inquiry records on the inquiry platform or the network are extracted by an information extraction tool. These records are information data for which the right of use has been obtained with the consent of the inquiring party. The historical inquiry records are encoded character by character and converted into machine-readable character string data for storage.
202. Extracting inquiry information features and using medicine information in the history inquiry character string data, and calculating a correlation coefficient between the inquiry information features and the using medicine information;
203. screening out inquiry information features of which the correlation coefficients meet the preset correlation coefficient conditions, and obtaining first inquiry features;
The character string data corresponding to the historical inquiry data obtained in the previous step is extracted. The character string data comprises patient information, inquiry information, diagnosis results, medication information and the like. The patient information includes the patient's age, sex, vaccination status, allergy history, contraindications and the like; the inquiry information includes the consulting department, the chief complaint and the like. The features contained in this information are extracted by using a filtering method, Deou inquiry is carried out on the feature information, and the used medicine information in each piece of character string data is stored. Specifically, a medicine name information base is acquired in advance and different trade names denoting the same medicine are associated, so that, when the used medicine information is acquired, inquiry information involving different trade names of the same medicine is treated as inquiry information for the same used medicine.
Because the patient information, the inquiry information and the diagnosis result contained in the character string data corresponding to the historical inquiry data are correlated to a certain degree, and the diagnosis result is directly correlated with the medication information, this embodiment uses a univariate feature selection method to extract the data features contained in the acquired historical inquiry data. Each data feature is given a correlation score against the medication information, the data features with higher scores are selected, and the selected data features are stored as the first inquiry features.
Specifically, a correlation coefficient between each inquiry information feature and the used medicine information is calculated, and the correlation is scored according to the correlation coefficient. Either the N highest-scoring features, where N is a preset value, or a preset percentage of the highest-scoring features are retained. Alternatively, a common univariate statistical test may be applied to each feature, using the false positive rate (FPR), the false discovery rate (FDR) or the family-wise error rate (FWE), to select the inquiry information features that meet the correlation coefficient threshold. The selected inquiry information features are saved as the first inquiry features.
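The correlation scoring can be illustrated with a small sketch. It assumes, purely for illustration, that each feature is encoded as a 0/1 indicator and that the Pearson coefficient is used as the correlation coefficient; the embodiment does not fix either choice, and the record layout, drug labels and function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when either side is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def score_features(records, feature_names, drug):
    """Score each feature by |correlation| with 'this drug was used'."""
    target = [1 if r["drug"] == drug else 0 for r in records]
    scores = {}
    for name in feature_names:
        column = [1 if r["features"].get(name) else 0 for r in records]
        scores[name] = abs(pearson(column, target))
    return scores


def select_by_threshold(scores, threshold):
    """Keep the features whose score meets the preset threshold."""
    return [name for name, s in scores.items() if s >= threshold]


# Hypothetical records: 'fever' tracks drug_A exactly, 'male' does not.
records = [
    {"features": {"fever": 1, "male": 1}, "drug": "drug_A"},
    {"features": {"fever": 1, "male": 0}, "drug": "drug_A"},
    {"features": {"fever": 0, "male": 1}, "drug": "drug_B"},
    {"features": {"fever": 0, "male": 0}, "drug": "drug_B"},
]
scores = score_features(records, ["fever", "male"], "drug_A")
first_inquiry_features = select_by_threshold(scores, threshold=0.5)
```

In this toy data only `fever` meets the threshold, so it alone would be saved as a first inquiry feature.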
204. Counting the historical inquiry records in the historical inquiry data according to the first inquiry features to obtain the number of the historical inquiry records corresponding to each first inquiry feature, and generating distribution data of the corresponding first inquiry features in the historical inquiry data based on the number;
The specific content of this step is substantially the same as that of step 102 in the previous embodiment, so the description is not repeated here.
205. Performing data primary cleaning on the historical inquiry data, and removing error data to obtain a primary cleaning data set;
First, the historical inquiry data is cleaned once to remove error data. This primary cleaning mainly uses data cleaning techniques to clean the data on a large scale. Specifically, error data, that is, data with quality problems that make it unsuitable as training data, is removed from the historical inquiry data. Such data includes: data in which the inquiry dialogue was completely interrupted; data lacking necessary feature items such as age, sex or prescription result; obviously abnormal data, such as an age far beyond normal values; obviously unreasonable data, such as a 40-year-old male whose department is shown as pediatrics; and obviously repeated data. After the error data is removed, the primary cleaning data set is obtained.
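The rules listed above can be sketched as a simple filter. The field names and the thresholds (a maximum age of 120 and a pediatric age limit of 18) are assumptions chosen for illustration only; the embodiment does not state concrete values.

```python
REQUIRED_FIELDS = ("age", "sex", "department", "prescription")


def is_valid(record, max_age=120, pediatric_max_age=18):
    """Apply the error-data rules to one record (thresholds are assumed)."""
    # Rule: necessary feature items must be present and non-empty.
    if any(record.get(f) in (None, "") for f in REQUIRED_FIELDS):
        return False
    # Rule: age must lie in a plausible range.
    if not 0 <= record["age"] <= max_age:
        return False
    # Rule: an adult shown under pediatrics is unreasonable.
    if record["department"] == "pediatrics" and record["age"] > pediatric_max_age:
        return False
    return True


def primary_clean(records):
    """Drop invalid records and exact duplicates, keeping input order."""
    seen = set()
    cleaned = []
    for r in records:
        if not is_valid(r):
            continue
        fingerprint = tuple(sorted(r.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(r)
    return cleaned


raw = [
    {"age": 34, "sex": "F", "department": "gynecology", "prescription": "drug_A"},
    {"age": 34, "sex": "F", "department": "gynecology", "prescription": "drug_A"},
    {"age": 40, "sex": "M", "department": "pediatrics", "prescription": "drug_B"},
    {"age": 230, "sex": "M", "department": "urology", "prescription": "drug_C"},
    {"age": 28, "sex": "M", "department": "urology", "prescription": ""},
]
primary_cleaning_data_set = primary_clean(raw)
```

Of the five hypothetical records, only the first survives: the others are a duplicate, an adult under pediatrics, an implausible age and a missing prescription.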
206. Performing secondary cleaning on the primary cleaning data set, and removing historical inquiry data which do not accord with the distribution data to obtain a secondary cleaning data set;
A data distribution analysis method is called to analyze the distribution characteristics of the primary cleaning data set obtained in the previous step, and extreme values are removed to obtain the secondary cleaning data set. For example, urology data for a 99-year-old male that appears only once is special minimum-batch data, and such data is deleted.
207. Extracting historical inquiry data in the secondary cleaning data set according to the distribution data, and forming an inquiry data training set from the extracted historical inquiry data;
Historical inquiry data is extracted from the secondary cleaning data set according to the distribution data, and the extracted historical inquiry data forms the inquiry data training set. The distribution data in the resulting inquiry data training set is identical to the original distribution data of the first inquiry features obtained in the previous steps, so the original regularity and specificity of the historical inquiry data set are retained in the training set and the prediction model trains better.
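Extraction according to the distribution data can be read as quota sampling, sketched below under that assumption. The target size, the `main_feature` key and the function name are hypothetical; the embodiment only requires that the training set reproduce the original distribution.

```python
import random
from collections import defaultdict


def sample_by_distribution(records, distribution, size, key="main_feature", seed=0):
    """Draw about `size` records so that each feature's share matches
    the share recorded in `distribution` (the original distribution data)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    sample = []
    for feature, share in distribution.items():
        quota = round(size * share)
        pool = groups.get(feature, [])
        # Never ask for more records than the cleaned group still holds.
        sample += rng.sample(pool, min(quota, len(pool)))
    return sample


# Hypothetical case: cleaning skewed the data to 16% gynecology,
# while the original distribution data says 10%.
cleaned = (
    [{"main_feature": "gynecology"}] * 8
    + [{"main_feature": "pediatrics"}] * 42
)
original_distribution = {"gynecology": 0.1, "pediatrics": 0.9}
training_set = sample_by_distribution(cleaned, original_distribution, size=40)
```

The sampled set contains 4 gynecology and 36 pediatrics records, restoring the original 10% and 90% shares.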
208. Training a preset deep learning tool according to the inquiry data training set to obtain a medicine prediction model;
209. after receiving the medicine prediction request, acquiring a query information text corresponding to the medicine prediction request, and extracting a second query feature in the query information text;
210. and inputting the second inquiry feature into a medicine prediction model to perform medicine prediction, and obtaining a medicine prediction result corresponding to the second inquiry feature.
The specific contents of steps 208, 209 and 210 are substantially the same as those of steps 104, 105 and 106 in the previous embodiment, so that the description thereof will not be repeated here.
In the embodiment of the application, when the data processing is carried out on the inquiry data training set for generating the medicine prediction model, firstly inquiry characteristics in the historical inquiry data are obtained, original distribution data in the historical inquiry data are calculated according to the inquiry characteristics, and the inquiry data training set is generated according to the distribution data so as to obtain the medicine prediction model.
Referring to fig. 3, another embodiment of the method for predicting drugs according to the present application includes:
301. acquiring authorized historical inquiry data, and extracting all first inquiry features in the historical inquiry data;
The specific content in this step is substantially the same as that in step 101 in the foregoing embodiment, so that the description thereof will not be repeated here.
302. Classifying the historical inquiry records according to the using medicine information to obtain a classified inquiry record set;
The used medicine information in the historical inquiry records is acquired. Specifically, a medicine name information base is acquired in advance and different trade names denoting the same medicine are associated, so that, when the used medicine information is acquired, inquiry information involving different trade names of the same medicine is treated as historical inquiry records for the same used medicine. The historical inquiry records are then classified according to the used medicine information to obtain a plurality of classified inquiry record sets.
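A minimal sketch of this classification follows. The contents of the medicine name information base, the `drug_name` field and the function name are invented for illustration.

```python
from collections import defaultdict

# Hypothetical medicine name information base: trade name -> same medicine.
TRADE_NAME_TO_DRUG = {
    "BrandX": "drug_A",
    "BrandY": "drug_A",
    "BrandZ": "drug_B",
}


def classify_by_drug(records, name_base=TRADE_NAME_TO_DRUG):
    """Group records by medicine, merging different trade names
    of the same medicine into one classified inquiry record set."""
    classified = defaultdict(list)
    for r in records:
        # Names missing from the base are kept under their own name.
        drug = name_base.get(r["drug_name"], r["drug_name"])
        classified[drug].append(r)
    return dict(classified)


records = [
    {"id": 1, "drug_name": "BrandX"},
    {"id": 2, "drug_name": "BrandY"},
    {"id": 3, "drug_name": "BrandZ"},
]
classified_sets = classify_by_drug(records)
```

Records 1 and 2 use different trade names but end up in the same set, as the paragraph above describes.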
303. Calling a principal component analysis method to analyze the first inquiry features in the classified inquiry record set to obtain the first inquiry feature with the largest correlation in the classified inquiry record set, and marking the first inquiry feature with the largest correlation as the main feature related to the classified inquiry record;
A plurality of classified inquiry record sets are obtained, and the same classified inquiry record set contains a plurality of first inquiry features spanning several related categories such as patient information and inquiry information. To regularize the data set for calculation, principal component analysis is called to analyze the first inquiry features in each classified inquiry record set, and the first inquiry features with the greatest correlation in that set are selected as its first inquiry features and used to mark it. Principal component analysis (PCA) is a statistical method in which a group of possibly correlated variables is converted by an orthogonal transformation into a group of linearly uncorrelated variables, called principal components. In this embodiment, the first inquiry features with the greatest correlation in the data set are selected as the main features.
304. Generating distribution data of the first inquiry feature in the historical inquiry data based on the number of the historical inquiry records containing each main feature in the historical inquiry records;
After the main features are obtained, the number of historical inquiry records containing each main feature is counted, and the result is taken as feature statistical data. A linear regression analysis method is called on the feature statistical data to obtain the distribution data of each main feature in the historical inquiry data. For example, if the number of pieces of historical inquiry data relating to gynecological patients is a and the total number of pieces of historical inquiry data acquired in the previous step is 10a, the distribution data of the first inquiry feature "gynecology" is calculated as 10%.
305. Performing data primary cleaning on the historical inquiry data, and removing error data to obtain a primary cleaning data set;
306. performing secondary cleaning on the primary cleaning data set, and removing historical inquiry data which do not accord with the distribution data to obtain a secondary cleaning data set;
307. extracting historical inquiry data in the secondary cleaning data set according to the distribution data, and forming an inquiry data training set from the extracted historical inquiry data;
The details of steps 305, 306 and 307 are substantially the same as those of steps 205, 206 and 207 in the previous embodiment, so the details are not repeated here.
308. training a preset deep learning tool according to the inquiry data training set to obtain a medicine prediction model;
309. after receiving the medicine prediction request, acquiring a query information text corresponding to the medicine prediction request, and extracting a second query feature in the query information text;
310. and inputting the second inquiry feature into a medicine prediction model to perform medicine prediction, and obtaining a medicine prediction result corresponding to the second inquiry feature.
The details of steps 308, 309 and 310 are substantially the same as those of steps 104, 105 and 106 in the previous embodiment, and thus are not described here again.
In the embodiment of the application, when the data processing is carried out on the inquiry data training set for generating the medicine prediction model, firstly inquiry characteristics in the historical inquiry data are obtained, original distribution data in the historical inquiry data are calculated according to the inquiry characteristics, and the inquiry data training set is generated according to the distribution data so as to obtain the medicine prediction model.
Referring to fig. 4 and 5, another embodiment of the method for predicting drugs according to the present invention includes:
401. acquiring a plurality of historical inquiry records in authorized historical inquiry data, and performing format conversion on the historical inquiry records to obtain historical inquiry character string data;
The specific content of this step is substantially the same as that of step 201 in the previous embodiment, so the description is not repeated here.
402. Extracting inquiry information features and using medicine information in the history inquiry character string data, and calculating a correlation coefficient between the inquiry information features and the using medicine information;
403. sequencing the correlation coefficients according to the correlation coefficient values from high to low to obtain a correlation coefficient sequence;
404. sequentially screening out a plurality of inquiry information features according to the sequence of the correlation coefficients in the correlation coefficient sequence, and taking the screened inquiry information features as first inquiry features;
The character string data corresponding to the historical inquiry data obtained in the previous step is extracted. The character string data comprises patient information, inquiry information, diagnosis results, medication information and the like. The patient information includes the patient's age, sex, vaccination status, allergy history, contraindications and the like; the inquiry information includes the consulting department, the chief complaint and the like. The features contained in this information are extracted by using a filtering method, Deou inquiry is carried out on the feature information, and the used medicine information in each piece of character string data is stored. Specifically, a medicine name information base is acquired in advance and different trade names denoting the same medicine are associated, so that, when the used medicine information is acquired, inquiry information involving different trade names of the same medicine is treated as inquiry information for the same used medicine.
Because the patient information, the inquiry information and the diagnosis result contained in the historical inquiry character string data have a certain correlation, and the diagnosis result and the medication information have a certain direct correlation, in this embodiment, the univariate feature selection method is used to extract the data features contained in the obtained historical inquiry character string data, the correlation degree is scored according to the data features and the medication information, the data features with higher correlation degree are selected according to the correlation degree score, and the data features with higher correlation degree are stored as the first inquiry features.
Specifically, a correlation coefficient between each inquiry information feature and the used medicine information is calculated, and the correlation is scored according to the correlation coefficient to obtain a correlation coefficient score. The inquiry information features are then sorted from high to low by their correlation coefficient scores to obtain the correlation coefficient sequence.
After the correlation coefficient sequence is obtained, at least one inquiry information feature is selected according to the ordering condition of the correlation coefficients in the correlation coefficient sequence and the ordering order. The first N features of the correlation coefficient sequence or the first M% of the inquiry information features in the correlation coefficient sequence may be reserved, and the screened inquiry information features are used as the first inquiry features.
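The ranking and the two retention rules (first N, or first M%) can be sketched as follows. The score values and feature names are invented, and rounding the M% count up so that at least one feature is kept is an assumption made here.

```python
import math


def rank_features(scores):
    """Correlation coefficient sequence: feature names, highest score first."""
    return sorted(scores, key=scores.get, reverse=True)


def top_n(scores, n):
    """Keep the first N features of the sequence."""
    return rank_features(scores)[:n]


def top_percent(scores, percent):
    """Keep the first M% of the sequence, but never fewer than one feature."""
    ranked = rank_features(scores)
    keep = max(1, math.ceil(len(ranked) * percent / 100))
    return ranked[:keep]


# Hypothetical correlation coefficient scores.
scores = {"department": 0.82, "chief_complaint": 0.91, "age": 0.40, "sex": 0.15}
by_count = top_n(scores, 2)
by_share = top_percent(scores, 25)
```

Here `by_count` keeps the two best-scoring features and `by_share` keeps the single best one, since 25% of four features is one.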
In addition, a common univariate statistical test may be applied to each feature, using the false positive rate (FPR), the false discovery rate (FDR) or the family-wise error rate (FWE), to select the inquiry information features that meet the correlation coefficient threshold. The selected inquiry information features are saved as the first inquiry features.
405. Classifying the historical inquiry records according to the using medicine information to obtain a classified inquiry record set;
406. calling a principal component analysis method to analyze the first inquiry features in the classified inquiry record set to obtain the first inquiry feature with the largest correlation in the classified inquiry record set, and marking the first inquiry feature with the largest correlation as the main feature related to the classified inquiry record;
407. generating distribution data of the first inquiry feature in the historical inquiry data based on the number of the historical inquiry records containing each main feature in the historical inquiry records;
The specific contents of steps 405, 406 and 407 are substantially the same as those of steps 302, 303 and 304 in the previous embodiment, so the details are not repeated here.
408. Pre-cleaning historical inquiry data, and removing dirty data to obtain a pre-cleaning data set;
Data cleaning techniques are used to clean the data on a large scale. In this step, data with quality problems, that is, data unsuitable as training data, is first removed from the historical inquiry data to obtain the pre-cleaning data set. Such data includes: data in which the inquiry dialogue was completely interrupted; data lacking necessary feature items such as age, sex or prescription result; obviously abnormal data, such as an age far beyond normal values; and obviously erroneous data, such as a 40-year-old male whose department is shown as pediatrics.
409. Performing validity matching cleaning on the pre-cleaning data set, and removing illegal data to obtain a primary cleaning data set;
After the pre-cleaning data set is obtained, regular-expression matching is performed on it using the obtained first inquiry features. Specifically, a regular expression for validity matching is established in advance and is called to filter the character strings of the pre-cleaning data obtained in the previous step; unnecessary characters are removed, and the primary cleaning data set is obtained.
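The embodiment does not disclose the regular expression itself. The sketch below shows one hypothetical set of rules (strip markup tags, drop characters outside an allowed set, collapse whitespace) purely to illustrate the filtering step.

```python
import re

# Hypothetical validity rules; the patent does not give the actual patterns.
TAG = re.compile(r"<[^>]+>")                    # leftover markup tags
ILLEGAL_CHARS = re.compile(r"[^\w\s.,;:?!%/()\-]")  # anything not allowed
EXTRA_SPACES = re.compile(r"\s+")               # runs of whitespace


def clean_text(text):
    """Remove unnecessary characters from one inquiry string."""
    text = TAG.sub("", text)
    text = ILLEGAL_CHARS.sub("", text)
    return EXTRA_SPACES.sub(" ", text).strip()


cleaned = clean_text("Headache ## for 3 days~~<br>")
print(cleaned)
```

In Python 3, `\w` matches Unicode word characters, so Chinese text in the inquiry records would be kept by this pattern.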
410. Acquiring the information of the used medicines in the primary cleaning data set, and drawing a box chart according to the type of the information of the used medicines and the first inquiry characteristics corresponding to the type of the information of the medicines;
Referring to fig. 5, the used medicine information in the primary cleaning data set is acquired, each type of medicine information is taken as a number axis, and a box plot is drawn from the corresponding first inquiry features. A box plot, also called a box-and-whisker plot, is a statistical chart that displays the dispersion of a group of data; it is mainly used to reflect the distribution characteristics of the original data and can also be used to compare the distribution characteristics of several groups of data. A box plot is drawn as follows: first, the upper edge, lower edge, median and two quartiles of a group of data are found; then the two quartiles are connected to draw the box; finally, the upper and lower edges are connected to the box, with the median lying inside the box. In this step, a box plot corresponding to the primary cleaning data set is drawn in this way.
411. Screening historical inquiry data in the primary cleaning data set based on the box graph to obtain abnormal data, and removing the abnormal data;
412. the historical inquiry data remained in the primary cleaning data set is formed into a secondary cleaning data set;
With continued reference to fig. 5, after the box plot is obtained, its content is screened for outliers. Specifically, an outlier is defined as a value less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR, where Q3 and Q1 denote the upper and lower quartiles of the data batch respectively and IQR = Q3 - Q1 denotes the interquartile range. The outliers are removed, and the remaining historical inquiry data forms the secondary cleaning data set. By screening out and removing outliers, this embodiment removes noise and outlier interference; for example, urology data for a 99-year-old male that appears only once is special minimum-batch data. Removing such outliers can improve the accuracy of subsequent model prediction to a certain extent.
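The outlier rule can be sketched with the standard library. Using `statistics.quantiles` with the "inclusive" method is an assumption, since the embodiment does not say how the quartiles are computed; the ages and the `age` key are invented.

```python
import statistics


def iqr_bounds(values):
    """Return (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR) for a list of numbers."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def remove_outliers(records, key="age"):
    """Keep only the records whose value lies within the whisker bounds."""
    low, high = iqr_bounds([r[key] for r in records])
    return [r for r in records if low <= r[key] <= high]


# Hypothetical urology ages for one medicine, including a single 99-year-old.
ages = [25, 28, 30, 31, 33, 35, 36, 38, 40, 99]
records = [{"age": a} for a in ages]
secondary_cleaning_data_set = remove_outliers(records)
```

For these ages the quartiles are 30.25 and 37.5, so the bounds are 19.375 and 48.375 and only the 99-year-old record is removed.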
413. Extracting historical inquiry data in the secondary cleaning data set according to the distribution data, and forming an inquiry data training set from the extracted historical inquiry data;
The specific content of this step is substantially the same as that of step 207 in the previous embodiment, so the description is not repeated here.
414. Training a preset deep learning tool according to the inquiry data training set to obtain a medicine prediction model;
415. after receiving the medicine prediction request, acquiring a query information text corresponding to the medicine prediction request, and extracting a second query feature in the query information text;
416. And inputting the second inquiry feature into a medicine prediction model to perform medicine prediction, and obtaining a medicine prediction result corresponding to the second inquiry feature.
The details of steps 414, 415 and 416 are substantially the same as those of steps 104, 105 and 106 in the previous embodiment, and thus are not described here.
In the embodiment of the application, when data processing is performed on the inquiry data training set used to generate the medicine prediction model, the distribution data of the historical inquiry data is first calculated according to the inquiry features in the historical inquiry data. After the historical inquiry data is cleaned and screened, the inquiry data training set is generated according to the obtained distribution data, so that the regularity and specificity of the original historical inquiry data are retained and the accuracy of medicine prediction by the medicine prediction method is improved.
The medicine prediction method in the embodiment of the present application is described above; the medicine prediction device in the embodiment of the present application is described below. Referring to fig. 6, an embodiment of the medicine prediction device includes:
a first feature acquiring module 601, configured to acquire authorized historical inquiry data, and extract all first inquiry features in the historical inquiry data, where the historical inquiry data includes a plurality of historical inquiry records;
The distribution data calculation module 602 is configured to count historical inquiry records in the historical inquiry data according to the first inquiry features, obtain the number of the historical inquiry records corresponding to each first inquiry feature, and generate distribution data of the corresponding first inquiry feature in the historical inquiry data based on the number;
the training set construction module 603 is configured to clean the historical inquiry records corresponding to each first inquiry feature, and form an inquiry data training set from the cleaned historical inquiry records and the distribution data corresponding to the cleaned historical inquiry records;
the training module 604 is configured to train a preset deep learning tool according to the questioning data training set to obtain a drug prediction model;
a second feature obtaining module 605, configured to obtain, after receiving a drug prediction request, a query information text corresponding to the drug prediction request, and extract a second query feature in the query information text;
and a prediction module 606, configured to input the second inquiry feature into the drug prediction model to perform drug prediction, so as to obtain a drug prediction result corresponding to the second inquiry feature.
According to the embodiment of the application, when the data processing is performed on the inquiry data training set for generating the medicine prediction model, the original distribution data in the historical inquiry data is processed, so that the accuracy of the medicine prediction device for medicine prediction is improved.
Referring to fig. 7, another embodiment of the drug predicting apparatus according to the present invention includes:
a first feature acquiring module 601, configured to acquire authorized historical inquiry data, and extract all first inquiry features in the historical inquiry data, where the historical inquiry data includes a plurality of historical inquiry records;
the distribution data calculation module 602 is configured to count historical inquiry records in the historical inquiry data according to the first inquiry features, obtain the number of the historical inquiry records corresponding to each first inquiry feature, and generate distribution data of the corresponding first inquiry feature in the historical inquiry data based on the number;
the training set construction module 603 is configured to clean the historical inquiry records corresponding to each first inquiry feature, and form an inquiry data training set from the cleaned historical inquiry records and the distribution data corresponding to the cleaned historical inquiry records;
the training module 604 is configured to train a preset deep learning tool according to the inquiry data training set to obtain a drug prediction model;
a second feature obtaining module 605, configured to obtain, after receiving a drug prediction request, a query information text corresponding to the drug prediction request, and extract a second query feature in the query information text;
and a prediction module 606, configured to input the second inquiry feature into the drug prediction model to perform drug prediction, so as to obtain a drug prediction result corresponding to the second inquiry feature.
Optionally, the first feature acquisition module 601 includes:
a character string acquisition unit 6011, configured to acquire a plurality of historical inquiry records in the authorized historical inquiry data, and perform format conversion on the historical inquiry records to obtain historical inquiry character string data;
a correlation coefficient calculation unit 6012, configured to extract inquiry information features and used-drug information from the historical inquiry character string data, and calculate a correlation coefficient between the inquiry information features and the used-drug information;
and a feature screening unit 6013, configured to screen out the inquiry information features whose correlation coefficients satisfy a preset correlation coefficient condition, so as to obtain the first inquiry features.
Optionally, the feature screening unit 6013 is specifically configured to:
sort the correlation coefficients from high to low by value to obtain a correlation coefficient sequence;
and select a plurality of inquiry information features in order from the correlation coefficient sequence, and take the selected inquiry information features as the first inquiry features.
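As an illustration only, the ranking and screening performed by units 6012 and 6013 could be sketched as follows. The sketch assumes binary inquiry features, uses the Pearson coefficient, ranks by absolute value, and keeps a fixed `top_k`; the record layout, the function names, and the toy records are hypothetical and are not specified by the application.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / sqrt(var_x * var_y)


def select_first_features(records, feature_names, drug, top_k):
    """Rank features by |correlation| with "this drug was used"; keep top_k."""
    used = [1.0 if r["drug"] == drug else 0.0 for r in records]
    scored = []
    for name in feature_names:
        values = [float(r["features"].get(name, 0)) for r in records]
        scored.append((abs(pearson(values, used)), name))
    # Sort from high to low, as described for the correlation coefficient sequence.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical toy records; real historical inquiry data would be far larger.
records = [
    {"features": {"fever": 1, "cough": 0, "age_over_60": 1}, "drug": "ibuprofen"},
    {"features": {"fever": 1, "cough": 1, "age_over_60": 0}, "drug": "ibuprofen"},
    {"features": {"fever": 0, "cough": 1, "age_over_60": 1}, "drug": "dextromethorphan"},
    {"features": {"fever": 0, "cough": 1, "age_over_60": 0}, "drug": "dextromethorphan"},
]
first_features = select_first_features(
    records, ["fever", "cough", "age_over_60"], "ibuprofen", top_k=2
)
```

Ranking by absolute value is a design choice of this sketch: it keeps strongly negatively correlated features, which a plain high-to-low sort on signed values would discard.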
Optionally, the distribution data calculation module 602 includes:
a data classification unit 6021, configured to classify the historical inquiry records according to the used-drug information, so as to obtain a classified inquiry record set;
a feature analysis unit 6022, configured to invoke a principal component analysis method to analyze the first inquiry features in the classified inquiry record set, obtain the first inquiry feature with the largest correlation in the classified inquiry record set, and mark that first inquiry feature as the main feature related to the classified inquiry records;
a calculation unit 6023, configured to generate distribution data of the first inquiry features in the historical inquiry data based on the number of historical inquiry records that contain each main feature.
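The classification and counting steps can be sketched as below, purely as an illustration. The principal component analysis step is not reproduced: the mapping from each drug to its main feature is taken as a given input, and the record layout, function names, and the "share" representation of the distribution data are assumptions of this sketch.

```python
from collections import defaultdict


def classify_by_drug(records):
    """Group historical inquiry records by the drug that was used."""
    classes = defaultdict(list)
    for record in records:
        classes[record["drug"]].append(record)
    return dict(classes)


def distribution_data(records, main_features):
    """Count the records containing each main feature and derive shares.

    main_features maps drug -> main feature name. It is assumed to be the
    output of the principal component analysis step, which is not shown.
    """
    counts = {}
    for feature in set(main_features.values()):
        counts[feature] = sum(1 for r in records if r["features"].get(feature))
    total = sum(counts.values())
    return {
        feature: {"count": count, "share": count / total if total else 0.0}
        for feature, count in counts.items()
    }


# Hypothetical toy records and main features.
records = [
    {"features": {"fever": 1, "cough": 0}, "drug": "ibuprofen"},
    {"features": {"fever": 1, "cough": 1}, "drug": "ibuprofen"},
    {"features": {"fever": 1, "cough": 0}, "drug": "ibuprofen"},
    {"features": {"fever": 0, "cough": 1}, "drug": "dextromethorphan"},
]
classes = classify_by_drug(records)
dist = distribution_data(records, {"ibuprofen": "fever", "dextromethorphan": "cough"})
```

Note that a record containing two main features is counted once for each, so in this sketch the shares describe feature occurrences rather than disjoint groups of records.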
Optionally, the training set construction module 603 includes:
a primary cleaning unit 6031, configured to perform primary cleaning on the historical inquiry data, and remove error data to obtain a primary cleaning data set;
a secondary cleaning unit 6032, configured to perform secondary cleaning on the primary cleaning data set, and remove the historical inquiry data that does not conform to the distribution data, so as to obtain a secondary cleaning data set;
a training set construction unit 6033, configured to extract historical inquiry data from the secondary cleaning data set according to the distribution data, and form an inquiry data training set from the extracted historical inquiry data.
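One plausible reading of "extracting historical inquiry data according to the distribution data" is proportional (stratified) sampling, sketched below. This is an assumption rather than the method stated in the application; the `main_feature` field, the `shares` mapping, and the function name are hypothetical.

```python
import random


def build_training_set(cleaned_records, shares, size, seed=0):
    """Draw about `size` records so each main feature keeps its original share."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    training_set = []
    for feature, share in shares.items():
        pool = [r for r in cleaned_records if r["main_feature"] == feature]
        # Never request more records than the cleaned pool can supply.
        quota = min(len(pool), round(size * share))
        training_set.extend(rng.sample(pool, quota))
    return training_set


# Hypothetical cleaned data: cleaning left the two groups equally large,
# although the original distribution was 75% fever and 25% cough.
cleaned = [{"id": i, "main_feature": "fever"} for i in range(8)]
cleaned += [{"id": 100 + i, "main_feature": "cough"} for i in range(8)]
training = build_training_set(cleaned, {"fever": 0.75, "cough": 0.25}, size=8)
```

Sampling against the recorded shares restores the original 3:1 ratio that cleaning would otherwise have flattened, which is the stated purpose of carrying the distribution data forward.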
Optionally, the primary cleaning unit 6031 includes:
a pre-cleaning subunit, configured to pre-clean the historical inquiry data and remove dirty data to obtain a pre-cleaning data set;
and a validity cleaning subunit, configured to perform validity matching cleaning on the pre-cleaning data set and remove invalid data to obtain the primary cleaning data set.
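The two-stage primary cleaning might look like the following minimal sketch. The application does not define "dirty" or "invalid" data, so the rules here (blank fields and exact duplicates are dirty; unknown drug names and over-long texts are invalid) are assumptions chosen for illustration, as are the field names and the drug list.

```python
def pre_clean(records):
    """Drop dirty records: missing fields, blank values, exact duplicates."""
    seen, kept = set(), []
    for r in records:
        text = (r.get("text") or "").strip()
        drug = (r.get("drug") or "").strip()
        if not text or not drug:
            continue  # missing or blank field
        key = (text, drug)
        if key in seen:
            continue  # duplicate of an earlier record
        seen.add(key)
        kept.append({"text": text, "drug": drug})
    return kept


def validity_clean(records, known_drugs, max_text_length=2000):
    """Keep only records that match the (assumed) validity rules."""
    return [
        r for r in records
        if r["drug"] in known_drugs and len(r["text"]) <= max_text_length
    ]


# Hypothetical raw records.
raw = [
    {"text": " fever for two days ", "drug": "ibuprofen"},
    {"text": "fever for two days", "drug": "ibuprofen"},   # duplicate after trimming
    {"text": "", "drug": "ibuprofen"},                     # blank text
    {"text": "dry cough", "drug": None},                   # missing drug
    {"text": "dry cough", "drug": "unlisted-drug"},        # fails validity matching
    {"text": "dry cough", "drug": "dextromethorphan"},
]
pre_cleaned = pre_clean(raw)
primary = validity_clean(pre_cleaned, {"ibuprofen", "dextromethorphan"})
```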
Optionally, the secondary cleaning unit 6032 includes:
a box plot drawing subunit, configured to acquire the used-drug information in the primary cleaning data set, and draw a box plot according to the type of the used-drug information and the first inquiry feature corresponding to that type;
an abnormal value removing subunit, configured to screen the historical inquiry data in the primary cleaning data set based on the box plot, obtain abnormal data, and remove the abnormal data;
and a data set construction subunit, configured to form the historical inquiry data remaining in the primary cleaning data set into a secondary cleaning data set.
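Screening "based on the box plot" is commonly done with the interquartile range rule, where values beyond 1.5 × IQR from the quartiles are treated as abnormal. The sketch below applies that rule per drug type to one numeric inquiry feature; the choice of rule, the whisker factor, the minimum group size, and the field names are assumptions, not details given in the application.

```python
from collections import defaultdict
from statistics import quantiles


def remove_box_plot_outliers(records, feature, whisker=1.5):
    """Split records into (kept, removed) using per-drug box plot fences."""
    groups = defaultdict(list)
    for r in records:
        groups[r["drug"]].append(r)

    kept, removed = [], []
    for group in groups.values():
        values = [r[feature] for r in group]
        if len(values) < 4:
            kept.extend(group)  # too few points for meaningful quartiles
            continue
        q1, _, q3 = quantiles(values, n=4)
        iqr = q3 - q1
        low, high = q1 - whisker * iqr, q3 + whisker * iqr
        for r in group:
            (kept if low <= r[feature] <= high else removed).append(r)
    return kept, removed


# Hypothetical primary cleaning data set: symptom duration per record.
primary = [{"drug": "dextromethorphan", "duration_days": d} for d in (1, 2, 2, 3, 3, 4, 4, 30)]
primary += [{"drug": "ibuprofen", "duration_days": d} for d in (5, 6, 6, 7, 7, 8)]
secondary, abnormal = remove_box_plot_outliers(primary, "duration_days")
```

Computing the fences separately for each drug type matters: a duration that is ordinary for one drug may be an outlier for another.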
In this embodiment of the application, when the inquiry data training set used to generate the drug prediction model is processed, the distribution data of the historical inquiry data is first calculated from the inquiry features in the historical inquiry data. After the historical inquiry data is cleaned and screened, the inquiry data training set is generated according to that distribution data, so the regularity and specificity of the original historical inquiry data are retained and the accuracy of drug prediction by the drug prediction apparatus is improved.
Figs. 6 and 7 above describe the drug prediction apparatus in the embodiments of the present application in detail from the perspective of modular functional entities; the drug prediction device in the embodiments of the present application is described in detail below from the perspective of hardware processing.
Fig. 8 is a schematic diagram of a drug prediction device according to an embodiment of the present application. The drug prediction device 800 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 810, a memory 820, and one or more storage media 830 (e.g., one or more mass storage devices) storing application programs 833 or data 832. The memory 820 and the storage medium 830 may be transitory or persistent storage. The program stored in the storage medium 830 may include one or more modules (not shown), each of which may include a series of instruction operations for the drug prediction device 800. Further, the processor 810 may be configured to communicate with the storage medium 830 and execute, on the drug prediction device 800, the series of instruction operations in the storage medium 830.
The drug prediction device 800 may also include one or more power supplies 840, one or more wired or wireless network interfaces 850, one or more input/output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the structure illustrated in Fig. 8 does not limit the drug prediction device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
The present invention also provides a drug prediction device including a memory and a processor, the memory storing computer readable instructions which, when executed by the processor, cause the processor to perform the steps of the drug prediction method in the above embodiments.
A blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The present invention also provides a computer readable storage medium, which may be a non-volatile computer readable storage medium, or a volatile computer readable storage medium, having stored therein instructions that, when executed on a computer, cause the computer to perform the steps of the drug prediction method.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied, in essence or in whole or in part, in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A method of predicting a drug, the method comprising:
acquiring authorized historical inquiry data and extracting all first inquiry features in the historical inquiry data, wherein the historical inquiry data comprises a plurality of historical inquiry records;
counting the historical inquiry records in the historical inquiry data according to the first inquiry features to obtain the number of the historical inquiry records corresponding to each first inquiry feature, and generating distribution data of the corresponding first inquiry feature in the historical inquiry data based on the number;
cleaning the history inquiry records corresponding to each first inquiry feature, and forming an inquiry data training set by the cleaned history inquiry records and the corresponding distribution data;
training a preset deep learning tool according to the inquiry data training set to obtain a medicine prediction model;
after receiving a medicine prediction request, acquiring a query information text corresponding to the medicine prediction request, and extracting second query characteristics in the query information text;
inputting the second inquiry feature into the medicine prediction model to perform medicine prediction, so as to obtain a medicine prediction result corresponding to the second inquiry feature;
the step of cleaning the history inquiry records corresponding to the first inquiry features, and forming the cleaned history inquiry records and the distribution data corresponding to the cleaned history inquiry records into an inquiry data training set comprises the following steps:
performing data primary cleaning on the historical inquiry data, and removing error data to obtain a primary cleaning data set;
performing secondary cleaning on the primary cleaning data set, and removing historical inquiry data which do not accord with the distribution data to obtain a secondary cleaning data set;
and extracting historical inquiry data from the secondary cleaning data set according to the distribution data, and forming an inquiry data training set by the extracted historical inquiry data.
2. The drug prediction method of claim 1, wherein the acquiring authorized historical inquiry data and extracting all first inquiry features in the historical inquiry data comprises:
acquiring a plurality of historical inquiry records in authorized historical inquiry data, and performing format conversion on the historical inquiry records to obtain historical inquiry character string data;
extracting inquiry information features and using medicine information in the history inquiry character string data, and calculating a correlation coefficient between the inquiry information features and the using medicine information;
screening out the inquiry information features of which the correlation coefficients meet the preset correlation coefficient conditions, and obtaining first inquiry features.
3. The method for predicting a drug according to claim 2, wherein screening out the features of the inquiry information whose correlation coefficients satisfy the preset correlation coefficient condition, and obtaining the first inquiry feature includes:
sequencing the correlation coefficients from high to low according to the correlation coefficient values to obtain a correlation coefficient sequence;
and sequentially screening out a plurality of inquiry information features in the correlation coefficient sequence according to the sequence of the correlation coefficients, and taking the screened inquiry information features as first inquiry features.
4. The drug prediction method according to claim 2 or 3, wherein the counting historical inquiry records in the historical inquiry data according to the first inquiry features to obtain the number of the historical inquiry records corresponding to each first inquiry feature, and generating the distribution data of the corresponding first inquiry feature in the historical inquiry data based on the number comprises:
classifying the historical inquiry records according to the using medicine information to obtain a classified inquiry record set;
calling a principal component analysis method to analyze first inquiry features in the classified inquiry record set to obtain first inquiry features with the largest correlation in the classified inquiry record set, and marking the first inquiry features with the largest correlation as main features related to the classified inquiry records;
generating distribution data of the first inquiry features in the historical inquiry data based on the number of the historical inquiry records containing each main feature in the historical inquiry records.
5. The drug prediction method of claim 1, wherein the performing data primary cleaning on the historical inquiry data and removing error data to obtain a primary cleaning data set comprises:
pre-cleaning the historical inquiry data, and removing dirty data to obtain a pre-cleaning data set;
and carrying out validity matching cleaning on the pre-cleaning data set, and removing invalid data to obtain the primary cleaning data set.
6. The drug prediction method of claim 1 or 5, wherein the performing secondary cleaning on the primary cleaning data set and removing historical inquiry data which do not accord with the distribution data to obtain a secondary cleaning data set comprises:
acquiring the using medicine information in the primary cleaning data set, and drawing a box plot according to the type of the using medicine information and the first inquiry feature corresponding to the type of the medicine information;
screening historical inquiry data in the primary cleaning data set based on the box plot to obtain abnormal data, and removing the abnormal data;
and forming the remaining historical inquiry data in the primary cleaning data set into a secondary cleaning data set.
7. A drug prediction device, comprising:
the first feature acquisition module is used for acquiring authorized historical inquiry data and extracting all first inquiry features in the historical inquiry data, wherein the historical inquiry data comprises a plurality of historical inquiry records;
the distribution data calculation module is used for counting the historical inquiry records in the historical inquiry data according to the first inquiry features to obtain the number of the historical inquiry records corresponding to each first inquiry feature, and generating the distribution data of the corresponding first inquiry feature in the historical inquiry data based on the number;
the training set construction module is used for cleaning the history inquiry records corresponding to the first inquiry features and forming an inquiry data training set by the cleaned history inquiry records and the corresponding distribution data;
the training module is used for training a preset deep learning tool according to the inquiry data training set to obtain a medicine prediction model;
the second feature acquisition module is used for acquiring a query information text corresponding to the medicine prediction request after receiving the medicine prediction request, and extracting second query features in the query information text;
the prediction module is used for inputting the second inquiry feature into the medicine prediction model to perform medicine prediction, so as to obtain a medicine prediction result corresponding to the second inquiry feature;
the training set construction module is specifically configured to perform data primary cleaning on the historical inquiry data, remove error data, and obtain a primary cleaning data set; performing secondary cleaning on the primary cleaning data set, and removing historical inquiry data which do not accord with the distribution data to obtain a secondary cleaning data set; and extracting historical inquiry data from the secondary cleaning data set according to the distribution data, and forming an inquiry data training set by the extracted historical inquiry data.
8. A drug prediction device, characterized in that the drug prediction device comprises: a memory and at least one processor, the memory having instructions stored therein;
The at least one processor invokes the instructions in the memory to cause the drug prediction device to perform the steps of the drug prediction method of any one of claims 1-6.
9. A computer readable storage medium having instructions stored thereon, which when executed by a processor, implement the steps of the drug prediction method of any of claims 1-6.
CN202110566394.7A 2021-05-24 2021-05-24 Medicine prediction method, device, equipment and storage medium Active CN113284577B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110566394.7A CN113284577B (en) 2021-05-24 2021-05-24 Medicine prediction method, device, equipment and storage medium
PCT/CN2022/088787 WO2022247549A1 (en) 2021-05-24 2022-04-24 Drug prediction method, apparatus and device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110566394.7A CN113284577B (en) 2021-05-24 2021-05-24 Medicine prediction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113284577A CN113284577A (en) 2021-08-20
CN113284577B true CN113284577B (en) 2023-08-11

Family

ID=77281166

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110566394.7A Active CN113284577B (en) 2021-05-24 2021-05-24 Medicine prediction method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113284577B (en)
WO (1) WO2022247549A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113284577B (en) * 2021-05-24 2023-08-11 康键信息技术(深圳)有限公司 Medicine prediction method, device, equipment and storage medium
CN113688329A (en) * 2021-08-25 2021-11-23 平安国际智慧城市科技股份有限公司 Information pushing method, device, equipment and storage medium based on medical service

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108597603A (en) * 2018-05-04 2018-09-28 吉林大学 Cancer return forecasting system based on Multi-dimensional Gaussian distribution Bayes's classification
JP2020021371A (en) * 2018-08-02 2020-02-06 Necソリューションイノベータ株式会社 Post-operation infection predicting apparatus, method of producing post-operation infection predicting apparatus, post-operation infection predicting method and program
CN112037880A (en) * 2020-08-31 2020-12-04 康键信息技术(深圳)有限公司 Medication recommendation method, device, equipment and storage medium
CN112214613A (en) * 2020-10-15 2021-01-12 平安国际智慧城市科技股份有限公司 Artificial intelligence-based medication recommendation method and device, electronic equipment and medium
CN112489769A (en) * 2019-08-22 2021-03-12 浙江远图互联科技股份有限公司 Intelligent traditional Chinese medicine diagnosis and medicine recommendation system for chronic diseases based on deep neural network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11899022B2 (en) * 2017-08-23 2024-02-13 The General Hospital Corporation Multiplexed proteomics and predictive drug candidate assessment
CN109087691A (en) * 2018-08-02 2018-12-25 科大智能机器人技术有限公司 A kind of OTC drugs recommender system and recommended method based on deep learning
CN109360604B (en) * 2018-11-21 2021-09-24 南昌大学 Ovarian cancer molecular typing prediction system
CN111613289B (en) * 2020-05-07 2023-04-28 浙江大学医学院附属第一医院 Individuation medicine dosage prediction method, device, electronic equipment and storage medium
CN112735535B (en) * 2021-04-01 2021-06-25 腾讯科技(深圳)有限公司 Prediction model training method, prediction model training device, data prediction method, data prediction device and storage medium
CN113284577B (en) * 2021-05-24 2023-08-11 康键信息技术(深圳)有限公司 Medicine prediction method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lei Zhang等.Ancient terms of chronic renal failure: The key to ancient literature mining.《2013 IEEE International Conference on Bioinformatics and Biomedicine》.2014,319-323. *

Also Published As

Publication number Publication date
CN113284577A (en) 2021-08-20
WO2022247549A1 (en) 2022-12-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant