CN110097975A

CN110097975A - Intelligent diagnosis method and system for nosocomial infection based on multi-model fusion

Info

Publication number: CN110097975A
Application number: CN201910347848.4A
Authority: CN
Inventors: 彭访; 蔡志平; 方胜群; 李振华
Original assignee: Hunan Blue Dragonfly Network Technology Co Ltd
Current assignee: Hunan Lanqingting Network Technology Co ltd; National University of Defense Technology
Priority date: 2019-04-28
Filing date: 2019-04-28
Publication date: 2019-08-06

Abstract

The invention discloses an intelligent diagnosis method and system for nosocomial infection based on multiple models. The method comprises: obtaining a number of medical records related to nosocomial infection; preprocessing the medical records to obtain a discrete word list for each record; dividing all word lists into a training set and a test set in a given proportion; obtaining, within the training set, an optimal feature set for each infection type; tuning the parameters of two or more base models, selecting the optimal parameters to obtain two or more optimal base models, and fusing all optimal base models into a diagnostic model; and testing the diagnostic model on the test set and analyzing its performance. The scheme addresses the low accuracy and high miss rate (rate of missed infections) caused by relying on a single diagnostic model: fusing several models into the diagnostic model improves diagnostic accuracy and reduces the miss rate.

Description

Intelligent diagnosis method and system for nosocomial infection based on multi-model fusion

Technical field

The present invention relates to the technical field of data processing, and in particular to an intelligent diagnosis method and system for nosocomial infection based on multiple models.

Background art

Nosocomial infection refers to an infection that an inpatient acquires in hospital. Diagnosis usually proceeds in two steps: first, clinical data, laboratory test results and various specialist diagnostic indicators are used to determine whether the patient is infected; then it is determined whether the infection is a nosocomial infection. The diagnostic principle for nosocomial infection covers two cases. For infectious diseases with a definite incubation period, time is counted from the first day of admission, and an infection that occurs after the average incubation period has elapsed is a nosocomial infection. For infections without a definite incubation period, an infection that occurs more than 48 h after admission can generally be judged preliminarily to be a nosocomial infection. These two steps show that the most critical step in diagnosing nosocomial infection is judging whether the patient is infected. Infection types are numerous, for example upper respiratory tract infection and urinary tract infection, and some infections have very similar clinical manifestations and are hard to tell apart. For example, pneumonia diagnosed on the basis of clinical manifestations and pneumonia with specific laboratory findings present very similar symptoms, including coarse breath sounds, shortness of breath, and dry and moist rales, and can only be distinguished effectively through laboratory tests. An intelligent diagnosis system that assists with infection diagnosis is therefore very important for medical staff.

Approaches to assisted infection diagnosis fall into two main classes. (1) Expert systems based on a knowledge base. Such systems use computer technology to simulate how a medical expert handles medical data (such as electronic medical records) and carries out the analysis, diagnosis and treatment of disease. Medical expert systems are mainly of four types: consultation, teaching assistance, clinical diagnosis and treatment, and automatic diagnosis and identification. However, for many reasons, few of these expert systems have actually been accepted by doctors and put into clinical use.

(2) Intelligent diagnosis based on machine learning. An intelligent infection diagnosis model built with machine learning mainly uses a computer to simulate a doctor's analysis of a patient: it determines the patient's infection type from information in the medical record such as signs and symptoms, and assigns the patient to one of the infection types. In other words, classification algorithms from machine learning, such as SVM (support vector machine), RF (random forest) and Bayes (Bayesian classifier), are used to establish the correspondence between a patient's text data (course-of-disease information, imaging description information, body temperature information and so on) and the patient's infection status. The advantages of building an intelligent diagnosis model with machine learning are:

1. When enough data is available, a model can be trained for each kind of infection, which effectively alleviates the difficulty that the large number of infection types causes for diagnosis;

2. Machine learning is characterized by mining useful information from data, so machine learning methods can find deep features of different infections in patient data, which effectively addresses the problem that some infections have very similar clinical manifestations and are hard to distinguish.

However, existing machine-learning-based intelligent diagnosis methods generally learn and compute with a single model, so diagnostic accuracy is not high and the miss rate is relatively high.

Summary of the invention

The present invention provides an intelligent diagnosis method and system for nosocomial infection based on multiple models, to overcome defects of the prior art such as the low diagnostic accuracy and high miss rate caused by a single computation model. By processing the data with a fusion of multiple models, it improves diagnostic accuracy and reduces the miss rate.

To achieve the above object, the present invention proposes an intelligent diagnosis method for nosocomial infection based on multiple models, comprising the following steps:

Obtaining a number of medical records related to nosocomial infection;

Preprocessing the medical records to obtain a discrete word list corresponding to each medical record;

Dividing all word lists into a training set and a test set in a given proportion; obtaining, within the training set, an optimal feature set for each infection type;

Tuning the parameters of two or more base models, selecting the optimal parameters to obtain two or more optimal base models, and fusing all optimal base models to obtain a diagnostic model;

Testing the diagnostic model on the test set and analyzing the performance of the diagnostic model.

To achieve the above object, the present invention also proposes an intelligent diagnosis system for nosocomial infection based on multiple models, comprising a processor and a memory connected to the processor. The memory stores a multi-model nosocomial infection intelligent diagnosis program which, when executed by the processor, carries out the steps of the method described above.

The intelligent diagnosis method and system for nosocomial infection based on multiple models provided by the invention preprocess the medical records of multiple patients to obtain discrete word lists, and use most of the word lists as the training set. For each infection type, the features most strongly associated with it are selected from the word lists by computation to form an optimal feature set. Two or more algorithms are used to train base models with the best parameters, the trained base models are fused into a diagnostic model, and the diagnostic model is finally tested on the word lists in the test set so that its performance can be analyzed.

A diagnostic model with good performance can diagnose nosocomial infection intelligently, give early warning of nosocomial infection, and diagnose a patient's infection status at an early stage. It helps medical staff analyze a patient's infection status more comprehensively, accurately and efficiently, and can give a comprehensive analysis of complicated infection conditions. Because machine learning can discover information in data and mine features, infections with very similar clinical manifestations can be distinguished more effectively and diagnosed more accurately. The method also overcomes problems of traditional expert systems: a machine learning classification model is trained on historical patient data, so a new round of training can be run whenever new patient data is obtained, and the model can be updated continuously. Once training is complete, only the model's parameters need to be saved; to test an unknown sample, the model is called with those parameters and returns the infection type corresponding to the sample.

Furthermore, the scheme builds the diagnostic model from multiple models. On the one hand, because the hypothesis space is very large, several hypotheses may reach the same best performance on the training set, and a single learner may pick the wrong one and generalize poorly; fusing multiple models reduces this risk and so improves the accuracy of the model. On the other hand, a single learning algorithm may fall into a local minimum, so that some infections are predicted wrongly when several infections are predicted and infections are missed; multiple learners enlarge the hypothesis space and lower the likelihood of being trapped in a local minimum, which reduces the miss rate of the model. Compared with a single diagnostic model, the scheme is therefore more accurate and effectively reduces the miss rate.

Brief description of the drawings

To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below are evidently only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from the structures shown in them without creative effort.

Fig. 1 is a flow chart of the intelligent diagnosis method for nosocomial infection based on multiple models provided by Embodiment 1 of the invention;

Fig. 2 is a flow chart of the data preprocessing step in Fig. 1;

Fig. 3 is a flow chart of the feature selection step in Fig. 1;

Fig. 4 is a flow chart of the model construction in Fig. 1.

The realization of the object, the functional features and the advantages of the invention are further described with reference to the drawings in connection with the embodiments.

Detailed description of the embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings of those embodiments. The described embodiments are evidently only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

It should be noted that any directional indications in the embodiments of the invention (such as up, down, left, right, front, back ...) serve only to explain the relative positions and movements of the components in a particular posture (as shown in the drawings); if that posture changes, the directional indications change accordingly.

In addition, descriptions involving "first", "second" and the like are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly specifying the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the invention, "multiple" means at least two, for example two or three, unless specifically defined otherwise.

Unless otherwise expressly specified and limited, terms such as "connected" and "fixed" are to be understood broadly. For example, "fixed" may mean fixedly connected, detachably connected, or formed in one piece; a connection may be mechanical or electrical, and may be a physical connection or a wireless communication connection; it may be direct or indirect through an intermediary, and may be an internal connection between two elements or an interaction between two elements, unless expressly limited otherwise. A person of ordinary skill in the art can understand the specific meaning of these terms in the invention according to the specific circumstances.

In addition, the technical solutions of the embodiments may be combined with one another, provided that a person of ordinary skill in the art can implement the combination. Where a combination of technical solutions is contradictory or cannot be implemented, that combination is deemed not to exist and is outside the scope of protection claimed by the invention.

The present invention proposes an intelligent diagnosis method and system for nosocomial infection based on multiple models.

Embodiment one

Referring to Figs. 1-4, the present invention proposes an intelligent diagnosis method for nosocomial infection based on multiple models, comprising the following steps:

Step S1, obtaining a number of medical records related to nosocomial infection;

The medical records include course-of-disease information and examination and test information. The course-of-disease information includes the text data that describes the case in the admission record, the first progress note, the attending physician's ward-round records, the discharge record and so on. The examination and test information includes imaging information, laboratory test information, test result data, physical examination data, body temperature information and so on. The infections are screened by the number of patients corresponding to each: infections with fewer than 500 corresponding patients are removed, and the data are divided into a training set and a test set.

Step S2, preprocessing the medical records to obtain a discrete word list corresponding to each medical record;

Negative phrases are filtered out of the text data, deleting negative findings such as "no rales heard" and "lymph nodes not enlarged". Different segmentation methods are then designed according to the grammatical structure of the different parts of the electronic medical record: plain text, noun + numeral-quantifier phrases, and noun + adjective phrases. Plain text is segmented directly with an open-source word segmentation tool (such as Ansj); because medical records contain many specialist medical terms, a domain dictionary has to be built so that the open-source tool can segment these terms effectively. Noun + numeral-quantifier phrases in the electronic medical record usually record the patient's signs and test results; the numerical value is extracted first and compared with a preset threshold, and the phrase is converted into a word feature according to the result of the comparison. Noun + adjective phrases usually appear in the physical examination part of the record, for example "consciousness clear" or "cooperative with examination"; they are handled by key-value conversion, in which the noun describing a patient attribute becomes the key and the adjective describing the state of that attribute becomes the value, so that "consciousness clear" is converted to "consciousness"-"clear". After preprocessing, the original medical record has been converted from continuous text into a discrete word list.

Step S3, dividing all word lists into a training set and a test set in a given proportion; obtaining, within the training set, an optimal feature set for each infection type;

From the word lists obtained in step S2, the candidate feature set of each infection is collected. The chi-square test and a feature selection method based on class discrimination are used to select the N most representative features of each infection as that infection's feature set. The value of N is determined by experiment.

Step S4, tuning the parameters of two or more base models, selecting the optimal parameters to obtain two or more optimal base models, and fusing all optimal base models to obtain a diagnostic model;

Step S5, testing the diagnostic model on the test set and analyzing the performance of the diagnostic model.

The base models are trained with RandomForest, XGBoost, GradientBoosting, ExtraTrees and SVC, and these five base models are fused with the stacking model fusion method to obtain the final classification model. While the base models are trained, grid search (GridSearch) is used to find the optimal parameters of each base model.
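As a rough illustration of the stacking fusion described here, the following Python sketch uses scikit-learn. It is a minimal sketch under stated assumptions, not the implementation of the invention: the data is synthetic (make_classification), XGBoost is left out so that the block depends on scikit-learn only (the XGBClassifier class of the xgboost package could be appended to the estimator list), and the logistic-regression meta-learner and all hyperparameter values are choices made for this example, since the text does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the word-list features; real records are not used here.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=8, random_state=0
)
# 7:3 split into training set and test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Four of the five base models named in the text (XGBoost omitted, see lead-in).
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("svc", SVC(probability=True, random_state=0)),
]

# Stacking: out-of-fold predictions of the base models train the meta-learner.
# The logistic-regression meta-learner is an assumption of this sketch.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

In practice each base model would first be tuned by grid search and the tuned estimators passed to the stacking step.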

The intelligent diagnosis method and system for nosocomial infection based on multiple models provided by the invention preprocess the medical records of multiple patients to obtain discrete word lists, and use most of the word lists as the training set. For each infection type, the features most strongly associated with it are selected from the word lists by computation to form an optimal feature set. Two or more algorithms are used to train base models with the best parameters, the trained base models are fused into a diagnostic model, and the diagnostic model is finally tested on the word lists in the test set so that its performance can be analyzed. A diagnostic model with good performance can diagnose nosocomial infection intelligently, give early warning of nosocomial infection, and diagnose a patient's infection status at an early stage. It helps medical staff analyze a patient's infection status more comprehensively, accurately and efficiently, and can give a comprehensive analysis of complicated infection conditions. Because machine learning can discover information in data and mine features, infections with very similar clinical manifestations can be distinguished more effectively and diagnosed more accurately. The method also overcomes problems of traditional expert systems: a machine learning classification model is trained on historical patient data, so a new round of training can be run whenever new patient data is obtained, and the model can be updated continuously. Once training is complete, only the model's parameters need to be saved; to test an unknown sample, the model is called with those parameters and returns the infection type corresponding to the sample. Furthermore, compared with a single diagnostic model, the scheme is more accurate and effectively reduces the miss rate.

Preferably, as one implementation of the data preprocessing, the step of preprocessing the medical records to obtain a discrete word list corresponding to each medical record comprises:

Step S21a, splitting the text data that describes the case in the medical record, and filtering the phrases obtained from the split so that phrases containing negation words are removed;

Step S22a, joining the retained phrases with a preset connector to form a case description segment;

Step S23a, segmenting the medical terms contained in the case description segment, using a domain dictionary built from known drug name lists and disease name lists.

Chinese medical records contain many negative phrases, such as "no rales heard", "lymph nodes not enlarged" and "denies history of hepatitis". Such phrases usually contribute little to diagnosing a disease and can be removed directly as noise. In a Chinese electronic medical record, a case description usually consists of several sentences, separated by the sentence-ending symbols "。", "！", "；" and "？". A sentence generally contains several phrases, usually separated by "，" (words joined by other symbols, such as the enumeration comma "、", are generally regarded as belonging to the same phrase). Following this structure, negative phrase filtering first obtains the patient's whole case description, divides it into sentences at "。", "！", "；" and "？", and cuts each sentence into phrases at "，". It then traverses the phrases and checks whether each contains a word from the negation word list: if so, the phrase is deleted, otherwise it is kept. Finally, all retained phrases are joined with the preset connector to give the filtered case description segment.

Because electronic medical records contain many specialist medical terms, and so that the open-source segmentation tool can segment them effectively, the Chinese names of common drugs were crawled from the official website of the China Food and Drug Administration, and the Chinese names of common diseases were obtained from the ICD-10 (International Classification of Diseases) coding system. The domain dictionary is built from these drug and disease names.
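The negative phrase filtering described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the negation word list, the space used as connector and the sample sentence are hypothetical, because the text names neither the negation words nor the preset connector.

```python
import re

# Hypothetical negation cues; the source does not enumerate its negation word list.
NEGATION_WORDS = ("未", "无", "否认")
# Sentence-ending symbols and phrase separator described in the text.
SENTENCE_END = r"[。！；？]"
PHRASE_SEPARATOR = "，"
# Hypothetical choice of preset connector.
CONNECTOR = " "


def filter_negative_phrases(description, negation_words=NEGATION_WORDS,
                            connector=CONNECTOR):
    """Drop every phrase that contains a negation word and join the rest."""
    kept = []
    for sentence in re.split(SENTENCE_END, description):
        for phrase in sentence.split(PHRASE_SEPARATOR):
            phrase = phrase.strip()
            if not phrase:
                continue
            # Naive substring match; a real system would need a curated list
            # to avoid dropping words that merely contain a negation character.
            if any(word in phrase for word in negation_words):
                continue
            kept.append(phrase)
    return connector.join(kept)


# Made-up example: "fever for 3 days, cough, no rales heard. Lymph nodes not
# enlarged; denies history of hepatitis."
sample = "患者发热3天，咳嗽，未闻及啰音。淋巴结未肿大；否认肝炎史。"
filtered = filter_negative_phrases(sample)
print(filtered)  # 患者发热3天 咳嗽
```

The filtered segment would then be passed to the word segmentation tool together with the domain dictionary.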

Preferably, as another implementation of the data preprocessing, the step of preprocessing the medical records to obtain a discrete word list corresponding to each medical record comprises:

Step S21b, for the laboratory test information and test result data recorded in the medical record, extracting the noun part and the numeral-quantifier part of the data and joining them with a preset connector to form a phrase;

Step S22b, dividing threshold ranges for the different signs or tests according to medical standards, and comparing the numerical value of the numeral-quantifier part with the threshold range that applies to the noun it is attached to;

Step S23b, converting the phrase into a word feature according to the result of the comparison.

The segmentation tool used here is the AnsjSeg word segmentation system, which is built on the ICTCLAS segmentation system developed by the Institute of Computing Technology of the Chinese Academy of Sciences and is a Java Chinese word segmenter based on n-gram + CRF + HMM. Noun + numeral-quantifier phrases in the electronic medical record usually record the patient's signs and test results, for example the patient's body temperature (body temperature 38.2 °C) or hemoglobin test result (HGB: 118 g/L). For such phrases the numerical value is extracted first and compared with a preset threshold, and the phrase is converted into a word feature according to the result of the comparison. The conversion process for a patient's temperature data (such as body temperature 38.2 °C) is as follows:

Separate the noun part from the numeral-quantifier part. For "body temperature 38.2 °C" the result of the separation is: body temperature (38.2 °C);

Divide threshold ranges for the different signs or tests according to the relevant standards. For example, the normal range of oral temperature is 36.3 °C to 37.2 °C and the normal range of axillary temperature is 36.1 °C to 37 °C. The patient's temperature information is converted according to the threshold range: 38.2 °C exceeds the normal range, so it can be converted to "body temperature (high)" or "body temperature (elevated)";

Convert the result into key-value form; for body temperature the key-value form is: body temperature (decreased, normal, elevated).
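The three conversion steps above can be sketched in Python as follows. This is a minimal sketch, assuming English phrase text for readability; the regular expression, the function names and the "_" connector are hypothetical, while the two normal ranges are the ones quoted in the text.

```python
import re

# Normal ranges in degrees Celsius quoted in the text, keyed by measurement site.
NORMAL_RANGES = {
    "oral": (36.3, 37.2),
    "axillary": (36.1, 37.0),
}

# Hypothetical pattern: noun, optional colon, number, optional unit.
PHRASE_PATTERN = re.compile(
    r"^\s*(?P<noun>[^\d:：]+?)\s*[:：]?\s*(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>\S*)\s*$"
)


def split_phrase(phrase):
    """Step 1: separate the noun part from the numeral-quantifier part."""
    match = PHRASE_PATTERN.match(phrase)
    if match is None:
        raise ValueError(f"not a noun + numeral-quantifier phrase: {phrase!r}")
    return match["noun"], float(match["value"]), match["unit"]


def temperature_feature(phrase, site="oral", connector="_"):
    """Steps 2 and 3: compare with the normal range and emit a word feature."""
    noun, value, _unit = split_phrase(phrase)
    low, high = NORMAL_RANGES[site]
    if value < low:
        state = "decreased"
    elif value > high:
        state = "elevated"
    else:
        state = "normal"
    return f"{noun}{connector}{state}"


print(split_phrase("HGB:118g/L"))                       # ('HGB', 118.0, 'g/L')
print(temperature_feature("body temperature 38.2 °C"))  # body temperature_elevated
```

Other signs and tests would need their own threshold ranges, which the text leaves to the relevant medical standards.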

Preferably, as a further implementation of the data preprocessing, the step of preprocessing the medical records to obtain a discrete word list corresponding to each medical record comprises:

Step S21c, for the physical examination data recorded in the medical record, extracting the noun part and the adjective part of the data and joining them with a preset connector to form a phrase;

Step S22c, using key-value conversion to convert the noun that describes a patient attribute in the phrase into a key and the adjective that describes the state of that noun into a value, and joining the key and the value with a preset connector to form a key-value feature.

Step S23c, noun + adjective phrases in the electronic medical record usually appear in the physical examination part, for example "consciousness clear" or "cooperative with examination". Key-value conversion handles such a phrase by converting the noun that describes a patient attribute into the key and the adjective that describes the state of that attribute into the value; for "consciousness clear", the result of the conversion is "consciousness"-"clear". One key may have more than one value: besides "clear", "consciousness" can also be described with words such as "blurred", "unclear" and "in a trance". During conversion, the different values observed for the same key therefore need to be collected, for example "consciousness": "clear", "in a trance".
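A minimal Python sketch of the key-value conversion is given below. It assumes a small, hypothetical lexicon of attribute nouns to find the noun/adjective boundary in unsegmented Chinese phrases; the "-" connector and the sample phrases are likewise chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical lexicon of attribute nouns.
# Glosses: 神志 = consciousness, 检查 = examination.
ATTRIBUTE_NOUNS = ("神志", "检查")


def split_attribute(phrase, attribute_nouns=ATTRIBUTE_NOUNS):
    """Return (key, value) if the phrase starts with a known attribute noun."""
    for noun in attribute_nouns:
        if phrase.startswith(noun) and len(phrase) > len(noun):
            return noun, phrase[len(noun):]
    return None


def to_key_value_features(phrases, connector="-"):
    """Build key-value features and collect every value seen for each key."""
    features = []
    values_by_key = defaultdict(set)
    for phrase in phrases:
        pair = split_attribute(phrase)
        if pair is None:
            continue  # not a noun + adjective phrase known to the lexicon
        key, value = pair
        features.append(f"{key}{connector}{value}")
        values_by_key[key].add(value)
    return features, dict(values_by_key)


# Glosses: consciousness clear, cooperative with examination, in a trance.
phrases = ["神志清楚", "检查合作", "神志恍惚"]
features, values_by_key = to_key_value_features(phrases)
print(features)  # ['神志-清楚', '检查-合作', '神志-恍惚']
```

Collecting the values per key, as in `values_by_key`, corresponds to gathering the different states that the same attribute takes across records.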

Preferably, the step of dividing all word lists into a training set and a test set in a given proportion and obtaining, within the training set, an optimal feature set for each infection type comprises:

Step S31, dividing all word lists into a training set and a test set in a ratio of 7:3 or 8:2;

Step S32, in the training set, obtaining the candidate feature set of each infection type from the word lists corresponding to the medical records;

Step S33, selecting the N most representative features of each infection type as the optimal feature set, using the chi-square test feature selection method or the feature selection method based on class discrimination; the value of N is determined by experiment.

After data preprocessing, the word list corresponding to each medical record is available. Grouping the word lists by the infection that each record's patient has gives the feature set corresponding to each infection. The specific feature selection methods are the chi-square test and feature selection based on class discrimination.
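Steps S31 and S32 can be sketched in Python as follows. This is a minimal sketch: the record layout (a word list paired with an infection label), the function names, the fixed random seed and the toy records are assumptions made for the example.

```python
import random
from collections import defaultdict


def split_records(records, train_ratio=0.7, seed=0):
    """Step S31: shuffle and split into training and test sets (7:3 by default)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def candidate_features(train_records):
    """Step S32: collect, per infection type, every word seen in its records."""
    features = defaultdict(set)
    for words, infection in train_records:
        features[infection].update(words)
    return dict(features)


# Toy records: (word list, infection label). Not real patient data.
records = [(["fever", "cough"], "pneumonia")] * 5 + [(["dysuria"], "uti")] * 5
train, test = split_records(records, train_ratio=0.7)
features = candidate_features(train)
print(len(train), len(test))  # 7 3
```

Only the training set is used to build the candidate feature sets, so the test set stays unseen until the final evaluation.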

Preferably, the step of selecting the N most representative features of each infection type as the optimal feature set by the chi-square test feature selection method, with the value of N determined by experiment, comprises:

Step S331a, assuming that the feature and the infection are unrelated, and obtaining the deviation between the actual values and the theoretical values;

Step S332a, taking the N features with the largest deviations, in descending order of deviation, as the optimal feature set;

Step S333a, determining the value of N by experiment.

The basic idea of the chi-square test is to judge whether a hypothesis is correct from the deviation between the observed values and the theoretical (expected) values. In feature selection, the null hypothesis is that the feature and the infection are unrelated. The larger the computed chi-square value, the greater the deviation from the null hypothesis, and hence the stronger the correlation between the feature and the infection.
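The chi-square ranking can be sketched in plain Python as follows. This is a minimal sketch: it uses the standard 2x2 contingency-table form of the statistic for a word feature and an infection type, which the text does not spell out, and the record layout, function names and toy records are assumptions.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a 2x2 table.

    a: records of the infection that contain the feature
    b: records of other infections that contain the feature
    c: records of the infection without the feature
    d: records of other infections without the feature
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # feature or class is constant: no evidence of dependence
    return n * (a * d - b * c) ** 2 / denominator


def chi_square_scores(records, infection):
    """Score every word against one infection type."""
    vocabulary = set()
    for words, _label in records:
        vocabulary.update(words)
    scores = {}
    for feature in vocabulary:
        a = b = c = d = 0
        for words, label in records:
            present = feature in words
            if label == infection:
                a, c = (a + 1, c) if present else (a, c + 1)
            else:
                b, d = (b + 1, d) if present else (b, d + 1)
        scores[feature] = chi_square(a, b, c, d)
    return scores


def top_n_features(records, infection, n):
    """Take the n highest-scoring features (ties broken alphabetically)."""
    scores = chi_square_scores(records, infection)
    return sorted(scores, key=lambda f: (-scores[f], f))[:n]


# Toy records: (set of words, infection label). Not real patient data.
records = [
    ({"cough", "fever"}, "pneumonia"),
    ({"cough", "rales"}, "pneumonia"),
    ({"cough", "fever"}, "pneumonia"),
    ({"dysuria", "fever"}, "uti"),
    ({"dysuria"}, "uti"),
    ({"dysuria", "fever"}, "uti"),
]
print(top_n_features(records, "pneumonia", 1))  # ['cough']
```

Note that the statistic measures dependence in either direction: in the toy data "dysuria" scores as high for pneumonia as "cough" does, because its absence is just as informative.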

Preferably, the step of selecting the N most representative features of each infection type as the optimal feature set by the feature selection method based on class discrimination, with the value of N determined by experiment, comprises:

Step S331b, computing how representative each feature is of an infection type and sorting the features by representativeness from high to low; the greater the representativeness, the stronger the correlation between the feature and the infection;

Step S332b, selecting the first n features as the optimal feature set of each infection, the value of n being determined by experiment, with accuracy and miss rate as the experimental evaluation criteria.

Feature selection based on class discrimination exploits the composition and distribution of infection-related content in medical records: (1) the keywords of an infection have a low repetition rate within a medical record; (2) the key symptom words in the electronic medical records of patients with the same infection overlap to a high degree; (3) the key symptom words of different infections are mutually exclusive. From these properties a measure is obtained of how representative each feature is of an infection, and the features are sorted by it; the greater the representativeness, the stronger the correlation between the feature and the infection. Once the features have been ranked, the first n features are selected as the optimal feature set of each infection. The value of n cannot be set by hand and can only be determined by experiment, with accuracy and miss rate as the experimental evaluation criteria.
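The two evaluation criteria, and one possible way of choosing n from experimental results, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the miss rate is taken as the share of truly infected cases predicted as not infected, the tie-breaking rule in `pick_n` is a choice made for this example, and the scores passed to it are made-up numbers, not results from the source.

```python
def accuracy_and_miss_rate(y_true, y_pred, positive=1):
    """Accuracy, and miss rate = false negatives / all truly positive cases."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    false_neg = sum(t == positive and p != positive for t, p in pairs)
    accuracy = correct / len(pairs)
    infected = true_pos + false_neg
    miss_rate = false_neg / infected if infected else 0.0
    return accuracy, miss_rate


def pick_n(results):
    """Pick n from {n: (accuracy, miss_rate)}.

    Assumed rule: highest accuracy first, then lowest miss rate, then smallest n.
    """
    return max(results, key=lambda n: (results[n][0], -results[n][1], -n))


accuracy, miss_rate = accuracy_and_miss_rate([1, 1, 1, 0, 0], [1, 0, 1, 0, 1])
print(accuracy, miss_rate)

# Hypothetical experiment outcomes for three candidate values of n.
hypothetical_results = {50: (0.90, 0.10), 100: (0.93, 0.06), 200: (0.93, 0.08)}
best_n = pick_n(hypothetical_results)
print(best_n)  # 100
```

In an actual experiment each entry would come from training and validating the model with the first n ranked features.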

Preferably, the step of tuning two or more base models separately, selecting the optimal parameters to obtain two or more optimal base models, and fusing all the optimal base models to obtain the diagnostic model comprises:

Step S41: the base models include RandomForest, XGBoost, GradientBoosting, ExtraTrees and SVC; the optimal parameters of each model are found with the grid search algorithm GridSearch;

Before the models are trained, the training set is first split at random into a partial training set and a validation set. When the base models are trained, their parameters are tuned to obtain the optimal parameters of each model.

For RandomForest, the following parameters are tuned to obtain the optimal RandomForest model: n_estimators, bootstrap, criterion and min_samples_leaf. Here n_estimators is the number of decision trees; bootstrap indicates whether sampling is done with replacement; criterion is the evaluation criterion used when splitting a node, either Gini impurity (gini) or information gain (entropy); and min_samples_leaf is the minimum number of samples at a leaf node;

For XGBoost, the following parameters are tuned to obtain the optimal XGBoost model: booster, eta, min_child_weight, gamma and objective. Here booster is the model used at each iteration, either the tree-based model gbtree or the linear model gblinear; eta is the learning rate; min_child_weight is the minimum sum of sample weights required in a leaf node; gamma specifies the minimum reduction of the loss function required to split a node; and objective defines the loss function to be minimized, with common choices including binary:logistic and multi:softmax;

For GradientBoosting, the following parameters are tuned to obtain the optimal GradientBoosting model: loss, learning_rate, n_estimators and max_depth. Here loss is the chosen loss function; learning_rate is the learning rate; n_estimators is the number of weak learners; and max_depth is the maximum depth of each weak learner, which limits the number of nodes in the regression tree;

For ExtraTrees and SVC, the following parameters are tuned to obtain the optimal ExtraTrees model and the optimal SVC model respectively: C, kernel, degree, gamma and coef0. Here C is the penalty coefficient on the slack variables, i.e. on misclassification; kernel is the kernel function type, one of the linear kernel (linear), the Gaussian kernel (rbf), the polynomial kernel (poly) and the sigmoid kernel; degree is the polynomial degree when the kernel is poly; and gamma is the parameter of the Gaussian kernel, which implicitly determines how the data are distributed after being mapped to the new feature space;
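The tuning step for one base model might look like the sketch below. It assumes scikit-learn's `GridSearchCV` as the grid search implementation and tunes the four RandomForest parameters named above; the candidate values, the synthetic data from `make_classification` and the use of 3-fold cross-validation in place of a single random training/validation split are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the word-list features of the training set.
X, y = make_classification(n_samples=120, n_features=10, random_state=0)

# Candidate values for the parameters named in the text (values are illustrative).
param_grid = {
    "n_estimators": [10, 30],
    "bootstrap": [True, False],
    "criterion": ["gini", "entropy"],
    "min_samples_leaf": [1, 3],
}

# Every combination in the grid is fitted and scored by cross-validated accuracy.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=3,
)
search.fit(X, y)

best_rf = search.best_estimator_  # refitted on all of X, y with the best parameters
print(search.best_params_)
```

The same pattern would be repeated for each of the other base models with its own parameter grid.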

Step S42: the above models are fused with the stacking algorithm: the outputs of the above models are taken as a new data set, a new model is trained on this new data set, and linear regression is used as the final classification model.
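A minimal stacking sketch under stated assumptions follows. It uses scikit-learn, leaves XGBoost out because it is a separate package, works on synthetic binary data, builds the new data set from out-of-fold predicted probabilities (a common stacking practice that the text does not spell out), and turns the linear regression output into a class with a 0.5 threshold. None of these choices is confirmed by the patent.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

base_models = [
    RandomForestClassifier(n_estimators=30, random_state=0),
    GradientBoostingClassifier(n_estimators=30, random_state=0),
    ExtraTreesClassifier(n_estimators=30, random_state=0),
    SVC(probability=True, random_state=0),
]

# New data set: one column per base model, holding out-of-fold
# positive-class probabilities for the training records.
meta_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=3, method="predict_proba")[:, 1]
    for m in base_models
])
meta_model = LinearRegression().fit(meta_train, y_train)

# Refit each base model on the full training set, then stack its test outputs.
meta_test = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])
y_pred = (meta_model.predict(meta_test) >= 0.5).astype(int)  # assumed threshold
accuracy = float((y_pred == y_test).mean())
print(accuracy)
```

In a full pipeline the base models would be the tuned models from step S41 rather than the fixed settings used here.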

Table 1 describes in detail the infection data used in the embodiment of the present invention.

Table 1

Because the size of the data set strongly affects the final result of training, the collected data must be filtered: infections with fewer than 500 patients are removed. The remaining infection types are clinical sepsis, main superficial incisional infection and urethral infection. The data for the remaining 3 infections are then divided into a training set and a test set at a ratio of 7:3.
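The filtering and splitting can be sketched as below. The record layout (a word list paired with an infection type), the function name `filter_and_split` and the plain random shuffle are assumptions; the patent does not state whether the 7:3 split is applied per infection type or over the pooled data.

```python
import random
from collections import Counter


def filter_and_split(records, min_patients=500, train_ratio=0.7, seed=0):
    """records: list of (word_list, infection_type) pairs.

    Drops infection types with fewer than min_patients records, then
    splits the remaining records at random into training and test sets.
    """
    counts = Counter(label for _, label in records)
    kept = [r for r in records if counts[r[1]] >= min_patients]
    rng = random.Random(seed)
    rng.shuffle(kept)
    cut = round(len(kept) * train_ratio)
    return kept[:cut], kept[cut:]


# Toy data: type "A" has enough patients to be kept, type "B" does not.
records = [(["fever"], "A")] * 600 + [(["cough"], "B")] * 100
train, test = filter_and_split(records)
print(len(train), len(test))
```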

Table 2 lists the top 5 features of the 3 infections obtained in the present invention with the chi-square test and with the feature selection based on class discrimination degree. Feature A is obtained by the feature selection based on class discrimination degree, and feature B is obtained by the chi-square test.

Table 2

The medical record content mainly comprises the series of records made from a patient's admission to discharge, such as admission records, attending-physician ward-round records and discharge records. After the medical record data are preprocessed, the word list corresponding to each record is obtained, and the data are divided in a given ratio into a training set and a test set. Feature selection is performed on the training set, using the chi-square test and the feature selection based on class discrimination degree to obtain the optimal feature set of each infection; the size n of the feature set is determined by experiment. When the intelligent diagnostic model is built, the 5 base models are tuned and the optimal parameters are selected to obtain the optimal base models; the 5 base models are then fused to obtain the final diagnostic model, which is finally tested on the test set to analyze its performance.

Embodiment two

Based on embodiment one, an embodiment of the present invention correspondingly provides a multi-model-based intelligent diagnosis system for nosocomial infection, comprising a processor and a memory connected to the processor. The memory stores a multi-model-based nosocomial infection intelligent diagnosis program which, when executed by the processor, implements the steps of the method of the above embodiment. For the specific implementation process and technical effects, see embodiment one.

The above is only a preferred embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent structural transformation made under the inventive concept of the present invention using the contents of the description and drawings, or any direct or indirect application in other related technical fields, falls within the scope of patent protection of the present invention.

Claims

1. A multi-model-based intelligent diagnosis method for nosocomial infection, characterized by comprising the following steps:

acquiring a number of medical records related to nosocomial infection;

dividing all word lists proportionally into a training set and a test set; obtaining, in the training set, an optimal feature set for each infection type;

tuning two or more base models separately, selecting the optimal parameters to obtain two or more optimal base models, and fusing all the optimal base models to obtain a diagnostic model;

2. The multi-model-based intelligent diagnosis method for nosocomial infection according to claim 1, characterized in that, in the step of acquiring a number of medical records related to nosocomial infection:

the medical record data comprise: course-of-disease information and examination and test information;

the course-of-disease information comprises: text data describing the case;

the examination and test information comprises: image information, physical examination information, physical examination result data and physical data.

3. The multi-model-based intelligent diagnosis method for nosocomial infection according to claim 2, characterized in that the step of preprocessing the medical record data to obtain a number of discrete word lists corresponding to each medical record comprises:

segmenting the text data describing the case in the medical record data, filtering the phrases formed after segmentation, and filtering out the phrases that contain negation words;

joining the retained phrases with a preset connector to form case description segments;

segmenting the medical terms contained in the case description segments, with a professional domain dictionary built from known drug name registries and disease name registries.

4. The multi-model-based intelligent diagnosis method for nosocomial infection according to claim 2, characterized in that the step of preprocessing the medical record data to obtain a number of discrete word lists corresponding to each medical record comprises:

for the physical examination information and physical examination result data recorded in the medical record data, extracting the noun part and the numeral-quantifier part of the data respectively, and joining them with a preset connector to form phrases;

dividing threshold ranges for different signs or different tests according to medical standards, and comparing the value of the numeral-quantifier with the threshold that applies to the noun joined to it;

converting the phrase into a word feature according to the comparison result.

5. The multi-model-based intelligent diagnosis method for nosocomial infection according to claim 2, characterized in that the step of preprocessing the medical record data to obtain a number of discrete word lists corresponding to each medical record comprises:

for the physical data recorded in the medical record data, extracting the noun part and the adjective part of the data respectively, and joining them with a preset connector to form phrases;

converting, by a key-value conversion method, the noun describing a patient attribute in the phrase into a key and the adjective describing the state of that noun into a value, and joining the key and the value with a preset connector to form a key-value feature.

6. The multi-model-based intelligent diagnosis method for nosocomial infection according to any one of claims 2 to 5, characterized in that the step of dividing all word lists proportionally into a training set and a test set and obtaining, in the training set, an optimal feature set for each infection type comprises:

dividing all word lists into a training set and a test set at a ratio of 7:3 or 8:2;

in the training set, obtaining the candidate feature set of each infection type from the word list corresponding to each medical record;

selecting, by the chi-square test feature selection method or the feature selection method based on class discrimination degree, the top N most representative features of each infection type as the optimal feature set, with N determined by experiment.

7. The multi-model-based intelligent diagnosis method for nosocomial infection according to claim 6, characterized in that the step of selecting, by the chi-square test feature selection method, the top N most representative features of each infection type as the optimal feature set, with N determined by experiment, comprises:

assuming that the feature and the infection are unrelated, and obtaining the deviation between the actual values and the theoretical values;

taking the N features with the largest deviations, in descending order, as the optimal feature set;

determining N by experiment.

8. The multi-model-based intelligent diagnosis method for nosocomial infection according to claim 6, characterized in that the step of selecting, by the feature selection method based on class discrimination degree, the top N most representative features of each infection type as the optimal feature set, with N determined by experiment, comprises:

calculating the representativeness of each feature for the infection type and sorting the features by representativeness from high to low, where a higher representativeness means a stronger correlation between the feature and the infection;

selecting the top n features as the optimal feature set of each infection, where n is determined by experiment and the experimental evaluation criteria include accuracy and miss rate.

9. The multi-model-based intelligent diagnosis method for nosocomial infection according to claim 8, characterized in that the step of tuning two or more base models separately, selecting the optimal parameters to obtain two or more optimal base models, and fusing all the optimal base models to obtain the diagnostic model comprises:

the base models include: RandomForest, XGBoost, GradientBoosting, ExtraTrees and SVC; the optimal parameters of each model are found with the grid search algorithm GridSearch;

fusing the above models with the stacking algorithm: the outputs of the above models are taken as a new data set, a new model is trained on this new data set, and linear regression is used as the final classification model.

10. A multi-model-based intelligent diagnosis system for nosocomial infection, characterized by comprising a processor and a memory connected to the processor, wherein the memory stores a multi-model-based nosocomial infection intelligent diagnosis program, and the program, when executed by the processor, implements the steps of the method of the above claim 9.