CN109509517A - Method for automatically correcting medical test and examination indexes - Google Patents

Method for automatically correcting medical test and examination indexes

Info

Publication number
CN109509517A
Authority
CN
China
Prior art keywords
index
cluster
standard
synonymous
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811204706.4A
Other languages
Chinese (zh)
Inventor
叶琪
张佳影
张欢欢
阮彤
王祺
张知行
翟洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology
Priority to CN201811204706.4A
Publication of CN109509517A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00 ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of computer applications and discloses a method for automatically correcting medical test and examination indexes. The method comprises: preprocessing the input index set to obtain an index set with unified letter case and unified units; clustering the indexes into index clusters with a density-based clustering algorithm according to the literal features of the indexes; using a binary classification algorithm to find the synonymous indexes of the standard index name within each cluster, mapping the synonymous indexes to the standard index name to obtain the index alignment result, and repeating this step on the remaining non-synonymous indexes until all indexes in every cluster are synonymous or only 1 index remains in the cluster; and performing manual correction and mapping to obtain standardized index names. Experiments show that the F1-score of the method reaches 85.26%.

Description

Method for automatically correcting medical test and examination indexes
Technical field
The invention belongs to the field of medical information processing, and more specifically relates to a method for automatically correcting medical test and examination indexes.
Background
For historical reasons, hospitals use different names for the same test or examination index. Taking "serum sodium" alone as an example, there are more than 10 different expressions, such as "sodium ion concentration", "NA+", "arterial blood sodium" and "blood sodium (Na)". Because no complete index synonym dictionary is currently available for index mapping, this problem seriously hinders the interconnection and sharing of medical information between regions. It is therefore essential to standardize test and examination indexes on regional medical and health platforms, that is, to map the different names that hospitals use for the same index to a unified name. However, test and examination indexes involve a large amount of medical knowledge, and the index systems of the hospitals are numerous and complicated, so manual standardization by medical professionals takes a great deal of time and effort. How to design a standardization algorithm for test and examination indexes has therefore become the key issue.
The standardization of test and examination indexes can be regarded as an entity alignment problem, that is, mapping the measured indexes on a medical and health platform to standard indexes. There are currently two main classes of entity alignment methods: instance matching between entities in different knowledge bases, and entity linking between entities in text and knowledge base entities. The former usually uses the attribute information of entities in the knowledge bases for instance matching, while the latter usually uses the contextual information of entities in the text together with the attribute information of entities in the knowledge base for entity linking. However, test and examination indexes exist in electronic medical records with only their values and value ranges; they have no attribute information and no contextual information. More importantly, China currently has no standard knowledge base that provides the names of all indexes. In short, the prior art struggles to solve the standardization problem of test and examination indexes.
The invention proposes a method for automatically correcting medical test and examination indexes. Experimental results show that, on an experimental data set from 8 tertiary hospitals in Shanghai, the F1-score of the final mapping result reaches 85.26%.
Summary of the invention
In view of this, the invention discloses a method for automatically correcting medical test and examination indexes. The specific scheme is as follows:
Index data preprocessing: the index data are preprocessed to unify letter case, unify units and extract index reference values;
Index clustering: using the literal features of the indexes, a density-based clustering algorithm groups the different indexes into index clusters, which narrows the alignment range of the indexes;
In-cluster binary classification: a standard name is determined for each cluster, and a binary classification algorithm finds the synonymous indexes of the standard name within the cluster for index mapping; for the remaining non-synonymous indexes, a new standard name is selected from them and the binary classification algorithm is applied again to find synonymous indexes; this iterates until all indexes in every cluster are synonymous or only 1 index remains in the cluster;
Manual correction and mapping: medical professionals correct the index alignment result and perform the mapping.
In the index data preprocessing stage, the optional fields of the index data in the medical records are excluded. The required fields mainly include the index name, abbreviation, reference value, unit, parent examination item, examination result and abnormal-index flag. Of these, the parent examination item differs because each hospital follows its own standard, the examination result varies from patient to patient, and the abnormal-index flag takes values that do not discriminate between indexes, so none of them is meaningful as a criterion feature. The usable fields are therefore essentially limited to 4: index name, abbreviation, reference value and unit. Preprocessing mainly unifies the letter case of the indexes, unifies the index units and extracts the index reference values.
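The preprocessing stage described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patent's implementation: the record field names, the `UNIT_ALIASES` table and the "low-high" reference value pattern are all hypothetical.

```python
import re

# Hypothetical unit aliases; a real table would be built from the hospital data.
UNIT_ALIASES = {"mmol/l": "mmol/L", "g/l": "g/L"}

# Assumed reference value format: "low-high", e.g. "135-145" or "3.5 ~ 5.5".
REF_RANGE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*[-~]\s*(\d+(?:\.\d+)?)\s*$")


def preprocess(record):
    """Unify case and units and extract the reference range of one index record."""
    name = record["name"].strip().upper()
    abbr = record["abbr"].strip().upper()
    raw_unit = record["unit"].strip()
    # Unknown units are kept unchanged rather than guessed.
    unit = UNIT_ALIASES.get(raw_unit.lower(), raw_unit)
    match = REF_RANGE.match(record["ref"])
    ref = (float(match.group(1)), float(match.group(2))) if match else None
    return {"name": name, "abbr": abbr, "unit": unit, "ref": ref}


raw = {"name": " na+ ", "abbr": "Na", "unit": "mmol/l", "ref": "135-145"}
clean = preprocess(raw)
```

Reference values that do not match the assumed pattern (for example qualitative results) are returned as `None` so that later stages can decide how to handle them.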
In the index clustering stage, a density-based clustering algorithm groups the different indexes into index clusters. A density-based clustering algorithm partitions clusters according to how tightly the samples are distributed; it mainly examines the connectivity between samples and keeps expanding clusters from connectable samples to obtain the final clustering result.
The invention performs index clustering based on the DBSCAN algorithm, using index names and their abbreviations. Specifically, given an index set $D = \{x_1, x_2, \ldots, x_n\}$ with $x_i = (x_i^{na}, x_i^{ab}, x_i^{un}, x_i^{ref})$, where $x_i^{na}$ denotes the index name of the i-th index, $x_i^{ab}$ its name abbreviation, $x_i^{un}$ its index unit and $x_i^{ref}$ its index reference value, the ε-neighborhood and the core object are defined as follows:
Definition 1 (ε-neighborhood): for $x_i \in D$, its ε-neighborhood consists of all samples in the data set $D$ whose distance from $x_i$ is not greater than ε, i.e. $N_\varepsilon(x_i) = \{x_j \in D \mid dist(x_i, x_j) \le \varepsilon\}$.
Definition 2 (core object): if the ε-neighborhood of $x_i$ contains at least minPts samples, i.e. $|N_\varepsilon(x_i)| \ge minPts$, then $x_i$ is a core object.
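In particular, when determining the ε-neighborhood, a joint distance $dist_{joint}(x_i, x_j)$ is defined. The index data $x_i$ and $x_j$ are each handled in two parts. First, the cosine distance between the index names $x_i^{na}$ and $x_j^{na}$ in multi-hot form (different dimensions of a 0-1 vector represent different Chinese characters) is computed:
$$dist_{cos}(x_i^{na}, x_j^{na}) = 1 - \frac{x_i^{na} \cdot x_j^{na}}{\lVert x_i^{na} \rVert \, \lVert x_j^{na} \rVert}$$
Then the edit distance between the index abbreviations $x_i^{ab}$ and $x_j^{ab}$ is computed:
$$dist_{med}(x_i^{ab}, x_j^{ab}) = \frac{med(x_i^{ab}, x_j^{ab})}{\max(|x_i^{ab}|, |x_j^{ab}|)}$$
where $|x_i^{ab}|$ is the string length of the index abbreviation $x_i^{ab}$, and $med(x_i^{ab}, x_j^{ab})$ is the minimum number of insertion, substitution and deletion operations required to change $x_i^{ab}$ into $x_j^{ab}$. Finally, the two distances are combined by their harmonic mean to obtain the joint distance:
$$dist_{joint}(x_i, x_j) = \frac{2 \cdot dist_{cos}(x_i^{na}, x_j^{na}) \cdot dist_{med}(x_i^{ab}, x_j^{ab})}{dist_{cos}(x_i^{na}, x_j^{na}) + dist_{med}(x_i^{ab}, x_j^{ab})}$$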
The clustering algorithm keeps expanding outward from core objects to generate clusters, yielding the index cluster set $C = \{C_1, C_2, \ldots, C_m\}$, where each index cluster $C_i$ contains index names and index abbreviations.
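The joint distance used for the ε-neighborhood can be sketched in Python as below. This is a minimal sketch under assumptions: each index is a dict with hypothetical keys `"name"` and `"abbr"`, the multi-hot vector of a name is represented by its set of characters, and the edit distance is normalized by the longer string, which is an assumed choice because the source text does not spell out the normalization.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between the multi-hot character vectors of two strings."""
    set_a, set_b = set(a), set(b)
    if not set_a or not set_b:
        return 1.0
    # For 0-1 vectors the dot product is the number of shared characters.
    return 1.0 - len(set_a & set_b) / math.sqrt(len(set_a) * len(set_b))


def edit_distance(a, b):
    """Minimum number of insertions, substitutions and deletions (Levenshtein)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalized_edit_distance(a, b):
    # Assumption: divide by the longer length so the value lies in [0, 1].
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def joint_distance(x, y):
    """Harmonic mean of the name cosine distance and the abbreviation edit distance."""
    d_cos = cosine_distance(x["name"], y["name"])
    d_med = normalized_edit_distance(x["abbr"], y["abbr"])
    if d_cos + d_med == 0:
        return 0.0
    return 2 * d_cos * d_med / (d_cos + d_med)
```

Because a harmonic mean is 0 whenever either component is 0, two indexes with identical names or identical abbreviations are at joint distance 0 under this formulation.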
Because clustering is an unsupervised learning process, two problems may arise: 1) indexes grouped into one cluster actually have different medical meanings but were clustered together because their names are close or their abbreviations are similar; 2) some outliers are neither core objects nor reachable from core objects, and are therefore not clustered. The clustering result can therefore be post-processed with either or both of the following two methods.
Method one, unit verification. Assuming that synonymous indexes have the same unit, unit verification can be performed on each index cluster, separating indexes with different units into different clusters.
Method two, outlier recommendation. An outlier that has not been clustered is far from all other clusters and is therefore likely to be a completely new index in its own right, so each outlier is placed in a separate cluster of its own.
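The two post-processing methods above can be sketched as follows. The representation is an assumption made for illustration: a cluster is a list of index dicts, and each dict has a hypothetical `"unit"` key.

```python
from collections import defaultdict


def split_by_unit(clusters):
    """Method one: split each cluster so that every resulting cluster has one unit."""
    result = []
    for cluster in clusters:
        by_unit = defaultdict(list)
        for index in cluster:
            by_unit[index["unit"]].append(index)
        # One new cluster per distinct unit, in first-seen order.
        result.extend(by_unit.values())
    return result


def add_outlier_clusters(clusters, outliers):
    """Method two: give each unclustered outlier a cluster of its own."""
    return clusters + [[index] for index in outliers]
```

The two functions are independent, so either one or both can be applied to the clustering result, as the text allows.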
In the in-cluster binary classification stage, a standard name is determined for each cluster, and a binary classification algorithm divides the indexes in the cluster into synonymous and non-synonymous indexes of the standard name. The synonymous indexes are mapped to the in-cluster standard index name to obtain the index alignment result. For the non-synonymous indexes remaining in index cluster $C_i$, a new standard index name is selected from the remaining index list, and the search for synonymous indexes and the index mapping continue with the binary classification algorithm. This process is iterated until all indexes in every cluster are synonymous or only 1 index remains in the cluster.
In particular, to make it easier for medical professionals to post-process and correct the index alignment result, and considering that the standard index should be the most common index, the invention takes the index that occurs most frequently in the cluster as the standard index.
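The iteration described above can be sketched in Python. This is an illustrative sketch only: `is_synonym` is a hypothetical callable standing in for the trained binary classifier, and the input is assumed to be a list of index names with one entry per occurrence.

```python
from collections import Counter


def align_cluster(names, is_synonym):
    """Iteratively map the names of one cluster to standard names.

    names      -- measured index names, repeated once per occurrence
    is_synonym -- hypothetical callable (name, standard) -> bool that plays
                  the role of the binary classifier
    """
    counts = Counter(names)
    remaining = list(counts)  # distinct names in first-seen order
    mapping = {}
    while remaining:
        # The most frequent remaining name becomes the standard name.
        standard = max(remaining, key=lambda n: counts[n])
        mapping[standard] = standard
        rest = []
        for name in remaining:
            if name == standard:
                continue
            if is_synonym(name, standard):
                mapping[name] = standard
            else:
                rest.append(name)
        remaining = rest  # iterate on the non-synonymous indexes
    return mapping
```

Each pass removes at least the chosen standard name, so the loop stops once every remaining name is a synonym of the current standard or a single name is left.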
The specific method of in-cluster binary classification is as follows:
First, a standard index name is determined for each index cluster $C_i$ in the index cluster set C. Next, data augmentation is applied to the standard index name using a knowledge base to obtain the synonymous indexes of the standard index name. Then, similarity scores are computed for the index names and index abbreviations in an index cluster using the longest common subsequence similarity, Jaccard similarity, cosine similarity and edit similarity methods, and a block score is computed for the index reference values in the cluster using the block scoring feature method; the similarity scores and the block score are used as features for the binary classification algorithm, which judges whether each index is a synonymous index of the standard index name. Finally, the synonymous indexes are mapped to the standard index name to obtain the index alignment result.
The invention designs 2 classes of features for the binary classification of indexes: similarity features and block scoring features.
The first class is the similarity features, comprising longest common subsequence similarity, Jaccard similarity, cosine similarity and edit similarity. These features mainly consider the name similarity and the abbreviation similarity between each measured index in the cluster and the standard index together with all of its synonyms. For ease of description, take name similarity as an example (abbreviation similarity is analogous): let the name of a measured index in the cluster be $x_{na}$ and the set of standard index names be $S_{na} = \{s_1^{na}, s_2^{na}, \ldots, s_n^{na}\}$, where n is the total number of the standard index and its synonymous indexes.
Longest common subsequence similarity: $sim_{lcs}(x_{na}, S_{na}) = \max_{1 \le i \le n} \frac{|lcs(x_{na}, s_i^{na})|}{\min(|x_{na}|, |s_i^{na}|)}$, where $x_{na}$ is the measured index name, $s_i^{na}$ is the i-th name among the standard index and its synonyms, $|x_{na}|$ is the string length of the measured index name, and $lcs(x_{na}, s_i^{na})$ denotes the longest common subsequence of the two index names. This similarity can identify indexes in a hypernym-hyponym-like relation; for example, the longest common subsequence similarity of "blood glucose" and "blood glucose (emergency)" is 1.
Jaccard similarity: $sim_{jac}(x_{na}, S_{na}) = \max_{1 \le i \le n} \frac{|x_{na} \cap s_i^{na}|}{|x_{na} \cup s_i^{na}|}$, where each name is treated as a set of characters. This similarity can identify indexes whose names differ only in order; for example, the Jaccard similarity of "B-type natriuretic peptide" and "natriuretic peptide type B" is 1.
Cosine similarity: $sim_{cos}(x_{na}, S_{na}) = \max_{1 \le i \le n} \frac{x_{na} \cdot s_i^{na}}{\lVert x_{na} \rVert \, \lVert s_i^{na} \rVert}$, where $x_{na}$ and $s_i^{na}$ are in multi-hot form (different dimensions of a 0-1 vector represent different Chinese characters). This similarity measures the cosine of the angle between two index names in multi-hot form and is less affected by formatting differences such as a "-" inserted in the middle of a name.
Edit similarity: $sim_{med}(x_{na}, S_{na}) = \max_{1 \le i \le n} \left(1 - \frac{med(x_{na}, s_i^{na})}{\max(|x_{na}|, |s_i^{na}|)}\right)$, where $|x_{na}|$ is the string length of the index name $x_{na}$ and $med(x_{na}, s_i^{na})$ is the minimum number of insertion, substitution and deletion operations required to change $x_{na}$ into $s_i^{na}$. This similarity is derived from the edit distance between the two index names.
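The four similarity features can be sketched in Python as below. The normalizations are assumptions chosen to agree with the worked examples in the text (for instance, dividing the longest common subsequence length by the shorter name length makes "blood glucose" versus "blood glucose (emergency)" score 1); taking the maximum over the standard name and its synonyms is likewise an assumed aggregation. Names are assumed to be non-empty strings.

```python
import math


def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_distance(a, b):
    """Minimum number of insertions, substitutions and deletions (Levenshtein)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def sim_lcs(x, s):
    return lcs_length(x, s) / min(len(x), len(s))


def sim_jac(x, s):
    return len(set(x) & set(s)) / len(set(x) | set(s))


def sim_cos(x, s):
    # Multi-hot vectors are represented by the sets of characters.
    return len(set(x) & set(s)) / math.sqrt(len(set(x)) * len(set(s)))


def sim_med(x, s):
    return 1 - edit_distance(x, s) / max(len(x), len(s))


def name_features(x, standard_names):
    """Best score of each similarity over the standard name and its synonyms."""
    return [max(f(x, s) for s in standard_names)
            for f in (sim_lcs, sim_jac, sim_cos, sim_med)]
```

The same functions can be applied to abbreviations to obtain the abbreviation similarity features.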
The second class is the block scoring feature. Different hospitals sometimes set slightly different upper and lower bounds for the reference value of the same index, so in practice one index name may correspond to multiple reference values. To address this, the invention borrows the blocking algorithms used in knowledge base entity alignment and proposes an index block scoring algorithm based on reference values. Block scoring consists of two parts: first, for each reference value of the standard index, the most similar measured index reference value is found; then, based on these most similar reference values, matching blocks between measured indexes and standard indexes are built.
Specifically, given a measured index x in the cluster, its reference value set is $X_{ref} = \{x_1^{ref}, x_2^{ref}, \ldots\}$, where $x_i^{ref}$ denotes the i-th reference range of the measured index x; the reference value set of the standard index (and its synonymous indexes) is $S_{ref} = \{s_1^{ref}, s_2^{ref}, \ldots\}$, where $s_i^{ref}$ denotes the i-th reference range of the standard index s. For two index reference values $x_{ref}$ and $s_{ref}$, the invention defines a reference value similarity [formula not reproduced in the source text].
For each reference value $s_i^{ref}$ of the standard index, the most similar measured index reference value $x_i^{ref}$ is found in the cluster, i.e. the one with the highest reference value similarity to $s_i^{ref}$, and the two form a reference value pair $(x_i^{ref}, s_i^{ref})$. From the reference value pair, an index set pair $p_i = (X_i, S_i)$ can be built, where $X_i$ is the set of all measured indexes whose reference value is $x_i^{ref}$, and $S_i$ is the set of the standard index and its synonymous indexes whose reference value is $s_i^{ref}$. Given two reference value pairs $(x_1^{ref}, s_1^{ref})$ and $(x_2^{ref}, s_2^{ref})$, a reference value pair similarity is defined [formula not reproduced in the source text],
where $sim_{p\_cos}(X_1, X_2)$ denotes the cosine similarity of the index sets $X_1$ and $X_2$ after both are expressed in one-hot form (different dimensions of a 0-1 vector represent different indexes).
Other features and aspects of the invention will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
After reading the specific embodiments of the invention with reference to the accompanying drawings, the reader will understand the various aspects of the invention more clearly. In the drawings:
Fig. 1 shows a flow diagram of the automatic correction of test index names according to an embodiment of the invention;
Fig. 2 shows a flow diagram of index clustering according to an embodiment of the invention;
Fig. 3 shows a flow diagram of in-cluster binary classification according to an embodiment of the invention;
Fig. 4 is a schematic diagram of computing the reference value pair similarity with the block scoring feature method disclosed by the invention.
Detailed description of the embodiments
To make the technical content disclosed in this application more detailed and complete, reference may be made to the accompanying drawings and the following specific embodiments of the invention, in which identical reference numerals denote identical or similar components. However, those skilled in the art should understand that the embodiments provided below do not limit the scope covered by the invention. In addition, the drawings are only schematic illustrations and are not drawn to scale. Embodiment one of the invention discloses a method for automatically correcting medical test and examination indexes; referring to Fig. 1, the method comprises:
Step S1: perform data preprocessing on the input index set to obtain an index set $D = \{x_1, x_2, \ldots, x_n\}$, $x_i = (x_i^{na}, x_i^{ab}, x_i^{un}, x_i^{ref})$, in which the letter case of index names, the letter case of index abbreviations and the index units are unified, where $x_i^{na}$ denotes the index name of the i-th index, $x_i^{ab}$ its name abbreviation, $x_i^{un}$ its index unit and $x_i^{ref}$ its index reference value;
Step S2: according to the literal features of the index names and index abbreviations in the index set D, cluster the different index names and index abbreviations with a density-based clustering algorithm to obtain the index cluster set $C = \{C_1, C_2, \ldots, C_m\}$, where each index cluster $C_i$ contains index names and index abbreviations;
Step S3: determine a standard index name for each index cluster $C_i$ in the index cluster set C, use a binary classification algorithm to find the synonymous indexes of the in-cluster standard index name, and map the synonymous indexes to the in-cluster standard index name to obtain the index alignment result; for the non-synonymous indexes remaining in index cluster $C_i$, select a new standard index name from the remaining index list and continue the search for synonymous indexes and the index mapping with the binary classification algorithm; repeat step S3 iteratively until all indexes in every cluster are synonymous or only 1 index remains in the cluster;
Step S4: perform manual correction and mapping on the index alignment result to obtain standardized index names.
In particular, to make it easier for medical professionals to post-process and correct the index alignment result, and considering that the standard index should be the most common index, the standard index name is the index name that occurs most frequently in each index cluster $C_i$.
The binary classification algorithm may be any one of gradient boosting decision tree, logistic regression, naive Bayes, support vector machine or random forest.
Embodiment two of the invention discloses a specific method for automatically correcting medical test and examination indexes. Compared with the previous embodiment, this embodiment further explains and optimizes step S2, as shown in Fig. 2. Specifically:
Step S21: given the neighborhood parameters ε and minPts and the index set D;
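Step S22: compute the ε-neighborhood $N_\varepsilon(x_i)$ of $x_i$:
$$N_\varepsilon(x_i) = \{x_j \in D \mid dist(x_i, x_j) \le \varepsilon\}$$
where $dist(x_i, x_j)$ is the joint distance between indexes $x_i$ and $x_j$:
$$dist(x_i, x_j) = \frac{2 \cdot dist_{cos}(x_i^{na}, x_j^{na}) \cdot dist_{med}(x_i^{ab}, x_j^{ab})}{dist_{cos}(x_i^{na}, x_j^{na}) + dist_{med}(x_i^{ab}, x_j^{ab})}$$
Here $dist_{cos}(x_i^{na}, x_j^{na})$ is the cosine distance between the index names $x_i^{na}$ and $x_j^{na}$:
$$dist_{cos}(x_i^{na}, x_j^{na}) = 1 - \frac{x_i^{na} \cdot x_j^{na}}{\lVert x_i^{na} \rVert \, \lVert x_j^{na} \rVert}$$
and $dist_{med}(x_i^{ab}, x_j^{ab})$ is the edit distance between the index abbreviations $x_i^{ab}$ and $x_j^{ab}$:
$$dist_{med}(x_i^{ab}, x_j^{ab}) = \frac{med(x_i^{ab}, x_j^{ab})}{\max(|x_i^{ab}|, |x_j^{ab}|)}$$
where $|x_i^{ab}|$ is the string length of the index abbreviation and $med(x_i^{ab}, x_j^{ab})$ is the minimum number of insertion, substitution and deletion operations required to change $x_i^{ab}$ into $x_j^{ab}$.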
Step S23: add every $x_i$ with $|N_\varepsilon(x_i)| \ge minPts$ to the core object set T;
Step S24: randomly select a core object o from the core object set T and add all indexes in its ε-neighborhood $N_\varepsilon(o)$ to index cluster $C_i$; then compute the ε-neighborhood $N_\varepsilon(x_j)$ of each index in $C_i$, and whenever $|N_\varepsilon(x_j)| \ge minPts$ add all indexes of $N_\varepsilon(x_j)$ to $C_i$, until every index newly added to $C_i$ satisfies $|N_\varepsilon(x_j)| < minPts$; finally, delete all core objects that appear in $C_i$ from the core object set T;
Step S25: repeat step S24 until the core object set T is empty, obtaining the index cluster set $C = \{C_1, C_2, \ldots, C_m\}$.
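Steps S21 to S25 can be sketched in Python as below. This is a minimal, unoptimized sketch rather than a definitive implementation: the function name, the `dist` callable and the `seed` parameter are assumptions introduced here, and any distance function (such as a joint distance over names and abbreviations) can be passed in.

```python
import random


def dbscan(points, dist, eps, min_pts, seed=0):
    """Cluster `points` following steps S21 to S25; returns (clusters, outliers).

    Clusters and outliers are lists of positions in `points`.
    """
    n = len(points)
    # S22: the eps-neighborhood of every point (a point is its own neighbor).
    neighbors = [
        {j for j in range(n) if dist(points[i], points[j]) <= eps}
        for i in range(n)
    ]
    # S23: core objects have at least min_pts points in their neighborhood.
    core = {i for i in range(n) if len(neighbors[i]) >= min_pts}
    rng = random.Random(seed)
    unvisited = set(range(n))
    remaining_core = set(core)
    clusters = []
    while remaining_core:  # S25: repeat until no core object is left
        # S24: grow one cluster outward from a randomly chosen core object.
        start = rng.choice(sorted(remaining_core))
        cluster = {start}
        queue = [start]
        unvisited.discard(start)
        while queue:
            p = queue.pop()
            if p not in core:
                continue  # non-core points join the cluster but do not expand it
            for q in neighbors[p] & unvisited:
                unvisited.discard(q)
                cluster.add(q)
                queue.append(q)
        remaining_core -= cluster
        clusters.append(sorted(cluster))
    # Points never reached from any core object are outliers.
    return clusters, sorted(unvisited)
```

The neighborhood computation is quadratic in the number of indexes, which keeps the sketch short; the returned outliers are the points that the optional outlier step would place in clusters of their own.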
It should be specifically noted that a step S26 may be added after step S25 of this embodiment: perform index unit verification on the indexes of each cluster in $C = \{C_1, C_2, \ldots, C_m\}$, splitting indexes with different index units into different clusters.
A step S27 may also be added after step S25 of embodiment two: for each outlier index that is not in any cluster, establish a separate cluster.
Performance is best when the neighborhood parameter ε is 0.35 and minPts is 3. As shown in Table 1, the F1-score of the density-based clustering algorithm of the invention is clearly higher than that of k-means clustering, the mean shift algorithm, Gaussian mixture models and agglomerative hierarchical clustering, with an improvement of more than 10%.
Table 1: Performance comparison of different clustering algorithms
Embodiment three of the invention discloses a specific method for automatically correcting medical test and examination indexes. Compared with the previous embodiment, this embodiment further explains and optimizes step S3, as shown in Fig. 3. Specifically:
Step S31: determine a standard index name for each index cluster $C_i$ in the index cluster set C;
Step S32: apply data augmentation to the standard index name using existing knowledge bases to obtain the synonymous indexes of the standard index name;
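Step S33: for the index names and index abbreviations in the index cluster, obtain the similarity score $sim_{lcs}(x_{na}, S_{na})$ with the longest common subsequence similarity method, the similarity score $sim_{jac}(x_{na}, S_{na})$ with the Jaccard similarity method, the similarity score $sim_{cos}(x_{na}, S_{na})$ with the cosine similarity method and the similarity score $sim_{med}(x_{na}, S_{na})$ with the edit similarity method; obtain the block score for the index reference values in the index cluster with the block scoring feature method; and use $sim_{lcs}(x_{na}, S_{na})$, $sim_{jac}(x_{na}, S_{na})$, $sim_{cos}(x_{na}, S_{na})$, $sim_{med}(x_{na}, S_{na})$ and the block score as features for the binary classification algorithm to judge whether an index is a synonymous index of the standard index name, where
$$sim_{lcs}(x_{na}, S_{na}) = \max_{1 \le i \le n} \frac{|lcs(x_{na}, s_i^{na})|}{\min(|x_{na}|, |s_i^{na}|)}$$
$$sim_{jac}(x_{na}, S_{na}) = \max_{1 \le i \le n} \frac{|x_{na} \cap s_i^{na}|}{|x_{na} \cup s_i^{na}|}$$
$$sim_{cos}(x_{na}, S_{na}) = \max_{1 \le i \le n} \frac{x_{na} \cdot s_i^{na}}{\lVert x_{na} \rVert \, \lVert s_i^{na} \rVert}$$
$$sim_{med}(x_{na}, S_{na}) = \max_{1 \le i \le n} \left(1 - \frac{med(x_{na}, s_i^{na})}{\max(|x_{na}|, |s_i^{na}|)}\right)$$
Here $S_{na} = \{s_1^{na}, s_2^{na}, \ldots, s_n^{na}\}$ is the set of names of the standard index and its synonymous indexes, n is the total number of the standard index and its synonymous indexes, $|x_{na}|$ is the string length of the measured index name, $lcs(x_{na}, s_i^{na})$ denotes the longest common subsequence of the two index names, and $med(x_{na}, s_i^{na})$ is the minimum number of insertion, substitution and deletion operations required to change $x_{na}$ into $s_i^{na}$;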
Step S34: map the synonymous indexes to the standard index name to obtain the index alignment result.
Because it is difficult for medical professionals to enumerate all synonymous indexes unaided, and because some indexes may have synonyms unrelated to their names (for example "B-type natriuretic peptide" and "brain natriuretic peptide"), the data set is built not only from synonymous indexes manually labeled by medical professionals for classifier training, but also from synonyms of the standard indexes extracted through 3 channels: the SNOMED CT knowledge base, the LOINC knowledge base and Baidu Baike. The SNOMED CT knowledge base is entirely in English and currently has no Chinese version, so translation tools such as Baidu Translate, Tencent Translate and iCIBA are needed to translate the English indexes into Chinese indexes. Even for the same index, different translation tools may give different translations, so translation itself is also one way of obtaining synonyms. Table 2 gives an example of the synonymous indexes of "B-type natriuretic peptide" after data augmentation.
Table 2: Examples of synonymous indexes
Further, the process of obtaining the block score for the index reference values in the index cluster with the block scoring feature method comprises:
For each reference value $s_i^{ref}$ of the standard index s, find in the cluster the most similar measured index reference value $x_i^{ref}$, i.e. the one with the highest reference value similarity to $s_i^{ref}$, and form the reference value pair $(x_i^{ref}, s_i^{ref})$ [reference value similarity formula not reproduced in the source text];
From the reference value pair $(x_i^{ref}, s_i^{ref})$, build the index set pair $p_i = (X_i, S_i)$, where $X_i$ is the set of all measured indexes whose reference value is $x_i^{ref}$, and $S_i$ is the set of the standard index and its synonymous indexes whose reference value is $s_i^{ref}$;
Compute the reference value pair similarity of two reference value pairs [formula not reproduced in the source text],
where $sim_{p\_cos}(X_1, X_2)$ denotes the cosine similarity of the index sets $X_1$ and $X_2$ after both are expressed as 0-1 vectors;
When the similarity of two reference value pairs is greater than the threshold θ, place the measured index sets $X_1$, $X_2$ and the standard index sets $S_1$, $S_2$ in the same block $B_i$;
Score each block $B_i$ to obtain the block score $score_i$ [formula not reproduced in the source text];
the score depends on the proportion of standard indexes in the block, where S' is the set of all standard indexes and α is a weight parameter, and all indexes in block $B_i$ share the same score $score_i$;
When an index appears in M blocks simultaneously, compute its weighted average score according to the weights $\beta_i$ of the different blocks: $score = \sum_{i=1}^{M} \beta_i \cdot score_i$;
When an index does not appear in any block, its block score is 0.
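The last two rules, the weighted average over M blocks and the zero score for an index that is in no block, can be sketched as follows. The function name and argument layout are assumptions for illustration; the default weight of 1/M for each block matches the value of $\beta_i$ reported in the experiments.

```python
def index_block_score(block_scores, weights=None):
    """Weighted average block score of one index.

    block_scores -- scores of the M blocks the index appears in
    weights      -- weight beta_i of each block; defaults to 1/M for every block
    """
    m = len(block_scores)
    if m == 0:
        return 0.0  # the index is not in any block
    if weights is None:
        weights = [1.0 / m] * m
    return sum(beta * score for beta, score in zip(weights, block_scores))
```

With the default weights the result is simply the arithmetic mean of the block scores.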
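As shown in Fig. 4, the standard reference value $s_1^{ref}$ is the interval [0, 100] and its most similar measured reference value $x_1^{ref}$ is the interval [0, 100], so the corresponding index set pair is $p_1 = (X_1, S_1) = (\{A, B\}, \{a, b\})$. Similarly, the index set pair corresponding to the standard reference value $s_2^{ref}$ is $p_2 = (X_2, S_2) = (\{A, B, C\}, \{a, b\})$.
The reference value pair similarity of $p_1$ and $p_2$ is then computed from these sets [worked result not reproduced in the source text].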
With the threshold θ set to 0.7, α to 0.6 and $\beta_i$ to 1/M, the experimental results are shown in Table 3, where the feature fields name, abbreviation and reference value denote the name similarity feature, the abbreviation similarity feature and the reference value block scoring feature, respectively.
Table 3: Performance comparison of different classification algorithms
Table 3 shows that classification is best when the name similarity feature, the abbreviation similarity feature and the reference value block scoring feature are used together with a GBDT classifier, with an F1 value of 85.26%. For essentially every classifier, classification improves as the number of features increases and is best when all three classes of features are used.
The specific embodiments of the invention have been described above with reference to the accompanying drawings. However, those skilled in the art will understand that various changes and substitutions can be made to the specific embodiments without departing from the spirit and scope of the invention. Such changes and substitutions all fall within the scope defined by the claims of the invention.

Claims (10)

1. a kind of medical test Index for examination modified method automatically, which is characterized in that the described method comprises the following steps:
Step S1 carries out data prediction to the index set of input, and the capital and small letter for obtaining index name is unified, index abbreviation Index set D={ the x that capital and small letter is unified, index unit is unified1,x2,...,xn}, Wherein,Indicate the index name of i-th of index,Indicate the title abbreviation of i-th of index,Indicate i-th of index Index unit,Indicate the index reference value of i-th of index;
Step S2: based on the literal features of the index names and index abbreviations in the index set $D$, cluster the different index names and index abbreviations with a density-based clustering algorithm to obtain the index cluster set $C = \{C_1, C_2, \ldots, C_m\}$, where each index cluster $C_i$ contains index names and index abbreviations;
Step S3: for each index cluster $C_i$ in the index cluster set $C$, determine a standard index name, use a binary classification algorithm to find the synonymous indices of the standard index name within the cluster, and map the synonymous indices to the standard index name of the cluster to obtain an index alignment result; for the remaining non-synonymous indices in index cluster $C_i$, select a new standard index name from the list of remaining indices and continue the synonym search and index mapping with the binary classification algorithm; step S3 is repeated iteratively until every cluster contains only synonymous indices or only one index remains in the cluster;
Step S4: manually correct and map the index alignment result to obtain standardized index names.
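The preprocessing of step S1 can be illustrated with a minimal sketch. It assumes each index is a record with a name, an abbreviation, a unit, and a reference value. The field names, the choice of upper case, and the `UNIT_ALIASES` table are hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass

# Hypothetical alias table: maps lower-cased unit spellings to one canonical form.
UNIT_ALIASES = {"g/l": "g/L", "mmol/l": "mmol/L", "u/l": "U/L"}


@dataclass(frozen=True)
class Index:
    name: str   # index name
    abbr: str   # name abbreviation
    unit: str   # index unit
    ref: str    # index reference value, kept as text here


def preprocess(indices):
    """Unify the letter case of names and abbreviations and normalize units."""
    result = []
    for x in indices:
        unit = x.unit.strip()
        result.append(Index(
            name=x.name.strip().upper(),
            abbr=x.abbr.strip().upper(),
            # Units missing from the alias table are left unchanged.
            unit=UNIT_ALIASES.get(unit.lower(), unit),
            ref=x.ref.strip(),
        ))
    return result
```

Keeping the records immutable means the raw input stays available for the manual correction of step S4.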
2. The method according to claim 1, characterized in that the step of clustering the different index names and index abbreviations with a density-based clustering algorithm, based on the literal features of the index names and index abbreviations in the index set $D$, to obtain the index cluster set $C = \{C_1, C_2, \ldots, C_m\}$ specifically comprises:
Step S21: given the neighborhood parameters ε and minPts and the index set $D$;
Step S22: compute the ε-neighborhood $N_\varepsilon(x_i)$ of $x_i$, where
$N_\varepsilon(x_i) = \{x_j \in D \mid \mathrm{dist}(x_i, x_j) \le \varepsilon\}$
and $\mathrm{dist}(x_i, x_j)$ is the joint distance between indices $x_i$ and $x_j$. The joint distance is defined in terms of two components: the cosine distance between the index names of $x_i$ and $x_j$, and the edit distance between their index abbreviations. The edit distance expression uses the string lengths of the index abbreviations and the minimum number of insertion, replacement, and deletion operations required to transform one abbreviation into the other;
Step S23: add every $x_i$ with $|N_\varepsilon(x_i)| \ge \mathrm{minPts}$ to the core object set $T$;
Step S24: randomly select a core object $o$ from the core object set $T$ and add all indices in its ε-neighborhood $N_\varepsilon(o)$, for which $|N_\varepsilon(o)| \ge \mathrm{minPts}$, to index cluster $C_i$; then compute the ε-neighborhood $N_\varepsilon(x_j)$ of each index in $C_i$, and whenever $|N_\varepsilon(x_j)| \ge \mathrm{minPts}$, add all indices in $N_\varepsilon(x_j)$ to $C_i$, until every index newly added to $C_i$ satisfies $|N_\varepsilon(x_j)| < \mathrm{minPts}$; finally, delete from the core object set $T$ all core objects that appear in $C_i$;
Step S25: repeat step S24 to obtain the index cluster set $C = \{C_1, C_2, \ldots, C_m\}$.
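Steps S21 to S25 follow the usual pattern of density-based clustering and can be sketched as below. The joint distance expression is not available here, so this sketch takes `dist` as a caller-supplied function. The names `neighborhood` and `density_cluster` and the `seed` parameter are illustrative assumptions.

```python
import random


def neighborhood(D, i, dist, eps):
    """Positions j with dist(D[i], D[j]) <= eps (includes i when dist(x, x) == 0)."""
    return {j for j in range(len(D)) if dist(D[i], D[j]) <= eps}


def density_cluster(D, dist, eps, min_pts, seed=0):
    """Return clusters as sorted lists of positions in D; outliers are omitted."""
    rng = random.Random(seed)
    nbrs = [neighborhood(D, i, dist, eps) for i in range(len(D))]   # S22
    core = {i for i in range(len(D)) if len(nbrs[i]) >= min_pts}    # S23
    remaining_core = set(core)
    assigned = set()
    clusters = []
    while remaining_core:                                           # S25
        o = rng.choice(sorted(remaining_core))                      # S24
        cluster = (nbrs[o] - assigned) | {o}
        queue = list(cluster)
        while queue:
            j = queue.pop()
            if j in core:
                # Expand the cluster through the neighborhood of a core object.
                new = nbrs[j] - cluster - assigned
                cluster |= new
                queue.extend(new)
        clusters.append(sorted(cluster))
        assigned |= cluster
        remaining_core -= cluster
    return clusters
```

For example, calling `density_cluster` on a list of numbers with `dist=lambda a, b: abs(a - b)`, `eps=0.35`, and `min_pts=3` groups values that lie close together and leaves isolated values unclustered.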
3. The method according to claim 2, characterized in that the following step is added after step S25:
Step S26: verify the index units of the indices in each cluster of $C = \{C_1, C_2, \ldots, C_m\}$, and split indices with different index units into different clusters.
4. The method according to claim 2, characterized in that the following step is added after step S25:
Step S27: create a separate cluster for each outlier index that does not belong to any cluster.
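The two post-processing steps S26 and S27 can be sketched as simple list operations. The function names and the `unit_of` accessor are hypothetical; clusters are assumed to be lists of hashable items.

```python
from collections import defaultdict


def split_by_unit(clusters, unit_of):
    """S26: split each cluster so that all of its members share one unit."""
    result = []
    for cluster in clusters:
        groups = defaultdict(list)
        for item in cluster:
            groups[unit_of(item)].append(item)
        result.extend(groups.values())
    return result


def add_outlier_clusters(clusters, all_items):
    """S27: give every item that is in no cluster its own singleton cluster."""
    clustered = {item for cluster in clusters for item in cluster}
    return clusters + [[item] for item in all_items if item not in clustered]
```

Running the unit split first keeps indices with the same name but incompatible units from being aligned to one standard name later.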
5. The method according to claim 2, characterized in that the neighborhood parameter ε is 0.35 and minPts is 3.
6. The method according to claim 1, characterized in that the process of step S3 comprises:
Step 31: for each index cluster $C_i$ in the index cluster set $C$, determine a standard index name;
Step 32: perform data augmentation on the standard index name according to an existing knowledge base to obtain synonymous indices of the standard index name;
Step 33: for the index names and index abbreviations in an index cluster, obtain the similarity score $\mathrm{sim}_{lcs}(x^{na}, S^{na})$ with the longest common subsequence similarity method, $\mathrm{sim}_{jac}(x^{na}, S^{na})$ with the Jaccard similarity method, $\mathrm{sim}_{cos}(x^{na}, S^{na})$ with the cosine similarity method, and $\mathrm{sim}_{med}(x^{na}, S^{na})$ with the edit-distance similarity method; for the index reference values in the index cluster, obtain the block score $score'$ with the block scoring feature method; then use these four similarity scores and the block score as features for a binary classification algorithm that judges whether an index is a synonymous index of the standard index name. In the similarity expressions, the subscript $n$ is the total number of the standard index and its synonymous indices, $|x^{na}|$ is the string length of the measured index name, the longest common subsequence is taken over the two index names, and the edit distance is the minimum number of insertion, replacement, and deletion operations required to transform one index name into the other;
Step 34: map the synonymous indices to the standard index name to obtain the index alignment result.
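The four string similarity features of step 33 can be sketched as follows. The exact normalizations used by the method are not available here, so this sketch assumes common character-level forms: LCS length and edit distance divided by the longer string length, Jaccard over character sets, and cosine over character counts. These normalizations and the helper names are assumptions.

```python
from collections import Counter
from math import sqrt


def lcs_len(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def edit_distance(a, b):
    """Minimum number of insertions, replacements, and deletions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def sim_lcs(a, b):
    return lcs_len(a, b) / max(len(a), len(b), 1)


def sim_jac(a, b):
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def sim_cos(a, b):
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def sim_med(a, b):
    return 1 - edit_distance(a, b) / max(len(a), len(b), 1)


def name_features(name, std_name):
    """Feature vector for one (measured name, standard name) pair."""
    return [f(name, std_name) for f in (sim_lcs, sim_jac, sim_cos, sim_med)]
```

In the claimed method these scores, together with the block score from the reference values, are the inputs to the binary classifier.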
7. The method according to claim 6, characterized in that the process of obtaining the block score with the block scoring feature method comprises:
For each reference value of the standard index $s$, find in the cluster the reference value of the most similar measured index, and form the two reference values into a reference value pair, where similarity is measured by the reference value similarity;
According to each reference value pair, construct an index set pair $p_i = (X_i, S_i)$, where $X_i$ is the set of all measured indices whose reference value is the measured reference value of the pair, and $S_i$ is the set of the standard index and its synonymous indices whose reference value is the standard reference value of the pair;
Compute the similarity between two reference value pairs, where $\mathrm{sim}_{p\_cos}(X_1, X_2)$ denotes the cosine similarity of index sets $X_1$ and $X_2$ after both are represented as 0-1 vectors;
When the similarity of two reference value pairs is greater than the threshold θ, place the measured index sets $X_1$, $X_2$ and the standard index sets $S_1$, $S_2$ in the same block $B_i$;
Score each block $B_i$ to obtain the block score $score_i$, whose expression involves the proportion of standard indices in the block, the set $S'$ of all standard indices, and the weight parameter α; all indices in block $B_i$ share the same score $score_i$;
When an index appears in $M$ blocks at the same time, compute the weighted average score of the index according to the weights $\beta_i$ of the different blocks;
When an index does not appear in any block, its block score $score'$ is 0.
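Two pieces of the block scoring procedure can be illustrated in isolation: the cosine similarity of two index sets encoded as 0-1 vectors, and the weighted score of an index that appears in several blocks. The full pair-similarity and block-score expressions are not reproduced, so this is only a partial sketch. The function names are hypothetical, and the weighted score is assumed to be the sum of $\beta_i \cdot score_i$ with $\beta_i = 1/M$ by default.

```python
from math import sqrt


def set_cosine(x1, x2):
    """Cosine similarity of two sets encoded as 0-1 membership vectors."""
    if not x1 or not x2:
        return 0.0
    # The dot product counts shared members; each norm is sqrt(set size).
    return len(x1 & x2) / sqrt(len(x1) * len(x2))


def index_block_score(block_scores, weights=None):
    """Weighted score of an index over the M blocks it appears in.

    Assumption: weights default to 1/M each; an index in no block scores 0.
    """
    m = len(block_scores)
    if m == 0:
        return 0.0
    if weights is None:
        weights = [1 / m] * m
    return sum(w * s for w, s in zip(weights, block_scores))
```

For the sets {A, B} and {A, B, C}, `set_cosine` evaluates to 2 / sqrt(6), roughly 0.816.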
8. the method for claim 7, which is characterized in that the threshold values θ is 0.7, and the α is 0.6, the βiFor 1/M.
9. the method as described in claim 1, which is characterized in that entitled each the index cluster C of the standard indexiInterior appearance The most index name of the frequency.
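The frequency-based choice of a standard index name, combined with the iteration of step S3, can be sketched as follows. The `is_synonym` predicate stands in for the binary classifier and is a hypothetical placeholder, as are the function names.

```python
from collections import Counter


def pick_standard_name(names):
    """Most frequent name in the list; ties go to the first one encountered."""
    return Counter(names).most_common(1)[0][0]


def align_cluster(names, is_synonym):
    """Map each name in one cluster to a standard name, iterating on the rest."""
    mapping = {}
    remaining = list(names)
    while remaining:
        standard = pick_standard_name(remaining)
        rest = []
        for name in remaining:
            if name == standard or is_synonym(name, standard):
                mapping[name] = standard
            else:
                # Non-synonymous names are retried against a new standard name.
                rest.append(name)
        remaining = rest
    return mapping
```

Each pass removes at least the chosen standard name from the remaining list, so the loop always terminates.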
10. The method according to claim 1, characterized in that the binary classification algorithm is any one of gradient boosting decision tree (GBDT), logistic regression, naive Bayes, support vector machine, or random forest.
CN201811204706.4A 2018-10-16 2018-10-16 A kind of medical test Index for examination modified method automatically Withdrawn CN109509517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811204706.4A CN109509517A (en) 2018-10-16 2018-10-16 A kind of medical test Index for examination modified method automatically

Publications (1)

Publication Number Publication Date
CN109509517A true CN109509517A (en) 2019-03-22

Family

ID=65746696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811204706.4A Withdrawn CN109509517A (en) 2018-10-16 2018-10-16 A kind of medical test Index for examination modified method automatically

Country Status (1)

Country Link
CN (1) CN109509517A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111104400A (en) * 2019-12-24 2020-05-05 天津新开心生活科技有限公司 Data normalization method and device, electronic equipment and storage medium
CN111860359A (en) * 2020-07-23 2020-10-30 江苏食品药品职业技术学院 Point cloud classification method based on improved random forest algorithm
CN112768058A (en) * 2021-01-22 2021-05-07 武汉大学 Method and device for processing medical data of metering information type

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106933806A (en) * 2017-03-15 2017-07-07 北京大数医达科技有限公司 The determination method and apparatus of medical synonym
CN107093005A (en) * 2017-03-24 2017-08-25 北明软件有限公司 The method that tax handling service hall's automatic classification is realized based on big data mining algorithm
CN107818124A (en) * 2017-03-03 2018-03-20 平安医疗健康管理股份有限公司 Data matching method and device
US20180089300A1 (en) * 2016-09-23 2018-03-29 International Business Machines Corporation Merging synonymous entities from multiple structured sources into a dataset
CN108491406A (en) * 2018-01-23 2018-09-04 深圳市阿西莫夫科技有限公司 Information classification approach, device, computer equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YAN ZHUANG ET AL.: ""Hike:A hybrid human-machine method for entity Alignment in Large-Scale Knowledge Bases"", 《PROCEEDINGS OF THE 2017 ACM ON CONF ON INFORMATION AND KNOWLEDGE MANAGEMENT》 *
周保兴著: "《三维激光扫描技术及其在变形监测中的应用》", 31 January 2018 *
朱灿等: "实体解析技术综述与展望", 《计算机科学》 *
栗伟等: "一种面向医学短文本的自适应聚类方法", 《东北大学学报(自然科学版)》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190322