CN111709492A - Dimension reduction visualization method and device for high-dimensional electronic medical record list and storage medium - Google Patents

Info

Publication number
CN111709492A
Authority
CN
China
Prior art keywords
dimensional
medical record
space
data
projection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010632086.5A
Other languages
Chinese (zh)
Inventor
李雪
王新琪
来关军
于丹
孙箫宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Neusoft Education Technology Group Co ltd
Original Assignee
Dalian Neusoft Education Technology Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Neusoft Education Technology Group Co ltd
Priority to CN202010632086.5A
Publication of CN111709492A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2132 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on discrimination criteria, e.g. discriminant analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Primary Health Care (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Epidemiology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention provides a dimension reduction visualization method and device for a high-dimensional electronic medical record table, and a storage medium. The method comprises the following steps: performing data preprocessing on the medical record table to obtain an input data matrix; extracting discriminant features of the patient record data to obtain a discriminant feature data set; transforming the high-dimensional feature data into a two-dimensional visual space to obtain the projection coordinates of the feature data in that space; and displaying the visualization result of the electronic medical record by drawing a scatter diagram of the medical record data in the two-dimensional space. By combining discriminative features, which are extracted by a neural network model and reflect the inherent regularities of the data, with dimension reduction through linear discriminant analysis, the invention effectively improves class separation in the visual space and improves the generalization of the projection for cases of patients to be diagnosed.

Description

Dimension reduction visualization method and device for high-dimensional electronic medical record list and storage medium
Technical Field
The invention relates to the technical field of intelligent disease diagnosis, in particular to a dimension reduction visualization method and device for a high-dimensional electronic medical record table and a storage medium.
Background
With the rise of artificial intelligence, deep learning is more and more widely applied in the biomedical field.
In the biomedical field, more and more monitoring data can be obtained from different types of sensing equipment, which gives medical workers a more complete data basis for understanding the state of a patient, distinguishing diseases, and developing a treatment plan. In the field of artificial intelligence, deep learning is a commonly used method for modeling high-dimensional big data. The electronic medical record of a patient is mapped repeatedly by the neurons of the hidden layers in the middle of the neural network to extract feature information (an embedding) of the high-dimensional data, and different types of patients are finally distinguished. However, the neural network model behaves like a black box, and its mapping process is difficult to analyze. Moreover, because the features are sparsely distributed in a high-dimensional space, the similarity and density distribution among different types of samples cannot be observed directly.
Existing work on high-dimensional medical data visualization falls mainly into linear dimension reduction methods and nonlinear dimension reduction methods (manifold learning methods).
Common nonlinear dimension reduction methods include isometric mapping (Isomap), Laplacian eigenmaps (LE), t-distributed stochastic neighbor embedding (t-SNE), and the like. Such work is generally used to visualize medical image data such as CT and MRI: images of patients with different types of diseases, or high-dimensional vectors from the middle layers of a convolutional network, are mapped to a low-dimensional visual space through a nonlinear transformation, and the local neighbor relations of the original data are preserved as far as possible during the mapping. However, for some numerical medical record tables, the visualization results obtained in this way do not clearly separate different types of patients and do not meet the requirement of assisting doctors in diagnosis. In addition, nonlinear dimension reduction algorithms such as t-SNE require iterative optimization to solve, so the running time is long and real-time visual display is difficult.
Disclosure of Invention
To address the problem that nonlinear dimension reduction visualizes some numerical medical record data poorly, so that the resulting visual space does not clearly separate different types of patients, a dimension reduction visualization method for a high-dimensional electronic medical record table is provided. The invention applies a spatial transformation, based on the LDA dimension reduction algorithm, to the high-dimensional record features taken from a middle layer of a neural network model, and displays the distribution of the electronic medical records as a two-dimensional scatter diagram, so that records with similar features lie close to each other in the projection space and records of different disease types are separated from each other.
The technical means adopted by the invention are as follows:
A dimension reduction visualization method for a high-dimensional electronic medical record table comprises the following steps:
acquiring a medical record table, and performing data preprocessing on the medical record table to obtain a high-dimensional record data matrix;
performing multilayer nonlinear mapping on the high-dimensional record data matrix through a neural network classification model, and acquiring a discriminant feature data set from the output of a hidden layer of the neural network classification model;
projecting the discriminant feature data in the discriminant feature data set to a two-dimensional visual space by linear discriminant analysis to obtain the projection coordinates of the discriminant feature data in the two-dimensional space;
drawing a two-dimensional scatter diagram according to the projection coordinates of the discriminant feature data in the two-dimensional space;
and calculating the projection coordinates of the record of the patient to be diagnosed, and drawing the projection point of that record on the two-dimensional scatter diagram.
Further, the data preprocessing of the medical record table comprises:
performing Z-score standardization on the continuous numerical characteristics in the medical record table;
carrying out one-hot coding on discrete numerical characteristics in the medical record table;
the text type features in the medical record list are converted into numerical discrete features and then are subjected to one-hot coding.
Further, the discriminant feature is an output of a last hidden layer of the neural network classification model.
Further, the converting the high-dimensional feature data into a two-dimensional visual space includes:
calculating an intra-class divergence matrix and an inter-class divergence matrix of the case characteristic data set;
according to the intra-class divergence matrix and the inter-class divergence matrix, optimally solving a transformation space basis vector;
performing coordinate transformation on the features in the case feature data set by using the space basis vectors, solving projection space coordinates, and performing coordinate normalization;
and saving the recorded projection coordinates and the recorded label.
A dimension reduction visualization apparatus for a high-dimensional electronic medical record form, comprising:
the preprocessing unit is used for acquiring a medical record table and performing data preprocessing on the medical record table to obtain a high-dimensional record data matrix;
the extraction unit is used for performing multilayer nonlinear mapping on the high-dimensional record data matrix through a neural network classification model and acquiring a discriminant feature data set from the output of a hidden layer of the neural network classification model;
the conversion unit is used for projecting the discriminant feature data in the discriminant feature data set to a two-dimensional visual space by linear discriminant analysis to obtain the projection coordinates of the discriminant feature data in the two-dimensional space;
and the display unit is used for drawing a two-dimensional scatter diagram according to the projection coordinates of the discriminant feature data in the two-dimensional space.
Further, the preprocessing unit performs data preprocessing on the medical record table, including:
performing Z-score standardization on the continuous numerical characteristics in the medical record table;
carrying out one-hot coding on discrete numerical characteristics in the medical record table;
the text type features in the medical record list are converted into numerical discrete features and then are subjected to one-hot coding.
Further, the extraction unit extracts the output of the last hidden layer of the neural network classification model as discriminant features.
Further, the conversion unit converts the high-dimensional feature data into a two-dimensional visual space, including:
calculating an intra-class divergence matrix and an inter-class divergence matrix of the case characteristic data set;
according to the intra-class divergence matrix and the inter-class divergence matrix, optimally solving a transformation space basis vector;
performing coordinate transformation on the features in the case feature data set by using the space basis vectors, solving projection space coordinates, and performing coordinate normalization;
and saving the recorded projection coordinates and the recorded label.
A computer-readable storage medium having a set of computer instructions stored therein; the set of computer instructions, when executed by a processor, implements the dimension reduction visualization method for a high-dimensional electronic medical record table as described above.
Compared with the prior art, the invention has the following advantages:
1. The invention improves the projection accuracy and generalization of high-dimensional medical records. Compared with directly performing dimension-reduction projection on the original data, or projecting after PCA dimension reduction, the method uses features that are extracted by a neural network model and reflect the inherent regularities of the data, and then projects the data to a two-dimensional visual space with the LDA dimension reduction algorithm. This effectively improves on the ability of existing methods to distinguish different types of data in the visual space, and improves the generalization of the projection for cases of patients to be diagnosed.
2. The technology of the invention makes it easier for non-technical personnel to obtain richer case information. The invention projects the feature vectors of the neural network hidden layer to a two-dimensional space that can be understood intuitively by eye, so that each high-dimensional data sample falls on a two-dimensional plane and each point on the plane corresponds to a patient sample in the medical record table. The user can observe which disease region a point falls in and which type of disease it is closest to, and this can serve as a further basis for diagnosis.
3. The method is fast to compute and easy to embed in a medical diagnosis system. The invention uses a linear dimension reduction method for the data space conversion; compared with nonlinear dimension reduction methods, no iteration is needed when solving the projection space basis vectors, and the amount of calculation is small.
In conclusion, the technical scheme of the invention overcomes the defects in the prior art, and provides the electronic case visualization method which can provide visual explanation for the deep learning model classification result and assist non-technical personnel such as medical workers to know richer patient case information.
For the above reasons, the present invention can be widely applied to the fields of biological medicine and the like.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart illustrating a method for dimension reduction visualization of a high-dimensional electronic medical record table in an embodiment.
FIG. 2 is a diagram illustrating data flow in an embodiment.
FIG. 3 shows the results of a visual analysis using the method of the present invention in an example.
FIG. 4 is a visualization analysis result of dimension reduction using LDA algorithm in the embodiment.
FIG. 5 is a result of a visual analysis using t-SNE nonlinear dimension reduction in the embodiment.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Currently, in the field of artificial intelligence disease diagnosis, a visualization method of an electronic case is needed to provide an intuitive explanation for deep learning model classification results and assist non-technical personnel such as medical workers to know more abundant patient case information, such as the similarity between a current patient and a previous case, the possibility that the patient belongs to different types of diseases, and the like. The medical staff can intuitively sense the overall distribution condition of the electronic medical record of the patient, know the similarity degree of the case of the patient to be diagnosed and the previous case, and provide a basis for making a diagnosis and treatment scheme.
In view of the above, the invention provides a dimension reduction visualization method for a high-dimensional electronic medical record form, which performs spatial transformation of high-dimensional electronic record features on data features of a middle layer of a neural network model based on an LDA dimension reduction algorithm, displays the distribution condition of electronic medical records in a two-dimensional scatter diagram form, enables records with similar features in a projection space to be close to each other, and distinguishes records of different disease types from each other. The method comprises the steps of firstly, carrying out data preprocessing on an electronic medical record table, extracting discriminant features of patient record data through a neural network classification model, then converting high-dimensional feature data into a two-dimensional visual space through a space conversion method, and displaying a visual result of the patient electronic medical record through drawing a scatter diagram of the medical record data in the two-dimensional space. The method specifically comprises the following steps: performing data preprocessing on the medical record table to obtain an input data matrix; extracting discriminant features of patient record data, and performing multilayer nonlinear mapping on an input data matrix through a neural network classification model to obtain a discriminant feature data set; converting the high-dimensional feature data into a two-dimensional visual space, and performing feature space transformation on the distinguishing feature data set and the recorded category labels to obtain projection coordinates of the distinguishing feature data set in the two-dimensional space; and displaying the visualization result of the electronic medical record, and drawing a scatter diagram of the medical record data in the two-dimensional space for displaying.
The embodiments of the present invention will be described in detail with reference to the accompanying drawings.
A dimension reduction visualization method for a high-dimensional electronic medical record table, as shown in fig. 1-2, comprising the steps of:
S1: Data preprocessing. The original electronic medical record table is the input of this step, and the output after data processing is a high-dimensional record data matrix, denoted X.
Specifically, Z-score normalization may be applied to the continuous numerical features in the record table, and one-hot encoding may be applied to the discrete numerical features. Text-type features can first be converted into numerical discrete features and then one-hot encoded.
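The preprocessing described in S1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the column names (`age`, `sex`), the dictionary-of-columns table layout, and the helper names `z_score`, `one_hot`, and `preprocess` are hypothetical. Text-type values are one-hot encoded directly, which gives the same result as first mapping them to integer codes.

```python
import math

def z_score(values):
    """Standardize a continuous column to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # guard against constant columns
    return [(v - mean) / std for v in values]

def one_hot(values):
    """One-hot encode a discrete or text column; categories are sorted for a stable order."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return [[1.0 if index[v] == j else 0.0 for j in range(len(categories))]
            for v in values]

def preprocess(table, continuous, discrete):
    """Build the record matrix X (one row per record) from a dict of columns."""
    n = len(next(iter(table.values())))
    rows = [[] for _ in range(n)]
    for name in continuous:
        for row, v in zip(rows, z_score(table[name])):
            row.append(v)
    for name in discrete:
        for row, enc in zip(rows, one_hot(table[name])):
            row.extend(enc)
    return rows

# Hypothetical three-record table with one continuous and one text column.
table = {"age": [30.0, 50.0, 70.0], "sex": ["F", "M", "F"]}
X = preprocess(table, continuous=["age"], discrete=["sex"])
```

Each row of `X` holds the standardized continuous values followed by the one-hot blocks, so the matrix width grows with the number of distinct categories.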
S2: Acquisition of discriminant features. The inputs of this step are the preprocessed high-dimensional record data matrix X and a neural network classification model, and the output is the extracted discriminant feature data set F. First, a trained neural network classification model is loaded. The high-dimensional record data matrix X is fed to the model as a matrix, and the model applies a multilayer nonlinear mapping to it. Features that represent the intrinsic regularities of the data and that discriminate between diseases are then obtained from the output of the hidden layer.
Specifically, the output of the last hidden layer of the neural network for each input high-dimensional record is taken as that record's feature vector, and the matrix formed by the feature vectors of all input records is taken as the feature data set F. Assume that the pre-trained neural network discriminant model has L layers (including the input layer and the output layer), so that the mapping function of the first L-1 layers of the model is $\phi(\cdot)$. Assuming that the extracted discriminant feature is M-dimensional, each record x is mapped to a feature vector
$$f = \phi(x) \in \mathbb{R}^{M}$$
The feature extraction formula of the neural network for the whole data matrix is:
$$F = \phi(X)$$
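As an illustration of F = φ(X), the sketch below runs a forward pass through every layer except the final output layer of a small fully connected network. The text does not specify the architecture, so the ReLU activation, the layer widths, and the random weights standing in for a trained model are all assumptions; `hidden_features` is a hypothetical helper name.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def hidden_features(X, weights, biases):
    """Apply phi: every layer except the final output layer, returning F = phi(X)."""
    h = np.asarray(X, dtype=float)
    for W_l, b_l in zip(weights[:-1], biases[:-1]):
        h = relu(h @ W_l + b_l)  # one hidden layer: affine map followed by ReLU
    return h

rng = np.random.default_rng(0)
D, M, C = 6, 4, 3  # input width, last-hidden width, class count (all hypothetical)
# Random weights stand in for a trained classifier (assumption).
weights = [rng.normal(size=(D, 8)), rng.normal(size=(8, M)), rng.normal(size=(M, C))]
biases = [np.zeros(8), np.zeros(M), np.zeros(C)]
X = rng.normal(size=(5, D))   # five preprocessed records
F = hidden_features(X, weights, biases)  # one M-dimensional feature vector per record
```

In practice the weights would come from the trained classification model, and only the last weight matrix (the output layer) is skipped when computing F.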
S3: Feature space transformation. The inputs of this step are the discriminant feature data set F extracted by the neural network and the category labels of the records, and the output is the projection coordinates in a two-dimensional space. The feature space transformation solves a space transformation matrix with a dimension reduction algorithm from machine learning, and projects the data from the high-dimensional features to a low-dimensional visual space of dimension S (S ≤ 3).
For numerical medical records, linear dimension reduction methods mainly include the Principal Component Analysis (PCA) algorithm, the Independent Component Analysis (ICA) algorithm, the Partial Least Squares (PLS) algorithm, and the like. With these methods, some of the discriminant information of the samples is lost when mapping to the low-dimensional subspace, the number of retained components has to be set through repeated trials, and the parameter choice has a large influence on the result.
The linear discriminant analysis (LDA) algorithm makes the intra-class spacing of the samples as small as possible and the inter-class spacing as large as possible during dimension reduction, so that the samples retain discriminant information after dimension reduction. However, the LDA algorithm has certain limitations: with a small sample size and a high dimensionality, an ill-conditioned matrix problem arises during singular value decomposition, and accurate space basis vectors are difficult to obtain. In the invention, the data is first mapped nonlinearly by the neural network, which automatically extracts low-dimensional features reflecting the regularities of the data; applying the LDA algorithm to these feature vectors alleviates the ill-conditioned matrix problem, so that more accurate space basis vectors are obtained.
Specifically, in the present embodiment, the dimension reduction algorithm may adopt a Linear Discriminant Analysis (LDA) method. Therefore, the dimension S of the LDA dimension-reducing target needs to be smaller than the number of record categories.
For example, assuming that the number of categories of the electronic medical record data is 3, S may be set to 2, i.e., the feature vector f is projected to a two-dimensional space. The space basis vectors are then represented as $W = [w_1, w_2]$ and the projection coordinates in the visual space as $(y_1, y_2)$, so that
$$y_1 = w_1^T f, \qquad y_2 = w_2^T f$$
The LDA-based spatial transformation method specifically comprises the following steps:
S301: Input the electronic record table into the discriminant feature extraction unit, and denote the resulting case feature data set as F.
S302: Calculate the intra-class divergence (scatter) matrix $S_W$ and the inter-class divergence (scatter) matrix $S_B$ of the case feature data set F, where $S_W$ and $S_B$ are defined as follows:
$S_{Wi}$ is the scatter of the sample points of class i relative to the class center point $\mu_i$.
$S_{Bi}$ is the covariance matrix of the center point of class i relative to the center point $\mu$ of all samples, i.e., the scatter of class i relative to $\mu$.
$$S_W = S_{W1} + S_{W2} + S_{W3} = \sum_{i=1}^{3} \sum_{f \in c_i} (f - \mu_i)(f - \mu_i)^T$$
$$S_B = S_{B1} + S_{B2} + S_{B3} = (\mu_1 - \mu)(\mu_1 - \mu)^T + (\mu_2 - \mu)(\mu_2 - \mu)^T + (\mu_3 - \mu)(\mu_3 - \mu)^T$$
In the formula:
$$\mu_i = \frac{1}{n_i} \sum_{f \in c_i} f, \qquad \mu = \frac{1}{n} \sum_{f \in F} f$$
$n_i$ denotes the number of samples of the i-th class, $c_i$ denotes the i-th class, and n denotes the total number of samples.
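The two matrices of S302 can be computed as in the sketch below, which follows the definitions in the text: the intra-class matrix sums the outer products of the centered samples of each class, and the inter-class matrix sums the outer products of the class-center offsets without weighting by class size. The helper name `scatter_matrices` and the toy data are hypothetical.

```python
import numpy as np

def scatter_matrices(F, labels):
    """Return (S_W, S_B) for a feature matrix F (rows are samples) and class labels."""
    F = np.asarray(F, dtype=float)
    labels = np.asarray(labels)
    mu = F.mean(axis=0)                  # center point of all samples
    M = F.shape[1]
    S_W = np.zeros((M, M))
    S_B = np.zeros((M, M))
    for c in np.unique(labels):
        F_c = F[labels == c]
        mu_c = F_c.mean(axis=0)          # center point of class c
        centered = F_c - mu_c
        S_W += centered.T @ centered     # sum of (f - mu_c)(f - mu_c)^T over the class
        d = (mu_c - mu).reshape(-1, 1)
        S_B += d @ d.T                   # (mu_c - mu)(mu_c - mu)^T
    return S_W, S_B

# Toy data: two classes separated along the second axis only.
F = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0], [2.0, 4.0]])
labels = np.array([0, 0, 1, 1])
S_W, S_B = scatter_matrices(F, labels)
```

In this toy case all within-class variation lies on the first axis and all between-class variation on the second, so each matrix has a single non-zero diagonal entry.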
S303: Optimally solve the transformation space basis vectors W. The goal of the LDA coordinate transformation algorithm is that, after projection, samples of different classes lie farther apart and samples within a class lie closer together. The transformation matrix W that maximizes the optimization objective function J(W) is solved:
$$J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}$$
According to
$$S_B w = \lambda S_W w$$
the eigenvectors of
$$S_W^{-1} S_B$$
are solved by methods such as singular value decomposition, and those corresponding to the two largest eigenvalues are taken as $(w_1, w_2)$.
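One way to obtain the basis vectors of S303 is sketched below. The text mentions singular value decomposition; this sketch instead uses a plain eigendecomposition of the product of the inverse intra-class matrix and the inter-class matrix, which is a substitution made for brevity. The small ridge term added to the intra-class matrix is also an assumption (it is not in the text) and only keeps the matrix invertible. `lda_basis` is a hypothetical helper name.

```python
import numpy as np

def lda_basis(S_W, S_B, n_components=2, ridge=1e-6):
    """Return the leading eigenvectors of S_W^{-1} S_B as the columns of W."""
    M = S_W.shape[0]
    # Ridge regularization (assumption, not in the source) keeps S_W invertible.
    S_W_reg = S_W + ridge * np.eye(M)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W_reg, S_B))
    # Keep the eigenvectors with the largest eigenvalues.
    order = np.argsort(-eigvals.real)[:n_components]
    W = eigvecs[:, order].real
    return W / np.linalg.norm(W, axis=0)  # unit-length basis vectors

# Toy diagonal matrices: the ratios S_B/S_W per axis are 4, 1 and 0.5.
S_W = np.diag([2.0, 1.0, 4.0])
S_B = np.diag([8.0, 1.0, 2.0])
W = lda_basis(S_W, S_B)
```

With these toy inputs the two axes with the largest between-to-within ratio are selected; the sign of each eigenvector is arbitrary, so only the direction matters.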
S304: Solve the projection space coordinates and normalize them. For a feature f, the space basis vectors obtained in the previous step are used for coordinate transformation to obtain the projection coordinates
$$y = W^T f$$
The coordinates are then normalized by scaling the two-dimensional coordinates to the [0,1] interval:
$$y' = \frac{y - y_{min}}{y_{max} - y_{min}}$$
where y' is the normalized coordinate, y is the coordinate before normalization, $y_{min}$ is the minimum value of the coordinate, and $y_{max}$ is the maximum value of the coordinate.
And saving the recorded projection coordinates and the recorded label.
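The projection and normalization of S304 can be sketched as below. It is assumed here that the minimum and maximum are taken separately for each of the two axes, and that the same minimum and range are reused when a new patient's record is projected later; the text does not state either detail. `project_and_normalize` and the toy inputs are hypothetical.

```python
import numpy as np

def project_and_normalize(F, W):
    """Project rows of F with W, then min-max scale each axis to [0, 1]."""
    Y = np.asarray(F, dtype=float) @ W   # y = W^T f for every row f
    y_min = Y.min(axis=0)
    y_max = Y.max(axis=0)
    # Avoid division by zero when an axis is constant.
    span = np.where(y_max > y_min, y_max - y_min, 1.0)
    return (Y - y_min) / span, y_min, span

F = np.array([[1.0, 2.0, 0.0], [3.0, 6.0, 0.0], [2.0, 4.0, 0.0]])
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Y_norm, y_min, span = project_and_normalize(F, W)

# A new record is projected with the stored basis and scaling (assumption).
f_new = np.array([2.0, 3.0, 0.0])
y_new = (f_new @ W - y_min) / span
```

Because a new record is scaled with the stored range, its coordinates can fall slightly outside [0, 1] when it lies beyond the existing records.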
S4: Scatter diagram drawing. In this step, a two-dimensional scatter diagram is drawn based on the projection coordinates $(y_1, y_2)$ of the feature data f in the two-dimensional space obtained by the data space conversion unit. The specific steps are as follows:
S401: Obtain the projection coordinates of the existing patient record data set
$$\{(y_1^{(i)}, y_2^{(i)})\}_{i=1}^{n}$$
where i denotes the i-th record and n denotes the total number of patient records currently collected.
S402, drawing a two-dimensional scatter diagram, and coloring the projection points with different colors according to the type labels of the cases. Thus, different types of case projection blocks can be obtained.
S403, drawing a projection point of the case record of the new patient on the scatter diagram.
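Steps S401 to S403 amount to grouping the projected points by case type, giving each group its own color, and adding the new patient's point as a separate marker. The sketch below only prepares these series in plain Python; the color names, the marker, the label strings, and the helper `build_scatter_series` are hypothetical choices.

```python
def build_scatter_series(coords, labels, palette=("tab:red", "tab:blue", "tab:green")):
    """Group projected points by class label and attach one color per class."""
    series = {}
    for (y1, y2), label in zip(coords, labels):
        entry = series.setdefault(label, {"x": [], "y": []})
        entry["x"].append(y1)
        entry["y"].append(y2)
    # Sorted labels give a stable label-to-color assignment.
    for k, label in enumerate(sorted(series)):
        series[label]["color"] = palette[k % len(palette)]
    return series

# Hypothetical normalized projection coordinates and case type labels.
coords = [(0.1, 0.2), (0.8, 0.9), (0.15, 0.25)]
labels = ["disease A", "disease B", "disease A"]
series = build_scatter_series(coords, labels)

# The new patient's projection point is kept as its own series (S403).
new_patient = {"x": [0.5], "y": [0.4], "color": "black", "marker": "*"}
```

Each series could then be passed to a plotting call, for example matplotlib's `Axes.scatter(x, y, c=color)`, with the new patient's point drawn last so that it stays visible on top of the existing cases.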
The effects of the present invention will be further explained by the following specific application examples, in conjunction with the accompanying drawings.
This application example determines the disease type of a patient (disease A, disease B, or disease C) from the patient's electronic medical record table. To test the generalization and stability of the visualization method, the data set is divided into a training set and a test set. The space transformation unit uses only the training set to solve the transformation space basis vectors W, and the cases of both the training set and the test set are projected using these basis vectors. The projection rendering results are shown in fig. 3. In addition, to compare the visualization effect of the invention with that of existing methods, dimension reduction with the LDA algorithm alone and t-SNE nonlinear dimension reduction were each used to visualize the same electronic medical record data; the results are shown in fig. 4 and fig. 5, respectively.
The circles in the figure represent training set case samples, the pentagons represent test set case samples, and different types of case projections are colored with different colors.
Comparing the result of the method of the invention with the visual result chart of the existing common method, it can be seen that:
1) Compared with the t-SNE method, the method of the invention separates the test cases of different disease types in the projection space, so the visualization method discriminates better between electronic medical records of different disease types. In addition, the method places the projections of patients with similar data features close to each other, which makes it easier for doctors to observe how similar a new case is to past cases.
2) The projection points of the test set coincide with the projection regions of the training set, and they are more concentrated than the projection distribution obtained with the LDA algorithm alone. The method of the invention therefore generalizes well.
Corresponding to the dimension reduction visualization method for the high-dimensional electronic medical record table in the present application, the application also provides a dimension reduction visualization device for the high-dimensional electronic medical record table, which comprises a preprocessing unit 10, an extraction unit 11, a conversion unit 12 and a display unit 13, wherein:
the preprocessing unit 10 is configured to obtain a medical record table, and perform data preprocessing on the medical record table to obtain a high-dimensional record data matrix X;
the extraction unit 11 is used for carrying out multilayer nonlinear mapping on the high-dimensional recorded data matrix X through a neural network classification model and acquiring a distinguishing characteristic data set F from the output of a hidden layer of the neural network classification model;
the conversion unit 12 is used for projecting the distinguishing feature data in the distinguishing feature data set F output by the extraction unit 11 to a two-dimensional visual space by using linear discriminant analysis (LDA), to obtain the projection coordinates of the distinguishing feature data in the two-dimensional space;
and the display unit 13 is used for drawing a two-dimensional scatter diagram according to the projection coordinates of the distinguishing feature data in the two-dimensional space, and is used for displaying the visualization result of the electronic medical record.
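The role of the extraction unit 11 can be illustrated with a small forward pass that returns the last hidden layer output alongside the class scores. This is only a sketch: the layer sizes, the ReLU activation and the fixed weights are assumptions chosen for illustration, whereas in practice the weights would come from training the neural network classification model. The function names are hypothetical.

```python
def relu(v):
    """Element-wise rectified linear activation."""
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    """Apply one fully connected layer; `weights` has one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def forward(x, hidden_layers, output_layer):
    """Return (class_scores, last_hidden_output) for one preprocessed record."""
    h = x
    for weights, bias in hidden_layers:
        h = relu(dense(h, weights, bias))
    scores = dense(h, *output_layer)
    return scores, h

# Hypothetical toy parameters (a real model would learn these).
hidden_layers = [
    ([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]], [0.0, 0.0, 0.0]),  # 2 -> 3 units
    ([[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]], [0.0, -1.0]),          # 3 -> 2 units
]
output_layer = ([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
                [0.0, 0.0, 0.0])                                # 2 -> 3 classes

# Rows of the high-dimensional record data matrix X (toy values).
X = [[2.0, 1.0], [0.0, 3.0]]

# Distinguishing feature data set F: last hidden layer output per record.
F = [forward(x, hidden_layers, output_layer)[1] for x in X]
first_scores = forward(X[0], hidden_layers, output_layer)[0]
```

The class scores are used only to train the classifier; the rows of `F` are what the conversion unit 12 would receive.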
Because the device embodiment corresponds to the method embodiments above, it is described only briefly; for related details, refer to the description of the method embodiments, which is not repeated here.
The embodiment of the application also discloses a computer readable storage medium in which a computer instruction set is stored; the computer instruction set, when executed by a processor, implements the dimension reduction visualization method for the high-dimensional electronic medical record table described above.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A dimension reduction visualization method for a high-dimensional electronic medical record list is characterized by comprising the following steps:
acquiring a medical record table, and performing data preprocessing on the medical record table to obtain a high-dimensional record data matrix;
carrying out multilayer nonlinear mapping on a high-dimensional recorded data matrix through a neural network classification model, and acquiring a distinguishing characteristic data set from the output of a hidden layer of the neural network classification model;
projecting the distinguishing feature data in the distinguishing feature data set to a two-dimensional visual space by adopting a linear discriminant analysis method to obtain projection coordinates of the distinguishing feature data in the two-dimensional space;
drawing a two-dimensional scatter diagram according to the projection coordinates of the distinguishing feature data in the two-dimensional space;
and calculating the projection coordinates of the record of the patient to be diagnosed, and drawing the projection points of the record of the patient to be diagnosed on the two-dimensional scatter diagram.
2. The method for dimension reduction visualization of high-dimensional electronic medical record list according to claim 1, wherein the data preprocessing of the medical record list comprises:
performing Z-score standardization on the continuous numerical characteristics in the medical record table;
carrying out one-hot coding on discrete numerical characteristics in the medical record table;
the text type features in the medical record list are converted into numerical discrete features and then are subjected to one-hot coding.
3. The method of claim 1, wherein the discriminative feature is an output of a last hidden layer of the neural network classification model.
4. The method for dimension reduction visualization of high-dimensional electronic medical record list according to claim 1, wherein the converting of the high-dimensional feature data into the two-dimensional visual space comprises:
calculating an intra-class divergence matrix and an inter-class divergence matrix of the case characteristic data set;
according to the intra-class divergence matrix and the inter-class divergence matrix, optimally solving a transformation space basis vector;
performing coordinate transformation on the features in the case feature data set by using the space basis vectors, solving projection space coordinates, and performing coordinate normalization;
and saving the recorded projection coordinates and the recorded label.
5. A dimension reduction visualization device for a high-dimensional electronic medical record list is characterized by comprising:
the preprocessing unit is used for acquiring a medical record table and preprocessing data of the medical record table to obtain a high-dimensional record data matrix;
the extraction unit is used for carrying out multilayer nonlinear mapping on the high-dimensional recorded data matrix through a neural network classification model and acquiring a distinguishing characteristic data set from the output of a hidden layer of the neural network classification model;
the conversion unit is used for projecting the distinguishing characteristic data in the distinguishing characteristic data set to a two-dimensional visual space by adopting a linear discriminant analysis method to obtain the projection coordinates of the distinguishing characteristic data in the two-dimensional space;
and the display unit is used for drawing a two-dimensional scatter diagram according to the projection coordinates of the distinguishing feature data in the two-dimensional space.
6. The apparatus for performing dimension reduction visualization on the high-dimensional electronic medical record list according to claim 5, wherein the preprocessing unit performs data preprocessing on the medical record list, and comprises:
performing Z-score standardization on the continuous numerical characteristics in the medical record table;
carrying out one-hot coding on discrete numerical characteristics in the medical record table;
the text type features in the medical record list are converted into numerical discrete features and then are subjected to one-hot coding.
7. The apparatus according to claim 5, wherein the extraction unit extracts an output of a last hidden layer of the neural network classification model as a discriminant feature.
8. The apparatus for reducing the dimension of the high-dimensional electronic medical record list according to claim 5, wherein the converting unit converts the high-dimensional feature data into a two-dimensional visual space, comprising:
calculating an intra-class divergence matrix and an inter-class divergence matrix of the case characteristic data set;
according to the intra-class divergence matrix and the inter-class divergence matrix, optimally solving a transformation space basis vector;
performing coordinate transformation on the features in the case feature data set by using the space basis vectors, solving projection space coordinates, and performing coordinate normalization;
and saving the recorded projection coordinates and the recorded label.
9. A computer-readable storage medium having a set of computer instructions stored therein; the set of computer instructions, when executed by a processor, implement a method for dimension reduction visualization of a high-dimensional electronic medical record form as recited in any of claims 1-4.
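The data preprocessing recited in claims 2 and 6 can be sketched as follows, using only the Python standard library. The field names and sample records are hypothetical, the use of the population standard deviation for Z-score standardization is an assumption, and the sorted order of categories is simply one convenient way to assign one-hot positions.

```python
from statistics import mean, pstdev

def z_score(values):
    """Standardise a continuous column to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def one_hot(values):
    """One-hot encode a discrete column; categories are taken in sorted order."""
    index = {c: i for i, c in enumerate(sorted(set(values)))}
    return [[1.0 if index[v] == i else 0.0 for i in range(len(index))]
            for v in values]

def preprocess(records, continuous, discrete, text):
    """Build the record data matrix X, one row per medical record."""
    columns = []
    for name in continuous:
        columns.append([[z] for z in z_score([r[name] for r in records])])
    for name in discrete:
        columns.append(one_hot([r[name] for r in records]))
    for name in text:
        raw = [r[name] for r in records]
        # Text values are first mapped to integer codes, then one-hot encoded.
        codes = {t: i for i, t in enumerate(sorted(set(raw)))}
        columns.append(one_hot([codes[t] for t in raw]))
    # Concatenate the encoded pieces of every column for each record.
    return [sum((col[i] for col in columns), []) for i in range(len(records))]

# Hypothetical medical record table.
records = [
    {"age": 30, "smoker": 0, "symptom": "cough"},
    {"age": 50, "smoker": 1, "symptom": "fever"},
    {"age": 70, "smoker": 0, "symptom": "cough"},
]
X = preprocess(records, continuous=["age"], discrete=["smoker"], text=["symptom"])
```

Each row of `X` here has five entries: one standardized age value, two one-hot positions for `smoker` and two for `symptom`.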
CN202010632086.5A 2020-07-03 2020-07-03 Dimension reduction visualization method and device for high-dimensional electronic medical record list and storage medium Pending CN111709492A (en)

Publications (1)

Publication Number Publication Date
CN111709492A (en) 2020-09-25

Family

ID=72546463

Country Status (1)

Country Link
CN (1) CN111709492A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427966A (en) * 2018-03-12 2018-08-21 成都信息工程大学 A kind of magic magiscan and method based on PCA-LDA
CN109829587A (en) * 2019-02-12 2019-05-31 国网山东省电力公司电力科学研究院 Zonule grade ultra-short term and method for visualizing based on depth LSTM network
CN110955809A (en) * 2019-11-27 2020-04-03 南京大学 High-dimensional data visualization method supporting topology structure maintenance

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298289A (en) * 2021-04-14 2021-08-24 北京市燃气集团有限责任公司 Method and device for predicting gas consumption of gas user
CN116542956A (en) * 2023-05-25 2023-08-04 广州机智云物联网科技有限公司 Automatic detection method and system for fabric components and readable storage medium
CN116542956B (en) * 2023-05-25 2023-11-17 广州机智云物联网科技有限公司 Automatic detection method and system for fabric components and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 116000 room 206, no.8-9, software garden road, Ganjingzi District, Dalian City, Liaoning Province

Applicant after: Neusoft Education Technology Group Co.,Ltd.

Address before: 116000 room 206, no.8-9, software garden road, Ganjingzi District, Dalian City, Liaoning Province

Applicant before: Dalian Neusoft Education Technology Group Co.,Ltd.